Hi @kushkhosla, before looking at the scaling issue, I decided to try benchmarking the different simulators. I used the following IPython script:
import pennylane as qml
from pennylane import numpy as np

ENCODING_SIZE = 10
NUM_QUBITS = 10


def circuit(x, thetas):
    # Encode the input data as RX rotations
    for i in range(ENCODING_SIZE):
        qml.RX(x[i], wires=i)
    # Entangle neighbouring qubits
    for i in range(NUM_QUBITS - 1):
        qml.CNOT(wires=[i, i + 1])
    # Trainable layer: the first NUM_QUBITS thetas drive RX, the rest drive RY
    for i in range(NUM_QUBITS):
        qml.RX(thetas[i], wires=i)
    for i in range(NUM_QUBITS, 2 * NUM_QUBITS):
        qml.RY(thetas[i], wires=(i - NUM_QUBITS))
    return tuple(qml.expval.PauliZ(wires=i) for i in range(NUM_QUBITS))


x = np.random.random([ENCODING_SIZE])
thetas = np.random.random(2 * NUM_QUBITS)

devices = [
    qml.device("default.qubit", wires=NUM_QUBITS),
    qml.device("forest.numpy_wavefunction", wires=NUM_QUBITS),
    qml.device("forest.wavefunction", wires=NUM_QUBITS),
    qml.device("forest.qvm", device="{}q-qvm".format(NUM_QUBITS)),
    qml.device("forest.qvm", device="{}q-pyqvm".format(NUM_QUBITS)),
    qml.device("qiskit.basicaer", wires=NUM_QUBITS),
    qml.device("qiskit.aer", wires=NUM_QUBITS),
    qml.device("projectq.simulator", wires=NUM_QUBITS),
    # qml.device("microsoft.QubitSimulator", wires=NUM_QUBITS),
]

print("Encoding size: {}".format(ENCODING_SIZE))
print("Number of qubits: {}".format(NUM_QUBITS))

# Time a single circuit evaluation on each device (%timeit requires IPython)
for dev in devices:
    print("\nDevice: {}".format(dev.name))
    qnode = qml.QNode(circuit, dev)
    %timeit qnode(x, thetas)
Running this script with ipython timing.ipy gives the following results:
Encoding size: 10
Number of qubits: 10
Device: Default qubit PennyLane plugin
2.35 s ± 236 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Device: pyQVM NumpyWavefunction Simulator Device
293 ms ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Device: Forest Wavefunction Simulator Device
350 ms ± 65.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Device: Forest QVM Device (10q-qvm, 1024 shots)
5.6 s ± 92.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Device: Forest pyQVM Device (10q-pyqvm, 1024 shots)
6.71 ms ± 245 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Device: Qiskit Basic Aer (1024 shots)
179 ms ± 4.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Device: Qiskit Aer (1024 shots)
162 ms ± 5.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Device: ProjectQ PennyLane plugin (1024 shots)
60.5 ms ± 26.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
(NB: I modified the print statement above to give more device information. The devices that specify shots are hardware simulators, so increasing the number of shots increases accuracy, but also increases runtime.)
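If you want to run a similar benchmark outside IPython, where %timeit is not available, the standard-library timeit module can produce a comparable summary. The following is only a minimal sketch: benchmark and stand_in_qnode are hypothetical names, and the stand-in workload takes the place of a call such as qnode(x, thetas).

```python
import statistics
import timeit


def benchmark(func, repeat=7, number=1):
    """Return (mean, std. dev.) of seconds per call over `repeat` runs."""
    # timeit.repeat returns the total time of `number` calls, once per run.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


def stand_in_qnode():
    # Hypothetical placeholder workload; in the script above this would be
    # something like `lambda: qnode(x, thetas)`.
    return sum(i * i for i in range(10_000))


mean, std = benchmark(stand_in_qnode, repeat=7, number=10)
print("{:.3g} s ± {:.3g} s per loop (7 runs, 10 loops each)".format(mean, std))
```

Unlike %timeit, this sketch does not choose the number of loops automatically, so number has to be picked by hand for fast and slow devices.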
A couple of things to note:
- The default.qubit device is quite slow. This is not intentional, but the default.qubit device is not meant for production code; it is a reference plugin designed to show developers how a PennyLane plugin is coded.
- We recommend instead that you use a plugin for an external high-performance qubit simulator. From the rough benchmarking above, for 10 qubits the Rigetti Forest pyQVM simulator is by far the fastest, at 6.71 ms per evaluation compared with 60.5 ms for the next-fastest device (ProjectQ).
So you should see a significant improvement using
qml.device("forest.qvm", device="{}q-pyqvm".format(NUM_QUBITS), shots=1024)
Alternatively, you can use the NumPy wavefunction simulator for exact expectation values:
qml.device("forest.numpy_wavefunction", wires=NUM_QUBITS)
Both of these devices can be installed via
git clone https://github.com/rigetti/pennylane-forest
cd pennylane-forest
pip install -e .
In terms of scaling, note that the times t above are for a single circuit evaluation. To determine the gradient for M free parameters, PennyLane must query the quantum device 2M times, so the expected time per optimization step is roughly 2Mt.
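To make the 2M count concrete, here is a minimal pure-Python sketch of a gradient computed from two shifted evaluations per parameter (the parameter-shift rule). It does not call PennyLane: expectation is a hypothetical stand-in for one circuit evaluation, modelling the expectation of Z on every qubit after an RX(theta_i) rotation of each qubit, which is the product of cos(theta_i). The value M = 20 is chosen to match the length of thetas in the benchmark.

```python
import math

evaluations = 0  # counts how often the stand-in "device" is queried


def expectation(thetas):
    """Stand-in for one circuit evaluation (not a PennyLane call)."""
    global evaluations
    evaluations += 1
    return math.prod(math.cos(t) for t in thetas)


def parameter_shift_gradient(f, thetas, shift=math.pi / 2):
    """Gradient of f using two shifted evaluations per parameter."""
    grad = []
    for i in range(len(thetas)):
        plus = list(thetas)
        minus = list(thetas)
        plus[i] += shift
        minus[i] -= shift
        grad.append((f(plus) - f(minus)) / 2)
    return grad


thetas = [0.1 * (i + 1) for i in range(20)]  # M = 20 free parameters
grad = parameter_shift_gradient(expectation, thetas)
print("device queries:", evaluations)  # 2 * M = 40

# Rough time per optimization step, 2 * M * t, using the t = 6.71 ms
# measured above for the pyQVM device.
t = 6.71e-3
print("estimated seconds per step: {:.3f}".format(2 * len(thetas) * t))
```

With the same M, the 2.35 s per evaluation measured for default.qubit would give roughly 94 s per step, which is why the choice of simulator matters so much here.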
For more details on why this is the case, see @nathan’s great answer here:
Note that we are working on alleviating the optimization runtime scaling! This will likely be through a combination of:
- Extending PennyLane to perform the gradient computations for each parameter in parallel, and
- Implementing efficiency gains that can be achieved assuming the underlying device is a simulator (and not hardware).
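As a rough illustration of the first point, the 2M shifted evaluations are independent of each other, so they can in principle be dispatched concurrently. The sketch below is not how PennyLane implements or plans to implement this; expectation, shifted and parallel_gradient are hypothetical names, and the stand-in function replaces a real circuit evaluation.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def expectation(thetas):
    """Stand-in for one circuit evaluation (not a PennyLane call)."""
    return math.prod(math.cos(t) for t in thetas)


def shifted(thetas, index, delta):
    """Return a copy of thetas with one entry shifted by delta."""
    out = list(thetas)
    out[index] += delta
    return out


def parallel_gradient(f, thetas, max_workers=4):
    """Submit all 2M shifted evaluations at once, then combine them."""
    s = math.pi / 2
    jobs = [shifted(thetas, i, d) for i in range(len(thetas)) for d in (s, -s)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(f, jobs))  # results keep submission order
    # values holds the (+s, -s) pair for parameter 0, then parameter 1, ...
    return [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(len(thetas))]


grad = parallel_gradient(expectation, [0.3, 0.7, 1.1])
print(grad)
```

Threads only help when each evaluation spends its time waiting on an external simulator or server, or in code that releases the GIL; for CPU-bound pure-Python work, a process pool would be the more likely choice.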