Speeding up grad computation

josh · May 27, 2019, 2:15pm

Hi @kushkhosla, before looking at the scaling issue, I decided to try benchmarking the different simulators. I used the following IPython script:

import pennylane as qml
from pennylane import numpy as np

ENCODING_SIZE = 10
NUM_QUBITS = 10

def circuit(x, thetas):
    for i in range(ENCODING_SIZE):
        qml.RX(x[i], wires=i)
    for i in range(NUM_QUBITS - 1):
        qml.CNOT(wires=[i, i + 1])
    for i in range(NUM_QUBITS):
        qml.RX(thetas[i], wires=i)
    for i in range(NUM_QUBITS, 2 * NUM_QUBITS):
        qml.RY(thetas[i], wires=(i - NUM_QUBITS))
    return tuple(qml.expval.PauliZ(wires=i) for i in range(NUM_QUBITS))


x = np.random.random([ENCODING_SIZE])
thetas = np.random.random(2 * NUM_QUBITS)

devices = [
    qml.device("default.qubit", wires=NUM_QUBITS),
    qml.device("forest.numpy_wavefunction", wires=NUM_QUBITS),
    qml.device("forest.wavefunction", wires=NUM_QUBITS),
    qml.device("forest.qvm", device="{}q-qvm".format(NUM_QUBITS)),
    qml.device("forest.qvm", device="{}q-pyqvm".format(NUM_QUBITS)),
    qml.device("qiskit.basicaer", wires=NUM_QUBITS),
    qml.device("qiskit.aer", wires=NUM_QUBITS),
    qml.device("projectq.simulator", wires=NUM_QUBITS),
    # qml.device("microsoft.QubitSimulator", wires=NUM_QUBITS),
]

print("Encoding size: {}".format(ENCODING_SIZE))
print("Number of qubits: {}".format(NUM_QUBITS))

for dev in devices:
    print("\nDevice: {}".format(dev.name))
    qnode = qml.QNode(circuit, dev)
    %timeit qnode(x, thetas)

Running this script with ipython timing.ipy, gives the following results:

Encoding size: 10
Number of qubits: 10

Device: Default qubit PennyLane plugin
2.35 s ± 236 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: pyQVM NumpyWavefunction Simulator Device
293 ms ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: Forest Wavefunction Simulator Device
350 ms ± 65.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: Forest QVM Device (10q-qvm, 1024 shots)
5.6 s ± 92.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: Forest pyQVM Device (10q-pyqvm, 1024 shots)
6.71 ms ± 245 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Device: Qiskit Basic Aer (1024 shots)
179 ms ± 4.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: Qiskit Aer (1024 shots)
162 ms ± 5.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Device: ProjectQ PennyLane plugin (1024 shots)
60.5 ms ± 26.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(NB: I modified the print statement above to give more device information. The devices that specify shots are hardware simulators, so increasing the number of shots increases accuracy, but also increases runtime.)

A couple of things to note:

The default.qubit device is quite slow. This is not intentional, but at the same time, the default.qubit device is not meant for production code — it is a reference plugin designed to show developers how a PennyLane plugin is coded.
We recommend instead that you use a plugin for an external high-performance qubit simulator. From the rough benchmarking above, it appears that for 10 qubits, the Rigetti Forest pyQVM simulator destroys the competition So you should see some significant improvements using
```
 qml.device("forest.qvm", device="{}q-pyqvm".format(NUM_QUBITS), shots=1024)
```
Alternatively, you can use the NumPy wavefunction simulator for exact expectation values:
```
 qml.device("forest.numpy_wavefunction", wires=NUM_QUBITS)
```
Both of these devices can be installed via
```
 git clone https://github.com/rigetti/pennylane-forest
 cd pennylane-forest
 pip install -e .
```

In terms of the scaling, note that the above times t are for a single circuit evaluation. To determine the gradient for M free parameters, PennyLane must query the quantum device 2M times; so the expected time taken per optimization step should be \sim 2Mt.

For more details on why this is the case, see @nathan’s great answer here:

Note that we are working on alleviating the optimization runtime scaling! This will likely be through a combination of:

Extending PennyLane to perform the gradient computations for each parameter in parallel, and
Implementing efficiency gains that can be achieved assuming the underlying device is a simulator (and not hardware).

Topic		Replies	Views
Qualcs device and qml.grad() timings PennyLane Help	6	664	March 10, 2022
Batching Inputs to Quantum Circuit PennyLane Help	14	3348	September 10, 2019
Cost functions: Multiple wire measurements + backends for Hermitians PennyLane Help	8	2012	June 11, 2019
How to use Dask to parallelize QNode computations? PennyLane Help	26	1622	August 10, 2021
TypeError while computing the gradient PennyLane Help	20	3536	May 18, 2021

Speeding up grad computation

Related topics