Speeding up grad computation

Hi @kushkhosla, before looking at the scaling issue, I decided to try benchmarking the different simulators. I used the following IPython script:

import pennylane as qml
from pennylane import numpy as np

ENCODING_SIZE = 10
NUM_QUBITS = 10

def circuit(x, thetas):
    for i in range(ENCODING_SIZE):
        qml.RX(x[i], wires=i)
    for i in range(NUM_QUBITS - 1):
        qml.CNOT(wires=[i, i + 1])
    for i in range(NUM_QUBITS):
        qml.RX(thetas[i], wires=i)
    for i in range(NUM_QUBITS, 2 * NUM_QUBITS):
        qml.RY(thetas[i], wires=(i - NUM_QUBITS))
    return tuple(qml.expval.PauliZ(wires=i) for i in range(NUM_QUBITS))


x = np.random.random([ENCODING_SIZE])
thetas = np.random.random(2 * NUM_QUBITS)

devices = [
    qml.device("default.qubit", wires=NUM_QUBITS),
    qml.device("forest.numpy_wavefunction", wires=NUM_QUBITS),
    qml.device("forest.wavefunction", wires=NUM_QUBITS),
    qml.device("forest.qvm", device="{}q-qvm".format(NUM_QUBITS)),
    qml.device("forest.qvm", device="{}q-pyqvm".format(NUM_QUBITS)),
    qml.device("qiskit.basicaer", wires=NUM_QUBITS),
    qml.device("qiskit.aer", wires=NUM_QUBITS),
    qml.device("projectq.simulator", wires=NUM_QUBITS),
    # qml.device("microsoft.QubitSimulator", wires=NUM_QUBITS),
]

print("Encoding size: {}".format(ENCODING_SIZE))
print("Number of qubits: {}".format(NUM_QUBITS))

for dev in devices:
    print("\nDevice: {}".format(dev.name))
    qnode = qml.QNode(circuit, dev)
    %timeit qnode(x, thetas)

Running this script with ipython timing.ipy, gives the following results:

Encoding size: 10
Number of qubits: 10

Device: Default qubit PennyLane plugin
2.35 s ± 236 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: pyQVM NumpyWavefunction Simulator Device
293 ms ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: Forest Wavefunction Simulator Device
350 ms ± 65.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: Forest QVM Device (10q-qvm, 1024 shots)
5.6 s ± 92.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: Forest pyQVM Device (10q-pyqvm, 1024 shots)
6.71 ms ± 245 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Device: Qiskit Basic Aer (1024 shots)
179 ms ± 4.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Device: Qiskit Aer (1024 shots)
162 ms ± 5.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Device: ProjectQ PennyLane plugin (1024 shots)
60.5 ms ± 26.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(NB: I modified the print statement above to give more device information. The devices that specify shots are hardware simulators, so increasing the number of shots increases accuracy, but also increases runtime.)

A couple of things to note:

  • The default.qubit device is quite slow. This is not intentional, but at the same time, the default.qubit device is not meant for production code — it is a reference plugin designed to show developers how a PennyLane plugin is coded.

  • We recommend instead that you use a plugin for an external high-performance qubit simulator. From the rough benchmarking above, it appears that for 10 qubits, the Rigetti Forest pyQVM simulator destroys the competition :slightly_smiling_face: So you should see some significant improvements using

     qml.device("forest.qvm", device="{}q-pyqvm".format(NUM_QUBITS), shots=1024)
    

    Alternatively, you can use the NumPy wavefunction simulator for exact expectation values:

     qml.device("forest.numpy_wavefunction", wires=NUM_QUBITS)
    

    Both of these devices can be installed via

     git clone https://github.com/rigetti/pennylane-forest
     cd pennylane-forest
     pip install -e .
    

In terms of the scaling, note that the above times t are for a single circuit evaluation. To determine the gradient for M free parameters, PennyLane must query the quantum device 2M times; so the expected time taken per optimization step should be \sim 2Mt.

For more details on why this is the case, see @nathan’s great answer here:


Note that we are working on alleviating the optimization runtime scaling! This will likely be through a combination of:

  • Extending PennyLane to perform the gradient computations for each parameter in parallel, and

  • Implementing efficiency gains that can be achieved assuming the underlying device is a simulator (and not hardware).

2 Likes