Lightning.qubit extremely slow compared to default.qubit with VQE

Hi again!

In addition to the differing results between PennyLane 0.26.0 and 0.27.0 that I mentioned in another post (Different results with VQE on versions 0.26.0 and 0.27.0), I have noticed a huge drop in performance when using lightning.qubit instead of default.qubit with VQE.

My code is the following (the geometry values are just for testing):

import pennylane as qml
from pennylane import numpy as np

seed = 1234
np.random.seed(seed)

symbols = ['H', 'H', 'H']
geometry = np.array([-0.0399, -0.0038, 0.0, 1.5780, 0.8540, 0.0, 2.7909, -0.5159, 0.0], requires_grad=False)
electrons = 2
charge = 1

%time H, qubits = qml.qchem.molecular_hamiltonian(symbols, geometry, charge=charge)

hf_state = qml.qchem.hf_state(electrons, qubits)

singles, doubles = qml.qchem.excitations(electrons, qubits)

s_wires, d_wires = qml.qchem.excitations_to_wires(singles, doubles)

dev = qml.device("default.qubit", wires=qubits)

@qml.qnode(dev)
def circuit(params, wires, s_wires, d_wires, hf_state):
    qml.UCCSD(params, wires, s_wires, d_wires, hf_state)
    return qml.expval(H)

params = np.random.random(len(singles) + len(doubles))
print(params)

optimizer = qml.GradientDescentOptimizer(stepsize=0.1)

for n in range(11):
    %time params, energy = optimizer.step_and_cost(circuit, params, wires=range(qubits), s_wires=s_wires, d_wires=d_wires, hf_state=hf_state)
    if n % 2 == 0:
        print("step = {:},  E = {:.8f} Ha".format(n, energy))

I obtain the following results:

CPU times: user 690 ms, sys: 10.8 ms, total: 701 ms
Wall time: 1.97 s
[0.78535858 0.77997581 0.27259261 0.27646426 0.80187218 0.95813935
0.87593263 0.35781727]
CPU times: user 572 ms, sys: 2.63 ms, total: 575 ms
Wall time: 1.15 s
step = 0, E = -0.73428150 Ha
CPU times: user 672 ms, sys: 6.1 ms, total: 678 ms
Wall time: 1.37 s
CPU times: user 566 ms, sys: 1.18 ms, total: 567 ms
Wall time: 1.13 s
step = 2, E = -0.76374619 Ha
CPU times: user 664 ms, sys: 18.3 ms, total: 682 ms
Wall time: 1.37 s
CPU times: user 672 ms, sys: 0 ns, total: 672 ms
Wall time: 1.36 s
step = 4, E = -0.79437752 Ha
CPU times: user 564 ms, sys: 0 ns, total: 564 ms
Wall time: 1.15 s
CPU times: user 676 ms, sys: 0 ns, total: 676 ms
Wall time: 1.36 s
step = 6, E = -0.82591780 Ha
CPU times: user 566 ms, sys: 0 ns, total: 566 ms
Wall time: 728 ms
CPU times: user 689 ms, sys: 20.8 ms, total: 710 ms
Wall time: 1.26 s
step = 8, E = -0.85804157 Ha
CPU times: user 581 ms, sys: 1.7 ms, total: 583 ms
Wall time: 1.41 s
CPU times: user 692 ms, sys: 5.81 ms, total: 697 ms
Wall time: 1.68 s
step = 10, E = -0.89036698 Ha

If I use lightning.qubit instead of default.qubit, I obtain the following:

CPU times: user 655 ms, sys: 34.3 ms, total: 690 ms
Wall time: 2.13 s
[0.78535858 0.77997581 0.27259261 0.27646426 0.80187218 0.95813935
0.87593263 0.35781727]
CPU times: user 1min 8s, sys: 8.05 s, total: 1min 16s
Wall time: 14.4 s
step = 0, E = -0.73428150 Ha
CPU times: user 1min 9s, sys: 7.19 s, total: 1min 16s
Wall time: 9.82 s
CPU times: user 1min 24s, sys: 11.2 s, total: 1min 36s
Wall time: 11.4 s
step = 2, E = -0.76374619 Ha
CPU times: user 1min 33s, sys: 11 s, total: 1min 44s
Wall time: 10.1 s
CPU times: user 1min 17s, sys: 14.1 s, total: 1min 31s
Wall time: 8.19 s
step = 4, E = -0.79437752 Ha
CPU times: user 1min 28s, sys: 11.3 s, total: 1min 39s
Wall time: 8.99 s
CPU times: user 1min 38s, sys: 12.9 s, total: 1min 51s
Wall time: 9.71 s
step = 6, E = -0.82591780 Ha
CPU times: user 1min 31s, sys: 15.8 s, total: 1min 47s
Wall time: 8.77 s
CPU times: user 1min 40s, sys: 21.1 s, total: 2min 1s
Wall time: 9.96 s
step = 8, E = -0.85804157 Ha
CPU times: user 1min 31s, sys: 14.2 s, total: 1min 45s
Wall time: 8.78 s
CPU times: user 1min 18s, sys: 11 s, total: 1min 29s
Wall time: 8.5 s
step = 10, E = -0.89036698 Ha

The wall time is much higher with lightning.qubit than with default.qubit (roughly 8 to 14 s per step versus 0.7 to 1.7 s). What surprises me even more is that lightning.qubit uses several cores, as can be seen from the total CPU time, which is close to 2 minutes per step (I also checked with “top” that the process was indeed using multiple cores). So a step that takes less than 1 s of total CPU time with default.qubit takes up to 2 minutes with lightning.qubit. This happens both on a multi-core server and on Google Colab (although the performance loss seems smaller on Google Colab, where I think only one core is used), and with both PennyLane 0.26.0 and 0.27.0.
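
To make the multi-core reading of the timings above explicit, here is a minimal sketch in plain Python. The helper names `effective_cores` and `timed` are hypothetical and not part of PennyLane. The idea is that total CPU time (user + sys) divided by wall time gives a rough estimate of how many cores were busy on average; it is only a rule of thumb, not a profiler.

```python
import time


def effective_cores(cpu_seconds, wall_seconds):
    """Rough average number of busy cores: total CPU time / wall time."""
    return cpu_seconds / wall_seconds


def timed(fn, *args, **kwargs):
    """Run fn and return (result, cpu_seconds, wall_seconds).

    time.process_time() sums user and system CPU time over all threads
    of the current process, so it is comparable to the "total" that
    %time reports for work done inside this process.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, cpu, wall


# Figures copied from the last optimization step in each log above.
# lightning.qubit: total 1min 29s (89 s), wall 8.5 s
print(round(effective_cores(89.0, 8.5), 1))     # 10.5
# default.qubit: total 697 ms, wall 1.68 s
print(round(effective_cores(0.697, 1.68), 2))   # 0.41
```

A ratio well above 1 points to multi-threaded execution, while a ratio below 1 means the process spent part of the wall time waiting rather than computing. Outside a notebook, `timed` could wrap the `optimizer.step_and_cost(...)` call in place of the `%time` magic.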

Am I doing something wrong?
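
One check that might help narrow this down is to limit the number of threads and see whether the total CPU time drops. The sketch below assumes that the installed lightning.qubit build uses OpenMP and honors the `OMP_NUM_THREADS` environment variable; I have not confirmed that this is the cause of the slowdown. The helper `make_device` is a hypothetical name, not a PennyLane function.

```python
import os

# Assumption: the Lightning build is OpenMP-enabled and reads OMP_NUM_THREADS.
# The variable has to be set before the simulator library is first loaded,
# so it is set here, ahead of any PennyLane import.
os.environ["OMP_NUM_THREADS"] = "1"


def make_device(name, n_wires):
    """Create a PennyLane device (hypothetical helper).

    PennyLane is imported lazily inside the function so that the
    environment variable above is already in place when it loads.
    """
    import pennylane as qml

    return qml.device(name, wires=n_wires)


# Usage (needs PennyLane and pennylane-lightning installed):
# dev = make_device("lightning.qubit", qubits)
```

In a notebook this only takes effect after a kernel restart, since an already imported library will not re-read the variable. If the total CPU time falls to roughly the wall time with one thread, the extra cores were not contributing much useful work.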

Thanks in advance!

Hello combarro,

Wow, you’re finding a lot of issues today! This seems like a job for the performance team to look into. Thank you for all your useful feedback!

Thanks! Looking forward to your updates.