Also, is it only 2 new circuits per parameter when that parameter appears in a single gate? If not, then the cost scales as 2 × (number of gates with parameters): the parameter-shift rule is applied to every gate containing that parameter, and the results are combined with the product rule.
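A minimal NumPy sketch of that reasoning, assuming a hypothetical single-qubit toy circuit (RX then RY on |0>, both driven by the same parameter θ, measuring ⟨Z⟩) rather than any real PennyLane device. Each gate's copy of θ is shifted by ±π/2 separately and the contributions are summed, so one shared parameter used in G gates costs 2G circuit evaluations.

```python
import numpy as np

# Single-qubit rotations R(θ) = exp(-i θ P / 2)
def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])
calls = {"n": 0}  # counts how many "circuits" are executed

def circuit(angles):
    """Hypothetical toy circuit: RX(angles[0]) then RY(angles[1]) on |0>, returns <Z>."""
    calls["n"] += 1
    psi = np.array([1.0, 0.0], dtype=complex)
    psi = ry(angles[1]) @ (rx(angles[0]) @ psi)
    return float(np.real(np.conj(psi) @ Z @ psi))

def shared_param_grad(theta, n_gates=2):
    """d/dθ of circuit([θ] * n_gates): shift each gate's copy of θ separately."""
    grad = 0.0
    for g in range(n_gates):
        plus = [theta] * n_gates
        minus = [theta] * n_gates
        plus[g] += np.pi / 2
        minus[g] -= np.pi / 2
        # Two-term parameter-shift rule for gate g
        grad += (circuit(plus) - circuit(minus)) / 2
    # Product/chain rule: the total derivative is the sum over gates
    return grad
```

For this circuit ⟨Z⟩ = cos²θ, so the exact derivative is −sin 2θ, and one call to `shared_param_grad` runs 2 × 2 = 4 circuits.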
Speeding up grad computation
Ah, that might be the key. To perform arbitrary measurements of a Hermitian observable A on a hardware device, the following process is needed:

Calculate the unitary matrix U whose columns are the orthonormal eigenvectors of A.

Apply the unitary U^\dagger to the quantum state |\psi\rangle, just prior to measurement.

From the resulting PauliZ measurement samples, calculate the probability of measuring each computational basis state |i\rangle, namely |\langle i | U^\dagger | \psi\rangle|^2.

The resulting expectation value is given by:
\langle \psi | A | \psi \rangle = \sum_i \lambda_i |\langle i | U^\dagger | \psi\rangle|^2
where \lambda_i are the eigenvalues of A.
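The steps above can be sketched in NumPy as follows. This is an illustrative state-vector computation that assumes direct access to the state `psi`; it is not how PennyLane or the Forest plugin implement it, and the optional `shots` argument only mimics sampling computational basis outcomes.

```python
import numpy as np

def expval_via_diagonalization(A, psi, shots=None, seed=None):
    """<psi|A|psi> computed by rotating into the eigenbasis of A (hypothetical helper)."""
    # Step 1: eigendecomposition; the columns of U are orthonormal eigenvectors of A
    eigvals, U = np.linalg.eigh(A)
    # Step 2: apply U^dagger to the state just before "measurement"
    rotated = U.conj().T @ psi
    # Step 3: probability of each computational basis state, |<i|U^dagger|psi>|^2
    probs = np.abs(rotated) ** 2
    if shots is not None:
        # Replace exact probabilities with relative frequencies from sampled outcomes
        rng = np.random.default_rng(seed)
        samples = rng.choice(len(probs), size=shots, p=probs / probs.sum())
        probs = np.bincount(samples, minlength=len(probs)) / shots
    # Step 4: weight each outcome by its eigenvalue lambda_i
    return float(np.sum(eigvals * probs))
```

With exact probabilities the result matches `np.vdot(psi, A @ psi).real`; with `shots` set it converges to that value as the number of shots grows.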
So part of the problem comes from the number of gates: although there are only O(10) parameters, O(100) gates use them, which by the reasoning above means roughly 2 × O(100) circuit evaluations per gradient rather than 2 × O(10).
The exponential slowdown with the pyqvm is less obvious to me. Maybe it comes from computing U^\dagger? I appreciate the description above, but I can't immediately see why it would slow the computation down so much.
Key to the plot labels:
default: default.qubit
pyqvm: forest.qvm
entire: evaluating the whole Hamiltonian
piecewise: splitting the Hamiltonian into pieces and evaluating different circuits for 'collisions' between expectations/variables
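As a rough illustration of why the two strategies agree, assume a hypothetical two-qubit Hamiltonian H = 0.5 Z⊗Z + 0.3 X⊗I. By linearity, the expectation of the whole Hamiltonian equals the weighted sum of the expectations of its terms. The sketch only checks that identity on a state vector; it does not model how the 'collisions' grouping was done.

```python
import numpy as np
from functools import reduce

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

def kron_all(ops):
    # Tensor product of single-qubit operators, left to right
    return reduce(np.kron, ops)

# Hypothetical Hamiltonian as (coefficient, Pauli-product matrix) pairs
terms = [(0.5, kron_all([Z, Z])), (0.3, kron_all([X, I]))]

def expval(op, psi):
    return float(np.real(np.vdot(psi, op @ psi)))

def entire(psi):
    # Build the full matrix H and take a single expectation value
    H = sum(c * P for c, P in terms)
    return expval(H, psi)

def piecewise(psi):
    # One expectation value per term, recombined classically
    return sum(c * expval(P, psi) for c, P in terms)
```

The trade-off is that `entire` requires diagonalizing the full H (and, on hardware, an arbitrary basis-change unitary), while `piecewise` needs more circuits, but each Pauli-product term can be measured with single-qubit basis changes only.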
I also wrote my own parallelized gradient computation, given the large overhead from the number of gates, so these speeds will be very machine-dependent. I still think something funny is going on with the arbitrary Hermitian evaluations in the pyqvm, so I'll be interested to hear what you guys come up with. Thanks for all the info so far.
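A parallel gradient of this kind might look like the following sketch, which evaluates all 2 × (number of parameters) parameter-shifted circuits concurrently with a thread pool. `toy_cost` is a hypothetical stand-in for a circuit. Threads are mainly a sensible choice when each evaluation spends its time waiting on an external simulator (such as a separate QVM server process) or in code that releases the GIL; otherwise a process pool may be needed.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_param_shift(cost, params, max_workers=4):
    """Two-term parameter-shift gradient, running all shifted evaluations concurrently."""
    params = np.asarray(params, dtype=float)
    shifted = []
    for k in range(len(params)):
        for sign in (+1, -1):
            p = params.copy()
            p[k] += sign * np.pi / 2
            shifted.append(p)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order: [f(+0), f(-0), f(+1), f(-1), ...]
        values = np.array(list(pool.map(cost, shifted))).reshape(len(params), 2)
    return (values[:, 0] - values[:, 1]) / 2

def toy_cost(theta):
    # Stand-in for <Z> after RX(theta[0]) then RY(theta[1]) on |0>
    return float(np.cos(theta[0]) * np.cos(theta[1]))
```

For example, `parallel_param_shift(toy_cost, [0.3, 0.7])` should return approximately `[-sin(0.3)cos(0.7), -cos(0.3)sin(0.7)]`.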
This is my thought as well; however, the scaling of np.linalg.eigh would need to be investigated to see whether it is in fact the cause. Another possible reason is how PyQuil implements arbitrary unitaries: whether via direct matrix multiplication, or by decomposing the unitary first (which might carry some cost of its own).
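One way to check the first hypothesis is to time np.linalg.eigh on random Hermitian matrices of dimension 2^n. A dense Hermitian eigendecomposition costs O(d^3) for a d × d matrix, i.e. O(8^n) for n qubits, so once the matrices are large enough the timings should grow roughly eightfold per added qubit. `time_eigh` is a hypothetical benchmark helper, not part of PennyLane or PyQuil.

```python
import time
import numpy as np

def time_eigh(max_qubits=8, repeats=3, seed=0):
    """Best-of-`repeats` wall-clock time of np.linalg.eigh for 1..max_qubits qubits."""
    rng = np.random.default_rng(seed)
    timings = {}
    for n in range(1, max_qubits + 1):
        dim = 2 ** n
        M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
        A = (M + M.conj().T) / 2  # random Hermitian observable on n qubits
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            np.linalg.eigh(A)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings
```

For example, `time_eigh(max_qubits=10)` returns a dict mapping qubit count to the best observed time in seconds, which can be compared against the pyqvm slowdown.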
The plots look great! We are always looking to speed up optimization and improve the performance of PennyLane, so if you noticed anything while writing your own parallelized gradient computation, please feel free to submit a PR to the PennyLane GitHub repository.