PennyLane optimizers for parameter search

Hi,
I’ve been looking at the built-in Gradient Descent optimizer and some of its variations (Adam, Adagrad etc).

It is not clear to me how the gradient of the circuit is computed.
Suppose I want to use a real device. Is the gradient computed using the parameter-shift rule or finite differences? (My circuit is a bunch of RZ and Ising coupling XX gates.)

In either case, it seems that for each parameter of a circuit there would be 2 queries to a device? Is this correct?

My second question is about the cost function (ExpvalCost) from which the gradient is computed. Let’s say I want to compute the expectation <f(theta)| H | f(theta)> where H is a Hamiltonian. The matrix representing the Hamiltonian grows exponentially with the number of qubits, and so does the number of terms in <f(theta)| H | f(theta)> when it is expressed as an inner product of vectors.
Does ExpvalCost keep track of exponentially many terms for differentiation? If not, could you give a simple example of how it works?

Thank you!

Hi @Einar_Gabbassov,

Thanks for the questions! :slightly_smiling_face:

Is the gradient computed using the parameter-shift rule or finite differences? (My circuit is a bunch of RZ and Ising coupling XX gates.)

On a real device, the parameter-shift rule is used whenever possible; otherwise PennyLane falls back to the finite-difference method. PennyLane checks whether every gate that takes a trainable parameter supports the parameter-shift rule, and if that is the case for all of them, the parameter-shift rule is used for the whole circuit.

For RZ and Ising-XX, that should be fine, provided that qml.RZ and qml.IsingXX (or equivalent operations that are also differentiable) are used.
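As a minimal sketch (using default.qubit here as a stand-in for your hardware device), you can also request the parameter-shift rule explicitly through the diff_method keyword of the QNode:

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)  # stand-in for your hardware device

# diff_method="parameter-shift" requests the shift rule explicitly; the default
# diff_method="best" also picks it on hardware when all parametrized gates support it
@qml.qnode(dev, diff_method="parameter-shift")
def circuit(params):
    qml.RZ(params[0], wires=0)
    qml.IsingXX(params[1], wires=[0, 1])
    return qml.expval(qml.PauliZ(0))
```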

In either case, it seems that for each parameter of a circuit there would be 2 queries to a device? Is this correct?

Yes, that’s expected. With the two-term parameter-shift rule, the gradient requires 2*p quantum circuit evaluations, where p is the number of trainable parameters.
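Continuing the toy circuit from above, a quick way to see this (dev.num_executions is only a rough check; depending on the PennyLane version the count may also include forward passes):

```python
params = np.array([0.1, 0.2], requires_grad=True)

grad = qml.grad(circuit)(params)
print(grad)

# With the two-term shift rule, the 2 trainable parameters above need
# 2 * 2 = 4 shifted circuit evaluations for the gradient itself;
# dev.num_executions may report slightly more because of forward passes.
print(dev.num_executions)
```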

Does ExpvalCost keep track of exponentially many terms for differentiation? If not, could you give a simple example of how it works?

ExpvalCost uses the qml.Hamiltonian class under the hood, which indeed keeps track of the Hamiltonian terms as Pauli words and stores their corresponding coefficients:

\hat{H} = \sum_{I} C_I \hat{P}_I

where \hat{P}_I is a Pauli word and C_I its coefficient (see, e.g., the introduction of https://arxiv.org/pdf/1907.03358.pdf for this formalism).

The expectation value <f(theta)| H | f(theta)> is then built up from the coefficients and the expectation value of each Pauli word. PennyLane takes a hardware-agnostic approach here and assumes that measuring the complete Hamiltonian in one go is not feasible on the target device. Therefore, the matrix of the Hamiltonian is never constructed when computing the expectation value with qml.ExpvalCost.
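To make this concrete, here is a toy two-qubit example (the Hamiltonian and the ansatz are just placeholders for illustration, not your actual problem):

```python
import pennylane as qml
from pennylane import numpy as np

# Toy Hamiltonian: the coefficients C_I and Pauli words P_I are stored as lists,
# so only the terms you explicitly write down are ever tracked
coeffs = [0.5, -0.3, 0.8]
obs = [qml.PauliZ(0), qml.PauliX(0) @ qml.PauliX(1), qml.PauliZ(0) @ qml.PauliZ(1)]
H = qml.Hamiltonian(coeffs, obs)

dev = qml.device("default.qubit", wires=2)

def ansatz(params, wires):
    qml.RZ(params[0], wires=wires[0])
    qml.IsingXX(params[1], wires=[wires[0], wires[1]])

# ExpvalCost estimates <f(theta)| P_I |f(theta)> for each Pauli word and sums
# them with the coefficients C_I; the 2^n x 2^n matrix of H is never built
cost_fn = qml.ExpvalCost(ansatz, H, dev)

params = np.array([0.1, 0.2], requires_grad=True)
print(cost_fn(params))            # cost value
print(qml.grad(cost_fn)(params))  # gradient via the same term-by-term expansion
```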

Using the optimize=True option, we can reduce the number of device evaluations by grouping commuting Pauli words and measuring them together. The linked paper describes one such technique from the literature.
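On the same toy example as above, that is just an extra keyword argument:

```python
# optimize=True groups commuting Pauli words so several terms can be estimated
# from the same measurement settings, reducing the number of device evaluations
cost_fn = qml.ExpvalCost(ansatz, H, dev, optimize=True)
```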

Hope this helps!
