Hi,

I’ve been looking at the built-in gradient descent optimizer and some of its variants (Adam, Adagrad, etc.).

It is not clear to me how the gradient of the circuit is computed.

Suppose I want to use a real device. Is the gradient computed using the parameter-shift rule or finite differences? (My circuit consists of RZ gates and Ising XX coupling gates.)

In either case, it seems that **each** parameter of the circuit would require 2 queries to the device per gradient evaluation, so a circuit with n parameters would need 2n queries. Is this correct?
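To make the question concrete, here is a minimal NumPy sketch of what I understand the two-term parameter-shift rule to be. It is not the library's implementation: the tiny statevector simulator, the choice of measuring X⊗X, and the names `expval`, `parameter_shift_grad` and `calls` are all my own assumptions for illustration. It assumes each gate has the form exp(-i θ G / 2) with G² = I, which holds for both RZ and the Ising XX coupling.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
XX = np.kron(X, X)

def ising_xx(phi):
    # exp(-i * phi/2 * X⊗X) = cos(phi/2) I - i sin(phi/2) X⊗X
    return np.cos(phi / 2) * np.eye(4, dtype=complex) - 1j * np.sin(phi / 2) * XX

def rz_on_qubit0(theta):
    rz = np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])
    return np.kron(rz, I2)

calls = {"n": 0}  # counts how often the "device" is queried

def expval(params):
    """Stand-in for one device query: returns <X⊗X> after the circuit."""
    calls["n"] += 1
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0  # |00>
    state = ising_xx(params[0]) @ state
    state = rz_on_qubit0(params[1]) @ state
    return float(np.real(np.vdot(state, XX @ state)))

def parameter_shift_grad(f, params):
    """Two-term shift rule: df/dθ_i = [f(θ_i + π/2) - f(θ_i - π/2)] / 2."""
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for i in range(len(params)):
        plus, minus = params.copy(), params.copy()
        plus[i] += np.pi / 2
        minus[i] -= np.pi / 2
        grad[i] = (f(plus) - f(minus)) / 2
    return grad

params = np.array([0.3, 0.7])
grad = parameter_shift_grad(expval, params)
print(grad, calls["n"])
```

For this toy circuit the expectation value works out to sin(θ₀)·sin(θ₁), so the result can be checked by hand, and the counter shows two evaluations per parameter.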

My second question is about the cost function (ExpvalCost) from which the gradient is computed. Let’s say I want to compute the expectation <f(theta)| H | f(theta)>, where H is a Hamiltonian. The matrix representing the Hamiltonian grows exponentially with the number of qubits, and so does the number of terms in <f(theta)| H | f(theta)> if it is written out as an inner product of vectors.

Does ExpvalCost keep track of exponentially many terms for differentiation? If not, could you give a simple example of how it works?
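My current guess, which may well be wrong, is that H is handled as a weighted sum of Pauli words and that each term's expectation value is estimated separately, so the full matrix is never needed. The NumPy sketch below only illustrates that idea: the Hamiltonian, its coefficients and the function name `termwise_expval` are hypothetical, and the explicit matrices are there purely so the term-by-term result can be compared against the full-matrix one.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Hypothetical Hamiltonian: H = 0.5 * Z⊗Z + 0.2 * X⊗X
coeffs = [0.5, 0.2]
paulis = [np.kron(Z, Z), np.kron(X, X)]

def termwise_expval(state):
    """Sum of c_i * <P_i>; on hardware each <P_i> would be estimated from samples."""
    return sum(
        c * float(np.real(np.vdot(state, P @ state)))
        for c, P in zip(coeffs, paulis)
    )

# Bell state (|00> + |11>) / sqrt(2) as an example output of some circuit
state = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Reference value using the full matrix, feasible only for small systems
H = sum(c * P for c, P in zip(coeffs, paulis))
full = float(np.real(np.vdot(state, H @ state)))

print(termwise_expval(state), full)
```

If that picture is right, the cost would scale with the number of Pauli terms in H rather than with the dimension of its matrix.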

Thank you!