Parameter-shift rules

Hi Amir,

Yes, the use of the word “analytic” may be a bit confusing. Let me try to clarify.

One method to compute gradients is the finite-difference method. Here we approximate the derivative of a function, for example, as

\frac{d f(x)}{dx}\approx \frac{f(x+\epsilon)-f(x-\epsilon)}{2\epsilon}.

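I believe this is what Josh meant by numerical differentiation. Note that it only approximates the derivative, and the quality of the approximation depends on \epsilon. If \epsilon is too large, the truncation error dominates. If it is too small, the difference f(x+\epsilon)-f(x-\epsilon) is swamped by rounding errors or, on quantum hardware, by shot noise. That’s why it can be unstable.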
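Here is a minimal sketch of this trade-off in plain Python. It uses f(x)=\sin(x) at an arbitrarily chosen point x=0.7, and the function name `central_difference` is just illustrative. The error typically shrinks as \epsilon decreases. For very small \epsilon, though, it grows again once floating-point cancellation in the numerator takes over.

```python
import math

def central_difference(f, x, eps):
    """Approximate f'(x) with the symmetric finite difference (f(x+eps) - f(x-eps)) / (2*eps)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 0.7
exact = math.cos(x)  # true derivative of sin at x

# Large eps: truncation error dominates; tiny eps: rounding error dominates
for eps in (1e-1, 1e-4, 1e-12):
    approx = central_difference(math.sin, x, eps)
    print(f"eps={eps:.0e}  error={abs(approx - exact):.2e}")
```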

Another method is the parameter-shift rule. The idea here is to express the gradient as a linear combination of the function evaluated at two separate points. Consider f(x)=\sin(x). Its derivative, \cos(x), can be written as

\cos(x) = \frac{\sin(x+\pi/4)-\sin(x-\pi/4)}{\sqrt{2}},

so in this case we have

\frac{d f(x)}{dx}=\frac{f(x+\pi/4)-f(x-\pi/4)}{\sqrt{2}}.

Notice that the “shift” from x is not tiny: it is \pi/4, and the equation is exact, i.e., it is not an approximation. A similar trick works for gradients with respect to the parameters of many quantum gates, because the expectation value then depends sinusoidally on each such parameter. This is what Josh meant by exact, analytic gradients.
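The same idea can be checked numerically. The sketch below assumes the simplest possible circuit: a single RX(theta) rotation applied to |0⟩, followed by a Pauli-Z measurement. It is simulated by hand in plain Python rather than through any quantum library, and the function names are hypothetical. Here ⟨Z⟩ = \cos(\theta), so for any shift s with \sin(s)\neq 0 the rule (f(x+s)-f(x-s))/(2\sin(s)) is exact. Choosing s=\pi/4 reproduces the \sqrt{2} denominator above, and s=\pi/2 gives the commonly quoted factor of 1/2.

```python
import math

def rx_expectation_z(theta):
    """<Z> after applying RX(theta) to |0>, simulated with a two-amplitude state vector."""
    # RX(theta)|0> = cos(theta/2)|0> - i*sin(theta/2)|1>
    amp0 = complex(math.cos(theta / 2), 0.0)
    amp1 = complex(0.0, -math.sin(theta / 2))
    # Z has eigenvalue +1 on |0> and -1 on |1>
    return abs(amp0) ** 2 - abs(amp1) ** 2

def parameter_shift(f, x, s=math.pi / 2):
    """Exact derivative of any f(x) = a*cos(x) + b*sin(x) + c, for a shift s with sin(s) != 0."""
    return (f(x + s) - f(x - s)) / (2 * math.sin(s))

theta = 0.7
exact = -math.sin(theta)  # d/dtheta of cos(theta)
for s in (math.pi / 2, math.pi / 4):
    print(f"s={s:.4f}  shift rule={parameter_shift(rx_expectation_z, theta, s):.12f}  exact={exact:.12f}")

# The sin(x) example from the text, with shift pi/4 and denominator 2*sin(pi/4) = sqrt(2)
print(parameter_shift(math.sin, theta, math.pi / 4), math.cos(theta))
```

Both shifts agree with -\sin(\theta) to floating-point precision. Nothing about the result depends on the shift being small.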

Finally, symbolic differentiation is just what it sounds like: it means literally writing down the derivative as a mathematical formula, e.g., knowing that the derivative of \sin(x) is the expression \cos(x).
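To make the contrast concrete, here is a toy symbolic differentiator in plain Python. It is a hypothetical illustration, not any library's API; in practice you would use a computer algebra system such as SymPy. The point is that it turns the formula for sin(x) into the formula for cos(x) without ever evaluating the function at a point.

```python
# Expressions are nested tuples: ("x",), ("const", c), ("sin", e), ("cos", e),
# ("neg", e), ("add", a, b), ("mul", a, b)

def mul(a, b):
    # Drop multiplications by the constant 1 so results stay readable
    if a == ("const", 1):
        return b
    if b == ("const", 1):
        return a
    return ("mul", a, b)

def diff(e):
    """Return the derivative of expression e with respect to x, as another expression."""
    op = e[0]
    if op == "x":
        return ("const", 1)
    if op == "const":
        return ("const", 0)
    if op == "sin":  # chain rule: sin(u)' = cos(u) * u'
        return mul(("cos", e[1]), diff(e[1]))
    if op == "cos":  # chain rule: cos(u)' = -sin(u) * u'
        return mul(("neg", ("sin", e[1])), diff(e[1]))
    if op == "neg":
        return ("neg", diff(e[1]))
    if op == "add":
        return ("add", diff(e[1]), diff(e[2]))
    if op == "mul":  # product rule: (a*b)' = a'*b + a*b'
        a, b = e[1], e[2]
        return ("add", mul(diff(a), b), mul(a, diff(b)))
    raise ValueError(f"unknown operator {op!r}")

def to_str(e):
    """Render an expression tuple as a readable string."""
    op = e[0]
    if op == "x":
        return "x"
    if op == "const":
        return str(e[1])
    if op in ("sin", "cos"):
        return f"{op}({to_str(e[1])})"
    if op == "neg":
        return f"-{to_str(e[1])}"
    if op == "add":
        return f"({to_str(e[1])} + {to_str(e[2])})"
    return f"{to_str(e[1])}*{to_str(e[2])}"

x = ("x",)
print(to_str(diff(("sin", x))))  # cos(x)
print(to_str(diff(("cos", x))))  # -sin(x)
```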

To answer your final question, typically you can’t do this. If x is a parameter of the circuit, then you need to set the parameter to x+s and to x-s in two separate runs and calculate each expectation value individually. The only exception I can think of is if you can “absorb” the shift into an observable. Suppose you can define an observable O_1 whose expectation value is f(x+s) and another observable O_2 whose expectation value is f(x-s), where both expectations are taken with respect to the circuit in the same configuration. Then, by linearity, the observable O_3 = O_1-O_2 has expectation value f(x+s)-f(x-s), which you can calculate or estimate directly. I’m not sure this is even possible.
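The following is a sketch of the usual situation. It takes the single-qubit RX(theta) circuit as an example and uses a hypothetical `sample_z` function as a stand-in for running the circuit and measuring Z a finite number of times. Each shifted term comes from its own batch of shots, with the parameter set to theta+s or theta-s. The parameter-shift formula itself is exact, but on hardware its two terms are estimated, so the resulting gradient carries shot noise.

```python
import math
import random

def sample_z(theta, shots, rng):
    """Hypothetical stand-in for running RX(theta)|0> and measuring Z `shots` times.

    Each shot yields +1 with probability cos^2(theta/2) and -1 otherwise.
    """
    p_plus = math.cos(theta / 2) ** 2
    return [1 if rng.random() < p_plus else -1 for _ in range(shots)]

def estimate_expectation(theta, shots, rng):
    """Sample mean of the +/-1 outcomes, an unbiased estimate of <Z> = cos(theta)."""
    return sum(sample_z(theta, shots, rng)) / shots

def shot_based_parameter_shift(theta, shots, seed=0):
    rng = random.Random(seed)
    s = math.pi / 2
    # Two separate circuit configurations: parameter set to theta + s, then to theta - s
    plus = estimate_expectation(theta + s, shots, rng)
    minus = estimate_expectation(theta - s, shots, rng)
    return (plus - minus) / (2 * math.sin(s))

theta = 0.7
print(shot_based_parameter_shift(theta, shots=100_000), -math.sin(theta))
```

Fewer shots per term give a noisier gradient estimate, even though no approximation was made in the formula.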

Hope that helps!