Why does the parameter-shift rule use both the positive and negative shifts?

In the derivation of the parameter-shift rule in the original paper, a unitary U(\theta)=\exp[-i\theta G] whose Hermitian generator G has only two distinct eigenvalues \pm r satisfies

U(\pm\frac{\pi}{4r})=\frac{1}{\sqrt{2}}(I\mp\frac{i}{r}G)

So this should let us write the gradient using a single pass of the unshifted circuit (corresponding to I) plus a pass of just one of the two shifted circuits, right? I get why using both shifted circuits works, but isn’t it computationally more efficient to reuse the run of the unshifted circuit we are already doing and add only one shifted circuit?

Hi @somearthling,

It would be computationally more efficient to do what you mention, but unfortunately you would get the gradient of the function at a different point. The property we rely on is that the gradient of the function at the point theta equals, up to a constant factor, the difference between the function evaluated at theta + s and at theta - s. The two evaluation points must be symmetric around theta: if you move them, you end up calculating the gradient of the function at a shifted point too.
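To make this concrete, here is a minimal NumPy sketch (not PennyLane code). It assumes a toy one-qubit circuit, RX(theta) applied to |0> followed by a measurement of Z, so that f(theta) = cos(theta). The names `rx` and `f` and the shift s = pi/2 are choices made for this illustration only.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def rx(theta):
    """RX(theta) = exp(-i * theta * X / 2); its generator X/2 has eigenvalues +-1/2."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X

def f(theta):
    """Toy circuit (an assumption of this sketch): <0| RX(theta)^dagger Z RX(theta) |0>."""
    psi = rx(theta) @ np.array([1, 0], dtype=complex)
    return float(np.real(np.conj(psi) @ Z @ psi))

theta = 0.7
s = np.pi / 2  # shift used for a generator with eigenvalues +-1/2

# Evaluation points symmetric around theta: this is the gradient at theta.
symmetric = (f(theta + s) - f(theta - s)) / 2

# Same two-point formula, but evaluated at theta and theta + 2s:
# the evaluation points are now centered on theta + s.
moved = (f(theta + 2 * s) - f(theta)) / 2

exact_at_theta = -np.sin(theta)        # d/dtheta of cos(theta), at theta
exact_at_shifted = -np.sin(theta + s)  # the same derivative, at theta + s

print(symmetric, exact_at_theta)
print(moved, exact_at_shifted)
```

In this toy case the symmetric pair reproduces the derivative at theta, while the moved pair reproduces the derivative at theta + s instead.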

On the PennyLane website you will find a detailed explanation of the parameter-shift rule, which can give you good insight into how it works.

As a simple example, let’s say your function is a sine function. Its gradient is the cosine function, and the cosine can be written, up to a constant factor, as the difference between two shifted sine functions. However, the shift cannot happen just anywhere: it needs to be centered on theta, so you add and subtract a constant “s”, which in this case is pi/4.
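Spelled out with the angle-addition formula, keeping the shift s general (the prefactor below is worked out here and is not quoted from the PennyLane page):

\sin(\theta \pm s) = \sin\theta\cos s \pm \cos\theta\sin s \quad\Longrightarrow\quad \cos\theta = \frac{\sin(\theta+s)-\sin(\theta-s)}{2\sin s}

For s=\frac{\pi}{4} the denominator is 2\sin\frac{\pi}{4}=\sqrt{2}, so

\cos\theta = \frac{1}{\sqrt{2}}\left[\sin\left(\theta+\frac{\pi}{4}\right)-\sin\left(\theta-\frac{\pi}{4}\right)\right]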

I hope this explanation and the one on the PennyLane website are helpful. Please let us know if you still have questions about this.

Hi Catalina,

Switching around the expression I wrote to get G as a function of U, we would get

\begin{aligned}
\nabla_\theta U(\theta) &= -irG U(\theta)\\
&= \mp r\left[\sqrt{2}U\left(\theta\pm\frac{\pi}{4r}\right)-U(\theta)\right]
\end{aligned}

which should mean we can use either of the two shifts and the original at \theta, right? This should give us the gradient at \theta.

Hi @somearthling,

Thanks for the question!

\nabla_\theta U(\theta)= -irGU(\theta)

Could you further describe your line of thinking here?

Specifically:

  1. How come r appears on the right-hand side as a factor?

Equation (5) from the paper describes the derivative of a gate generated by a Hermitian operator G:

\partial_\mu \mathcal{G} = -iGe^{-i\mu G}

Using the definition \mathcal{G}(\mu)=e^{-i\mu G} on the right-hand side, equation (5) becomes:

\partial_\mu \mathcal{G} = -iG\mathcal{G}(\mu)
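A quick numerical sanity check of this expression, as a sketch only: it assumes the example generator G = Z/2 (eigenvalues \pm r with r = 1/2) and a hypothetical helper `gate` that builds e^{-i\mu G} from the eigendecomposition of G. It compares a central finite difference of the gate with -iG\mathcal{G}(\mu), and with the same expression carrying an extra factor r.

```python
import numpy as np

G = np.array([[0.5, 0.0], [0.0, -0.5]], dtype=complex)  # example generator Z/2
r = 0.5  # its eigenvalues are +r and -r

def gate(mu):
    """Hypothetical helper: exp(-i * mu * G) via the eigendecomposition of Hermitian G."""
    w, v = np.linalg.eigh(G)
    return v @ np.diag(np.exp(-1j * mu * w)) @ v.conj().T

mu, h = 0.3, 1e-6

# Central finite difference of the gate itself (a numerical reference only).
numeric = (gate(mu + h) - gate(mu - h)) / (2 * h)

without_r = -1j * G @ gate(mu)   # equation (5) as written above
with_r = -1j * r * G @ gate(mu)  # the same expression with an extra factor r

print(np.allclose(numeric, without_r, atol=1e-8))
print(np.allclose(numeric, with_r, atol=1e-8))
```

Under these assumptions the first comparison should hold and the second should not, consistent with equation (5) having no factor of r.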

  2. If the equation you propose for the gradient of the unitary does hold, what circuit would we run on real quantum hardware to get the gradient of the variational circuit? That is, how can you substitute the proposed gradient of the gate into equation (3) of the paper so that the result is something we can evaluate on real quantum hardware? If you could re-express it as the gradient of the variational circuit (\partial_\mu f, rather than the gradient of a single gate), that would help with the understanding.

Also, assume that the quantum hardware can run the circuit for f, and make no further assumptions about additional supported gates.