Hi @nyquant!

PennyLane does not use finite differences to compute the gradient. Instead, it uses the parameter-shift rule.

The parameter-shift rule provides **exact** gradients, whereas numerical differentiation such as finite-difference provides only an **approximation** to the gradient.

Furthermore, the parameter-shift rule is simply a linear combination of quantum circuit evaluations, allowing it to be executed on near-term hardware. We even have a degree of freedom in choosing *where* in parameter space we perform these evaluations. On noisy near-term devices we typically maximise the distance between the two evaluation points in parameter space, which helps limit the effects of shot noise on the estimated gradient.

As an example, we can work out the parameter-shift rule for a simple function such as f(x)=\sin(x), and compare it to finite differences.

**Exact gradient: parameter-shift rule**

We know that the gradient is given by f′(x)=\cos(x). By making use of the trig identity

\cos(x)=\frac{\sin(x+s)−\sin(x−s)}{2\sin(s)}

we can now write the gradient as

f′(x)=\frac{f(x+s)−f(x−s)}{2\sin(s)}

That is, the *exact gradient* of f(x) is obtained by evaluating the function at the points x+s and x−s and taking a linear combination. This is exact *for any value of s*.
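A minimal NumPy sketch of this (the evaluation point and shift values are just illustrative):

```python
import numpy as np

def param_shift_grad(f, x, s):
    """Shift-rule gradient of f at x; exact for f = sin, for any shift s."""
    return (f(x + s) - f(x - s)) / (2 * np.sin(s))

x = 0.7
# Every choice of s reproduces cos(x) to machine precision
grads = [param_shift_grad(np.sin, x, s) for s in (0.1, np.pi / 4, np.pi / 2)]
```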

**Compare this instead to finite-differences:**

f′(x)=\frac{\sin(x+h)−\sin(x−h)}{2h}+O(h^2)

This is an *approximation* that is only valid for h\ll 1. Furthermore, numerical differentiation is prone to numerical instability — the difference of two nearly equal function values is divided by a small number, amplifying floating-point round-off and shot noise — unlike the parameter-shift rule, which is exact.
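The same comparison in NumPy (again with illustrative step sizes) shows the finite-difference error shrinking only as h^2, rather than vanishing:

```python
import numpy as np

def central_diff(f, x, h):
    """Central finite difference: error is O(h^2), never exactly zero."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
errors = [abs(central_diff(np.sin, x, h) - np.cos(x)) for h in (1e-1, 1e-2, 1e-3)]
# Each tenfold reduction in h cuts the error by roughly a factor of 100
```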

So as you can see from the example, the parameter-shift rule takes into account structural information about f(x) to allow the exact gradient of f(x) to be computed by simply taking additional function evaluations.

However, this requires f(x) to satisfy particular conditions, so it is not universal! For quantum circuits, the simple two-term rule above applies to gates whose generator has two distinct eigenvalues (such as Pauli rotations); more general gates require generalized shift rules or a decomposition into such gates.