Hi @nyquant!

PennyLane does not use finite differences to compute the gradient. Instead, it uses the parameter-shift rule.

The parameter-shift rule provides **exact** gradients, whereas numerical differentiation such as finite-difference provides only an **approximation** to the gradient.

Furthermore, the parameter-shift rule is simply a linear combination of quantum circuit evaluations, allowing it to be executed on near-term hardware. We even have a degree of freedom in choosing *where* in parameter space we perform these evaluations. On noisy near-term devices we typically maximise the distance between the two evaluation points in parameter space, which helps limit the effects of shot noise on the estimated gradient.

As an example, we can work out the parameter-shift rule for a simple function such as f(x)=\sin(x), and compare it to finite differences.

**Exact gradient: parameter-shift rule**

We know that the gradient is given by f′(x)=\cos(x). By making use of the trig identity

\cos(x)=\frac{\sin(x+s)−\sin(x−s)}{2\sin(s)}

we can now write the gradient as

f′(x)=\frac{f(x+s)−f(x−s)}{2\sin(s)}

That is, the *exact gradient* of f(x) is obtained by evaluating the function at the points x+s and x−s and taking a linear combination. This is exact *for any value of s*.
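A minimal NumPy sketch of this (the evaluation point and shift values are just illustrative):

```python
import numpy as np

def param_shift_grad(f, x, s):
    """Shift-rule gradient of f at x; exact for f = sin, for any shift s."""
    return (f(x + s) - f(x - s)) / (2 * np.sin(s))

x = 0.7
# Every choice of s reproduces cos(x) to machine precision
grads = [param_shift_grad(np.sin, x, s) for s in (0.1, np.pi / 4, np.pi / 2)]
```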

**Compare this instead to finite-differences:**

f′(x)=\frac{\sin(x+h)−\sin(x−h)}{2h}+O(h^2)

This is an *approximation* that is only valid for h\ll 1. Furthermore, numerical differentiation is prone to numerical instability — the difference of two nearly equal function values is divided by a small number, amplifying floating-point round-off and shot noise — unlike the parameter-shift rule, which is exact.
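The same comparison in NumPy (again with illustrative step sizes) shows the finite-difference error shrinking only as h^2, rather than vanishing:

```python
import numpy as np

def central_diff(f, x, h):
    """Central finite difference: error is O(h^2), never exactly zero."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
errors = [abs(central_diff(np.sin, x, h) - np.cos(x)) for h in (1e-1, 1e-2, 1e-3)]
# Each tenfold reduction in h cuts the error by roughly a factor of 100
```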

So as you can see from the example, the parameter-shift rule takes into account structural information about f(x) to allow the exact gradient of f(x) to be computed by simply taking additional function evaluations.

However, this requires f(x) to satisfy particular conditions, so it is not universal! For quantum circuits, the simple two-term rule above applies to gates whose generator has two distinct eigenvalues (such as Pauli rotations); more general gates require generalized shift rules or a decomposition into such gates.