It definitely is a bit unintuitive!

However, it is worth noting that this result does not come purely from quantum mechanics; it also holds in classical settings whenever we use stochastic gradient descent (SGD).

In SGD, we typically consider a cost function of the following form:

C(w) = \frac{1}{N} \sum_{n=1}^{N} C_n(w),

where we are averaging over multiple ‘observations’ C_n. (Note that this cost function has the same form as the expectation value of a quantum observable!)

In typical gradient descent, we perform iterative update steps on the parameters w to minimize the cost function:

w \rightarrow w - \eta \nabla C(w) = w - \frac{\eta}{N} \sum_{n=1}^{N} \nabla C_n(w).

However, the beauty of stochastic gradient descent is that we can replace the gradient of the cost function **with the gradient of a single observation, randomly chosen at each update step**, and convergence to a local minimum is still guaranteed (provided the learning rate \eta decreases appropriately):

w \rightarrow w - \eta \nabla C_n(w).

This is a really nice result, especially if the full gradient \nabla C(w) is expensive to compute.
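As a toy illustration of this classical result (using a simple least-squares cost of my own choosing, not anything from the paper), here is a sketch where each update uses the gradient of just one randomly chosen observation, with a decreasing learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N 'observations', each contributing C_n(w) = (x_n . w - y_n)^2
N = 200
x = rng.normal(size=(N, 2))
true_w = np.array([1.5, -0.5])
y = x @ true_w + 0.1 * rng.normal(size=N)

def grad_single(w, n):
    # Gradient of a single observation C_n(w) = (x_n . w - y_n)^2
    return 2.0 * (x[n] @ w - y[n]) * x[n]

w = np.zeros(2)
for t in range(2000):
    eta = 0.1 / (1 + 0.01 * t)   # learning rate decreasing over time
    n = rng.integers(N)          # one randomly chosen observation per step
    w -= eta * grad_single(w, n)

print(w)  # ends up close to true_w, despite never computing the full gradient
```

Each step only touches one data point, so the per-step cost is independent of N, which is exactly why this is attractive when the full gradient is expensive.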

In *Stochastic gradient descent for hybrid quantum-classical optimization*, the authors generalize this result to the quantum case, noting that C can be interpreted as an expectation value estimated from a finite number of shots, with the stochasticity coming from the process of measuring the quantum system.

> So to me a single shot is essentially useless as I only retrieve either a +1 or −1 … I guess I am missing something here… thanks for the response again!

So here, the single-shot expectation value is useful in the specific case of *quantum gradient descent*.

However, convergence will generally be faster if we increase the shot number! This suggests a nice strategy: begin the minimization with single shots to cheaply reach a rough approximate solution, then increase the number of shots to fine-tune and converge towards the local minimum.
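To see this strategy in a minimal setting, consider the single-qubit example (my own toy choice, not from the paper) of minimizing C(\theta) = \langle Z \rangle = \cos\theta for the state R_y(\theta)|0\rangle. The parameter-shift rule gives \partial_\theta C = [C(\theta + \pi/2) - C(\theta - \pi/2)]/2, and each term can be estimated from as little as a single ±1 measurement outcome, which still gives an unbiased gradient estimate:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_z(theta, shots):
    # Simulate measuring Z on R_y(theta)|0>: P(+1) = cos^2(theta/2), so <Z> = cos(theta)
    p_plus = np.cos(theta / 2) ** 2
    samples = np.where(rng.random(shots) < p_plus, 1.0, -1.0)
    return samples.mean()

def shift_grad(theta, shots):
    # Parameter-shift rule: dC/dtheta = [C(theta + pi/2) - C(theta - pi/2)] / 2,
    # each expectation estimated from `shots` measurements (unbiased even for shots = 1)
    return (sample_z(theta + np.pi / 2, shots) - sample_z(theta - np.pi / 2, shots)) / 2

theta = 1.0
for t in range(3000):
    eta = 0.2 / (1 + 0.005 * t)
    shots = 1 if t < 2000 else 50   # start with single shots, fine-tune with more
    theta -= eta * shift_grad(theta, shots)

print(np.cos(theta))  # close to the minimum value of -1
```

Even though each single-shot gradient estimate is just a combination of ±1 outcomes, its *expectation* equals the true gradient, so with a decreasing learning rate the iterates still drift to the minimum; switching to more shots at the end merely reduces the noise floor around it.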

Note that this doesn’t apply to *evaluating* the expectation value in a general setting, as you correctly point out: there, we do need to increase the number of shots to get a better estimate.