Hey @amir and welcome!

What this “innocent” comment is pointing at is indeed quite a deep research question. Let me try to explain:

Classical automatic differentiation, i.e. the technique with which backpropagation in neural nets is implemented on computers, is very efficient because it shares intermediate results when computing the partial derivatives with respect to the parameters theta1,…,thetaD. This means it does not have to run the computation “backwards” D times, once per parameter, but far fewer times (for a scalar output, a single backward pass is enough).

In quantum computing, the way information is processed makes this much harder, if not impossible. We cannot store the intermediate quantum state and then reuse it for different computations, because copying an unknown quantum state is forbidden by the no-cloning theorem of quantum mechanics. While research may still come up with clever ways to cut down on the 2·D circuits (two shifted evaluations per parameter) that need to be run to compute the partial derivatives with respect to D parameters with parameter-shift rules, it is unlikely that we will get the efficiency of classical automatic differentiation.
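To make the 2·D count concrete, here is a minimal sketch in plain Python. It is not PennyLane code: `expval` is a hypothetical stand-in for "run the circuit and measure", chosen so that the result is a product of cosines (what you would get from one RX rotation per qubit followed by a Z measurement on every qubit). The shift of π/2 and the factor 1/2 assume the standard two-term parameter-shift rule for such rotation gates.

```python
import math

def expval(thetas):
    # Hypothetical stand-in for executing a quantum circuit on a device.
    # Assumption: one RX(theta_i) per qubit, measuring Z on every qubit,
    # which gives the product of cos(theta_i).
    expval.calls += 1
    return math.prod(math.cos(t) for t in thetas)

expval.calls = 0  # counts how often the "circuit" is executed

def parameter_shift_grad(f, thetas):
    # Two-term parameter-shift rule: each partial derivative needs
    # two circuit evaluations, at theta_i + pi/2 and theta_i - pi/2.
    shift = math.pi / 2
    grad = []
    for i in range(len(thetas)):
        plus = list(thetas)
        minus = list(thetas)
        plus[i] += shift
        minus[i] -= shift
        grad.append((f(plus) - f(minus)) / 2)
    return grad

thetas = [0.3, 1.1, -0.7]
grad = parameter_shift_grad(expval, thetas)
print("gradient:", grad)
print("circuit executions:", expval.calls)  # 2 per parameter
```

Nothing is shared between the executions here: every shifted circuit is run from scratch, which is exactly the cost that backpropagation avoids on the classical side.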

However, a quantum node in a hybrid computation can be *part* of classical automatic differentiation like any arithmetic function (such as a sine or addition), since it can provide its own gradients. In some sense, one should therefore view it as an *atomic building block* of the overall computational graph, rather than an element that one can differentiate through like a neural net…
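As a rough illustration of that picture, the sketch below chains a quantum node between two classical steps by hand. Everything here is an assumption for illustration: `qnode` is a toy stand-in (a single RX rotation measured in Z, i.e. a cosine), `qnode_grad` obtains its derivative with the two-term parameter-shift rule, and the classical parts (`w * x` and a squared error) are arbitrary choices. An automatic differentiation framework would do the chain-rule bookkeeping for you; the point is only that the quantum node contributes one local derivative, just like a sine would.

```python
import math

def qnode(theta):
    # Toy stand-in for a quantum node: RX(theta) on |0>, measure Z.
    return math.cos(theta)

def qnode_grad(theta):
    # The node supplies its own derivative via the parameter-shift rule
    # (two extra executions); nothing inside the node is inspected.
    shift = math.pi / 2
    return (qnode(theta + shift) - qnode(theta - shift)) / 2

def loss_and_grad(w, x, y):
    theta = w * x            # classical pre-processing
    out = qnode(theta)       # quantum node, treated as a black box
    loss = (out - y) ** 2    # classical post-processing

    # Chain rule: the quantum node is one factor among the classical ones.
    dloss_dout = 2 * (out - y)
    dout_dtheta = qnode_grad(theta)
    dtheta_dw = x
    return loss, dloss_dout * dout_dtheta * dtheta_dw

loss, grad_w = loss_and_grad(w=0.8, x=1.5, y=0.2)
print("loss:", loss, "dloss/dw:", grad_w)
```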

Hope this clarifies the very compact statement?