Hybrid computation backpropagation

Can you expand on the intuition behind this? My assumption is that backpropagating through a qnode is expensive enough that it is impractical to place one mid-model, and that qnodes should therefore be placed prior to the differentiable calculations. Is there some scale of qubits at which gradients of quantum computations can be calculated efficiently? In some ways this pertains to the post "Quantum Advantage by Quantum Simulators".

If we can show an exponential speed-up on certain inputs with a limited number of simulated qubits, would it be possible to parallelize multiple quantum circuits so that the differentiation process scales linearly in the number of circuits, with the exponential factor held constant by the limited number of qubits? That is, O(k * 2^n), where n is relatively small and k is the number of parallel computations on n qubits present in the computation graph.
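
To make the scaling I have in mind concrete, here is a rough sketch. It assumes dense state-vector simulation, where the cost of one n-qubit circuit is taken to be proportional to its 2^n amplitudes, and it ignores gate counts and any extra factor from the gradient method. The function names are hypothetical and only for illustration.

```python
def parallel_cost(n_qubits: int, k_circuits: int) -> int:
    """Amplitudes stored for k independent n-qubit circuits: k * 2^n.

    Assumption: dense state-vector simulation, cost ~ number of amplitudes.
    """
    return k_circuits * 2 ** n_qubits


def monolithic_cost(n_qubits: int, k_circuits: int) -> int:
    """Amplitudes stored if the same k * n qubits formed one circuit: 2^(k*n)."""
    return 2 ** (n_qubits * k_circuits)


if __name__ == "__main__":
    n, k = 4, 8
    # Linear in k for fixed n, versus exponential in k * n for a single circuit.
    print(parallel_cost(n, k), monolithic_cost(n, k))
```

With n fixed, the first quantity grows linearly in k, which is the O(k * 2^n) behaviour asked about above; the second shows what is avoided by not entangling the k blocks with each other.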