# Hybrid computation backpropagation

In the documentation for “Hybrid computation” there is a section near the end that states:

“The ability to backpropagate through hybrid computations does not mean that one can backpropagate (i.e., compute errors) through a quantum computation.”

I’m trying to understand this better. What is the significance of the “through a quantum computation” in this statement?

Hey @amir and welcome!

What this “innocent” comment is pointing at is indeed quite a deep research question. Let me try to explain:

Classical automatic differentiation, i.e. the technique with which backpropagation in neural nets is implemented on computers, is very efficient because it shares intermediate results when computing the partial derivatives of the parameters theta1, …, thetaD. This means it does not have to run the computation “backwards” D separate times, once per parameter; a single backward pass yields all D partial derivatives.
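To make the sharing of intermediate results concrete, here is a minimal hand-rolled reverse-mode sketch (the function `f_and_grad` and the example function sin(x1·x2) + x1·x2 are illustrative choices, not anything from PennyLane):

```python
import math

def f_and_grad(x1, x2):
    # Forward pass: every intermediate value is computed once and stored.
    a = x1 * x2              # shared intermediate
    y = math.sin(a) + a
    # Reverse pass: ONE sweep reuses the stored intermediate `a` and
    # yields ALL partial derivatives, instead of one rerun per parameter.
    dy_da = math.cos(a) + 1.0
    return y, (dy_da * x2, dy_da * x1)   # (dy/dx1, dy/dx2)
```

Note that `a` and `dy_da` are computed once and reused for both partial derivatives; with D parameters the saving is a factor of roughly D over rerunning the computation per parameter.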

In quantum computing, the way information is processed makes this much harder, if not impossible. We cannot store an intermediate quantum state and then reuse it for different computations; this is forbidden by the no-cloning theorem of quantum mechanics. While research may still come up with clever ways to cut down on the 2D circuit evaluations needed to compute the partial derivatives of D parameters with parameter-shift rules, it is unlikely that we will ever match the efficiency of classical automatic differentiation.
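A small plain-NumPy illustration of the parameter-shift rule (this simulates a single-qubit circuit directly with matrices; it is not the PennyLane API, and the function names are my own):

```python
import numpy as np

def expval_Z(theta):
    # Simulate a one-qubit circuit: apply RX(theta) to |0>, measure <Z>.
    # Analytically this equals cos(theta).
    RX = np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
                   [-1j * np.sin(theta / 2), np.cos(theta / 2)]])
    state = RX @ np.array([1.0, 0.0])
    Z = np.diag([1.0, -1.0])
    return np.real(state.conj() @ Z @ state)

def parameter_shift_grad(theta, s=np.pi / 2):
    # Two FULL circuit evaluations per parameter: no intermediate quantum
    # state can be cached and reused between the two shifted runs.
    return (expval_Z(theta + s) - expval_Z(theta - s)) / 2
```

For this circuit the rule is exact: the shifted-evaluation difference reproduces the analytic derivative -sin(theta), but each shift required rerunning the whole circuit from the initial state.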

However, a quantum node in a hybrid computation can be part of classical automatic differentiation like any arithmetic function (such as a sine or addition), since it can provide its gradients. In some sense, one should therefore view it as a “nuclear” building block of the overall computational graph, rather than an element that one can differentiate through like a neural net…
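Here is a toy sketch of that picture: the quantum node is treated as an opaque box inside a classical computational graph, contributing only its value and its parameter-shift gradient to the chain rule. For brevity the qnode is mocked by its known expectation value cos(theta); all names here are hypothetical, not PennyLane functions:

```python
import numpy as np

def qnode(theta):
    # Opaque quantum node: we may only evaluate it, never look inside.
    # (Mocked here by the analytic expectation <Z> = cos(theta).)
    return np.cos(theta)

def qnode_grad(theta):
    # The node supplies its own gradient via the parameter-shift rule,
    # at the cost of two extra circuit evaluations.
    return (qnode(theta + np.pi / 2) - qnode(theta - np.pi / 2)) / 2

def loss_and_grad(theta, target=0.5):
    e = qnode(theta)                  # forward through the quantum node
    loss = (e - target) ** 2          # classical post-processing
    # Classical chain rule, with the qnode plugged in as one node:
    dloss = 2 * (e - target) * qnode_grad(theta)
    return loss, dloss
```

The classical parts are backpropagated as usual; the qnode participates only through the interface `(value, gradient)`, exactly like a built-in arithmetic function.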

Hope this clarifies the very compact statement!

Thanks. Yes, it does.

Can you expand on the intuition behind this? My assumption is that the cost of backpropagation through the qnode is large enough that it warrants no consideration mid-model, and thus qnodes should be placed before any differentiable calculations. Is there some scale of qubits at which gradients of quantum computations can be calculated efficiently? In some ways this pertains to this post here: Quantum Advantage by Quantum Simulators

If we can show an exponential speedup with a limited number of simulated qubits on certain inputs, would it be possible to parallelize multiple quantum circuits so that the differentiation process scales linearly, with a constant exponential factor from the limited number of qubits: O(k·2^n), where n is relatively small and k represents the number of parallel computations on n qubits present in the computation graph?

Hey @TylerRoost! I’m not sure I understand your comment. Let me try:

By “complexity of backpropagation through the qnode”, do you mean the number of circuit evaluations you need to estimate the gradient of the computation performed by the qnode? (So strictly speaking not backpropagation through the qnode, but having the qnode as one node of the computational graph for which backpropagation is done, as discussed above.)

If so, then computing the gradient is efficient in that it requires the estimation of as many expectation values as there are parameters, times a small constant factor (see “parameter shift rules”).
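That evaluation count can be made explicit with a toy counter (a hypothetical multi-parameter circuit in plain NumPy, not PennyLane internals):

```python
import numpy as np

EVALS = 0

def circuit(params):
    # Toy D-parameter expectation value: prod_i cos(theta_i).
    # Each call stands in for one full circuit run on hardware.
    global EVALS
    EVALS += 1
    return np.prod(np.cos(params))

def parameter_shift_grad(params, s=np.pi / 2):
    # Exactly two shifted evaluations per parameter: 2*D runs in total.
    grad = np.zeros_like(params)
    for i in range(len(params)):
        shift = np.zeros_like(params)
        shift[i] = s
        grad[i] = (circuit(params + shift) - circuit(params - shift)) / 2
    return grad

params = np.array([0.1, 0.4, 0.7])
g = parameter_shift_grad(params)   # EVALS is now 2 * 3 = 6
```

So the cost is linear in the number of parameters (here 6 runs for 3 parameters), just with a constant factor of 2 per parameter rather than the single shared backward pass of classical autodiff.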

Please also explain what you mean by “exponential speedup”.

Interesting. After seeing the parameter-shift rules, I think I am starting to see more of the picture, as I was misinterpreting “nuclear building block”. So, new question: what structural limitations inherent in architecting models with quantum circuits should I be aware of?

Re: Exponential speedup
So here are two things I’m aware of that can have an impact on complexity: the scale of the input and the complexity of the algorithm. Let’s say that, for a fixed-size (small) input, we can show a quantum algorithm can solve our problem using parallelization of a bunch of low-qubit systems, so the overall number of total qubits is high but the interconnections are few, while classical algorithms would not be capable within the lifetime of the Universe.

I guess my general question in this regard is: in what ways does, or might, parallelization contribute to considerations of quantum supremacy?

There must be a whole class of solvable problems that can be framed as divide-and-conquer, where low-input, low-qubit systems work separately and then combine partial solutions. For this class of problems, would the classical-versus-quantum scaling of solution times fail to indicate quantum supremacy for some reason?

How do hybrid models play a role in quantum supremacy? Is there a hybrid-supremacy moment?

Hi @TylerRoost, very interesting questions. This probably deserves a much deeper discussion but hopefully this short answer will point you in the right direction.

Regarding the parameter-shift rules, you can learn more about them here and here. Most of what we have in PennyLane is differentiable via this rule, but you should be careful not to update global variables from within your circuit, because you might run into problems there. Also, if you use `sample()` for your measurement you will not be able to find a gradient.

RE: Exponential speedup

Quantum computers are great but they still have a long way to go. Encoding data into a quantum computer can be particularly slow, and currently they are not fault-tolerant, which means that your measurement will probably contain some errors.

Moreover, as soon as you measure you lose your quantum state and end up with a classical output. If you want to perform a complex computation that requires many qubits it’s not obvious how to split it into low-qubit modules. It’s not impossible though. In fact, we have implemented a “circuit cutter” in our latest PennyLane version, which allows you to do exactly that. You can learn more about it here.

The divide and conquer strategy may work to a certain extent, but since it takes time to encode your information into the quantum computer and then you lose information upon measurement, it’s possible that the computational cost added by this strategy is too large. I would invite you to try it for several examples and see if you find any particular trend by using this method.

I also recommend reading this blog post by Maria, where she discusses exponential speedups in an excellent way.

I must say I hadn’t heard the term “hybrid supremacy” until now, but it is true that hybrid models look like they can have interesting advantages compared to purely classical or quantum ones.

My hope for the project I’m working on is to design composable low-qubit modules; we’ll see if it actually works soonish. I’m on family vacation currently, or else I’d have more to share. In terms of circuit cutting, I’m not trying to decompose larger computations into a set of cuts; I’m trying to build circuits that work well together in a composable and modular fashion, and to build hybrid models from those modules organically.

In regards to the blog post on exponential speedups, I think the take on maximizing expressivity aligns with the no-free-lunch theorem. I pose here that the no-free-lunch theorem may have exceptions. While there aren’t any models yet that perform well across all the different modalities, I do think the trajectory of models is trending towards manageable sizes, and thus parameter efficiency is increasingly valued.
I think there is a fair argument to be made that parameter efficiency may indicate a direction worth exploring in the space of more expressive models.

Your project sounds very interesting @TylerRoost. Please do share the results of this project with us!

I don’t know exactly what you mean by parameter efficiency, but it is true that as we explore different ansätze we may find some that require fewer parameters but are still capable of modelling our problems well. It would in fact be ideal to model our data well using a smart ansatz and few parameters.

I suppose generalizable parameter efficiency is a better measure of model performance. Models that can perform on a wide set of disjoint as well as related tasks with the least number of parameters should be considered the “best” models, circuits, or hybridizations. My intuition tends towards thinking that the best pure models and the best quantum circuits by this definition should be components of hybrid models, but I wouldn’t be too surprised to be wrong.

On this notion of “best”, I agree with parts of the blog post and want to present possible continuations. For instance, I agree that performing generally in practical applications is quantifiably better because of the impact it can have on a great number of people, but I don’t think we should pass on the idea that generalizing across the base “easy” tasks can produce the modules that may be useful in more meaningful and practical tasks.

That being said, while neural models have had more practical success, an argument can be made that this hinges on hardware, just as the deep learning revolution depended on hardware advancements to reach a practical advantage. Is there a Moore’s law for qubits?

Despite a clear need for more powerful QPUs for grander scales of computation, I believe there are many low-hanging fruit along the way with respect to this notion of the best models: those that perform most optimally with regard to modular composability across a large range of possible implementations of different representations. Here, “representations” refers to quantum or classical system states that represent relevant information across a plethora of tasks. My guess would be to pursue pre-trained quantum circuits for composable feature capabilities.

Hi @TylerRoost, it’s a very interesting discussion. Regarding a Moore’s law for qubits, there’s the Gambetta Law, which states that quantum volume is doubling every year. I think the trend has even gone up since 2021, though.

If you come across any research about modular composability, it would be very cool if you could share it here.