Adjoint Differentiation

Hello! I’m a little confused about the content — Adjoint Differentiation

This method is mentioned here to obtain gradient values faster. I think the reason for the acceleration here is that the corresponding quantum states are transformed into matrix form, and the gradient is actually obtained through classical calculations. The obtained matrix is stored in the computer for the next use. Therefore, when the number of quantum bits is high (over 15), the memory requirements for classical computers are relatively strict.

Is there any problem with my understanding?

Hey @SHAN, welcome back!

Some parts of your understanding are not 100% right, but others are! I’ll try to address your post sentence-by-sentence:

This method is mentioned here to obtain gradient values faster

You got it :slight_smile:

that the corresponding quantum states are transformed into matrix form,

Well, yes, states are reshaped into 2x2x2… arrays, but that’s for all statevector simulations. So that’s not the reason why adjoint differentiation is providing a speedup. It’s providing a speedup because we’re taking advantage of the unitary property of quantum operations.

The unitary property of quantum operations allows us to calculate something once and reuse it. Because every gate is unitary, its inverse is simply its adjoint, so after one forward pass we can step back through the circuit by applying adjoints instead of recomputing each intermediate state from scratch :slight_smile:. On top of that, if you work out the math for the derivative of an expectation value, that single forward pass lets you avoid a quadratic amount of work!

the gradient is actually obtained through classical calculations

Adjoint differentiation is not compatible with quantum hardware (i.e., you cannot ask a quantum device to perform adjoint differentiation natively on the device). So, yes, the gradient is obtained through classical calculations (i.e., linear algebra).
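To make the "one forward pass, then reuse adjoints" idea concrete, here is a minimal NumPy sketch for a single-qubit circuit RX(a) followed by RY(b), measuring Z. This is only an illustration under my own assumptions, not PennyLane's actual implementation: the helper names `rot` and `adjoint_gradient` are made up for this post, and it only handles rotation gates generated by a Pauli matrix.

```python
import numpy as np

# Pauli matrices, used both as gate generators and as the observable.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rot(gen, theta):
    """Rotation exp(-i * theta / 2 * gen), valid when gen @ gen = identity."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * gen


def adjoint_gradient(generators, thetas, observable):
    """Gradient of <psi|O|psi> w.r.t. every angle (hypothetical helper)."""
    # Forward pass: simulate the circuit once.
    psi = np.array([1, 0], dtype=complex)
    for gen, theta in zip(generators, thetas):
        psi = rot(gen, theta) @ psi

    # "Bra" side: apply the observable to the final state once.
    lam = observable @ psi

    grad = np.zeros(len(thetas))
    for k in reversed(range(len(thetas))):
        gate = rot(generators[k], thetas[k])
        # d/dtheta of exp(-i theta G / 2) applied to the previous state
        # equals (-i / 2) * G applied to the current state.
        d_psi = -0.5j * (generators[k] @ psi)
        grad[k] = 2 * np.real(np.vdot(lam, d_psi))
        # Unitarity: undo gate k on both vectors using its adjoint.
        psi = gate.conj().T @ psi
        lam = gate.conj().T @ lam
    return grad


thetas = [0.3, 0.7]
grad = adjoint_gradient([X, Y], thetas, Z)

# For this circuit <Z> = cos(a) * cos(b), so the gradient is known exactly.
expected = np.array([
    -np.sin(thetas[0]) * np.cos(thetas[1]),
    -np.cos(thetas[0]) * np.sin(thetas[1]),
])
print(grad, expected)
```

Note that the loop only ever holds three vectors at a time (`psi`, `lam`, and `d_psi`), and each gate is un-applied exactly once, so the whole gradient costs roughly one extra sweep through the circuit.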

Let me know if this helps!


Thank you very much for your explanation. It’s very easy to understand :smile:. Since the inner products are calculated on a classical computer, what are the performance requirements for that computer :thinking:? For example, what happens when the number of qubits is 15 and the computer only has 8GB of RAM? Are there other ways to handle this, such as using ROM for assistance? I have a computer with 16GB of RAM, but I am unable to create a matrix for more than 13 qubits. On Windows with only 8GB of RAM, I cannot even process a matrix of size 2^13, where 13 is the number of qubits. So I want to know whether this interesting method has particular requirements for RAM size.

Hi, I have the same problem. When I use a quantum graph neural network algorithm, I may need 39 qubits, which I think is too many, and I don’t know whether I should continue with it.

Hey @SHAN,

This is strange :thinking:. You should be able to simulate well above 13 qubits with 8GB of RAM. Roughly speaking, the amount of memory (in GB) that an N-qubit state takes up is

GB = 2^{N} \times 16 / 10^9

The 16 comes from each entry in the 2^N-entry vector being a complex128 number, which takes up 128 bits, i.e. 16 bytes.
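As a quick sanity check on the numbers in this thread, here is a small pure-Python sketch of that estimate, assuming 16 bytes (128 bits) per complex128 amplitude. The function names are made up for illustration. It also estimates a dense 2^N x 2^N matrix, on my assumption (not confirmed above) that building a full matrix rather than a state vector may be what is running out of memory.

```python
BYTES_PER_AMPLITUDE = 16  # one complex128 number = 128 bits = 16 bytes


def statevector_gb(n_qubits):
    """Approximate size in GB of one n-qubit state vector (2^n amplitudes)."""
    return 2**n_qubits * BYTES_PER_AMPLITUDE / 1e9


def dense_matrix_gb(n_qubits):
    """Approximate size in GB of a dense 2^n x 2^n complex128 matrix."""
    return (2**n_qubits) ** 2 * BYTES_PER_AMPLITUDE / 1e9


for n in (13, 15, 30, 39):
    print(
        f"{n} qubits: state vector ~{statevector_gb(n):.6g} GB, "
        f"dense matrix ~{dense_matrix_gb(n):.6g} GB"
    )
```

A single state vector stays far below 1 GB at 13 or 15 qubits, while a dense matrix already needs about 1 GB at 13 qubits and grows by a factor of 4 with each extra qubit. At 39 qubits even one state vector is in the terabyte range.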

I would make sure that other processes running on your laptop are kept to a minimum.

@zhong_Feng this might be a good case to use the PennyLane-lightning plugin: Lightning plugins — Lightning 0.35.1 documentation. You might still run into hard cutoffs imposed by NumPy, but give this a try! It should help your code run much faster.

Hi @SHAN @zhong_Feng,

Here is a bit more detail that may help your understanding and your estimate of the amount of memory needed for the adjoint method.

The implementation of the adjoint method requires enough memory to store about 3 statevectors. For comparison, just computing an expectation value, with no gradient, requires about 1 statevector’s worth of memory. So the adjoint method uses about 3x as much memory.

Note that this is still much more memory-efficient than the standard backpropagation algorithm (which stores N statevectors, where N is the number of gates in your circuit), and it is more time-efficient than hardware-based gradient methods (like parameter-shift), since you can compute the entire gradient with one simulation of the circuit (not N).


Thank you for your answer :smiley:.
