Adjoint Differentiation

Hello! I’m a little confused about the content — Adjoint Differentiation

This method is mentioned here to obtain gradient values faster. I think the reason for the acceleration here is that the corresponding quantum states are transformed into matrix form, and the gradient is actually obtained through classical calculations. The obtained matrix is stored in the computer for the next use. Therefore, when the number of qubits is high (over 15), the memory requirements on the classical computer are quite demanding.

Is there any problem with my understanding?


Hey @SHAN, welcome back!

Some parts of your understanding are not 100% right, but others are! I’ll try to address your post sentence-by-sentence:

This method is mentioned here to obtain gradient values faster

You got it :slight_smile:

that the corresponding quantum states are transformed into matrix form,

Well, yes, states are reshaped into 2×2×2×… arrays, but that's true of all statevector simulations, so it's not the reason adjoint differentiation provides a speedup. The speedup comes from taking advantage of the unitary property of quantum operations.

The unitary property of quantum operations lets us calculate things only once: instead of computing a bra and a ket separately, we compute just one of them and obtain the other by taking the adjoint :slight_smile:. On top of that, if you work out the math for the derivative of an expectation value, a single forward pass lets you avoid a quadratic amount of work!
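To sketch the math (my notation, not anything official): for a circuit U = U_L \cdots U_1 applied to |0\rangle and an observable M, each partial derivative has the form

\partial_{\theta_k} \langle M \rangle = 2 \, \mathrm{Re} \, \langle b_k | \frac{\partial U_k}{\partial \theta_k} | k_k \rangle

where |k_k\rangle = U_{k-1} \cdots U_1 |0\rangle and \langle b_k | = \langle 0 | U^\dagger M U_L \cdots U_{k+1}. Because every U_j is unitary, you can get |k_{k-1}\rangle and \langle b_{k-1}| from |k_k\rangle and \langle b_k| by applying one more gate (or its adjoint), so a single backward sweep through the circuit yields the whole gradient instead of one full simulation per parameter.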

the gradient is actually obtained through classical calculations

Adjoint differentiation is not compatible with quantum hardware (i.e., you cannot ask a quantum device to perform adjoint differentiation natively on the device). So, yes, the gradient is obtained through classical calculations (i.e., linear algebra).
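For instance, in PennyLane you opt into it with diff_method="adjoint" on a statevector simulator. A minimal sketch (the circuit itself is just a placeholder):

```python
import pennylane as qml
from pennylane import numpy as np

# Adjoint differentiation runs on statevector simulators only,
# never on real hardware.
dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, diff_method="adjoint")
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RY(params[1], wires=1)
    return qml.expval(qml.PauliZ(1))

params = np.array([0.1, 0.2], requires_grad=True)
# The gradient is computed classically via the adjoint method.
print(qml.grad(circuit)(params))
```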

Let me know if this helps!


Thank you very much for your explanation, it's very easy to understand :smile:. Since the inner-product calculation is done on a classical computer, what are the performance requirements :thinking:? For example, if the number of qubits is 15 and the computer only has 8GB of RAM, is there another way to handle it, such as using ROM (disk storage) to assist? I have a 16GB RAM computer, but I am unable to create a matrix for more than 13 qubits. So I want to know whether this interesting method has particular RAM requirements. On Windows with only 8GB of RAM, I am unable to process matrices of size 2^13, where 13 is the number of qubits.


Hi, I have the same problem. When I use a quantum graph neural network algorithm, I may need 39 qubits, which I think is too big, and I don't know whether I should continue.

Hey @SHAN,

This is strange :thinking:. You should be able to simulate well above 13 qubits with 8GB of RAM. Roughly speaking, the amount of memory that an N-qubit state takes up (in units of GB) is

GB = 2^{N} \times 16 / 10^9

The 16 comes from each entry of the 2^N-dimensional statevector being a complex128 number, which occupies 16 bytes (128 bits).
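A quick back-of-the-envelope check (plain Python, nothing PennyLane-specific):

```python
# Memory footprint of an N-qubit complex128 statevector, in GB.
def statevector_gb(n_qubits):
    return 2**n_qubits * 16 / 1e9  # 16 bytes per complex128 entry

for n in (13, 15, 20, 25, 30):
    print(f"{n} qubits: {statevector_gb(n):.6f} GB")
# 13 qubits is only ~0.00013 GB, so 8GB of RAM is nowhere near the limit.
```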

I would make sure that other processes running on your laptop are kept to a minimum.

@zhong_Feng this might be a good case to use the PennyLane-lightning plugin: Lightning plugins — Lightning 0.35.1 documentation. You might still run into hard cutoffs imposed by NumPy, but give this a try! It should help your code run much faster.
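A minimal sketch of swapping in the lightning device (assuming pennylane-lightning is installed; the circuit here is just a placeholder):

```python
import pennylane as qml

# lightning.qubit is PennyLane's C++ statevector simulator, with fast
# native support for adjoint differentiation.
dev = qml.device("lightning.qubit", wires=20)

@qml.qnode(dev, diff_method="adjoint")
def circuit(theta):
    for w in range(20):
        qml.RY(theta, wires=w)
    for w in range(19):
        qml.CNOT(wires=[w, w + 1])
    return qml.expval(qml.PauliZ(0))
```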

Hi @SHAN @zhong_Feng,

A bit more detail which may help your understanding and your estimate of the amount of memory needed for the adjoint method.

The implementation of the adjoint method requires enough memory to store about 3 statevectors. For comparison, computing just an expectation value, with no gradient, requires about 1 statevector's worth of memory, so the adjoint method uses roughly 3x as much.

Note that this is still much more memory-efficient than the standard backpropagation algorithm (which stores N statevectors, where N is the number of gates in your circuit), and it is more time-efficient than hardware-based gradient methods (like parameter-shift), since you can compute the entire gradient with one simulation of the circuit rather than one or two per parameter.
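To put some illustrative numbers on that (my example, not from the docs): for a hypothetical 25-qubit circuit with 100 gates, one statevector is 2^{25} \times 16 / 10^9 \approx 0.54 GB. Adjoint then needs roughly 3 \times 0.54 \approx 1.6 GB, while backpropagation would need on the order of 100 \times 0.54 \approx 54 GB. Parameter-shift stays at about one statevector's worth of memory, but costs roughly two circuit executions per trainable parameter.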


Thank you for your answer :smiley:.
