PyTorch backpropagation RAM issue

Hello, I’m currently working on a project where I’m trying to classify arrays with shape (8, 60).
My basic plan was to use 8 qubits and 60 layers to embed the data (approximately 1 GB). But I’ve run into an issue during backpropagation with PyTorch: my RAM fills up pretty quickly, which crashes my process.

My computer is an OMEN (16 GB of RAM, RTX 2060 with 6 GB of VRAM, Intel i7 9th gen). It seems like it’s a performance issue of either my computer or the device, which is the basic PennyLane simulator for now. It works if I reduce the number of layers to 1, but that ruins my model.

From what I understand, during backpropagation the simulator does its calculations in RAM on the CPU instead of on the GPU, which leads to the crash. Is there maybe a way to run a simulator on the GPU?

I’m really lost and my deadline is coming up fast. Is there a way to speed up the calculation? Or is the technology just not quite there yet?

Welcome @Qranos ! Thanks for the question.

If memory is an issue, try using diff_method="adjoint" or diff_method="reversible" (see the QNode documentation). You have a lot of parameters, so this may still take a long time, but these methods won’t take more space as you add layers.
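For example, it’s just a keyword argument on the QNode. Here’s a minimal sketch; the 8-qubit circuit, the templates, and the Torch interface are placeholders for whatever your model actually uses:

    import pennylane as qml

    dev = qml.device("default.qubit", wires=8)

    # The adjoint method keeps memory roughly constant as you add layers,
    # at the cost of stepping back through the circuit on the backward pass.
    @qml.qnode(dev, interface="torch", diff_method="adjoint")
    def circuit(weights, x):
        qml.AngleEmbedding(x, wires=range(8))
        qml.StronglyEntanglingLayers(weights, wires=range(8))
        return qml.expval(qml.PauliZ(0))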


Thanks @christina, it works fine now, but it really does take a long time to train.

Do you have any idea how I could accelerate the training?
I’ve already compared different devices, and PennyLane’s default.qubit seems to be the most performant and the only one that accepts diff_method="adjoint".

Hi @Qranos! 8 qubits and 60 layers is quite a bit, so I can imagine it does take a while to train. How many parameters do you have in your circuit?

I’ve already compared different devices, and PennyLane’s default.qubit seems to be the most performant and the only one that accepts diff_method="adjoint".

When using default.qubit with the adjoint method, does it lead to a significant speedup in training your model?

Hello @josh!
The shape of my parameters was [60, 2, 8, 3], but I managed to cut that down to [60, 1, 8] (480 parameters) by going from StronglyEntanglingLayers to BasicEntanglerLayers and from 2 embedding layers to 1, and that led to a speedup of roughly 5x.
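For reference, here is how the weight shapes of the two templates compare (a quick sketch, assuming a PennyLane version recent enough to have the templates’ .shape helpers):

    import pennylane as qml

    # Expected weight shapes for 60 layers on 8 wires:
    print(qml.StronglyEntanglingLayers.shape(n_layers=60, n_wires=8))  # (60, 8, 3)
    print(qml.BasicEntanglerLayers.shape(n_layers=60, n_wires=8))      # (60, 8)
    # With 2 embedding repetitions vs 1, that is [60, 2, 8, 3] -> 2880
    # parameters against [60, 1, 8] -> 480 parameters.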

When using default.qubit with the adjoint method, does it lead to a significant speedup in training your model?

When I did my benchmarking to see which simulator was the most performant, by running my model with a single input, it did not show any kind of improvement.
However, when I try to run my training, diff_method="adjoint" is the only thing that works without crashing my computer by filling up the RAM.
When I tried diff_method="reversible" it didn’t crash, but nothing happened for 5 minutes so I killed it.

Here are the results of my benchmarking:

As you can see, qulacs["cpu"] seems to perform a little better on my model, but it does not run when I’m training the model; it did nothing for 5 minutes so I killed it too. Also, the Forest virtual devices seem to freeze and never return an answer; maybe I did something wrong during the installation.

    Encoding size: [10, 8, 60]
    Number of qubits: 8

    Device                                        Time (s)
    --------------------------------------------  --------
    pyQVM NumpyWavefunction Simulator Device      1.754
    Forest Wavefunction Simulator Device          2.221
    Default qubit PennyLane plugin                1.457
    Qiskit PennyLane plugin                       4.764
    Qulacs device                                 1.380
    Qulacs device                                 1.069
    Cirq Simulator device for PennyLane           2.885
    ProjectQ PennyLane plugin                     6.024
    Qiskit PennyLane plugin                       9.382
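The comparison can be reproduced with a loop along these lines (a simplified sketch rather than my exact script; the device names and the dummy circuit are assumptions):

    import time

    import pennylane as qml
    from pennylane import numpy as np

    n_wires, n_layers = 8, 60
    weights = np.random.random(size=(n_layers, n_wires))  # BasicEntanglerLayers shape

    # Hypothetical device list; each entry needs its plugin installed.
    for name in ["default.qubit", "qulacs.simulator", "qiskit.aer", "cirq.simulator"]:
        try:
            dev = qml.device(name, wires=n_wires)
        except Exception:
            continue  # skip plugins that aren't installed

        @qml.qnode(dev)
        def circuit(w):
            qml.BasicEntanglerLayers(w, wires=range(n_wires))
            return qml.expval(qml.PauliZ(0))

        start = time.perf_counter()
        circuit(weights)
        print(f"Device: {dev.name}")
        print(f"Time = {time.perf_counter() - start}")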

Wow that is some thorough benchmarking! Thanks for that, it is very useful.

Are all the benchmarks done using diff_method="adjoint"?

However, when I try to run my training, diff_method="adjoint" is the only thing that works without crashing my computer by filling up the RAM.

This is not surprising: the adjoint method is a form of backpropagation that takes advantage of the reversibility of quantum computing to reduce the amount of memory required. Instead of caching an intermediate result for every operation, it only ever keeps a couple of copies of the statevector in memory, so its footprint doesn’t grow with circuit depth 🙂

No, diff_method="adjoint" was only used on the default.qubit from PennyLane; it seems to be the only device that supports it.

Do you think I’m doing something wrong here? I’m only able to launch my training on default.qubit with diff_method="adjoint"; everything else results in a crash.

Or is my model just too big?

I see!

Or is my model just too big?

That would be my best guess, especially given that adjoint (the best differentiation method in terms of memory usage) is the only approach that doesn’t crash.


OK, I guess I’ll just work on reducing my model for now. Thanks for your help, it has been a pleasure!
I might update this thread with my results later on, or if I find tricks to improve it further.

No worries @Qranos, sorry I couldn’t be of more help!

I might update this thread with my results later on, or if I find tricks to improve it further.

Please do, I would be interested to hear if you have any interesting tips or results.
