Hello, I’m currently working on a project where I’m trying to classify arrays of shape (8, 60).
My basic plan was to use 8 qubits and 60 layers to embed the data (approximately 1 GB). But I ran into an issue during backpropagation with PyTorch: my RAM fills up pretty quickly, and that leads to my process crashing.
My computer is an OMEN (16 GB of RAM, an RTX 2060 with 6 GB of VRAM, an Intel i7 9th gen). It seems to be a performance issue of either my computer or the QPU, which is the basic PennyLane simulator for now. It does work if I reduce the number of layers to 1, but that ruins my model.
From what I understand, during backpropagation the QPU uses the RAM for the calculation instead of the GPU, which leads to a crash. Is there a way to run a simulator on the GPU, maybe?
I’m really lost and my deadline is coming up pretty fast. Is there a way to speed up the calculation? Or is the technology not quite there yet?
If memory is an issue, try using diff_method="adjoint" or diff_method="reversible"; see the QNode documentation. You have a lot of parameters, so this may still take a long time, but these methods won’t take more space as you add layers.
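For example (a minimal sketch; the embedding and entangling templates here are placeholders, not necessarily your exact model):

```python
import pennylane as qml
import torch

n_qubits = 8
dev = qml.device("default.qubit", wires=n_qubits)

# Passing diff_method="adjoint" when the QNode is created is all that's
# needed; gradients are then computed by stepping back through the
# circuit with inverse gates instead of storing intermediate states.
@qml.qnode(dev, interface="torch", diff_method="adjoint")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))

weights = torch.randn(60, n_qubits, requires_grad=True)
loss = circuit(torch.randn(n_qubits), weights)
loss.backward()  # memory use stays roughly flat as layers are added
```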
Thanks @christina, it works fine now, but it really does take a long time to train.
Do you have any ideas on how I could accelerate the training?
I’ve compared different devices already, and PennyLane’s default.qubit seems to be the most performant and the only one that accepts diff_method="adjoint".
Hi @Qranos! 8 qubits and 60 layers is quite a bit, so I can imagine it does take a while to train. How many parameters do you have in your circuit?
I’ve compared different devices already, and PennyLane’s default.qubit seems to be the most performant and the only one that accepts diff_method="adjoint".
When using default.qubit with the adjoint method, does it lead to a significant speedup in training your model?
Hello @josh!
The shape of my parameters was [60, 2, 8, 3], but I managed to cut that down to [60, 1, 8] (480 parameters) by going from StronglyEntanglingLayers to BasicEntanglerLayers and from 2 embedding layers to 1, and that led to a speedup of ~5x.
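For reference, PennyLane’s shape helpers make the difference explicit (a rough sketch; my actual arrays carry an extra block/embedding dimension):

```python
import pennylane as qml

n_layers, n_qubits = 60, 8

# Per layer, StronglyEntanglingLayers uses 3 rotation angles per qubit,
# while BasicEntanglerLayers uses only 1.
print(qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits))  # (60, 8, 3)
print(qml.BasicEntanglerLayers.shape(n_layers=n_layers, n_wires=n_qubits))      # (60, 8) -> 480 parameters
```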
When using default.qubit with the adjoint method, does it lead to a significant speedup in training your model?
When I did my benchmarking to see which simulator was the most performant, running my model on a single input, the adjoint method did not show any kind of improvement.
However, when I try to run my training, diff_method="adjoint" is the only thing that seems to work without crashing my computer by filling up the RAM.
When I tried diff_method="reversible", it didn’t crash, but nothing happened for 5 minutes, so I killed it.
Here are the results of my benchmarking:
As you can see below, qulacs[“cpu”] seems to perform a little better on my model, but it does not run when I’m training: it did nothing for 5 minutes, so I killed it too. Also, the Forest virtual devices seem to freeze and never return an answer; maybe I did something wrong during the installation.
Encoding size: [10, 8, 60]
Number of qubits: 8
Device: pyQVM NumpyWavefunction Simulator Device
Time = 1.753637929999968
Device: Forest Wavefunction Simulator Device
Time = 2.221205955998812
Device: Default qubit PennyLane plugin
Time = 1.4565937029983616
Device: Qiskit PennyLane plugin
Time = 4.764246687998821
Device: Qulacs device
Time = 1.3797734300005686
Device: Qulacs device
Time = 1.0685497309987113
Device: Cirq Simulator device for PennyLane
Time = 2.8849340850010776
Device: ProjectQ PennyLane plugin
Time = 6.023803063999367
Device: Qiskit PennyLane plugin
Time = 9.381965378999666
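For context, my timing loop was roughly like this (a simplified sketch, not my exact benchmark; the circuit is a stand-in, and the qulacs entry assumes the pennylane-qulacs plugin is installed):

```python
import timeit
import pennylane as qml
from pennylane import numpy as np

def time_device(name):
    # Build the same small model on each device and time one evaluation.
    dev = qml.device(name, wires=8)

    @qml.qnode(dev)
    def circuit(weights):
        qml.BasicEntanglerLayers(weights, wires=range(8))
        return qml.expval(qml.PauliZ(0))

    weights = np.random.random((60, 8))
    print(name, timeit.timeit(lambda: circuit(weights), number=1))

time_device("default.qubit")
time_device("qulacs.simulator")
```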
Wow, that is some thorough benchmarking! Thanks for that; it is very useful.
Are all the benchmarks done using diff_method="adjoint"?
However, when I try to run my training, diff_method="adjoint" is the only thing that seems to work without crashing my computer by filling up the RAM.
This is not surprising: the adjoint method is a form of backpropagation that takes advantage of the reversibility of quantum computing to reduce the amount of memory required. Instead of storing every intermediate state, it recovers them during the backward pass by applying inverse gates, so memory usage stays roughly constant as you add layers.
No, diff_method="adjoint" was only used on PennyLane’s default.qubit; it seems to be the only device that supports it.
Do you think I’m doing something wrong here? I’m only able to launch my training on default.qubit with diff_method="adjoint"; everything else results in a crash.
Running out of memory would be my best guess, especially given that adjoint (the best differentiation method in terms of memory usage) is the only approach where a crash doesn’t occur.
OK, I guess I will just work on reducing my model for now. Thanks for your help, it has been a pleasure!
I might update this thread with my results later on, or if I find tricks to improve it further.