Lightning GPU not working

I am getting the following error when trying to access the ‘lightning.gpu’ device:

pennylane_lightning_gpu.lightning_gpu_qubit_ops.PLException: [/project/pennylane_lightning_gpu/src/util/DataBuffer.hpp][Line:37][Method:DataBuffer]: Error in PennyLane Lightning: no error

PS: I get this error when trying to run the sample code given in this blog.

I am not sure what’s wrong. Please help.

Hi @imakash thanks for reporting this.

Can you confirm the details of the system you are running this on (CPU type, GPU type) and the results of qml.about()? Also, what version of CUDA do you have installed? If you are using the newly released CUDA 12 this will not work with this library, and you may need to downgrade to CUDA 11.

If there is any more information from the error message, that would be a big help also.

Lastly, if you have a code sample that reproduces the issue you are seeing locally, that would be useful to help us debug. Thanks.
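A minimal sketch of how the requested details could be gathered in one go. The helper name collect_system_report is hypothetical (not part of PennyLane), and it assumes nvidia-smi is on the PATH; qml.about() is the call mentioned above.

```python
import platform
import shutil
import subprocess


def collect_system_report():
    """Gather basic CPU/GPU details into a dict of strings (hypothetical helper)."""
    report = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "processor": platform.processor() or "unknown",
    }
    if shutil.which("nvidia-smi") is None:
        report["gpu"] = "nvidia-smi not found on PATH"
        return report
    try:
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30, check=False,
        )
        report["gpu"] = result.stdout.strip() or result.stderr.strip()
    except (OSError, subprocess.SubprocessError) as exc:
        report["gpu"] = f"nvidia-smi failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in collect_system_report().items():
        print(f"{key}: {value}")
    try:
        import pennylane as qml
        qml.about()  # prints the PennyLane version and the installed devices
    except ImportError:
        print("PennyLane is not installed in this environment")
```

Pasting the printed output into a reply covers the CPU, GPU, and qml.about() questions; the CUDA toolkit version still has to be reported separately.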

Hello @mlxd

GPU details: Nvidia Tesla V100. Volta architecture. 16GB graphics memory.

CUDA version - 11.7

Unfortunately, I don’t have any further information about the error message. I get this error while trying to run the code mentioned in this blog, on the line:

dev = qml.device('lightning.gpu', wires=wires)

I hope this helps.

Thanks for the info.

Based on the information I have, it looks like there may be a problem with allocating memory on your GPU, as the calls to cudaMalloc may be failing. Can you confirm the GPU RAM is not full by running nvidia-smi and checking that there is enough free memory for the number of wires you are providing? Note that the 16GB V100s may be too small to run that blog example in its entirety, as it was originally run on an A100 GPU with a minimum of 40GB of GPU memory.
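The nvidia-smi check can also be scripted. The sketch below is only an illustration: the function names are hypothetical, and it assumes nvidia-smi reports plain integers in MiB for every field (some configurations may print "[N/A]" instead, which this parser does not handle).

```python
import shutil
import subprocess


def parse_free_memory_mib(csv_text):
    """Parse 'total, used, free' rows (MiB, no units) into free MiB per GPU."""
    free = []
    for line in csv_text.strip().splitlines():
        total_mib, used_mib, free_mib = (int(field) for field in line.split(","))
        free.append(free_mib)
    return free


def query_free_memory_mib():
    """Return free memory per GPU in MiB, or an empty list if the query fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=memory.total,memory.used,memory.free",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=30, check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    if result.returncode != 0:
        return []
    return parse_free_memory_mib(result.stdout)


if __name__ == "__main__":
    for index, free_mib in enumerate(query_free_memory_mib()):
        print(f"GPU {index}: {free_mib} MiB free")
```

If the free figure is close to the card's total, a full GPU is unlikely to be the cause of the failed allocation.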

If it looks like there is sufficient free GPU memory, you can try allocating a smaller GPU buffer by setting wires=4 and checking if this succeeds.
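A small sketch of that test, assuming PennyLane and the lightning.gpu plugin are installed. The wrapper function try_lightning_gpu is hypothetical; the only PennyLane call used is qml.device, as in the line from the blog.

```python
def try_lightning_gpu(wires=4):
    """Try to create a small lightning.gpu device and report what happened."""
    try:
        import pennylane as qml
    except ImportError as exc:
        return False, f"PennyLane not importable: {exc}"
    try:
        # A 4-wire state vector is tiny, so a failure here is unlikely
        # to be caused by running out of GPU memory.
        qml.device("lightning.gpu", wires=wires)
    except Exception as exc:  # keep the full message for the bug report
        return False, f"{type(exc).__name__}: {exc}"
    return True, "device created"


if __name__ == "__main__":
    ok, message = try_lightning_gpu(wires=4)
    print("success" if ok else "failure", "-", message)
```

If this small device fails with the same DataBuffer error, the problem is probably not the size of the allocation.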

Failing that, I suspect this may be a compatibility issue between the compiled modules and the CUDA runtime (we currently build against CUDA 11.8). If you can upgrade your CUDA version to 11.8 and see if this works, that would be a potential workaround until we can rectify the situation. We will aim to understand if this is the case locally, as we will likely need to downgrade the CUDA compiler and runtime libraries we build against to ensure forward-compatibility, and re-release the wheels if so.

Hi again @imakash, it would appear the forward compatibility support is the problem here. As we build against a newer CUDA toolkit (11.8), additional efforts are required to run on an older version (11.7 on your machine). More details are in NVIDIA's documentation: CUDA Compatibility :: NVIDIA Data Center GPU Driver Documentation
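As a rough illustration of the mismatch described here, the comparison can be written down directly. This is a deliberate simplification with hypothetical function names: it only checks whether the build toolkit is newer than the installed one, whereas NVIDIA's actual compatibility rules are more nuanced.

```python
def parse_cuda_version(text):
    """Turn a version string such as '11.8' into a (major, minor) tuple."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def built_against_newer_toolkit(build_version, installed_version):
    """True when the library was built with a newer CUDA toolkit than the one installed."""
    return parse_cuda_version(build_version) > parse_cuda_version(installed_version)


if __name__ == "__main__":
    # Versions taken from this thread: wheels built with 11.8, machine has 11.7.
    print(built_against_newer_toolkit("11.8", "11.7"))
```

For the versions in this thread the check is true, which matches the situation reported above.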

We will aim to validate this by rebuilding the application with the oldest toolkit version that supports cuQuantum. If this is the problem, we will push a new release of Lightning GPU.

Hello @mlxd

Thanks for your response. I tried with a reduced number of wires but I get the same error. Also, GPU memory usage was nil, as no running processes were found (according to the nvidia-smi output).

I believe there could be a compatibility issue with the current CUDA version. I will wait for you to confirm this.

Hi @imakash, I can confirm that the CUDA compiler being newer than the runtime is the issue here. We have issued an upgraded release of lightning.gpu, v0.28.1, which can be installed with python -m pip install pennylane-lightning-gpu --upgrade
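After upgrading, one way to confirm that the installed package is at least v0.28.1 is to read its version from the package metadata. This is a sketch using only the Python standard library; the helper names are hypothetical, and pre-release suffixes such as ".dev0" are ignored by the parser.

```python
import re
from importlib import metadata

FIXED_RELEASE = (0, 28, 1)  # release named in this thread


def version_tuple(text):
    """Extract the leading numeric components, e.g. '0.28.1' -> (0, 28, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def has_fix(version_text):
    """True when the version is v0.28.1 or newer."""
    return version_tuple(version_text) >= FIXED_RELEASE


def installed_lightning_gpu_version():
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version("pennylane-lightning-gpu")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_lightning_gpu_version()
    if version is None:
        print("pennylane-lightning-gpu is not installed")
    else:
        print(version, "includes the fix" if has_fix(version) else "predates the fix")
```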

Please feel free to let us know if this solves your issue.

Hello @mlxd

It works now. Thanks a lot for your help.

Glad to hear it, and thanks again for letting us know.

I’m trying to use lightning-gpu as well, but I got a similar error.
Can you tell me how many qubits can be simulated for a given GPU specification?
Also, is it possible to increase the number of simulated qubits by modifying the code?

I have access to 8 × Nvidia A100 SXM4 40GB GPUs.
Please help.

Hi @masa, welcome to the forum!

With 40GB you will be able to run circuits with up to 27 or 28 qubits. If you want to run circuits with more qubits you can try:

  • If you have more than one observable, you can split them across multiple GPUs by following the instructions here in the docs.
  • Circuit cutting. This demo shows how to use this technique, which splits your circuit into smaller parts, reducing the memory needs. It will have a classical overhead though, so running the circuits will take more time. How much it helps depends on how interconnected your circuit is.
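For a rough sense of where a qubit limit like the one above comes from, the memory arithmetic can be sketched as follows. Assumptions, none of which are stated in the thread: amplitudes are stored as double-precision complex numbers (16 bytes each), and the hypothetical n_buffers parameter stands for the number of state-vector-sized buffers held on the GPU at once. The state vector alone would allow more qubits than the 27 or 28 quoted; the extra headroom is presumably needed for additional working memory.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed for one state vector of 2**n_qubits amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


def max_qubits(memory_bytes, n_buffers=1, bytes_per_amplitude=16):
    """Largest qubit count whose n_buffers state-sized buffers fit in memory_bytes."""
    amplitudes = memory_bytes // (n_buffers * bytes_per_amplitude)
    if amplitudes < 1:
        raise ValueError("not enough memory for a single amplitude")
    # Largest n with 2**n <= amplitudes.
    return amplitudes.bit_length() - 1


if __name__ == "__main__":
    gib = 2 ** 30
    for buffers in (1, 2, 4, 8):
        print(f"40 GiB, {buffers} buffer(s): up to {max_qubits(40 * gib, buffers)} qubits")
```

Each additional qubit doubles the memory, so halving the available memory per buffer costs exactly one qubit.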

Please let us know if any of these options solve your issue. If not, could you please post the output of qml.about() and your CUDA version?