Lightning GPU raises ImportError, NameError

Hi all, this question is sort of a two-parter.

I am trying to use Lightning GPU for more efficient use of CUDA in a major experiment.

Here are my PennyLane specs:

Name: PennyLane
Version: 0.32.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane-Lightning, PennyLane-Lightning-GPU

Platform info:           Linux-6.2.0-33-generic-x86_64-with-glibc2.35
Python version:          3.11.0
Numpy version:           1.23.5
Scipy version:           1.10.1
Installed devices:
- default.gaussian (PennyLane-0.32.0)
- default.mixed (PennyLane-0.32.0)
- default.qubit (PennyLane-0.32.0)
- default.qubit.autograd (PennyLane-0.32.0)
- default.qubit.jax (PennyLane-0.32.0)
- default.qubit.tf (PennyLane-0.32.0)
- default.qubit.torch (PennyLane-0.32.0)
- default.qutrit (PennyLane-0.32.0)
- null.qubit (PennyLane-0.32.0)
- lightning.qubit (PennyLane-Lightning-0.32.0)
- lightning.gpu (PennyLane-Lightning-GPU-0.32.0)

I just installed the Lightning-GPU plugin, and I ran the lines below to check that it worked:

import pennylane as qml
wires = 3
dev = qml.device('lightning.gpu', wires=wires)

When I ran these lines, I got this message:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pennylane_lightning_gpu/lightning_gpu.py", line 52, in <module>
    from .lightning_gpu_qubit_ops import (
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/idlelib/run.py", line 578, in runcode
    exec(code, self.locals)
  File "<pyshell#3>", line 1, in <module>
  File "/usr/local/lib/python3.11/dist-packages/pennylane/__init__.py", line 336, in device
    plugin_device_class = plugin_devices[name].load()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pkg_resources/__init__.py", line 2517, in load
    return self.resolve()
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pkg_resources/__init__.py", line 2523, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pennylane_lightning_gpu/__init__.py", line 17, in <module>
    from .lightning_gpu import LightningGPU
  File "/usr/local/lib/python3.11/dist-packages/pennylane_lightning_gpu/lightning_gpu.py", line 111, in <module>
    except (ModuleNotFoundError, ImportError, ValueError, PLException) as e:
                                                          ^^^^^^^^^^^
NameError: name 'PLException' is not defined. Did you mean: 'Exception'?

I checked the forum for similar issues, and I noticed this one:
https://discuss.pennylane.ai/t/pennylane-lightning-gpu/3200/14
The creator of that post resolved the first part of the problem by installing cuquantum, so I tried the same thing with sudo pip3.11 install --no-cache-dir cuquantum. After that installation I still get the same error, and I would really appreciate some help.

The second part of my question has to do with an issue that I have encountered when trying to run my experiment without Lightning GPU. In this part, I get an error message that says CUDA out of memory. The code that produces this error is proprietary, so I unfortunately can’t share it, but I am wondering if it is the sort of issue that could be resolved using Lightning GPU. I managed to solve the issue somewhat by downsizing my network architecture a bit, but I would rather not have to resort to that.

Hi @justin6626

For point 1, I believe the issue is the CUDA version: the ImportError in your traceback says that libcudart.so.11.0, the CUDA 11 runtime, cannot be found. The NameError is raised while handling that ImportError, so it should go away once the library loads. lightning.gpu currently requires CUDA 11 (between 11.5 and 11.8). We are working on a move to CUDA 12, but for now you will need a CUDA 11 runtime library and the associated math libraries, alongside a CUDA 11 build of cuQuantum’s custatevec (pip install custatevec-cu11). Since most default installations of CUDA now favour 12.0, the difference in major version is likely the issue here. If you downgrade your CUDA toolkit to 11.8, this should fix the reported error.
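If it helps, here is a rough way to check which CUDA runtime the dynamic loader can actually see from Python. This is only a sketch: the helper name can_load is made up for illustration, libcudart.so.11.0 is the file named in the traceback above, and libcudart.so.12 is my assumption for what a CUDA 12 install would provide.

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be opened.
        return False


# libcudart.so.11.0 comes from the ImportError in the traceback;
# libcudart.so.12 is an assumed name for the CUDA 12 runtime.
for name in ("libcudart.so.11.0", "libcudart.so.12"):
    print(name, "found" if can_load(name) else "not found")
```

If only the CUDA 12 entry is found, that would be consistent with the major-version mismatch described above.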

For your second point, without seeing your circuit it will be very difficult to reason about the problem. Since you are using CUDA but not lightning.gpu, I suspect you are running default.qubit through either Torch or JAX for CUDA support. If this is the case, you are also likely using backpropagation as the differentiation method. Native backprop will use a lot of RAM as you start to approach larger (14+) qubit counts, as well as deeper circuits. I suspect this is the issue, assuming I have reasoned correctly. Using lightning.gpu with diff_method="adjoint" will scale better in memory use, but at the expense of more compute. You can also use diff_method="parameter-shift" on any device, which will only use as much memory as is required for a single forward execution, but will scale in number of executions as O(2*M*N), where M is the number of parameters in your circuit and N the number of observables. This may or may not be a reasonable runtime, depending on what you are trying to do.
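To make the 2*M*N count a bit more concrete, here is a small pure-Python sketch of the parameter-shift idea. It does not use PennyLane at all. It assumes a toy one-parameter circuit, RX(theta) applied to |0> and measured in Z, whose expectation value is cos(theta). All function names below are made up for illustration.

```python
import math


def expval_z_after_rx(theta: float) -> float:
    """Toy 'circuit': <Z> for the state RX(theta)|0>, which equals cos(theta)."""
    return math.cos(theta)


def parameter_shift_grad(f, theta: float, shift: float = math.pi / 2) -> float:
    """Two-term parameter-shift gradient: two circuit executions per parameter."""
    return (f(theta + shift) - f(theta - shift)) / (2 * math.sin(shift))


def parameter_shift_executions(num_params: int, num_observables: int) -> int:
    """Execution count following the O(2*M*N) scaling mentioned above."""
    return 2 * num_params * num_observables


theta = 0.3
print("parameter-shift gradient:", parameter_shift_grad(expval_z_after_rx, theta))
print("analytic gradient:       ", -math.sin(theta))
print("executions for M=10, N=2:", parameter_shift_executions(10, 2))
```

Each evaluation of f stands in for one full circuit execution, so memory stays at the level of a single forward run while the number of runs grows with the parameter count.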

If you can share a minimum working example, we will be happy to provide more help on this.

Thank you very much for getting back to me! I managed to find another workaround for my memory issue, so I won’t need to use lightning.gpu for that. I think I will wait for the move to CUDA 12, since I’m a bit nervous about potential problems resulting from downgrading to 11.8. I am actually using default.mixed. I may end up using the parameter-shift method in case I run into problems. Just for comparison, what’s the big-O cost of the backpropagation differentiation method?

Hi @justin6626, I’m glad to hear you solved your memory issue.

The demo here makes a good time comparison between the parameter-shift method and backpropagation. As you can see, backpropagation is faster than parameter-shift, and diff_method="adjoint" is actually much faster. You can see a comparison here.