Pennylane Lightning GPU

Hello, I am trying to get the PennyLane GPU device working on my machine, which has CUDA installed. I installed both the pennylane and pennylane-lightning libraries.

I am running the following code:

import pennylane as qml
dev = qml.device("lightning.gpu", wires=2)

I get the following error:

>>> dev = qml.device("lightning.gpu", wires=2)
Traceback (most recent call last):
  File "/home/baradwaj/.local/lib/python3.10/site-packages/pennylane_lightning_gpu/", line 52, in <module>
    from .lightning_gpu_qubit_ops import (
ImportError: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/baradwaj/.local/lib/python3.10/site-packages/pennylane/", line 328, in device
    plugin_device_class = plugin_devices[name].load()
  File "/usr/lib/python3/dist-packages/pkg_resources/", line 2465, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/", line 2471, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/baradwaj/.local/lib/python3.10/site-packages/pennylane_lightning_gpu/", line 17, in <module>
    from .lightning_gpu import LightningGPU
  File "/home/baradwaj/.local/lib/python3.10/site-packages/pennylane_lightning_gpu/", line 109, in <module>
    except (ModuleNotFoundError, ImportError, ValueError, PLException) as e:
NameError: name 'PLException' is not defined. Did you mean: 'Exception'?
>>> ImportError: cannot open shared object file: No such file or directory
  File "<stdin>", line 1
    ImportError: cannot open shared object file: No such file or directory
SyntaxError: invalid syntax
>>> exit()

Can anyone please help me with this?

I installed cuQuantum and it worked.
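For anyone hitting the same ImportError: the missing shared object is the cuQuantum/cuStateVec library that lightning.gpu loads at import time. A rough install sketch (the exact package names here are assumptions and depend on your CUDA toolkit version; the `-cu11` suffix assumes a CUDA 11.x toolchain):

```shell
# Install the cuQuantum component that lightning.gpu links against
pip install custatevec-cu11

# Reinstall the GPU plugin so it can locate the library
pip install --force-reinstall pennylane-lightning-gpu
```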


Hi @bharadwaj509

Welcome to the Forum!

It’s great that you found the solution to your problem. Thank you for posting it here. Enjoy using PennyLane!

@CatalinaAlbornoz Hey, good day! How do I know whether my code is using the GPU or not? The tutorial page says: "If the NVIDIA cuQuantum libraries are available, the above device will allow all operations to be performed on a CUDA capable GPU of generation SM 7.0 (Volta) and greater. If the libraries are not correctly installed, or available on path, the device will fall-back to lightning.qubit and perform all simulation on the CPU."

Other than the libraries above, are there any more I should install? My code is still taking the same time to execute, and I am wondering if I am missing something.

Are there any restrictions on the type of GPU? I have an NVIDIA T4.

Could you please help me?
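One practical way to confirm whether the GPU is actually in use (a sketch; requires the NVIDIA driver utilities): run your PennyLane script in one terminal and watch GPU memory and utilization in another. If lightning.gpu silently fell back to the CPU, your python process will not show up in the output at all.

```shell
# Refresh nvidia-smi every second while the PennyLane script runs;
# the python process should appear with nonzero GPU memory usage
watch -n 1 nvidia-smi
```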

@mlxd Hey Lee, sorry to bother you here. I saw a solution of yours in another post, so I thought I could ask you here too. I also ran qml.about(), and below is what I got. Could you please tell me if I am doing something wrong? I want to see if there is any drop in execution time.

I have an NVIDIA T4 installed on my machine.

Name: PennyLane
Version: 0.31.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
License: Apache License 2.0
Location: /home/baradwaj/.local/lib/python3.10/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml
Required-by: PennyLane-Lightning, PennyLane-Lightning-GPU

Platform info:           Linux-5.15.0-67-generic-x86_64-with-glibc2.35
Python version:          3.10.6
Numpy version:           1.23.5
Scipy version:           1.10.0
Installed devices:
- default.gaussian (PennyLane-0.31.0)
- default.mixed (PennyLane-0.31.0)
- default.qubit (PennyLane-0.31.0)
- default.qubit.autograd (PennyLane-0.31.0)
- default.qubit.jax (PennyLane-0.31.0)
- (PennyLane-0.31.0)
- default.qubit.torch (PennyLane-0.31.0)
- default.qutrit (PennyLane-0.31.0)
- null.qubit (PennyLane-0.31.0)
- lightning.qubit (PennyLane-Lightning-0.31.0)
- lightning.gpu (PennyLane-Lightning-GPU-0.31.0)

Hi @bharadwaj509
For cuQuantum support, provided your GPU has compute capability SM 7.0 or greater, it should work with lightning.gpu. Since the simulator expects full double-precision support, you may find reduced performance on anything other than the data-center-class cards (V100, A100, H100), as these are the only devices with full double-precision throughput. lightning.gpu is built primarily as an HPC-focused simulator, and should really only show benefits for very deep circuits or circuits of 19+ qubits; below that, lightning.qubit is likely to be faster. Note that at those circuit sizes and depths you should also enable the adjoint differentiation pipeline (setting `diff_method="adjoint"` in the QNode).
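A minimal configuration sketch of the setup described above (assuming pennylane and the GPU plugin are installed; the circuit body and wire count are placeholders, not the poster's actual code):

```python
import pennylane as qml

# lightning.gpu tends to pay off at roughly 19+ qubits or for very deep circuits
num_wires = 20
dev = qml.device("lightning.gpu", wires=num_wires)

# Enable the adjoint differentiation pipeline, which is far cheaper than
# parameter-shift on statevector simulators
@qml.qnode(dev, diff_method="adjoint")
def circuit(params):
    for w in range(num_wires):
        qml.RX(params[w], wires=w)
    return qml.expval(qml.PauliZ(0))
```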

If you wish to use a GPU for smaller circuits, you can explore using PennyLane with PyTorch or JAX to handle GPU data with the backprop differentiation method, which should be faster at smaller scales.

@mlxd I was given an NVIDIA T4, which has 40 SMs (compute capability SM 7.5). However, I think my circuit is rather small. Here are the specs, but I am not entirely sure how to read the qubit count from them. Could you please help me?

{'resources': Resources(num_wires=6, num_gates=37, gate_types=defaultdict(<class 'int'>, {'QubitStateVector': 1, 'RX': 18, 'CNOT': 18}), gate_sizes=defaultdict(<class 'int'>, {6: 1, 1: 18, 2: 18}), depth=22, shots=Shots(total_shots=None, shot_vector=())),
 'gate_sizes': defaultdict(int, {6: 1, 1: 18, 2: 18}),
 'gate_types': defaultdict(int, {'QubitStateVector': 1, 'RX': 18, 'CNOT': 18}),
 'num_operations': 37,
 'num_observables': 6,
 'num_diagonalizing_gates': 0,
 'num_used_wires': 6,
 'num_trainable_params': 19,
 'depth': 22,
 'num_device_wires': 6,
 'device_name': 'lightning.gpu',
 'expansion_strategy': 'gradient',
 'gradient_options': {},
 'interface': 'auto',
 'diff_method': 'best',
 'gradient_fn': 'pennylane.gradients.parameter_shift.param_shift',
 'num_gradient_executions': 'NotSupported: Cannot differentiate with respect to parameter(s) {0}'}

Hi @bharadwaj509 ,

As mentioned here, num_device_wires tells you the number of qubits. Notice also that the first line shows num_wires=6; this is where the number of qubits is set to 6.
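In code, the qubit count can be read straight out of the specs dictionary (a sketch; only the relevant keys from the output above are reproduced here):

```python
# Minimal stand-in for the specs dict printed earlier in this thread
specs = {
    "num_device_wires": 6,   # qubits allocated on the device
    "num_used_wires": 6,     # qubits the circuit actually touches
    "depth": 22,
}

num_qubits = specs["num_device_wires"]
print(num_qubits)  # 6
```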

Let us know if you have any other questions!


Perfect @CatalinaAlbornoz @mlxd Thanks a lot.


Hey @CatalinaAlbornoz @mlxd , so I am using angle embedding on my 26 features so that I can assign one qubit to each feature, and I am using the GPU for this since the circuit will have 26 qubits. I got the error below.

File ~/.local/lib/python3.10/site-packages/pennylane_lightning_gpu/, in LightningGPU.adjoint_jacobian(self, tape, starting_state, use_device_state, **kwargs)
    720             jac = adj.adjoint_jacobian_serial(
    721                 self._gpu_state,
    722                 obs_serialized,
    723                 ops_serialized,
    724                 tp_shift,
    725             )
    727 else:
--> 728     jac = adj.adjoint_jacobian(
    729         self._gpu_state,
    730         obs_serialized,
    731         ops_serialized,
    732         tp_shift,
    733     )
    735 jac = np.array(jac)  # only for parameters differentiable with the adjoint method
    736 jac = jac.reshape(-1, len(tp_shift))

PLException: [/project/pennylane_lightning_gpu/src/util/DataBuffer.hpp][Line:48][Method:DataBuffer]: Error in PennyLane Lightning: out of memory

It says it is out of memory during training.

Could you please advise me on what to do?


Thanks for your question!

26 qubits might be a bit too resource-intensive, but a good GPU should be able to handle it. Still, this error might originate in how the lightning library works on the backend. I'm tagging @mlxd here, since he is our GPU & performance guru. Please sit tight while he figures out a solution to your issue.




Sure, @Alvaro_Ballon, I will wait for him. We have 32 GB of RAM alongside the T4 GPU. We also tried the same thing on the CPU and left the code running for 12 hours, but nothing happened.

Thanks and regards,
Baradwaj Aryasomayajula

You can also try qsim (40 qubits max) on a 90-core Xeon CPU: two pip installs, then change the device to cirq.qsim. Demo: Beyond classical computing with qsim

Hi @bharadwaj509

The NVIDIA T4 GPU seems to have only 16 GB of available memory, so the likely issue is that the problem representation is too large for a GPU of that size (I assume you have one qubit per feature: 26 features → 26 qubits). That said, for 26 qubits you need 16 bytes per complex value and 2^26 values, which comes to only about 1 GiB for the statevector storage alone, so you should have enough memory available to store and evaluate this computation on a GPU of that size. Without explicitly seeing your circuit I cannot recommend further. Can you provide a minimal working example of your script that reproduces this error?
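The arithmetic above can be checked directly (a sketch; it assumes dense complex128 amplitudes at 16 bytes each, and ignores the extra working statevectors the adjoint-Jacobian computation allocates on top):

```python
def statevector_bytes(num_qubits: int) -> int:
    """Memory needed to hold a dense complex128 statevector."""
    return (2 ** num_qubits) * 16  # 2^n amplitudes, 16 bytes per complex value

# 26 qubits -> 2^26 amplitudes -> exactly 1 GiB for the state alone
print(statevector_bytes(26) / 2**30, "GiB")  # 1.0 GiB
```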

With that we should be able to better advise the next steps.

Hey, @mlxd ,

Is it possible to contact you directly? I am using someone else's code at my work, and I would like to correspond with you via my work email.

Hey @mlxd and @CatalinaAlbornoz ,

I would like to share the code with you, but I work for CVS Health and my boss asks me to get an NDA if we don't already have one. Should I list Xanadu or PennyLane as your parent company name? Could you please share your email? I would like to correspond directly.

Hi @bharadwaj509

Answering user questions on the forum is a very small part of my work at Xanadu — I unfortunately don’t have the resources to take on external collaborations. We can continue to converse here in the open, where I can help you debug and answer questions about your PennyLane code, but please be aware that I don’t have the resources to help with research projects.

If you want to directly engage with others at Xanadu in an industry-focused research setting, I can give you the contact of the Business Development team and Community Team —

Feel free to reach out there directly, and they can advise on how to proceed.