Problems with using lightning.gpu


I tried to run a small test with lightning.gpu on a Colab T4 GPU, whose CUDA compute capability is 7.5 according to what I found online. I have attached the code below so you can easily reproduce the results:

! nvidia-smi
! nvcc -V
! pip install pennylane cuquantum-cu12 pennylane-lightning-gpu

import pennylane as qml
import torch
import pdb
import math
from sklearn.datasets import make_moons
import numpy as np
import torch.nn as nn


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


X, y = make_moons(n_samples=200, noise=0.1)

y = y[:, np.newaxis]

data_loader = torch.utils.data.DataLoader(
    list(zip(X, y)), batch_size=2, shuffle=True, drop_last=True
)

class QuantumLayer(nn.Module):

    def __init__(self, n_qubits):
        # nn.Module must be initialised before any nn.Parameter is assigned
        super().__init__()
        self.n_qubits = n_qubits
        self.sim_dev = qml.device('lightning.gpu', wires=n_qubits)
        self.show_plot = True

        self.weights = nn.Parameter(torch.rand(2, 2, 3, device=device) * 2 * math.pi)

    def QNode(self, inputs, weights):

        @qml.qnode(self.sim_dev, interface='torch', diff_method='adjoint')
        def qnode(inputs, params):
            qml.templates.AngleEmbedding(inputs, wires=[0, 1])
            # use the QNode's own argument rather than the enclosing variable
            qml.templates.StronglyEntanglingLayers(params, wires=[0, 1])
            return [qml.expval(qml.PauliZ(i)) for i in range(2)]

        return qnode(inputs, weights)

    def forward(self, x):
        res = self.QNode(x, self.weights)
        if torch.numel(res[0]) == 1:
            q_out = torch.stack(res).reshape(self.n_qubits, -1).T.float()
        elif torch.numel(res[0]) != 1:
            q_out =, -1).T.float()
        return q_out

q_layer = QuantumLayer(2)
clayer_2 = torch.nn.Linear(2, 1)

layers = [q_layer, clayer_2]
model = torch.nn.Sequential(*layers)

opt = torch.optim.SGD(model.parameters(), lr=0.2)
loss = torch.nn.L1Loss()

epochs = 6

for epoch in range(epochs):
    running_loss = 0
    for xs, ys in data_loader:

        # move the batch to the same device as the model weights
        xs, ys = xs.to(device), ys.to(device)

        loss_evaluated = loss(model(xs), ys)

        running_loss += loss_evaluated

The result is the following:

Wed Jan 10 11:46:44 2024
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 63C P8 11W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |

| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
| No running processes found |

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

Collecting pennylane
Downloading PennyLane-0.34.0-py3-none-any.whl (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 14.1 MB/s eta 0:00:00
Collecting cuquantum-cu12
Downloading cuquantum_cu12-23.10.0-py3-none-manylinux2014_x86_64.whl (7.0 kB)
Collecting pennylane-lightning-gpu
Downloading PennyLane_Lightning_GPU-0.34.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.9/6.9 MB 33.2 MB/s eta 0:00:00
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from pennylane) (1.23.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from pennylane) (1.11.4)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from pennylane) (3.2.1)
Collecting rustworkx (from pennylane)
Downloading rustworkx-0.13.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 62.4 MB/s eta 0:00:00
Requirement already satisfied: autograd in /usr/local/lib/python3.10/dist-packages (from pennylane) (1.6.2)
Requirement already satisfied: toml in /usr/local/lib/python3.10/dist-packages (from pennylane) (0.10.2)
Requirement already satisfied: appdirs in /usr/local/lib/python3.10/dist-packages (from pennylane) (1.4.4)
Collecting semantic-version>=2.7 (from pennylane)
Downloading semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)
Collecting autoray>=0.6.1 (from pennylane)
Downloading autoray-0.6.7-py3-none-any.whl (49 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.9/49.9 kB 5.8 MB/s eta 0:00:00
Requirement already satisfied: cachetools in /usr/local/lib/python3.10/dist-packages (from pennylane) (5.3.2)
Collecting pennylane-lightning>=0.34 (from pennylane)
Downloading PennyLane_Lightning-0.34.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.1/18.1 MB 62.2 MB/s eta 0:00:00
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from pennylane) (2.31.0)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from pennylane) (4.5.0)
Collecting custatevec-cu12==1.5.0 (from cuquantum-cu12)
Downloading custatevec_cu12-1.5.0-py3-none-manylinux2014_x86_64.whl (38.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.4/38.4 MB 5.3 MB/s eta 0:00:00
Collecting cutensornet-cu12==2.3.0 (from cuquantum-cu12)
Downloading cutensornet_cu12-2.3.0-py3-none-manylinux2014_x86_64.whl (2.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 34.6 MB/s eta 0:00:00
Collecting cutensor-cu12<2,>=1.6.1 (from cutensornet-cu12==2.3.0->cuquantum-cu12)
Downloading cutensor_cu12-1.7.0-py3-none-manylinux2014_x86_64.whl (146.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 146.8/146.8 MB 5.1 MB/s eta 0:00:00
Requirement already satisfied: future>=0.15.2 in /usr/local/lib/python3.10/dist-packages (from autograd->pennylane) (0.18.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->pennylane) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->pennylane) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->pennylane) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->pennylane) (2023.11.17)
Installing collected packages: cutensor-cu12, custatevec-cu12, semantic-version, rustworkx, cutensornet-cu12, autoray, cuquantum-cu12, pennylane-lightning, pennylane, pennylane-lightning-gpu
Successfully installed autoray-0.6.7 cuquantum-cu12-23.10.0 custatevec-cu12-1.5.0 cutensor-cu12-1.7.0 cutensornet-cu12-2.3.0 pennylane-0.34.0 pennylane-lightning-0.34.0 pennylane-lightning-gpu-0.34.0 rustworkx-0.13.2 semantic-version-2.10.0
Name: PennyLane
Version: 0.34.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/PennyLaneAI/pennylane
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane-Lightning, PennyLane-Lightning-GPU

Platform info: Linux-6.1.58+-x86_64-with-glibc2.35
Python version: 3.10.12
Numpy version: 1.23.5
Scipy version: 1.11.4
Installed devices:
  • default.gaussian (PennyLane-0.34.0)
  • default.mixed (PennyLane-0.34.0)
  • default.qubit (PennyLane-0.34.0)
  • default.qubit.autograd (PennyLane-0.34.0)
  • default.qubit.jax (PennyLane-0.34.0)
  • default.qubit.legacy (PennyLane-0.34.0)
  • (PennyLane-0.34.0)
  • default.qubit.torch (PennyLane-0.34.0)
  • default.qutrit (PennyLane-0.34.0)
  • null.qubit (PennyLane-0.34.0)
  • lightning.gpu (PennyLane-Lightning-GPU-0.34.0)
  • lightning.qubit (PennyLane-Lightning-0.34.0)
/usr/local/lib/python3.10/dist-packages/torch/ UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at …/torch/csrc/tensor/python_tensor.cpp:451.)
/usr/local/lib/python3.10/dist-packages/pennylane_lightning/lightning_gpu/ UserWarning: cannot open shared object file: No such file or directory
  warn(str(e), UserWarning)
/usr/local/lib/python3.10/dist-packages/pennylane_lightning/lightning_gpu/ UserWarning:
  "Pre-compiled binaries for lightning.gpu are not available. Falling back to "
  "using the Python-based default.qubit implementation. To manually compile from "
  "source, follow the instructions at "
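Since the fallback is only reported as a UserWarning, the script carries on running on default.qubit without stopping. Below is a minimal sketch of one way to make that fail loudly instead. The helper name `create_strict` and the `needle` parameter are made up for illustration, and it assumes the warning is emitted while the device is being created, as the paths in the output above suggest.

```python
import warnings


def create_strict(factory, *args, needle="Falling back", **kwargs):
    """Call factory(*args, **kwargs); raise if a warning containing `needle` was emitted.

    `needle` defaults to a fragment of the fallback message shown in the log above.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        result = factory(*args, **kwargs)

    hits = [w for w in caught if needle in str(w.message)]
    # Re-emit unrelated warnings (e.g. deprecation notices) so they are not swallowed.
    for w in caught:
        if w not in hits:
            warnings.warn_explicit(w.message, w.category, w.filename, w.lineno)

    if hits:
        raise RuntimeError("device fell back: " + " | ".join(str(w.message) for w in hits))
    return result


# Intended use (assumes PennyLane is installed):
#   import pennylane as qml
#   dev = create_strict(qml.device, "lightning.gpu", wires=2)
```

With this, a missing shared library shows up as an exception at device creation rather than as a slow run later on.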


My question is why it is not using lightning.gpu. I have seen similar questions from other users, but I could not see how the answers apply to this case. Since this runs on Colab, it should be easy to reproduce. Thank you for your help!

Hi @Daniel_Wang

Can you rerun this with cuquantum-cu11 or custatevec-cu11? LightningGPU does not currently support CUDA 12.

Note that if you have the CUDA 12 SDK installed, you will also need to install the CUDA 11 runtime libraries to support LightningGPU's operation. You can get all of the required CUDA 11 libraries with:

pip install custatevec_cu11 nvidia-cuda-runtime-cu11 nvidia-cusparse-cu11 nvidia-cublas-cu11
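If it helps, here is a rough way to see which CUDA-versioned wheels ended up in the environment after installing. It is only a sketch: it assumes the wheels follow the `-cuNN` naming suffix seen in the package names above, and the function names `cuda_major_of` and `installed_cuda_wheels` are made up for illustration.

```python
import re
from importlib import metadata


def cuda_major_of(dist_name):
    """Return the CUDA major version encoded in a wheel name such as
    'custatevec-cu11', or None if the name has no '-cuNN' / '_cuNN' suffix."""
    match = re.search(r"[-_]cu(\d+)$", dist_name.lower())
    return int(match.group(1)) if match else None


def installed_cuda_wheels():
    """Map each installed distribution with a CUDA suffix to its CUDA major version."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with broken metadata
        major = cuda_major_of(name)
        if major is not None:
            found[name] = major
    return found


if __name__ == "__main__":
    for name, major in sorted(installed_cuda_wheels().items()):
        note = "" if major == 11 else "  <-- not a CUDA 11 build"
        print(f"{name}: CUDA {major}{note}")
```

Any cuQuantum package listed with a CUDA 12 suffix would be a candidate for replacing with its cu11 counterpart.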

Thanks! That solved the problem. However, it is slower than lightning.qubit, which I guess is due to the overhead of transferring data between the CPU and GPU?

Hey @Daniel_Wang, lightning.qubit might outperform lightning.gpu for some problem sizes due to overheads (which is what I think you're saying at the end of your last post). As you increase num_qubits, things should eventually trade off in favour of lightning.gpu (this also depends on the calibre of GPU and CPU that you're simulating on… @mlxd would be the expert there).
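To get a feel for the sizes involved: an n-qubit state vector holds 2^n complex amplitudes. The sketch below assumes 16 bytes per amplitude (double-precision complex, which I am assuming is the precision in use), and the function name `statevector_bytes` is just for illustration.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed for a dense n-qubit state vector.

    bytes_per_amplitude=16 assumes complex128 amplitudes (an assumption, not
    something confirmed for this particular setup).
    """
    return (2 ** n_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (2, 10, 20, 26, 29, 30):
        print(f"{n:2d} qubits: {statevector_bytes(n):,} bytes")
```

At 2 qubits the whole state is only 64 bytes, so fixed costs such as kernel launches and host-device transfers are likely to dominate the runtime. At the other end, under the same assumption a 30-qubit state would need 16 GiB, which is more than the 15360 MiB the T4 reports above.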

Hope this helps!

Yes, that is what I am thinking about! Thanks


Let us know if you have any other issues :)
