pennylane-lightning-GPU 0.35 on cuQuantum Appliance 23.10

I am looking to install pennylane-lightning-GPU on a cuQuantum Appliance 23.10 Docker instance, but am not able to do so, because the CUDA toolkit on the Docker image is CUDA 11, while Lightning-GPU 0.35 requires CUDA 12:

 nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

When running code, I get the following warnings:

/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py:72: UserWarning: libcudart.so.12: cannot open shared object file: No such file or directory
  warn(str(e), UserWarning)
/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py:1014: UserWarning:
                "Pre-compiled binaries for lightning.gpu are not available. Falling back to "
                "using the Python-based default.qubit implementation. To manually compile from "
                "source, follow the instructions at "
                "https://pennylane-lightning.readthedocs.io/en/latest/installation.html.",

I wonder if there is a workaround for this?

This is the link to the NVIDIA cuQuantum Appliance.

Hi @rht

It seems the cuQuantum appliance is still using CUDA 11.
In this case, until the image is updated, you should be able to install the required CUDA 12 runtime libraries using something like:

python -m pip install custatevec_cu12 nvidia-cuda-runtime-cu12 nvidia-cusparse-cu12 nvidia-cublas-cu12

When working with a Python virtualenv, LightningGPU should be able to use these libraries to replace the container-provided CUDA libraries, assuming your CUDA driver is recent enough to support the libraries. Feel free to try this out and let us know.
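
As a quick way to verify (just a minimal sketch, not a formal test), you could check that the device constructs and executes without falling back to default.qubit:

import pennylane as qml

# If the pip-installed cu12 runtime is picked up, constructing lightning.gpu
# should not raise the libcudart.so.12 warning or fall back to default.qubit.
dev = qml.device("lightning.gpu", wires=2)

@qml.qnode(dev)
def bell():
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.probs(wires=[0, 1])

print(bell())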

I see, installing the cu12-specific pip packages did work for pennylane-lightning[gpu], thank you.

Given that the package name is custatevec_cu12, I assume that this is not going to collide with the existing custatevec on the system that is still on CUDA 11? For the other use case, I do need the CUDA 11 version to run Cirq circuits (we are a cloud provider after all), and it can’t be a simple pip install -U cirq-core qsimcirq, because the appliance ships its own custom version of Cirq/qsimcirq.

I suppose it’s fine to have a collision, as long as the PennyLane code works in that particular Docker instance. This is the code that I am testing (I want to measure the performance speedup of device_vjp=True).

However, the code errored out

elapsed prepare new 0.0024683475494384766
terminate called after throwing an instance of 'Pennylane::Util::LightningException'
  what():  [/project/pennylane_lightning/core/src/simulators/lightning_gpu/utils/DataBuffer.hpp][Line:133][Method:~DataBuffer]: Error in PennyLane Lightning: an illegal memory access was encountered
Aborted (core dumped)

This is after upgrading the CUDA toolkit from 11.8 to 12.4, following the instructions in CUDA Toolkit 12.4 Update 1 Downloads | NVIDIA Developer.

import time

import pennylane as qml
from pennylane import numpy as np

num_wires = 15

np.random.seed(0)
complex_type = np.complex64
real_type = np.float32
dev_gpu = qml.device("lightning.gpu", wires=num_wires, c_dtype=complex_type)

params = np.random.rand(2 * num_wires, requires_grad=True).astype(real_type)
target_probs = np.random.rand(2**num_wires).astype(real_type)
target_probs /= np.sum(target_probs)

dev_gpu_for_eval = qml.device("lightning.gpu", wires=num_wires, c_dtype=complex_type)


def get_prob(x):
    return np.abs(x) ** 2


@qml.qnode(dev_gpu_for_eval, diff_method="finite-diff")
def circuit_gpu(params):
    # layer of RX rotations, a ladder of CZ entanglers, then a second RX layer
    _ = [qml.RX(params[i], wires=i) for i in range(num_wires)]
    _ += [qml.CZ(wires=(i, i + 1)) for i in range(num_wires - 1)]
    _ += [qml.RX(params[i + num_wires], wires=i) for i in range(num_wires)]
    return qml.state()


def adjoint_l2_another(params):
    from scipy.sparse import diags

    wires = range(num_wires)
    ket = circuit_gpu(params)
    tic = time.time()
    obs = -2 * (target_probs - get_prob(ket))
    hmat = diags(obs, format="csr")
    h = qml.SparseHamiltonian(hmat, wires)
    print("elapsed prepare new", time.time() - tic)

    @qml.qnode(dev_gpu, diff_method="adjoint", device_vjp=True)
    def c(params):
        _ = [qml.RX(params[i], wires=i) for i in range(num_wires)]
        _ += [qml.CZ(wires=(i, i + 1)) for i in range(num_wires - 1)]
        _ += [qml.RX(params[i + num_wires], wires=i) for i in range(num_wires)]
        return qml.expval(h)

    tic = time.time()
    out = qml.grad(c)(params)
    print("Elapsed new grad only", time.time() - tic)
    return out


gradient = adjoint_l2_another(params)
print("first last", gradient[0], gradient[-1])

I have ensured that I am on CUDA 12.4, using the answer to this SO question, but CUDA 11.8 is still installed, just not selected.
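
As an aside, a quick way to confirm which CUDA runtime actually resolves inside the Python environment (just an illustrative check, not part of my original report) would be something like:

import ctypes

# Raises OSError if only the CUDA 11 runtime is on the library search path.
print(ctypes.CDLL("libcudart.so.12"))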

Hey @rht

Just out of curiosity, can you try running your example with np.complex128 and np.float64 to see if the crash happens in the same way? If so, I think the appliance being too old a version is likely the cause of the memory issues: some combination of libraries, driver, and Docker runtime engine (which may not support CUDA 12) could be at play here, in which case I think our best bet is to request an update to the container from NVIDIA. If the alignment change causes it to run, then that’s something we can rectify locally.
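
Concretely, that just means swapping the two dtype lines in your script and leaving everything else unchanged:

# double precision instead of single precision
complex_type = np.complex128
real_type = np.float64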

In the meantime, with LGPU everything can be isolated in the virtualenv, so you can still run cirq with the system-provided versions, and as long as the cirq and PennyLane envs are separate they should not conflict.
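
For example (the environment name and exact package list here are just illustrative), something along these lines keeps the two stacks apart:

python -m venv ~/lgpu-env
source ~/lgpu-env/bin/activate
python -m pip install pennylane pennylane-lightning-gpu custatevec_cu12 nvidia-cuda-runtime-cu12 nvidia-cusparse-cu12 nvidia-cublas-cu12
# ... run the PennyLane workloads here ...
deactivate  # back to the appliance environment for the Cirq/qsimcirq stack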

In the same way that we can install the CUDA 12 libraries through pip, the CUDA 11 versions can also be installed, so you shouldn’t need to adjust the container CUDA version. Since most of the libraries are runtime-use only, having them pip-installed with a working CUDA driver is usually enough to keep things working. That said, I don’t know if cirq respects RPATHs for pip-installed CUDA libraries (this seems to be the way TensorFlow, PyTorch and others handle such deps these days), so success may be limited there.
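
For the CUDA 11 side, that would look much like the cu12 command above; I expect the PyPI package names to follow the same pattern, e.g.:

python -m pip install custatevec_cu11 nvidia-cuda-runtime-cu11 nvidia-cusparse-cu11 nvidia-cublas-cu11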

As a final fallback, building a CUDA 11 version of LGPU is something that can be done, assuming the container has a compiler installed. Since we needed to migrate to CUDA 12 for newer functionality and support for newer systems, this may not be something that will work forever, but you should be able to build and install a version of LGPU in place against the CUDA 11-provided nvcc as:

python -m pip install pennylane
git clone git@github.com:PennyLaneAI/pennylane-lightning.git 
cd pennylane-lightning && git checkout latest_release
python -m pip install cmake ninja custatevec_cu11
PL_BACKEND="lightning_gpu" python -m pip install . --verbose

which should install it into your local environment. Though, note that since we have deprecated CUDA 11 support, this may not work as intended, in which case requesting an update to the container itself may be our best option.

Let us know how this goes, and we can figure out if anything else is needed.

Just out of curiosity, can you try running your example with np.complex128 and np.float64 to see if the crash happens in the same way?

I tested with this and encountered no error, although device_vjp=True had no effect, because device VJP for lightning.gpu is not ready yet.

In the meantime, with LGPU everything can be isolated in the virtualenv, so you can still run cirq with the system-provided versions, and as long as the cirq and PennyLane envs are separate they should not conflict.

I think we will go with this option, thank you.


Thanks for the info @rht

Device VJP support is currently in the master branch for LightningQubit, but not yet complete for LightningGPU, so you will need to set it to False for now.
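
In your example, that just means disabling the flag on the adjoint QNode (the circuit body stays the same):

@qml.qnode(dev_gpu, diff_method="adjoint", device_vjp=False)
def c(params):
    ...  # circuit body unchanged from the example above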

I have also tested your example locally (CUDA 12.3) with FP32/complex64, and with the device VJP off everything runs fine using the 0.35.x releases of PennyLane and LightningGPU, so I suspect the issue here is somewhere in the toolchain itself.

Feel free to reach out if there are further things you encounter, and we can try to mitigate them.