cuda_quantum compiler not found: 'PackageNotFoundError: No package metadata was found for cuda_quantum'

Hello there,

I was trying to use the CUDA Quantum compiler in combination with JAX and qjit. Unfortunately, PennyLane can’t find the cuda_quantum package even though it is installed via `pip install cudaq` (the cuda_quantum package no longer exists, as far as I can tell). The issue first occurs when defining a device that supports the CUDA Quantum compiler. A minimal example is shown below.
Does a workaround exist?

Example:

import pennylane as qml
dev = qml.device("nvidia.custatevec", wires=2)

Error msg:


StopIteration Traceback (most recent call last)
File /opt/conda/lib/python3.11/importlib/metadata/__init__.py:563, in Distribution.from_name(cls, name)
562 try:
→ 563 return next(cls.discover(name=name))
564 except StopIteration:

StopIteration:

During handling of the above exception, another exception occurred:

PackageNotFoundError Traceback (most recent call last)
Cell In[1], line 2
1 import pennylane as qml
----> 2 dev = qml.device("nvidia.custatevec", wires=2)

File /opt/conda/lib/python3.11/site-packages/pennylane/devices/device_constructor.py:266, in device(name, *args, **kwargs)
260 raise qml.DeviceError(
261 f"The {name} plugin requires PennyLane versions {required_versions}, "
262 f"however PennyLane version {qml.version()} is installed."
263 )
265 # Construct the device
→ 266 dev = plugin_device_class(*args, **options)
268 # Once the device is constructed, we set its custom expansion function if
269 # any custom decompositions were specified.
270 if custom_decomps is not None:

File /opt/conda/lib/python3.11/site-packages/catalyst/third_party/cuda/__init__.py:223, in NvidiaCuStateVec.__init__(self, shots, wires, multi_gpu)
221 def __init__(self, shots=None, wires=None, multi_gpu=False): # pragma: no cover
222 self.multi_gpu = multi_gpu
→ 223 super().__init__(wires=wires, shots=shots)

File /opt/conda/lib/python3.11/site-packages/catalyst/third_party/cuda/__init__.py:137, in BaseCudaInstructionSet.__init__(self, shots, wires)
136 def __init__(self, shots=None, wires=None):
→ 137 _check_version_compatibility()
138 super().__init__(wires=wires, shots=shots)

File /opt/conda/lib/python3.11/site-packages/catalyst/third_party/cuda/__init__.py:26, in _check_version_compatibility()
25 def _check_version_compatibility():
→ 26 installed_version = version("cuda_quantum")
27 compatible_version = "0.6.0"
28 if installed_version != compatible_version:

File /opt/conda/lib/python3.11/importlib/metadata/__init__.py:1009, in version(distribution_name)
1002 def version(distribution_name):
1003 """Get the version string for the named package.
1004
1005 :param distribution_name: The name of the distribution package to query.
1006 :return: The version string for the package as defined in the package's
1007 "Version" metadata key.
1008 """
→ 1009 return distribution(distribution_name).version

File /opt/conda/lib/python3.11/importlib/metadata/__init__.py:982, in distribution(distribution_name)
976 def distribution(distribution_name):
977 """Get the Distribution instance for the named package.
978
979 :param distribution_name: The name of the distribution package as a string.
980 :return: A Distribution instance (or subclass thereof).
981 """
→ 982 return Distribution.from_name(distribution_name)

File /opt/conda/lib/python3.11/importlib/metadata/__init__.py:565, in Distribution.from_name(cls, name)
563 return next(cls.discover(name=name))
564 except StopIteration:
→ 565 raise PackageNotFoundError(name)

PackageNotFoundError: No package metadata was found for cuda_quantum

qml.about():

Name: PennyLane
Version: 0.41.1
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author:
Author-email:
License: Apache License 2.0
Location: /opt/conda/lib/python3.11/site-packages
Requires: appdirs, autograd, autoray, cachetools, diastatic-malt, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, tomlkit, typing-extensions
Required-by: PennyLane-Catalyst, PennyLane_Lightning, PennyLane_Lightning_GPU, PennyLane_Lightning_Kokkos
Platform info: Linux-5.15.0-134-generic-x86_64-with-glibc2.35
Python version: 3.11.10
Numpy version: 2.2.6
Scipy version: 1.16.0
Installed devices:

  • lightning.kokkos (PennyLane_Lightning_Kokkos-0.41.1)
  • default.clifford (PennyLane-0.41.1)
  • default.gaussian (PennyLane-0.41.1)
  • default.mixed (PennyLane-0.41.1)
  • default.qubit (PennyLane-0.41.1)
  • default.qutrit (PennyLane-0.41.1)
  • default.qutrit.mixed (PennyLane-0.41.1)
  • default.tensor (PennyLane-0.41.1)
  • null.qubit (PennyLane-0.41.1)
  • reference.qubit (PennyLane-0.41.1)
  • lightning.qubit (PennyLane_Lightning-0.41.1)
  • nvidia.custatevec (PennyLane-Catalyst-0.11.0)
  • nvidia.cutensornet (PennyLane-Catalyst-0.11.0)
  • oqc.cloud (PennyLane-Catalyst-0.11.0)
  • softwareq.qpp (PennyLane-Catalyst-0.11.0)
  • lightning.gpu (PennyLane_Lightning_GPU-0.41.1)

Hi @minn-bj, thank you for bringing up this issue. CUDA Quantum support in PennyLane is limited to version 0.6.0 of the package; it will not work with cudaq. You can install that version from the cuda-quantum project page on PyPI.
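For reference, the failing check in the traceback queries the installed distribution metadata directly. You can reproduce what Catalyst sees with the same standard-library call (a small diagnostic sketch, nothing here is Catalyst-specific):

```python
from importlib.metadata import version, PackageNotFoundError

# Catalyst's compatibility check looks up the distribution named
# "cuda_quantum"; installing "cudaq" does not provide that metadata,
# which is why the PackageNotFoundError is raised.
results = {}
for name in ("cuda_quantum", "cudaq"):
    try:
        results[name] = version(name)
    except PackageNotFoundError:
        results[name] = None  # distribution metadata not found

print(results)
```

If `results["cuda_quantum"]` is `None`, the version check in `_check_version_compatibility` will fail exactly as in your traceback.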

I don’t know of any current plans to update the support to newer versions.

Hey David_Ittah, thank you for your quick response. I just read that cuda_quantum may not support gradient computations. Is that (still) true?

Is it possible to accelerate an ML training routine by compiling it on a GPU with the finite-difference diff method?

Best regards.

Hi @minn-bj

If you are looking to execute quantum circuits on GPUs with gradients, the lightning.gpu device should be quite efficient: it supports adjoint differentiation natively and runs on the cuStateVec library, which is optimized for NVIDIA GPUs.

Additionally, you can use the lightning.kokkos device with the CUDA backend. This requires some manual compilation (see the Lightning-Kokkos installation documentation), but it also supports efficient gradient evaluations and can run on any GPU supported by Kokkos (see the Kokkos documentation).

If you have trouble using either of these devices, feel free to share a workload and we can take a look.


Hello @mlxd,

thank you for your comment. I already considered this, but for intermediate-sized circuits (10-16 qubits), lightning.gpu is much slower than, e.g., running default.qubit on the GPU via JAX or PyTorch. My goal is to train a hybrid QML model on a fairly large dataset. Each batch optimization (batch size ~1000) needs to take less than ~1 second. For a circuit with 12-16 qubits, the lightning.gpu device seems to be significantly slower than other simulators. That is why I thought to try the CUDA Quantum compiler. Is cuda_quantum the right choice for my goals, and do you have any further recommendations to speed up my calculations (CPU and/or GPU)?

Best regards,

minn-bj

Hi @minn-bj

Thanks for the context. Yes, lightning.gpu is more suited to larger HPC workloads; we see its performance beat other approaches beyond the 18-20 qubit mark. The same was observed with NVIDIA's cuStateVec, the CUDA library we use in lightning.gpu, so I'm not sure using cuda_quantum will help here, as the limitation may persist.

I’m not sure how your workload is configured, but there may be some gains to be had using catalyst.accelerate (see the Catalyst documentation).

This allows Catalyst to offload computation to the GPU, if supported, for specific function calls.
While there is no guarantee, the lightning.kokkos CUDA backend could possibly yield somewhat improved performance, but I suspect it will run into the same issues as cuStateVec, given the smaller size of the problem.

For CPU scaling, you can try the lightning.kokkos backend, which uses OpenMP by default for all gate and measurement operations. If you wish to try lightning.qubit with OpenMP, my responses here should help.
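As a small sketch (assuming the simulator build respects the standard OpenMP variable, and with 8 as an arbitrary example value), the OpenMP thread count can be pinned via the environment before the backend is imported:

```python
import os

# Example only: OMP_NUM_THREADS must be set before the simulator
# library is imported, and 8 is an arbitrary choice - tune it to
# your core count and workload.
os.environ["OMP_NUM_THREADS"] = "8"

print(os.environ["OMP_NUM_THREADS"])
```

Alternatively, export the variable in your shell before launching Python, which avoids any import-order concerns.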

Feel free to let us know if the above helps, or share a minimal example workload so we can take a look and offer more suggestions.