Leveraging GPUs without cuQuantum

Hello,

Is there some way of leveraging a non-cuQuantum-ready GPU like an RTX3060 to run quantum circuits?

In particular, I am also interested in how much speed-up, or increase in the number of qubits, I could achieve by running the circuit on such a GPU vs. a CPU like the i5-12400F. At the moment I am using such a CPU with the lightning-kokkos plugin and OpenMPI. Are there specific configurations or parameters I could adjust to improve the performance of this setup?

Thank you very much in advance.

Hi @pormelrog

This is a very good question, and the answer will depend highly on your workload (circuit width and depth). For OpenMP-parallelized gate application on a CPU, LightningKokkos is your best bet initially (as you have found). If your circuit is quite deep, or uses more than around 20 qubits, running on a GPU is usually best.
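To see why qubit count matters so much here, note that a state-vector simulator stores 2^n complex amplitudes, so memory (and the work per gate) grows exponentially with the number of qubits. A quick back-of-the-envelope sketch:

```python
# Rough state-vector memory footprint: 2**n amplitudes at complex128 (16 bytes each).
def statevector_bytes(n_qubits: int) -> int:
    return (2 ** n_qubits) * 16

for n in (20, 24, 28, 30):
    print(f"{n} qubits: {statevector_bytes(n) / 2**30:.2f} GiB")
# 30 qubits already needs 16 GiB, beyond the 12 GB VRAM of an RTX 3060.
```

This is why the crossover point between CPU and GPU depends on your circuit: wide circuits are memory-bound, while deep circuits on moderate widths benefit most from faster per-gate kernels.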

If you want to use the ML frameworks directly, you can likely use default.qubit as a device, and load data using their locality methods (see here). Similarly, the JAX interface allows you to target GPUs through jitting. For small workloads these will likely be optimal, especially if you are using backprop for gradients. For larger problems, backprop becomes infeasible due to memory constraints, and LGPU/LKokkos with adjoint differentiation is likely the best option. However, if you don’t want/need gradients, you can set diff_method=None in the QNode, which should save some execution time.

Now, if you have an RTX3060, you should be able to use LightningGPU with cuQuantum (this GPU is SM8.6, above the SM7.0 threshold of cuQuantum). Explicit install instructions can be found here, which in this case are:

pip install nvidia-cusparse-cu12 nvidia-cublas-cu12 nvidia-cuda-runtime-cu12 custatevec-cu12
pip install pennylane pennylane-lightning-gpu

However, you can also install LightningKokkos with the CUDA backend with a little bit more work:

git clone https://github.com/PennyLaneAI/pennylane-lightning
cd pennylane-lightning
pip install -r ./requirements-dev.txt
PL_BACKEND="lightning_qubit" pip install . # Set up the Lightning framework
CMAKE_ARGS="-DKokkos_ENABLE_CUDA=ON" PL_BACKEND="lightning_kokkos" pip install .

This should use the same Kokkos-defined kernels, and rebuild them to target your native CUDA architecture directly.
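If CMake does not pick up your GPU architecture automatically, you can pin it explicitly via Kokkos's architecture flags; for an SM8.6 card like the RTX 3060 that would be Kokkos_ARCH_AMPERE86 (shown here as an optional variant of the last step above):

```shell
# Optional: pin the CUDA architecture (SM8.6 for an RTX 3060).
CMAKE_ARGS="-DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE86=ON" \
  PL_BACKEND="lightning_kokkos" pip install .
```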

Feel free to let us know if the above steps help. If there are performance considerations you’d like more feedback on, sharing a minimal working example of your workload would help too.