PennyLane multi-GPU/multi-node vs Qiskit Aer GPU setup

Qiskit Aer GPU setup is much simpler than PennyLane GPU on NERSC Perlmutter machines.
I just installed one package and Qiskit Aer multi-GPU works without any additional setup.
Whereas PennyLane needs the steps in: Distributing quantum simulations using lightning.gpu with NVIDIA cuQuantum | PennyLane Blog
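For context, the workflow from that blog post boils down to roughly the following (a minimal sketch, assuming an MPI-enabled lightning.gpu build plus mpi4py, launched with srun/mpirun):

```python
from mpi4py import MPI          # initializes MPI; each rank drives one GPU
import pennylane as qml

# mpi=True asks lightning.gpu to shard the state vector across the MPI ranks/GPUs
dev = qml.device("lightning.gpu", wires=8, mpi=True)

@qml.qnode(dev)
def circuit():
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

print(f"rank {MPI.COMM_WORLD.Get_rank()}: {circuit()}")
```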

Any idea how this can be simplified in PennyLane, or is it already addressed in upcoming releases?

Hi @QuantumMan

Thanks for your feedback. I do understand that setting up HPC environment software packages is frustrating, and we aim to make this as painless as possible. Just a note: the official LightningGPU documentation and installation instructions for MPI can be found at Lightning-GPU installation — Lightning 0.34.0 documentation.
The original LightningGPU repository was fully migrated into GitHub - PennyLaneAI/pennylane-lightning (the PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane) as of the v0.33 release, so all packages and instructions are now housed there. We have also decided to fully archive the old repository, so you should now see a warning and a read-only marking when visiting GitHub - PennyLaneAI/pennylane-lightning-gpu (GPU enabled Lightning simulator for accelerated circuit simulation; see https://github.com/PennyLaneAI/pennylane-lightning for all future development of this project), so thanks for prompting us to do this.
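For quick reference, the MPI-enabled source build from those instructions looks roughly like this (a sketch only; the linked documentation has the authoritative steps, and the cuQuantum path below is a placeholder):

```bash
# Sketch of an MPI-enabled LightningGPU build from source.
# CUQUANTUM_SDK must point at your cuQuantum installation.
git clone https://github.com/PennyLaneAI/pennylane-lightning.git
cd pennylane-lightning
pip install -r requirements.txt
export CUQUANTUM_SDK=/path/to/cuquantum
CMAKE_ARGS="-DENABLE_MPI=ON" PL_BACKEND="lightning_gpu" python -m pip install -e . --verbose
```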

Note that Perlmutter very recently updated its default system environment and toolkits, so the older cached install instructions from your other forum post may no longer work as written. We are currently migrating LightningGPU to work natively with CUDA 12, but for an active Perlmutter CUDA 11 environment you should be able to get a working version by ensuring the available libraries and toolkits are not defaulting to their CUDA 12 variants.
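A quick way to sanity-check this (a sketch; module names follow Perlmutter's current conventions and may change):

```bash
# Confirm the CUDA 11 toolchain, not a CUDA 12 default, is the one in use
module load cudatoolkit/11.7
nvcc --version                   # should report release 11.x
pip list | grep -i cuquantum     # installed cuQuantum wheel should be the -cu11 variant
```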

Additionally, the Qiskit Aer GPU support you reference does not use NVIDIA’s cuQuantum libraries; it builds the Qiskit-provided CUDA kernels against whatever CUDA version is available. Given the backend observable and measurement-process support we added through LightningGPU and cuQuantum, I also expect a different feature set and performance profile between the two packages.
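For contrast, the Aer GPU path you mention looks roughly like this (a sketch, assuming the qiskit-aer-gpu package is installed; blocking_enable/blocking_qubits are Aer's own chunk-distribution options, unrelated to cuQuantum, and only matter once the state no longer fits on one GPU):

```python
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

# Aer's own CUDA statevector kernels; chunking options distribute the state
# over the visible GPUs (and MPI ranks) for larger circuits.
sim = AerSimulator(method="statevector", device="GPU",
                   blocking_enable=True, blocking_qubits=23)

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)
qc.measure_all()

result = sim.run(transpile(qc, sim)).result()
print(result.get_counts())
```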

Unfortunately, as noted in your other post, we don’t have the availability to validate Perlmutter’s latest changes right at this moment, but we will be happy to provide input if you run into difficulty.

Thank you for the response and the detailed information; I will try v0.34.0. One piece of feedback about the usability of multi-GPU/multi-node support: it would probably help to have two variants. Users who want the advanced features and performance improvements for real-world low-latency cases can build against NVIDIA cuQuantum, while users who are fine with multi-GPU but not so sensitive to performance should have the flexibility to install things easily and move on (just like Qiskit's CUDA kernels built with whatever CUDA version is available). I wonder if this will be a problem again whenever a new CUDA version comes out.

Currently everything depends on the underlying infrastructure, and getting multiple pieces aligned before anything runs seems to hurt users’ productivity.

@mlxd - I tried it again
This is my CUDA version on Perlmutter…

Tried to use all the old versions:

```
module load PrgEnv-gnu/8.3.3 cray-mpich/8.1.25 cudatoolkit/11.7 craype-accel-nvidia80 evp-patch gcc/11.2.0
export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:/opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/lib/:$LD_LIBRARY_PATH
module load python
export CUQUANTUM_SDK=/global/homes/p/prmantha/.local/perlmutter/python-3.11/lib/python3.11/site-packages/cuquantum
```


```
(nersc-python)@nid200329:~/pennylane-lightning> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
```

I successfully installed PennyLane-Lightning 0.34.0 with MPI/GPU support, but it still fails with the same problem:

```
File "/global/u1/p/prmantha/pennylane-lightning/pennylane_lightning/lightning_gpu/lightning_gpu.py", line 303, in _mpi_init_helper
    raise ValueError(
ValueError: Number of devices should be larger than or equal to the number of processes on each node.
    raise ValueError(
    raise ValueError(
```

I used srun -n 4 python testdistmem.py to run the script,
and salloc -N 1 --qos interactive --time 02:00:00 --ntasks-per-node=4 --gpus-per-task=1 --constraint gpu --account= to launch the interactive allocation.

Hey @QuantumMan! Sorry for the delay but we’ll get back to you as soon as we can :slight_smile:

Morning @QuantumMan, based on your comment I was wondering if this issue was resolved too? It appears you need to request the total number of GPUs with --gpus=4, while --gpus-per-task=1 alone is causing some GPU visibility issues, right?
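Something along these lines should work (a sketch; <account> is a placeholder for your project account):

```bash
# Request all 4 GPUs on the node instead of binding one GPU per task, so every
# MPI rank can see all devices (avoids the "Number of devices should be larger
# than or equal to the number of processes on each node" error).
salloc -N 1 --qos interactive --time 02:00:00 --ntasks-per-node=4 --gpus=4 \
       --constraint gpu --account=<account>
srun -n 4 python testdistmem.py
```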