Note that Perlmutter very recently updated its default system environment and toolchains, so the older cached install instructions from your other forum post may no longer work as written. We are currently migrating LightningGPU to work natively with CUDA 12, but for an active Perlmutter CUDA 11 environment you should be able to get a working build by ensuring the available libraries and toolkits are not defaulting to their CUDA 12 variants.
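For example, something along these lines should confirm the toolchain is pinned to CUDA 11 before building (the module name and version are assumptions based on typical Perlmutter conventions, so adapt them to whatever `module avail` reports on your system):

```
# See which CUDA toolkit modules exist and which one is loaded by default.
module avail cudatoolkit

# Explicitly load a CUDA 11.x toolkit instead of a CUDA 12 default
# (the exact version string here is an assumption).
module load cudatoolkit/11.7

# Confirm that nvcc now resolves to CUDA 11.
which nvcc
nvcc --version
```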
Additionally, the Qiskit Aer GPU support you reference does not use NVIDIA’s cuQuantum libraries; it builds the Qiskit-provided CUDA kernels against whatever CUDA version is available. Given the backend observable and measurement-process support we added through LightningGPU and cuQuantum, I would also expect the two packages to differ in both feature set and performance.
Unfortunately, as noted in your other post, we don’t have the capacity to validate Perlmutter’s latest changes right at this moment, but we will be happy to provide input if you run into difficulty.
Thank you for the response and the detailed information; I will try v0.34.0. One piece of feedback on the usability of multi-GPU/multi-node support: it would probably help to have two variants. Users who want the advanced features and performance improvements for real-world low-latency cases can build against NVIDIA cuQuantum, while users who are fine with multi-GPU but less sensitive to performance should be able to install easily and move on (much like the Qiskit CUDA kernels, which build with whatever CUDA version is available). I wonder whether this will become a problem again whenever a new CUDA version lands.
Currently it depends on the underlying infrastructure, and having to get multiple pieces aligned before anything runs seems to hurt users’ productivity.
```
(nersc-python)@nid200329:~/pennylane-lightning> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
```
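For reference, the MPI-enabled build went roughly along these lines. This is a sketch based on the pennylane-lightning build documentation; the exact flags and dependency steps are assumptions rather than a transcript of my commands:

```
# Inside the cloned pennylane-lightning repository, with mpi4py built against the
# system MPI and the CUDA 11 cuStateVec wheel available (package name assumed).
pip install custatevec-cu11
pip install -r requirements.txt

# Build the lightning.gpu backend with MPI support enabled
# (environment-variable names follow the pennylane-lightning docs).
CMAKE_ARGS="-DENABLE_MPI=ON" PL_BACKEND="lightning_gpu" python -m pip install -e . --verbose
```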
I successfully installed PennyLane-Lightning 0.34.0 with MPI/GPU support, but it still fails with the same problem:
```
File "/global/u1/p/prmantha/pennylane-lightning/pennylane_lightning/lightning_gpu/lightning_gpu.py", line 303, in _mpi_init_helper
    raise ValueError(
ValueError: Number of devices should be larger than or equal to the number of processes on each node.
```
I used `srun -n 4 python testdistmem.py` to run the script,
and `salloc -N 1 --qos interactive --time 02:00:00 --ntasks-per-node=4 --gpus-per-task=1 --constraint gpu --account=` to launch the interactive allocation.
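For reference, a quick way to check which GPUs each rank can actually see inside this allocation is something along these lines (a diagnostic sketch, not a command from the LightningGPU docs):

```
# Print each task's Slurm rank and the GPUs it is allowed to use.
# With --gpus-per-task=1, each task is typically restricted to a single GPU,
# while the _mpi_init_helper check expects every process on a node to see at
# least as many GPUs as there are processes on that node.
srun -n 4 bash -c 'echo "rank $SLURM_PROCID (local $SLURM_LOCALID): CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'
```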
Morning @QuantumMan, based on your comment I was wondering whether this issue was resolved too? It appears you need to request the total number of GPUs with `--gpus=4`, since `--gpus-per-task=1` on its own is causing GPU visibility issues for the tasks, right?
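For example, adjusting the allocation roughly as follows (a suggestion to try, not a verified fix) should expose all four GPUs on the node to every task:

```
# Request one node with 4 tasks and all 4 GPUs allocated to the job as a whole,
# so that each MPI process can see every GPU on the node.
salloc -N 1 --qos interactive --time 02:00:00 --ntasks-per-node=4 --gpus=4 --constraint gpu --account=

# Then launch as before:
srun -n 4 python testdistmem.py
```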