PennyLane multi-GPU script fails with an error even though there are enough GPUs

QuantumMan · January 28, 2024, 9:17am

Hello! If applicable, put your complete code example down below. Make sure that your code:

is 100% self-contained — someone can copy-paste exactly what is here and run it to
reproduce the behaviour you are observing
includes comments

from mpi4py import MPI
import pennylane as qml
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
dev = qml.device('lightning.gpu', wires=8, mpi=True)
@qml.qnode(dev)
def circuit_mpi():
    qml.PauliX(wires=[0])
    return qml.state()
local_state_vector = circuit_mpi()
#rank 0 will collect the local state vector
state_vector = comm.gather(local_state_vector, root=0)
if rank == 0:
    print(state_vector)

If you want help with diagnosing an error, please put the full error message below:

> nvidia-smi --list-gpus | wc -l
4



srun -n 4 python testmpi2.py 
Traceback (most recent call last):
  File "/global/u1/p/prmantha/PilotQuantumBenchmarks/dist_mem_pennylane_mpi/testmpi2.py", line 5, in <module>
Traceback (most recent call last):
  File "/global/u1/p/prmantha/PilotQuantumBenchmarks/dist_mem_pennylane_mpi/testmpi2.py", line 5, in <module>
Traceback (most recent call last):
  File "/global/u1/p/prmantha/PilotQuantumBenchmarks/dist_mem_pennylane_mpi/testmpi2.py", line 5, in <module>
Traceback (most recent call last):
  File "/global/u1/p/prmantha/PilotQuantumBenchmarks/dist_mem_pennylane_mpi/testmpi2.py", line 5, in <module>
    dev = qml.device('lightning.gpu', wires=8, mpi=True)
    dev = qml.device('lightning.gpu', wires=8, mpi=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane/__init__.py", line 370, in device
    dev = qml.device('lightning.gpu', wires=8, mpi=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane/__init__.py", line 370, in device
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane/__init__.py", line 370, in device
    dev = qml.device('lightning.gpu', wires=8, mpi=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane/__init__.py", line 370, in device
    dev = plugin_device_class(*args, **options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 266, in __init__
    dev = plugin_device_class(*args, **options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 266, in __init__
    dev = plugin_device_class(*args, **options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 266, in __init__
    dev = plugin_device_class(*args, **options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 266, in __init__
    self._mpi_init_helper(self.num_wires)
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 308, in _mpi_init_helper
    self._mpi_init_helper(self.num_wires)
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 308, in _mpi_init_helper
    self._mpi_init_helper(self.num_wires)
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 308, in _mpi_init_helper
    self._mpi_init_helper(self.num_wires)
  File "/global/homes/p/prmantha/.local/lib/python3.11/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 308, in _mpi_init_helper
    raise ValueError(
ValueError: Number of devices should be larger than or equal to the number of processes on each node.
    raise ValueError(
ValueError: Number of devices should be larger than or equal to the number of processes on each node.
    raise ValueError(
ValueError: Number of devices should be larger than or equal to the number of processes on each node.
    raise ValueError(
ValueError: Number of devices should be larger than or equal to the number of processes on each node.
srun: error: nid200325: tasks 1-2: Exited with exit code 1
srun: Terminating StepId=20928774.12
slurmstepd: error: *** STEP 20928774.12 ON nid200325 CANCELLED AT 2024-01-28T09:11:54 ***
srun: error: nid200325: tasks 0,3: Exited with exit code 1

And, finally, make sure to include the versions of your packages. Specifically, show us the output of qml.about().

Name: PennyLane
Version: 0.32.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /global/homes/p/prmantha/.local/lib/python3.11/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane-Lightning, PennyLane-Lightning-GPU

Platform info:           Linux-5.14.21-150400.24.81_12.0.87-cray_shasta_c-x86_64-with-glibc2.31
Python version:          3.11.7
Numpy version:           1.23.5
Scipy version:           1.12.0
Installed devices:
- default.gaussian (PennyLane-0.32.0)
- default.mixed (PennyLane-0.32.0)
- default.qubit (PennyLane-0.32.0)
- default.qubit.autograd (PennyLane-0.32.0)
- default.qubit.jax (PennyLane-0.32.0)
- default.qubit.tf (PennyLane-0.32.0)
- default.qubit.torch (PennyLane-0.32.0)
- default.qutrit (PennyLane-0.32.0)
- null.qubit (PennyLane-0.32.0)
- lightning.qubit (PennyLane-Lightning-0.32.0)
- lightning.gpu (PennyLane-Lightning-GPU-0.32.0)

isaacdevlugt · January 29, 2024, 10:50pm

Hey @QuantumMan,

Thank you for your question! We have forwarded this question to members of our technical team who will be getting back to you within a week. Feel free to post any updates to your question here in the thread in the meantime!

mlxd · February 1, 2024, 4:24pm

Hi @QuantumMan

Just some follow-up questions that may help us identify the issue:

I assume this is running on Perlmutter (or an equivalent system). If so, can you confirm your mpi4py installation is correctly built against and running atop the Cray-MPICH library? You may need to visit the system specific documentation to validate this.
Does your build of lightning.gpu succeed when using the system-provided MPI compiler toolchains?
For a given allocation, are you specifying the number of GPUs matching the required number of processes spawned? In essence, there should be 1 GPU per spawned MPI process.
Is the problem you are composing large-enough to use the MPI pipeline? You can try scaling the number of wires to something like 20?
Assuming everything is built and set as above, are you launching the processes directly with srun, and not mpirun/mpiexec?
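The 1-GPU-per-process point in the list above can be checked before the device is created. The following is only a minimal sketch, not the actual lightning.gpu implementation: it mirrors the condition stated in the ValueError message, and it assumes that the GPUs a process can use are the ones listed in CUDA_VISIBLE_DEVICES. The function names are hypothetical.

```python
import os


def visible_gpu_count(cuda_visible_devices):
    """Count the device entries in a CUDA_VISIBLE_DEVICES-style string.

    Returns None when the variable is unset, because the count cannot be
    derived from the string in that case.
    """
    if cuda_visible_devices is None:
        return None
    entries = [e.strip() for e in cuda_visible_devices.split(",")]
    return len([e for e in entries if e])


def enough_devices(num_devices, procs_per_node):
    """Mirror the condition from the error message:
    number of devices >= number of processes on each node."""
    return num_devices >= procs_per_node


if __name__ == "__main__":
    # Run under srun to get one line per MPI process.
    seen = visible_gpu_count(os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("GPUs listed in CUDA_VISIBLE_DEVICES:", seen)
```

If a process launched with 4 tasks per node reports only 1 listed GPU, `enough_devices(1, 4)` is False, which would match the error in the first post.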

If setting things up fresh, I’d also recommend using the PL 0.34 version for better gradient performance.

Let us know the above, and we should be able to help identify the cause.

QuantumMan · February 4, 2024, 7:53am

Thank you for the comments. I followed the instructions at https://github.com/pradeepmantha/PilotQuantumBenchmarks/blob/main/dist_mem_pennylane_mpi/DIST_MEM_PENNYLANE_NERSC_SETUP_README.md multiple times. Either the Perlmutter base infrastructure changed or something else happened, because I still see the same error message. I was able to get it working before with your instructions, but now it has stopped working. I tried Qiskit Aer GPU on the same machine without any issue.

If you don’t mind, could you retry the steps above on NERSC Perlmutter and let me know?

“I’d also recommend using the PL 0.34 version for better gradient performance.” - How do I get the latest version, the PL GPU GitHub seems to have only a 0.32.0 tag.

isaacdevlugt · February 13, 2024, 11:30pm

Hey @QuantumMan! Sorry for the delay but we’ll get back to you as soon as we can

Vincent_Michaud-Riou · February 15, 2024, 9:50pm

Hi @QuantumMan , just to be sure, did you request a node and 4 GPUs with SLURM? For example,

salloc --nodes 1 -c 32 --gpus 4 --qos interactive --time 01:00:00 --constraint gpu --account=xxxxx

Without --gpus 4, you might see the GPUs with nvidia-smi but won’t be able to use them.

QuantumMan · February 16, 2024, 7:18am

Hi,

Yes, it’s on Slurm. I used torch to check the number of GPUs available on the compute node; it seems 4 GPUs are available. I am using the following Slurm command:

salloc -N 1 --qos interactive --time 02:00:00 --gpus=4 --ntasks-per-node=4 --gpus-per-task=1 --constraint gpu --account=

~/pennylane-lightning> python
Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> num_of_gpus = torch.cuda.device_count();
>>> print(num_of_gpus);
4

QuantumMan · February 18, 2024, 1:42am

I finally figured out the problem.

I need to run salloc first, without --ntasks-per-node=4 and --gpus-per-task=1:

salloc -N 1 --qos interactive --time 02:00:00 --gpus=4 --constraint gpu --account=

That way all 4 GPUs are visible. Then I run the script once the node is allocated:

srun -n 4 python dist.py
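To compare what each task is given under the two allocations, a small per-rank report can help. This is a sketch that only reads environment variables: SLURM_PROCID and SLURM_LOCALID are set by Slurm for tasks launched with srun, and the assumption here is that per-task GPU binding shows up in CUDA_VISIBLE_DEVICES. The file name gpu_report.py and the function name are hypothetical.

```python
import os


def rank_report(env):
    """Build a one-line summary of the Slurm task IDs and visible GPUs."""
    procid = env.get("SLURM_PROCID", "unset")    # global task rank
    localid = env.get("SLURM_LOCALID", "unset")  # task index within the node
    gpus = env.get("CUDA_VISIBLE_DEVICES", "unset")
    return f"task={procid} local={localid} CUDA_VISIBLE_DEVICES={gpus}"


if __name__ == "__main__":
    print(rank_report(os.environ))
```

Running it as `srun -n 4 python gpu_report.py` inside each allocation should show whether every task lists all four GPUs or only one.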

Tom_Bromley · February 20, 2024, 5:55pm

Great that you worked it out @QuantumMan! Let us know if we can help with anything else.

mlxd · February 23, 2024, 10:08pm

Hi @QuantumMan, glad this was solved. Just to let you know, with the next release of lightning.gpu we are migrating from CUDA 11 to CUDA 12, as this brings additional benefits for the stack. The newest CrayPE (23.12) defaults to working with CUDA 12.2, and Cray-MPICH 8.1.28 natively expects this. As such, the following instructions will help you get started with the latest Cray environment and libraries once the release is public. The following snippet will allow you to explore this from the current pre-released package, and we will be making this more widely available for use on Perlmutter in the coming weeks:

module load PrgEnv-gnu cray-mpich cudatoolkit craype-accel-nvidia80 python

# Set up your python environment, preferably on the SCRATCH partition
python -m venv lgpu_env && source lgpu_env/bin/activate

# Build mpi4py 
MPICC="cc -shared" pip install --force --no-cache-dir --no-binary=mpi4py mpi4py

# Clone PennyLane Lightning and set-up your Python virtualenv
git clone https://github.com/PennyLaneAI/pennylane-lightning
cd pennylane-lightning && git checkout latest_release
python -m pip install -r requirements-dev.txt

# Build and install Lightning-Qubit
CXX=$(which CC) python -m pip install -e . --verbose

# Build and install Lightning-GPU+MPI using the CrayPE toolchain, Cray-MPICH and CUDA 12
CXX=$(which CC) PL_BACKEND="lightning_gpu" CMAKE_ARGS="-DENABLE_MPI=on -DCMAKE_CXX_COMPILER=$(which CC)" python -m pip install -e . --verbose

# Ensure the CRAY library paths are used for the MPI environment to be visible by the NVIDIA cuQuantum libraries
# https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=a00113984en_us&page=Modify_Linking_Behavior_to_Use_Non-default_Libraries.html
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH

# Follow NERSC's recommendations for using GPU-aware MPI
# https://docs.nersc.gov/development/languages/python/using-python-perlmutter/#using-mpi4py-with-gpu-aware-cray-mpich
export MPICH_GPU_SUPPORT_ENABLED=1

# Allocate your nodes/GPUs, and ensure each process can see all others on each respective local node.
salloc -N 2 -c 32 --qos interactive --time 0:30:00 --constraint gpu --ntasks-per-node=4 --gpus-per-task=1 --gpu-bind=none --account=<Account ID>

# Run your workload, using a power-of-2 number of processes (and GPUs)
srun -n 8 python ./my_workload.py

Feel free to reach out if there are any questions.
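As an illustration of the power-of-2 note in the last step of the snippet, here is a sketch of how an n-qubit state vector divides across MPI processes. It assumes complex128 amplitudes (16 bytes each) and an even split of the 2^n amplitudes; the function name is hypothetical and nothing here is taken from the Lightning source.

```python
def local_statevector_size(num_wires, num_procs, bytes_per_amplitude=16):
    """Return (amplitudes per process, bytes per process) for an even split.

    Raises ValueError if num_procs is not a power of 2 or exceeds 2**num_wires.
    """
    if num_procs < 1 or (num_procs & (num_procs - 1)) != 0:
        raise ValueError("number of processes must be a power of 2")
    total = 1 << num_wires  # 2**num_wires amplitudes in the full state
    if num_procs > total:
        raise ValueError("more processes than amplitudes")
    local = total // num_procs
    return local, local * bytes_per_amplitude


if __name__ == "__main__":
    for wires, procs in [(8, 4), (20, 4), (30, 8)]:
        amps, nbytes = local_statevector_size(wires, procs)
        print(f"{wires} wires on {procs} processes: "
              f"{amps} amplitudes, {nbytes} bytes per process")
```

For the 8-wire circuit from the first post on 4 processes, this gives 64 amplitudes (1024 bytes) per process under these assumptions, which is very small for a distributed run.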

Martin_Guo · January 5, 2025, 8:52pm

Hi @mlxd,

Thanks for sharing the step-by-step setup instructions for PennyLane Lightning GPU.
However, we noticed that a Python virtual environment might not be suitable for large batch job submissions.
NERSC supports the Shifter and podman-hpc container modes.
Is it possible to deploy pennylane-gpu in a container with MPI support? I checked, and the latest pennylane-gpu Docker image does not support MPI.

CatalinaAlbornoz · January 11, 2025, 1:43am

Hi @Martin_Guo ,

You’re right that our current docker image doesn’t support MPI. I’ve forwarded your message to our team to see what we can do here. Thanks for bringing it up. I’ll keep you updated on this.

CatalinaAlbornoz · January 21, 2025, 2:44pm

Hi @Martin_Guo ,

Unfortunately we don’t have a solution that will work for you yet. Below I’m sharing a Dockerfile that will work for other systems. However Perlmutter uses MPICH as the compatibility layer with Cray’s MPI lib, so the solution won’t work in your case.

I hope the info below helps others though.

Here’s the Dockerfile to install LGPU+MPI based on the cuquantum image:

FROM nvcr.io/nvidia/cuquantum-appliance:24.08-x86_64 as base

ENV DEBIAN_FRONTEND=noninteractive
ENV PATH=/usr/local/cuda/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

RUN apt-get update && apt-get install -y build-essential openmpi-bin libopenmpi-dev

RUN git clone https://github.com/PennyLaneAI/pennylane-lightning.git && \
cd pennylane-lightning && \
pip install -r requirements.txt --upgrade && \
python -m pip install . && \
PL_BACKEND="lightning_gpu" python scripts/configure_pyproject_toml.py && \
CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install . -vv

Note that this is based on NVIDIA’s cuquantum image, and subject to their licensing for that image.

CatalinaAlbornoz · January 22, 2025, 4:04pm

Hi @Martin_Guo,

My colleague Ali created these instructions which may help you. Let me know if they work for you.

Use Shifter or podman-hpc containers (Docker is not installed on the box) to create a CUDA-aware, MPICH-aware image. You may be able to use nvcr.io/nvidia/cuquantum-appliance:24.08-x86_64 as the base image.
To install LGPU MultiGPU, you need the following instructions inside the CUDA-aware MPICH-aware image:


RUN git clone https://github.com/PennyLaneAI/pennylane-lightning.git && \ 
cd pennylane-lightning && \ 
pip install -r requirements.txt && \ 
python -m pip install . && \ 
PL_BACKEND="lightning_gpu" python scripts/configure_pyproject_toml.py && \ 
CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install . -vv

To run LGPU MultiGPU, you need to set the following environment variables. Set them outside the container, in the shell that launches the SLURM jobs:

# Check the docs for more info https://docs.nersc.gov/development/programming-models/mpi/cray-mpich/#cuda-aware-mpi 
export MPICH_GPU_SUPPORT_ENABLED=1 
# Check the LGPU docs for more info https://docs.pennylane.ai/projects/lightning/en/stable/lightning_gpu/installation.html#install-lightning-gpu-with-mpi 
export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:/opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib:$LD_LIBRARY_PATH
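Before launching, it can be worth confirming that these variables actually reached the job environment. The following is a minimal sketch, assuming the two variables above are the ones that matter; the function name is hypothetical.

```python
import os


def check_mpi_gpu_env(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if env.get("MPICH_GPU_SUPPORT_ENABLED") != "1":
        problems.append("MPICH_GPU_SUPPORT_ENABLED is not set to 1")
    # Every CRAY_LD_LIBRARY_PATH entry should also appear in LD_LIBRARY_PATH.
    cray_paths = [p for p in env.get("CRAY_LD_LIBRARY_PATH", "").split(":") if p]
    ld_paths = set(env.get("LD_LIBRARY_PATH", "").split(":"))
    missing = [p for p in cray_paths if p not in ld_paths]
    if not cray_paths:
        problems.append("CRAY_LD_LIBRARY_PATH is empty or unset")
    elif missing:
        problems.append("LD_LIBRARY_PATH is missing: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    for line in check_mpi_gpu_env(os.environ) or ["environment looks consistent"]:
        print(line)
```

Running it once in the launching shell and once inside the container can show whether the variables are passed through.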

Martin_Guo · February 9, 2025, 11:33pm

Hi, sorry for the late reply.
I was able to use a Shifter image to test multi-node scaling.

However, NERSC might retire Shifter this year and use podman-hpc for running custom images instead.
Here is how I build MPICH in the Podman container file:

# Install MPICH
WORKDIR /tmp
ARG MPICH_VERSION=4.2.2
ARG MPICH_PREFIX=mpich-$MPICH_VERSION

RUN wget https://www.mpich.org/static/downloads/$MPICH_VERSION/$MPICH_PREFIX.tar.gz
RUN tar xvzf $MPICH_PREFIX.tar.gz
RUN cd $MPICH_PREFIX && \
./configure && \
make -j 8 && \
make install && \
make clean && \
cd .. && \
rm -rf $MPICH_PREFIX

RUN /sbin/ldconfig

Do you think it is possible to run the image with the MPI plugin, as follows?

salloc -N 4 -t 20 -C gpu -q debug # nid[004384,007065-007067]
srun -n 4 podman-hpc run --gpu --rm --mpi pennylane:image python3 -m mpi4py.bench helloworld

CatalinaAlbornoz · February 11, 2025, 8:00pm

Hi @Martin_Guo ,

Shifter and podman should provide the same functionality. We don’t know for sure whether it will work, but you can try it out.

If you do try it, please let us know if it worked for you!

QuantumMan · April 6, 2025, 5:25am

Hi @Martin_Guo, can you share the Dockerfile for running MPI applications across GPUs? Thanks. I tried multiple things with old versions and with PennyLane 0.40.0, but nothing worked, so Docker seems to be the best way to do it.

QuantumMan · April 6, 2025, 5:29am



# Install MPICH
WORKDIR /tmp
ARG MPICH_VERSION=4.2.2
ARG MPICH_PREFIX=mpich-$MPICH_VERSION

RUN wget https://www.mpich.org/static/downloads/$MPICH_VERSION/$MPICH_PREFIX.tar.gz
RUN tar xvzf $MPICH_PREFIX.tar.gz
RUN cd $MPICH_PREFIX && \
./configure && \
make -j 8 && \
make install && \
make clean && \
cd .. && \
rm -rf $MPICH_PREFIX

RUN /sbin/ldconfig

RUN git clone https://github.com/PennyLaneAI/pennylane-lightning.git && \ 
cd pennylane-lightning && \ 
pip install -r requirements.txt && \ 
python -m pip install . && \ 
PL_BACKEND="lightning_gpu" python scripts/configure_pyproject_toml.py && \ 
CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install . -vv

# Check the docs for more info https://docs.nersc.gov/development/programming-models/mpi/cray-mpich/#cuda-aware-mpi 
export MPICH_GPU_SUPPORT_ENABLED=1 
# Check the LGPU docs for more info https://docs.pennylane.ai/projects/lightning/en/stable/lightning_gpu/installation.html#install-lightning-gpu-with-mpi 
export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:/opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib:$LD_LIBRARY_PATH

Does this look good?

CatalinaAlbornoz · April 7, 2025, 10:34pm

Hi @QuantumMan ,

What system are you using? Is it Perlmutter or a different system?

Have you tried using the instructions I shared on Jan 21st and 22nd?

If you’re running into issues with them, can you please share the full error you get with each set of instructions?

QuantumMan · April 8, 2025, 5:57am

Yes, it’s Perlmutter. I tried the above and got some results.

I was able to build and run podman-hpc images. The image has MPICH installed, so MPI inside the container works.

podman-hpc images
REPOSITORY                  TAG  IMAGE ID      CREATED            SIZE     R/O
localhost/pennylane-xanadu  1.0  6a4840029fc9  About an hour ago  16.6 GB  true

podman-hpc run --rm --entrypoint= --gpu pennylane-xanadu:1.0 python /home/cuquantum/dist_mem_jacobian.py 

num_gpus: 1 wires: 16 layers 2 time: 0.2128275703289546

Whereas when I pass the --mpi or --cuda-mpi flag, it fails:

podman-hpc run --rm --mpi --entrypoint= --gpu pennylane-xanadu:1.0 python /home/cuquantum/dist_mem_jacobian.py 


Traceback (most recent call last):
  File "/home/cuquantum/dist_mem_jacobian.py", line 1, in <module>
    from mpi4py import MPI
ImportError: libmpi.so.40: cannot open shared object file: No such file or directory


srun -n 2 podman-hpc run --rm --cuda-mpi --entrypoint= --gpu pennylane-xanadu:1.0 python /home/cuquantum/dist_mem_jacobian.py 

Error: no container with ID bf7e0fe0e2b50c047e03cedb2946340c8969ac090297eb4f62c905e95a95c225 found in database: no such container
srun: error: nid200340: task 0: Exited with exit code 255
srun: Terminating StepId=37558241.1
srun: error: nid200341: task 1: Terminated
srun: Force Terminated StepId=37558241.1
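For the ImportError above, one thing worth checking is which MPI shared library the container can actually load. libmpi.so.40 is the soname normally shipped by Open MPI 4.x, while MPICH builds usually provide libmpi.so.12, so the message may mean that mpi4py in the image was built against Open MPI; that reading is an assumption and is not confirmed in this thread. The sketch below simply tries to load each candidate with ctypes; the function name is hypothetical.

```python
import ctypes


def probe_libraries(names, loader=ctypes.CDLL):
    """Try to load each shared library name; map name -> True/False."""
    results = {}
    for name in names:
        try:
            loader(name)
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Candidate sonames (assumed): .40 for Open MPI 4.x, .12 for MPICH.
    for name, ok in probe_libraries(["libmpi.so.40", "libmpi.so.12"]).items():
        print(f"{name}: {'loadable' if ok else 'not found'}")
```

Running this inside the container with and without the --mpi flag could show whether the flag changes which library is found.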
