PennyLane multi-GPU support

Hi, I want to understand how multi-GPU support works in PennyLane v0.31. I read the blog post about the new PennyLane v0.31 release and its multi-GPU support. Can anyone tell me which new libraries I need to install and import to use this support?
I am currently using an NVIDIA DGX A100 GPU with CUDA Toolkit 11.7.0.

The circuit I want to use is:

import pennylane as qml
from pennylane import numpy as np

wires = 4
dev4 = qml.device('lightning.gpu', wires=wires)

@qml.qnode(dev4)
def CONVCircuit(phi, wires, i=0):
    """
    quantum convolution node
    """
    # parameter
    theta = np.pi / 2

    # encode the (0-255) pixel values as rotation angles
    qml.Rot(phi[0] * 2 * np.pi / 255, phi[1] * 2 * np.pi / 255, phi[2] * 2 * np.pi / 255, wires=0)
    qml.Rot(phi[3] * 2 * np.pi / 255, phi[4] * 2 * np.pi / 255, phi[5] * 2 * np.pi / 255, wires=1)
    qml.Rot(phi[6] * 2 * np.pi / 255, phi[7] * 2 * np.pi / 255, phi[8] * 2 * np.pi / 255, wires=2)
    qml.Rot(phi[9] * 2 * np.pi / 255, phi[10] * 2 * np.pi / 255, phi[11] * 2 * np.pi / 255, wires=3)

    qml.RX(np.pi, wires=0)
    qml.RX(np.pi, wires=1)
    qml.RX(np.pi, wires=2)
    qml.RX(np.pi, wires=3)

    qml.CRZ(theta, wires=[1, 0])
    qml.CRZ(theta, wires=[3, 2])
    qml.CRX(theta, wires=[1, 0])
    qml.CRX(theta, wires=[3, 2])
    qml.CRZ(theta, wires=[2, 0])
    qml.CRX(theta, wires=[2, 0])

    # Expectation value
    measurement = qml.expval(qml.PauliZ(wires=0))

    return measurement

Please tell me what extra lines I need to add to the above code to enable multi-GPU support. I use Jupyter Notebook; the code is in .ipynb files.

The output of qml.about():

Name: PennyLane
Version: 0.31.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /dgxb_home/se21pphy004/miniconda3/envs/myenv/lib/python3.8/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml
Required-by: PennyLane-Lightning, PennyLane-Lightning-GPU

Platform info:           Linux-5.4.0-144-generic-x86_64-with-glibc2.17
Python version:          3.8.17
Numpy version:           1.24.3
Scipy version:           1.10.0
Installed devices:
- default.gaussian (PennyLane-0.31.0)
- default.mixed (PennyLane-0.31.0)
- default.qubit (PennyLane-0.31.0)
- default.qubit.autograd (PennyLane-0.31.0)
- default.qubit.jax (PennyLane-0.31.0)
- default.qubit.tf (PennyLane-0.31.0)
- default.qubit.torch (PennyLane-0.31.0)
- default.qutrit (PennyLane-0.31.0)
- null.qubit (PennyLane-0.31.0)
- lightning.qubit (PennyLane-Lightning-0.31.0)
- lightning.gpu (PennyLane-Lightning-GPU-0.31.0)

Hello @mass_of_15 !

Would you mind giving me a bit more context?

Meanwhile, I strongly suggest taking a look at the Lightning documentation. You will probably find your answers there. :slight_smile:

I also recommend taking a look at the discussion in this post.

Does it help? :slight_smile:

The above code is used for quantum image processing. Can this circuit be run on multiple GPUs, so that multiple images are processed in parallel with the same circuit and the runtime is reduced?

@mass_of_15,

Here is a recent NVIDIA post on LinkedIn that might help. Best regards.

Thank you for the link. However, that post is about a different library, CUDA Quantum 0.4.
Can I use PennyLane v0.31 to run this circuit on multiple GPUs, so that multiple images are processed in parallel with the same circuit and the runtime is reduced?

Hi @mass_of_15

The new lightning.gpu support is for distributed execution, aimed at enabling larger system sizes to be explored. For circuits that do not fit onto one GPU, we can use MPI to spread the statevector across more than one GPU. This design will not improve performance for problems like the one you are investigating, primarily because there is no advantage to distributing such a small problem using the cuQuantum MPI distribution mechanism.
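
For reference, a minimal sketch of that distributed mode (the wire count here is purely illustrative) looks like the following; the script is launched with something like `mpirun -np 2 python script.py`, typically with one GPU per MPI rank:

from mpi4py import MPI          # importing mpi4py initializes MPI
import pennylane as qml

# mpi=True distributes the statevector across the participating GPUs
dev = qml.device("lightning.gpu", wires=30, mpi=True)

@qml.qnode(dev)
def circuit():
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

print(circuit())                # each rank prints the same expectation value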

For your problem, given the small number of qubits, I'd suggest using jax.jit through default.qubit and its support for CUDA backends. LightningGPU is built primarily as an HPC-focused simulator, and works best for circuits beyond 20 qubits in register size and with high depth.
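
For illustration, here is a minimal sketch of that suggestion applied to the circuit above (this is my own adaptation, assuming jax and a CUDA-enabled jaxlib are installed; the pixel-value scaling is copied from the original post):

import jax
import jax.numpy as jnp
import pennylane as qml

dev = qml.device("default.qubit", wires=4)

@jax.jit                          # compile the whole circuit execution with XLA
@qml.qnode(dev, interface="jax")
def conv_circuit(phi):
    theta = jnp.pi / 2
    scale = 2 * jnp.pi / 255      # same pixel-value scaling as the original circuit
    for w in range(4):
        qml.Rot(phi[3 * w] * scale, phi[3 * w + 1] * scale, phi[3 * w + 2] * scale, wires=w)
        qml.RX(jnp.pi, wires=w)
    qml.CRZ(theta, wires=[1, 0])
    qml.CRZ(theta, wires=[3, 2])
    qml.CRX(theta, wires=[1, 0])
    qml.CRX(theta, wires=[3, 2])
    qml.CRZ(theta, wires=[2, 0])
    qml.CRX(theta, wires=[2, 0])
    return qml.expval(qml.PauliZ(0))

phi = jnp.arange(12.0)            # placeholder pixel values
print(conv_circuit(phi))          # first call compiles; subsequent calls are fast

If the goal is to process many image patches, batching the evaluations (for example with jax.vmap or PennyLane's parameter broadcasting) is likely to help more than splitting one 4-qubit circuit across several GPUs.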

Hope this helps.


Hi @CatalinaAlbornoz,

I’m testing this code:

from mpi4py import MPI
import pennylane as qml

dev = qml.device('lightning.gpu', wires=8, mpi=True)

@qml.qnode(dev)
def circuit_mpi():
    qml.PauliX(wires=[0])
    return qml.state()

local_state_vector = circuit_mpi()

I got this error:

[b96eadb2eff6:426813] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.b96eadb2eff6.1000/jf.0/602603520/shared_mem_cuda_pool.b96eadb2eff6 could be created.
[b96eadb2eff6:426813] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728

ImportError                               Traceback (most recent call last)
Cell In[4], line 3
      1 from mpi4py import MPI
      2 import pennylane as qml
----> 3 dev = qml.device('lightning.gpu', wires=8, mpi=True)
      4 @qml.qnode(dev)
      5 def circuit_mpi():
      6     qml.PauliX(wires=[0])

File /opt/conda/envs/pen/lib/python3.12/site-packages/pennylane/__init__.py:413, in device(name, *args, **kwargs)
    407     raise DeviceError(
    408         f"The {name} plugin requires PennyLane versions {plugin_device_class.pennylane_requires}, "
    409         f"however PennyLane version {version} is installed."
    410     )
    412 # Construct the device
--> 413 dev = plugin_device_class(*args, **options)
    415 # Once the device is constructed, we set its custom expansion function if
    416 # any custom decompositions were specified.
    417 if custom_decomps is not None:

File /opt/conda/envs/pen/lib/python3.12/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py:264, in LightningGPU.__init__(self, wires, mpi, mpi_buf_size, sync, c_dtype, shots, batch_obs)
    262 else:
    263     self._mpi = True
--> 264     self._mpi_init_helper(self.num_wires)
    266 if mpi_buf_size < 0:
    267     raise TypeError(f"Unsupported mpi_buf_size value: {mpi_buf_size}")

File /opt/conda/envs/pen/lib/python3.12/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py:298, in LightningGPU._mpi_init_helper(self, num_wires)
    296 """Set up MPI checks."""
    297 if not MPI_SUPPORT:
--> 298     raise ImportError("MPI related APIs are not found.")
    299 # initialize MPIManager and config check in the MPIManager ctor
    300 self._mpi_manager = MPIManager()

ImportError: MPI related APIs are not found.

I’m currently setting up pennylane-lightning-gpu on a machine with an NVIDIA GPU, following the documentation and a combination of steps to ensure everything is correctly installed. However, I’m encountering persistent errors during MPI testing, and I’d greatly appreciate any guidance.

Steps Followed


# Step 1: Created a clean Conda environment
conda create -n test
conda activate test

# Step 2: Installed pennylane-lightning-gpu
conda install -c conda-forge pennylane-lightning-gpu=0.37.0

# Step 3: Installed Jupyter kernel
conda install ipykernel
python -m ipykernel install --user --name pennylane

# Step 4: Installed MPI dependencies
conda install -c conda-forge mpi4py openmpi

# Step 5: Installed CUDA runtime libraries
conda install -c conda-forge cuda-cudart cuda-version=12

# Step 6: Reinstalled pennylane-lightning-gpu with custatevec_cu12 support
conda uninstall -c conda-forge pennylane-lightning-gpu
conda install -c conda-forge pennylane-lightning-gpu

# Step 7: Attempted MPI testing
cd pennylane-lightning
mpirun -np 2 --mca opal_cuda_support 1 python -m pytest mpitests --tb=short

**Current Conda Configurations**

**CUDA Libraries:**

cuda-cudart     12.3.101    hd3aeb46_1         conda-forge
cuda-nvrtc      12.3.107    hd3aeb46_1         conda-forge
cuda-version    12.3        h32bc705_3         conda-forge

**MPI Libraries:**

mpi             1.0         openmpi            conda-forge
mpi4py          4.0.1       py312h5ca6011_0    conda-forge
openmpi         5.0.6       hd45feaf_100       conda-forge

**Issue**

When I run the mpirun test, I encounter errors indicating that MPI-related APIs are not found. Below is an excerpt from the error log:

ERROR mpitests/test_apply.py::TestApply::test_state_prep[dev_mpi1-1-BasisState] - ImportError: MPI related APIs are not found.
...
prterun detected that one or more processes exited with non-zero status...
ERROR mpitests/test_measurements_sparse.py::TestSparseExpval::test_sparse_Pauli_words[complex128-cases1] - ImportError: MPI related APIs are not found.
ERROR mpitests/test_measurements_sparse.py::TestSparseExpval::test_sparse_Pauli_words[complex128-cases2] - ImportError: MPI related APIs are not found.
ERROR mpitests/test_measurements_sparse.py::TestSparseExpval::test_sparse_Pauli_words[complex128-cases3] - ImportError: MPI related APIs are not found.
ERROR mpitests/test_measurements_sparse.py::TestSparseExpval::test_sparse_Pauli_words[complex128-cases4] - ImportError: MPI related APIs are not found.
ERROR mpitests/test_measurements_sparse.py::TestSparseExpval::test_sparse_Pauli_words[complex128-cases5] - ImportError: MPI related APIs are not found.
================================== 1510 failed, 7 passed, 79 skipped, 1 warning, 1320 errors in 75.75s (0:01:15) ==================================
--------------------------------------------------------------------------
prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [prterun-b96eadb2eff6-418850@1,1]
  Exit code:    1

**Environment Details**

* **PennyLane Version:** 0.37.0
* **PennyLane-Lightning-GPU Version:** 0.37.0
* **CUDA Version:** 12.3
* **MPI Version:** OpenMPI 5.0.6

I’ve verified the OpenMPI installation, added the relevant environment variables, and ensured that mpi4py can detect the MPI bindings.

**Request for Help**

Does anyone know what might be causing these MPI-related API errors during the pennylane-lightning-gpu tests? Could it be related to CUDA compatibility with OpenMPI, or am I missing a configuration step?

Any help would be greatly appreciated!

Hi @Parfait_Atchade ,

From your error message it seems that the path to libmpi.so is missing; it should be on your LD_LIBRARY_PATH.

We have this Pull Request in progress to update the installation instructions for lightning.gpu with MPI support to add this step. Let us know if this fixes your issue.
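
As a quick way to check whether that is the problem (this is my own sketch, not part of the official instructions), you can try loading the MPI library directly from Python; if it fails, add the directory containing libmpi.so to LD_LIBRARY_PATH:

import ctypes

try:
    # Check that the MPI shared library is discoverable at runtime.
    # The exact soname may differ on your system (e.g. libmpi.so.40).
    ctypes.CDLL("libmpi.so", mode=ctypes.RTLD_GLOBAL)
    print("libmpi.so was found on the loader path")
except OSError as err:
    print("libmpi.so could not be loaded:", err)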

If this doesn’t work, then the issue might be with the installation of MPI on your system. On Ubuntu you can install it via

sudo apt-get update && sudo apt-get install -y openmpi-bin libopenmpi-dev
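
Once MPI is installed, a quick sanity check (assuming mpi4py is installed; this is a generic MPI check, not Lightning-specific) is to ask mpi4py which MPI library it is bound to:

from mpi4py import MPI

print(MPI.Get_library_version())             # reports the MPI implementation, e.g. Open MPI 5.x
print("ranks:", MPI.COMM_WORLD.Get_size())   # 1 when run directly; N under `mpirun -np N`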

Let me know if this solves your issue!

Many thanks @CatalinaAlbornoz! Let me try it and come back to you.