Multiprocessing issue with lightning.qubit

Hi!

I’ve been running into some very strange behavior and couldn’t find anything similar on SO or other forums. I’m working with code that I did not write (the original author is no longer working on it) that calls PennyLane in a subroutine executed inside a pool of worker processes. This whole procedure is executed multiple times, in a loop.

Now here’s the problem: as long as I use the “default.qubit” device, the whole thing runs smoothly, if a little slowly, and all cores on the cluster node are busy as expected. But if I switch to “lightning.qubit” something weird happens: the code spawns the worker pool and runs the parallel subroutines correctly the first time, giving the required output with a major speedup over the default backend, but as soon as the loop starts over, everything freezes and all cores go idle indefinitely. No errors, no crashes, no messages, the program just falls asleep. Applying it to a different use case leads to the same outcome, but after a few more runs of the loop. Did anyone ever run into something similar? What could be causing it? I’m using version 0.41.0.

Thanks for any suggestions, I’m utterly lost here.
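
For readers trying to picture the setup, the structure described in the post above might look roughly like the sketch below. The original code is not shown in this thread, so everything here is an assumption made for illustration: the names `run_circuit` and `main`, the toy one-qubit circuit, the pool size, and the choice to create a fresh pool on every loop iteration. It is not known whether this toy version reproduces the hang.

```python
import multiprocessing as mp

# Device under test. The report says "default.qubit" runs fine,
# while "lightning.qubit" freezes after the first loop iteration.
DEVICE_NAME = "lightning.qubit"


def run_circuit(theta):
    """Hypothetical subroutine executed in each worker process."""
    # Imported inside the function so this module can be loaded
    # even where PennyLane is not installed (an illustrative choice).
    import pennylane as qml

    dev = qml.device(DEVICE_NAME, wires=1)

    @qml.qnode(dev)
    def circuit(x):
        qml.RX(x, wires=0)
        return qml.expval(qml.PauliZ(0))

    return float(circuit(theta))


def main(n_iterations=3, n_workers=4):
    """Run the pooled subroutine several times in a loop."""
    thetas = [0.1 * i for i in range(16)]
    for iteration in range(n_iterations):
        # Assumption: a new pool is created each time the loop starts over.
        with mp.Pool(processes=n_workers) as pool:
            results = pool.map(run_circuit, thetas)
        print(f"iteration {iteration}: {len(results)} results")
```

To try it, call `main()` from under an `if __name__ == "__main__":` guard, and change `DEVICE_NAME` to `"default.qubit"` to compare the two backends.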

Hi @Fabio_M, welcome to the Forum!

This definitely looks like unexpected behaviour. Can you please post the output of qml.about() and any info you can share about the cluster you’re running on?

Hi Catalina,

thanks a lot for your response. I’m running my code on a cluster node with two AMD EPYC 7402 24-core processors, as well as two NVIDIA A100 GPUs, which I’m not using in this specific code.

I should add that in the meantime I found one way to prevent the code from hanging, but it’s more of a dirty workaround than a real solution. I forced OMP_NUM_THREADS=1 before running the code, on the hunch that Lightning’s internal multithreading may be getting in the way of the parallel processes created by the pool, and it seems to be working because the code isn’t getting stuck anymore. Not sure how reliable this is, but I hope it helps to figure out what’s going on.
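
The same workaround can also be applied from inside the Python script instead of the shell. The sketch below assumes that the OpenMP runtime reads `OMP_NUM_THREADS` once, when it is first loaded, which is the typical behavior. For that reason the variable is set before PennyLane is imported. The helper name `openmp_thread_limit` is hypothetical and only reports the current setting.

```python
import os

# Set the limit before importing PennyLane, so that the OpenMP runtime
# (if the Lightning build uses one) sees it when it initializes.
os.environ["OMP_NUM_THREADS"] = "1"

# import pennylane as qml  # the import would go here, after the variable is set


def openmp_thread_limit():
    """Return the OMP_NUM_THREADS value that child processes will inherit, or None."""
    return os.environ.get("OMP_NUM_THREADS")


print("OMP_NUM_THREADS =", openmp_thread_limit())
```

Worker processes started by `multiprocessing` inherit the parent's environment, so a single assignment at the top of the main script should cover the whole pool.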

Here is the output of qml.about(), as you asked. Thanks

Name: PennyLane
Version: 0.41.1
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author:
Author-email:
License: Apache License 2.0
Location: /my-virtual-environment-path/lib/python3.10/site-packages
Requires: appdirs, autograd, autoray, cachetools, diastatic-malt, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, tomlkit, typing-extensions
Required-by: PennyLane_Lightning, PennyLane_Lightning_GPU

Platform info:           Linux-4.18.0-147.el8.x86_64-x86_64-with-glibc2.28
Python version:          3.10.9
Numpy version:           1.26.4
Scipy version:           1.14.1
Installed devices:
- lightning.qubit (PennyLane_Lightning-0.41.1)
- lightning.gpu (PennyLane_Lightning_GPU-0.41.1)
- default.clifford (PennyLane-0.41.1)
- default.gaussian (PennyLane-0.41.1)
- default.mixed (PennyLane-0.41.1)
- default.qubit (PennyLane-0.41.1)
- default.qutrit (PennyLane-0.41.1)
- default.qutrit.mixed (PennyLane-0.41.1)
- default.tensor (PennyLane-0.41.1)
- null.qubit (PennyLane-0.41.1)
- reference.qubit (PennyLane-0.41.1)

Thanks for sharing these details, @Fabio_M.

I’ve shared these with our team so that they can look into the issue.
If possible, could you please share a minimal reproducible code example that we can use to try to reproduce the problem?