Parallel Circuit execution during optimisation

acn · June 2, 2023, 5:38am

Hi,

I am trying to optimise a variational circuit by batching the inputs. A natural way I feel to speed this process up would be to execute each input in parallel during the optimisation. However trying to implement this using both multiprocessing and pathos seems to not work. Here’s a minimal example to illustrate this

import pennylane as qml
from pennylane import numpy as np
from pennylane.templates import AmplitudeEmbedding
import multiprocessing as mp
from functools import partial
from pathos.multiprocessing import ProcessingPool as Pool

dev = qml.device("default.qubit", wires=2)
var = np.random.randn(2, requires_grad=True)
opt = qml.AdamOptimizer()
xes = np.array(([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]]))
batch_size = 2

@qml.qnode(dev)
def circuit(var, x):
   AmplitudeEmbedding(features=x, wires=range(2), normalize=True)
   qml.RX(var[0], wires=0)
   qml.RX(var[1], wires=0)
   return qml.expval(qml.PauliZ(0))

# To avoid the pickler complaining even more than it would otherwise
def circuit_wrapper(var, x):
   return circuit(var, x)

def cost_serial(var, X):
   return np.sum([circuit(var,x) for x in X])

def cost_parallel(var, X, pool):
   mapper = partial(circuit_wrapper, var)
   return np.sum(pool.map(mapper, X))

# First step serially
for i in range(0, len(xes), batch_size):
   X = xes[i:i+batch_size]
   var = opt.step(lambda v : cost_serial(v, X), var)
# This will work without error

# Try and just do a standard evaluation in parallel using multiprocessing
pool = mp.Pool(4)
mapper = partial(circuit_wrapper, var)
result = np.sum(pool.map(mapper, X))
print("The result of our dummy circuit using multiprocessing is {:.3f}".format(result))
# This will work without error

# Now try and do the circuit evaluation in an optimiser
for i in range(0, len(xes), batch_size):
   X = xes[i:i+batch_size]
   try:
       var = opt.step(lambda v : cost_parallel(v, X, pool), var)
   except Exception as e:
       print(e)
# This will fail

# Now let's try and use pathos
pool = Pool(4)
mapper = partial(circuit_wrapper, var)
result = np.sum(pool.map(mapper, X))
print("The result of our dummy circuit using pathos is {:.3f}".format(result))
# This will work without error

# Now try and do the circuit evaluation in an optimiser
for i in range(0, len(xes), batch_size):
   X = xes[i:i+batch_size]
   try:
       var = opt.step(lambda v : cost_parallel(v, X, pool), var)
   except Exception as e:
       print(e)
# This will fail

multiprocessing will fail with Can't pickle local object 'VJPNode.initialize_root.<locals>.<lambda>' which is not that surprising given what I’ve seen online.

Using pathos gives a far stranger error though

Traceback (most recent call last):
  File "[redacted]/lib/python3.11/site-packages/autograd/tracer.py", line 118, in new_box
    return box_type_mappings[type(value)](value, trace, node)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: <class 'pennylane.numpy.tensor.tensor'>

I think on a technical level I know what’s going on here and why pathos won’t work. The optimiser registers the parameters with autograd but the subprocesses are spawning off their own copies of the parameters which autograd doesn’t know about…I could be wrong about this but that’s my guess after digging around for a bit.

That’s a bit of an aside though. fundamentally I would like to be able to run these circuits in parallel during the optimisation by whatever means works. What are people’s suggestions here on how best to achieve this?

isaacdevlugt · June 2, 2023, 2:01pm

Hey @acn! Welcome to the forum

I’m not sure which version of PennyLane you’re using, but in v0.24 we introduced support for parameter broadcasting . I recommend using v0.30 (the most current version), though.

Here’s an example!

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(features):
    qml.AngleEmbedding(features, wires=range(2))
    return [qml.expval(qml.PauliZ(i)) for i in range(2)]

batch1 = [0.1, 0.2]
batch2 = [0.3, 0.4]
batches = [batch1, batch2]

print(circuit(batches))
print(dev.num_executions)

'''
[tensor([0.99500417, 0.95533649], requires_grad=True), tensor([0.98006658, 0.92106099], requires_grad=True)]
1
'''

As you can see, the device only has one execution even though there are two batches of parameters. There are some gaps of support for parameter broadcasting that we are working on for future releases, but if you’re using default.qubit you should be fine

erikrecio · December 28, 2023, 10:19am

Hello @acn I’m going through exactly the same problem. Did you find any solution for parallelizing circuits while running in an optimizer?

isaacdevlugt · January 3, 2024, 2:51pm

Hey @erikrecio, welcome to the forum!

If parameter broadcasting isn’t what you need, there are some parallel computation packages you can use like Dask (although I don’t recommend using it). Our lightning qubit device also has parallel adjoint differentiation support (see here: Lightning Qubit device — Lightning 0.33.1 documentation). Our lightning gpu device also has parallel compute support: PennyLane v0.31 released | PennyLane Blog

Let me know if any of this helps!

Tz_19 · March 20, 2025, 3:38am

Hello @isaacdevlugt , I’m encountering the similar problem. I got errors when trying to use parameter broadcasting. Here’s a minimal example circuit:

N=6
dev=qml.device('lightning.qubit',wires=N)

@qml.qnode(dev)
def circuit(ini_state_batch, tar_states):
    # An simple example that output the fidelity between the initial state and the target state
    qml.StatePrep(ini_state_batch, wires=range(N)) # The StatePrep broadcasts the states with no error
    if len(tar_states[0])>1: # more than one state
        tar_batch=[]
        for i in range(len(tar_states)):
            tar_batch.append(qml.math.dm_from_state_vector(tar_states[i])) 
    else: # just one state, do not need broadcasting
        tar_batch=qml.math.dm_from_state_vector(tar_states)
    return qml.expval(qml.Hermitian(tar_batch, wires=range(N)))

ini_state_batch=[]
tar_state_batch=[]
target = SU_N(2**N) # a function used to generate a random SU(N) matrix
for i in range(10):
    ini_state = getstates(N, True) # a function used to generate a random quantum state, the return is an np.array with 2**N dimensions. True means it will return a complex state.
    ini_state_batch.append(ini_state.tolist()) # the tolist() operation is to prevent pennylane from regarding the initial state as trainable parameters
    tar_state_batch.append((target @ ini_state).tolist())
print(circuit(ini_state_batch, tar_state_batch))

I constanty get the error ValueError: Observable must be a square matrix. at the last statement of the circuit() function. It looks like that qml.Hermitian always take tar_batch as a single density matrix, however, the tar_batch is a batch containing 10 density matrices. Since the SU_N() and getstates() has been used hundreds of times, they are probably not the cause of the error. Is my way of using parameter broadcasting wrong?
My qml.about():

Name: PennyLane
Version: 0.40.0
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /home/timmy/.local/lib/python3.12/site-packages
Requires: appdirs, autograd, autoray, cachetools, diastatic-malt, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, tomlkit, typing-extensions
Required-by: PennyLane_Lightning, PennyLane_Lightning_GPU

Platform info:           Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Python version:          3.12.3
Numpy version:           2.2.2
Scipy version:           1.14.1
Installed devices:
- lightning.gpu (PennyLane_Lightning_GPU-0.40.0)
- lightning.qubit (PennyLane_Lightning-0.40.0)
- default.clifford (PennyLane-0.40.0)
- default.gaussian (PennyLane-0.40.0)
- default.mixed (PennyLane-0.40.0)
- default.qubit (PennyLane-0.40.0)
- default.qutrit (PennyLane-0.40.0)
- default.qutrit.mixed (PennyLane-0.40.0)
- default.tensor (PennyLane-0.40.0)
- null.qubit (PennyLane-0.40.0)
- reference.qubit (PennyLane-0.40.0)
None

If you want to run my code, the SU_N and getstates() are below:

def SU_N(n):
    """
    Generate an n-order special unitary matrix (SU(n)).
    
    Args:
        n (int): The order of the matrix.
        
    Returns:
        np.ndarray: An n x n unitary matrix with determinant 1.
    """
    # Step 1: Generate a random complex matrix
    z = (np.random.randn(n, n) + 1j * np.random.randn(n, n)) / np.sqrt(2)
    
    # Step 2: Perform QR decomposition to get a unitary matrix
    q, r = np.linalg.qr(z)
    
    # Step 3: Adjust the diagonal of R to ensure Q is unitary
    d = np.diagonal(r)
    ph = d / np.abs(d)  # Normalize to unit modulus
    q = q @ np.diag(ph)
    
    # Step 4: Ensure determinant is 1
    det_q = np.linalg.det(q)
    q[0] = q[0] / det_q
    return q

def getstates(n, complex=False):
    state_r=np.random.rand(2**n)
    while np.linalg.norm(state_r)<10**(-8):
        state_r=np.random.rand(2**n)
    if complex:
        state_i=np.random.rand(2**n)
        while np.linalg.norm(state_i)<10**(-8):
            state_i=np.random.rand(2**n)
        state=state_r+1j*state_i
    else:
        state=state_r
    state=state/np.linalg.norm(state)
    return state

CatalinaAlbornoz · March 20, 2025, 9:13pm

Hi @Tz_19 ,

You shouldn’t use batches within the circuit but instead outside of it. In many cases broadcasting is supported, but not always.

In this case my recommendation would be to create a simple test circuit which receives some batched inputs and parameters. You can try something similar to the code by @Kernel123 in topic #8124. You can even simplify it more if you want to test any changes.

If you’re still facing issues with a simple test circuit please post the full code to reproduce and the full error traceback.

I hope this helps!

Tz_19 · April 11, 2025, 8:33am

Hi @CatalinaAlbornoz , thank you for your reply! I followed your reply and ensured that the batches are used outside the circuit. I also read the topic #8124 and changed the type of my batches to np.array as you suggested the poster of #8124 to change the type of his batch to jnp.array. However, that did not solve the problem. Here is a minimal full example:

import pennylane as qml
from pennylane import numpy as np
import cmath

N=6
dev=qml.device('lightning.qubit',wires=N)

def getstates(n, complex=False):
    """
    Generate a random state vector.

    Args:
        n (int): the qubit number.

    Returns:
        np.ndarray: An 2^n dimension vector.
    """
    state_r=np.random.rand(2**n)
    while np.linalg.norm(state_r)<10**(-8):
        state_r=np.random.rand(2**n)
    if complex:
        state_i=np.random.rand(2**n)
        while np.linalg.norm(state_i)<10**(-8):
            state_i=np.random.rand(2**n)
        state=state_r+1j*state_i
    else:
        state=state_r
    state=state/np.linalg.norm(state)
    return state

def SU_N(n):
    """
    Generate an n-order special unitary matrix (SU(n)).
   
    Args:
        n (int): The order of the matrix.
        
    Returns:
        np.ndarray: An n x n unitary matrix with determinant 1.
    """
    z = (np.random.randn(n, n) + 1j * np.random.randn(n, n)) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    ph = d / np.abs(d)  # Normalize to unit modulus
    q = q @ np.diag(ph)
    det_q = np.linalg.det(q)
    q[0] = q[0] / det_q
    return q

@qml.qnode(dev)
def circuit(ini_state_batch, tar_dm_batch):
    qml.StatePrep(ini_state_batch, wires=range(N))
    return qml.expval(qml.Hermitian(tar_dm_batch, wires=range(N)))

ini_state_batch=[]
tar_dm_batch=[]
target = SU_N(2**N)
for i in range(10):
    ini_state = getstates(N, True)
    ini_state_batch.append(ini_state.tolist())
    tar_state=target@ini_state
    tar_dm_batch.append(qml.math.dm_from_state_vector(tar_state))
for j in range(len(tar_dm_batch)):
    print(len(tar_dm_batch[j]), len(tar_dm_batch[j][0]))
print(circuit(np.array(ini_state_batch), np.array(tar_dm_batch)))

Here is the output of the code:

64 64
64 64
64 64
64 64
64 64
64 64
64 64
64 64
64 64
64 64
Traceback (most recent call last):
  File "/home/timmy/2025Spring/test.py", line 65, in <module>
    print(circuit(np.array(ini_state_batch), np.array(tar_dm_batch)))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/timmy/.local/lib/python3.12/site-packages/pennylane/workflow/qnode.py", line 905, in __call__
    return self._impl_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/timmy/.local/lib/python3.12/site-packages/pennylane/workflow/qnode.py", line 868, in _impl_call
    tape = self.construct(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/timmy/.local/lib/python3.12/site-packages/pennylane/logging/decorators.py", line 61, in wrapper_entry
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/timmy/.local/lib/python3.12/site-packages/pennylane/workflow/qnode.py", line 854, in construct
    self._qfunc_output = self.func(*args, **kwargs)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/timmy/2025Spring/test.py", line 53, in circuit
    return qml.expval(qml.Hermitian(tar_dm_batch, wires=range(N)))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/timmy/.local/lib/python3.12/site-packages/pennylane/capture/capture_meta.py", line 89, in __call__
    return type.__call__(cls, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/timmy/.local/lib/python3.12/site-packages/pennylane/ops/qubit/observables.py", line 87, in __init__
    Hermitian._validate_input(A, expected_mx_shape)
  File "/home/timmy/.local/lib/python3.12/site-packages/pennylane/ops/qubit/observables.py", line 95, in _validate_input
    raise ValueError("Observable must be a square matrix.")
ValueError: Observable must be a square matrix.

The first ten 64 64 is the number of the rows and columns of the elements in tar_dm_batch, printed from line 63 and 64, which meets the qubit number N=6. However, the error ValueError: Observable must be a square matrix. is constantly thrown out at line 53, which is the return statement of circuit(). The only thing I did not follow is that the poster in #8124 uses jax, but I’m not familiar with it so I just used numpy. Is jax necessary in my case?
My qml.about():

Version: 0.40.0
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /home/timmy/.local/lib/python3.12/site-packages
Requires: appdirs, autograd, autoray, cachetools, diastatic-malt, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, tomlkit, typing-extensions
Required-by: PennyLane_Lightning, PennyLane_Lightning_GPU

Platform info:           Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Python version:          3.12.3
Numpy version:           2.2.2
Scipy version:           1.14.1
Installed devices:
- lightning.gpu (PennyLane_Lightning_GPU-0.40.0)
- lightning.qubit (PennyLane_Lightning-0.40.0)
- default.clifford (PennyLane-0.40.0)
- default.gaussian (PennyLane-0.40.0)
- default.mixed (PennyLane-0.40.0)
- default.qubit (PennyLane-0.40.0)
- default.qutrit (PennyLane-0.40.0)
- default.qutrit.mixed (PennyLane-0.40.0)
- default.tensor (PennyLane-0.40.0)
- null.qubit (PennyLane-0.40.0)
- reference.qubit (PennyLane-0.40.0)
None

CatalinaAlbornoz · April 11, 2025, 7:58pm

Thanks for sharing this code @Tz_19 ! I can replicate the error. My colleague Daniela will take a look at you code next week to see what the issue might be. I think we might need to add a for loop to break the batches somewhere. We’ll let you know what we find.

daniela.murcillo · April 14, 2025, 5:59pm

Hi, @Tz_19
Thank you for your question. I used your code and was able to reproduce the error.
The first thing is that you are trying to do parameter broadcasting for qml.Hermitian but this isn’t supported yet (see here).
Just to double check, I tried to follow the example presented for qml.transforms.broadcast_expand with qml.Hermitian in a simplified example and got the same error as you.

import pennylane as qml
from pennylane import numpy as np
qml.Hermitian.ndim_params = (2,)
dev = qml.device("lightning.qubit", wires=1)
@qml.qnode(dev)
def circuit(x):
    return qml.expval(qml.Hermitian(x, wires=range(1)))

expanded_circuit = qml.transforms.broadcast_expand(circuit)
x = np.array([[[0.2, 0.6],[0.5, 1.0]],[[0.2, 0.6],[0.5, 1.0]]], requires_grad=True)
print(expanded_circuit(x))

Given that in topic #8124 they were able to implement an example using qml.Hermitian, it will be wise to give JAX a chance. This demo and that forum discussion can be helpful.

Let us know of any questions you might have while trying this new approach.

Tz_19 · May 7, 2025, 8:28am

Thank you very much. I had succeeded in revising the code by the demo you mentioned. My code can run correctly now, although it runs very slowly. I will try to figure out the reason causing the code running slowly. Thank you @daniela.murcillo @CatalinaAlbornoz a lot for helping.

CatalinaAlbornoz · May 7, 2025, 10:07pm

I’m glad to hear this worked @Tz_19 ! Thank you for letting us know!

Topic		Replies	Views
How to perform optimisation in parallel? PennyLane Help	6	62	December 18, 2024
Questions about parallel execution PennyLane Help	7	355	August 3, 2023
Batching Inputs to Quantum Circuit PennyLane Help	14	3310	September 10, 2019
Backward function takes long and batches PennyLane Help	14	2599	July 4, 2023
Is there a way to parallelize the same circuit for multiple input data? PennyLane Help	18	2456	April 3, 2023

Parallel Circuit execution during optimisation

Related topics