Hi @charliechiou,
At the moment you don't have any broadcasting in your circuit, nor anything that would make it run in parallel.
Broadcasting itself doesn't parallelize anything, but it makes the looping (e.g. the for loops) happen in C instead of Python, which is more efficient.
The code below shows how to do actual broadcasting.
```python
import torch
import torch.nn as nn
import pennylane as qml

class QNN(nn.Module):
    def __init__(self):
        super(QNN, self).__init__()
        self.qubit_num = 4
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        if torch.cuda.is_available():
            print("Using GPU")
            q_device = qml.device("default.qubit.torch", wires=self.qubit_num, torch_device="cuda")
        else:
            print("Using CPU")
            # Set max_workers depending on the number of cores you have available
            q_device = qml.device("default.qubit", wires=self.qubit_num, max_workers=2)
        # Set diff_method="adjoint" because backprop doesn't work with max_workers
        self.QNode = qml.QNode(self.qnn_circuit, q_device, interface="torch", diff_method="adjoint")
        self.measure_set = [0, 2]
        self.p_rotation = nn.Parameter(torch.rand(12) * torch.pi, requires_grad=True)

    def qnn_circuit(self, embedding):
        qml.AngleEmbedding(features=embedding, wires=range(4))
        for i in range(4):
            qml.RY(self.p_rotation[i], wires=i)
        return [qml.expval(qml.PauliZ(w)) for w in self.measure_set]

    def forward(self, x):
        x = x / torch.norm(x, dim=1, keepdim=True)  # normalize each input
        # You don't need to split the batch into its individual datapoints:
        # outputs = [torch.tensor(self.QNode(xi), device=self.device) for xi in x]
        # Instead, pass the whole batch so the looping happens in C via broadcasting
        outputs = self.QNode(x)
        return torch.stack(outputs, dim=1)  # stack along dim 1 so each row is one datapoint

model = QNN()
x = torch.randn(5, 4)
output = model(x)
# print("Output shape:", output.shape)
print(output)
```
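With broadcasting, a single `QNode` call evaluates the whole batch at once: each `qml.expval` comes back with the batch dimension attached, so for the `(5, 4)` input above the stacked output should have shape `(5, 2)`, i.e. 5 datapoints by 2 measured wires.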
In addition to broadcasting …

Since you're using `default.qubit`, you can set `max_workers` as described in the docs to use a pool of at most `max_workers` processes asynchronously. This doesn't guarantee that they will all be used; it's just a maximum. Note that backprop doesn't work with this, so you'll need to change the diff method to `diff_method="adjoint"`.
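If it helps, here's a minimal standalone sketch of just that device setup, stripped of the torch layer. The circuit itself is only illustrative, and the worker count of 2 is an arbitrary example you should tune to your available cores:

```python
import pennylane as qml
from pennylane import numpy as np

# A pool of at most 2 worker processes; backprop is incompatible with
# max_workers, so we switch to adjoint differentiation
dev = qml.device("default.qubit", wires=2, max_workers=2)

@qml.qnode(dev, diff_method="adjoint")
def circuit(theta):
    qml.RY(theta, wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

theta = np.array(0.5, requires_grad=True)
print(circuit(theta))            # expectation value
print(qml.grad(circuit)(theta))  # gradient via the adjoint method
```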
Take a look at the code above and let me know if you have any further questions. You can also check out our performance page to learn about other simulators we have.
I hope this helps!