I’m currently benchmarking several QML models with different architectures, trying to identify the fastest simulation backend for batched inputs, especially since some of the models are hybrid and built with TorchLayer.
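For context, the hybrid models follow the standard TorchLayer pattern, roughly like this (a simplified sketch based on the docs, not my actual architectures):

import pennylane as qml
import torch

n_qubits = 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def qnode(inputs, weights):
    # TorchLayer feeds the classical features in through `inputs`
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (2, n_qubits)}
qlayer = qml.qnn.TorchLayer(qnode, weight_shapes)
model = torch.nn.Sequential(torch.nn.Linear(3, n_qubits), qlayer)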
I noticed that even though default.qubit, lightning.qubit, and lightning.gpu all accept batched inputs, only default.qubit seems to actually benefit from batching in terms of execution speed: the execution time drops significantly, and the speedup scales with batch size.
To test this, I ran the following minimal example:
import time

import pennylane as qml
import torch

dev = qml.device("lightning.gpu", wires=3)

@qml.qnode(dev, interface="torch")
def circuit(x):
    qml.AngleEmbedding(x, wires=[0, 1, 2])
    return qml.expval(qml.PauliZ(0))

x_batch = torch.randn(5000, 3)

# --- Batched execution: one call with a (5000, 3) input ---
start_batch = time.time()
results_batch = circuit(x_batch)
end_batch = time.time()

# --- Sequential execution: 5000 separate single-input calls ---
start_seq = time.time()
results_seq = torch.stack([circuit(x) for x in x_batch])
end_seq = time.time()

print(f"Batched time:    {end_batch - start_batch:.4f} s")
print(f"Sequential time: {end_seq - start_seq:.4f} s")
The timings for lightning.qubit and lightning.gpu are only slightly different between the batched and sequential versions. This makes me wonder: do these devices actually support parameter broadcasting in the sense of processing batched inputs in parallel, or are the inputs unrolled and evaluated sequentially under the hood despite the batched input shapes being accepted?
Thanks a lot!