Circuit execution for batched inputs

I’m currently benchmarking several QML models with different architectures, trying to identify the fastest simulation backend when using batched inputs, especially since some of the models are hybrid and built with TorchLayer.

I noticed that even though default.qubit, lightning.qubit, and lightning.gpu all accept batched inputs, only default.qubit seems to actually benefit from batching in terms of execution speed: the time taken is significantly reduced, and the reduction scales with batch size.

To test this, I ran the following minimal example:

import pennylane as qml
import torch
import time

dev = qml.device("lightning.gpu", wires=3)

@qml.qnode(dev, interface="torch")
def circuit(x):
    qml.AngleEmbedding(x, wires=[0, 1, 2])
    return qml.expval(qml.PauliZ(0))

x_batch = torch.randn(5000, 3)

# --- Batched execution ---
start_batch = time.time()
results_batch = circuit(x_batch)
end_batch = time.time()

# --- Sequential execution ---
start_seq = time.time()
results_seq = torch.stack([circuit(x) for x in x_batch])
end_seq = time.time()

print(f"Batched time:   {end_batch - start_batch:.4f} s")
print(f"Sequential time:{end_seq - start_seq:.4f} s")

The timings for lightning.qubit and lightning.gpu are only slightly different between the batched and sequential versions, which makes me wonder: do these devices actually support parameter broadcasting in the sense of processing batched inputs in parallel, or are the inputs unrolled and evaluated sequentially under the hood despite accepting batched input shapes?

Thanks a lot!

Hi @JesusBG , welcome to the Forum!

default.qubit utilizes NumPy broadcasting under the hood:

Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.

So it makes sense that the benefit of using default.qubit with batched inputs scales well with batch size.
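
As a rough illustration of that quote (plain NumPy, not PennyLane), vectorizing an operation over a batch moves the per-element loop from Python into C. The specific operation below is just a stand-in:

import time
import numpy as np

x = np.random.randn(5000, 3)

# Python-level loop: one small NumPy call per row
start = time.time()
loop_result = np.array([np.cos(row).sum() for row in x])
loop_time = time.time() - start

# Broadcast/vectorized: a single NumPy call over the whole batch
start = time.time()
vec_result = np.cos(x).sum(axis=1)
vec_time = time.time() - start

print(f"Loop:       {loop_time:.4f} s")
print(f"Vectorized: {vec_time:.4f} s")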

Since lightning.qubit and lightning.gpu are already built on C++, I can see why the benefit from batching with those devices is lower.
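
If you want to check how a batched call is actually handled on a given device, one option is qml.Tracker, which records how many circuit executions the device performs. This is just a sketch (not from your benchmark), and the exact totals you see will depend on your PennyLane and Lightning versions:

import pennylane as qml
import torch

dev = qml.device("lightning.qubit", wires=3)

@qml.qnode(dev, interface="torch")
def circuit(x):
    qml.AngleEmbedding(x, wires=[0, 1, 2])
    return qml.expval(qml.PauliZ(0))

x_batch = torch.randn(10, 3)

# Record device statistics for a single batched call
with qml.Tracker(dev) as tracker:
    circuit(x_batch)

# The 'executions' entry shows how many circuit executions were run
# for this one batched call
print(tracker.totals)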

Feel free to take a look at Forum post #8403; you might find it useful too.

I hope this helps!
