Backward function takes a long time, and batching


I’m working on a project where I use the PennyLane/PyTorch interface.
To do so, I have built a model inheriting from nn.Module, in which I define my circuit inside the forward function.

I’m working on an 8-qubit simulation with 3 simple layers. Each layer has around 20 rotations and 8 CNOTs. I optimize some of the rotation parameters. My data is quite simple (8 features).
However, the backward function is slow: it takes a few minutes for each sample of my data.

  1. Is this normal, or did I do something wrong somewhere?
  2. Can I use a DataLoader with batches for a simulation?
  3. For a simulation, is the gradient calculated classically using autograd, or the quantum way (meaning I can only access measurements, with all the problems that come along with that)?

Best regards, and thank you for your work on this great library,


Hi @barthelemymp,

To answer your questions (as best I can):

  1. If you have m parameters, it takes O(2m) circuit evaluations to compute the gradients (needed for backpropagation through the quantum part using PyTorch), i.e. roughly two evaluations per parameter. We have noticed that the timing usually reflects this pretty well. How does the run time of your backward evaluation compare with that of the forward evaluation?

  2. Batching is currently not supported (since none of the simulators support it), but it is a planned feature.

  3. All gradients are calculated the quantum way, but we recognize that this can be sped up if i) you are using a simulator which supports (classical) automatic differentiation, and ii) we were to add this awareness to PennyLane (i.e., it recognizes when the simulator can do gradient calculations classically). This is also on our roadmap, but it first requires better simulator support.
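
To make the 2m count in point 1 concrete, here is a minimal sketch assuming the gradient is obtained with a two-term shift rule (two shifted circuit evaluations per parameter). The function `expval` is a hypothetical toy stand-in for a circuit evaluation, not a PennyLane call: it returns the product of cos(theta_i), which is the expectation of Z on every qubit after applying RX(theta_i) to each qubit of |0...0>.

```python
import math

def expval(params):
    # Hypothetical stand-in for ONE circuit evaluation:
    # RX(theta_i) on each qubit of |0...0>, then <Z x Z x ... x Z>.
    return math.prod(math.cos(t) for t in params)

def shift_rule_grad(f, params, shift=math.pi / 2):
    # Two evaluations per parameter: f(theta + s) and f(theta - s).
    grad = []
    n_evals = 0
    for i in range(len(params)):
        plus = list(params)
        plus[i] += shift
        minus = list(params)
        minus[i] -= shift
        grad.append((f(plus) - f(minus)) / (2 * math.sin(shift)))
        n_evals += 2
    return grad, n_evals

params = [0.1, 0.5, -0.3]
grad, n_evals = shift_rule_grad(expval, params)
print(grad, n_evals)
```

With m = 3 parameters the sketch performs 6 evaluations of `expval`, so if one forward pass takes time t, the backward pass would be expected to take roughly 2·m·t under this assumption.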


Hi @nathan, thanks for your answer. Is there any update on batched evaluation (the second bullet point above), or should one still loop over the batch? As an example, please look at this snippet from the tutorial named "Quantum transfer learning":

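# Start with an empty (0, n_qubits) tensor and append one row per sample
q_out = torch.Tensor(0, n_qubits)
q_out = q_out.to(device)
for elem in q_in:
    # One circuit evaluation per element of the batch
    q_out_elem = quantum_net(elem, self.q_params).float().unsqueeze(0)
    q_out = torch.cat((q_out, q_out_elem))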

This is the bottleneck of the code: a for loop over the batch that evaluates the quantum circuit once per sample.

Hi @mamadpierre,

Your question is well-timed. Support for batching circuits is our next major priority to implement in PennyLane. You should expect to see some progress on that pretty soon.

The interesting thing about batching in the case of quantum computations is that there are so many possible things that you could “batch over”, e.g., parameter values (the most common use of batching in ML), measurement settings, size of circuits (as in your example), hyperparameter choices, ansatz choices, etc. One reason we have been holding off on implementing batching is that we want to make sure all these cases are covered as naturally as possible.
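
Until batching is available, the per-sample loop from the question can at least be written a little more compactly. The sketch below is an assumption-laden illustration, not the tutorial's code: `quantum_net` is a hypothetical stand-in (a plain PyTorch function) for the real circuit, and `run_batch` is a made-up helper name. It still calls the circuit once per sample, so it does not remove the bottleneck; it only replaces the repeated `torch.cat` with a single `torch.stack`.

```python
import torch

n_qubits = 4

def quantum_net(elem, q_params):
    # Hypothetical stand-in for the circuit: maps one sample of shape
    # (n_qubits,) to an output of shape (n_qubits,).
    return torch.cos(elem + q_params)

def run_batch(q_in, q_params):
    # Still one circuit call per sample; only the output assembly changes.
    outputs = [quantum_net(elem, q_params).float() for elem in q_in]
    return torch.stack(outputs)  # shape: (batch_size, n_qubits)

q_params = torch.zeros(n_qubits, requires_grad=True)
q_in = torch.rand(5, n_qubits)
q_out = run_batch(q_in, q_params)

# Gradients flow back through every per-sample call.
q_out.sum().backward()
print(q_out.shape, q_params.grad.shape)
```

If the outputs need to live on a specific device, as in the original snippet, a `.to(device)` call on the stacked result would be one way to do it.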

Thanks for your answer, :wink: