Running circuits with many mid-circuit measurements

I’d like to simulate very long circuits with many mid-circuit measurements as space and time-efficiently as possible, and am having trouble figuring out what the best approach is in terms of simulator settings.

As a toy example, here is some code that generates a parameterized circuit that can potentially have very many mid-circuit measurements.

import pennylane as qml
import torch

def qnode_gen(device, diff_method='backprop', postselect=None):
    @qml.qnode(device, interface='torch', diff_method=diff_method)
    def qnode(inputs, weight):
        measurements = []
        for real in inputs:
            qml.RY(real, 0)
            qml.RY(weight, 0)
            m = qml.measure(0, reset=True, postselect=postselect)
            measurements.append(m)  # collect the measurement value for the return
        return tuple(qml.expval(m) for m in measurements)
    return qnode

ITERATIONS = 4  # number of mid-circuit measurements in the toy circuit
inputs = torch.rand(ITERATIONS)
weight = torch.rand(1).requires_grad_()

qnode = qnode_gen(qml.device('default.qubit', wires=1))
fig, _ = qml.draw_mpl(qnode)(inputs, weight)

This produces the following circuit diagram for a circuit with four measurements:

If you’re interested, I have a fuller testing script for this toy example here.

I’ve tested a number of combinations of torch and PennyLane device settings without finding one that runs satisfyingly quickly for larger values of ITERATIONS. In summary:

  1. qml.device('default.qubit', diff_method='backprop') has exponentially-sized saved tensors in the computation graph (Saved by BmmBackward0 and PowBackward0). This makes sense since it has to maintain a statistical mixture of measurement outcomes.
  2. qml.device('default.qubit', diff_method='parameter-shift') doesn’t have exponentially-sized saved tensors in the computation graph, but still scales poorly, presumably for the same reason: the forward pass still has to maintain the statistical mixture.
    1. qml.device('default.qubit', diff_method='parameter-shift', shots=10) gives me a “probabilities do not sum to 1” error.
  3. qml.device('default.qubit.torch') with either diff method errors because the device’s fixed number of wires can’t accommodate the extra qubits that the automatic call to defer_measurements requires.
  4. Using torch.device('cuda') instead of cpu raises a device-mismatch error (not all tensors on the same device) in case 1, but not in case 2.
  5. It occurs to me that if all these measurements are postselected, only a single outcome branch survives, so there shouldn’t be an exponential scaling issue in cases 1 and 2; this makes me think my justification above is wrong.

To summarize:
I’ve hit a number of issues or errors trying different options, and figured it would be more time-efficient to ask what the best approach is before going down the debugging rabbit hole:

What is the best simulator and torch device configuration for circuits with many mid-circuit measurements, like the example above?

I suspect it will be qml.device('default.qubit', diff_method='parameter-shift', shots=some_int) on torch.device('cuda').

Thank you very much for reading this, and any advice you can provide!

Hi @ahirth! Thanks for posting this, it’s really useful feedback for a feature we’re actively developing in 2024.

Historically, PennyLane has always used the deferred measurement approach to carry out mid-circuit measurements by adding additional qubits. This was a known scaling issue, and in the 0.35 release of PennyLane we added a one-shot approach, which in principle allows hundreds or thousands of mid-circuit measurements when working in a finite-shot setting.

However, your example seems to highlight a few issues that we’ve recorded on our GitHub repo:

  • 5443 - issue with falling back to the deferred measurement approach when broadcasting.
  • 5444 - issue with probabilities not summing up to one when broadcasting.
  • 5442 - not completely related, but issues with using Torch when returning qml.sample.

The reason I mention broadcasting is because weight in your example has a shape of [1] and hence looks like a batch dimension to PennyLane. You could solve this for now by doing weight = torch.rand(1).requires_grad_().squeeze().
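For clarity, the suggested fix works because `squeeze()` drops the size-1 dimension, turning the parameter into a 0-d scalar tensor while keeping gradient tracking intact:

```python
import torch

weight = torch.rand(1).requires_grad_()
print(weight.shape)  # torch.Size([1]): a length-1 dim that looks like a batch

# squeeze() removes the size-1 dimension, yielding a 0-d scalar tensor,
# so the parameter no longer looks broadcast/batched.
w = weight.squeeze()
print(w.shape)          # torch.Size([])
print(w.requires_grad)  # True: autograd tracking is preserved
```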

Thank you! How lucky am I to not only receive a response so quickly, but for the feature I needed to be a recent addition.

To confirm: my target to get working should be a default.qubit device with finite shots, which should work on both cpu and cuda. I noticed diff_method isn’t mentioned in the link for one-shot approaches; does that choice matter here?

Additionally I need to make sure that no broadcasting is happening, to avoid the bugs you found.

To make sure I understand broadcasting correctly: would changing the use of weight from qml.RY(weight, 0) to qml.RY(weight[0], 0) be an equivalent temporary solution?

  • My understanding is that the dimension of inputs is checked (in the forward function of TorchLayer), and any input with more than one dimension is considered batched. I don’t know if the dimension of other inputs, like weight/weights is checked explicitly.
  • My guess is that batching of non-inputs parameters is determined by the shape of the value each base-level operation (e.g. qml.RY) receives, rather than by the explicit shape of the qnode’s non-inputs parameters.
  • This should let us pass a tensor of any shape as a non-inputs parameter, as long as the operations themselves receive correctly-shaped inputs (zero-dim tensors for qml.RY).
  • If this guess is incorrect, every parameter needs to be flattened, or sent as its own argument.

Is there a way to explicitly check if batching is happening in my code to make sure I’m avoiding it for now? (without modifying library code :smile: )

Thank you so much!

To add to this:

  1. I don’t think qml.dynamic_one_shot is available in the 0.35.1 release on pip unless I’m mistaken.
  2. It is available on 0.36.0.dev0, but I’m running into an issue that looks like 5319.

It would be challenging to post a more detailed error message at the moment due to the size of my code, but if anything comes to mind as a fix or workaround, I’d greatly appreciate it.

Here’s a colab that highlights what I’m running into: link. In short, default.qubit with parameter-shift and finite shots either errors or yields undifferentiable results with the torch, tf, and jax interfaces, for the circuit example presented above, on version 0.36.0.dev0.

Might be related to 5316, but the errors look quite different.

Hey @ahirth! Apologies for the delay in getting back to you. I just wanted to quickly jump on here and let you know that I’m looking into the issue and will get back to you shortly!

Unfortunately I don’t think there’s an intermediate fix for now, but we have this issue tracked internally and will update you when we can :slight_smile: