Running circuits with many mid-circuit measurements

I’d like to simulate very long circuits with many mid-circuit measurements as space and time-efficiently as possible, and am having trouble figuring out what the best approach is in terms of simulator settings.

As a toy example, here is some code that generates a parameterized circuit that can potentially have very many mid-circuit measurements.

``````
import pennylane as qml
import torch

def qnode_gen(device, diff_method='backprop', postselect=None):
    @qml.qnode(device, interface='torch', diff_method=diff_method)
    def qnode(inputs, weight):
        measurements = []
        for real in inputs:
            qml.RY(real, 0)
            qml.RY(weight, 0)
            # Measure, then reset the qubit to |0> for the next iteration
            m = qml.measure(0, reset=True, postselect=postselect)
            measurements.append(m)
        return tuple(qml.expval(m) for m in measurements)
    return qnode

ITERATIONS = 4
torch.manual_seed(6)
inputs = torch.rand(ITERATIONS)
# `weight` was not defined in the snippet as posted; a shape-[1] tensor is assumed here
weight = torch.rand(1, requires_grad=True)
print(inputs)

qnode = qnode_gen(qml.device('default.qubit', wires=1))
fig, _ = qml.draw_mpl(qnode)(inputs, weight)
``````

This produces the following circuit diagram for a circuit with four measurements:

If you’re interested enough to see a testing script for this toy example, I have one here.
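As a cross-check for any simulator configuration, the toy circuit has a simple closed form. The sketch below assumes `postselect=None` and that `reset=True` returns the qubit to |0> after every measurement. Under those assumptions the two rotations combine into `RY(x_i + w)`, so the i-th measurement has expectation value sin²((x_i + w)/2), and the two-term parameter-shift rule reproduces its derivative with respect to `w`. The function names and input values are illustrative only and are not part of PennyLane.

```python
import math

def expected_outcome(x, w):
    """P(outcome = 1) after RY(x) then RY(w) on |0>, i.e. sin^2((x + w) / 2)."""
    return math.sin((x + w) / 2) ** 2

def parameter_shift_grad(x, w, shift=math.pi / 2):
    """Two-term parameter-shift derivative of expected_outcome with respect to w."""
    plus = expected_outcome(x, w + shift)
    minus = expected_outcome(x, w - shift)
    return (plus - minus) / (2 * math.sin(shift))

inputs = [0.1, 0.5, 0.9, 1.3]  # placeholder values, not the ones drawn by torch.rand
weight = 0.7                   # placeholder value

expvals = [expected_outcome(x, weight) for x in inputs]
shift_grads = [parameter_shift_grad(x, weight) for x in inputs]
# Analytic derivative of sin^2((x + w) / 2) with respect to w
exact_grads = [0.5 * math.sin(x + weight) for x in inputs]

for x, e, g, a in zip(inputs, expvals, shift_grads, exact_grads):
    print(f"x={x:.1f}  expval={e:.4f}  shift_grad={g:.4f}  exact_grad={a:.4f}")
```

Because each measurement depends only on its own input and the shared weight, these values can serve as a reference when comparing device and `diff_method` settings.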

I’ve tested a number of combinations of different torch and pennylane device settings without finding one that runs satisfyingly quickly for larger values of `ITERATIONS`. In summary:

1. `qml.device('default.qubit', diff_method='backprop')` has exponentially-sized saved tensors in the computation graph (saved by `BmmBackward0` and `PowBackward0`). This makes sense since it has to maintain a statistical mixture of measurement outcomes.
2. `qml.device('default.qubit', diff_method='parameter-shift')` doesn’t have exponentially-sized saved tensors in the computation graph, but still scales poorly. I expect this is for the same reason: the forward pass requires maintaining the statistical mixture.
   1. `qml.device('default.qubit', diff_method='parameter-shift', shots=10)` gives me a “probabilities do not sum to 1” error.
3. `qml.device('default.qubit.torch')` with either method causes an error where the fixed number of qubits isn’t enough to support the additional qubits that the automatic call to `defer_measurements` requires.
4. Using a `torch.device('cuda')` instead of `cpu` presents an issue in case 1) but not in case 2) involving not all tensors being on the same device.
5. It occurs to me that if all these measurements are post-selected, then there shouldn’t be an exponential scaling issue in cases 1) and 2), which makes me think my justification is wrong.

In more summary:
I have a number of issues or errors trying different options and figured it would be more time-efficient to ask what the best approach is before going down the debugging rabbit hole:

What is the best simulator and torch device configuration for circuits with many mid-circuit measurements, like the example above?

I suspect it will be `qml.device('default.qubit', diff_method='parameter-shift', shots=some_int)` on `torch.device('cuda')`.

Thank you very much for reading this, and any advice you can provide!

Hi @ahirth! Thanks for posting this, it’s really useful feedback for a feature we’re actively developing in 2024.

Historically, PennyLane has always used the deferred measurement approach to carry out mid-circuit measurements, which adds additional qubits. This was a known scaling issue, so we added a one-shot approach in the 0.35 release of PennyLane. In principle, it allows hundreds or thousands of mid-circuit measurements when working in a finite-shot setting.
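To put rough numbers on that scaling issue, here is a back-of-the-envelope sketch. It assumes a dense state vector of complex128 amplitudes (16 bytes each) and one extra wire per mid-circuit measurement in the deferred approach. The helper names are hypothetical, and real devices will have additional overhead.

```python
def deferred_wires(n_circuit_wires, n_mid_circuit_measurements):
    """Wire count if every mid-circuit measurement gets its own extra wire."""
    return n_circuit_wires + n_mid_circuit_measurements

def statevector_bytes(n_wires, bytes_per_amplitude=16):
    """Memory for a dense state vector: 2**n_wires amplitudes of complex128."""
    return (2 ** n_wires) * bytes_per_amplitude

for n_mcm in (4, 20, 40):
    wires = deferred_wires(1, n_mcm)
    print(f"{n_mcm} measurements -> {wires} wires -> {statevector_bytes(wires):,} bytes")
```

The memory doubles with every additional measurement, which is why a one-wire circuit with even a few dozen deferred measurements becomes impractical to simulate this way.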

However, your example seems to highlight a few issues that we’ve recorded on our GitHub repo:

• 5443 - issue with falling back to the deferred measurement approach when broadcasting.
• 5444 - issue with probabilities not summing up to one when broadcasting.
• 5442 - not completely related, but issues with using Torch when returning `qml.sample`.

The reason I mention broadcasting is because `weight` in your example has a shape of `[1]` and hence looks like a batch dimension to PennyLane. You could solve this for now by doing `weight = torch.rand(1).requires_grad_().squeeze()`.

Thank you! How lucky am I to not only receive a response so quickly, but for the feature I needed to be a recent addition.

To confirm: My target to get working should be a `default.qubit` device with finite `shots`, which should work on `cpu` and `cuda`. I noticed `diff_method` isn’t mentioned in the link for one-shot approaches, does that choice matter here?

Additionally I need to make sure that no broadcasting is happening, to avoid the bugs you found.

To make sure I understand broadcasting correctly: would changing the use of `weight` from `qml.RY(weight, 0)` to `qml.RY(weight[0], 0)` be an equivalent temporary solution? Here is my reasoning:

• My understanding is that the dimension of `inputs` is checked (in the `forward` function of `TorchLayer`), and any input with more than one dimension is considered batched. I don’t know if the dimension of other inputs, like `weight`/`weights` is checked explicitly.
• My guess is that non-`inputs` parameter batching is performed based on the shape of the value that each base level operation (eg `qml.RY`) receives, rather than by considering the explicit shape of non-`inputs` parameters of the qnode.
• This should enable us to send any arbitrary shape tensor into non-`inputs` parameters, as long as we make sure the operations themselves are receiving the right shape inputs (zero-dim tensors for `qml.RY`).
• If this guess is incorrect, every parameter needs to be flattened, or sent as its own argument.

Is there a way to explicitly check if batching is happening in my code to make sure I’m avoiding it for now (without modifying library code)?

Thank you so much!

1. I don’t think `qml.dynamic_one_shot` is available in the 0.35.1 release on pip unless I’m mistaken.
2. It is available on 0.36.0.dev0, but I’m running into an issue that looks like 5319.

It would be challenging to post a more detailed error message at the moment due to the size of my code, but if anything comes to mind as a fix or workaround, I’d greatly appreciate it.

Here’s a colab that highlights what I’m running into: link. In short, `default.qubit` with `parameter-shift` and finite `shots` either errors or yields undifferentiable results with interfaces `torch`, `tf` and `jax`, for the circuit example presented above, on version 0.36.0.dev0.

Might be related to 5316, but the errors look quite different.

Hey @ahirth! Apologies for the delay in getting back to you. I just wanted to quickly jump on here and let you know that I’m looking into the issue and will get back to you shortly!

Unfortunately I don’t think there’s an intermediate fix for now, but we have this issue tracked internally and will update you when we can.

Thank you very much @isaacdevlugt for looking into this!

Is a fix for this something that I could help push forward? I’m in need of both parameter-shift gradients and an efficient simulator for circuits with many mid-circuit measurements, and pennylane seems like the closest thing that exists.

If this isn’t an internal-only project, how large do you estimate it to be, and where should I look first?

Hey @ahirth!

There were lots of updates and fixes in the recent v0.36 release. Have you tried updating PennyLane and seeing if your issue is solved?

I have, but unfortunately my issue persists. Here’s a colab that demonstrates it: link.

To summarize, for parameterized circuits with many mid-circuit measurements:

1. `default.qubit` and `parameter-shift` work in analytic mode (`shots=None`), but take time exponential in circuit depth.
2. Other tested configurations using `parameter-shift` fail for one reason or another:
• `default.qubit` and `shots=10`
• `lightning.qubit` and `shots=None`
• `lightning.qubit` and `shots=10`
• all three of the above with the `torch` interface instead of `autograd`.

I hope these results are incorrect and I instead made a user error while switching from `torch` to `autograd`.

Hi @ahirth!

The problems you mention while using the `torch` interface might be resolved with this issue, which @mudit2812 is working on a fix for.

Though I’m interested to understand the issues you’re seeing with the autograd interface. So we’re on the same page, the following works for me in `v0.36`:

``````
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", shots=10)

@qml.qnode(dev)
def f(x):
    qml.RX(x, 0)
    for i in range(5):
        qml.measure(0)
    return qml.sample(qml.PauliX(0))

x = np.array(0.4)
f(x)
``````

This is using `default.qubit` and `shots=10`, but things could still break in a more complicated script. Are you able to share a minimal example?

For `lightning.qubit`, things work in the above if you update the device line:

``````
dev = qml.device("lightning.qubit", shots=10, wires=1)
``````

We’ve made some improvements to the PennyLane-Lightning device in the most recent release so it might be worth making sure you are using the latest version:

``````
pip install pennylane-lightning --upgrade
``````

``````
import pennylane as qml
from pennylane import numpy as np # import numpy from pennylane

dev = qml.device("default.qubit", shots=10)

@qml.qnode(dev)
def f(x):
    qml.RX(x, 0)
    for i in range(5):
        qml.measure(0)
    return qml.expval(qml.PauliX(0)) # return expval instead of sample

# `x` was not defined in the snippet as posted; a trainable scalar is assumed here
x = np.array(0.4, requires_grad=True)
print(f(x))
``````

The error I get in colab is the following:

``````
KeyError                                  Traceback (most recent call last)
117     try:
--> 118         return box_type_mappings[type(value)](value, trace, node)
119     except KeyError:

KeyError: <class 'int'>

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
15 frames
<ipython-input-15-324f31e5cf29> in <cell line: 16>()
15 print(f(x))

163             return ()
164
166         self._forward = ans
167

18             else:
19                 x = tuple(args[i] for i in argnum)
---> 20             return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
21         return nary_f
22     return nary_operator

181         difference being that it returns both the gradient *and* the forward pass
182         value."""
--> 183         vjp, ans = _make_vjp(fun, x)  # pylint: disable=redefined-outer-name
184
185         if vspace(ans).size != 1:

8 def make_vjp(fun, x):
9     start_node = VJPNode.new_root()
---> 10     end_value, end_node =  trace(start_node, fun, x)
11     if end_node is None:
12         def vjp(g): return vspace(x).zeros()

8     with trace_stack.new_trace() as t:
9         start_box = new_box(x, t, start_node)
---> 10         end_box = fun(start_box)
11         if isbox(end_box) and end_box._trace == start_box._trace:
12             return end_box._value, end_box._node

13                 else:
14                     subargs = subvals(args, zip(argnum, x))
---> 15                 return fun(*subargs, **kwargs)
16             if isinstance(argnum, int):
17                 x = args[argnum]

/usr/local/lib/python3.10/dist-packages/pennylane/workflow/qnode.py in __call__(self, *args, **kwargs)
1096
1097         try:
-> 1098             res = self._execution_component(args, kwargs, override_shots=override_shots)
1099         finally:
1100             if old_interface == "auto":

/usr/local/lib/python3.10/dist-packages/pennylane/workflow/qnode.py in _execution_component(self, args, kwargs, override_shots)
1050
1051         # pylint: disable=unexpected-keyword-arg
-> 1052         res = qml.execute(
1053             (self._tape,),
1054             device=self.device,

/usr/local/lib/python3.10/dist-packages/pennylane/workflow/execution.py in execute(tapes, device, gradient_fn, interface, transform_program, config, grad_on_execution, gradient_kwargs, cache, cachesize, max_diff, override_shots, expand_fn, max_expansion, device_batch_transform, device_vjp)
796         )
797
--> 798     return post_processing(results)
799
800

/usr/local/lib/python3.10/dist-packages/pennylane/transforms/core/transform_program.py in _apply_postprocessing_stack(results, postprocessing_stack)
86     """
87     for postprocessing in reversed(postprocessing_stack):
---> 88         results = postprocessing(results)
89     return results
90

/usr/local/lib/python3.10/dist-packages/pennylane/transforms/core/transform_program.py in _batch_postprocessing(results, individual_fns, slices)
56
57     """
---> 58     return tuple(fn(results[sl]) for fn, sl in zip(individual_fns, slices))
59
60

/usr/local/lib/python3.10/dist-packages/pennylane/transforms/core/transform_program.py in <genexpr>(.0)
56
57     """
---> 58     return tuple(fn(results[sl]) for fn, sl in zip(individual_fns, slices))
59
60

/usr/local/lib/python3.10/dist-packages/pennylane/transforms/dynamic_one_shot.py in processing_fn(results, has_partitioned_shots, batched_results)
157         mcm_samples = np.zeros((len(results), n_mcms), dtype=np.int64)
158         for i, res in enumerate(results):
--> 159             mcm_samples[i, :] = [res] if single_measurement else res[-n_mcms::]
160         mcm_mask = qml.math.all(mcm_samples != -1, axis=1)

44             ans = f_wrapped(*argvals, **kwargs)
45             node = node_constructor(ans, f_wrapped, argvals, kwargs, argnums, parents)
---> 46             return new_box(ans, trace, node)
47         else:
48             return f_raw(*args, **kwargs)

118         return box_type_mappings[type(value)](value, trace, node)
119     except KeyError:
--> 120         raise TypeError("Can't differentiate w.r.t. type {}".format(type(value)))
121
122 box_types = Box.types

TypeError: Can't differentiate w.r.t. type <class 'int'>
``````

Hi @ahirth,

If I understand correctly your original issue was having an error related to the number of qubits when using `@qml.defer_measurements` right? In that case you need to add one extra qubit per measurement. In your code above that means 5 extra qubits. You mentioned needing many mid-circuit measurements. Is it in a range where you could simply add more extra qubits or are they so many that you require an option without adding extra ones?

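``````
import pennylane as qml
from pennylane import numpy as np # import numpy from pennylane

# One wire for the circuit plus one extra wire per deferred measurement
dev = qml.device("default.qubit", wires=6, shots=10)

@qml.defer_measurements
@qml.qnode(dev)
def f(x):
    qml.RX(x, 0)
    for i in range(5):
        qml.measure(0)
    return qml.expval(qml.PauliX(0)) # return expval instead of sample

# `x` was not defined in the snippet as posted; a trainable scalar is assumed here
x = np.array(0.4, requires_grad=True)
print(f(x))
``````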

Thanks for taking a look!

I specifically wanted to avoid adding one qubit per measurement with `defer_measurements`, as this would be untenable when there are many mid-circuit measurements (1000+). This is why I’m trying to use finite-shot simulations, `parameter-shift` gradients and configurations that use `dynamic_one_shot` instead of `defer_measurements`.
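To illustrate why a shot-by-shot approach is attractive here, below is a toy classical model of the example circuit. It is not how `dynamic_one_shot` is implemented. It relies on the assumption that the qubit is reset after every measurement, so each outcome is an independent coin flip with probability sin²((x + w)/2), and it uses hypothetical function names and placeholder inputs. The cost grows linearly with the number of measurements and shots, and the parameter-shift estimate needs only two such runs.

```python
import math
import random

def sample_shot(inputs, weight, rng):
    """One shot: one 0/1 outcome per mid-circuit measurement, qubit reset after each."""
    outcomes = []
    for x in inputs:
        p_one = math.sin((x + weight) / 2) ** 2  # P(outcome = 1) for RY(x) RY(weight) on |0>
        outcomes.append(1 if rng.random() < p_one else 0)
    return outcomes

def estimate_expvals(inputs, weight, shots, rng):
    """Average the outcomes of each measurement over all shots."""
    totals = [0] * len(inputs)
    for _ in range(shots):
        for i, outcome in enumerate(sample_shot(inputs, weight, rng)):
            totals[i] += outcome
    return [total / shots for total in totals]

def parameter_shift_estimate(inputs, weight, shots, rng):
    """Finite-shot parameter-shift gradient of each expectation value w.r.t. weight."""
    plus = estimate_expvals(inputs, weight + math.pi / 2, shots, rng)
    minus = estimate_expvals(inputs, weight - math.pi / 2, shots, rng)
    return [(p - m) / 2 for p, m in zip(plus, minus)]

rng = random.Random(0)
inputs = [0.01 * i for i in range(1000)]  # 1000 mid-circuit measurements
weight = 0.7
grads = parameter_shift_estimate(inputs, weight, shots=200, rng=rng)
print(len(grads), grads[:3])
```

With 1000 measurements this would need 1001 wires under the deferred approach, whereas the sampling loop above performs only 2 × 200 × 1000 coin flips.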

Hi @ahirth , as @Tom_Bromley mentioned, I’m currently working on a bug fix to add better compatibility for pytorch with mid-circuit measurements. While the bug fix is still open, it has been updated to support the torch interface with parameter shift, so you’re welcome to install pennylane from the branch for the bug fix. You can find the pull request for the bug fix here, and you can install pennylane from that branch using `pip install git+https://github.com/PennyLaneAI/pennylane.git@dos-interfaces`. Feel free to continue the discussion here if you come across any other issues.

In the meantime, I will continue investigating the issues you brought up with the autograd interface.


Thanks @mudit, awesome work!

Hi @mudit,

Thank you very much! I’ve switched to this branch and done some preliminary testing. In case you haven’t seen this yet, I’m getting the following issue on it… Here’s a modification of the previously discussed script, with highlighted changes, tested with `!pip install git+https://github.com/PennyLaneAI/pennylane.git@dos-interfaces`

``````
import pennylane as qml
import torch

dev = qml.device("default.qubit", shots=10)

@qml.qnode(dev, interface='torch') # switch to torch interface
def f(x):
    qml.RX(x, 0) # remove extraneous instructions
    return qml.expval(qml.measure(0)) # REPLACE PauliX with measure

x = torch.tensor(0.4, requires_grad=True) # switch to torch tensor
result = f(x)
result.backward() # replace with torch gradient computation
``````

And the associated error:

``````
/usr/local/lib/python3.10/dist-packages/autoray/autoray.py:81: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
return func(*args, **kwargs)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-879285d05025> in <cell line: 13>()
11 x = torch.tensor(0.4, requires_grad=True) # switch to torch tensor
12 result = f(x)
---> 13 result.backward() # replace with torch gradient computation

1 frames
/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
520                 inputs=inputs,
521             )
523             self, gradient, retain_graph, create_graph, inputs=inputs
524         )

264     # some Python versions print out the first line of a multi-line function
265     # calls in the traceback and some print out the last line
--> 266     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
267         tensors,
``````

Note that this code block works just fine if you change `shots=10` to `shots=None`.