Running circuits with many mid-circuit measurements

I’d like to simulate very long circuits with many mid-circuit measurements as space- and time-efficiently as possible, and I’m having trouble figuring out which simulator settings are the best approach.

As a toy example, here is some code that generates a parameterized circuit that can potentially have very many mid-circuit measurements.

import pennylane as qml
import torch

def qnode_gen(device, diff_method='backprop', postselect=None):
    @qml.qnode(device, interface='torch', diff_method=diff_method)
    def qnode(inputs, weight):
        measurements = []
        for real in inputs:
            qml.RY(real, 0)
            qml.RY(weight, 0)
            # one mid-circuit measurement (with reset) per input value
            m = qml.measure(0, reset=True, postselect=postselect)
            measurements.append(m)
        # return the expectation value of every mid-circuit measurement
        return tuple(qml.expval(m) for m in measurements)
    return qnode

ITERATIONS = 4
torch.manual_seed(6)
inputs = torch.rand(ITERATIONS)
weight = torch.rand(1).requires_grad_()
print(inputs)

qnode = qnode_gen(qml.device('default.qubit', wires=1))
fig, _ = qml.draw_mpl(qnode)(inputs, weight)

This produces the following circuit diagram for a circuit with four measurements:

If you’re interested in seeing a testing script for this toy example, I have one here.

I’ve tested a number of combinations of torch and PennyLane device settings without finding one that runs acceptably fast for larger values of ITERATIONS. In summary:

  1. qml.device('default.qubit', diff_method='backprop') has exponentially-sized saved tensors in the computation graph (Saved by BmmBackward0 and PowBackward0). This makes sense, since it has to maintain a statistical mixture of measurement outcomes. (A sketch of one way to inspect the saved-tensor sizes follows this list.)
  2. qml.device('default.qubit', diff_method='parameter-shift') doesn’t have exponentially-sized saved tensors in the computation graph, but still scales poorly. I expect for the same reason: the forward pass requires maintaining the statistical mixture.
    1. qml.device('default.qubit', diff_method='parameter-shift', shots=10) gives me a “probabilities do not sum to 1” error.
  3. qml.device('default.qubit.torch') with either method errors because its fixed number of wires isn’t enough to support the additional qubits that the automatic call to defer_measurements requires.
  4. Using torch.device('cuda') instead of cpu raises a “not all tensors are on the same device” error in case 1) but not in case 2).
  5. It occurs to me that if all these measurements are post-selected, then there shouldn’t be an exponential scaling issue in cases 1) and 2), which makes me think my justification is wrong.
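
For reference, here is one way I could imagine inspecting the sizes of the tensors saved for backward, using PyTorch’s saved-tensor hooks (a minimal sketch that reuses qnode, inputs and weight from the toy example above; torch.autograd.graph.saved_tensors_hooks is the standard PyTorch API for this):

import torch

saved_shapes = []

def pack(tensor):
    # record the shape of every tensor torch saves for the backward pass
    saved_shapes.append(tuple(tensor.shape))
    return tensor

def unpack(obj):
    return obj

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    out = torch.stack(qnode(inputs, weight)).sum()

# print the five largest saved tensor shapes
print(sorted(saved_shapes, key=lambda s: -torch.Size(s).numel())[:5])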

To summarize further:
I’ve hit a number of issues or errors trying different options, and figured it would be more time-efficient to ask what the best approach is before going down the debugging rabbit hole:

What is the best simulator and torch device configuration for circuits with many mid-circuit measurements, like the example above?

I suspect it will be qml.device('default.qubit', diff_method='parameter-shift', shots=some_int) on torch.device('cuda').

Thank you very much for reading this, and any advice you can provide!

Hi @ahirth! Thanks for posting this, it’s really useful feedback for a feature we’re actively developing in 2024.

Historically, PennyLane has always used the deferred measurement approach to carry out mid-circuit measurements by adding additional qubits. This was a known scaling issue, and in the 0.35 release of PennyLane we added a one-shot approach that, in principle, allows hundreds or thousands of mid-circuit measurements when working in a finite-shot setting.
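
To make the contrast concrete, here’s a minimal sketch of the two approaches (my own illustration rather than your exact circuit; note that on default.qubit with finite shots the one-shot approach is applied automatically, so no explicit transform is needed there):

import pennylane as qml

# Deferred measurements: every qml.measure is traded for an extra wire, so the
# simulated state doubles in size with each mid-circuit measurement.
dev_analytic = qml.device("default.qubit", wires=4)

@qml.defer_measurements
@qml.qnode(dev_analytic)
def deferred(x):
    qml.RX(x, 0)
    for _ in range(3):
        qml.Hadamard(0)
        qml.measure(0)
    return qml.expval(qml.PauliZ(0))

# One-shot approach: with finite shots, each mid-circuit measurement outcome is
# sampled shot by shot (via the qml.dynamic_one_shot transform under the hood),
# so no extra wires are needed.
dev_shots = qml.device("default.qubit", shots=100)

@qml.qnode(dev_shots)
def one_shot(x):
    qml.RX(x, 0)
    for _ in range(3):
        qml.Hadamard(0)
        qml.measure(0)
    return qml.expval(qml.PauliZ(0))

print(deferred(0.4), one_shot(0.4))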

However, your example seems to highlight a few issues that we’ve recorded on our GitHub repo:

  • 5443 - issue with falling back to the deferred measurement approach when broadcasting.
  • 5444 - issue with probabilities not summing up to one when broadcasting.
  • 5442 - not completely related, but issues with using Torch when returning qml.sample.

The reason I mention broadcasting is that weight in your example has a shape of [1] and hence looks like a batch dimension to PennyLane. You could solve this for now by doing weight = torch.rand(1).requires_grad_().squeeze().
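
To make the shape difference explicit (just a small illustration):

import torch

batched = torch.rand(1)            # shape [1]: looks like a batch of size 1 to PennyLane
scalar = torch.rand(1).squeeze()   # shape []: an ordinary scalar parameter, no batch dimension

print(batched.shape, scalar.shape)  # torch.Size([1]) torch.Size([])

Passing the zero-dimensional tensor to qml.RY avoids triggering the broadcasting code path.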

Thank you! How lucky am I to not only receive a response so quickly, but also to find that the feature I need is a recent addition.

To confirm: my target to get working is a default.qubit device with finite shots, which should work on both cpu and cuda. I noticed that diff_method isn’t mentioned in the link for one-shot approaches; does that choice matter here?

Additionally I need to make sure that no broadcasting is happening, to avoid the bugs you found.

To make sure I understand broadcasting correctly: would changing the use of weight from qml.RY(weight, 0) to qml.RY(weight[0], 0) be an equivalent temporary solution?

  • My understanding is that the dimension of inputs is checked (in the forward function of TorchLayer), and any input with more than one dimension is considered batched. I don’t know whether the dimensions of other arguments, like weight/weights, are checked explicitly.
  • My guess is that batching of non-inputs parameters is determined by the shape of the value each base-level operation (e.g. qml.RY) receives, rather than by the explicit shape of the non-inputs parameters of the qnode.
  • This should let us send a tensor of any shape into non-inputs parameters, as long as we make sure the operations themselves receive values of the right shape (zero-dimensional tensors for qml.RY).
  • If this guess is incorrect, every parameter needs to be flattened, or sent as its own argument.

Is there a way to explicitly check if batching is happening in my code to make sure I’m avoiding it for now? (without modifying library code :smile: )
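
(For instance, I’d guess that something along these lines could reveal a batch dimension, assuming the batch_size attribute on operations reflects broadcasting, though I’m not sure it’s the intended way:)

import pennylane as qml
import torch

print(qml.RY(torch.rand(1), 0).batch_size)            # 1: a shape-[1] parameter is treated as a batch
print(qml.RY(torch.rand(1).squeeze(), 0).batch_size)  # None: a zero-dimensional parameter is not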

Thank you so much!

To add to this:

  1. I don’t think qml.dynamic_one_shot is available in the 0.35.1 release on pip unless I’m mistaken.
  2. It is available on 0.36.0.dev0, but I’m running into an issue that looks like 5319.

It would be challenging to post a more detailed error message at the moment due to the size of my code, but if anything comes to mind as a fix or workaround, I’d greatly appreciate it.

Here’s a colab that highlights what I’m running into: link. In short, default.qubit with parameter-shift and finite shots either errors or yields undifferentiable results with interfaces torch, tf and jax, for the circuit example presented above, on version 0.36.0.dev0.

Might be related to 5316, but the errors look quite different.

Hey @ahirth! Apologies for the delay in getting back to you. I just wanted to quickly jump on here and let you know that I’m looking into the issue and will get back to you shortly!

Unfortunately I don’t think there’s an intermediate fix for now, but we have this issue tracked internally and will update you when we can :slight_smile:

Thank you very much @isaacdevlugt for looking into this!

Is a fix for this something that I could help push forward? I’m in need of both parameter-shift gradients and an efficient simulator for circuits with many mid-circuit measurements, and pennylane seems like the closest thing that exists.

If this isn’t an internal-only project, how large do you estimate it to be, and where should I look first?

Hey @ahirth!

There were lots of updates and fixes in the recent v0.36 release. Have you tried updating PennyLane and seeing if your issue is solved?

I have, but unfortunately my issue persists. Here’s a colab that demonstrates it: link.

To summarize, for parameterized circuits with many mid-circuit measurements:

  1. default.qubit and parameter-shift work in analytic mode (shots=None), but take time exponential in circuit depth.
  2. Other tested configurations using parameter-shift fail for one reason or another:
    • default.qubit and shots=10
    • lightning.qubit and shots=None
    • lightning.qubit and shots=10
    • all three of the above with the torch interface instead of autograd.

I hope these results are incorrect and I instead made a user error while switching from torch to autograd.

Hi @ahirth!

The problems you mention while using the torch interface might be resolved by the fix for this issue, which @mudit2812 is working on.

That said, I’m interested to understand the issues you’re seeing with the autograd interface. So that we’re on the same page, the following works for me in v0.36:

import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", shots=10)

@qml.qnode(dev)
def f(x):
    qml.RX(x, 0)
    for i in range(5):
        qml.Hadamard(0)
        qml.measure(0)
    return qml.sample(qml.PauliX(0))

x = np.array(0.4)
f(x)

This is using default.qubit and shots=10, but things could still break in a more complicated script. Are you able to share a minimal example?

For lightning.qubit, things work in the above if you update the device line:

dev = qml.device("lightning.qubit", shots=10, wires=1)

We’ve made some improvements to the PennyLane-Lightning device in the most recent release so it might be worth making sure you are using the latest version:

pip install pennylane-lightning --upgrade

I’ve made some modifications to your code block for the sake of computing gradients:

import pennylane as qml
from pennylane import numpy as np # import numpy from pennylane

dev = qml.device("default.qubit", shots=10)

@qml.qnode(dev)
def f(x):
    qml.RX(x, 0)
    for i in range(5):
        qml.Hadamard(0)
        qml.measure(0)
    return qml.expval(qml.PauliX(0)) # return expval instead of sample

x = np.array(0.4, requires_grad=True) # make input require grad
print(f(x))
qml.grad(f)(x) # compute the gradient

The error I get in colab is the following:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/autograd/tracer.py in new_box(value, trace, node)
    117     try:
--> 118         return box_type_mappings[type(value)](value, trace, node)
    119     except KeyError:

KeyError: <class 'int'>

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
15 frames
<ipython-input-15-324f31e5cf29> in <cell line: 16>()
     14 x = np.array(0.4, requires_grad=True)
     15 print(f(x))
---> 16 qml.grad(f)(x)

/usr/local/lib/python3.10/dist-packages/pennylane/_grad.py in __call__(self, *args, **kwargs)
    163             return ()
    164 
--> 165         grad_value, ans = grad_fn(*args, **kwargs)  # pylint: disable=not-callable
    166         self._forward = ans
    167 

/usr/local/lib/python3.10/dist-packages/autograd/wrap_util.py in nary_f(*args, **kwargs)
     18             else:
     19                 x = tuple(args[i] for i in argnum)
---> 20             return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
     21         return nary_f
     22     return nary_operator

/usr/local/lib/python3.10/dist-packages/pennylane/_grad.py in _grad_with_forward(fun, x)
    181         difference being that it returns both the gradient *and* the forward pass
    182         value."""
--> 183         vjp, ans = _make_vjp(fun, x)  # pylint: disable=redefined-outer-name
    184 
    185         if vspace(ans).size != 1:

/usr/local/lib/python3.10/dist-packages/autograd/core.py in make_vjp(fun, x)
      8 def make_vjp(fun, x):
      9     start_node = VJPNode.new_root()
---> 10     end_value, end_node =  trace(start_node, fun, x)
     11     if end_node is None:
     12         def vjp(g): return vspace(x).zeros()

/usr/local/lib/python3.10/dist-packages/autograd/tracer.py in trace(start_node, fun, x)
      8     with trace_stack.new_trace() as t:
      9         start_box = new_box(x, t, start_node)
---> 10         end_box = fun(start_box)
     11         if isbox(end_box) and end_box._trace == start_box._trace:
     12             return end_box._value, end_box._node

/usr/local/lib/python3.10/dist-packages/autograd/wrap_util.py in unary_f(x)
     13                 else:
     14                     subargs = subvals(args, zip(argnum, x))
---> 15                 return fun(*subargs, **kwargs)
     16             if isinstance(argnum, int):
     17                 x = args[argnum]

/usr/local/lib/python3.10/dist-packages/pennylane/workflow/qnode.py in __call__(self, *args, **kwargs)
   1096 
   1097         try:
-> 1098             res = self._execution_component(args, kwargs, override_shots=override_shots)
   1099         finally:
   1100             if old_interface == "auto":

/usr/local/lib/python3.10/dist-packages/pennylane/workflow/qnode.py in _execution_component(self, args, kwargs, override_shots)
   1050 
   1051         # pylint: disable=unexpected-keyword-arg
-> 1052         res = qml.execute(
   1053             (self._tape,),
   1054             device=self.device,

/usr/local/lib/python3.10/dist-packages/pennylane/workflow/execution.py in execute(tapes, device, gradient_fn, interface, transform_program, config, grad_on_execution, gradient_kwargs, cache, cachesize, max_diff, override_shots, expand_fn, max_expansion, device_batch_transform, device_vjp)
    796         )
    797 
--> 798     return post_processing(results)
    799 
    800 

/usr/local/lib/python3.10/dist-packages/pennylane/transforms/core/transform_program.py in _apply_postprocessing_stack(results, postprocessing_stack)
     86     """
     87     for postprocessing in reversed(postprocessing_stack):
---> 88         results = postprocessing(results)
     89     return results
     90 

/usr/local/lib/python3.10/dist-packages/pennylane/transforms/core/transform_program.py in _batch_postprocessing(results, individual_fns, slices)
     56 
     57     """
---> 58     return tuple(fn(results[sl]) for fn, sl in zip(individual_fns, slices))
     59 
     60 

/usr/local/lib/python3.10/dist-packages/pennylane/transforms/core/transform_program.py in <genexpr>(.0)
     56 
     57     """
---> 58     return tuple(fn(results[sl]) for fn, sl in zip(individual_fns, slices))
     59 
     60 

/usr/local/lib/python3.10/dist-packages/pennylane/transforms/dynamic_one_shot.py in processing_fn(results, has_partitioned_shots, batched_results)
    157         mcm_samples = np.zeros((len(results), n_mcms), dtype=np.int64)
    158         for i, res in enumerate(results):
--> 159             mcm_samples[i, :] = [res] if single_measurement else res[-n_mcms::]
    160         mcm_mask = qml.math.all(mcm_samples != -1, axis=1)
    161         mcm_samples = mcm_samples[mcm_mask, :]

/usr/local/lib/python3.10/dist-packages/autograd/tracer.py in f_wrapped(*args, **kwargs)
     44             ans = f_wrapped(*argvals, **kwargs)
     45             node = node_constructor(ans, f_wrapped, argvals, kwargs, argnums, parents)
---> 46             return new_box(ans, trace, node)
     47         else:
     48             return f_raw(*args, **kwargs)

/usr/local/lib/python3.10/dist-packages/autograd/tracer.py in new_box(value, trace, node)
    118         return box_type_mappings[type(value)](value, trace, node)
    119     except KeyError:
--> 120         raise TypeError("Can't differentiate w.r.t. type {}".format(type(value)))
    121 
    122 box_types = Box.types

TypeError: Can't differentiate w.r.t. type <class 'int'>

Hi @ahirth,

Thanks for sharing your code. I can replicate your error.

If I understand correctly, your original issue was an error related to the number of qubits when using @qml.defer_measurements, right? In that case you need to add one extra qubit per measurement; in your code above, that means 5 extra qubits. You mentioned needing many mid-circuit measurements. Are there few enough that you could simply add the extra qubits, or so many that you need an option that avoids adding them?

import pennylane as qml
from pennylane import numpy as np # import numpy from pennylane

dev = qml.device("default.qubit",wires=6, shots=10)

@qml.defer_measurements
@qml.qnode(dev)
def f(x):
    qml.RX(x, 0)
    for i in range(5):
        qml.Hadamard(0)
        qml.measure(0)
    return qml.expval(qml.PauliX(0)) # return expval instead of sample

x = np.array(0.4, requires_grad=True) # make input require grad
print(f(x))
qml.grad(f)(x) # compute the gradient

Thanks for taking a look!

I specifically wanted to avoid adding one qubit per measurement with defer_measurements, as this would be untenable when there are many mid-circuit measurements (1000+). This is why I’m trying to use finite-shot simulations, parameter-shift gradients and configurations that use dynamic_one_shot instead of defer_measurements.
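
For concreteness, the kind of configuration I’m ultimately trying to get working looks roughly like this (a sketch of the target setup; as described above, it currently errors for me with finite shots):

import pennylane as qml
import torch

dev = qml.device("default.qubit", shots=10)  # finite shots, so the one-shot MCM approach applies

@qml.qnode(dev, interface="torch", diff_method="parameter-shift")
def circuit(inputs, weight):
    measurements = []
    for real in inputs:
        qml.RY(real, 0)
        qml.RY(weight, 0)
        # mid-circuit measurement handled shot by shot, with no extra wires
        measurements.append(qml.measure(0, reset=True))
    return tuple(qml.expval(m) for m in measurements)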

Hi @ahirth , as @Tom_Bromley mentioned, I’m currently working on a bug fix to add better compatibility for pytorch with mid-circuit measurements. While the bug fix is still open, it has been updated to support the torch interface with parameter shift, so you’re welcome to install pennylane from the branch for the bug fix. You can find the pull request for the bug fix here, and you can install pennylane from that branch using pip install git+https://github.com/PennyLaneAI/pennylane.git@dos-interfaces. Feel free to continue the discussion here if you come across any other issues.

In the meantime, I will continue investigating the issues you brought up with the autograd interface.


Thanks @mudit, awesome work! :raised_hands:

Hi @mudit,

Thank you very much! I’ve switched to this branch and done some preliminary testing. In case you haven’t seen this yet, I’m getting the following issue on it. Here’s a modification of the previously discussed script, with the changes highlighted in comments, tested with !pip install git+https://github.com/PennyLaneAI/pennylane.git@dos-interfaces

import pennylane as qml
import torch

dev = qml.device("default.qubit", shots=10)

@qml.qnode(dev, interface='torch') # switch to torch interface
def f(x):
    qml.RX(x, 0) # remove extraneous instructions
    return qml.expval(qml.measure(0)) # REPLACE PauliX with measure

x = torch.tensor(0.4, requires_grad=True) # switch to torch tensor
result = f(x) 
result.backward() # replace with torch gradient computation
x.grad

And the associated error:

/usr/local/lib/python3.10/dist-packages/autoray/autoray.py:81: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  return func(*args, **kwargs)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-879285d05025> in <cell line: 13>()
     11 x = torch.tensor(0.4, requires_grad=True) # switch to torch tensor
     12 result = f(x)
---> 13 result.backward() # replace with torch gradient computation
     14 x.grad

1 frames
/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    520                 inputs=inputs,
    521             )
--> 522         torch.autograd.backward(
    523             self, gradient, retain_graph, create_graph, inputs=inputs
    524         )

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    264     # some Python versions print out the first line of a multi-line function
    265     # calls in the traceback and some print out the last line
--> 266     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    267         tensors,
    268         grad_tensors_,

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Note that this code block works just fine if you swap shots=10 for shots=None.

Hi @ahirth, thank you for your message. Mudit is taking a look at this issue. We should be back in the next couple of days with an answer.

Hi @ahirth ! I have opened an issue to track the differentiability error you have spotted here. I’m not actively working on this right now, but this is on the team’s radar, and we’re aiming to have this fixed for the next PennyLane release.

Thank you very much! Looking forward to a fix. Please let me know if there’s anything I can do to push it forward.