I’d like to simulate very long circuits with many mid-circuit measurements as space- and time-efficiently as possible, and I’m having trouble figuring out the best simulator settings for this.
As a toy example, here is some code that generates a parameterized circuit with an arbitrary number of mid-circuit measurements.
import pennylane as qml
import torch


def qnode_gen(device, diff_method='backprop', postselect=None):
    # Build a QNode that repeatedly rotates, measures, and resets wire 0.
    @qml.qnode(device, interface='torch', diff_method=diff_method)
    def qnode(inputs, weight):
        measurements = []
        for real in inputs:
            qml.RY(real, 0)
            qml.RY(weight, 0)
            # One mid-circuit measurement (with reset) per input value.
            m = qml.measure(0, reset=True, postselect=postselect)
            measurements.append(m)
        return tuple(qml.expval(m) for m in measurements)

    return qnode


ITERATIONS = 4
torch.manual_seed(6)
inputs = torch.rand(ITERATIONS)
weight = torch.rand(1).requires_grad_()
print(inputs)

qnode = qnode_gen(qml.device('default.qubit', wires=1))
fig, _ = qml.draw_mpl(qnode)(inputs, weight)
This produces the following circuit diagram for a circuit with four measurements:
If you’re interested enough to see a testing script for this toy example, I have one here.
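As a sanity check for any of the configurations, the toy circuit has a simple closed form, sketched below in plain Python. Because wire 0 is reset after every measurement, each iteration is independent: RY(x) followed by RY(w) acts on |0> as RY(x + w), so the expectation value of each measurement outcome (the probability of measuring 1) should be sin^2((x + w) / 2). The function name expected_outcomes and the sample values are made up for illustration, and the sketch assumes no postselection.

```python
import math


def expected_outcomes(inputs, weight):
    """Closed-form P(measure 1) for each iteration of the toy circuit.

    Assumes the qubit starts in |0> at every iteration (reset=True) and
    that no postselection is applied.
    """
    return [math.sin((x + weight) / 2) ** 2 for x in inputs]


# Hypothetical values, not the ones produced by torch.manual_seed(6).
example_inputs = [0.1, 0.5, 0.9, 1.3]
example_weight = 0.7
reference = expected_outcomes(example_inputs, example_weight)
print(reference)
```

Comparing a simulator's output against these values is a cheap way to check that a given device and diff_method combination is still computing the right thing as ITERATIONS grows.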
I’ve tested a number of combinations of torch and PennyLane device settings without finding one that runs satisfactorily quickly for larger values of ITERATIONS. In summary:
1. qml.device('default.qubit') with diff_method='backprop' has exponentially-sized saved tensors in the computation graph (saved by BmmBackward0 and PowBackward0). This makes sense, since it has to maintain a statistical mixture of measurement outcomes.
2. qml.device('default.qubit') with diff_method='parameter-shift' doesn’t have exponentially-sized saved tensors in the computation graph, but still scales poorly. I expect this is for the same reason: the forward pass requires maintaining the statistical mixture.
3. qml.device('default.qubit', shots=10) with diff_method='parameter-shift' gives me a “probabilities do not sum to 1” error.
4. qml.device('default.qubit.torch') with either method causes an error where the fixed number of qubits isn’t enough to support the additional qubits that the automatic call to defer_measurements requires.

- Using torch.device('cuda') instead of cpu causes an error in case 1), but not in case 2), about not all tensors being on the same device.
- It occurs to me that if all these measurements are post-selected, then there shouldn’t be an exponential scaling issue in cases 1) and 2), which makes me think my justification is wrong.
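The scaling seen with default.qubit can be estimated with some back-of-the-envelope arithmetic. The sketch below assumes the defer_measurements picture mentioned above, where every mid-circuit measurement followed by a reset adds one auxiliary wire, and it assumes complex128 amplitudes (16 bytes each). Both are assumptions about the simulator rather than confirmed internals, and statevector_bytes is a hypothetical helper name.

```python
def statevector_bytes(n_measurements, n_wires=1, bytes_per_amplitude=16):
    """Rough size of one dense statevector after deferring measurements.

    Assumption: each mid-circuit measurement with reset costs one extra
    wire, so the state has 2 ** (n_wires + n_measurements) amplitudes.
    """
    total_wires = n_wires + n_measurements
    return bytes_per_amplitude * 2 ** total_wires


for n in (4, 10, 20, 30):
    size = statevector_bytes(n)
    print(f"{n:>2} measurements -> {size / 2**20:,.3f} MiB per statevector")
```

Under these assumptions, a single state for 30 measurements already needs 32 GiB, before counting anything autograd saves for the backward pass.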
In more summary:
I’ve run into a number of issues and errors while trying different options, and figured it would be more time-efficient to ask what the best approach is before going down the debugging rabbit hole:
What is the best simulator and torch device configuration for circuits with many mid-circuit measurements, like the example above?
I suspect it will be qml.device('default.qubit', shots=some_int) with diff_method='parameter-shift', on torch.device('cuda').
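To illustrate why a finite-shot, parameter-shift setup might avoid the exponential cost, here is a minimal single-qubit sketch in plain Python rather than PennyLane. It samples one measurement outcome per iteration and then resets, so the work per shot grows linearly with the number of measurements, and it estimates the gradient with the two-term parameter-shift rule for RY. All names (sample_means, parameter_shift_grad) and values are hypothetical, and the sketch relies on the reset making every iteration independent, which holds for this toy circuit but not in general.

```python
import math
import random


def sample_means(inputs, weight, shots, rng):
    """Estimate P(measure 1) per iteration by sampling one trajectory per shot."""
    counts = [0] * len(inputs)
    for _ in range(shots):
        for i, x in enumerate(inputs):
            # RY(x) then RY(weight) on |0> gives P(1) = sin^2((x + weight) / 2).
            p_one = math.sin((x + weight) / 2) ** 2
            if rng.random() < p_one:
                counts[i] += 1
            # reset=True: the next iteration starts from |0> again,
            # so no state has to be carried forward.
    return [c / shots for c in counts]


def parameter_shift_grad(inputs, weight, shots, rng):
    """Two-term parameter-shift estimate of d<m_i>/d(weight).

    Each output depends on only one RY(weight) gate (its own iteration),
    so shifting the shared weight by +/- pi/2 is enough in this toy case.
    """
    plus = sample_means(inputs, weight + math.pi / 2, shots, rng)
    minus = sample_means(inputs, weight - math.pi / 2, shots, rng)
    return [(p - m) / 2 for p, m in zip(plus, minus)]


rng = random.Random(0)
example_inputs = [0.1, 0.5, 0.9, 1.3]  # hypothetical values
example_weight = 0.7
means = sample_means(example_inputs, example_weight, shots=20_000, rng=rng)
grads = parameter_shift_grad(example_inputs, example_weight, shots=20_000, rng=rng)
print(means)
print(grads)
```

The trade-off is statistical noise: each estimate carries sampling error that shrinks roughly as one over the square root of the number of shots, so the shot count has to be chosen against the accuracy the gradients need.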
Thank you very much for reading this, and any advice you can provide!