Faster sampling of CV-circuit expectations


I am currently trying to sample from a CV circuit and i am looking to speed up the process (at least on the simulator). My problem is that the circuit has to be run multiple times in a for loop in order to generate the samples. I thought about using just-in-time compilation provided by the library numba to speed up the for loop but it seems that it is only compatible with python functions and not pennylane qnode or strawberry fields engine. Does anyone already used numba with strawberryfields, or has a better idea?

I saw a post proposing to use the Qulacs plugin. However, this only acts on the circuit, if i am not mistaken. Here i am trying to speed up an external for loop

Thanks a lot

Hey @Oriel,

Interesting idea to add a numba decorator to a QNode (and sad, but maybe not surprising, that it does not work out-of-the box).

I wonder if you want to post a small example that explains what your for loop runs over? We have thought a lot about how to cache or jit qnodes in future, and the example may give us a good use case!

Another option you could try is to set the mutable=False kwarg in the qnode, in which case the circuit representation is not recalculated behind the scenes - just make sure the circuit structure does not depend on the arguments.

Hi Maria,
Thanks for the quick response!

I am trying to see where we could potentially get a computational speed up and my guess is to look at pythonic for loop, since they are usually awfully slow. In my application, i have 2 main for loop, one to generate sample with the phonotic device:

n_wires = 3
n_shot = 1000
dev = qml.device("strawberryfields.fock", wires = n_wires, cutoff_dim=10, shots=1)

def qnn(parameter):
   #some CV circuit depending on parameter
   return [qml.expval(qml.QuadOperator(0),wires = k) for k in range(n_wires)]

def sample(parameter):
    sample = []
    for i in range(n_shot):
    return sample

Here i think we can’t not avoid the for loop since on the photonic device, we do not have access to the wavefunction as in the qubit system.

The second for loop is over the parameters when computing the gradient with the shift rule (i am not using autograd because i am not computing expectation value). So it consists of a for loop of length (2n), while n is the number of parameters.

The idea of using numba to perform these loops in parallel comes from this QONN paper, where they used it to speed up their calculations. They used a custom code. From what i see, all these for loop are independent so there is no reason why we could not compute them in parallel.
The problem is that numba only works for a small set of python code and i do not know straweberry.fields enough, to know if it could be easiliy adapted.

PS: setting mutable=False gave me a 10% speed up so thank you for that!

Aha lucky I asked! If I understand correctly, you want to draw single samples from an observable measurement. Of course, simulating the circuit 1000 times and using shots=1 is really wasteful. I doubt that any solution (besides maybe heavy parallelisation) will help here, instead, PennyLane should only simulate the circuit once.

Unfortunately, you are a victim here of our lower priority treatment of CV devices. I wonder if you could open an issue with a feature request on the pennylane-SF repo, if we see demand then it is easier to justify distributing resources!

For some context, in qubit-based devices what you want is a breeze:

You can either just use the sample() measurement type:

n_wires = 3
n_shot = 1000
dev = qml.device("default.qubit", wires=n_wires, shots=1000)

def qnn():
   return [qml.sample(qml.PauliX(i)) for i in range(n_wires)]

# [[ 1  1  1 ...  1  1  1]
#  [ 1 -1  1 ...  1  1 -1]
#  [-1 -1 -1 ...  1  1  1]]

Note that, if I remember correctly, the samples are not trainable though, since gradients of samples are ill defined.

This leads to the second idea, which uses the fact that gradients of 1-shot estimated observables are well-defined (the gradients are just very crude estimates of the real gradients). Your code on a qubit device could be rewritten using shots batching, which simulates the circuit once, and instead of returning one set of measurement results it returns 1000 sets. Each set is still interpreted as a single-shot estimation of an expectation:

n_wires = 3
dev = qml.device("default.qubit", wires=n_wires, shots=[(1, 1000)]) 
# can also use `shots=[1]*1000`, but the above syntax is a bit faster for technical reasons

def qnn():
   return [qml.expval(qml.PauliX(i)) for i in range(n_wires)]


# [[ 1.  1.  1.]
# [ 1.  1. -1.]
# [ 1.  1. -1.]
# ...
# [ 1.  1. -1.]
# [ 1. -1.  1.]
# [ 1.  1.  1.]]

The difference between the two are subtle and very conceptual, but the latter should be trainable and resembles your example more.

I am not sure how easy the second one would be to support on CV devices, but if there is demand we will try it!

1 Like

I already discussed shot batching for CV circuits a while ago, and an issue has been raised. However, I understood that it is not trivial to do since the wavefunction is not directly accessible.

From what you say, I think my best shot is to discretize my problem and to use the qubit system.
Thanks again!

Ah thanks for linking the issue. I don’t think there is a deeper problem preventing us from supporting this - if you can get one shot, we should be able to just return a batch of shots instead by repeating whatever is done behind the scenes! But I’ll check with my colleagues to be sure.

Let’s move the discussion to the issue and see what we can do. In the meantime, if you can switch to qubits that might be easier for now! :slight_smile:

1 Like