Differentiable circuit with sampling

Python functions that define circuits which return sampling are not differentiable. This is quite understandable. But doesn’t qml.expVal() use sampling to estimate the energy? Why circuits that return qml.expVal() can be differentiated?

Hey @Einar_Gabbassov!

But doesn’t qml.expVal() use sampling to estimate the energy?

Expectation values can be calculated either using sampling or analytically with access to the underlying state (only compatible with simulators). Analytic mode is specified by setting shots=None when loading your PennyLane device and is the default.

Why circuits that return qml.expVal() can be differentiated?

While it doesn’t make sense to differentiate a given sample from a random process (e.g. a quantum circuit), it is ok to differentiate a statistic defined on the random process (such as the expectation value).

The expectation value is a function \mu = f(\boldsymbol{\theta}) that maps from the circuit parameters \boldsymbol{\theta} to the corresponding value. With an exact simulator, we know \mu exactly while on an approximate simulator or quantum hardware we find an estimate \hat{\mu}.

The gradient of the expectation value g(\boldsymbol{\theta}) := \nabla_{\boldsymbol{\theta}} f(\boldsymbol{\theta}) is also well defined. Again, we can find g(\boldsymbol{\theta}) exactly with an exact simulator or estimate it when we have a limited number of shots. Check out Sec. 4 of this paper to get a deeper understanding.

In other words, having a finite number of shots just means that the gradient of the expectation value is approximate.

1 Like

hi @Tom_Bromley
Thank you for the detailed reply!
This brings me to the question which I wanted to ask but decided to differ until I get the answer for the first question.

In practice, one might want to use bias estimates. As far as I know expVal() gives unbiased estimate. This is not flexible.
Ideally, a circuit like this would be very useful

samples = qml.sample(Operator)
return compute_biased_average(samples)

The above could be differentiated. However, as far as I know we can’t do this for two reasons:

  1. Pennylane gives warning that circuits which use sampling can’t be differentiated.
  2. According to the Docs, circuit functions must not contain arbitrary code, except multiplication operations.

Unless, I miss some technicality (which is very likely), the above puzzles me a lot. We can use samples to compute expVal and do differentiation, but we can’t use samples to compute other estimates such that the circuit is still differentiable.
I hope I’m wrong, otherwise it makes my day so much harder :sweat_smile:

Hi @Einar_Gabbassov!

I believe what you want to do is possible, using PennyLane’s concept of ‘shot-batching’.

For example, consider the following QNode:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2, shots=[(1, 100)])

def circuit(params):
    for i in range(2):
        qml.RX(params[i, 0], wires=i)
        qml.RY(params[i, 1], wires=i)

    qml.CNOT(wires=[0, 1])

    for i in range(2):
        qml.RX(params[2 + i, 0], wires=i)
        qml.RY(params[2 + i, 1], wires=i)

    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

We have specified that the device has shots [(1, 100)], which corresponds to [(size_of_batch, number of batches)]. This means that the QNode will be executed for 100 shots on the remote hardware/simulator, however, these 100 shots will be post-processed in batches of 1. In essence, we are computing single-shot expectation values:

>>> params = np.ones([4, 2], dtype=np.float64)
>>> circuit(params)
[ 1.  1.  1. -1.  1. -1. -1. -1. -1.  1.  1. -1. -1. -1. -1.  1.  1. -1.
  1.  1. -1.  1.  1.  1. -1. -1.  1. -1. -1.  1. -1. -1. -1. -1. -1. -1.
 -1. -1. -1. -1. -1.  1. -1.  1.  1.  1. -1. -1. -1. -1.  1. -1.  1. -1.
 -1. -1. -1. -1. -1. -1. -1.  1. -1.  1. -1.  1. -1. -1.  1. -1. -1. -1.
  1.  1. -1.  1. -1.  1. -1.  1. -1.  1.  1. -1. -1.  1. -1. -1. -1. -1.
  1.  1. -1.  1. -1.  1. -1.  1.  1.  1.]

Note that we are not using sample, we are using expval, so this is differentiable!

Let’s create a cost function that processes this to compute the population variance:

def cost(params):
    samples = circuit(params)
    return np.var(samples, ddof=1)

This can be differentiated:

>>> cost(params)
>>> qml.grad(cost)(params)
[[-0.03636364  0.14343434]
 [ 0.13131313  0.04848485]
 [-0.02626263 -0.05252525]
 [ 0.07474747 -0.03030303]]

More details on shot-batching is available here: https://pennylane.readthedocs.io/en/stable/introduction/circuits.html#shot-batches

1 Like

@josh thanks a lot! This is great!

Hi, thanks everyone for the nice discussion.
I am trying to do the same with the strawberry.fock backend but I get a single output, even if I specify shots = [(1,100)].

More specifically, I am using

dev = qml.device("strawberryfields.fock", wires=1, cutoff_dim=10, shots=[(1,100)])
and the same photonic quantum neural network as in the tutorial (with expval(qml.X(0))

Does anyone have an idea why i do not get 100 output like in the answer from Josh? Thanks a lot

Hi @Oriel, unfortunately I believe that this is because the strawberryfields.fock device has not yet been updated to support shot batching.

Thanks for catching this! I will create a GitHub issue to alert the team.

Here is the issue:

I’ve been trying the proposed approach but it messes up outputs from the qml.ExpvalCost().

For example, I have a device with shots = 100.
I call qml.ExpvalCost(), this returns a single value, ok!
Next, I modify my device so that it does batch shots, dev.shots=[(1,100)]
and I run my custom cost function which is now differentiable because I use batches with qml.expval(). Then, I try to set the device back to the default behaviour dev.shots=100.
But this breaks qml.ExpvalCost() as it does not output a single value but an average energy for each Hamiltonian term.

I would post an error, but it is pretty hard to pin point a meaningful error trace.

How I could temporarily set batch shots and then revert back to default single batch behaviour without qml.ExpvalCost() doing funny stuff?

ah, I just realized that I have a single objective function which consists of two functions. One function tries to make use of the default device and the other function is trying to use the device with batch shots.
Trying manipulate the device from inside the functions doesn’t work.

Hey @Einar_Gabbassov, could you post a small code example showing what you mean?

On the same note, i would like to efficiently sample from circuits with shifted parameters, where the shifts correspond to the shift in the shift rule. I mean, if I have a circuit with a single qubit rotation with angle theta, I would like to sample from the circuit with angle theta + pi/2 and theta - pi/2.
I tried to do it myself but I am lacking efficiency since I basically do a for loop over all parameters. Does anyone know if this is already implemented or if there is a more efficient way to do so?
Thanks a lot

Hey @Oriel, there is a manual implementation of the shift rule available in the backpropagation tutorial:

def parameter_shift_term(qnode, params, i):
    shifted = params.copy()
    shifted[i] += np.pi/2
    forward = qnode(shifted)  # forward evaluation

    shifted[i] -= np.pi
    backward = qnode(shifted) # backward evaluation

    return 0.5 * (forward - backward) 

def parameter_shift(qnode, params):
    gradients = np.zeros([len(params)])

    for i in range(len(params)):
        gradients[i] = parameter_shift_term(qnode, params, i)

    return gradients

This assumes that the QNode only takes a single parameter, and this single parameter is a 1-dimensional NumPy array :slight_smile:

hi @josh

Sorry for taking so long, been busy.

I attached the snippet of the code at the end of this post. Basically, the code has a Hamiltonian with 4 terms and the device is configured to use batch shots i.e, shots=[1, num_batches]. Strangely, the output from the expValCost function are 4 values. This doesn’t make sense, because based on the default mechanics, expValCost should output expectations of energy of the entire Hamiltonian and not expectations of each Hamiltonian term.

I was able to rectify this by writing my own expValCost which works with batch shots. However, the fundamental problem arises when trying to do gradient descent optimization.

From what I understand, suppose we use MyExpValCost() that produces Hamiltonian energy samples (each energy sample is evaluated from a single shot but we have many batches just as you recommended).
This would yield an array of Hamiltonian energies:

energy_samples = MyExpValCost(params)
print("energy samples:", energy_samples)

The output for 10 batches is 10 expected energies of the Hamiltonian H. All 10 expectations are estimated from 1 shot.
That is, we have,
\{<H>_1, <H>_2, ..., <H>_{10}\}.
Next, I create a cost function like so
cost = \sum_{i=1}^{10} <H>_i.
As you claimed this is differentiable.
The fundamental problem with this is that it takes orders of magnitude more time to evaluate. Let’s say I do 100000 batches (each batch is one shot). Then the autograd should differentiate a sum which contains 100000 differentiable terms.

Just to conclude, there are two problems:

  1. ExpValCost has strange output when batch shots are used.
  2. Even if 1. is not the case, the differentiation with batch shots is extremely slow compared to just differentiation of a cost function which uses expValCost with NO batch shots.

I still need a differentiable circuit which can do sampling then process sampling for producing some biased estimates.

samples = qml.sample(Operator)
return compute_biased_average(samples)

Here is the code which shows that expValCost produces 4 terms instead of Hamiltonian energies:

objective_hamiltonian = qml.Hamiltonian([1,1,1,1], [qml.PauliZ(0), qml.PauliZ(1), qml.PauliZ(2), qml.PauliZ(3)])

def ansatz(params, wires, depth, h_objective_coeffs, **kwargs):
    params = params.reshape((depth, 2))
    # Initial state
    qml.BasisState(np.array([0]*num_qubits, requires_grad=False), wires=wires)
    # Add Hadamard gates
    for i in wires:

num_qubits = 4      
num_shots = 10000
# Use batch shots for differentiability
dev = qml.device('default.qubit', wires=num_qubits, shots=[1,num_shots])

depth = 1
penalty = 2

ansatz_settings = {
    "h_objective_coeffs": objective_hamiltonian.coeffs,
    "wires": dev.wires,
    "depth": depth

ansatz = partial(ansatz, **ansatz_settings)

objective_cost_fn = qml.ExpvalCost(ansatz, objective_hamiltonian, dev, optimize=True)

params = np.zeros((1,2))

Hi @Einar_Gabbassov. It seems that using optimize=True, which enables separating the Hamiltonian observable terms into qubit-wise commuting groups, causes the issue with shot-batching. You can disable the grouping by using optimize=False which might slightly slow down your simulations depending on the Hamiltonian used.

Please note that we are working on a functionality that supports using expval(H) directly and after having that functionality implemented in PennyLane, ExpvalCost will be very likely deprecated. Please let us know if you have any other questions. Thanks.

hi @sjahangiri
Thanks for the reply. Unfortunately, this workaround is quite costly as it will lead to too many measurements during gradient descent. I would still report this as a bug.

Importantly, my original question is still open. I’m trying to get a differentiable circuit which can use custom expval. As I mentioned before, besides the bug, the batch shots give extremely slow computation.

Could you guys please help me with this?

Just as a refresher, here is what I would like to have in a differentiable circuit body:

samples = qml.sample(Operator)
return compute_biased_average(samples)

Hey @Einar_Gabbassov,

Thanks for pointing out this issue, we have posted a bug report here.

I also agree with what you are saying about the shot batching approach for biasing the expectation value, i.e., that it presents a bottleneck for gradient calculations as you increase the number of samples. I’m not sure what the best approach is here. Perhaps you can work directly with the output probability distribution of the circuit using qml.probs (possibly in multiple bases), rather than accessing samples.

Dear Tom,
I just wanted to know if shot batching provides any computational speed up over a for loop on user’s side. If yes, i could be tempted to look into the issue. Thanks

Hi @Oriel,

Thanks for your question!

There should indeed be an advantage, namely that we “draw samples” once and then distribute the statistics.

A shot batch of the form shots=[100, 200, 5] will send a single job of size 305 shots to be executed, and then partition the results as the shot vector specifies.

Alternatively, having a for loop over separate executions, each of size 100, 200, 5 , will result in 3 jobs being submitted to the remote server.

Would you be interested in having a deeper look at how to directly work with the output probability distribution of the circuit using qml.probs, as Tom suggested?