Gradient using samples

Hi!
I am currently building a hybrid quantum-classical model in which the outputs of a variational circuit serve as inputs to a neural network, from which a cost is calculated. The idea is to take a certain number of shots on the quantum computer, which generates a set of sampled binary bitstrings, use these as inputs to the neural network, and then calculate the cost in a vectorized manner to get the gradients. I hope this schematic figure gives an overview of what I want to accomplish:

What I technically want to do is generate a certain number of samples from the circuit with the qml.sample() function, using a device with, say, 100 shots, and use these essentially as a “batch” when performing “backpropagation” to update the variational parameters of the circuit. I have tried using the sample function this way; however, the gradient components are all zeros.
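
Roughly, the sampling QNode I tried looks like this (a sketch; QAOAcircuit is my ansatz from the attached file, and the device settings are placeholders):

dev = qml.device("default.qubit", wires=len(G.nodes), shots=100)

@qml.qnode(dev, interface="torch", diff_method="parameter-shift")
def sampler(gammas, betas):
    QAOAcircuit(gammas, betas, G)
    # (shots, num_wires) array of 0/1 bitstrings; it is the gradient of this
    # QNode whose components all come back as zeros
    return qml.sample()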

However, a current workaround I found is to define the cost using a for-loop: I loop a “shots” number of times, sample from the circuit using a single-shot device, pass the result through the neural network, and evaluate the average cost over these single-shot sampled instances. Sampling the single-shot device is essentially done by

def QAOAreturnPauliZExpectation(gammas, betas, G):
    # Apply the QAOA ansatz, then measure Pauli-Z on every qubit
    # (on a single-shot device, each expval is just the single ±1 outcome)
    QAOAcircuit(gammas, betas, G)
    return [qml.expval(qml.PauliZ(wires=i)) for i in range(len(G.nodes))]

There are obvious reasons why this type of workaround causes slow training. Firstly, the cost function has a for-loop in it (which I needed just to check whether my approach works; I’ll see if I can somehow vectorize it later).

def customcost(gammas, betas, G, qcircuit, neuralNet, adjacencymatrix):
    cost = 0
    for _ in range(100):
        x = qcircuit(gammas, betas, G).float()  # one shot of the circuit
        x = neuralNet(x)  # pass it through the neural network
        cost += EvaluateCutValue(adjacencymatrix, x) / 100  # average over 100 shots
    return cost  # returns a float

Here, qcircuit is a QNode on a single-shot device that runs the QAOAreturnPauliZExpectation function mentioned earlier.
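
For completeness, qcircuit is constructed roughly like this (a sketch; the device and interface choices mirror my attached code):

dev_single = qml.device("default.qubit", wires=len(G.nodes), shots=1)
qcircuit = qml.QNode(QAOAreturnPauliZExpectation, dev_single,
                     interface="torch", diff_method="parameter-shift")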

Secondly, this approach hinges on running the entire circuit num_shots times and performing expval at the end on a single-shot device, when the desired behaviour could essentially be achieved using the qml.sample() function instead. Additionally, I use parameter-shift as the diff method; I don’t think I can get around that, since the approach is heavily shot-based. Note that I am not interested in defining a QNode that takes the Pauli-Z expectation value at each wire on an analytic device, since the symmetry properties of the MaxCut problem cause the resulting string to be zeros for the most part.

I am therefore curious whether there is a way to use the qml.sample() function as a batch, or something like that, to aid in training, or whether there are other, smarter approaches to get the same behaviour on an analytic device.

Thanks!

AdressingSlowTraining.py (11.0 KB)
Here I have attached the code that I use. The training procedure where the variational circuit and neural network are combined happens in the code chunk between lines 219 and 260. In particular, the “customcost” function I defined is the cause of the bottleneck, and it is where I wish to implement the above logic of training on a batch of bitstring samples.

Hi @Viro.

I’m not sure if this will be helpful, but since PennyLane version 0.19 you can add a batch dimension to QNodes with @qml.batch_params. Here you can find an example of how to use it.

The basics are:

You first transform a QNode to support an initial batch dimension for operation parameters:

dev = qml.device("default.qubit", wires=3)  # device used by the QNode below

@qml.batch_params
@qml.beta.qnode(dev)
def circuit(weights):
    qml.StronglyEntanglingLayers(weights, wires=[0, 1, 2])
    return qml.expval(qml.Hadamard(0))

By applying the @qml.batch_params decorator to the QNode, you can now include a batch dimension for all QNode arguments when executing the QNode. The evaluated QNode will have an output of shape (batch_size,):

>>> batch_size = 3
>>> weights = np.random.random((batch_size, 10, 3, 3))
>>> circuit(weights)
tensor([-0.30773348,  0.23135516,  0.13086565], requires_grad=True)

I hope this helps!

Hi @CatalinaAlbornoz

This was not exactly what I had in mind, and after some further consideration I now have a clearer idea of what I want to achieve. As the image above suggests, I want to perform Pauli-Z basis measurements on all qubits, pass the results through a regular feed-forward neural network, and compute a cost function from its output. The way I wish to compute the gradient is something similar to a parameter-shift rule, since I use a shot-based device. Calling it a parameter-shift rule might be misleading; it may really be a finite-difference approach. Either way, taking inspiration from the Quantum gradients with backpropagation — PennyLane documentation tutorial, I wanted to create my own finite-difference version. In particular, the gradient with respect to the i-th quantum parameter is calculated from two evaluations of the quantum circuit at shifted parameter values, each of which is then processed by the NN, from which a cost is calculated. To get the expectation value, I simply use qml.sample() on the circuit, pass all samples through the NN, and calculate the average cost. The code snippet that conveys the idea is the following:

def parameter_shift_term(qnode, params, i):
    # Parameter-shift estimate of the gradient with respect to the i-th parameter
    shifted = torch.clone(params)

    # Forward shift: evaluate at params[i] + pi/2
    shifted[i] += np.pi / 2
    samples = torch.reshape(2 * qnode(shifted) - 1, shape=(100, len(G.nodes)))
    forward = EvaluateCutOnDataset(samples.type(torch.float32), adjacencymatrix)

    # Backward shift: subtract pi to land at params[i] - pi/2
    shifted[i] -= np.pi
    samples = torch.reshape(2 * qnode(shifted) - 1, shape=(100, len(G.nodes)))
    backward = EvaluateCutOnDataset(samples.type(torch.float32), adjacencymatrix)

    return 0.5 * (torch.mean(forward) - torch.mean(backward))

As you can see, I process the information returned from the circuit using a classical procedure, “EvaluateCutOnDataset”, which essentially calculates the MaxCut value for a given bit configuration. Repeating this for all variational parameters in the circuit should give me the entire gradient, as sketched below. A small working example of this “custom gradient procedure” is attached: paramshift.py (4.2 KB)
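
Assembling the entire gradient is then a loop over the parameters (a sketch; full_gradient is a hypothetical helper name):

def full_gradient(qnode, params):
    # Build the gradient component by component via the shift rule above
    grad = torch.zeros_like(params)
    with torch.no_grad():  # no autograd graph needed; the gradient is built manually
        for i in range(len(params)):
            grad[i] = parameter_shift_term(qnode, params, i)
    return grad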

Naturally, this approach loses all the gradient information that is inherently built into PennyLane, since I use qml.sample(). However, I hope to use this as the gradient function during training, since it essentially does the same thing one would do with a shot-based device to calculate the gradient of the quantum variational parameters.

There are two things I would love some input/help on. Firstly, is it possible to use this custom gradient function as the gradient function for a torch optimizer such as Adam or SGD? If so, how do I go about creating it (the pattern I imagine is sketched below)? If not, are there other ways, using built-in PennyLane functionality, to get the same behaviour during training?
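
What I imagine (a sketch, assuming the hypothetical full_gradient helper above; initial_params and num_steps are placeholders) is filling in .grad manually before each optimizer step:

params = torch.tensor(initial_params, requires_grad=True)
opt = torch.optim.Adam([params], lr=0.01)

for step in range(num_steps):
    opt.zero_grad()
    params.grad = full_gradient(qnode, params)  # custom shot-based gradient
    opt.step()  # Adam update using the manually assigned gradient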

Secondly, I’ll also try this myself, but some input on how to potentially parallelize this procedure over all trainable parameters would be incredibly helpful; one idea I will try is sketched below. I fear that training becomes slow due to the for-loop over all the parameters.
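
The idea (a rough, untested sketch) is to build all 2n shifted parameter vectors at once and evaluate them as a single batch, e.g. via @qml.batch_params:

n = len(params)
shifts = (np.pi / 2) * torch.eye(n)  # row i shifts only parameter i
batched = torch.cat([params + shifts,   # rows 0..n-1: forward shifts
                     params - shifts])  # rows n..2n-1: backward shifts
# Evaluate the batched circuit once, post-process each row through the NN,
# then grad[i] = 0.5 * (cost[i] - cost[i + n])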

Note that I can switch to interfaces other than torch if that helps realize this procedure. At most I’ll have 16 parameters. My hope is that I can scale this procedure to qubit counts of around 30.

I hope I have conveyed the approach that I wish to attempt, and that you have some insight into how I can optimize using this custom function. Thanks in advance for any help :smiley:

Hi @Viro, thanks for the details.

You cannot really take the derivative of samples. You can take the derivative of the expectation value calculated from samples, and that’s fine. Just not the samples themselves.

If you are using finite differences on a post-processed sample, you may be fine. But PennyLane cannot help you take the derivative of that, because PennyLane doesn’t know how you are post-processing the samples.
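
As a minimal illustration (a hypothetical one-qubit circuit, not your code), the derivative of an expectation value is well-defined even on a shot-based device:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1, shots=100)

@qml.qnode(dev, diff_method="parameter-shift")
def circuit(theta):
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))  # an expectation value, so it is differentiable

theta = np.array(0.5, requires_grad=True)
print(qml.grad(circuit)(theta))  # shot-based parameter-shift gradient estimate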

Please let me know if this helps or if you have any additional questions.