# Gradient using samples

Hi!
I am currently creating a hybrid quantum-classical model in which the outputs of a variational circuit are fed into a neural network, from which a cost is calculated. The idea is to take a certain number of shots on the quantum computer, which generates a set of samples of binary bitstrings, use these as inputs to a neural network, and then calculate the cost in a vectorized manner to get the gradients. I hope this schematic figure gives an overview of what I want to accomplish:

What I technically want to do is to generate a certain number of samples from the circuit using the qml.sample() function on a device with, say, 100 shots, and use these essentially as a "batch" when performing "backpropagation" to update the variational parameters of the circuit. I have tried to use the sample function this way; however, the gradient components are all zeros.

However, a workaround that I found was to define the cost using a for-loop: I loop a "shots" number of times, sample from the circuit using a single-shot device, pass the sample through the neural network, and evaluate the average cost over these single-shot instances. Sampling the single-shot device is essentially done by

```python
def QAOAreturnPauliZExpectation(gammas, betas, G):
    QAOAcircuit(gammas, betas, G)
    return [qml.expval(qml.PauliZ(wires=i)) for i in range(len(G.nodes))]
```

There are obvious reasons why this kind of workaround makes training slow. Firstly, the cost function has a for-loop in it (which I needed just to check whether my approach works; I'll see if I can somehow vectorize it later).

```python
def customcost(gammas, betas, G, qcircuit, neuralNet, adjacencymatrix):
    cost = 0
    for i in range(100):
        x = (qcircuit(gammas, betas, G)).float()  # one shot of the circuit
        x = neuralNet(x)  # pass it through the neural network
        cost += x  # accumulate the per-shot cost
    return cost / 100  # returns the average cost as a float
```

Here, qcircuit is a single shot device using the QAOAreturnPauliZExpectation function mentioned earlier.

Secondly, this approach hinges on running the entire circuit num_shots times and taking expval at the end on a single-shot device, when essentially the wanted behaviour could be achieved using the qml.sample() function instead. Additionally, I use parameter-shift as the diff_method; I don't think I can get around that, since the approach is heavily shot-based. Note that I am not interested in defining a QNode and taking the Pauli-Z expectation value at each wire, since this causes the resulting string to be zeros for the most part due to the symmetry properties of the MaxCut problem.

I am therefore curious whether there is a way to use the qml.sample() function as a batch, or something like that, to aid training, or whether there are other, smarter approaches to get the same behaviour on an analytic device.

Thanks!

Here I have attached the code that I use. The training procedure where the variational circuit and neural network are combined happens in the code chunk between lines 219 and 260. In particular, the "customcost" function that I defined is the bottleneck where I wish to implement the above logic of training on a batch of bitstring samples.

Hi @Viro.

I'm not sure if this can be helpful, but since PennyLane version 0.19 you can add a batch dimension to QNodes with `@qml.batch_params`. Here you can find an example of how to use it.

The basics are:

You first transform a QNode to support an initial batch dimension for operation parameters:

```python
@qml.batch_params
@qml.beta.qnode(dev)
def circuit(weights):
    qml.StronglyEntanglingLayers(weights, wires=[0, 1, 2])
    return qml.expval(qml.PauliZ(0))
```

By applying the `@qml.batch_params` decorator to the QNode, you can now include a batch dimension for all QNode arguments when executing the QNode. The evaluated QNode will have an output of shape `(batch_size,)` :

```python
>>> batch_size = 3
>>> weights = np.random.random((batch_size, 10, 3, 3))
>>> circuit(weights)
tensor([-0.30773348,  0.23135516,  0.13086565], requires_grad=True)
```

I hope this helps!


This was not exactly what I had in mind, and after some further consideration I now have a clearer idea of what I want to achieve. As the image above suggests, I want to perform Pauli-Z basis measurements on all qubits, pass the results through a regular feed-forward neural network, and calculate a cost function from its output. The way I wish to compute the gradient is to use something similar to a parameter-shift rule, since I use a shot-based device. Calling it a parameter-shift rule might be misleading; it might instead be a finite-difference approach. Either way, taking inspiration from the "Quantum gradients with backpropagation" PennyLane documentation tutorial, I wanted to create my own finite-difference version. In particular, the gradient with respect to the i-th quantum parameter is calculated from two shifted forward evaluations of the quantum circuit, each processed by the NN, from which a cost is calculated. To get the expectation value, I simply use qml.sample() on the circuit, pass all samples through the NN, and calculate the average cost. The code snippet that conveys the idea is the following:

```python
def parameter_shift_term(qnode, params, i):
    # Parameter-shift-style gradient with respect to the i-th parameter
    shifted = torch.clone(params)

    # Forward shift: evaluate at params[i] + pi/2
    shifted[i] += np.pi / 2
    forward = EvaluateCutOnDataset(
        torch.reshape(2 * qnode(shifted) - 1, shape=(100, len(G.nodes))).type(torch.float32),
        adjacencymatrix,
    )

    # Backward shift: shift down by pi to evaluate at params[i] - pi/2
    shifted[i] -= np.pi
    backward = EvaluateCutOnDataset(
        torch.reshape(2 * qnode(shifted) - 1, shape=(100, len(G.nodes))).type(torch.float32),
        adjacencymatrix,
    )

    return 0.5 * (torch.mean(forward) - torch.mean(backward))
```

As you can see, I process the information returned from the circuit using a classical procedure, "EvaluateCutOnDataset", which essentially calculates the MaxCut value for a given bit configuration. Repeating this for all variational parameters in the circuit should give me the entire gradient. A small working example of this "custom gradient procedure" is attached below: paramshift.py (4.2 KB)
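To give an idea of the classical post-processing, here is a simplified, NumPy-only sketch of what such a vectorized cut evaluation can look like (the name and signature below are illustrative; my actual Torch implementation is in the attached file). For spins s in {-1, +1}^n and adjacency matrix A, the cut value is (1/4)(sum_ij A_ij - s^T A s), which can be evaluated for a whole batch of samples at once:

```python
import numpy as np

def evaluate_cut_on_dataset(samples, adjacency):
    """Vectorized MaxCut value for a batch of spin configurations.

    samples:   (batch, n_nodes) array with entries in {-1, +1}
    adjacency: (n_nodes, n_nodes) symmetric 0/1 adjacency matrix
    """
    # cut(s) = (1/4) * (sum_ij A_ij - s^T A s), evaluated per batch row
    total_weight = adjacency.sum()
    quadratic = np.einsum("bi,ij,bj->b", samples, adjacency, samples)
    return 0.25 * (total_weight - quadratic)
```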

Naturally, this approach loses all the gradient information that is inherently built into PennyLane, since I use qml.sample(); however, I hope to use this as the gradient function during training, since it essentially does the same thing one would do with a shot-based device to calculate the gradient of the quantum variational parameters.

There are two things I would love some input/help on. Firstly, is it possible to use this custom gradient function as the gradient function for a torch optimizer such as Adam or SGD? If so, how do I go about creating it? If not, are there other ways, using built-in PennyLane functionalities, to get the same behaviour during training?

Secondly, although I'll also try this myself, some input on how to potentially parallelize this procedure over all trainable parameters would be incredibly helpful. I fear that training becomes slow due to the for-loop over all the parameters.

Note that I can switch from torch to another interface if that helps realize this procedure. At most I'll have 16 parameters. My hope is to scale this procedure to qubit counts of around 30.

I hope I have conveyed the approach that I wish to attempt and that you have some insight into how I can optimize using this custom function. Thanks in advance for any help!

Hi @Viro, thanks for the details.

You cannot really take the derivative of samples. You can take the derivative of an expectation value calculated from samples, and that's fine. Just not the samples themselves.

If you are using finite differences on a post-processed sample, you may be fine. But PennyLane cannot help you take the derivative of that, because PennyLane doesn't know how you are post-processing the samples.

Please let me know if this helps or if you have any additional questions.
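If you do want a Torch optimizer such as Adam or SGD to consume a hand-rolled shift-rule gradient, one common pattern is to wrap the non-differentiable pipeline in a `torch.autograd.Function` with a custom `backward`. Below is a minimal sketch; `black_box_cost` is just a stand-in for your sample-and-postprocess pipeline (here `cos`, so the gradient can be sanity-checked), and all names are illustrative:

```python
import math
import torch

def black_box_cost(params):
    # Stand-in for: run the shot-based circuit, pass the samples through
    # the NN, and average the cost. Torch treats this as non-differentiable.
    with torch.no_grad():
        return torch.cos(params).sum()

class ShiftRuleCost(torch.autograd.Function):
    @staticmethod
    def forward(ctx, params):
        ctx.save_for_backward(params)
        return black_box_cost(params)

    @staticmethod
    def backward(ctx, grad_output):
        (params,) = ctx.saved_tensors
        grad = torch.zeros_like(params)
        for i in range(params.numel()):
            shifted = params.detach().clone()
            shifted[i] += math.pi / 2   # forward-shifted evaluation
            f = black_box_cost(shifted)
            shifted[i] -= math.pi       # backward-shifted evaluation
            b = black_box_cost(shifted)
            grad[i] = 0.5 * (f - b)
        return grad_output * grad

params = torch.tensor([0.3, 1.2], requires_grad=True)
opt = torch.optim.Adam([params], lr=0.1)
opt.zero_grad()
loss = ShiftRuleCost.apply(params)
loss.backward()  # params.grad now holds the shift-rule gradient
opt.step()
```

Any optimizer from `torch.optim` then works unchanged, since it only ever looks at `params.grad`.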