Using PyTorch Gradients

Hello, I hope everyone on the Xanadu team is having a good holiday season.

I have a few questions regarding using PyTorch gradients with PennyLane:

  • I cannot find the source of this at the moment, but I recall seeing that if you want to calculate the gradient in a loss function you will need to use PennyLane with PyTorch. Is this still the case?
  • If you use PennyLane with PyTorch, you have to use torch.autograd and the PyTorch optimisers. How does this affect the use of the parameter-shift rules? If we were to use torch.autograd and then send this to real quantum hardware, would we still use parameter-shift rules and do the gradient calculation on real hardware? Or does the gradient calculation end up happening classically somewhere?
  • We can use Strawberry Fields as a backend to PennyLane, and also use PyTorch at the same time. I have run into a problem with this specific setup. I can create a model like:
def layer(v):
    qml.Rotation(v[0], wires=0)
    qml.Squeezing(v[1], 0.0, wires=0)
    qml.Rotation(v[2], wires=0)

    qml.Displacement(v[3], 0.0, wires=0)
    qml.Kerr(v[4], wires=0)

@qml.qnode(dev, interface='torch')
def quantum_neural_net(var, x=None):
    qml.Displacement(x, 0.0, wires=0)

    for v in var:
        layer(v)

    return qml.expval(qml.X(0))

Then probe it with:

num_layers = 2
theta_weights = torch.tensor(0.05*np.random.randn(num_layers, 5))
i = torch.tensor(1.0, requires_grad=True)
O = quantum_neural_net(theta_weights, x=i)

In this toy model, though, when I try to find the gradient, it thinks I have broken the graph:

dx = torch.autograd.grad(outputs=O, inputs=i, 
                                  grad_outputs=O.data.new(O.shape).fill_(1),
                                  create_graph=True, retain_graph=True)[0]

RuntimeError: One of the differentiated Tensors
appears to not have been used in the graph.
Set allow_unused=True if this is the desired behavior.

So, my question is, is the combination of PyTorch/SF and torch.autograd gradient calculation not available at the moment?

Thank you!

P.S. I understand you are all on your break now; I don’t expect a reply or anything until you are all back! Have a good break.

Hi @andrew!

I cannot find the source of this at the moment, but I recall seeing that if you want to calculate the gradient in a loss function you will need to use PennyLane with PyTorch. Is this still the case?

You might need to elaborate here, but if you mean simply compute the gradient of a hybrid classical-quantum cost function, PennyLane supports both autograd (the default) and TensorFlow.

Autograd

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(weights):
    qml.RX(weights[0], wires=0)
    qml.RY(weights[1], wires=0)
    return qml.expval(qml.PauliZ(0))

def cost(weights):
    return np.sum(circuit(weights) ** 2 - np.sin(weights))

weights = np.array([0.1, 0.2], requires_grad=True)
grad_fn = qml.grad(cost)
print(grad_fn(weights))

TensorFlow

import pennylane as qml
import tensorflow as tf

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev, interface="tf")
def circuit(weights):
    qml.RX(weights[0], wires=0)
    qml.RY(weights[1], wires=0)
    return qml.expval(qml.PauliZ(0))

def cost(weights):
    return tf.reduce_sum(circuit(weights) ** 2 - tf.sin(weights))

weights = tf.Variable([0.1, 0.2], dtype=tf.float64)

with tf.GradientTape() as tape:
    loss = cost(weights)

grad = tape.gradient(loss, weights)
print(grad)

Whichever framework you choose is up to you and your personal preference :) More details are available here: Gradients and training (PennyLane 0.33.0 documentation)

If we were to use torch.autograd, then send this to real quantum hardware, would we still use parameter shift rules and do the gradient calculation on real hardware?

That’s correct — if you use PyTorch and PennyLane, and your device is a real hardware device, then the hardware-compatible parameter-shift rule is used automatically.

If you would like some more fine-grained control, you can specify the diff_method when creating a QNode:

@qml.qnode(dev, diff_method="parameter-shift")

Other options include:

  • "finite-diff" - uses numerical finite differences
  • "reversible" - uses a form of backpropagation that is more memory efficient (simulator only)
  • "backprop" - uses standard backpropagation (simulator only).

This demo compares and contrasts some of these methods in more detail if you are interested: Quantum gradients with backpropagation | PennyLane Demos
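
To make the parameter-shift idea concrete, here is a minimal, framework-free sketch. It assumes a single-qubit circuit RX(theta) followed by a PauliZ measurement, whose expectation value is cos(theta). The helper names expval_z and parameter_shift are made up for illustration and are not PennyLane functions. The point is that the gradient is built only from evaluations of the circuit at shifted parameter values, so on hardware each evaluation would be a circuit execution and the classical side only combines the results.

```python
import math

def expval_z(theta):
    """Stand-in for a circuit: <Z> after RX(theta) on |0> equals cos(theta)."""
    return math.cos(theta)

def parameter_shift(f, theta, shift=math.pi / 2):
    """Two-term parameter-shift rule for a gate generated by a Pauli operator."""
    return (f(theta + shift) - f(theta - shift)) / (2 * math.sin(shift))

theta = 0.3
shift_grad = parameter_shift(expval_z, theta)  # uses two "circuit" evaluations
exact_grad = -math.sin(theta)                  # analytic derivative of cos(theta)

print("parameter-shift:", shift_grad)
print("analytic:       ", exact_grad)
```

Unlike finite differences, the shift here is large (pi/2) and the result is exact for this kind of gate, up to floating-point error.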

So, my question is, is the combination of PyTorch/SF and torch.autograd gradient calculation not available at the moment?

This should work, yes!

Your example appears to work for me after changing the x argument to be non-differentiable. In PennyLane, keyword arguments to QNodes are always non-differentiable parameters — if you wish for it to be differentiable, it will need to be a positional argument.

Can you try the following code snippet, and let me know if it works for you?

import pennylane as qml
import torch
import numpy as np

dev = qml.device("strawberryfields.fock", wires=1, cutoff_dim=5)

def layer(v):
    qml.Rotation(v[0], wires=0)
    qml.Squeezing(v[1], 0.0, wires=0)
    qml.Rotation(v[2], wires=0)

    qml.Displacement(v[3], 0.0, wires=0)
    qml.Kerr(v[4], wires=0)

@qml.qnode(dev, interface='torch')
def quantum_neural_net(var, x=None):
    qml.Displacement(x, 0.0, wires=0)

    for v in var:
        layer(v)

    return qml.expval(qml.X(0))

num_layers = 2
theta_weights = torch.tensor(0.05*np.random.randn(num_layers, 5), requires_grad=True)

i = torch.tensor(1.0, requires_grad=False)
loss = quantum_neural_net(theta_weights, x=i)

print("Loss:", loss)

loss.backward()
print("Gradient:", theta_weights.grad)

This gives me the following output:

Gradient: tensor([[-0.0924, -1.5613, -0.0922,  1.7454, -0.3835],
        [-0.0905, -1.6156, -0.0655,  1.7957, -0.2594]], dtype=torch.float64)

Note: The Kerr interaction does not support the parameter-shift rule, so including it in the circuit causes PennyLane to fall back to finite differences.
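
As a rough illustration of what a finite-difference fallback implies, the sketch below approximates a derivative numerically and compares it with the exact value. It is a generic central-difference formula applied to cos(theta) as a stand-in for an expectation value; the step sizes and the helper name central_difference are assumptions for illustration, and this is not the stencil or step size PennyLane itself uses.

```python
import math

def central_difference(f, x, h):
    """Numerical derivative from two evaluations a small distance h apart."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(theta):
    """Stand-in for an expectation value that depends on one parameter."""
    return math.cos(theta)

theta = 0.3
exact = -math.sin(theta)

# The approximation error depends on the chosen step size.
coarse_error = abs(central_difference(f, theta, h=1e-1) - exact)
fine_error = abs(central_difference(f, theta, h=1e-4) - exact)

print("error with h=1e-1:", coarse_error)
print("error with h=1e-4:", fine_error)
```

The result is an approximation whose quality depends on the step size, which is why it is described as numerical rather than analytic.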

A thought I just had based on my ‘note’ above; if you would prefer analytic gradients for a non-Gaussian photonic circuit, you could switch over to the strawberryfields.tf device and TensorFlow.

This combination allows you to set diff_method="backprop", so TensorFlow will perform standard backpropagation for analytic results:

import pennylane as qml
import tensorflow as tf
import numpy as np

dev = qml.device("strawberryfields.tf", wires=1, cutoff_dim=5)

def layer(v):
    qml.Rotation(v[0], wires=0)
    qml.Squeezing(v[1], 0.0, wires=0)
    qml.Rotation(v[2], wires=0)

    qml.Displacement(v[3], 0.0, wires=0)
    qml.Kerr(v[4], wires=0)

@qml.qnode(dev, interface='tf', diff_method="backprop")
def quantum_neural_net(var, x=None):
    qml.Displacement(x, 0.0, wires=0)

    for v in var:
        layer(v)

    return qml.expval(qml.X(0))

num_layers = 2
theta_weights = tf.Variable(0.05*np.random.randn(num_layers, 5))

i = tf.constant(1.0)

with tf.GradientTape() as tape:
    weights = tf.convert_to_tensor(theta_weights)
    loss = quantum_neural_net(weights, x=i)

print("Loss:", loss)

grad = tape.gradient(loss, theta_weights)
print("Gradient:", grad)

with output

Loss: tf.Tensor([1.4142735], shape=(1,), dtype=float32)
Gradient: tf.Tensor(
[[-0.94458413 -0.32569608 -0.8647036   0.59947348 -3.35271811]
 [-0.87427759 -1.62147141 -0.83985192  1.61316371 -3.1067791 ]], shape=(2, 5), dtype=float64)

(it won’t agree with the output from the post above, as I did not set the random seed in between runs!).

For more details, see the strawberryfields.tf documentation.

You might need to elaborate here, but if you mean simply compute the gradient of a hybrid classical-quantum cost function, PennyLane supports both autograd (the default) and TensorFlow.

Ah apologies for not being clear. I mean if I created a cost function that itself depends on a loss, something that could look something like:

def cost(circuit_out, circuit_in):
    grad = torch.autograd.grad(outputs=circuit_out, inputs=circuit_in, create_graph=True)[0]
    return torch.sum(grad - circuit_in)

You would then treat this cost “normally”, as you have in your examples. I ask as a follow up to a comment made here :

Yes, that constraint comes from Autograd, which is the default interface in PennyLane

I just wanted to be sure I interpreted this correctly, and that this is still the case.

Can you try the following code snippet, and let me know if it works for you?

Yes, that works now! Thank you very much!

Ah apologies for not being clear. I mean if I created a cost function that itself depends on a loss, something that could look something like:

Ah, I see!

Currently, differentiating loss functions that depend on the gradient is not supported in PennyLane, because second derivatives of QNodes are not yet supported. This is something we are working on adding: https://github.com/PennyLaneAI/pennylane/pull/961

There is one exception, however: if you use diff_method="backprop", then higher derivatives and these types of loss functions will work. The following simulators support backpropagation:

  • default.qubit.tf (must be used with interface="tf")
  • default.qubit.autograd (must be used with interface="autograd")
  • strawberryfields.tf (must be used with interface="tf").

Note that in this case, ‘autograd’ refers to the NumPy-based Autograd library that is available via from pennylane import numpy as np, not PyTorch Autograd! Apologies for the confusion.

At the moment, there is no PyTorch-compatible simulator that supports backprop, since this requires complex number support which is not yet fully available in PyTorch.
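
To see why a gradient-dependent loss needs second derivatives, here is a small framework-free sketch. It assumes a toy "circuit" whose expectation value is cos(theta) and a made-up loss equal to the squared first derivative. The helper shift_grad is hypothetical and simply applies the two-term shift rule, nested once to obtain a second derivative. It illustrates the calculus only and is not how PennyLane implements or plans to implement this.

```python
import math

def expval_z(theta):
    """Stand-in for a circuit: <Z> after RX(theta) on |0> equals cos(theta)."""
    return math.cos(theta)

def shift_grad(f, shift=math.pi / 2):
    """Return a function that evaluates df/dtheta with the two-term shift rule."""
    def grad(theta):
        return (f(theta + shift) - f(theta - shift)) / 2
    return grad

first = shift_grad(expval_z)  # first derivative of the circuit output
second = shift_grad(first)    # second derivative, by nesting the rule

theta = 0.3
# Toy loss that depends on the gradient: L(theta) = (f'(theta))**2
loss = first(theta) ** 2
# Chain rule: dL/dtheta = 2 * f'(theta) * f''(theta), so f'' is required
loss_grad = 2 * first(theta) * second(theta)

print("loss:", loss)
print("dL/dtheta:", loss_grad)
```

For this toy function the result matches sin(2*theta), since f' = -sin(theta) and f'' = -cos(theta).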

Thanks Josh!

So, if I used something like: strawberryfields.tf with interface="tf" and diff_method="backprop" it should work, then in the future there may be a diff_method="parameter-shift" option that will run on real hardware?

To further clarify, using PyTorch backend will not work? I have a small example running using PyTorch with PennyLane where the loss function depends on calculated gradients. This doesn’t give any errors, but also never seems to converge to a result. Is this just because it is simply not possible when using PyTorch, because of its lack of complex number support, which is necessary for the simulators?

So, if I used something like: strawberryfields.tf with interface="tf" and diff_method="backprop" it should work

Yep, that’s right!

then in the future there may be a diff_method="parameter-shift" option that will run on real hardware?

We have a parameter-shift rule that works with Xanadu’s current generation of hardware (the GBS X-series chips), but the parameter-shift rule currently does not extend to non-Gaussian operations available in the TensorFlow backend (such as the Fock and Kerr operations). This is an active area of research, however, so I can’t comment too long term!

I have a small example running using PyTorch with PennyLane where the loss function depends on calculated gradients. This doesn’t give any errors, but also never seems to converge to a result.

I’m somewhat surprised this works. Could you post a small minimal example?

the parameter-shift rule currently does not extend to non-Gaussian operations available in the TensorFlow backend

ok I see - so to run something on the X8 chip your model must be made of only Gaussian operations? Normal procedure for non-Gaussian operations is a simulator with a classical gradient finding method.

Could you post a small minimal example?

Sure, of course! As I said though, it doesn’t give an error, but it also does not converge to the result I expect. The method does work, though, if I just use “pure” PyTorch (i.e., a classical NN).

The point of the model is to solve an ODE. By minimising a loss function that compares the gradient of the network output and the value of the original function you can do this.

import pennylane as qml
import torch
import numpy as np

dev = qml.device("strawberryfields.fock", wires=1, cutoff_dim=10)

# make our model:
def layer(v):
    # Matrix multiplication of input layer
    qml.Rotation(v[0], wires=0)
    qml.Squeezing(v[1], 0.0, wires=0)
    qml.Rotation(v[2], wires=0)
    # Bias
    qml.Displacement(v[3], 0.0, wires=0)
    # Element-wise nonlinear transformation
    qml.Kerr(v[4], wires=0)

@qml.qnode(dev, interface='torch')
def quantum_neural_net(var, x):
    # Encode input x into quantum state
    qml.Displacement(x, 0.0, wires=0)
    # "layer" subcircuits
    for v in var:
        layer(v)
    return qml.expval(qml.X(0))

get some data:

n_events = 51
x_data = torch.linspace(-1,1,51,requires_grad=True)

declare ode we want to solve (very minimal example):

#dy/dx = x
def f_x(x):
    return x

This is our loss function. It has 2 main parts. It calculates the gradient of the network output against the input. Then, compares this against the value of the above function we wish to solve

def func_cost(X):
    outputs, derivatives = [], []

    # differentiate the network output with respect to each input point
    for x in X:
        model_output = quantum_neural_net(theta_weights, x)
        outputs.append(model_output)
        dy_dx = torch.autograd.grad(outputs=model_output, inputs=x,
                                    grad_outputs=torch.ones_like(model_output),
                                    create_graph=True, retain_graph=True)[0]
        derivatives.append(dy_dx)

    # squared error between the derivative and the right-hand side f_x
    error = 0
    for dy_dx, x in zip(derivatives, X):
        error += (dy_dx - f_x(x))**2
    # enforce a boundary condition at x_data[25] (x = 0): y(0) = 1
    _loss = error/n_events + (outputs[25] - 1)**2

    return _loss

and then this runs it:

num_layers = 3
theta_weights = torch.tensor(0.05*np.random.randn(num_layers, 5), requires_grad=True)#torch.rand(num_layers, 5, requires_grad=True)
opt = torch.optim.Adam([theta_weights], lr = 0.005)

opt.zero_grad()
loss = func_cost(x_data)
loss.backward()
opt.step()

print(loss.data.item())

This was based on a PyTorch script I have that does find a solution, and at the moment the code above does not give an error. Maybe an error should be getting flagged and it isn’t? Or is the graph getting broken at some point, so that SF doesn’t count the gradient inside the loss function?

I have examples now that work if I use a wider network and the strawberryfields.tf device, but I am curious as to what’s happening here.

Normal procedure for non-Gaussian operations is a simulator with a classical gradient finding method

On further reading of “Continuous-variable quantum neural networks” I see this is what is done there (“The networks are numerically simulated using the Strawberry Fields software platform”).
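
For reference, the loss structure described above (a derivative residual plus a boundary condition) can be sketched without any quantum or autodiff library. This is only a toy under stated assumptions: the quantum network is replaced by a hypothetical trial function y(x) = a*x**2 + b, its derivative is written out by hand, and plain gradient descent with an arbitrary learning rate stands in for Adam. It shows what a converged fit of dy/dx = x with y(0) = 1 looks like.

```python
def linspace(start, stop, n):
    """Evenly spaced points, including both endpoints."""
    step = (stop - start) / (n - 1)
    return [start + k * step for k in range(n)]

xs = linspace(-1.0, 1.0, 51)

def f_x(x):
    """Right-hand side of the ODE dy/dx = x."""
    return x

def loss(a, b):
    # Trial solution y(x) = a*x**2 + b, so dy/dx = 2*a*x and y(0) = b.
    residual = sum((2 * a * x - f_x(x)) ** 2 for x in xs) / len(xs)
    boundary = (b - 1.0) ** 2  # boundary condition y(0) = 1
    return residual + boundary

def loss_grad(a, b):
    # Hand-derived gradient of the loss with respect to a and b.
    da = sum(2 * (2 * a * x - f_x(x)) * 2 * x for x in xs) / len(xs)
    db = 2 * (b - 1.0)
    return da, db

a, b = 0.0, 0.0
lr = 0.1  # arbitrary learning rate for this toy problem
for _ in range(500):
    da, db = loss_grad(a, b)
    a -= lr * da
    b -= lr * db

print("a:", a, "b:", b, "loss:", loss(a, b))
```

The parameters approach a = 0.5 and b = 1, i.e. y(x) = x**2/2 + 1. Note that the update for a comes entirely from the derivative residual, so training depends on that term being differentiated with respect to the parameters.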

Hey @andrew!

ok I see - so to run something on the X8 chip your model must be made of only Gaussian operations?

Yes, the current X8 chip is composed of Gaussian input states, Gaussian operations, and photon counting (non-Gaussian) measurements. You can find more information here. The chip is accessed in PennyLane using the PennyLane-SF plugin.

Regarding the code snippet you shared, it does indeed seem odd that you’re getting a gradient of a function that includes the gradient of the quantum circuit. It would be interesting to see how the gradient compares to an identical circuit run using strawberryfields.tf and the TF interface? I also wonder if Torch is just ignoring the contributions from the gradient when differentiating the cost function.
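
One way to picture that last hypothesis is the toy calculation below. It is not a diagnosis of the PennyLane/Torch behaviour, only a sketch of the symptom it would produce. It assumes a hypothetical model whose input derivative is dy/dx = 2*a*x, and a flag (track_derivative, a made-up name) that switches off the dependence of that derivative on the parameter a, as would happen if the derivative were treated as a constant during backpropagation.

```python
xs = [-1.0 + k / 25 for k in range(51)]

def residual_loss(a):
    """Mean squared residual of dy/dx = x for a model with dy/dx = 2*a*x."""
    return sum((2 * a * x - x) ** 2 for x in xs) / len(xs)

def residual_loss_grad(a, track_derivative):
    """Chain rule: dL/da = sum_i dL/d(dydx_i) * d(dydx_i)/da."""
    total = 0.0
    for x in xs:
        dydx = 2 * a * x
        dloss_ddydx = 2 * (dydx - x) / len(xs)
        # If the derivative is treated as a constant, this factor is dropped.
        ddydx_da = 2 * x if track_derivative else 0.0
        total += dloss_ddydx * ddydx_da
    return total

a = 0.0
full_grad = residual_loss_grad(a, track_derivative=True)
dropped_grad = residual_loss_grad(a, track_derivative=False)

# Numerical check of the full gradient against the loss itself.
h = 1e-6
numeric_grad = (residual_loss(a + h) - residual_loss(a - h)) / (2 * h)

print("full gradient:   ", full_grad)
print("numeric gradient:", numeric_grad)
print("dropped gradient:", dropped_grad)
```

With the contribution dropped, the gradient with respect to a is zero even though the loss is not at its minimum, so an optimiser would run without error and never improve the residual term.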