Issues with backpropagation when using Parameter Broadcasting with PyTorch

Hello, I am working on a hybrid quantum-classical model with the following structure: [Classical] → [QVC]. Below is a minimal working example that is representative of my actual model.

import math
import pennylane as qml
import torch
import torch.nn as nn
import numpy as np
dev = qml.device("default.qubit", wires=1)
@qml.qnode(dev, interface = 'torch')
def simple_qubit_circuit(inputs, theta):
    qml.RX(inputs, wires=0)
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        quantum_weights = np.random.normal(0, np.pi)
        self.quantum_weights = nn.parameter.Parameter(
            torch.tensor(quantum_weights, dtype=torch.float32, requires_grad=True)
        )
        shapes = {
            "theta": 1
        }
        self.q = qml.qnn.TorchLayer(simple_qubit_circuit, shapes)
    
    def forward(self, input_value):
        return self.q(input_value)

class Q_PPO1(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared_network = nn.Linear(5, 1)
        self.value_network = QNet()

    def forward(self, obs):
        features = self.shared_network(obs).squeeze(1).to(torch.float64)
        value = self.value_network(features)
        return value

PPO_model = Q_PPO1()
x_train = torch.rand(10, 5) # Batch of 10 input vectors of dimension 5
y_train = torch.rand(10, 1)
y_out = PPO_model(x_train)
loss = (y_out - y_train).mean()
loss.backward()

loss.backward() raises the following error message.


RuntimeError: Function ExecuteTapesBackward returned an invalid gradient at index 0 - got [] but expected shape compatible with [1]

Note that in the forward() method I had to convert the features vector to the torch.float64 datatype. If I don't do this, I get the following error during forward propagation.

ValueError: probabilities do not sum to 1

I am not sure what's wrong here. Here is the output of qml.about():

Name: PennyLane
Version: 0.31.1
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: c:\users\aksi01\appdata\roaming\python\python38\site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane-Lightning, PennyLane-qiskit
Platform info:           Windows-10-10.0.19045-SP0
Python version:          3.8.15
Numpy version:           1.23.5
Scipy version:           1.10.1
Installed devices:
- default.gaussian (PennyLane-0.31.1)
- default.mixed (PennyLane-0.31.1)
- default.qubit (PennyLane-0.31.1)
- default.qubit.autograd (PennyLane-0.31.1)
- default.qubit.jax (PennyLane-0.31.1)
- default.qubit.tf (PennyLane-0.31.1)
- default.qubit.torch (PennyLane-0.31.1)
- default.qutrit (PennyLane-0.31.1)
- null.qubit (PennyLane-0.31.1)
- lightning.qubit (PennyLane-Lightning-0.31.0)
- qiskit.aer (PennyLane-qiskit-0.29.0)
- qiskit.basicaer (PennyLane-qiskit-0.29.0)
- qiskit.ibmq (PennyLane-qiskit-0.29.0)
- qiskit.ibmq.circuit_runner (PennyLane-qiskit-0.29.0)
- qiskit.ibmq.sampler (PennyLane-qiskit-0.29.0)

Hi @imakash, thank you for your question!

I ran the code just as you shared it and I don't get any errors. I'm running on Python 3.10.12.

Would you be able to create a new virtual environment with Python 3.10 and try again to see if this might be causing the issue? Let me know if you need guidance on how to create an environment. Alternatively you could also use Google Colab.

Let me know if this solves your issue!

Hello @CatalinaAlbornoz ,

Thanks for your response.

I can try out a new environment with an upgraded Python version. The only problem is that I am currently using PennyLane with other libraries that support only Python 3.8.

If we could make this work for Python v3.8, it would save me a lot of effort.

Hi @imakash,

Unfortunately our next release of PennyLane (v0.32) will no longer support Python 3.8 so you will need to upgrade your Python version to use the newest PennyLane features. You can still choose to stay with an older version of PennyLane if this is the only option that works for you, but I would strongly suggest updating your Python version if you can.

Hello @CatalinaAlbornoz ,

I am using CARLA, whose latest version (0.9.14) is compatible only with Python 3.7 and 3.8. As a result, I can't upgrade my Python version until they upgrade their build. It would be very helpful if you could suggest a fix to make the code work with Python 3.8.

I have another question. For the circuit specified in this question, I am wondering how the gradients for the parameters of the classical layer are calculated if the diff_method is set to "parameter-shift" for the QNode.

@qml.qnode(dev, interface='torch', diff_method="parameter-shift")

Does the gradient calculation for the classical layer parameters happen through the parameter-shift rule?

I understand @imakash.

I just ran your code above with Python 3.8 and the latest PennyLane version and I don’t get any errors so the problem wasn’t the Python version after all. I would recommend that you create a new Python environment and install only the needed packages there to avoid any conflicts.

In the future it may become tricky to provide you with support, since you will need to keep using the last PennyLane version that supports Python 3.8, which is PennyLane v0.31.1. However, the following tips can help you find a solution:

  1. Decouple your programs as much as possible so that it’s easier to debug (like the example code you sent before).
  2. If you need our support please provide the full context, a minimal but self-contained version of your code that can allow us to replicate the error, and the full error traceback. This makes it easier to uncover the source of the error.
  3. Start with simple examples and use print statements to verify that you're getting the expected outputs. For example, your error message suggests you've probably got a dimension mismatch somewhere, so you can verify this by printing the shapes of the outputs you're getting, as in the sketch below.
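
For example, a quick shape check could look like this (just a sketch, reusing the Q_PPO1 model and the x_train/y_train tensors from your original post; only the forward pass is run):

PPO_model = Q_PPO1()
x_train = torch.rand(10, 5)
y_train = torch.rand(10, 1)

features = PPO_model.shared_network(x_train).squeeze(1).to(torch.float64)
print(features.shape)   # expected: torch.Size([10])

value = PPO_model.value_network(features)
print(value.shape)      # compare this against y_train.shape, i.e. torch.Size([10, 1])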

I hope this can help you move forward. Please let me know if you have any other questions.

Regarding your question on gradients @imakash,

PennyLane will calculate the gradients for the quantum part (QNode) using the method you specify in diff_method. We give Torch the vjps for the quantum component, and Torch handles tracking the classical component and combining the two together.

Basically, we give Torch the gradients in a format it can accept, and Torch combines them with its own backprop.
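
As a small illustrative sketch (a toy example, not PennyLane's internals): in the snippet below, the gradient of theta comes from the parameter-shift rule, while the gradient of the Linear layer's weights is chained through Torch's own backprop.

import pennylane as qml
import torch
import torch.nn as nn

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev, interface="torch", diff_method="parameter-shift")
def circuit(x, theta):
    qml.RX(x, wires=0)
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

clayer = nn.Linear(5, 1)                       # classical part, differentiated by Torch autograd
theta = torch.tensor(0.2, requires_grad=True)  # quantum weight, differentiated via parameter shift

x = torch.rand(5)
out = circuit(clayer(x).squeeze(), theta)      # the classical output feeds the quantum circuit
out.backward()

print(theta.grad)           # vjp from the parameter-shift rule
print(clayer.weight.grad)   # chained through Torch's backprop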

Let me know if you have any further questions!


Hello @CatalinaAlbornoz ,

Thank you for your responses. Can you please explain what "vjps" are? Also, if possible, could you share references to help me understand how the parameter-shift rule works in a hybrid QVC?

I have already tried out the third step. There is no problem with forward propagation. The problem is caused by backward propagation, which I cannot debug. Anyway, thanks for your help.

Hello @imakash,

VJPs could refer to:
https://pytorch.org/functorch/nightly/generated/functorch.vjp.html

This document has a good explanation of parameter shift rule: Parameter-shift rules — PennyLane

requires_grad=True is used for the backward passes of the optimization process (OpenAI), and it cannot work correctly if there is an issue with other parts of the code. The Quanvolutional Neural Networks and Quantum transfer learning demos have examples of requires_grad being used.


Hey @imakash! VJP = vector-Jacobian product. I think this article in the JAX documentation does a great job of explaining things: The Autodiff Cookbook — JAX documentation
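
As a tiny illustration (just a sketch in plain PyTorch, nothing PennyLane-specific), a VJP contracts a vector with the Jacobian of a function without ever building the full Jacobian:

import torch

def f(x):
    return torch.stack([x[0] ** 2, x[0] * x[1]])

x = torch.tensor([1.0, 2.0])
v = torch.tensor([1.0, 1.0])   # the "vector" pulled back through the Jacobian

# Returns f(x) and v @ J_f(x); here J = [[2, 0], [2, 1]], so the VJP is [4, 1]
out, vjp_val = torch.autograd.functional.vjp(f, x, v)
print(vjp_val)   # tensor([4., 1.])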

For parameter-shift rule information, we have a number of resources. However, I recommend checking out our YT video: https://youtu.be/CRafKy6wsbY (there are two parts).

Let us know if that helps!


You’ve got great answers @imakash!

Regarding your original question: since your code works for me with Python 3.8 and PennyLane v0.31.1, I would still encourage you to keep running this section of the code until you manage to get it working on your system.

Let us know if you manage to make it work.

Hey @isaacdevlugt @CatalinaAlbornoz ,

I have a fundamental question regarding the training of a hybrid quantum-classical model. Imagine there are two cases. In case 1, I train the entire model using backpropagation, and in case 2, I use the parameter-shift rule to calculate the gradients of the QVC's parameters while the gradients for the rest of the model are calculated using backpropagation. If the weight initialization is done using the same seed, should I expect to see an identical loss curve in both cases?

Hello @imakash,

The ‘Quantum gradients with backpropagation’ demo shows different outputs of the same circuit for the backpropagation vs. parameter-shift rule examples, seen in ‘print(circuit(params))’. The loss curves would be different because the way the circuits are calculated is different, I believe.

Reference:

  1. Quantum gradients with backpropagation | PennyLane Demos

Hello @kevinkawchak ,

Thanks for your response. I understand that the methods are different. However, if the loss curves are different, it means that the gradient values calculated by the two approaches are different, because everything else in the two cases is the same. I don't understand why this should be the case.

According to this paper and also the paper by Dr. Schuld on the parameter-shift rule, the parameter-shift rule gives the exact gradient of the QVC if the ansatz is composed of RX, RY, and RZ gates (basically gates whose generators have just two distinct eigenvalues). I believe gradient calculation by backpropagation also gives the exact value and doesn't make any approximation.
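
For instance, a quick numerical sanity check for the simple circuit from my original post (where ⟨Z⟩ = cos(x)·cos(theta), so the exact gradient with respect to theta is -cos(x)·sin(theta)) shows the shift rule reproducing the analytic value exactly:

import numpy as np

x, theta = 0.5, 0.2
f = lambda th: np.cos(x) * np.cos(th)   # <Z> for RX(x) followed by RY(theta) on |0>

exact = -np.cos(x) * np.sin(theta)                            # analytic derivative
shift = 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))   # parameter-shift rule

print(exact, shift)   # both ≈ -0.174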

To conclude, my concern is why the gradient values should be different in the two approaches.

Hi @imakash, the parameter-shift rule indeed gives exact gradients. Did you find a case where it gives you a different result from backpropagation? In the demo on Quantum gradients with backpropagation the graphs show a comparison of runtime, not loss.

Please let me know if this answers your question!

Hello @CatalinaAlbornoz ,

In my work, I tried backpropagation as well as the parameter-shift rule for gradient calculation, and I got different results even for the same seed value (weight initialization). That is why I asked; my question is not related to the demo.

Looking at the curves, I think parameter shift is suffering from an overshooting problem due to a large step size, similar to the problem that exists with naive gradient descent without an Adam/RMSprop optimizer. Therefore, I think that even though the gradients are the same, the optimizers are probably behaving differently in the two cases (even though I explicitly define the optimizer as Adam in both). I am not sure about my hypothesis, though, and would need someone from PennyLane to comment on this.

Hi @imakash, if you define the optimizer as Adam then that's the optimizer being used. If you share a minimal but self-contained version of your code showing the different gradients for parameter-shift and backprop, I can try to replicate your error. Here's a small example showing that the gradients are indeed the same:

import pennylane as qml
import torch

dev = qml.device("default.qubit", wires=1)
@qml.qnode(dev, interface = 'torch', diff_method='backprop')
def simple_qubit_circuit(inputs, theta):
    qml.RX(inputs, wires=0)
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

inputs = torch.tensor(0.5, requires_grad=False)
theta = torch.tensor(0.2, requires_grad=True)
result = simple_qubit_circuit(inputs, theta)

print(result)

result.backward()
print(theta.grad)

@qml.qnode(dev, interface = 'torch', diff_method='parameter-shift')
def simple_qubit_circuit(inputs, theta):
    qml.RX(inputs, wires=0)
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

inputs = torch.tensor(0.5, requires_grad=False)
theta = torch.tensor(0.2, requires_grad=True)
result = simple_qubit_circuit(inputs, theta)

print(result)

result.backward()
print(theta.grad)
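
If everything is set up correctly, both versions print the same expectation value (≈ 0.860) and the same gradient for theta (≈ -0.174), since both differentiation methods are exact for this circuit.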

My guess is that in your code the random parameters are being initialized differently even though you set a seed, or something is coded differently in the two optimizations, or the comparison you're doing is not correct. You can learn more about using the Torch interface here.
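
For example, one way to check this (a sketch reusing the Q_PPO1 class from your original post; set_seed is just a hypothetical helper) is to re-seed before constructing each model and confirm that the initial parameters match before comparing loss curves:

import numpy as np
import torch

def set_seed(seed):
    np.random.seed(seed)      # controls the np.random.normal call inside QNet
    torch.manual_seed(seed)   # controls nn.Linear / TorchLayer initialization

set_seed(0)
model_backprop = Q_PPO1()     # model to be trained with backprop

set_seed(0)
model_shift = Q_PPO1()        # model to be trained with parameter-shift

# The initial parameters should be identical before any training starts
for (name, p1), (_, p2) in zip(model_backprop.named_parameters(),
                               model_shift.named_parameters()):
    print(name, torch.allclose(p1, p2))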

I hope this helps!