Backpropagation with lightning.qubit and PyTorch

Hi,

I wanted to ask: are there any plans to add backprop support for lightning.qubit with the PyTorch interface?

At the moment I use default.qubit with backprop. I switched away from lightning.qubit with parameter-shift because backprop is, of course, miles faster than parameter-shift. However, default.qubit with backprop makes the forward evaluation of my circuit slower, and I suspect the forward pass would be faster on lightning.qubit. In some setups (when there aren’t many parameters), the gain from backprop is lost to the slower forward circuit evaluation.

Following Backpropagation with Pytorch - #2 by josh, I tried lightning.qubit with adjoint; however, there are some operations I use that are not supported by adjoint, so that’s not a valid option for my use case.

To roughly illustrate the idea, here are the combined run times for 1000 iterations of a simple 2-qubit circuit with 3 learnable parameters:

  • default.qubit + parameter-shift
    • Circuit forward evaluation: 3s-4s
    • .backward(): 13s-15s
  • lightning.qubit + parameter-shift
    • Circuit forward evaluation: 2s
    • .backward(): 7s-9s
  • default.qubit + backprop
    • Circuit forward evaluation: 6s-7s
    • .backward(): 1s

So it is clear that backprop adds some overhead to the forward evaluation; I am wondering what this overhead would look like for lightning.qubit.
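
For context, here is a rough sketch of the kind of timing loop used to produce numbers like these. The 2-qubit, 3-parameter circuit below is just an illustrative placeholder, not the actual circuit, so the absolute numbers will differ:

import time
import torch
import pennylane as qml

def benchmark(dev_name, diff_method, n_iter=1000):
    dev = qml.device(dev_name, wires=2)

    @qml.qnode(dev, interface="torch", diff_method=diff_method)
    def circuit(weights):
        # placeholder circuit with 3 trainable parameters on 2 qubits
        qml.RY(weights[0], wires=0)
        qml.RY(weights[1], wires=1)
        qml.CNOT(wires=[0, 1])
        qml.RZ(weights[2], wires=1)
        return qml.expval(qml.PauliZ(1))

    weights = torch.tensor([0.1, 0.2, 0.3], requires_grad=True)

    # forward-only timing
    start = time.time()
    for _ in range(n_iter):
        circuit(weights)
    t_forward = time.time() - start

    # forward + backward timing
    start = time.time()
    for _ in range(n_iter):
        loss = circuit(weights)
        loss.backward()
    t_both = time.time() - start

    print(f"{dev_name} + {diff_method}: forward {t_forward:.1f}s, forward+backward {t_both:.1f}s")

benchmark("default.qubit", "backprop")
benchmark("lightning.qubit", "parameter-shift")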

P.S. I also noticed that using default.qubit with backprop on PyTorch (as well as default.qubit.torch) gives the following warning message:

…/lib/python3.8/site-packages/torch/autograd/__init__.py:154: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at …/aten/src/ATen/native/Copy.cpp:244.)

Is this supposed to be happening?

Hi @karolishp — a very well-founded question!

Everything you note is correct:

  • backprop definitely has an overhead on the forward pass, since every intermediate stage of the computation is stored in memory, to be accumulated later during the backward pass.

  • The adjoint method is a version of backprop designed for unitary/reversible computation. As a result, we are able to remove the memory-caching requirement and replace it with some additional computation. In effect, we trade a reduction in memory for an increase in computation time, but this lets us scale adjoint well beyond standard backprop, to ~30 qubits or more.

    However, like you mention, we are still working on adding support for more operations.

  • parameter-shift will have the fastest forward execution time (since there is no ‘bookkeeping’ to be done on the forward pass), but requires 2P separate circuit evaluations on the backward pass for the P parameters in the circuit (see the sketch just after this list). So it is useful for small circuits with few parameters, but rapidly becomes unscalable as the number of parameters/qubits increases.
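
As a concrete illustration of that 2P scaling, here is a minimal sketch with a hypothetical single-parameter, single-qubit circuit: the parameter’s gradient is assembled from exactly two shifted forward evaluations.

import numpy as np
import pennylane as qml

dev = qml.device("lightning.qubit", wires=1)

@qml.qnode(dev, diff_method="parameter-shift")
def circuit(theta):
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

theta = 0.3
shift = np.pi / 2

# Two circuit evaluations for the single parameter: f(theta + pi/2) and f(theta - pi/2)
manual_grad = (circuit(theta + shift) - circuit(theta - shift)) / 2
print(manual_grad)     # approximately -sin(0.3)
print(-np.sin(theta))  # analytic gradient, for comparison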

Following Backpropagation with Pytorch, I tried lightning.qubit with adjoint; however, there are some operations I use that are not supported by adjoint, so that’s not a valid option for my use case.

Which operations/measurements do you currently need that aren’t yet supported by adjoint? This will help us build up adjoint support to ensure feature parity.

P.S. I also noticed that using default.qubit with backprop on PyTorch (as well as default.qubit.torch) gives the following warning message:

Would you be able to post:

  • A small QNode example that generates this warning?
  • Your Torch, PennyLane, and NumPy versions?

This will help us track down the issue :slight_smile:

Hi @josh and @karolishp. I wanted to mention that I am experiencing the same warning that @karolishp reported. Here is a version of the code that reproduces the message in the context of a simple hybrid network. It is not that minimal, but I wanted to write it in a way that resembles more complex models and the PennyLane tutorials :smile: (sorry, I cannot upload it as an attachment)

import numpy as np
from sklearn.datasets import make_moons
import torch
import matplotlib.pyplot as plt
import torch.nn as nn
import pennylane as qml
import sys

class Model(nn.Module):
    def __init__(self, dev, diff_method="backprop"):
        super().__init__()

        self.cnet_in = self.cnet()
        self.qcircuit = qml.qnode(dev, interface="torch", 
                                  diff_method=diff_method)(self.qnode)
        
        weight_shape = {"weights":(2,)}
        self.qlayer = qml.qnn.TorchLayer(self.qcircuit, weight_shape)
        self.cnet_out = self.cnet()

    def cnet(self):
        layers = [nn.Linear(2,10), nn.ReLU(True), nn.Linear(10,2), nn.Tanh()]
        return nn.Sequential(*layers)   

    def qnode(self, inputs, weights):
        # Data encoding:
        for x in range(len(inputs)):
            qml.Hadamard(x)
            qml.RZ(2.0 * inputs[x], wires=x)
        # Trainable part:
        qml.CNOT(wires=[0,1])
        qml.RY(weights[0], wires=0)
        qml.RY(weights[1], wires=1)
        return [qml.expval(qml.PauliZ(wires=0)), qml.expval(qml.PauliZ(wires=1))]

    def forward(self, x):
        x1 = self.cnet_in(x)
        x2 = self.qlayer(x1)
        x_output = self.cnet_out(x2)
        return x_output

def train(X, y_hot, dev_name, diff_method):
    
    dev = qml.device(dev_name, wires=2, shots=None)
    model  = Model(dev, diff_method)
    
    # Train the model
    opt = torch.optim.SGD(model.parameters(), lr=0.2)
    loss = torch.nn.L1Loss()

    X = torch.tensor(X, requires_grad=False).float()
    y_hot = y_hot.float()

    batch_size = 5
    batches = 200 // batch_size

    data_loader = torch.utils.data.DataLoader(
        list(zip(X, y_hot)), batch_size=5, shuffle=True, drop_last=True
    )

    epochs = 6

    for epoch in range(epochs):

        running_loss = 0

        for xs, ys in data_loader:
            opt.zero_grad()

            loss_evaluated = loss(model(xs), ys)
            loss_evaluated.backward()

            opt.step()

            running_loss += loss_evaluated.item()  # accumulate the scalar value, not the graph-attached tensor

        avg_loss = running_loss / batches
        print("Average loss over epoch {}: {:.4f}".format(epoch + 1, avg_loss))

    y_pred = model(X)
    predictions = torch.argmax(y_pred, axis=1).detach().numpy()
    labels = torch.argmax(y_hot, axis=1).numpy()  # recover integer class labels from the one-hot targets

    correct = [1 if p == p_true else 0 for p, p_true in zip(predictions, labels)]
    accuracy = sum(correct) / len(correct)
    print(f"Accuracy: {accuracy * 100}%")

if __name__ == "__main__":
    torch.manual_seed(42)
    np.random.seed(42)

    X, y = make_moons(n_samples=200, noise=0.1)
    y_ = torch.unsqueeze(torch.tensor(y), 1)  # used for one-hot encoded labels
    y_hot = torch.scatter(torch.zeros((200, 2)), 1, y_, 1)
    train(X, y_hot, str(sys.argv[1]), str(sys.argv[2]))

Using default.qubit and backprop with the above code (python pnl_test_devices_torch.py default.qubit backprop) I get the following:

/work/vabelis/miniconda3/envs/ae_qml_pnl/lib/python3.8/site-packages/torch/autograd/__init__.py:154: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at  /opt/conda/conda-bld/pytorch_1640811757556/work/aten/src/ATen/native/Copy.cpp:244.)
  Variable._execution_engine.run_backward(
Average loss over epoch 1: 0.5108
Average loss over epoch 2: 0.4007
Average loss over epoch 3: 0.2492
Average loss over epoch 4: 0.2128
Average loss over epoch 5: 0.1894
Average loss over epoch 6: 0.1919
Accuracy: 88.0%

I am using:

  • pytorch=1.10.2
  • pennylane=0.21.0
  • numpy=1.22.2

Looking forward to your insights. It seems that this warning has also been observed within the Torch community. It would be great to identify why it occurs within PennyLane.
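
In case it helps narrow things down, the same warning can be reproduced outside PennyLane with a plain complex-to-real tensor copy, so my (unverified) guess is that a complex intermediate is being copied into a real-valued gradient buffer somewhere during the backward pass:

import torch

real = torch.zeros(2)
cplx = torch.tensor([1.0 + 1.0j, 2.0 - 0.5j])
real.copy_(cplx)  # UserWarning: Casting complex values to real discards the imaginary part
print(real)       # tensor([1., 2.])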


Hi @vabelis, welcome to the Forum!

Thank you for adding your code and versions.

This seems to be a Torch issue rather than a PennyLane one, so it would be best to follow the Torch issue to see if they can find a solution.

Another option might be to use an older version of Torch and see if the problem persists.

We will, however, keep an eye on this and see if there’s something we can do about it in the future.

I hope this helps.