Batching in TorchLayer

Hello guys,

I’m using a TorchLayer to take advantage of the new support for native backpropagation with PyTorch. However, the TorchLayer trains much more slowly than a simple fully connected Torch layer on the same problem. I was wondering whether inputs are batched under the hood and, if not, what I can do to speed up training?

Best regards.

Hey @Andre_Sequeira!

Although the TorchLayer accepts batched inputs, no batch-level optimization is going on under the hood. You can check out how things work in the forward method of TorchLayer.
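
As a rough illustration of what "accepts batched inputs but does no batch-level optimization" can mean, here is a minimal sketch that evaluates a circuit once per sample. This is not the TorchLayer source: `toy_circuit` and `forward_looped` are hypothetical names, and the circuit is replaced by its closed-form result (RX(x) followed by RX(weight) on |0> gives <Z> = cos(x + weight)).

```python
import math

calls = 0  # counts how often the (hypothetical) circuit is executed


def toy_circuit(x, weight):
    """Stand-in for a one-qubit circuit: RX(x), RX(weight), measure <Z>."""
    global calls
    calls += 1
    return math.cos(x + weight)


def forward_looped(batch, weight):
    """Evaluate the circuit separately for every sample in the batch."""
    return [toy_circuit(x, weight) for x in batch]


batch = [0.0, math.pi / 2, math.pi]
outputs = forward_looped(batch, weight=0.0)
print(outputs, "circuit executions:", calls)
```

In this pattern the number of circuit executions equals the batch size, so a forward pass gets proportionally slower as the batch grows.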

There might be a couple of reasons why the hybrid model you are using is taking longer to train than a simple fully connected classical layer. From a fundamental perspective, we do expect the training times to increase exponentially on a simulator as we scale the number of qubits. This is what provides the nice motivation to construct the quantum hardware.
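
To put a number on the exponential scaling, the sketch below estimates the memory needed to hold a full state vector. It assumes a dense state-vector simulator that stores one complex128 amplitude (16 bytes) per basis state; `statevector_bytes` is an illustrative helper, not a PennyLane function.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a dense state vector: 2**n_qubits complex amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


for n in (4, 10, 20, 30):
    print(f"{n} qubits: {statevector_bytes(n)} bytes")
```

Every additional qubit doubles the size of the state, and the work per gate grows with it. At 4 qubits the state is tiny, so other overheads tend to dominate the runtime.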

On the other hand, for a small number of qubits we can still try a couple of things to extract more performance. One approach is to optimize the way we differentiate the circuit. In older versions of PennyLane, diff_method="parameter-shift" was used for Torch; you can check out more details here. Luckily, in the new version of PennyLane released a few days ago, we added support for backpropagation in the Torch interface. This simulator-only approach can provide a big speedup! In fact, I just tried running this tutorial and it took 8 seconds to train in the latest version of PennyLane and 44 seconds with an older version :rocket:
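
For intuition on why the differentiation method matters, the sketch below applies the two-term parameter-shift rule to a toy one-qubit example. It assumes the expectation value <Z> = cos(theta) for RX(theta) acting on |0>; `expval_z` and `parameter_shift_grad` are illustrative names, not PennyLane functions.

```python
import math


def expval_z(theta):
    """<Z> after RX(theta) on |0>, which is cos(theta) analytically."""
    return math.cos(theta)


def parameter_shift_grad(f, theta, shift=math.pi / 2):
    """Two-term parameter-shift rule: two extra evaluations of f per parameter."""
    return (f(theta + shift) - f(theta - shift)) / (2 * math.sin(shift))


theta = 0.3
grad = parameter_shift_grad(expval_z, theta)
print(grad, -math.sin(theta))  # the two values should agree
```

Under this rule each trainable parameter costs two extra circuit evaluations, whereas backpropagation on a simulator differentiates through a single forward pass. That difference is one plausible reason for the speedup reported above.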

So in summary, although there are some fundamental reasons why we might expect training to be tough on quantum simulators, you could try upgrading your PennyLane version and you might get a speedup without having to change any code!

Hey @Tom_Bromley,

Thank you for your support.

There might be a couple of reasons why the hybrid model you are using is taking longer to train than a simple fully connected classical layer. From a fundamental perspective, we do expect the training times to increase exponentially on a simulator as we scale the number of qubits. This is what provides the nice motivation to construct the quantum hardware.

The problem that I’m working on is relatively small though: only 4 qubits and 3 layers, each with 8 trainable parameters, so 24 trainable parameters in total. The backpropagation support in the new version of PennyLane did indeed provide a massive speedup. However, I have a big batch of data to feed into the quantum neural network, and this is where I see the quantum model taking longer to train.

Although the TorchLayer accepts batched inputs, no batch-level optimization is going on under the hood.

Is there anything that I can do with respect to the data in order to speed things up a little? Is it possible to do batch-level optimization?

Hi @Andre_Sequeira,

I’m glad the PennyLane update provided some speedup! Unfortunately there is not a lot that we can do in terms of optimizing iteration over a batch dimension in incoming tensors. This is not a feature we have prioritized so far, partly due to the limitations of quantum hardware.

However, it’s useful feedback to know that you’re interested in more efficient tensor batching. We have recently been working on a batch_transform decorator, which is helpful for things like supporting differentiability and submitting multiple circuit executions to hardware as one job. This functionality may eventually help us allow batching over a tensor dimension on supported devices.

Thanks,
Tom

Hey @Tom_Bromley,

OK, that is unfortunate, but keep up the good work!

Thank you for your help.

Hi,

I have updated my version of PennyLane from 0.29.0 to 0.31.0. However, I am now getting errors when running batches of inputs through a TorchLayer in a hybrid setup. The same code works fine on v0.29.0, but after upgrading to 0.31.0 I get a dimension error along the lines of “Input size [batch_number,-1] not valid for n_wires”. Is that a common problem with the latest version?
Thank you
Hevish

Hello @Tom_Bromley,

Is it now possible to get back propagation for both ResNet and VQC for the QTL model? Best regards.

Hey @Hevish_Cowlessur! Welcome to the forum :muscle:

It’s tough to say what’s going on without seeing a code example. Can you respond with something minimal that replicates the behaviour you’re seeing?

@kevinkawchak

Is it now possible to get back propagation for both ResNet and VQC for the QTL model?

Hybrid classical-quantum models with PennyLane will use backprop when optimizing parameters. Here is a good article: Hybrid computation — PennyLane

… hybrid computations are compatible with techniques like the famous backpropagation algorithm (also known as reverse-mode automatic differentiation), the workhorse algorithm for training deep learning models. This means that we can differentiate end-to-end through hybrid quantum-classical computations. Quantum algorithms can thus be trained the same way as classical deep learning models.

Hope this helps!

Hello @isaacdevlugt,

When keeping the default param.requires_grad = False to only update the quantum circuit parameters, the model runs correctly for training and validation.

When changing to param.requires_grad = True to update both ResNet and quantum circuit parameters, the training results are identical to the prior scenario, and does not complete validation.

Name: PennyLane
Version: 0.31.1
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: GitHub - PennyLaneAI/pennylane: PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural network.
Author:
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane-Lightning

Platform info: Linux-5.15.109+-x86_64-with-glibc2.35
Python version: 3.10.12
Numpy version: 1.22.4
Scipy version: 1.10.1
Installed devices:

Hey @kevinkawchak, could I ask you to share a minimal self-contained code example that shows the behavior you’re seeing?

Hello @Ivana_at_Xanadu, are you able to run the QTL model with param.requires_grad = True? Best regards.

Hi @kevinkawchak, which code would that be? :smiley: I don’t think you included anything in your post. If you could share a standalone (minimal) code example, that would be a great first step.

Hey @kevinkawchak,

When keeping the default param.requires_grad = False to only update the quantum circuit parameters, the model runs correctly for training and validation. When changing to param.requires_grad = True to update both ResNet and quantum circuit parameters, the training results are identical to the prior scenario, and does not complete validation.

In the transfer learning demo, the ResNet18 model used is pretrained.

We focus on the CQ transfer learning scheme discussed in the previous section and we give a specific example.

  1. As pre-trained network A, we use ResNet18, a deep residual neural network introduced by Microsoft in Ref. [3], which is pre-trained on the ImageNet dataset.

So, the behaviour you see seems to make sense :slight_smile:.

Hello, ResNet is used twice:

  1. For the pretrained model
  2. For the user dataset

I am trying to have both the trainable weights from 2) and the quantum circuit’s trainable weights updated throughout the run in the same model. This may be possible by setting param.requires_grad = True. Best regards.
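
A quick way to check which weights will actually be updated is to count the parameters that have requires_grad set and to look at what was handed to the optimizer. The sketch below uses a small hypothetical stand-in (`backbone`, `head`) rather than ResNet18 and the quantum layer, so all names and layer sizes are assumptions. A parameter only changes during training if it has requires_grad = True and is also included in the optimizer.

```python
import torch
from torch import nn

# Hypothetical stand-ins: a small "backbone" and a trainable "head".
backbone = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
head = nn.Linear(4, 2)
model = nn.Sequential(backbone, head)


def count_trainable(module):
    """Number of scalar weights that autograd will compute gradients for."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Freeze the backbone: only the head remains trainable.
for param in backbone.parameters():
    param.requires_grad = False
frozen_count = count_trainable(model)

# Unfreeze it again: every weight is trainable.
for param in backbone.parameters():
    param.requires_grad = True
full_count = count_trainable(model)

# The optimizer must also be given the parameters it should update.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
n_optimized = sum(p.numel() for g in optimizer.param_groups for p in g["params"])
print(frozen_count, full_count, n_optimized)
```

If a training script builds its optimizer from only the final layer's parameters (an assumption worth checking in your own code), then setting requires_grad = True on the backbone alone would leave the backbone weights unchanged.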

Hey @kevinkawchak,

I ran the quantum transfer learning demo and changed this:

model_hybrid = torchvision.models.resnet18(pretrained=True)

for param in model_hybrid.parameters():
    param.requires_grad = True

The training went like this:

Training started:
Phase: train Epoch: 1/3 Loss: 0.7073 Acc: 0.5000        
Phase: validation   Epoch: 1/3 Loss: 0.6457 Acc: 0.6732        
Phase: train Epoch: 2/3 Loss: 0.6230 Acc: 0.6885        
Phase: validation   Epoch: 2/3 Loss: 0.5425 Acc: 0.7908        
Phase: train Epoch: 3/3 Loss: 0.5700 Acc: 0.7869        
Phase: validation   Epoch: 3/3 Loss: 0.4972 Acc: 0.8954        
Training completed in 2m 41s
Best test loss: 0.4972 | Best test accuracy: 0.8954

Is this the behaviour you’re seeing as well? You mentioned this:

When changing to param.requires_grad = True to update both ResNet and quantum circuit parameters, the training results are identical to the prior scenario, and does not complete validation.

For fun, if I change pretrained to be False, then I get this:

Training started:
Phase: train Epoch: 1/3 Loss: 0.7102 Acc: 0.5000        
Phase: validation   Epoch: 1/3 Loss: 0.6813 Acc: 0.5621        
Phase: train Epoch: 2/3 Loss: 0.6970 Acc: 0.5164        
Phase: validation   Epoch: 2/3 Loss: 0.6783 Acc: 0.6209        
Phase: train Epoch: 3/3 Loss: 0.6945 Acc: 0.5287        
Phase: validation   Epoch: 3/3 Loss: 0.6897 Acc: 0.4837        
Training completed in 2m 40s
Best test loss: 0.6783 | Best test accuracy: 0.6209

This kind of goes against the spirit of transfer learning, though.

Transfer learning is a well-established technique for training artificial neural networks (see e.g., Ref. [2]), which is based on the general intuition that if a pre-trained network is good at solving a given problem, then, with just a bit of additional training, it can be used to also solve a different but related problem.

Thank you for running the model. This worked for this dataset.

Awesome! Glad to hear :+1:

Hello,

I’m experiencing issues when changing the batch_size to something greater than one while using the PennyLane PyTorch interface.

I keep getting the error:

RuntimeError: shape '[10, -1]' is invalid for input of size 1

But I have checked that the q_node is getting input of size [batch_size, num_qubits] ([10,2]) for me.

Is this an issue with the current version of Pennylane?

Here’s some of my code. Any help would be greatly appreciated:

# Imports assumed by this snippet (they were not shown in the original post)
import timeit

import numpy as np
import torch.nn.functional as F
import torch.optim as optim
from torch import manual_seed
from torch.nn import Conv2d, CrossEntropyLoss, Dropout2d, Linear, Module
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Train Dataset
# -------------

# Set train shuffle seed (for reproducibility)
manual_seed(42)

batch_size = 10
n_samples = 500  # Keep the first 500 samples of each label

# Use pre-defined torchvision function to load MNIST train data
X_train = datasets.MNIST(
    root="./data", train=True, download=True, transform=transforms.Compose([transforms.ToTensor()])
)

# Filter out labels (originally 0-9), leaving only labels 0 and 1
idx = np.append(
    np.where(X_train.targets == 0)[0][:n_samples], np.where(X_train.targets == 1)[0][:n_samples]
)
X_train.data = X_train.data[idx]
X_train.targets = X_train.targets[idx]

# Define torch dataloader with filtered data
train_loader = DataLoader(X_train, batch_size=batch_size, shuffle=True)

# Test Dataset
# -------------

# Set test shuffle seed (for reproducibility)
manual_seed(5)

n_samples = 250  # was 50

# Use pre-defined torchvision function to load MNIST test data
X_test = datasets.MNIST(
    root="./data", train=False, download=True, transform=transforms.Compose([transforms.ToTensor()])
)

# Filter out labels (originally 0-9), leaving only labels 0 and 1
idx = np.append(
    np.where(X_test.targets == 0)[0][:n_samples], np.where(X_test.targets == 1)[0][:n_samples]
)
X_test.data = X_test.data[idx]
X_test.targets = X_test.targets[idx]

# Define torch dataloader with filtered data
test_loader = DataLoader(X_test, batch_size=batch_size, shuffle=True)

import pennylane as qml
from pennylane.templates import AngleEmbedding, BasicEntanglerLayers
import torch

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

# Simple circuit based off PennyLane example
@qml.qnode(dev)
def qnode(inputs, weights):
    # Feature map (Manually defined ZZFeatureMap-like circuit)
    for i in range(n_qubits):
        qml.RX(weights[i], wires=i)
        qml.RX(weights[i + 2], wires=i)
    qml.CNOT(wires=[0, 1])

    return [qml.expval(qml.PauliZ(wires=i)) for i in range(n_qubits)]

n_layers = 1  # Set the number of ansatz layers as needed

n_params = 4
weight_shapes = {"weights": (n_params,)}  # Single parameter tuple

print(weight_shapes)

qlayer = qml.qnn.TorchLayer(qnode, weight_shapes)

# Visualize the circuit
print(qml.draw(qlayer, expansion_strategy="device")(torch.tensor([0.1, 0.2])))

# Define torch NN module
class Net(Module):
    def __init__(self):
        super().__init__()
        self.conv1 = Conv2d(1, 2, kernel_size=5)
        self.conv2 = Conv2d(2, 16, kernel_size=5)
        self.dropout = Dropout2d()
        self.fc1 = Linear(256, 64)
        self.fc2 = Linear(64, 2)  # 2-dimensional input to QNN
        self.qlayer_1 = qml.qnn.TorchLayer(qnode, weight_shapes)
        # No need for self.fc3 in this context

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = self.dropout(x)
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        # print("Shape of x:", x.shape)  # Debugging line

        quantum_output = self.qlayer_1(x)

        # print("Shape of quantum_output:", quantum_output.shape)  # Debugging line

        # Concatenate quantum_output with the original x
        concatenated_output = torch.cat((x, quantum_output), dim=1)

        return concatenated_output

model4 = Net()

start = timeit.default_timer()

# Define model, optimizer, and loss function
optimizer = optim.Adam(model4.parameters(), lr=0.001)
loss_func = CrossEntropyLoss()  # CHANGE ??? change back???

# Start training
epochs = 10  # Set number of epochs
loss_list = []  # Store loss history
model4.train()  # Set model to training mode

for epoch in range(epochs):
    total_loss = []
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad(set_to_none=True)  # Initialize gradient
        output = model4(data)  # Forward pass, ???
        loss = loss_func(output, target)  # Calculate loss. MIGHT NEED TO SWITCH AS WAS BINARY
        loss.backward()  # Backward pass
        optimizer.step()  # Optimize weights
        total_loss.append(loss.item())  # Store loss
    loss_list.append(sum(total_loss) / len(total_loss))
    print("Training [{:.0f}%]\tLoss: {:.4f}".format(100.0 * (epoch + 1) / epochs, loss_list[-1]))

# Stop timer
stop = timeit.default_timer()
print('Time: ', stop - start)

Hey @kieran_mcdowall, welcome to the forum!
Do you mind sharing the shortest version of your code that reproduces the problem? If we’re able to run a simplified version of your code directly, we can help you much quicker. :grin:
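
While a shorter example is being put together, the error message itself can be unpacked. It says that something tried to reshape a tensor holding a single element into shape [10, -1], which is only possible when the number of elements is divisible by 10. The sketch below mimics how a -1 dimension is resolved; `resolve_shape` is a hypothetical helper written for illustration, not a PyTorch or PennyLane function.

```python
def resolve_shape(numel, shape):
    """Resolve a single -1 in `shape` for a tensor with `numel` elements."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if numel % known != 0:
        raise ValueError(f"shape {list(shape)} is invalid for input of size {numel}")
    return tuple(numel // known if dim == -1 else dim for dim in shape)


# A batch of 10 samples with 2 values each reshapes cleanly.
print(resolve_shape(20, (10, -1)))

# A single value cannot be spread over a batch dimension of 10.
try:
    resolve_shape(1, (10, -1))
except ValueError as err:
    print("Error:", err)
```

So at some point a result with one element, rather than one result per sample, reached a reshape to (batch_size, -1). One detail visible in the posted code is that `qnode` never uses its `inputs` argument, so its output may not carry a batch dimension. Whether that is the actual cause here is an assumption to verify.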
