Hybrid neural network speed suggestions


I am currently testing out some hybrid neural networks on the MNIST dataset. The problem is that training is very slow.

5 epochs take around 4 to 5 hours.

I have tried the Forest pyQVM simulator because it was shown to be the fastest in the "Speeding up grad computation" post, but it was a lot slower.

Any suggestions would be appreciated.

Thanks for your help!

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import pennylane as qml
from pennylane import numpy as np

# Params
q_depth = 2
n_qubits = 4
q_delta = 1
input_size = 784
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001

dev = qml.device("default.qubit", wires=4)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def RY_layer(w):
    """Layer of parametrized qubit rotations around the y axis."""
    for idx, element in enumerate(w):
        qml.RY(element, wires=idx)

@qml.qnode(dev, interface="torch")
def q_net(q_in, q_weights_flat):
    # Reshape weights
    q_weights = q_weights_flat.reshape(q_depth, n_qubits)

    # Embed features in the quantum node
    RY_layer(q_in)

    # Entangle neighbouring qubits
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i+1])

    # Trainable RX rotations (first row of weights)
    for i in range(n_qubits):
        qml.RX(q_weights[0][i], wires=i)

    # Trainable RY rotations (second row of weights)
    for i in range(n_qubits):
        qml.RY(q_weights[1][i], wires=i)

    return tuple(qml.expval(qml.PauliZ(wires=i)) for i in range(n_qubits))

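# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../data',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../data',
                                          train=False,
                                          transform=transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)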

class NeuralNet(nn.Module):
    def __init__(self, input_size, num_classes):
        super(NeuralNet, self).__init__()
        self.relu = nn.ReLU()
        self.pre_net = nn.Linear(input_size, n_qubits)
        self.q_params = nn.Parameter(q_delta * torch.randn(q_depth * n_qubits))
        self.post_net = nn.Linear(n_qubits, 10)

    def forward(self, x):
        q_in = self.pre_net(x)
        q_in = self.relu(q_in)
        q_out = torch.Tensor(0, n_qubits).to(device)
        for elem in q_in:
            q_out_elem = q_net(elem, self.q_params).float().unsqueeze(0)
            q_out = torch.cat((q_out, q_out_elem))
        q_out = self.relu(q_out)
        out = self.post_net(q_out)
        return out

model = NeuralNet(input_size, num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

# Save
torch.save(model.state_dict(), 'model1.ckpt')

Hi @James_Ellis!

Since that post, we have invested some time improving the built-in default.qubit simulator, and managed to get a speed improvement of roughly two orders of magnitude. So default.qubit will now be significantly faster than pyQVM :slight_smile:

With respect to the training time, the largest factor is the number of parameters in the quantum circuit. PennyLane uses the parameter-shift rule to differentiate quantum nodes in a hardware-friendly manner, so computing the gradient with respect to all $p$ parameters requires $2p$ quantum evaluations and takes roughly $2p\,\Delta t$, where $\Delta t$ is the time taken for one forward pass/quantum simulation.
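To make the 2p scaling concrete, here is a minimal sketch of the parameter-shift rule on a hand-rolled single-qubit simulator. It assumes the standard shift of π/2 for Pauli rotation gates; `circuit`, `parameter_shift_gradient`, and the `evaluations` counter are illustrative names, not PennyLane API.

```python
import math

def apply(gate, state):
    """Multiply a 2x2 gate onto a single-qubit state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

evaluations = 0  # counts how many times the circuit is simulated

def circuit(params):
    """RX(params[0]) then RY(params[1]) on |0>, returning <Z>."""
    global evaluations
    evaluations += 1
    state = [1.0, 0.0]
    state = apply(rx(params[0]), state)
    state = apply(ry(params[1]), state)
    return abs(state[0]) ** 2 - abs(state[1]) ** 2

def parameter_shift_gradient(f, params):
    """Two shifted circuit evaluations per parameter."""
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += math.pi / 2
        minus[i] -= math.pi / 2
        grad.append((f(plus) - f(minus)) / 2)
    return grad

params = [0.3, 1.1]
grad = parameter_shift_gradient(circuit, params)
print(grad, evaluations)
```

With p = 2 parameters, the gradient alone costs 2p = 4 circuit simulations, in addition to the forward pass.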

Some suggestions for improving the speed of training:

  1. PennyLane always treats positional QNode arguments as differentiable, and keyword arguments as non-differentiable. You may see some speed improvement if you change q_in to be a keyword argument:

    @qml.qnode(dev, interface="torch")
    def q_net(q_weights_flat, q_in=None):

  2. You could try a high-performance simulator, such as Qulacs. However, the PennyLane-Qulacs plugin is experimental, and needs more work to ensure its accuracy.

  3. Finally, a new experimental feature in the latest version of PennyLane is the PassthruQNode. Instead of using the parameter-shift rule, the PassthruQNode is simply a white box, passing tensors to a compatible simulator where classical backpropagation occurs.

    • Backpropagation adds only a constant overhead to the forward pass, rather than the $2p$ circuit evaluations needed by the parameter-shift rule, but it is not hardware compatible.

    • The PassthruQNode currently only works with the default.tensor.tf simulator, coded in TensorFlow, so it must be used with the TensorFlow interface.

    See this post for an example of the PassthruQNode being used.
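As a rough illustration of why suggestion 1 can help, the sketch below counts circuit simulations per epoch for the circuit in the question. It is a simple cost model under stated assumptions: each sample is simulated separately (as in the `forward` loop above), every differentiable argument costs two shifted evaluations, and classical layers and simulator overhead are ignored. The helper `circuit_runs_per_epoch` is a hypothetical name.

```python
def circuit_runs_per_epoch(n_samples, n_differentiable):
    """One forward run plus two shifted runs per differentiable parameter, per sample."""
    return n_samples * (1 + 2 * n_differentiable)

n_train = 60000      # MNIST training images
n_weights = 2 * 4    # q_depth * n_qubits trainable rotation angles
n_inputs = 4         # one embedded feature per qubit

# q_in positional: inputs and weights are both differentiated
positional = circuit_runs_per_epoch(n_train, n_weights + n_inputs)

# q_in as keyword argument: only the weights are differentiated
keyword = circuit_runs_per_epoch(n_train, n_weights)

print(positional, keyword)  # 1500000 1020000
```

Under this model, moving `q_in` to a keyword argument removes about a third of the simulations, at the price of no gradient being computed with respect to the inputs.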

Thank you again for your replies; they are very helpful.

For 1) I am getting an error

TypeError: q_net() got multiple values for argument 'q_in'

The code works fine when I run as
def q_net(q_weights_flat, q_in):

Do you have any ideas why it might be doing that?

For 3), what is actually happening in PassthruQNode that means it doesn’t have to use the parameter-shift rule?

Thanks :smiley:

For 1) I am getting an error

TypeError: q_net() got multiple values for argument 'q_in'

The code works fine when I run as
def q_net(q_weights_flat, q_in):

Do you have any ideas why it might be doing that?

When you define a keyword argument in a QNode, you must always pass that argument by keyword when you call the QNode. For example, if you have

def q_net(q_weights_flat, q_in=None):

Then, when you call it inside your layer, you need to call it like this:

q_out_elem = q_net(self.q_params, q_in=elem).float().unsqueeze(0)
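For illustration, the same message can be reproduced in plain Python whenever an argument is bound both positionally and by keyword. The stand-in `q_net` below is an ordinary function, not a QNode, and it is an assumption that the QNode wrapper runs into an equivalent collision internally.

```python
def q_net(q_weights_flat, q_in=None):
    """Plain-Python stand-in for the QNode; returns its arguments unchanged."""
    return q_weights_flat, q_in

weights, features = [0.1, 0.2], [1.0, 2.0]

# q_in is supplied twice here: once positionally and once by keyword.
message = None
try:
    q_net(weights, features, q_in=features)
except TypeError as err:
    message = str(err)
    print(message)

# Supplying q_in by keyword only is unambiguous.
result = q_net(weights, q_in=features)
```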

what is actually happening in PassthruQNode that means it doesn’t have to use the parameter-shift rule?

It’s more a question of what ‘doesn’t’ happen inside a PassthruQNode :slight_smile:

In a standard QNode, the QNode is a black box: PyTorch has no access to the internal workings of the QNode. It simply asks the QNode for the gradient, the QNode performs the hardware-friendly parameter-shift rule to get the gradient, and returns the gradient to PyTorch.

The PassthruQNode, however, is a ‘white box’: TensorFlow/PyTorch do not even see the QNode; they control the entire computation all the way down to the individual quantum operations. This allows classical backpropagation to determine the gradient, at the cost of not working with hardware.
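To show what "classical backpropagation through the simulation" can look like, here is a hand-written reverse-mode pass over a toy simulator. This is a sketch under strong assumptions, not how the PassthruQNode is implemented: it handles a single qubit with RY gates only (so all amplitudes stay real), and `expval_and_gradient` is a hypothetical helper. A framework such as TensorFlow would derive the backward pass automatically.

```python
import math

def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def d_ry(theta):
    """Derivative of the RY matrix with respect to theta."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[-s / 2, -c / 2], [c / 2, -s / 2]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def expval_and_gradient(thetas):
    # Forward pass: apply the rotations in order, keeping every intermediate state.
    states = [[1.0, 0.0]]
    for theta in thetas:
        states.append(matvec(ry(theta), states[-1]))
    final = states[-1]
    expval = final[0] ** 2 - final[1] ** 2  # <Z> for real amplitudes

    # Backward pass: propagate d<Z>/d(state) back through the gates once.
    g = [2 * final[0], -2 * final[1]]
    grads = [0.0] * len(thetas)
    for k in reversed(range(len(thetas))):
        d_state = matvec(d_ry(thetas[k]), states[k])  # states[k] is the input to gate k
        grads[k] = g[0] * d_state[0] + g[1] * d_state[1]
        g = matvec(transpose(ry(thetas[k])), g)
    return expval, grads

expval, grads = expval_and_gradient([0.2, 0.5, 0.9])
print(expval, grads)
```

All gradients come from one forward and one backward sweep, regardless of the number of parameters, whereas the parameter-shift rule would need two extra simulations per parameter. The backward sweep relies on stored intermediate states, which is exactly what real hardware cannot provide.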

Thanks! The Qulacs simulator and keyword argument are making a big difference!

For PassthruQNode, how do you apply this to PyTorch? Is there a quick example you could provide?

Thanks! :smiley:

Hi @James_Ellis,

Unfortunately, until PyTorch has native support for complex numbers, we can’t make a compatible PassthruQNode :frowning_face: