Hybrid quantum-classical network with PyTorch

Hi, I am trying to train a simple hybrid network with a quantum layer composed of 3 strongly entangling layers on 4 qubits, connected to a classical layer with 2 output neurons. I am using PyTorch. The problem is that obtaining the gradients of this network is really slow, and I am using the simulator, not quantum hardware, so I assumed backpropagation would be used.

If I just use the quantum layer with the PennyLane optimizers, training is fast, and I notice a big difference between default.qubit.autograd and default.qubit. In the hybrid network, however, I didn't notice any difference. When I use the Qulacs simulator, instead of getting faster results, it gets even worse.

Any suggestions would be appreciated.

Thanks for your help!

import pennylane as qml
import torch.nn as nn


class Hybrid_Network(nn.Module):
    def __init__(self, nqubits, nlayers, output_size):
        super(Hybrid_Network, self).__init__()

        device = qml.device('default.qubit', wires=nqubits)

        @qml.qnode(device)
        def qnode(inputs, weights):
            # Encode the classical inputs as rotation angles
            qml.templates.AngleEmbedding(inputs, wires=range(nqubits))

            # Strongly entangling layers - weights shape: (n_layers, n_qubits, 3)
            qml.templates.StronglyEntanglingLayers(weights, wires=range(nqubits))

            # Return one expectation value per qubit
            return [qml.expval(qml.PauliZ(i)) for i in range(nqubits)]

        # Weights of the quantum layer are randomly initialized by PyTorch
        # using the uniform distribution over [0, 2*pi]
        weight_shapes = {"weights": (nlayers, nqubits, 3)}
        self.qlayer = qml.qnn.TorchLayer(qnode, weight_shapes)

        # Classical layer maps the nqubits expectation values to output_size neurons
        self.clayer = nn.Linear(nqubits, output_size)
        self.hybrid_network = nn.Sequential(
            self.qlayer,
            self.clayer
        )

    def forward(self, input):
        return self.hybrid_network(input)
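
For context, a minimal sketch of how I construct and call the network (the batch size here is just an arbitrary example):

import torch

net = Hybrid_Network(nqubits=4, nlayers=3, output_size=2)
x = torch.rand(5, 4)   # batch of 5 samples with 4 features each
out = net(x)           # expected output shape: (5, 2)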

Hey @Andre_Sequeira, welcome to the forum!

This sounds like a setting where backpropagation is outperforming the parameter-shift rule, which we expect as the number of trainable parameters increases (see this tutorial).

The code you shared evaluates the gradient using the parameter-shift rule, which might explain the slowdown you are seeing. We do not currently have a default.qubit.torch device to support backpropagation in Torch. This is something we’d like to add soon, as complex-number support makes its way into PyTorch. Unfortunately, we also do not yet support backpropagation as a differentiation method in the qnn module, but this will likely be fixed soon.
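
For reference, the differentiation method can also be set explicitly when constructing the QNode. A minimal sketch (reusing the device and argument names from your code) would be:

# Explicitly request the parameter-shift rule; "backprop" is only accepted
# by devices that support it
@qml.qnode(device, interface="torch", diff_method="parameter-shift")
def qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(nqubits))
    qml.templates.StronglyEntanglingLayers(weights, wires=range(nqubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(nqubits)]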

For now, you may have to stick with TensorFlow and the default.qubit.tf device for backpropagation support, as well as use core TF functionality (rather than Keras) to build the hybrid model. I had a go at prototyping what this might look like:

import pennylane as qml
import tensorflow as tf
from pennylane import numpy as np

nqubits = 4
n_layers = 3
output_size = 2
batches = 10

# Define backprop-compatible device
device = qml.device('default.qubit.tf', wires=nqubits)

# Define QNode
@qml.qnode(device, interface="tf")
def qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(nqubits))
    qml.templates.StronglyEntanglingLayers(weights, wires=range(nqubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(nqubits)]

# Define inputs and qnode trainable weights
inputs = tf.ones(nqubits)
weights = tf.Variable(qml.init.strong_ent_layers_uniform(n_layers, nqubits))
clayer_matrix = tf.Variable(np.random.random((output_size, nqubits)))
clayer_bias = tf.Variable(np.random.random((output_size, 1)))

# Define hybrid function
def hybrid(inputs):
    x = qnode(inputs, weights)
    x = tf.expand_dims(x, axis=1)
    x = clayer_matrix @ x
    x = x + clayer_bias
    # Could optionally add a nonlinearity here
    return tf.squeeze(x)

# Define optimizer
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

# Begin optimization
steps = 5

for i in range(steps):
    opt.minimize(lambda: hybrid(inputs), [weights, clayer_matrix, clayer_bias])
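
To double-check that gradients flow through both the quantum and classical parameters, a GradientTape sketch like the following could be used (the sum is just an arbitrary scalar loss for illustration):

# Quick gradient check: backpropagate through the full hybrid function
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(hybrid(inputs))

grads = tape.gradient(loss, [weights, clayer_matrix, clayer_bias])
print([g.shape for g in grads])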

Hi @Tom_Bromley, first of all, thank you so much for your answer. Secondly, congratulations on the forum; I’ve been following other users’ questions for a few weeks and you guys are awesome.

If you don’t mind, I have a couple more questions. I am trying to minimize the cross-entropy loss function and I’m not used to TensorFlow; could you give me a simple example of how to use softmax and the optimizer?
Another question: can I run the PyTorch version on a Qiskit quantum machine, or is there no compatibility?

No problem!

can I run the PyTorch version on a Qiskit quantum machine, or is there no compatibility?

Yes, the choice of interface (Autograd/NumPy, Torch, or TensorFlow) is independent of the choice of device, so this should not be a problem. Let us know if you need a hand.

I am trying to minimize the cross-entropy loss function and I’m not used to TensorFlow; could you give me a simple example of how to use softmax and the optimizer?

Following from the code I shared above, you could have something like

def cost(inputs, outputs_expected):
    # outputs_expected: one-hot encoded target labels
    x = hybrid(inputs)
    # softmax_cross_entropy_with_logits applies the softmax internally
    return tf.nn.softmax_cross_entropy_with_logits(logits=x, labels=outputs_expected)

# Define optimizer
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

# Begin optimization
steps = 5

for i in range(steps):
    opt.minimize(lambda: cost(inputs, outputs), [weights, clayer_matrix, clayer_bias])
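
Note that outputs is not defined in this snippet; as a placeholder it could simply be a one-hot label, for example:

# Hypothetical target: one-hot label for class 0 (out of output_size classes)
outputs = tf.one_hot(0, depth=output_size)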

It could also be done using Keras if we use parameter-shift differentiation rather than backprop:

import pennylane as qml
import tensorflow as tf
from pennylane import numpy as np

nqubits = 4
n_layers = 3
output_size = 2
batches = 10

device = qml.device('default.qubit', wires=nqubits)

# Define QNode
@qml.qnode(device)
def qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(nqubits))
    qml.templates.StronglyEntanglingLayers(weights, wires=range(nqubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(nqubits)]

# Define inputs and qnode trainable weights
inputs = tf.constant(np.random.random((batches, nqubits)), dtype=tf.float32)

# output_size is the number of classes
outputs = tf.one_hot(np.random.randint(output_size, size=batches), depth=output_size)

# define weight_shapes
weight_shapes = {"weights": (n_layers, nqubits, 3)}

qlayer = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=nqubits)
# Softmax activation so the outputs are class probabilities, matching the
# default (from_logits=False) behaviour of CategoricalCrossentropy
clayer = tf.keras.layers.Dense(output_size, activation="softmax")

model = tf.keras.Sequential([qlayer, clayer])
opt = tf.keras.optimizers.SGD(learning_rate=0.05)
model.compile(opt, loss=tf.keras.losses.CategoricalCrossentropy())

model.fit(inputs, outputs, epochs=8, batch_size=5)
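
After fitting, the model can be called directly on a batch of inputs; with the softmax output layer above, each row of the result should be a probability distribution over the two classes:

# Per-class probabilities for the first two samples (each row sums to ~1)
print(model(inputs[:2]))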

Note that these examples probably still need a bit of work, prototyping and testing before I expect a model to train successfully!

Hi @Tom_Bromley, thank you very much, you have been a huge help.

Hello, are you using a specific template for this?

I am trying to do the same thing (the difference is that I want to try with 2 qubits first). The goal is to benchmark the variational classifier and the single-qubit data re-uploading classifier against this hybrid quantum-classical network on the Iris dataset.

Hi @NikSchet, I am not sure whether you want to do it in TensorFlow as well, but if you want to use PyTorch, maybe you could take a look at the Quantum Transfer Learning demo (https://pennylane.ai/qml/demos/tutorial_quantum_transfer_learning.html).

Thank you very much. I am already trying to implement this demo, but with a different classical neural network, for my Iris dataset.

I am now trying the code above. My problem is with the data transformation needed to create the input. My dataset is the typical Iris dataset. I guess I need to transform it into a form that TensorFlow accepts. Are there any tutorials on that? Thanks in advance (sorry if it is a basic question).

Hey @NikSchet and thanks @Andre_Sequeira,

You could load Iris data like this:

import tensorflow as tf
import tensorflow_datasets as tfds

ds = tfds.load('iris', split='train')
ds = ds.shuffle(1024).batch(32)

for datapoint in ds.take(1):
    features = datapoint["features"]
    labels = datapoint["label"]
    
    labels_one_hot = tf.one_hot(labels, 3)

This is done using the TensorFlow Datasets package.
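
If the aim is to feed these features into the 4-qubit AngleEmbedding model above, one simple option (just a sketch, assuming min-max scaling is acceptable) is to rescale each feature to [0, pi] first:

import numpy as np

features = tf.cast(features, tf.float32)

# Rescale each of the 4 Iris features to the range [0, pi] before angle embedding
f_min = tf.reduce_min(features, axis=0)
f_max = tf.reduce_max(features, axis=0)
scaled_features = np.pi * (features - f_min) / (f_max - f_min)

# These could then be passed to the Keras model from earlier, e.g.
# model.fit(scaled_features, labels_one_hot, epochs=8, batch_size=5)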

Hi @Tom_Bromley, the goal of the hybrid network is to train a reinforcement learning agent. The network produced interesting results, but I was wondering: the classical layer is used only to output a probability distribution over the possible actions, which in this case is binary. Now, is it possible to remove the classical layer and obtain the probability distribution directly from the QNN? The quantum circuit has 4 qubits, so I would need to narrow the output down to a single qubit and then normalize the output to be between 0 and 1.

I also have perhaps a more theoretical question. By training these quantum circuits, we are basically training periodic functions, right? Training heavily depends on an appropriate choice of the learning rate. Is there any way to figure out the “optimal” learning rate, or are we left with an empirical analysis?

I apologize for the long text. Thank you very much.

Hi @Andre_Sequeira,

Now, is it possible to remove the classical layer and obtain the probability distribution directly from the QNN?

One thing that could be done here is to create a quantum circuit that samples basis states with the same probability distribution that was previously produced by the classical layer. In that case, each basis state would correspond to the action to be taken.

To get the probability distribution, qml.probs could be used, for example by modifying the previously mentioned QNode as follows:

@qml.qnode(device, interface="tf")
def qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(nqubits))
    qml.templates.StronglyEntanglingLayers(weights, wires=range(nqubits))
    return qml.probs(wires=range(nqubits))

By training these quantum circuits, we are basically training periodic functions, right? Training heavily depends on an appropriate choice of the learning rate. Is there any way to figure out the “optimal” learning rate, or are we left with an empirical analysis?

Indeed, choosing a favourable learning rate for gradient descent as part of hyperparameter initialization is an open question and is specific to the problem at hand. The hyperparameters chosen are closely related to the geometry used in the parameter space.

It is worth checking out the Quantum natural gradient tutorial, which describes an alternative optimization method that uses a different geometry than the Euclidean one and can perform better than vanilla gradient descent.
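
As a rough illustration of the API (this toy circuit is just an example, not the hybrid model above):

import pennylane as qml
from pennylane import numpy as np

dev = qml.device('default.qubit', wires=2)

@qml.qnode(dev)
def circuit(params):
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

opt = qml.QNGOptimizer(stepsize=0.2)
params = np.array([0.1, 0.2], requires_grad=True)

# Each step uses the (approximate) metric tensor of the circuit
for _ in range(50):
    params = opt.step(circuit, params)

print(circuit(params))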

Hope this helps a bit, let us know if you have further questions! :slight_smile:

Hi @antalszava, thank you for your answer.

I am sorry, I don’t understand. In this case, the quantum circuit has 4 qubits and therefore 16 possible basis states, which would represent 16 different actions, but in my case the action set has only two actions.

Yes, thank you very much, I came across this optimization method while researching. Very interesting :star_struck:

Hi @Andre_Sequeira,

In this case, the quantum circuit has 4 qubits and therefore 16 possible basis states, which would represent 16 different actions, but in my case the action set has only two actions.

That’s correct! I am not entirely sure about the specific use case, but one approach here could be to consider the marginal probability distribution of computational basis states on a single qubit (as a subsystem of the 4-qubit quantum circuit).

Let’s consider a circuit that prepares the two-qubit computational basis states (|00\rangle, |01\rangle, |10\rangle, |11\rangle) with equal probabilities:

import pennylane as qml

dev = qml.device('default.qubit', wires=2)

@qml.qnode(dev)
def circuit():
    qml.Hadamard(0)
    qml.CNOT(wires=[0,1])
    qml.Hadamard(1)
    return qml.probs(wires=[0,1])

print(circuit())

[0.25 0.25 0.25 0.25]

The marginal probabilities of computational basis states on the first qubit can be obtained:

import pennylane as qml

dev = qml.device('default.qubit', wires=2)

@qml.qnode(dev)
def circuit():
    qml.Hadamard(0)
    qml.CNOT(wires=[0,1])
    qml.Hadamard(1)
    return qml.probs(wires=[0])

print(circuit())

[0.5 0.5]

The first entry here corresponds to the probability of the first qubit being |0\rangle, while the second entry corresponds to the probability of the first qubit being |1\rangle.
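
Applied to the 4-qubit circuit discussed earlier, a sketch (reusing the same template structure and variable names as above) of a QNode that outputs the two action probabilities directly could be:

@qml.qnode(device, interface="tf")
def policy_qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(nqubits))
    qml.templates.StronglyEntanglingLayers(weights, wires=range(nqubits))
    # Marginal distribution on wire 0: [P(action 0), P(action 1)]
    return qml.probs(wires=[0])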

Hope this helps with a bit of direction. :slight_smile:

Hello @antalszava ,

this is interesting, thank you very much. Assuming that we prepare the uniform superposition state over the 2-qubit system, how many times do we need to run the quantum circuit in order to get a good approximation of the probability distribution? In a sense, what is qml.probs() doing under the hood?

Hi @Andre_Sequeira,

In PennyLane, state-vector simulator devices (such as default.qubit) allow computations to be performed in two ways:

  1. by sampling the circuit a certain number of times and post-processing the samples (the analytic=False case), or
  2. by returning the analytic value of the output, which in this case is the probability distribution over basis states (the analytic=True case).

In the first case, you can further specify the shots attribute to define the number of circuit evaluations.

import pennylane as qml

dev = qml.device('default.qubit', wires=2, analytic=False, shots=100)

@qml.qnode(dev)
def circuit():
    qml.Hadamard(0)
    qml.CNOT(wires=[0,1])
    qml.Hadamard(1)
    return qml.probs(wires=[0])

print(circuit())

[0.53 0.47]

As the number of shots is increased for the device, the probability distribution approaches its analytic value. However, for only a few shots, the results may deviate significantly from the analytic values.

By default, default.qubit performs analytic computations (as in the example from the previous answer). Once analytic is set to False, the default value of 1000 is used for the shots attribute.
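
As a quick illustration of this convergence, one could sweep over a few shot numbers and watch the estimates approach the analytic [0.5, 0.5]:

import pennylane as qml

# Compare the sampled marginal distribution for an increasing number of shots
for shots in [10, 100, 1000, 10000]:
    dev = qml.device('default.qubit', wires=2, analytic=False, shots=shots)

    @qml.qnode(dev)
    def circuit():
        qml.Hadamard(0)
        qml.CNOT(wires=[0, 1])
        qml.Hadamard(1)
        return qml.probs(wires=[0])

    print(shots, circuit())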

@antalszava, is there a way to know the optimal number of shots for a quantum circuit? Depending on the circuit, and on whether I run it on a real quantum device or a simulator, 1000 might be a small or a large number, and this matters for performance.

Thanks.

Hi @Andre_Sequeira,

I would say this largely comes down to trial and error and to running separate simulations.

As you’ve mentioned, this is highly dependent on the device. For hardware devices, inaccuracies can stem from the fact that applying gates and measurements in a quantum circuit introduces errors (the quality of a gate is often quantified by its fidelity). Furthermore, the quantum state itself can change over time (decoherence). Each architecture performs operations in a different way (i.e., gate fidelities differ). Noise inherently influences the measurement outcomes, and hence the statistics that can be drawn from the observed samples. This noisiness comes on top of the stochastic nature of sampling a quantum circuit.

Overall, one approach could be to first run the quantum circuits on simulators and try to mimic the real hardware there. In PennyLane, noise can be simulated using the new default.mixed device (see the release notes for an example) by applying quantum channels that represent the noise.
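
For example, a minimal sketch of simulating depolarizing noise with default.mixed could look like this (the circuit and noise strength are arbitrary choices for illustration):

import pennylane as qml

dev = qml.device('default.mixed', wires=1)

@qml.qnode(dev)
def noisy_circuit(p):
    qml.PauliX(wires=0)
    # Depolarizing channel as a simple stand-in for hardware noise
    qml.DepolarizingChannel(p, wires=0)
    return qml.probs(wires=0)

print(noisy_circuit(0.0))  # ideal case: all probability on |1>
print(noisy_circuit(0.2))  # with noise, some probability leaks back to |0>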

Hi @antalszava, thank you very much for your help.