Quantum Neural Networks / Quanvolutional NN

Hello,

I have been going through the official tutorials on quantum neural networks (QNN) and quanvolutional neural networks (QuNN), and I have some questions about both in comparison to classical NNs and classical convolutional NNs:

QNN: Referred Tutorial

  1. I have tried running the same model on the MNIST dataset, both with and without quantum layer(s), and the quantum layer(s) turn out to have no impact on the validation accuracy. I even tried adding more quantum layers and increasing the number of qubits per layer, but the validation accuracy stays more or less the same with and without quantum layers. What is the point of adding quantum circuits/layers if they do not affect the model training at all?
    I believe the quantum parameters are not being trained. If I am right, and this is the reason the quantum layers are not affecting the model's learning, how can we make them trainable and see whether they improve training compared to a classical NN?
  2. Is it necessary for the first and last layers of a hybrid QNN to be classical layers? Can we have a complete QNN without any classical layers at the moment?

QuNN: This tutorial

  1. This is more suitable for image classification problems, right? However, the quanvolution of MNIST images takes far too much time. I tried 5000 training and 1000 test images and it took roughly 5+ hours on Google Colab, which maps to around 60 hours for the complete dataset (70000 images). Is it even practical for the MNIST dataset, which is considered the Hello World dataset for classical CNNs?
    We need a reasonably large dataset to gauge the performance, since for such a small dataset even a very simple CNN model overfits in a couple of epochs, and I suppose the same would be the case with QuNNs.
  2. Even for the small (5000/1000) dataset, the quanvoluted data performs almost the same as the convoluted data. What is the point of doing quantum convolution, which seems very impractical compared to its classical counterpart? In the tutorial the quanvoluted images are processed by a classical NN, so I thought that adding some quantum layers might process the quanvoluted images better, but there was no significant effect (possibly because of the non-trainable quantum parameters discussed above).

In both models (QNN and QuNN), the optimizer used is a classical one. If we can make the quantum parameters trainable, it sure would need a quantum optimizer, right? I am not sure how to integrate one here, particularly in the hybrid setting. Any light on that would be helpful.

Thanks for this great discussion forum. It has been and hopefully will be quite helpful.

Best,
Muhammad Kashif

Hey @Muhammad_Kashif!

I have tried running the same model on the MNIST dataset, both with and without quantum layer(s), and the quantum layer(s) turn out to have no impact on the validation accuracy. I even tried adding more quantum layers and increasing the number of qubits per layer, but the validation accuracy stays more or less the same with and without quantum layers. What is the point of adding quantum circuits/layers if they do not affect the model training at all?
I believe the quantum parameters are not being trained. If I am right, and this is the reason the quantum layers are not affecting the model's learning, how can we make them trainable and see whether they improve training compared to a classical NN?

There are potentially two questions here: (1) in theory what can a quantum circuit add to improve a hybrid model, and (2) is the model set up correctly so that the weights of the quantum parts are being trained?

For (1), how best to use quantum circuits within the machine learning toolchain is in fact quite an open research question. There are lots of interesting insights coming in, for example showing that you can think of these circuits as kernel methods and how you might understand their expressive power. Nevertheless, best practices are not fully established, and typical templates like StronglyEntanglingLayers may not perform well on all datasets.

To answer (2), we’d have to see a minimum working code for your hybrid model.

Is it necessary for the first and last layers of a hybrid QNN to be classical layers? Can we have a complete QNN without any classical layers at the moment?

It is not necessary and you should be able to create a Keras model composed purely of quantum circuits. However, the classical layers are useful for scaling the number of features and for applying familiar nonlinearities like softmax. The approach of adding a classical layer before and after the quantum circuit is discussed here.

This is more suitable for image classification problems, right? However, the quanvolution of MNIST images takes far too much time. I tried 5000 training and 1000 test images and it took roughly 5+ hours on Google Colab, which maps to around 60 hours for the complete dataset (70000 images). Is it even practical for the MNIST dataset, which is considered the Hello World dataset for classical CNNs?
We need a reasonably large dataset to gauge the performance, since for such a small dataset even a very simple CNN model overfits in a couple of epochs, and I suppose the same would be the case with QuNNs.

Yes, the quanvolutional layer was created by the authors with image recognition in mind. You are correct that training will be much slower, primarily due to the overhead of simulating a quantum device, which scales exponentially with the number of qubits. The natural approach is to use quantum hardware, but due to noise and current limitations we typically aim to train prototype models on a simulator.
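As a rough illustration of the exponential scaling mentioned above, the sketch below estimates the memory needed just to store a dense state vector. It assumes each amplitude is stored as a 16-byte complex number; the function name is made up for this example and is not part of PennyLane.

```python
# Rough memory estimate for a dense state-vector simulation.
# Assumption: each amplitude is stored as a complex128 value (16 bytes).
BYTES_PER_AMPLITUDE = 16


def statevector_bytes(n_qubits: int) -> int:
    """Bytes needed to hold all 2**n_qubits amplitudes (hypothetical helper)."""
    return BYTES_PER_AMPLITUDE * 2 ** n_qubits


for n in (4, 10, 20, 30):
    print(f"{n:2d} qubits -> {statevector_bytes(n):,} bytes")
```

Each added qubit doubles the storage, and the work per gate grows along with it.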

For now, your options are to use a smaller dataset (number of images, or resolution) as well as to make sure you are squeezing the best performance from your computer. Make sure that you are using the backprop or adjoint differentiation methods, since this will provide a significant speedup during training (note that you may already be using backprop).

Even for the small (5000/1000) dataset, the quanvoluted data performs almost the same as the convoluted data. What is the point of doing quantum convolution, which seems very impractical compared to its classical counterpart? In the tutorial the quanvoluted images are processed by a classical NN, so I thought that adding some quantum layers might process the quanvoluted images better, but there was no significant effect (possibly because of the non-trainable quantum parameters discussed above).

Similar to my previous response, currently we’re still scouting out a lot of ideas in QML and building up the theory and understanding for what works best. I’d say there isn’t a universal version of a quantum convolutional layer yet, with the one introduced in the paper an interesting candidate. I’d have to dig deeper into the paper to understand their motivations, but right now the field is definitely open for new ideas on how to compose circuits in an advantageous way.

In both models (QNN and QUNN), the optimizer used is a classical one, if we can make the quantum parameters trainable, it sure would need a quantum optimizer, right?

In a sense you can think of the circuit as a function f(theta) that has a well defined gradient f'(theta). This way of thinking allows you to use any classical optimizer. On the other hand, there are quantum aware optimizers being developed, for example see this demo.
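To make the f(theta) picture concrete, here is a minimal plain-Python sketch rather than PennyLane code. It assumes a one-qubit circuit that applies RX(theta) to |0> and measures Pauli-Z, whose expectation value is cos(theta). The gradient is obtained from two shifted evaluations of the same function (the parameter-shift rule) and fed to ordinary gradient descent. The function names, starting value and learning rate are illustrative only.

```python
import math


def f(theta: float) -> float:
    """Expectation <Z> after RX(theta) on |0>, which equals cos(theta)."""
    return math.cos(theta)


def parameter_shift_grad(theta: float) -> float:
    """Gradient of f obtained from two shifted evaluations of f itself."""
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2


# Plain gradient descent: a purely classical optimizer driving the "circuit".
theta = 0.4
learning_rate = 0.2
for _ in range(200):
    theta -= learning_rate * parameter_shift_grad(theta)

final_cost = f(theta)
print(theta, final_cost)
```

The optimizer only ever sees function values and gradients, which is why any classical optimizer (Adam, SGD, ...) can be used here.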

Hi @Tom_Bromley,

Thanks for answering and referring the corresponding papers.

Yes, it is quite right that the best strategy for using quantum circuits in an ML context is yet to be developed. The best practice so far when building such circuits is that they should incorporate the properties of quantum mechanics that have no classical counterpart (superposition and entanglement).

To answer (2), we’d have to see a minimum working code for your hybrid model.

Sure, I can share the code. It is very similar to that provided in the QNN (here) and QuNN (here) tutorials. I perform the quantum convolution exactly as in the QuNN tutorial. The data size is 5000 train_images and 1000 test_images. Afterwards, I train the following model:

n_qubits = 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.templates.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(1), qml.expval(qml.PauliX(2)))

n_layers = 6
weight_shapes = {"weights": (n_layers, n_qubits)}        
clayer_1 = tf.keras.layers.Dense(12,activation ='relu')

qlayer_1 = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)
qlayer_2 = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)
qlayer_3 = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)
qlayer_4 = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)
clayer_2 = tf.keras.layers.Dense(10, activation="softmax")

# construct the model
inputs = tf.keras.Input(shape=(784,))
keras.layers.Flatten(),
x = clayer_1(inputs)
x_1, x_2, x_3,x_4 = tf.split(x, 4, axis=1)
x_1 = qlayer_1(x_1)
x_2 = qlayer_2(x_2)
x_3 = qlayer_2(x_3)
x_4 = qlayer_2(x_4)


x_1, x_2, x_3,x_4 = tf.split(x, 4, axis=1)
x = tf.concat([x_1, x_2, x_3,x_4], axis=1)
outputs = clayer_2(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

opt = Adam(learning_rate=0.001)
model.compile(
    optimizer=opt,
    loss=sparse_categorical_crossentropy,
    metrics=['accuracy'],
)
bs = 8
n_epoch = 10
model.fit(
    q_train_images,
    y_train,
    batch_size=bs,
    epochs=n_epoch,
    validation_data=(q_test_images, y_test),
)

I am omitting the data preparation part: I downloaded MNIST, performed the quantum convolution exactly as in the tutorial and developed the model above.
According to model.summary(), the above model has 9550 trainable parameters in total. It reaches a training accuracy of 0.8452 and a validation accuracy of 0.7580.

The corresponding classical model with the same number of trainable parameters (9550) is:

cmodel = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(784,)),
  tf.keras.layers.Dense(12,activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

cmodel.compile(
    optimizer=opt,
    loss=sparse_categorical_crossentropy,
    metrics=['accuracy'], )
cmodel.fit(
    x_train,
    y_train,
    batch_size=8,
    epochs=10,
    validation_data=(x_test, y_test),
)

The training accuracy of the purely classical model is 0.9750 and the validation accuracy is 0.8890.

  1. Why do the hybrid and classical models have the same number of trainable parameters?
  2. Although the convolution is performed using a quantum circuit, a very simple classical model still performs much better than the hybrid model, and I believe a classical convolutional NN with the same number of trainable parameters would be even better. Is there any way that hybrid networks can be brought close to classical CNNs, if not better?

I tried adjoint, but it had no significant effect on the time taken to perform the quantum convolution.

Thanks for the help…

Hey @Muhammad_Kashif, thanks for sharing!

Looking at the code you shared, the model-setup part seems to make the output of the model independent of the qlayers. For the following:

inputs = tf.keras.Input(shape=(784,))
keras.layers.Flatten(),
x = clayer_1(inputs)
x_1, x_2, x_3,x_4 = tf.split(x, 4, axis=1)
x_1 = qlayer_1(x_1)
x_2 = qlayer_2(x_2)
x_3 = qlayer_2(x_3)
x_4 = qlayer_2(x_4)


x_1, x_2, x_3,x_4 = tf.split(x, 4, axis=1)
x = tf.concat([x_1, x_2, x_3,x_4], axis=1)
outputs = clayer_2(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

If I understand correctly, isn’t the second line of x_1, x_2, x_3,x_4 = tf.split(x, 4, axis=1) taking x as an input and hence bypassing the outputs of the quantum layers? I believe the quantum part of the code actually isn’t being called, since I had to update the following lines to get the KerasLayer to evaluate:

return qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(1)), qml.expval(qml.PauliX(2))

and

weight_shapes = {"weights": (n_layers, n_qubits, 3)}   
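A quick parameter count also points to the quantum layers being bypassed. The sketch below is plain arithmetic, not PennyLane or Keras code, and dense_params is a hypothetical helper. It uses the layer sizes from the posted model (784 inputs, Dense(12), Dense(10)) and the corrected weight shape (n_layers, n_qubits, 3).

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer (hypothetical helper)."""
    return n_in * n_out + n_out


# Classical path present in both models: Dense(12) on 784 inputs, then Dense(10).
classical_total = dense_params(784, 12) + dense_params(12, 10)

# One quantum layer with the corrected weight shape (n_layers, n_qubits, 3).
n_layers, n_qubits = 6, 3
params_per_qlayer = n_layers * n_qubits * 3

print(classical_total)    # 9550
print(params_per_qlayer)  # 54
```

The two dense layers alone account for all 9550 reported parameters, so none of the quantum weights are part of the trained graph. Any quantum layer that was actually connected would add its own weights on top of that total.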

The way I normally approach these problems is to slowly introduce the quantum part to the model without worrying about performance. E.g.,

  • Set up model with all classical layers and check it trains
  • Introduce the simplest quantum circuit to one part of the model. Check that the model outputs and that you can access its gradient.
  • Scale the model up by making the quantum part more complex.

I think it might be an idea to avoid the splitting into 4 quantum circuits here (at first), and just see if you can pass through one quantum layer successfully.

Hi

I came across a paper last night that is worth reading on the issue of QNN vs CNN. I have only skimmed it so far, but it seems the authors put a lot of effort into the work.

“The dilemma of quantum neural networks”

Hi @SuFong_Chien,

Thanks for sharing the paper, it definitely would be quite helpful moving forward.

Hi @Tom_Bromley,
Following the suggestions, I used only one quantum layer, and it seems to be working since the training is a little slower compared to the classical model. Just a couple of simple questions based on the previous conversation in this thread:

  1. In the tutorials, what are n_layers (only used while defining weight_shapes) and weight_shapes? In the first tutorial the weight shape has two entries and in the second one it has three. Does it depend on the number of qubits used? If yes, what would weight_shapes be if we want, let's say, more than 10 qubits?

  2. Would different data embedding techniques such as amplitude embedding and angle embedding, among others, affect the underlying model's performance? Also, what could be the potential effect of StronglyEntanglingLayers or BasicEntanglerLayers on model performance?

When printing the model summary in Keras (for a model with quantum layers), it shows zero trainable parameters for the quantum layer along with an “unused” notification. Why is that so? Below is the model code and the corresponding screenshot.

Code:
qlayer_1 = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)

inputs = tf.keras.Input(shape=(784,))
x = tf.keras.layers.Dense(2, activation="relu")(inputs)
x = qlayer_1(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs) 
model.summary()

Output summary:

[Screenshot of the model summary output]

Thanks for the help.

Hey @Muhammad_Kashif,

In the tutorials, what are n_layers (only used while defining weight_shapes) and weight_shapes? In the first tutorial the weight shape has two entries and in the second one it has three. Does it depend on the number of qubits used? If yes, what would weight_shapes be if we want, let's say, more than 10 qubits?

The weight_shapes parameter is intended to let the qml.qnn.KerasLayer know the shapes of the trainable parameters in the QNode. It should be a dictionary that maps argument name to shape, for example:

@qml.qnode(dev)
def qnode(inputs, w1, w2, w3):
    ...
    qml.RX(w1, wires=0)
    qml.Rot(w2, wires=1)
    qml.templates.StronglyEntanglingLayers(w3, wires=range(2))
    ...

In this case, we should have weight_shapes = {"w1": 1, "w2": 3, "w3": (n_layers, 2, 3)}. It is easy to see this for w1 and w2, since w1 feeds into the single-parameter RX gate and w2 feeds into the three-parameter Rot gate. The shape of w3 is a bit more complicated because it feeds into StronglyEntanglingLayers. In this case, the shape must be (n_layers, n_wires, 3). For StronglyEntanglingLayers, we have multiple layers that look like:

[Figure: circuit diagram of a single StronglyEntanglingLayers layer]

Each qubit has a Rot gate applied followed by an entangling block. We must hence specify the number of layers and number of wires. Since Rot has three parameters, the overall shape is (n_layers, n_wires, 3).
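The shape rules above can be written out as a small plain-Python sketch. The helper names are hypothetical and are not PennyLane functions. The two-entry shape for BasicEntanglerLayers is taken from the tutorial code quoted in this thread, and the 10-qubit case addresses the earlier question.

```python
import math


def strongly_entangling_shape(n_layers: int, n_wires: int) -> tuple:
    """Weight shape for StronglyEntanglingLayers: one Rot (3 angles) per wire per layer."""
    return (n_layers, n_wires, 3)


def basic_entangler_shape(n_layers: int, n_wires: int) -> tuple:
    """Weight shape for BasicEntanglerLayers: one rotation angle per wire per layer."""
    return (n_layers, n_wires)


n_layers = 6
for n_wires in (2, 3, 10):
    strong = strongly_entangling_shape(n_layers, n_wires)
    basic = basic_entangler_shape(n_layers, n_wires)
    print(n_wires, strong, math.prod(strong), basic, math.prod(basic))

# Dictionary in the form expected by weight_shapes, here for 10 qubits.
weight_shapes = {"weights": strongly_entangling_shape(n_layers, 10)}
```

The number of qubits only changes the middle entry of the shape, so going beyond 10 qubits needs no other change to weight_shapes.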

Would different data embedding techniques such as amplitude embedding and angle embedding, among others, affect the underlying model's performance? Also, what could be the potential effect of StronglyEntanglingLayers or BasicEntanglerLayers on model performance?

Definitely! There is a lot of room to play about with different embeddings and layers - check out the literature above to get more of an understanding.

When printing the model summary in Keras (for a model with quantum layers), it shows zero trainable parameters for the quantum layer along with an “unused” notification. Why is that so? Below is the model code and the corresponding screenshot.

🤔 That’s odd. I just tried the code below and the summary printed OK:

import pennylane as qml
import tensorflow as tf

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.templates.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(wires=i)) for i in range(n_qubits)]

n_layers = 6
weight_shapes = {"weights": (n_layers, n_qubits)}

qlayer = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)

clayer_1 = tf.keras.layers.Dense(4)
qlayer_1 = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)
qlayer_2 = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)
clayer_2 = tf.keras.layers.Dense(2, activation="softmax")

# construct the model
inputs = tf.keras.Input(shape=(2,))
x = clayer_1(inputs)
x_1, x_2 = tf.split(x, 2, axis=1)
x_1 = qlayer_1(x_1)
x_2 = qlayer_2(x_2)
x = tf.concat([x_1, x_2], axis=1)
outputs = clayer_2(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.predict(tf.ones((5, 2)))

with the result

>>> model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 2)]          0                                            
__________________________________________________________________________________________________
dense (Dense)                   (None, 4)            12          input_1[0][0]                    
__________________________________________________________________________________________________
tf.split (TFOpLambda)           [(None, 2), (None, 2 0           dense[0][0]                      
__________________________________________________________________________________________________
keras_layer_1 (KerasLayer)      (None, 2)            12          tf.split[0][0]                   
__________________________________________________________________________________________________
keras_layer_2 (KerasLayer)      (None, 2)            12          tf.split[0][1]                   
__________________________________________________________________________________________________
tf.concat (TFOpLambda)          (None, 4)            0           keras_layer_1[0][0]              
                                                                 keras_layer_2[0][0]              
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 2)            10          tf.concat[0][0]                  
==================================================================================================
Total params: 46
Trainable params: 46
Non-trainable params: 0
__________________________________________________________________________________________________

(note the code is adapted from this tutorial).

Hi @Tom_Bromley,

Thanks for your input on the weight_shapes argument. It was quite helpful. I have a few more questions; sorry, but as I progress step by step, new questions and confusions arise. Thanks to this forum for helping clear them up.

The n_layers=n argument basically cascades the quantum circuit (StronglyEntanglingLayers, BasicEntanglerLayers or a customized circuit), right? The other way of doing this would be to set n_layers=1, create as many quantum layers as we want and then add them to the model. It will do the same, right? Which is preferred and maybe more useful, given that n_layers is also passed in weight_shapes?

Both procedures above for adding Q-layers cascade the Q-circuits. If we need parallel quantum circuits, we need to split the output of the first classical layer and connect the pieces to subsequent quantum layers, right?

BasisEmbedding can only be performed when the input data is one-hot encoded, not otherwise, right? If yes, would the other embedding techniques (AmplitudeEmbedding and AngleEmbedding) applied to one-hot encoded data be equivalent to BasisEmbedding?

Would the choice of qubit measurement, such as expval or probs, affect the overall model performance?

If hybrid neural networks are run on quantum processors (IBM devices, for instance), would it potentially reduce the training time, given that the network also contains classical layers, which could be hard for quantum processors to compute just as Q-layers are for classical computers?

Regarding the summary printing: the code you shared in the previous message as a demo for printing the summary does not work if we remove the command model.predict(tf.ones((5, 2))), and the error that pops up is below:

Thanks for the great help…

Hey @Muhammad_Kashif,

The n_layers=n argument basically cascades the quantum circuit (StronglyEntanglingLayers, BasicEntanglerLayers or a customized circuit), right?

Right, StronglyEntanglingLayers and similar templates simply define a fundamental unit circuit block and then stamp out repetitions (with different parameters) based on the shape of their input weights argument. This is the argument whose shape you must specify in weight_shapes when creating your KerasLayer.

The other way of doing this would be to set n_layers=1, create as many quantum layers as we want and then add them to the model. It will do the same, right? Which is preferred and maybe more useful, given that n_layers is also passed in weight_shapes?

You could set n_layers=1 and create multiple KerasLayer instances, but there would be multiple intermediate measurements where classical information is extracted and then fed into the next circuit. This may indeed work, but it is different from using a single KerasLayer with many StronglyEntanglingLayers repetitions, which keeps everything on the quantum side. The nomenclature may be slightly confusing: we have quantum layers (e.g., StronglyEntanglingLayers), which are repetitions of circuit blocks all on the quantum side, and we also have layers in the sense of interfacing with the rest of Keras.

Both the procedures above to add Q-layers cascade the Q-circuits, if we need to have parallel quantum circuits we need to split the first classical neuron layer and connect to subsequent quantum layers, right?

I think so, I believe this is also the standard approach in an all classical network where you have one layer splitting into parallel layers.

BasisEmbedding can only be performed when the input data is one-hot encoded, not otherwise, right? If yes, would the other embedding techniques (AmplitudeEmbedding and AngleEmbedding) applied to one-hot encoded data be equivalent to BasisEmbedding?

Yes, the input data to BasisEmbedding can only be a binary string. You can also get AmplitudeEmbedding to behave the same by inputting a unit vector (which corresponds to the binary string).
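The correspondence between a binary string and a unit vector can be sketched in plain Python. This is an illustration only: basis_state_vector is a hypothetical helper, and it assumes the first bit is the most significant one (wire 0 leftmost).

```python
def basis_state_vector(bits):
    """Amplitude vector of the computational basis state labelled by `bits`.

    Assumption: bits[0] is the most significant bit (wire 0 leftmost).
    """
    index = int("".join(str(b) for b in bits), 2)
    vector = [0.0] * (2 ** len(bits))
    vector[index] = 1.0
    return vector


state = basis_state_vector([1, 0, 1])
print(state)
print(sum(a * a for a in state))  # norm squared of the vector
```

A vector like this, with a single 1 and zeros elsewhere, is already normalized, so it is a valid input to an amplitude-style embedding and prepares the same basis state.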

Would the choice of qubit measurement, such as expval or probs, affect the overall model performance?

Most likely! Both the measurement type and observable (e.g., Z or XZ for example) will have an effect. We typically have gone for [qml.expval(qml.PauliZ(i)) for i in range(wires)].

if hybrid neural networks are run on quantum processors (IBM devices for instance), would it potentially reduce the training time, since we have classical layers as well in the network which could be hard to compute for quantum processors just like Q-layers for classical computers?

Note that the whole model isn’t run on a single quantum or classical processor. The work is simply split up so that a classical processor evaluates the classical layers and their gradients, and a quantum processor handles the quantum layers. So the main choice in this setting is which device to execute your quantum circuit on, e.g., built-in PennyLane simulators or plugin devices such as hardware backends.

Regarding the summary printing the same code you shared in previous message as a demo for printing summary is not working if we remove the following command model.predict(tf.ones((5, 2))) and the error that pops up is below:

I had a similar error when the model.predict(tf.ones((5, 2))) line isn’t run. Calling predict once is an easy way to build the model. If you’re still getting that error with that line included, it may be a case of upgrading your version of TensorFlow.

Hi Tom

I suddenly have two questions in mind about QNNs in general. First, a classical model can keep all the trained weights that form the model and use them for prediction. How does this work in a QNN? Second, can a QNN be trained in real time and the model used on the spot for prediction?

Hi @SuFong_Chien,

Yes, both of the things you ask about, which are common in classical ML, are also possible with quantum models.

  • One can certainly save the trained weights of a model, even if it involves running on a quantum computer. After all, the weights are still classical variables! There are no conceptual barriers to this. One point to keep in mind is that classical ML libraries are older and more heavily developed than QML libraries, so while there is no barrier to saving quantum models and parameters, QML libraries are not quite as fully featured in this department as classical ML libraries. For sure you could manually save a quantum model’s parameters, but automatic methods for doing this might not be available quite yet.

  • You can definitely use a QML model in an online learning setting as you describe. However, since access to quantum computers can be costly, with today’s devices it might be more economical to update the model as sizable enough batches of training data come in, rather than continually update a model’s parameters in real time 🙂
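As a minimal sketch of saving parameters manually, the example below writes a dictionary of weights to a JSON file and reads it back. The weight values, the dictionary key and the file name are all made up for illustration. With a real model you would first convert the trained arrays to nested lists.

```python
import json
import os
import tempfile

# Illustrative values standing in for trained circuit parameters.
trained = {"weights": [[0.1, -0.2], [0.3, 0.4]]}

# Save the parameters to disk as ordinary classical data.
path = os.path.join(tempfile.mkdtemp(), "qnn_weights.json")
with open(path, "w") as fh:
    json.dump(trained, fh)

# Later (or in another process): load them back for prediction.
with open(path) as fh:
    restored = json.load(fh)

print(restored == trained)
```

The restored values can then be passed back into the same circuit definition, since the circuit structure itself lives in the code and only the numbers need to be stored.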