Quantum convolutional neural network using Keras

Hi,

I’m trying to build a quantum convolutional model for image classification. My approach is:

  1. First, use qml.qnn.KerasLayer to build the trainable quantum filter, called qlayer.

  2. Then, implement the convolution operation in my custom Keras Model, following the reference here.

My problem is that although the output dimension seems reasonable, when calling model.fit() the gradients of the trainable parameters in qlayer don’t exist. The detailed code and error are shown below.

First, I construct the quantum filter circuit and transform it into a KerasLayer. The commented-out lines are used for testing.

import pennylane as qml
from pennylane import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Flatten

n_train = 50 
n_test = 30 
filter_size = 2
n_qubits = filter_size**2

dev = qml.device("lightning.qubit", wires=n_qubits, shots=None)

def encoding(pixel):
    # rotate each qubit by the corresponding pixel value
    for i in range(n_qubits):
        qml.RY(pixel[i], wires=i)

def circuit_1(params):
    qml.BasicEntanglerLayers(weights=params, wires=range(n_qubits))

@qml.qnode(dev)
def q_filter(inputs,params):
    encoding(inputs)
    circuit_1(params)
    return qml.expval(qml.PauliZ(0))

# print(q_filter([1.0,4,54,44], [[3.44,0.45,4.44,5.43]]))

n_layers = 1
weight_shapes = {"params": (n_layers, n_qubits)}
qlayer = qml.qnn.KerasLayer(q_filter, weight_shapes, output_dim=1)

# test
# fig, ax = qml.draw_mpl(q_filter)([1.0,4,54,44], [[3.44,0.45,4.44,5.43]])
# fig.show()
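
As a quick sanity check before embedding the filter in a model, one can verify that gradients flow through the bare qlayer (a sketch; the sample input values are arbitrary):

x = tf.constant([0.1, 0.2, 0.3, 0.4], dtype=tf.float64)
with tf.GradientTape() as tape:
    out = qlayer(x)
# expect a (1, 4) tensor of partial derivatives, not None
print(tape.gradient(out, qlayer.trainable_weights))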

And the following is my QCNN model

class QCNN(tf.keras.Model):

    def __init__(self, num_filters, filter_size, num_params):
        super(QCNN,self).__init__()
        self.num_filters = num_filters
        self.filter_size = filter_size
        # using two filters for now
        self.quantum_filter_1 = qml.qnn.KerasLayer(q_filter, weight_shapes, output_dim=1, name='quantum_filter')
        self.quantum_filter_2 = qml.qnn.KerasLayer(q_filter, weight_shapes, output_dim=1, name='quantum_filter')
        self.hidden = Dense(128, activation='relu')
        self.flatten = Flatten()
        self.dense = Dense(10, activation='softmax')


    def call(self, inputs):
        # The output length of one sample after the convolution operation with stride = 1
        output_length = inputs.shape[1] - self.filter_size + 1
        num_sample = inputs.shape[0]
        output_all = []
        quantum_filter_list = [self.quantum_filter_1, self.quantum_filter_2]
        
        for a in range(num_sample):
                # output shape for one image
                output = np.zeros((output_length, output_length, self.num_filters))
                for i in range(output_length):
                    for j in range(output_length):
                            for f in range(self.num_filters):
                                # convolution windows, now only applying to one channel image
                                sub_input = inputs[a, i:i+self.filter_size, j:j+self.filter_size, :]
                                # flatten into 1-D
                                sub_input = tf.reshape(sub_input, [-1])
                                quantum_filter = quantum_filter_list[f]
                                output[i][j][f] = quantum_filter(sub_input)

                output = tf.convert_to_tensor(output)
                # list of tensor
                output_all.append(output)

        # Stacks a list of rank-R tensors into one rank-(R+1) tensor.
        output_all = tf.stack(output_all)

        x = self.flatten(output_all)
        x = self.hidden(x)
        x = self.dense(x)
        return x

Now, testing the model’s output dimension:

# test on some data

#(batch_size, image_height, image_width, channel)
input_shape = (4, 10, 10, 1)
i = tf.random.normal(input_shape)
model = QCNN(2,2,4)
print(f"model_output_shape: {model(i).shape}")
model.summary()
model_output_shape: (4, 10)
Model: "qcnn_21"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantum_filter (KerasLayer)  multiple                 4         
                                                                 
 quantum_filter (KerasLayer)  multiple                 4         
                                                                 
 dense_40 (Dense)            multiple                  20864     
                                                                 
 flatten_20 (Flatten)        multiple                  0         
                                                                 
 dense_41 (Dense)            multiple                  1290      
                                                                 
=================================================================
Total params: 22,162
Trainable params: 22,162
Non-trainable params: 0
_________________________________________________________________

The output dimension is correct, (4, 10), and the Param # column captures the trainable parameters in each quantum_filter. The output shape says “multiple” because I apply a quantum_filter layer multiple times to convolve the whole image, so the layer has multiple outputs, per the Guide to the Functional API - Keras 2.0.2 Documentation (faroit.com). I’m not sure whether it’s a problem for a layer to have multiple outputs, but since the final output dimension looks right, I kept going and trained the model:

model = QCNN(2,2,4)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
r = model.fit(x_train_small, y_train, epochs=10, validation_data=(x_test_small, y_test))

and the output says

Epoch 1/10
WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
1/2 [==============>...............] - ETA: 36s - loss: 2.3712 - accuracy: 0.0312WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
2/2 [==============================] - 80s 44s/step - loss: 2.3464 - accuracy: 0.1000 - val_loss: 2.3679 - val_accuracy: 0.1333
Epoch 2/10
WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
1/2 [==============>...............] - ETA: 37s - loss: 2.2755 - accuracy: 0.1562WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
2/2 [==============================] - 81s 44s/step - loss: 2.2671 - accuracy: 0.1400 - val_loss: 2.3573 - val_accuracy: 0.1333
Epoch 3/10
WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
1/2 [==============>...............] - ETA: 37s - loss: 2.2450 - accuracy: 0.1250WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['qcnn_22/quantum_filter/params:0', 'qcnn_22/quantum_filter/params:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
2/2 [==============================] - ETA: 0s - loss: 2.2473 - accuracy: 0.1200 

where the dataset is a downscaled MNIST dataset:

mnist_dataset = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist_dataset.load_data()
# Reduce dataset size
x_train = x_train[:n_train]
y_train = y_train[:n_train]
x_test = x_test[:n_test]
y_test = y_test[:n_test]
# Normalize pixel values within 0 and 1
x_train = x_train / 255
x_test = x_test / 255
# Add extra dimension for "color" channels
x_train = np.array(x_train[..., tf.newaxis])
x_test = np.array(x_test[..., tf.newaxis])
print(f"train_images_shape: {x_train.shape}")
# use Bilinear Interpolation for downscaling
x_train_small = tf.image.resize(x_train, (10,10)).numpy()
x_test_small = tf.image.resize(x_test, (10,10)).numpy()
print(f"x_train_reshape: {x_train_small.shape}")

How can I solve this warning saying that the gradients of the trainable parameters in my quantum_filter layer don’t exist? Judging from the loss, it looks like my model indeed isn’t training. I suspect this might be caused by the multiple outputs, but I can’t understand why, because when I print the output dimension of the model it indeed gives me (batch_size, 10), which matches my y_train dimension. I have read Data re-uploading impelementation in hybrid NN with keras layer - PennyLane Help - Xanadu Discussion Forum, but I think my way of using Keras with a QNode is very similar to Turning quantum nodes into Keras Layers — PennyLane documentation, so it should be fine.

Thanks for your patient reading; the code I provided is quite long, but I hope I explained it clearly :joy: If there is any problem reproducing my result, please let me know.


I forgot to provide the environment! It is python==3.9.0, pennylane==0.29.1, tensorflow==2.11.0.


Hey @b0976960890! Welcome to the forum!

I think this issue is similar to this forum post: Data re-uploading impelementation in hybrid NN with keras layer - #6 by Kuma-quant

I think what’s happening is that there are non-TensorFlow arithmetic operations present in your call function (e.g., filling a NumPy array, appending to Python lists). When using Keras with QNodes, you need to ensure that all array/tensor manipulations use TensorFlow. Otherwise, Keras will not be able to differentiate your QNode.
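
Here’s a minimal sketch (pure TensorFlow, no PennyLane; the variable names are just for illustration) of how routing a tensor through NumPy silently disconnects it from the gradient tape:

import numpy as np
import tensorflow as tf

x = tf.Variable(1.0)

# Routing the result through a NumPy buffer breaks the tape
with tf.GradientTape() as tape:
    y = 2.0 * x
    buf = np.zeros(1)
    buf[0] = y  # y is converted to a plain float here
    loss = tf.reduce_sum(tf.constant(buf))
print(tape.gradient(loss, x))  # None -> "gradients do not exist"

# Keeping everything in TensorFlow preserves the tape
with tf.GradientTape() as tape:
    y = 2.0 * x
    loss = tf.reduce_sum(tf.stack([y]))
print(tape.gradient(loss, x))  # tf.Tensor(2.0, shape=(), dtype=float32)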

Let me know if this helps!


Hi Isaac,

Thanks for providing the idea! But I don’t quite understand the reasoning behind it. Originally, when I read Data re-uploading impelementation in hybrid NN with keras layer - #4 by Kuma-quant, I thought it meant ensuring that all array/tensor manipulations inside the QNode use TensorFlow rather than NumPy, because inside the QNode you deal with the trainable parameters, which should be in the form of a differentiable tf.Tensor.

But why should the operations outside the QNode, in my case in the Keras call function, also be restricted to TensorFlow arithmetic/operations? We have already transformed the QNode into a KerasLayer object, and I have seen people use non-TensorFlow arithmetic/operations in the call function with ordinary Keras layers. I would appreciate it if you could help me with this question!

Still, I have changed the operations to use TensorFlow only, including:

  • Instead of appending to a list, I use tf.concat

  • output is initialized using tf.zeros instead of np.zeros

Here is the modified call function:

    def call(self, inputs):
        # The output length of one sample after the convolution operation with stride = 1
        output_length = inputs.shape[1] - self.filter_size + 1
        num_sample = inputs.shape[0]
        output_all = None
        quantum_filter_list = [self.quantum_filter_1, self.quantum_filter_2]

        count = 0
        
        for a in range(num_sample):
                print(f"Runing {a}th data sample...")
                # output shape for one image
                # output = np.zeros((output_length, output_length, self.num_filters))
                output = tf.zeros((output_length, output_length, self.num_filters), dtype=tf.dtypes.float32)
                for i in range(output_length):
                    for j in range(output_length):
                            for f in range(self.num_filters):
                                # convolution windows, now only applying to one channel image
                                sub_input =  inputs[a,i:i+self.filter_size, j:j+self.filter_size, :]
                                # flatten into 1-D
                                sub_input = tf.reshape(sub_input, [-1])
                                quantum_filter = quantum_filter_list[f]
                                # output[i][j][f] = quantum_filter(sub_input)
                                output = tf.tensor_scatter_nd_update(output, [[i, j, f]], [quantum_filter(sub_input)])
                            
                                # output_all[a,i,j,f].assign(quantum_filter(sub_input))
                output = tf.convert_to_tensor(output)
                output = tf.expand_dims(output, axis=0)
                # print(f"output_shape: {output.shape}")
                # print(output)
                if count == 0:
                     output_all = output
                    #  print(count)
                    #  print(output_all.shape)
                else:
                     output_all = tf.concat([output_all, output], 0)
                    #  print(count)
                    #  print(output_all.shape)
                count += 1
        print("All input data for one batch have been convolved!")
        # output_all = tf.cast(output_all, tf.float32)
        x = self.flatten(output_all)
        print("after flatten")
        # x = tf.cast(x, tf.float32)
        x = self.hidden(x)
        print("after hidden")
        x = self.dense(x)
        print("after dense")
        print(x.dtype)
        return x

But now there is a dtype error when I call model.fit() to compute the loss, so I’m not sure whether the gradient problem is solved or not:

InvalidArgumentError: cannot compute Mul as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:Mul]

I’m still working on this dtype error and will share the result after I solve it. Any suggestions are welcome.

Again thanks @isaacdevlugt for helping me!

Regarding this:

InvalidArgumentError: cannot compute Mul as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:Mul]

Unless it’s explicitly set otherwise, most machine learning frameworks default to calculating everything in float32, aka float. A double tensor is a float64 tensor :slight_smile:. Usually this means that somewhere you’re asking a TensorFlow operation to combine a float32 object with a float64 object, which it cannot do.

Sometimes this can happen when you’re dancing in and out of TensorFlow and NumPy. Just make sure that you use tf.cast to convert tensors to the proper dtype or specify dtype='float32' (e.g., a = tf.zeros(10, dtype='float32')).
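
For instance, a quick sketch of both the failure and the fix:

import tensorflow as tf

a = tf.zeros(3, dtype='float32')
b = tf.zeros(3, dtype='float64')

# a * b  # raises InvalidArgumentError: float tensor vs double tensor
c = a * tf.cast(b, tf.float32)  # cast so both operands share a dtype
print(c.dtype)  # <dtype: 'float32'>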

To try and diagnose where this is occurring, it’s a lengthy process of printing the types of your inputs and outputs that are used in the call function. Hope this helps!


Thanks for your quick and detailed explanation! In fact, the code I provided above already uses tf.cast to try to make sure my output tensor is float32. I’ll follow your suggestion and thoroughly check every operation in the call function. Thanks again!


Awesome! Let us know if you fix it :slight_smile:

No problem!
I haven’t fixed it yet, but I have some findings. The dtype error seems to happen when computing the gradient. For now, I have made sure every tensor in my call function has dtype float32. Also, I have overridden the train_step(self, data) method in my Keras Model according to this tutorial: Customize what happens in Model.fit | TensorFlow Core, because I want to know which step in the training process gives me the dtype problem.

After that, I call my model.fit() function, and the result says the error occurs when computing the gradient of the loss with respect to the training variables:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
Cell In[33], line 4
      2 model = QCNN(2,2,4)
      3 model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
----> 4 r = model.fit(x_train_small[:3], y_train[:3], epochs=10, validation_data=(x_test_small, y_test))

File ~/anaconda3/envs/qcnn_v2/lib/python3.9/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback..error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

Cell In[32], line 94, in QCNN.train_step(self, data)
     92 # tf.debugging.assert_type(trainable_vars, tf.float32)
     93 print(trainable_vars)
---> 94 gradients = tape.gradient(loss, [tf.cast(v, tf.float32) for v in trainable_vars])
     95 # Update weights
     96 self.optimizer.apply_gradients(zip(gradients, trainable_vars))

InvalidArgumentError: cannot compute Mul as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:Mul]

But in train_step(self, data) I have checked that all trainable_vars are float32. Here is the new QCNN model code, which makes sure all tensor computations have dtype=float32, with the train_step() function I just mentioned added:

class QCNN(tf.keras.Model):
    
    def __init__(self, num_filters, filter_size, num_params):
        super(QCNN,self).__init__()
        self.num_filters = num_filters
        self.filter_size = filter_size
        # using two filters for now
        self.quantum_filter_1 = qml.qnn.KerasLayer(q_filter, weight_shapes, output_dim=1, name='quantum_filter', dtype='float32')
        self.quantum_filter_2 = qml.qnn.KerasLayer(q_filter, weight_shapes, output_dim=1, name='quantum_filter', dtype='float32')
        self.hidden = Dense(128, activation='relu', dtype='float32')
        self.flatten = Flatten()
        self.dense = Dense(10, activation='softmax')

    def call(self, inputs):

        print(f"inputs dtype: {inputs.dtype}")
        output_length = inputs.shape[1] - self.filter_size + 1
        num_sample = inputs.shape[0]
        output_all = None
        quantum_filter_list = [self.quantum_filter_1, self.quantum_filter_2]

        # Used in call() to concatenate the per-sample outputs
        count = 0

        
        for a in range(num_sample):
                print(f"Runing {a}th data sample...")

                output = tf.zeros((output_length, output_length, self.num_filters), dtype=tf.float32)

                for i in range(output_length):
                    for j in range(output_length):
                            for f in range(self.num_filters):
                                sub_input =  inputs[a,i:i+self.filter_size, j:j+self.filter_size, :]
                                # flatten into 1-D
                                sub_input = tf.reshape(sub_input, [-1])
                                quantum_filter = quantum_filter_list[f]
                                # print(f"After quantum filter: {quantum_filter(sub_input).dtype}")

                                # make sure the output of quantum_filter is float32
                                value = tf.cast(quantum_filter(sub_input), tf.float32)
                                # print(f"After quantum filter: {value.dtype}")
                                output = tf.tensor_scatter_nd_update(output, [[i, j, f]], [value])
                                # print(f"output dtype: {output.dtype}")

                output = tf.convert_to_tensor(output)
                output = tf.expand_dims(output, axis=0)
                # print(f"output_shape: {output.shape}")
                # print(output)

                if count == 0:
                     output_all = output
                    #  print(count)
                    #  print(output_all.shape)
                else:
                     output_all = tf.concat([output_all, output], 0)
                    #  print(count)
                    #  print(output_all.shape)
                count += 1
        print("All input data have been convolved!")
        # print(output_all)
        # output_all = tf.cast(output_all, tf.float32)
        x = self.flatten(output_all)
        print("after flatten")
        # x = tf.cast(x, tf.float32)
        x = self.hidden(x)
        print("after hidden")
        x = self.dense(x)
        print("after dense")
        print(f"final_output_tdype: {x.dtype}")
        return x

    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x, y = data

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
            print(f"loss: {loss}")

        # Compute gradients
        trainable_vars = self.trainable_variables

        # Make sure every entry of trainable_vars has dtype=float32
        count = 1
        for v in trainable_vars:
            tf.debugging.assert_type(v, tf.float32)
            print(f"{count}th layer of trainable_vars is float32")
            count += 1


        print(trainable_vars)
        gradients = tape.gradient(loss, [tf.cast(v, tf.float32) for v in trainable_vars])
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

The error happens in tf.GradientTape().gradient, so I dug into the source code here. I tried to locate it by printing after each block of code, and found that it occurs at the line flat_grad = imperative_grad.imperative_grad in TensorFlow’s backprop source.

So I checked the dtypes of flat_targets and flat_sources, and it says:

flat_targets dtype: <dtype: 'float32'>
flat_sources dtype: <dtype: 'resource'>

And this is how far I’ve gotten… I’ve tried my best to make sure everything is float32, but somehow TensorFlow still thinks that somewhere a tensor has dtype double. Is there anything I’m missing?

Again, sorry for the lengthy debugging process; I wanted to make the situation easy to understand. If anything is too detailed or hard to read, please let me know :sweat_smile:!

Hey @b0976960890! Interesting. There might be something going on with using lightning with TensorFlow, but if I hard-set the backend float type that TF uses to float64, then everything runs fine! Just put this at the top of your code:

tf.keras.backend.set_floatx('float64')
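
For reference, set_floatx changes the default dtype Keras uses for every layer and weight it creates afterwards, so it has to run before the model is built. A minimal sketch of the intended placement:

import tensorflow as tf

# Must run before any layers/variables are created, so all Keras weights
# (including the KerasLayer's quantum params) default to float64, matching
# the float64 values coming back from the simulator.
tf.keras.backend.set_floatx('float64')

model = QCNN(2, 2, 4)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])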

Hope this helps :slight_smile:

Hi @isaacdevlugt! Thanks for the solution, it works! Now I can finally begin to train my model :laughing: :laughing:

One more question regarding training speed and the simulator + diff_method combination. I notice that when using lightning.qubit as the quantum circuit simulator, training one batch (batch_size=5) to compute the loss with diff_method="adjoint" is much slower than with diff_method="parameter-shift". This was tested using 1 filter (1 quantum circuit):

  • lightning.qubit with "parameter-shift" : ~10.5s

  • lightning.qubit with "adjoint": ~43s

  • default.qubit with "backprop" : 17s

I found this post explaining the possible reason: "parameter-shift" is faster when the number of qubits and training parameters is small, which is the case for my quantum filter. Is this understanding right? Or does something happen when combining lightning.qubit with TensorFlow that slows down the "adjoint" method?
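
For anyone who wants to reproduce the comparison, here is a rough sketch of the kind of micro-benchmark I ran, simplified to a single gradient evaluation (make_qnode is just a helper name for illustration):

import time
import pennylane as qml
import tensorflow as tf

def make_qnode(device_name, diff_method):
    dev = qml.device(device_name, wires=4)

    @qml.qnode(dev, interface="tf", diff_method=diff_method)
    def q_filter(inputs, params):
        for i in range(4):
            qml.RY(inputs[i], wires=i)
        qml.BasicEntanglerLayers(weights=params, wires=range(4))
        return qml.expval(qml.PauliZ(0))

    return q_filter

inputs = tf.constant([0.1, 0.2, 0.3, 0.4], dtype=tf.float64)
params = tf.Variable([[0.5, 0.6, 0.7, 0.8]], dtype=tf.float64)

for dev_name, dm in [("lightning.qubit", "parameter-shift"),
                     ("lightning.qubit", "adjoint"),
                     ("default.qubit", "backprop")]:
    qnode = make_qnode(dev_name, dm)
    start = time.time()
    with tf.GradientTape() as tape:
        out = qnode(inputs, params)
    tape.gradient(out, params)  # one full gradient evaluation
    print(dev_name, dm, f"{time.time() - start:.3f}s")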

Also, I’m wondering: when using diff_method='best', which diff_method will actually be used in my quantum circuit? I notice that after transforming to a KerasLayer, the interface of the QNode changes to tf. What diff_method will be used if I’m on lightning.qubit now? Referring to here, my guess is that it will backprop on lightning.qubit.tf.

I would appreciate it if you could tell me whether my understanding is right!


Glad this worked! :rocket:

I think the forum post that you mentioned does explain what you’re observing, yes :slight_smile:.

Also, I’m wondering: when using diff_method='best', which diff_method will actually be used in my quantum circuit?

Great question! "best" will use backprop or the device directly to compute the gradient if it’s supported, otherwise it will use the parameter-shift rule where possible with finite-difference as a fallback. But, if you’re curious about what is being used when you select "best", you can always do something like this:

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, interface="tensorflow")
def circuit(theta1, theta2):
    qml.PauliX(0)
    qml.Hadamard(1)
    qml.RX(theta1, 0)
    qml.RY(theta2, 1)
    return [qml.expval(qml.PauliZ(w)) for w in range(2)]

print(circuit.get_best_method(dev, "tensorflow"))
# ('backprop', {}, <DefaultQubitTF device (wires=2, shots=None) at 0x15f0aec40>)