Transfer Learning with pretrained Keras Model

Hello,

I am trying to follow this tutorial to connect a quantum circuit to a pretrained CNN. I keep getting this error when I call model.add(qlayer):

ValueError: Cannot infer num from shape (None, 2)

For reference, I load a model and then replace its final layers with a Dense(2) layer to stay consistent with the tutorial. Here are the final few layers of the model architecture:

Layer (type)                                 Output Shape   Param #
batch_normalization_11 (BatchNormalization)  (None, 256)    1024
dense_6 (Dense)                              (None, 512)    131584
dense_7 (Dense)                              (None, 2)      1026

Does this error have to do with the variable batch size or is there something else that I am missing from the tutorial?

Thanks for your help!

Edit:

I got past the error by making the imports consistent. I had been mixing imports from both keras and tensorflow. I changed my imports to:

import tensorflow as tf
from tensorflow import keras

That seemed to solve the ValueError I was encountering, but it brought up another problem. When I add the quantum KerasLayer, it has 0 params and is marked as unused. How can I fix that?

import tensorflow as tf
from tensorflow import keras
import pennylane as qml

tf.keras.backend.set_floatx('float64')

base_model = keras.models.load_model('mnist_base_model.h5')

model = keras.models.Sequential()
for layer in base_model.layers[:-1]:  # copy every layer except the final classification head
    model.add(layer)
    model.layers[-1].trainable = False  # freeze the pretrained layer
downsize_layer = tf.keras.layers.Dense(10)  # new trainable layer feeding the quantum circuit
model.add(downsize_layer)

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.templates.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(wires=i)) for i in range(n_qubits)]

n_layers = 6
weight_shapes = {"weights": (n_layers, n_qubits)}
qlayer = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)
clayer = tf.keras.layers.Dense(2, activation="softmax")

model.add(qlayer)
model.add(clayer)
model.summary()

Layer (type)                  Output Shape   Param #
dense_6 (Dense)               (None, 512)    131584
dense_40 (Dense)              (None, 10)     5130
keras_layer_26 (KerasLayer)   (None, 2)      0 (unused)
dense_41 (Dense)              (None, 2)      6

Total params: 692,688
Trainable params: 5,136
Non-trainable params: 687,552


Hi @eltabre, welcome to the Forum!

Thank you for posting your edit. What version of PennyLane and TensorFlow are you using?

Hey @CatalinaAlbornoz!
Currently I am using TensorFlow 2.5.1 and PennyLane 0.15.1

Hi @eltabre, could you try updating to PennyLane 0.18 and seeing if the problem persists?

Please let me know if this solves the problem or not.

Hey @CatalinaAlbornoz! I tried upgrading, but the problem with the quantum layer persists.

Hi @eltabre, I’m not able to run your model. I think it’s because you haven’t fit the model to any data. Have you tried fitting it to some data? Do you get the same problem?

I didn’t include the .h5 file for loading the pretrained model; it is just trained on MNIST data. I downgraded to TensorFlow 2.2.0 and the circuit now shows as having trainable parameters, but when I try to do a
model.fit(x_train, y_train, epochs=6, batch_size=64, validation_split=0.20)

It just hangs on the first training epoch.

Edit:

@CatalinaAlbornoz Here is the code I use to make the simple CNN.

import tensorflow as tf
#import tensorflow_datasets as tfds
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Lambda, MaxPooling2D # convolution layers
from tensorflow.keras.layers import Dense, Dropout, Flatten # core layers
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

import numpy as np
import matplotlib.pyplot as plt

import requests
requests.packages.urllib3.disable_warnings()
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # Legacy Python that doesn't verify HTTPS certificates by default
    pass
else:
    # Handle target environment that doesn't support HTTPS verification
    ssl._create_default_https_context = _create_unverified_https_context

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Data Prep

# Grayscale Norm
x_train = x_train / 255.0
x_test = x_test / 255.0

# Reshape
x_test = x_test.reshape(-1, 28,28, 1)
x_train = x_train.reshape(-1, 28,28, 1)

# One-hot encoding
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# architecture from: https://www.kaggle.com/elcaiseri/mnist-simple-cnn-keras-accuracy-0-99-top-1#4.-Evaluate-the-model
model=Sequential()

model.add(Conv2D(filters=64, kernel_size = (3,3), activation="relu", input_shape=(28,28,1)))
model.add(Conv2D(filters=64, kernel_size = (3,3), activation="relu"))

model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(Conv2D(filters=128, kernel_size = (3,3), activation="relu"))
model.add(Conv2D(filters=128, kernel_size = (3,3), activation="relu"))

model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(Conv2D(filters=256, kernel_size = (3,3), activation="relu"))

model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation="relu"))

model.add(Dense(10,activation="softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)  # randomly flip images

epochs = 10
batch_size = 64

#datagen.fit(X_train)
train_gen = datagen.flow(x_train, y_train, batch_size=batch_size)
test_gen = datagen.flow(x_test, y_test, batch_size=batch_size)

# Fit the model
#tf.config.run_functions_eagerly(True)
history = model.fit(train_gen,
                              epochs = epochs,
                              steps_per_epoch = x_train.shape[0] // batch_size,
                              validation_data = test_gen,
                              validation_steps = x_test.shape[0] // batch_size)

model.save("mnist_base_model.h5")

I keep getting a warning before epoch one hangs:

array_grad.py:563: _EagerTensorBase.cpu (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.

I have tried multiple versions of TensorFlow (2.2.0 and 2.3.0). When I use 2.5.1, I get the unused params error above.

I am not sure if it helps, but in my case:

  1. I pretrained a neural network (not a CNN) called "model":

    clayerM = tf.keras.layers.Dense(X_train.shape[1], activation="relu")
    clayerF = tf.keras.layers.LeakyReLU(alpha=0.3)
    clayerF1 = tf.keras.layers.Dense(n_qubits, activation="relu")

    model = tf.keras.models.Sequential([clayerM, clayerF, clayerF1, clayerDropout, clayerD])

  2. I froze the weights of the three classical layers:

    clayerM.trainable = False
    clayerF.trainable = False
    clayerF1.trainable = False

  3. I defined a new model called modelh, which contains the previous layers plus the quantum node and a final decision layer:

    modelh = tf.keras.models.Sequential([clayerM, clayerF, clayerF1, qlayer, clayerD])

  4. I trained the new network, thus achieving transfer learning. In your case I suppose you just use a more complicated CNN.

@NikSchet Were you able to save the model you trained with something like model.save() and then reload it in a new instance? I am having trouble when I load a pretrained model and try to attach a quantum circuit to the end.

I did something similar. I pretrained a classical model (modelC), then I saved it. I then reloaded it in a different instance, froze the weights of the model, and added a QNode and a final classical decision layer, thus obtaining a new hybrid model (modelH). The process is (a sketch follows the list):

  1. Load the model:

    modelC = load_model("model.h5")

  2. Stop the training:

    modelC.trainable = False

  3. Include the QNode and a final decision layer, thus making a hybrid model:

    modelH = tf.keras.models.Sequential([modelC, qlayer, clayerD])

  4. Train the new hybrid model.
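Here is a minimal sketch of those four steps put together. The QNode and weight shapes simply mirror the earlier snippets, and "model.h5", x_train and y_train are placeholders for your own saved model and data, so treat this as an illustration rather than the exact code:

import tensorflow as tf
import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.templates.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(wires=i)) for i in range(n_qubits)]

weight_shapes = {"weights": (6, n_qubits)}
qlayer = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)
clayerD = tf.keras.layers.Dense(2, activation="softmax")  # final decision layer

# 1. Load the pretrained classical model (assumed to end in n_qubits outputs;
#    otherwise add a feeding layer as described in the sidenote below)
modelC = tf.keras.models.load_model("model.h5")

# 2. Stop the training of the pretrained part
modelC.trainable = False

# 3. Build the hybrid model
modelH = tf.keras.models.Sequential([modelC, qlayer, clayerD])

# 4. Train only the quantum weights and the decision layer
modelH.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# modelH.fit(x_train, y_train, epochs=6, batch_size=64)  # your training data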

Sidenote: here the pretrained model contains three layers (clayerM, clayerF, clayerF1), so instead of using the modelC.trainable = False command you can also freeze the weights layer by layer:

    clayerM.trainable = False
    clayerF.trainable = False
    clayerF1.trainable = False

It is very important to remember that the last classical layer before the QNode must have as many neurons as there are qubits in the QNode (this is why I call it the feeding layer), so step 3 can be something like:

modelH = tf.keras.models.Sequential([modelC, Feedinglayer, qlayer, clayerD])

and it makes sense for the feeding layer:

  1. not to pretrain it, and
  2. to use a custom trigonometric activation function (in my case I use a standard scaler to map my data from 0 to pi, so for the feeding layer activation function I use a custom sigmoid, tan, etc. that has the same range); a sketch follows below.
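For example, a feeding layer could look like the sketch below. The scaled_sigmoid helper is hypothetical, just one possible choice of activation that keeps the QNode inputs in the [0, pi] range expected by AngleEmbedding:

import numpy as np
import tensorflow as tf

n_qubits = 2

def scaled_sigmoid(x):
    # hypothetical helper: squash activations into [0, pi] to match data scaled for angle embedding
    return np.pi * tf.keras.activations.sigmoid(x)

# one neuron per qubit; left untrained beforehand and trained as part of the hybrid model
Feedinglayer = tf.keras.layers.Dense(n_qubits, activation=scaled_sigmoid)

# modelH = tf.keras.models.Sequential([modelC, Feedinglayer, qlayer, clayerD])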

Hi @eltabre!

When I add the quantum KerasLayer, it has 0 params and is marked as unused. How can I fix that?

This can be fixed by doing a forward pass through the model before printing the summary, e.g., like so:

model.add(qlayer)
model.add(clayer)

model(x_test[:2])

model.summary()


The summary then shows the Keras layer parameters properly loaded. I’m not sure why you see (unused) without the forward pass; perhaps KerasLayer isn’t correctly overriding a required method :thinking:

As a paranoia check, you can confirm that the model is sensitive to the Keras layer parameters by calculating the gradient:

with tf.GradientTape() as tape:
    out = model(x_test[:2])
    
tape.jacobian(out, qlayer.trainable_weights)

Hey @Tom_Bromley!

That seemed to work! I can now see that the parameters are showing up. Thanks to everyone for their help over the past couple of days!

When I was training I also set verbose=1 so I could see how much data was being processed, and that showed that training was in fact happening, just much more slowly than I anticipated. One training epoch on my MNIST data with the purely classical CNN usually takes about 1 minute, but it takes a couple of hours for the hybrid classical-quantum network. Is this normal? I thought a circuit of the size I have wouldn't affect the speed that much.

Hi @eltabre, I’m glad the parameters are showing up!

The speed depends not only on the size of the circuit but also on the device that you use (and other factors). Using lightning.qubit as your device may improve speed a lot.
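For example, assuming the lightning plugin is installed (it can be added with pip install pennylane-lightning if it isn't already), only the device line needs to change; the QNode and KerasLayer stay the same:

dev = qml.device("lightning.qubit", wires=n_qubits)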

Please let me know how much this improves your speed!

Hey @CatalinaAlbornoz!

Thanks for the tip! Using the different device sped up the training by a factor of three!

That’s great to hear @eltabre! I encourage you to keep using lightning.qubit in the future.

Enjoy using PennyLane!