Hybrid model (keras) with adjoint differentiation

Hi, everyone!

I recently learned about the adjoint differentiation method and its superior performance on the lightning.qubit simulator.

I’m working with hybrid models (using keras and pennylane), and it would be great to have such performance increases, to iterate faster over my experiments.

The basic code I’m using is the following (this is adapted to the iris dataset, so 4 features and 3 categorical levels for the target):

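import pennylane as qml
import tensorflow as tf

# X_train / y_train hold the iris features and labels (4 features, 3 classes)
n_qubits = X_train.shape[1]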
n_classes = y_train.nunique()
n_var_layers = 2

# ================================================

dev = qml.device("default.qubit", wires=n_qubits)

# KerasLayer expects a QNode; diff_method is left at its default ("best")
@qml.qnode(dev)
def vqc_layer(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

weight_shapes = {"weights": (n_var_layers, n_qubits, 3)}

# ================================================

ql = qml.qnn.KerasLayer(vqc_layer, weight_shapes, output_dim=n_qubits)

cl = tf.keras.layers.Dense(n_classes, activation='softmax')

qcnn = tf.keras.models.Sequential([ql, cl])

# ================================================

opt = tf.keras.optimizers.Adam(learning_rate=0.01)

qcnn.compile(loss='categorical_crossentropy', optimizer=opt, metrics=["accuracy"])

# ===========================================

history = qcnn.fit(X_train, y_train_tf,
                   epochs=10, batch_size=10,
                   validation_data=(X_val, y_val_tf))

The code above works perfectly, but it takes ~2 minutes to run the 10 epochs – and that’s entirely because of the vqc_layer: for a similar but fully classical architecture, the training is almost instantaneous. (And an analogous behavior is observed for larger datasets, although, in these cases, the times for training the hybrid models easily reach hours).

Given that, I had high hopes that adjoint differentiation would help to reduce the training times, so, at first I tried to simply change the device and decorator of the qnode to this (keeping everything else the same):

dev = qml.device("lightning.qubit", wires=n_qubits)

@qml.qnode(dev, diff_method="adjoint")
def vqc_layer(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

But then I got the following error:

InvalidArgumentError: cannot compute Mul as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:Mul]

And I think I know the reason: as far as I know, the adjoint differentiation only works for a scalar output of the qnode. But, as you can see, I’m measuring all qubits and returning a list of expectation values, which is the reason why adjoint differentiation is failing, as far as I can tell.

I also tried a qnode that returns a single measurement (just to see whether it works), but that failed too, this time with a dimensionality mismatch between layers. In any case, this wouldn’t be the best solution, since it changes the overall model architecture.

Given all this context, my question is the following: is there a way to use adjoint differentiation to speed up simulations of hybrid models whose quantum layers return a list of measurements?

In case it’s helpful: I’m using pennylane version 0.22.1 and tensorflow version 2.8.0.
If you need any further details, please let me know.

Thank you very much!

Hi @andre_juan, I’m not sure if this will work for you but you could try finding the tensor product of your observables. For instance:
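To make the suggestion concrete: a tensor-product observable such as Z⊗Z⊗Z⊗Z has a single expectation value, whereas measuring each qubit separately gives one value per wire. The following numpy sketch (an illustration only, not the snippet from this reply; the angles and helper names are made up) shows the difference on a simple product state. In PennyLane, such an observable is usually written with the `@` operator, e.g. `qml.PauliZ(0) @ qml.PauliZ(1)`.

```python
from functools import reduce

import numpy as np

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)


def kron_all(factors):
    """Kronecker product of a list of vectors or matrices."""
    return reduce(np.kron, factors)


n_qubits = 4
thetas = np.array([0.1, 0.7, 1.3, 2.0])  # hypothetical rotation angles

# Product state: each qubit is RY(theta)|0> = [cos(theta/2), sin(theta/2)]
state = kron_all([np.array([np.cos(t / 2), np.sin(t / 2)]) for t in thetas])

# One expectation value per wire: <Z_i>
per_wire = np.array([
    state @ kron_all([Z if j == i else I2 for j in range(n_qubits)]) @ state
    for i in range(n_qubits)
])

# A single expectation value for the tensor product of Z on all wires
joint = state @ kron_all([Z] * n_qubits) @ state

print(per_wire.shape)  # one value per qubit
print(np.ndim(joint))  # a scalar
```

The scalar no longer carries the per-qubit information that the Dense layer would otherwise receive, so this route changes what the classical part of the model sees.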


Please let me know whether this works for you!

Hi @CatalinaAlbornoz, thank you for your reply!

This is one of the things I tried, but it doesn’t work: in this case, the qnode does return a scalar (so we can use adjoint differentiation), but then I run into that dimensionality mismatch error that I mentioned:

ValueError: Exception encountered when calling layer "sequential" (type Sequential).

Input 0 of layer "dense_1" is incompatible with the layer: expected min_ndim=2, found ndim=1. Full shape received: (10,)

Call arguments received:
  • inputs=tf.Tensor(shape=(10, 4), dtype=float64)
  • training=True
  • mask=None

Btw, I did alter the model’s architecture to account for the fact that now the qnode outputs a scalar:

ql = qml.qnn.KerasLayer(vqc_layer, weight_shapes, output_dim=1)

cl = tf.keras.layers.Dense(n_classes, activation='softmax')

qcnn = tf.keras.models.Sequential([ql, cl])

But the error persists.

I even tried to change the architecture entirely, to account for the fact that the quantum layer outputs a scalar. Something like this:

input_layer = tf.keras.layers.Input(shape=(n_qubits,))

ql = qml.qnn.KerasLayer(vqc_layer, weight_shapes, output_dim=1)(input_layer)

output1 = tf.keras.layers.Dense(1)(ql)
output2 = tf.keras.layers.Dense(1)(ql)
output3 = tf.keras.layers.Dense(1)(ql)
outputs = tf.keras.layers.concatenate([output1, output2, output3])

output_layer = tf.keras.layers.Dense(n_classes, activation='softmax')(outputs)

qcnn = tf.keras.models.Model(input_layer, output_layer)

But this also raises the same dimensionality mismatch error.

It really seems that Keras can’t handle a quantum layer which outputs a scalar (which seems to be the only way to get adjoint differentiation to work). Is there anything I’m missing?
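The error message itself points at the shape: the quantum layer hands the Dense layer a tensor of shape (10,), i.e. one number per sample with no feature axis, while Dense expects at least two dimensions (batch, features). A minimal numpy sketch of the mismatch, assuming a batch of 10 scalar outputs (the array contents are placeholders):

```python
import numpy as np

batch_size = 10

# What the Dense layer received: one scalar per sample, no feature axis
scalar_outputs = np.zeros(batch_size)

# The same data with an explicit feature axis of length 1
with_feature_axis = scalar_outputs[:, np.newaxis]

print(scalar_outputs.shape, scalar_outputs.ndim)
print(with_feature_axis.shape, with_feature_axis.ndim)
```

Whether adding such an axis inside the Keras model (for example with a reshape layer) is enough for this PennyLane version was not tested in this thread.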

Thank you!

Hi @andre_juan, I finally managed to run your first code and it runs for me. What version of Python are you using? And I know you mentioned you’re using the iris dataset, but how are you importing/modifying your data? Your issue might be there.

I used the dataset here to test.

Hi, @CatalinaAlbornoz!

Yes, the first code indeed runs perfectly! (Notice that my first code uses the default.qubit simulator and the default diff_method (“best”) in the qnode. No problems there!)

It only fails to run when I change the qnode to use the lightning.qubit simulator and adjoint differentiation, i.e.,

dev = qml.device("lightning.qubit", wires=n_qubits)

@qml.qnode(dev, diff_method="adjoint")
def vqc_layer(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

Does the code with these modifications run for you?

Thank you!

Oh sorry I had missed the point :sweat_smile: @andre_juan.

For your “cannot compute Mul …” error, you can add the following line and it will fix it:

tf.keras.backend.set_floatx('float64')

I tried it with ‘default.qubit’ and ‘adjoint’ and it worked for me. It should work with lightning too. Please let me know if it works for you too!
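For context, the error comes from TensorFlow refusing to multiply tensors of different float types: Keras layers default to float32, while the tensors involved here are float64 (the call arguments quoted earlier show `dtype=float64` inputs). A small TensorFlow-only sketch, with made-up values, that reproduces the mismatch and shows the effect of `set_floatx` on layers created afterwards:

```python
import tensorflow as tf

# float32 (the Keras default) times float64 is not promoted automatically
a32 = tf.constant([1.0, 2.0], dtype=tf.float32)
b64 = tf.constant([0.5, 0.5], dtype=tf.float64)
try:
    a32 * b64
    mismatch_raised = False
except (tf.errors.InvalidArgumentError, TypeError) as err:
    mismatch_raised = True
    print("dtype mismatch:", type(err).__name__)

# After set_floatx, newly created layers compute in float64
tf.keras.backend.set_floatx("float64")
dense = tf.keras.layers.Dense(3, activation="softmax")
out = dense(tf.constant([[0.1, 0.2, 0.3, 0.4]], dtype=tf.float64))
print(out.dtype)

# Restore the default so this sketch has no side effects
tf.keras.backend.set_floatx("float32")
```

Since `set_floatx` only affects layers created after the call, in the model from this thread it would go before `KerasLayer` and `Dense` are instantiated.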

Hi, @CatalinaAlbornoz

That worked perfectly! With the piece of code that you provided, I was able to use both the lightning.qubit simulator and adjoint differentiation! Thank you so much!

So, in conclusion, one definitely can use adjoint differentiation with qnodes that don’t return scalars. That’s great to know!

And it was a data type matter, after all. That’s interesting…
Maybe changing the data type will impact the overall performance (depending on the processor that will be used). Let’s see how things turn out!
I’ll run some experiments over the weekend, and then report the results here. Hopefully the lightning.qubit simulator and adjoint differentiation will speed up the training.

Once again, thank you so much!

It’s great to hear that it worked @andre_juan!

lightning.qubit + adjoint should give you very good speed.

Let me know how it goes!

Hi, @CatalinaAlbornoz

I’m coming back with some exciting observations: using lightning.qubit + adjoint differentiation did speed up the training process quite noticeably!

For the iris dataset, the difference wasn’t so large. And, interestingly, by repeating the training process many times, I observed a variable training time with lightning.qubit + adjoint, in contrast to a fixed time for default.qubit and parameter-shift. I think it has something to do with the nature of the simulator (which I still don’t fully understand).

Now, I also trained a hybrid model on a much larger dataset (over 100 thousand observations), and there the speedup was really remarkable: with default.qubit + parameter-shift, it took me over 24 hours; with lightning.qubit + adjoint it took less than 5 hours!
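A rough count helps put these numbers in perspective, under the assumption that parameter-shift uses the standard two-term rule (two circuit evaluations per trainable parameter) and using the `weight_shapes` from the first post. The adjoint method does not re-execute the circuit for every parameter, so the figures below are only an order-of-magnitude illustration:

```python
import math

# (n_var_layers, n_qubits, 3) from the iris example in the first post
weight_shapes = {"weights": (2, 4, 3)}
n_params = sum(math.prod(shape) for shape in weight_shapes.values())

# Assumption: two shifted circuit evaluations per trainable parameter
shift_evaluations = 2 * n_params

# Reported wall-clock times: "over 24 hours" vs "less than 5 hours"
speedup_lower_bound = 24 / 5

print(n_params, shift_evaluations, speedup_lower_bound)
```

Under that assumption, the iris circuit would need 48 extra circuit runs per gradient, and the reported timings correspond to a speedup of more than 4.8x on the larger dataset.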

I’ll benchmark and compare both techniques more systematically, and write a more detailed report. When I have it, I’ll let you know.
But, for now, I can say that lightning.qubit + adjoint indeed seems like a great option to iterate faster over hybrid models!
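For the more systematic comparison mentioned above, a simple timing harness might look like the following. This is a sketch using only the standard library; `time_runs` and `dummy_training` are hypothetical names, and the dummy workload stands in for a call such as `qcnn.fit(...)`. Repeating each configuration several times also captures the run-to-run variability noted for lightning.qubit.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Call fn() `repeats` times; return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def dummy_training():
    # Placeholder workload; replace with the real training call
    sum(i * i for i in range(10_000))


durations = time_runs(dummy_training, repeats=5)
print(f"mean = {statistics.mean(durations):.6f} s, "
      f"stdev = {statistics.stdev(durations):.6f} s")
```

Reporting both the mean and the spread for each device and differentiation method makes it easier to tell a real speedup from noise.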

As a side note, I was very excited to know about NVIDIA’s cuQuantum and its integration with PennyLane! I’m sure it’ll be fantastic, can’t wait to try it!

Thank you!


Hi @andre_juan, it’s great to see that you had such a huge improvement! Yes, the work with NVIDIA is very exciting :smiley: .