QML Algorithm doesn't learn

Hi Shawn,

Wow, you’re tackling an interesting problem! Here’s my perspective on your questions:

  1. Simulating quantum computers is a resource-intensive task – that’s why it’s worthwhile to put so much effort into building them! In your case, even a single wire with cutoff=30 takes considerable effort to simulate, so you can’t expect training to be as fast as in the classical case. However, 1-2 days seems excessive. My only guess is that you are training over too many steps, perhaps because the learning rate is very low, so maybe try reducing the number of optimization steps. A good benchmark would be to determine how long it takes to evaluate one run of the circuit, without training (see the timing sketch after this list).

  2. This is not an issue with PennyLane, but perhaps your quantum model is indeed not powerful enough to tackle this task. This wouldn’t be too surprising: quantum and classical neural networks are different! One of the goals of quantum machine learning is to identify specific situations where quantum models can be advantageous. It’s not easy! One piece of advice would be to look at the literature on quantum reinforcement learning and try to determine what strategies have already been proposed.

  3. It’s definitely possible that you are implementing the wrong ansatz. It may also be that the problem is not well-suited for quantum approaches. This ties in with the second question: part of the job of researchers is to figure these things out!
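To make that benchmark concrete, here is a minimal timing sketch (the strawberryfields.fock device settings and the placeholder gates are just assumptions; swap in your own QNode):

import time
import pennylane as qml

# assumed settings: one wire at cutoff 30, as described above
dev = qml.device("strawberryfields.fock", wires=1, cutoff_dim=30)

@qml.qnode(dev)
def circuit(params):
    # placeholder gates; substitute your actual ansatz here
    qml.Displacement(params[0], 0.0, wires=0)
    qml.Kerr(params[1], wires=0)
    return qml.expval(qml.X(0))

start = time.time()
circuit([0.1, 0.2])
print(f"One circuit evaluation took {time.time() - start:.3f} s")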

As a final reflection: I have personally worked on training similar models for other tasks. You can find some examples in these papers: https://arxiv.org/abs/1806.06871 and https://arxiv.org/abs/1807.10781. The results reported there work well but, I can tell you, there were many failures along the way! There were certainly problems where the quantum neural networks were not giving good answers.

So, I’m afraid you are facing an obstacle that is familiar to any research scientist: interesting problems are often difficult to solve, and sometimes we have to solve them ourselves since nobody knows the answer! My personal advice: push a bit further into this example and see if you can get it to work. If you’re still struggling, maybe divert your attention to another problem; there’s plenty to do in QML.

Hi @jmarrazola, many thanks for the reply. Just a couple of questions about your responses.

  1. I tried various learning rates: 0.12, 0.012, 0.0012, 0.00012 and 0.000012, and there wasn’t a change in the per-episode time. Learning rates in RL are usually relatively low but can be anywhere between 0.1 and 1e-6. The code for the game is exactly the same for the classical and quantum algorithms above; the only difference is the neural networks. So I would assume that rules out a problem with the game code (right?). Also, I have two additional programs using a different game and PyTorch, and there the quantum algorithm also took much longer than the classical one.

  2. What do you mean by quantum model? Just the layers that I used? The game is as simple as it gets, so I’d be really surprised if the model is not powerful enough.

  3. This is what confuses me. Professor Hu has three papers using RL algorithms (one of them the same one I am using – a DQN) and PennyLane, he used a very similar game to the one I am using, and he seems to have had positive results in all of them.

Nonetheless thanks again for the tips and info!

Hey Shawn,

Quick replies:

  1. How long does it take to execute the circuit on your computer? By this I mean: given some fixed parameters in the circuit, how long does it take to compute its output? If that’s slow, then training will be slow.

  2. I mean the specific quantum circuit, composed of the specific set of gates you have chosen.

  3. Yeah, seems like a nice puzzle for you to solve! Scientific discoveries are often just removing confusion about things we previously didn’t understand.

Good luck!


Hi all,

Great news! The program is learning! The Ansatz from this paper, i.e. the CVNN layer above, unfortunately didn’t work, but the Ansatz from Professor Hu is working. Here is the layer that I am using:

import pennylane as qml
import tensorflow as tf

# Assumed setup (not shown in the original post): a CV Fock device with
# example values for the number of modes and the cutoff dimension.
wires = 2
dev = qml.device("strawberryfields.fock", wires=wires, cutoff_dim=10)

@qml.qnode(dev)
def layer(inputs, theta, phi, varphi, x, y, z, r, ph):
    qml.templates.DisplacementEmbedding(inputs, wires=range(wires))
    qml.templates.Interferometer([theta], [phi], varphi, wires=range(wires))
    for i in range(wires):
        qml.Displacement(x[i], 0, wires=i)
        qml.Rotation(y[i], wires=i)
        qml.Squeezing(r[i], ph[i], wires=i)
        qml.Kerr(z[i], wires=i)
    return [qml.expval(qml.X(wires=i)) for i in range(wires)]

# weight shapes matching the QNode signature (assumed values for wires=2)
weight_shapes = {"theta": (), "phi": (), "varphi": (wires,), "x": (wires,),
                 "y": (wires,), "z": (wires,), "r": (wires,), "ph": (wires,)}
out_dim = 2  # e.g. the number of actions in the RL task (assumed)

qlayer = qml.qnn.KerasLayer(layer, weight_shapes, output_dim=wires)
clayer_in = tf.keras.layers.Dense(wires)
clayer_out = tf.keras.layers.Dense(out_dim)
model = tf.keras.models.Sequential([clayer_in, qlayer, clayer_out])

I am surprised it ended up working, and I would like to expand a bit on why this use case wouldn’t/shouldn’t have worked (also for the CVNN layer case) given the structure of the TensorFlow layer stack:

I think the main problem with this task is the initial classical layer clayer_in. In my case its input is a 16-dimensional array, and since the output dimension of clayer_in is wires (clayer_in = tf.keras.layers.Dense(wires)), essentially all of the information from the input is lost, truncated into 1 or 2 dimensions (1 or 2 because anything higher than two wires takes a very long time to run). Any thoughts on this? If I used 16 wires and made clayer_in a 16-in/16-out layer, then I would expect better results. But I’m just speculating here.
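In code, the bottleneck I am describing looks like this (a sketch; the values are just for illustration):

import tensorflow as tf

wires = 2
# dimension flow: observation (16,) -> Dense(wires) -> (2,) -> quantum layer -> (2,) -> Dense(out_dim)
clayer_in = tf.keras.layers.Dense(wires)  # 16 features compressed into 2
# the hypothetical fix, currently too slow to simulate:
# clayer_in = tf.keras.layers.Dense(16)   # 16 in, 16 out, nothing truncated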

Here is the Ansatz from Professor Hu in his papers:

[Screenshot: the Hu ansatz circuit from his papers]

And it ended up working. I would still expect the truncation in clayer_in to cause problems, but the results are almost on par with the classical case.

Lastly, what information can I extract from the program/layer/results to show that there is a quantum advantage or a benefit to using PennyLane/Strawberry Fields? Are there already some benchmarking tools for this? Speed is of course out of the question – we’re on a classical computer. So what does one actually benchmark?

Thanks in advance!

Hi @Shawn,

Great news! The program is learning!

That’s great! Once you wrap up this project, you might want to consider submitting a demo, although note that we have a high bar for accepting demos.

The Ansatz from this paper, i.e. the CVNN layer above, unfortunately didn’t work, but the Ansatz from Professor Hu is working.

That’s strange - I’d expect both layers to be equivalent given enough depth. Although, note that you’re considering just one layer in the code block you shared - what depth were you using for CVNeuralNetLayers? Perhaps the problem with CVNeuralNetLayers is a subtlety of implementation rather than something fundamental, e.g. problems with the cutoff or the depth. Indeed, two layers of CVNeuralNetLayers should be able to emulate your ansatz exactly by turning some of the gates in the two layers on or off.

I think the main problem with this task is the initial classical layer clayer_in. In my case its input is a 16-dimensional array, and since the output dimension of clayer_in is wires (clayer_in = tf.keras.layers.Dense(wires)), essentially all of the information from the input is lost, truncated into 1 or 2 dimensions (1 or 2 because anything higher than two wires takes a very long time to run). Any thoughts on this? If I used 16 wires and made clayer_in a 16-in/16-out layer, then I would expect better results. But I’m just speculating here.

Yes, it’s quite reasonable to expect that greater width in the quantum circuit might lead to better results. It’s a double-edged sword: we expect to do better, but at the same time the circuit gets harder to simulate. This really is a nice motivation for hardware; for now, we have to stick with more prototype-level architectures. One thing to mention is that the dimension of the data embedded into the quantum circuit is not limited to the number of wires. Although things like DisplacementEmbedding are commonly used, one could embed data using a combination of gates to get past the dim=wires limitation.
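As a rough sketch (the device settings and gate choices here are just examples, not a recommended recipe), one could spread a 4-dimensional input over several gates acting on a single mode:

import pennylane as qml

dev = qml.device("strawberryfields.fock", wires=1, cutoff_dim=10)

@qml.qnode(dev)
def circuit(inputs, weights):
    # embedding: four features encoded into different gates on one mode
    qml.Displacement(inputs[0], 0.0, wires=0)
    qml.Rotation(inputs[1], wires=0)
    qml.Squeezing(inputs[2], inputs[3], wires=0)
    # trainable part
    qml.Rotation(weights[0], wires=0)
    qml.Displacement(weights[1], 0.0, wires=0)
    qml.Kerr(weights[2], wires=0)
    return qml.expval(qml.X(0))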

Lastly, what information can I extrapolate from the program/layer/results to show that there is a quantum advantage or a benefit to use Pennylane/Strawberry Fields?

This is always a tough bar for prototype circuits, and it raises the question of what a fair comparison is (e.g. should we compare quantum and classical networks of the same width/depth?). If we compare the current prototype models to cutting-edge classical models, we of course wouldn’t expect to do better. In terms of a figure of merit, for classification or regression there is normally a nice quantity to look at, such as accuracy or mean-squared error. Is there a similar quantity in your RL problem?

Hi @Tom_Bromley

For the Hu Ansatz I used 1 layer, tried both wires=1,2, and played around with the cutoff_dim. For CVNeuralNetLayers I played a ton with the parameters. I could try again to see if wires=2 and num_layers=2 changes something, but I am pretty sure I tried that already. That’s what you mean, right?

That certainly piques my interest! Is there more information on this? I would like to test it out. So essentially it would be an embedding quantum layer that then sends the embedded data to the real quantum layer? If we can get this to work, I think this would make PL/SF accessible to all RL use cases.

Exactly what I was thinking. Yeah, the loss function (I also use MSE) could be something of interest. I’ll look into that.

Otherwise, the reward and the average reward over X attempts are usually of interest in RL, but these depend heavily on the chosen hyperparameters.

For the Hu Ansatz I used 1 layer, tried both wires=1,2, and played around with the cutoff_dim. For CVNeuralNetLayers I played a ton with the parameters. I could try again to see if wires=2 and num_layers=2 changes something, but I am pretty sure I tried that already. That’s what you mean, right?

Yes, what I mean specifically is that CVNeuralNetLayers with depth=2 can exactly simulate the Hu ansatz with depth=1 by carefully setting some of the parameters in CVNeuralNetLayers. For example, you could get layer 1 of CVNeuralNetLayers to do U and D of the Hu ansatz (setting parameters so that the other gates don’t do anything), and then get layer 2 of CVNeuralNetLayers to do R, S and K. So in this way, we know that there exists a set of parameters for CVNeuralNetLayers that realizes the other ansatz. That being said, if the Hu ansatz is working for you, then that’s great - whatever works!
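As a sketch of the bookkeeping (assumed device settings; the shape helper is available in recent PennyLane versions):

import numpy as np
import pennylane as qml

n_layers, n_wires = 2, 2
dev = qml.device("strawberryfields.fock", wires=n_wires, cutoff_dim=10)

@qml.qnode(dev)
def cvnn(*weights):
    # each layer applies Interferometer -> Squeezing -> Interferometer ->
    # Displacement -> Kerr, so layer 1 can play the role of U and D
    # (squeezing and Kerr set to zero) and layer 2 the role of R, S and K
    # (displacement set to zero)
    qml.templates.CVNeuralNetLayers(*weights, wires=range(n_wires))
    return [qml.expval(qml.X(i)) for i in range(n_wires)]

shapes = qml.CVNeuralNetLayers.shape(n_layers=n_layers, n_wires=n_wires)
weights = [np.random.normal(0, 0.1, s) for s in shapes]
print(cvnn(*weights))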

That certainly piques my interest! Is there more information on this? I would like to test it out. So essentially it would be an embedding quantum layer that then sends the embedded data to the real quantum layer? If we can get this to work, I think this would make PL/SF accessible to all RL use cases.

Right - you could have an embedding layer followed by a trainable layer, or alternating embedding and trainable layers. You could even have the embedding at the end of the circuit (although one needs to think about whether that makes sense). So there are a lot of options. You could take a look at this tutorial. Best practices, and the implications of these choices, are still more of a research question.

Hi @Tom_Bromley, I got it to work bypassing the wires constraint, but I am having trouble understanding how this solves the dimension problem. My current input is a 9-dimensional vector x, and as an example I have added 9 qml.Rotation() gates for the embedding part. If I only have one wire, the process looks like this:

0: ──R(2.3)──R(1.7)──R(3.3)──R(4.0)──R(1.9)──R(3.0)──R(4.0)──R(5.9)──R(7.3)──R(1.7)──D(1.9, 0)──R(2.9)──S(2.2, 2.0)──Kerr(1.7)──┤ ⟨x⟩

with RDRSK being the quantum layer. It seems we still have a dimension problem: the first 9 rotation gates each take in one x_i, but the information is still lost because all the x_i's pass through the same wire, and the rotations encoding the first features simply combine with the later rotation gates.

Any thoughts on this?

Hey @Shawn!

Yes, good question. I agree that it seems like the R(2.3) rotation would get washed out by the following 8 rotations. Also, supposing we just had two rotations R(x)R(y), all of the points for which x+y=c for a fixed constant c would result in the same overall rotation :thinking:. I’d have to look more carefully at the paper that motivated the above tutorial, but off the top of my head, how about:

  • Doing a single R + RDRSK block repeated 9 times: the first block encodes x_0, the second x_1, and so on (see the sketch after this list). Although, this will result in an increased depth.
  • Using RDRSK itself as an encoding. E.g., you could do RDRSK (first 5 params), followed by RDRSK (last 4 params), followed by a trainable RDRSK. This approach encodes different parameters in different gates, so I'm not sure how well it will do.
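A sketch of the first option (names and shapes are assumed; weights would have shape (9, 6), one trainable RDRSK block per feature):

import pennylane as qml

dev = qml.device("strawberryfields.fock", wires=1, cutoff_dim=10)

@qml.qnode(dev)
def circuit(inputs, weights):
    for i in range(len(inputs)):            # 9 features -> 9 blocks
        qml.Rotation(inputs[i], wires=0)    # encoding rotation for x_i
        w = weights[i]                      # trainable R-D-R-S-K block
        qml.Rotation(w[0], wires=0)
        qml.Displacement(w[1], 0.0, wires=0)
        qml.Rotation(w[2], wires=0)
        qml.Squeezing(w[3], w[4], wires=0)
        qml.Kerr(w[5], wires=0)
    return qml.expval(qml.X(0))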

Overall I’d say it’s an opportunity to get creative and explore different encodings (as well as checking out the existing literature).

Just wanted to point out a recent paper that discusses the expressivity of quantum circuits using this repeated encoding strategy. It is mainly focused on qubits, but has a nice result on the power of the CV Rotation gate.