Hello,

I’m training a quantum machine learning program on this game:

https://gym.openai.com/envs/CartPole-v1/

The goal is to keep the pole (the brown stick) upright for as long as possible. However, the PennyLane QML model doesn’t learn:

The blue line is the actual duration of each attempt (episode), and the orange line is the average over the last 100 episodes. The high durations at the beginning could be due to random choices, so they don’t necessarily mean the algorithm is getting worse afterwards. To give a sense of how it *should* look, here is a result of the same program with only classical machine learning:
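As an aside, the orange curve described above is just a trailing mean of the episode durations. A minimal sketch of that calculation, assuming a window of 100 episodes and averaging over however many episodes exist before the window fills up (the name `trailing_average` and the sample numbers are illustrative, not taken from the actual program):

```python
from collections import deque


def trailing_average(durations, window=100):
    """Return, for each episode, the mean duration of the last `window` episodes.

    Assumption: before `window` episodes have been played, the mean is taken
    over the episodes seen so far (some plotting helpers show 0 instead).
    """
    recent = deque(maxlen=window)  # holds at most `window` latest durations
    running_sum = 0.0
    averages = []
    for d in durations:
        if len(recent) == window:
            # The oldest value is about to be evicted by append().
            running_sum -= recent[0]
        recent.append(d)
        running_sum += d
        averages.append(running_sum / len(recent))
    return averages


# Hypothetical durations, window shortened to 2 for readability.
print(trailing_average([10, 20, 30], window=2))
```

Because the window covers 100 episodes, a few lucky long episodes early on barely move the orange line, which is why it is the better indicator of whether learning is happening.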

The Ansatz that was used is below:

```
import numpy as np
import torch
import torch.nn as nn
from torch.nn.functional import relu
import pennylane as qml

out_dim = 2  # output dimension of model
wires = 1  # this is the width of the quantum element
n_quantum_layers = 2  # this is the depth of the quantum element


def layer(inputs, w0, w1, w2, w3, w4, w5, w6, w7, w8, w9, w10):
    qml.templates.SqueezingEmbedding(inputs, wires=range(wires))
    qml.templates.CVNeuralNetLayers(w0, w1, w2, w3, w4, w5, w6, w7, w8, w9, w10,
                                    wires=range(wires))
    return [qml.expval(qml.X(wires=i)) for i in range(wires)]


class DQN(nn.Module):
    def __init__(self, img_height, img_width):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(in_features=img_height * img_width * 3, out_features=12)
        self.fc2 = nn.Linear(in_features=12, out_features=8)
        # self.fc3 = nn.Linear(in_features=10, out_features=8)
        self.clayer_in = torch.nn.Linear(in_features=8, out_features=wires)
        self.clayer_out = torch.nn.Linear(wires, out_dim)
        dev = qml.device('strawberryfields.fock', wires=wires, cutoff_dim=3)
        self.layer_qnode = qml.QNode(layer, dev)
        weights = qml.init.cvqnn_layers_all(n_quantum_layers, wires)
        weight_shapes = {"w{}".format(i): w.shape for i, w in enumerate(weights)}
        self.qlayer = qml.qnn.TorchLayer(self.layer_qnode, weight_shapes)

    def forward(self, t):
        t = self.flatten(t)
        t = self.fc1(t)
        t = self.fc2(t)
        # t = self.fc3(t)
        t = self.clayer_in(t)
        t = self.qlayer(t)
        t = self.clayer_out(t)
        t = t.sigmoid()
        return t
```

Does anyone have an idea why the algorithm is not learning?