There is an example presented by PennyLane for `qml.qnn.TorchLayer`, which I copy below. If one modifies only the interface line from `@qml.qnode(dev)` to `@qml.qnode(dev, interface='torch')`, the convergence behavior drastically changes.

For the case `@qml.qnode(dev)`:
```
Average loss over epoch 10: 0.1589
Average loss over epoch 20: 0.1331
Average loss over epoch 30: 0.1321
```

and for the case `@qml.qnode(dev, interface='torch')`:

```
Average loss over epoch 100: 0.1709
Average loss over epoch 200: 0.1593
Average loss over epoch 300: 0.1538
Average loss over epoch 400: 0.1505
Average loss over epoch 500: 0.1482
Average loss over epoch 600: 0.1462
Average loss over epoch 700: 0.1448
Average loss over epoch 800: 0.1434
Average loss over epoch 900: 0.1426
Average loss over epoch 1000: 0.1415
```

## I am wondering: **what is the reason behind this, and further, how do we know in general which interface is the most suitable for a given problem?**

The code:

```python
import numpy as np
import pennylane as qml
import torch
import sklearn.datasets

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface='torch')  # the default was @qml.qnode(dev)
def qnode(inputs, weights):
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.templates.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(1))

weight_shapes = {"weights": (3, n_qubits, 3)}
qlayer = qml.qnn.TorchLayer(qnode, weight_shapes)
clayer1 = torch.nn.Linear(2, 2)
clayer2 = torch.nn.Linear(2, 2)
softmax = torch.nn.Softmax(dim=1)
model = torch.nn.Sequential(clayer1, qlayer, clayer2, softmax)

samples = 100
x, y = sklearn.datasets.make_moons(samples)
y_hot = np.zeros((samples, 2))
y_hot[np.arange(samples), y] = 1  # one-hot encode the labels
X = torch.tensor(x).float()
Y = torch.tensor(y_hot).float()

opt = torch.optim.SGD(model.parameters(), lr=0.5)
loss = torch.nn.L1Loss()
epochs = 1000
batch_size = 5
batches = samples // batch_size
data_loader = torch.utils.data.DataLoader(list(zip(X, Y)), batch_size=batch_size,
                                          shuffle=True, drop_last=True)

for epoch in range(epochs):
    running_loss = 0
    for xs, ys in data_loader:  # renamed from x, y to avoid shadowing the dataset
        opt.zero_grad()
        loss_evaluated = loss(model(xs), ys)
        loss_evaluated.backward()
        opt.step()
        running_loss += loss_evaluated.item()  # .item() avoids retaining the graph
    avg_loss = running_loss / batches
    if (epoch + 1) % 100 == 0:
        print("Average loss over epoch {}: {:.4f}".format(epoch + 1, avg_loss))
```