I am observing a systematic increase in my loss function (square loss) when the parameter shift is big (1.0 or 0.1) but not for smaller parameter shifts. Is there any physical/numerical reason for this?

Here is a minimal working example, where the only parameterized gate is a displacement gate and the target is the approximation of a single function value f(0.5)=1.0

import pennylane as qml
from pennylane import numpy as np
dev = qml.device("strawberryfields.fock", wires=1, cutoff_dim=10, shots=100)
@qml.qnode(dev)
def quantum_neural_net(parameter, x=None):
    # Encode input x into quantum state
    qml.Displacement(x, 0.0, wires=0)
    # Trainable displacement
    qml.Displacement(parameter, 0.0, wires=0)
    return qml.expval(qml.X(0))
lr = 0.001
input = 0.5
goal = 1.0
# size of the parameter shift (this post compares 1.0, 0.1 and 0.01)
s = 0.1
# starting value of displacement parameter
parameter = -0.1
costs = []
steps = 500
for it in range(steps):
    # feed forward with the parameter value and calculate loss
    output = quantum_neural_net(parameter, x=input)
    loss = (goal - output) ** 2
    costs.append(loss)
    # feed forward at shifts of the parameter and calculate partial derivative
    output_plus = quantum_neural_net(parameter + s, x=input)
    output_minus = quantum_neural_net(parameter - s, x=input)
    output_gradient = (output_plus - output_minus) / (2.0 * s)
    # calculate gradient of loss with respect to parameter using chain rule
    gradient = -2 * (goal - output) * output_gradient
    # update parameter with simple gradient descent
    parameter -= gradient * lr

This code leads to the following figures for different values of s (loss as a function of steps):

Does anyone have an idea why a large s leads to this error? Any help would be greatly appreciated!

It seems like you’re using finite differences in your code to calculate the gradient (not to be confused with the parameter shift rule). Since a finite-difference gradient becomes less exact as s grows, the parameters might update incorrectly and thus cause the loss to go up. A more exact gradient (i.e. a smaller s) will most likely give more precise gradient-descent steps, which would explain why s=0.01 seems to work well.
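To make the step-size argument concrete, here is a minimal sketch in plain Python. It uses sin as a stand-in function rather than the circuit above, and the helper names central_difference and linear are hypothetical. It shows the error of a centered difference growing with s for a nonlinear function, while a function that is linear in its argument is differentiated exactly for any s (up to floating-point rounding).

```python
import math


def central_difference(f, x, s):
    """Centered difference estimate of f'(x) with step s."""
    return (f(x + s) - f(x - s)) / (2.0 * s)


def linear(x):
    # Stand-in for an output that depends linearly on the parameter
    return 2.0 * x


x0 = 0.4
for s in (1.0, 0.1, 0.01):
    err_sin = abs(central_difference(math.sin, x0, s) - math.cos(x0))
    err_lin = abs(central_difference(linear, x0, s) - 2.0)
    print(f"s={s}: error for sin = {err_sin:.2e}, error for linear = {err_lin:.2e}")
```

For sin the error shrinks roughly like s^2, whereas for the linear stand-in the shift size does not matter.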

Well, I can see it has exactly the same form as the centered finite difference scheme, but it looks like the parameter shift rule for that gate has that form as well (Schuld). If this is not the case, what is the difference between my version and the parameter shift for the displacement gate?

You are correct that the parameter shift rule for the displacement gate, as stated in the paper you linked, looks exactly like the finite-differences rule. I mistakenly thought that you were using finite differences in your code. After looking into it a bit more, the issue seems to be that you’re using the strawberryfields.fock backend with a cutoff_dim value that is too small, causing large parameter shifts to fall out of scope, so to speak. The larger the value of s, the larger the cutoff needs to be. You could solve this either by using a larger cutoff or by switching to, e.g., the strawberryfields.gaussian backend instead.
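The effect of the cutoff can be sketched without any quantum library. The snippet below is a simplified model, not what the strawberryfields.fock backend does internally: it takes the exact Fock amplitudes of a coherent state with real amplitude alpha, drops everything at or above the cutoff without renormalizing, and evaluates the expectation value of X assuming the hbar = 2 convention (so the untruncated value is 2 * alpha). The function name expval_x_truncated is hypothetical.

```python
import math


def expval_x_truncated(alpha, cutoff_dim):
    """<X> of a coherent state |alpha> (alpha real) truncated to cutoff_dim Fock states.

    Assumes hbar = 2, so X = a + a^dagger and the untruncated value is 2 * alpha.
    The truncated state is deliberately left unnormalized.
    """
    # Fock amplitudes c_n = exp(-alpha^2 / 2) * alpha^n / sqrt(n!)
    amps = [
        math.exp(-alpha**2 / 2.0) * alpha**n / math.sqrt(math.factorial(n))
        for n in range(cutoff_dim)
    ]
    # <X> = 2 * sum_n sqrt(n + 1) * c_n * c_{n+1} for real amplitudes
    return 2.0 * sum(
        math.sqrt(n + 1) * amps[n] * amps[n + 1] for n in range(cutoff_dim - 1)
    )


for alpha in (0.5, 1.5, 3.0):
    print(alpha, 2 * alpha, expval_x_truncated(alpha, cutoff_dim=10))
```

In this model the truncated value stays close to 2 * alpha while alpha^2 is well below the cutoff, and falls short of it once the photon-number distribution reaches the cutoff.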

I updated this specific reply since what I wrote before wasn’t strictly correct. Sorry about any confusion I might have caused. I hope this makes sense, though; otherwise, feel free to keep on asking questions.

Sorry about that. However, just for my understanding: could you elaborate on how an increased shift falls out of scope? I imagine that the parameter shift of the displacement gate moves the Wigner distribution in x,p phase space without changing its shape. But the cutoff dimension only controls how many Fock states are allowed in the basis used to describe the current CV state. How are these related?

Hi Martin, a rule of thumb in Fock space, valid only in a very restricted setting, is that if you apply a displacement by alpha to the vacuum, then your Fock distribution will be Poissonian with mean |alpha|^2 and therefore standard deviation |alpha|. This means that you need a Fock cutoff of at least |alpha|^2 + k|alpha|, where k is about 2 or 3, to account for most of the photon-number probability.
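The rule of thumb can be checked numerically. The following is a minimal sketch in plain Python; the function names suggested_cutoff and captured_probability are hypothetical, and rounding the cutoff up to the next integer is an assumption made here for illustration.

```python
import math


def suggested_cutoff(alpha, k=3):
    """Rule-of-thumb Fock cutoff |alpha|^2 + k * |alpha|, rounded up."""
    return math.ceil(abs(alpha) ** 2 + k * abs(alpha))


def captured_probability(alpha, cutoff_dim):
    """Poisson probability (mean |alpha|^2) of photon numbers 0 .. cutoff_dim - 1."""
    mean = abs(alpha) ** 2
    return sum(
        math.exp(-mean) * mean**n / math.factorial(n) for n in range(cutoff_dim)
    )


for alpha in (0.5, 1.0, 3.0):
    cutoff = suggested_cutoff(alpha)
    print(alpha, cutoff, captured_probability(alpha, cutoff))
```

Comparing captured_probability for a fixed cutoff_dim across several values of alpha shows how quickly the photon-number distribution spills over the cutoff as the displacement grows.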