Weird loss increase with big parameter shift

Hi there,

I am observing a systematic increase in my loss function (squared loss) when the parameter shift is big (1.0 or 0.1), but not for smaller parameter shifts. Is there any physical/numerical reason for this?

Here is a minimal working example, where the only parameterized gate is a displacement gate and the target is the approximation of a single function value f(0.5) = 1.0:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("strawberryfields.fock", wires=1, cutoff_dim=10, shots=100) 

@qml.qnode(dev)
def quantum_neural_net(parameter, x=None):
    # Encode input x into the quantum state
    qml.Displacement(x, 0.0, wires=0)

    # Trainable displacement
    qml.Displacement(parameter, 0.0, wires=0)

    return qml.expval(qml.X(0))

lr = 0.001

input = 0.5
goal = 1.0

# starting value of displacement parameter
parameter = -0.1
# parameter-shift size; this value was varied between runs (1.0, 0.1, 0.01)
s = 0.1
costs = []
steps = 500
for it in range(steps):
    # feed forward with the parameter value and calculate loss
    output = quantum_neural_net(parameter, x=input)
    loss = (goal - output) ** 2
    costs.append(loss)

    # feedforward at shifts of the parameter and calculate partial derivative
    output_plus = output = quantum_neural_net(parameter+s, x=input)
    output_minus = output = quantum_neural_net(parameter-s, x=input)

    output_gradient = 1./(2.*s)*(output_plus-output_minus)
    # calculate gradient of loss with respect to parameter using chain rule
    gradient = -2*(goal-output)*output_gradient
    # update parameter with simple gradient descent 
    parameter -= gradient*lr

This code leads to the following figures for different values of s (loss as a function of steps):


Does anyone have an idea of why a big s leads to this error? Any help would be greatly appreciated!

Kind regards,
Martin Knudsen

Hi @martin,

It seems like you’re using finite differences in your code to calculate the gradient (not to be confused with the parameter-shift rule). Since the gradient becomes less exact the larger the value of s, the parameters might update incorrectly and thus cause the loss to go up. Using a more exact gradient (i.e. a smaller s) will most likely give more precise gradient-descent steps, which would explain why s=0.01 seems to work well.
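As a purely classical illustration (using sin as a hypothetical stand-in for the circuit output, not the actual quantum function), the truncation error of the centered difference grows with s:

```python
import numpy as np

def f(x):
    # hypothetical smooth function standing in for the circuit output
    return np.sin(x)

x0 = 0.5
exact = np.cos(x0)  # exact derivative of sin at x0

for s in (1.0, 0.1, 0.01):
    fd = (f(x0 + s) - f(x0 - s)) / (2 * s)
    # the truncation error of the centered difference scales like s**2
    print(f"s={s}: error={abs(fd - exact):.2e}")
```

For a smooth function the centered-difference error shrinks roughly as s², matching the observation that s=0.01 behaves well while s=1.0 does not.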

I hope this helps. :slight_smile:

Hi @theodor,

Thanks for your answer :slight_smile:

Well, I can see that it has exactly the same form as the centered finite-difference scheme, but it looks like the parameter-shift rule for that gate has that form as well (Schuld). If this is not the case, what is the difference between my version and the parameter shift for the displacement gate?

Hi @martin,

You are correct that the parameter-shift rule for the displacement gate looks exactly like the finite-differences rule stated in the paper you linked. I mistakenly thought that you were using finite differences in your code. :slight_smile: After looking into it a bit more, the issue seems to be that you’re using the strawberryfields.fock backend with a cutoff_dim value that is too small, causing large parameter shifts to fall out of scope, so to speak. The higher the value of s, the larger the cutoff needs to be. You could solve this either by using a larger cutoff or by switching to, e.g., the strawberryfields.gaussian backend instead.
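As an aside, the reason the two rules coincide here is that the expectation value of X is affine in the displacement parameter, so the centered difference recovers the exact slope for any shift s (up to sampling and cutoff effects). A classical sketch, with a hypothetical affine model standing in for the expectation value:

```python
def expval_model(theta, a=2.0, b=0.3):
    # hypothetical affine model of <X> as a function of the gate parameter
    return a * theta + b

theta0 = -0.1
for s in (1.0, 0.1, 0.01):
    grad = (expval_model(theta0 + s) - expval_model(theta0 - s)) / (2 * s)
    # the centered difference recovers the slope a exactly,
    # independent of s (up to floating-point rounding)
    print(f"s={s}: grad={grad}")
```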

I updated this specific reply since what I wrote before wasn’t strictly correct. Sorry about any confusion I might have caused. I hope this makes sense, though; otherwise, feel free to keep on asking questions.


Hi @theodor,

Thanks for the reply, I think I found my mistake and it was a pretty stupid one. I wrote this

output_plus = output = quantum_neural_net(parameter+s, x=input)
output_minus = output = quantum_neural_net(parameter-s, x=input)

which certainly does something fishy, so correcting it to this, I don’t get that weird increase

output_plus = quantum_neural_net(parameter+s, x=input)
output_minus = quantum_neural_net(parameter-s, x=input)
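Concretely, the chained assignment also rebinds output, so the chain-rule factor ends up using the minus-shifted value instead of the unshifted one (a toy example, with f standing in for the circuit):

```python
def f(p):
    # toy stand-in for the circuit: any function of the parameter
    return 2 * p

output = f(0.5)                     # unshifted value: 1.0
s = 1.0
output_plus = output = f(0.5 + s)   # rebinds output to f(1.5) = 3.0
output_minus = output = f(0.5 - s)  # rebinds output again, to f(-0.5) = -1.0
# output now holds f(0.5 - s), not the unshifted f(0.5),
# so the factor -2*(goal - output) in the gradient uses the wrong value
```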

Sorry about that. However, just for my understanding: Could you elaborate on how an increased shift falls out of scope? I imagine, that the parameter shift of the displacement gate has the effect of moving the Wigner distribution in x,p phase space without changing it’s shape. But the cutoff dimensions only controls how many Fock states are allowed in the basis to describe the current CV state. How are these related?

Hi Martin, a very restricted rule of thumb in Fock space is that if you apply a displacement by alpha to the vacuum, the Fock (photon-number) distribution will be Poisson with mean equal to |alpha|^2. This means that you will need a Fock cutoff of at least |alpha|^2 + k*|alpha|, where k ~ 2-3, to account for most of the photon-number probability.
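To make the rule of thumb concrete, one can check how much of the Poisson photon-number distribution a given cutoff captures (a stdlib-only sketch; alpha = 2 is chosen purely for illustration):

```python
import math

def captured_mass(alpha, cutoff):
    # P(N < cutoff) for a Poisson distribution with mean |alpha|^2,
    # i.e. the photon-number statistics of a displaced vacuum state
    mean = alpha ** 2
    return sum(math.exp(-mean) * mean ** n / math.factorial(n)
               for n in range(int(cutoff)))

alpha = 2.0
for k in (0, 2, 3):
    cutoff = alpha ** 2 + k * alpha
    print(f"k={k}, cutoff={cutoff:.0f}: mass={captured_mass(alpha, cutoff):.3f}")
```

With alpha = 2 (mean photon number 4), cutting off at |alpha|^2 captures well under half of the probability mass, while k = 2 or 3 captures roughly 95-99%, consistent with the rule of thumb above.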