Thanks @theodor, that makes sense now. I just read about how a QNode is evaluated repeatedly to obtain the derivative. I counted how many times my qlayer is evaluated, and it comes to 500 evaluations per call of the qlayer. Are those 500 evaluations only there to build the finite-difference gradient? Why are so many needed?
And is that what we would also do on hardware, or is it just a simulation procedure for calculating the gradient?
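
To make the question concrete, here is a minimal sketch of how I understand a forward finite-difference gradient to work. It is plain Python, not PennyLane: `toy_circuit`, `make_counted` and `finite_diff_grad` are hypothetical names I made up, and `toy_circuit` is just a smooth stand-in for a QNode. The point is only that the number of evaluations grows with the number of trainable parameters (one baseline evaluation plus one per parameter). I have not verified that this is where my 500 comes from.

```python
import math


def make_counted(fn):
    """Wrap fn so that every evaluation is counted."""
    def wrapped(params):
        wrapped.calls += 1
        return fn(params)
    wrapped.calls = 0
    return wrapped


def toy_circuit(params):
    # Hypothetical stand-in for a QNode: a smooth scalar function of the parameters.
    return sum(math.sin(p) for p in params)


def finite_diff_grad(fn, params, h=1e-7):
    """Forward finite difference: one baseline call plus one call per parameter."""
    base = fn(params)
    grad = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += h  # nudge only the i-th parameter
        grad.append((fn(shifted) - base) / h)
    return grad


counted = make_counted(toy_circuit)
params = [0.1 * i for i in range(10)]
grad = finite_diff_grad(counted, params)
print(counted.calls)  # 11 evaluations for 10 parameters
```

If this picture is right, a layer with N trainable parameters would need about N + 1 evaluations per gradient (or 2N with a central difference), possibly multiplied by the batch size.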
Edit: Why does the sf.fock device only support finite differences? Based on this post, my use case could support analytic differentiation, i.e. the parameter-shift rule, since the Kerr gate is at the end of the layer.
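
For comparison, this is the kind of rule I have in mind, sketched for the simplest case I know: the qubit-style parameter-shift rule, where the expectation value is sinusoidal in the gate parameter. Here `expval` is a hypothetical stand-in (cos θ, as for ⟨Z⟩ after an RX rotation on |0⟩), not a photonic circuit, and I am assuming the rule for continuous-variable gates differs in its shifts and coefficients.

```python
import math


def expval(theta):
    # Hypothetical stand-in: <Z> after RX(theta) on |0> equals cos(theta).
    return math.cos(theta)


def param_shift_grad(fn, theta, shift=math.pi / 2):
    """Two evaluations at theta +/- pi/2 give the exact derivative for this fn."""
    return (fn(theta + shift) - fn(theta - shift)) / 2.0


theta = 0.3
print(param_shift_grad(expval, theta), -math.sin(theta))
```

Unlike a finite difference, the shift is large and the result is exact up to floating-point error, which is why I would prefer it if the device allowed it.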