TensorFlow gradients for homodyne samples

Hi,
I wish you all happy holidays and a merry Xmas!
The quantum neural network tutorial uses the state method as an exact measurement. I tried to make a very simple version of it, i.e. a single layer and no inputs, and another version with a single input. Then I tried to compute the gradients with respect to the samples; the measurement was homodyne.
The output from `tape.gradient` is None.

To reproduce the error, the cost function can be something like this:

`tf.abs(tf.squeeze(results.samples) - 1)`

The number of modes is 1.
SF version is 0.16.0.
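
Roughly, what I'm running looks like this sketch (the gate choice and numbers are just placeholders for my circuit):

```python
import strawberryfields as sf
from strawberryfields import ops
import tensorflow as tf

# one mode, a single parametrized gate, homodyne measurement
prog = sf.Program(1)
theta = prog.params("theta")

with prog.context as q:
    ops.Dgate(theta) | q[0]
    ops.MeasureHomodyne(0.0) | q[0]

eng = sf.Engine("tf", backend_options={"cutoff_dim": 10})
tf_theta = tf.Variable(0.1)

with tf.GradientTape() as tape:
    results = eng.run(prog, args={"theta": tf_theta})
    # cost built from the stochastic homodyne sample
    cost = tf.abs(tf.squeeze(results.samples) - 1)

print(tape.gradient(cost, tf_theta))  # prints None
```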

Hi @kareem_essafty, thanks for the question.

What output were you expecting here? Stochastic operations have no associated gradient. This is not specific to quantum: the TensorFlow ops that sample from probability distributions have no gradients either, and the same holds for quantum models that sample from a probability distribution.

Gradients can be computed for expectation values, since those values are deterministic (if you compute expectation values in repeated experiments with the same parameters, you will always get the same result), or for states (since simulating a state consists of purely deterministic linear-algebra operations). But measurements are stochastic, so the notion of a gradient is not well defined.
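
For example, replacing the sampled cost with one built from an expectation value of the returned state should give you a gradient. A quick sketch (I haven't run this; the gate and parameter are placeholders):

```python
import strawberryfields as sf
from strawberryfields import ops
import tensorflow as tf

prog = sf.Program(1)
theta = prog.params("theta")

with prog.context as q:
    ops.Dgate(theta) | q[0]  # no measurement; keep the state

eng = sf.Engine("tf", backend_options={"cutoff_dim": 10})
tf_theta = tf.Variable(0.1)

with tf.GradientTape() as tape:
    results = eng.run(prog, args={"theta": tf_theta})
    # quad_expectation returns the mean and variance of the quadrature
    mean_x, var_x = results.state.quad_expectation(0)
    cost = tf.abs(mean_x - 1)

print(tape.gradient(cost, tf_theta))  # a number, not None
```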

I think it’s `tf.random.categorical` that makes it **non-differentiable**. Please correct me if I’m wrong.

On the other hand, is there a workaround to work with `rho_dist` directly, or even the log version of it?
Since I don’t care about the projected state, I vary the number of shots in `tf.random.categorical` and then average the gathered `q_tensor`; it’s very naive and basic, something like the sketch below.
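
(A toy version of what I mean; `rho_dist` and the grid here are just stand-ins, not SF's internals:)

```python
import tensorflow as tf

rho_dist = tf.Variable([0.2, 0.5, 0.3])  # stand-in (unnormalized) log-probs
q_grid = tf.constant([-1.0, 0.0, 1.0])   # stand-in quadrature grid

shots = 10000
idx = tf.random.categorical(rho_dist[None, :], shots)[0]
q_tensor = tf.gather(q_grid, idx)        # gathered samples
estimate = tf.reduce_mean(q_tensor)      # naive average over the shots
```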
It’s just that the Berry–Esseen theorem requires a lot of assumptions, and I don’t know if they actually hold here.
But thanks a lot for your answer :slight_smile:

Yes, that’s right, `tf.random.categorical` is the particular TF operation that does not have a gradient registered. To reiterate, any stochastic operation will have the same issue; it’s not something particular to that op or to TensorFlow.

I believe that `rho_dist` should still be differentiable with respect to upstream inputs, yes (though I didn’t run code to check). Regarding the question of whether there is a “workaround”, it’s really hard to give an answer. There are techniques like REINFORCE (used, e.g., in reinforcement learning) that let you estimate gradients when stochastic processes are involved, but you’ll need to provide clearer details about exactly what you’re trying to do for us to help more.
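
To illustrate both points on a toy categorical distribution (a stand-in for `rho_dist`; nothing here is SF-specific, and the numbers are arbitrary):

```python
import tensorflow as tf

logits = tf.Variable([0.2, 0.5, 0.3])   # parameters upstream of the distribution
values = tf.constant([-1.0, 0.0, 1.0])  # outcome value for each category

with tf.GradientTape(persistent=True) as tape:
    probs = tf.nn.softmax(logits)

    # 1. Working with the distribution directly stays differentiable:
    exact_mean = tf.reduce_sum(probs * values)

    # 2. REINFORCE / score-function estimator: sample, then build a
    #    surrogate loss whose gradient estimates E[f(x) * d/dtheta log p(x)]:
    idx = tf.random.categorical(logits[None, :], 10000)[0]
    f = tf.gather(values, idx)                         # f(x_i); no gradient path
    log_p = tf.gather(tf.nn.log_softmax(logits), idx)  # log p(x_i)
    surrogate = tf.reduce_mean(tf.stop_gradient(f) * log_p)

print(tape.gradient(exact_mean, logits))  # finite gradient
print(tape.gradient(surrogate, logits))   # stochastic gradient estimate
```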