Equivalence between classical NNs and CV model QNNs


I have been playing around with CV neural networks for function fitting. As described in Killoran et al. (2019), classical neural networks are embedded in this model whenever no entanglement and superposition are introduced (among some other conditions). For the one qumode case, this effectively means setting the interferometer phases to 0.

I was trying to verify this result by fitting some functions both with the quantum model and with an equivalent classical model. Surprisingly for me, I am not being able to reproduce with the classical network the results obtained with the quantum one. I guess the way in which I am building the “equivalent” classical network is not correct. I am using one qumode and four layers, as suggested in Function fitting with a photonic quantum neural network (models with more qumodes are quite computationally expensive to simulate), which gives results as the following ones for simple functions and 100 iterations

descarga (1)

The classical network should be a four-layer network defined with Keras as

model = keras.Sequential()
model.add(keras.layers.Dense(units = 1, activation = 'linear', input_shape=[1]))
model.add(keras.layers.Dense(units = 1, activation = 'relu'))
model.add(keras.layers.Dense(units = 1, activation = 'relu'))
model.add(keras.layers.Dense(units = 1, activation = 'relu'))
model.add(keras.layers.Dense(units = 1, activation = 'relu'))
model.add(keras.layers.Dense(units = 1, activation = 'linear'))
model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2))

However, the network is not training at all (not very surprising for a 1 neuron per layer NN). For it to work, I have to use many more neurons per layer.

I have noted that the quantum layers use more parameters to perform the same linear transformation, so that there are more free parameters to train. However, the final transformation is still the same, so better results should not be expected. What is not totally equivalent is the non-linear Kerr transformation, because it also has a trainable parameter (in contrast to usual classical activation functions). The point is, even when I fix that parameter, results are pretty decent for the quantum case, in contrast to the classical one. What am I missing? What does it make this comparison unfair? Thanks in advance!

Hey @Pablo_Vinas! It will probably be next week when I’m able to give you a response here. Most likely, I will need to consult the authors of this paper :sweat_smile:. Hang tight — I will update you ASAP!

1 Like

Hey @isaacdevlugt! Take your time, you already helped me twice. Thank you so much :smile:

1 Like

Hey @Pablo_Vinas! Just wanted to update you that I reached out to the author of this paper. Will get back to you as soon as possible!

1 Like

Hi @isaacdevlugt! Thank you for your involvement, let me know what you find out!

Hi @Pablo_Vinas,

Thanks for your interest in our work! :raised_hands:

The conceptual connection noted between the classical model and the quantum model where no entanglement/superposition is very cool for sure, but some care must be taken to assure the conditions hold. For instance, we need that the \hat{x} and \hat{p} components never mix anywhere in the model (at the heart of it, states which have no superposition content in the \hat{x} basis will have continuously infinite superposition structure when expressed in the \hat{p} basis).

You are right to identify the Kerr nonlinearity as the culprit here, for two reasons:

  • It actually does a quite complex nonlinear operation, even when unparameterized; compare this to the usual reLU from NNs, which is basically a piecewise linear function (as linear a nonlinearity as one could imagine)
  • The Kerr gate inextricably mixes the \hat{x} and \hat{p} eigenstates. In other words, it generates superposition/entanglement

These two together will cause the QNN to be able to do more complex operations than the simple NN counterpart could do, which seems to be what you are observing.

Note that using the other conventional photonic nonlinearity mentioned in the paper (the cubic phase gate) would mitigate the second point above (it does not generate superposition/entanglement of \hat{x} eigenstates), but if you have just a single mode, this would just generate an (unobservable) global phase, which would mean that the model is effectively fully linear.

I would not want to make a bet one way or another about its capabilities in practice, but a two-model model with local cubic phase gate nonlinearities might form the best basis for comparison.


Hi @nathan! Thank you for your answer, it was so clear. I guess it was rather naive to think that the two models should work in the same way just because the classical is embedded in the quantum one, as the transformations available in photonic computers are also limited and might not lead to the equivalence between models. It is interesting how well the Kerr gate performs in this context. Congratulations on your work!

1 Like

Straight from the source! Thanks @nathan :raised_hands:!

1 Like