Possible to create a QNN like classical one?

Hi
Good day.
Say I have a classical NN with 4 layers structured as [4,5,3,2], i.e., four input nodes, five nodes in the first hidden layer, three in the second, and two output nodes.

According to the paper “Continuous-variable quantum neural networks”, there is a layer-by-layer structure in QNN. Most examples there only maintain the same size for each layer. In QNN, no “neuron” is indicated.

  1. What is the most appropriate structure for a QNN if I wish to have a fair performance comparison between the classical network mentioned above and the QNN?

  2. The examples in the paper use the same size for each layer. How do I code a network with a different size for each layer? Is there an example I can refer to?

Thanks

Hi @SuFong_Chien,

Good question! In Fig. 2 of the paper, we show how layers with different widths can be combined. In the case of widths [4,5,3,2], we would have a 5-mode device and, for example, the first layer would operate on the first 4 modes. This can be realised in PennyLane using:

import pennylane as qml
from pennylane import numpy as np

widths = [4, 5, 3, 2, 1]
wires = max(widths)
cutoff = 4
seed = 1967

dev = qml.device("strawberryfields.fock", wires=wires, cutoff_dim=cutoff)


@qml.qnode(dev)
def qnn(inputs, weights):
    
    qml.templates.DisplacementEmbedding(inputs, wires=range(wires))
    
    for weight, width in zip(weights, widths):
        qml.templates.CVNeuralNetLayers(*weight, wires=range(width))
    
    return qml.expval(qml.X(0))


weights = [qml.init.cvqnn_layers_all(n_layers=1, n_wires=width, seed=seed) for width in widths]
inputs = np.random.random(wires)

dqnn = qml.grad(qnn)

print(qnn(inputs, weights))
print(dqnn(inputs, weights))

Hope that helps!

Hi Tom

Thank you very much for your solution. Before I start my journey with the XANADU QNN (especially the boson type), I wish to understand the given code further.

  1. How do you judge that cutoff = 4 is good enough? It refers to the number of photons in the Fock state, right? If I am not mistaken, the function fitting example sets it to 10.

  2. What is the purpose of ‘1’ in [4,5,3,2,1]? In this case we need 5 wires to construct the network right?

  3. I am sorry I am a bit lost now. If I want to train this QNN, I must pass train data as input (making loops for shuffling data and etc). After that I use the weight for prediction, am I right? I suppose there is a *.predict() function available in PennyLane, isn’t it?

  4. Possible to print out the network diagram like what we can do in IBM qiskit?

  5. Here is a last “out of scope” question for you :slight_smile: . Most engineering work likes to ask about the Big-O of a proposed algorithm. What is the Big-O for the XANADU CV quantum neural network? I have studied two other types of QNNs. For example, the paper ‘Quantum algorithms for feed forward neural networks’ by J. Allcock et al. does describe the Big-O. However, there is no description of the Big-O in the paper ‘Training deep quantum neural networks’.

I am sorry to trouble you, but I hope to become very familiar with XANADU products in the future. Thanks.

Hey @SuFong_Chien,

Good questions!

How do you judge that cutoff = 4 is good enough? It refers to the number of photons in the Fock state, right? If I am not mistaken, the function fitting example sets it to 10.

Yes, the cutoff refers to the number of photons that are kept track of in each mode (strictly, a cutoff of 4 means that we track the 0, 1, 2, 3 photon states). Ideally, the higher the cutoff the better, allowing us to have greater levels of squeezing or displacement that will push up the average number of photons. Unfortunately this comes with a big trade-off in simulation speed: the overhead is (cutoff) ^ (modes), so increasing the number of modes is exponentially hard. We can get away with a cutoff of 10 when we have few modes, but going up to 5 modes we may need to compromise. One way to check is to calculate the trace of the output state: if it is significantly below 1, then the cutoff is probably too low.
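To make the trace check concrete, here is a minimal single-mode sketch. It is not the QNN itself: it assumes the simplest case of a displaced vacuum (a coherent state), whose photon-number distribution is Poisson with mean |alpha|^2, and the function name `truncated_trace` is made up for this illustration. In an actual simulation the trace would be read from the simulator's state object instead.

```python
import math


def truncated_trace(alpha_abs, cutoff):
    """Trace retained by a coherent state |alpha> when Fock states >= cutoff are dropped.

    The photon-number distribution of a coherent state is Poisson with
    mean |alpha|^2, so the retained trace is the Poisson CDF up to cutoff - 1.
    """
    mean = alpha_abs ** 2
    return sum(
        math.exp(-mean) * mean ** n / math.factorial(n) for n in range(cutoff)
    )


# Compare a small and a large displacement at the two cutoffs discussed above.
for cutoff in (4, 10):
    for alpha_abs in (0.5, 1.5):
        trace = truncated_trace(alpha_abs, cutoff)
        print(f"cutoff={cutoff:2d}, |alpha|={alpha_abs}: trace ~ {trace:.4f}")
```

With the larger displacement the retained trace at cutoff 4 drops visibly below 1, while at cutoff 10 it stays very close to 1, which is the signal described above for a cutoff that is too low.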

What is the purpose of ‘1’ in [4,5,3,2,1]? In this case we need 5 wires to construct the network right?

Ah, the one was just to bring us down to a single mode and have a 1D output for the sake of this prototype, similar to having e.g. a final neuron in a neural network as a binary classifier. The code above should allow you to be free in your choice of widths.

I am sorry I am a bit lost now. If I want to train this QNN, I must pass train data as input (making loops for shuffling data and etc). After that I use the weight for prediction, am I right? I suppose there is a *.predict() function available in PennyLane, isn’t it?

Training the QNN would look something like:

import pennylane as qml
from pennylane import numpy as np

widths = [4, 5, 3, 2]
wires = max(widths)
cutoff = 4
seed = 1967

dev = qml.device("strawberryfields.fock", wires=wires, cutoff_dim=cutoff)


@qml.qnode(dev)
def qnn(inputs, weights):
    
    qml.templates.DisplacementEmbedding(inputs, wires=range(wires))
    
    for weight, width in zip(weights, widths):
        qml.templates.CVNeuralNetLayers(*weight, wires=range(width))
    
    return qml.expval(qml.X(0)), qml.expval(qml.X(1))


weights = [qml.init.cvqnn_layers_all(n_layers=1, n_wires=width, seed=seed) for width in widths]
inputs = np.random.random(wires)
outputs = np.random.random(widths[-1])

opt = qml.GradientDescentOptimizer(stepsize=0.4)


def cost(weights):
    return np.sum((qnn(inputs, weights) - outputs) ** 2)
    

print("Example weight before: ", weights[-1][0])
    
for i in range(2):
    weights = opt.step(cost, weights)

print("Example weight after: ", weights[-1][0])

Possible to print out the network diagram like what we can do in IBM qiskit?

We have a circuit drawer that can be output using print(qnn.draw()). However, it is a text-based drawer and the circuit in this case might be a little too deep, so that the output doesn’t fit nicely on your screen.

Here is a last “out of scope” question for you :slight_smile: . Most engineering work likes to ask about the Big-O of a proposed algorithm. What is the Big-O for the XANADU CV quantum neural network? I have studied two other types of QNNs. For example, the paper ‘Quantum algorithms for feed forward neural networks’ by J. Allcock et al. does describe the Big-O. However, there is no description of the Big-O in the paper ‘Training deep quantum neural networks’.

This is still a bit of an open research question, especially for near-term algorithms. A lot of algorithms do show speed ups or improved data capacity, but these algorithms tend to require quite deep circuits and error correction. Instead, more “near-term” algorithms focus on the variational approach: having a fixed circuit of limited depth and altering the parameters. In that case the scaling is less clear, but we might be expecting things like an improved quality of training (although, this needs to be formalized).

Hi Tom

Thanks for your explanation. I have learned much.

If I have input data of size 17, the network widths become [17, 4, 3, 2]. In this case, wires = max(widths) = 17. Can a program running on a normal computer handle this? Is the cutoff still 4?

In this case, is it possible to run this on XANADU's real photonic quantum computer? Are we allowed to use the quantum computer?

If I wish to compare a large classical NN, e.g. [17, 256, 128, 64, 32, 16, 4], with a QNN on prediction accuracy, is it still possible to use a CVQNN in simulation? Is the cutoff = 7?

Hi @SuFong_Chien,

If I have input data of size 17, the network widths become [17, 4, 3, 2]. In this case, wires = max(widths) = 17. Can a program running on a normal computer handle this? Is the cutoff still 4?

You can always attempt to run the circuit with different widths/inputs, but since the overhead is (cutoff) ^ (modes), as @Tom_Bromley mentioned, it’s difficult to simulate many modes and/or a high cutoff. 17 modes would be a lot, even for a cutoff value of 4, and would most likely need more memory than modern personal computers have. An even higher cutoff would probably be needed as well to get any realistic results.
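As a rough back-of-the-envelope check of the (cutoff) ^ (modes) scaling, the sketch below estimates the memory for a single state vector. It assumes a pure state stored as one double-precision complex number (16 bytes) per amplitude and ignores any simulator overhead; the helper name `fock_state_memory_bytes` is hypothetical.

```python
def fock_state_memory_bytes(cutoff, modes, bytes_per_amplitude=16):
    """Memory for one pure state stored as cutoff**modes complex amplitudes.

    Assumes 16 bytes per amplitude (complex128); real simulators need
    additional working memory on top of this.
    """
    return cutoff ** modes * bytes_per_amplitude


for modes in (2, 5, 17):
    gib = fock_state_memory_bytes(4, modes) / 2 ** 30
    print(f"cutoff=4, modes={modes:2d}: {gib:.6g} GiB")
```

Under these assumptions, 17 modes at cutoff 4 already need 4^17 = 2^34 amplitudes, i.e. 256 GiB for the state vector alone.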

In this case, is it possible to run this on XANADU's real photonic quantum computer? Are we allowed to use the quantum computer?

Unfortunately this type of circuit cannot be executed on current hardware. The issue is that the CVNeuralNetLayers involve a non-Gaussian gate in each layer (the Kerr gate). This gate is quite challenging and not possible on our current device. We recommend that you check out here to understand the type of circuits that will fit on our present device. However, you are encouraged to request access to our quantum hardware and cloud here.

If I wish to compare a large classical NN, e.g. [17, 256, 128, 64, 32, 16, 4], with a QNN on prediction accuracy, is it still possible to use a CVQNN in simulation? Is the cutoff = 7?

This would probably be very difficult due to the high number of modes and the large cutoff, for the same reasons as explained above.

I hope this answers your questions. :slight_smile:

Hi Theodor

It is very clear now. I am a bit disappointed to learn that there is a limitation on testing the real photonic QC :sob:. Anyhow, thank you very much for your very informative explanation. I hope I can learn as much as possible from you guys.

Dear Theodor & Tom

Just something I have in mind. Suppose I wish to “escape” from the (cutoff) ^ (modes) problem by moving to a classical-quantum NN. The classical part handles the first layer, with its 17 inputs, and passes its output to the next layer, which has a small enough number of nodes to be handled by the quantum part, for example classical [17, 8] -> quantum [4,3,2]. Is this a workable solution? If so, what is the advantage of this solution compared with the pure QNN? What are the interesting points if we compare both the hybrid QNN and the pure QNN with the pure classical NN for prediction?

Hey SuFong_Chien,

Let me jump in here. Sure, having a hybrid of classical and quantum layers in a neural network is something people do a lot (see also the transfer learning tutorial and the neural network module extending classical NNs generically).
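As an illustration of the classical front end of such a hybrid, here is a minimal pure-Python sketch that compresses 17 features down to 4 values, which could then be passed as the `inputs` of a 4-mode quantum circuit. The layer sizes [17, 8, 4], the tanh activation, the random initialisation and the helper names `dense_layer` and `init_layer` are all assumptions made for this example; in practice the classical weights would be trained together with the quantum ones, typically through a machine learning framework rather than hand-written layers.

```python
import math
import random


def dense_layer(features, weights, biases):
    """One fully connected layer with a tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer mapping n_in -> n_out values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
sizes = [17, 8, 4]  # assumed classical part: 17 features -> 8 -> 4 quantum inputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# Stand-in for one data point with 17 features.
features = [rng.random() for _ in range(sizes[0])]

activations = features
for weights, biases in layers:
    activations = dense_layer(activations, weights, biases)

# These 4 bounded values are what the quantum part would receive.
quantum_inputs = activations
print(quantum_inputs)
```

The point of the design is that the quantum simulation only ever sees 4 modes, so the (cutoff) ^ (modes) cost is set by the quantum part and not by the 17 input features.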

What this means in terms of power and mechanics of the classifier is subject to ongoing research, and I don’t think anyone knows for sure yet what advantages such an approach would have compared to a pure QNN or classical NN.

“What this means in terms of power and mechanics of the classifier is subject to ongoing research, and I don’t think anyone knows for sure yet what advantages such an approach would have compared to a pure QNN or classical NN.”

Can “A rigorous and robust quantum speed-up in supervised machine learning” (https://arxiv.org/pdf/2010.02174.pdf), which has just been shared by Robert Sutor, justify the advantage?

The argument in the paper is basically that if your quantum machine learning model does something like Shor’s algorithm, there is an advantage of quantum processing (which is beautifully shown, but does not really tell us much about whether QML is useful for more general applications).

So, if your QNN layer learns a quantum algorithm that is thought to be classically intractable, you could definitely argue that some advantage is achieved :slight_smile: The challenge is to find the dataset that leads to a cost function which favours such a solution…in other words, is there an ML problem for which Shor’s algorithm is required?

This is the ever-present question in QML: we do not only have to show that a QML algorithm is classically intractable, but also that it can be learnt efficiently and that it is useful for a realistic class of problems. That’s really hard to formulate/benchmark/model.

Hi

I am sorry I have not started any simulations yet, because I can only use my free time to study a problem that interests me. I have just gone through a paper that I am interested in, and I found there is no hope of using a full QNN model because of the network size, i.e., [17, 16, 8, 4], which cannot be handled by the machine since (cutoff) ^ (modes) = 4^17. I have tried to use the code given by Tom to construct a hybrid version: a classical [17, 16, 8] part, which I treat as a “data transformation”, feeding its output to a QNN with [8, 4, 4] or [8,8,4]. However, I did not manage to code it due to lack of skills and knowledge. In view of this, I move to the “Fraud detection fitting script”. I have some questions about this hybrid model that I do not fully understand from the script.

  1. Is the output layer of the classical net set to 14 because the paper only takes half of the data features to study?

  2. The paper does not state why it chooses 4 points to U gate, 2 for each S gate, 2 for each Dgate, and 1 for each K gate? Is it an arbitrary choice?

  3. If I set 8 output neurons, can I simply put 2 for U, 1 for S,D,K each?

  4. This example chooses the same number of neurons for the input layer and both hidden layers in the classical net. Are we free to choose different sizes in this particular case?

  5. This example has two final outputs in the QNN for classification, either zero or one. However, my work needs 4 real outputs; how do I modify the so-called “one_hot_input”? I think I may need phi = a1*[1,0,0,0]+ a2*[0,1,0,0]+ a3*[0,0,1,0]+ a4*[0,0,0,1]. Can you show the way?

Thanks

Hi @SuFong_Chien,

In view of this, I move to the “Fraud detection fitting script”

Just to check, do you mean this script?

Is the output layer of the classical net set to 14 because the paper only takes half of the data features to study?

Looking back at the paper, I recall we took the first ten principal components from the credit card data as the input features. We then had a series of classical layers with the final layer having 14 features. The justification for this was that there are 14 free parameters in the first layer of a 2-mode CV layer.

The paper does not state why it chooses 4 points to U gate, 2 for each S gate, 2 for each Dgate, and 1 for each K gate? Is it an arbitrary choice?

Good question! We chose these numbers to match the free parameters of each gate. For squeezing (Sgate) and displacement (Dgate) the input can be thought of as complex, and so there is both a magnitude and an angle input. The interferometer is broken down into a beamsplitter (BSgate) and a rotation gate (Rgate) on each mode. The beamsplitter has two angle parameters, while each Rgate has a single parameter, summing to 4. Finally, the Kerr gate is simply described by a single parameter. You could check out here for some more context on the number of parameters.

If I set 8 output neurons, can I simply put 2 for U, 1 for S,D,K each?

Yes, this should be possible. Overall there are 14 gate parameters. If we only control 8 and assume the rest are zero or fixed, we may expect the layer to be less flexible and for training to hence take longer or maybe fail.
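The parameter counts in this exchange can be tallied in a few lines. This is only bookkeeping based on the numbers quoted above for a 2-mode layer; the dictionary names are made up for the illustration, and the "reduced" variant is one reading of the 8-output proposal (2 parameters for U, 1 per mode for each of S, D and K).

```python
modes = 2

# Full control: every free gate parameter is driven by a classical output.
full_control = {
    "U: BSgate (2 angles) + one Rgate per mode": 2 + modes * 1,
    "Sgate: magnitude + angle per mode": modes * 2,
    "Dgate: magnitude + angle per mode": modes * 2,
    "Kgate: one parameter per mode": modes * 1,
}

# Reduced control (assumed reading of the 8-output proposal): the remaining
# parameters would be held at zero or some other fixed value.
reduced_control = {
    "U: 2 parameters": 2,
    "Sgate: 1 parameter per mode": modes * 1,
    "Dgate: 1 parameter per mode": modes * 1,
    "Kgate: 1 parameter per mode": modes * 1,
}

for name, table in (("full", full_control), ("reduced", reduced_control)):
    print(f"{name}: {sum(table.values())} classical outputs needed")
    for gate, count in table.items():
        print(f"  {gate}: {count}")
```

The full tally gives the 14 outputs used in the script, and the reduced one gives 8, leaving 6 gate parameters that the classical network no longer controls.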

This example chooses the same number of neurons for the input layer and both hidden layers in the classical net. Are we free to choose different sizes in this particular case?

Yes, there is also freedom here. The choice of [10, 10, 14] was arbitrary, except for the 14 which was motivated as discussed above. You could do [8, 8, 14] and things may work just as well. This is a familiar machine learning question about how to design the network.

This example has two final outputs in the QNN for classification, either zero or one. However, my work needs 4 real outputs; how do I modify the so-called “one_hot_input”? I think I may need phi = a1*[1,0,0,0]+ a2*[0,1,0,0]+ a3*[0,0,1,0]+ a4*[0,0,0,1]. Can you show the way?

If we stick with two modes, you could associate the probability of two photons in either mode as the remaining two classes. For example:
(# photons in mode 1, # photons in mode 2)
(0, 1): class 1
(1, 0): class 2
(0, 2): class 3
(2, 0): class 4
Practically this could be achieved by updating the one_hot_input around line 286.

I think this suggestion is for classification purposes. What I want is to predict 4 real values, which is not for classification. I have browsed XANADU web, there is an example like this:

def cost(weights):
    # Create a dictionary mapping from the names of the Strawberry Fields
    # symbolic gate parameters to the TensorFlow weight values
    mapping = {p.name: w for p, w in zip(sf_params.flatten(), tf.reshape(weights, [-1]))}

    # run the engine
    state = eng.run(qnn, args=mapping).state
    ket = state.ket()

    difference = tf.reduce_sum(tf.abs(ket - target_state))
    fidelity = tf.abs(tf.reduce_sum(tf.math.conj(ket) * target_state)) ** 2
    return difference, fidelity, ket, tf.math.real(state.trace())

Am I right to look into this?

thanks

image

BSgate(bs_variables[layer_number, 0, 0, 0], bs_variables[layer_number, 0, 0, 1]) | (q[0], q[1])

for i in range(mode_number):
    Rgate(phase_variables[layer_number, i, 0]) | q[i]

From the diagram above, q[0] passes through the BS and then the rotation gate, while q[1] doesn't pass through an Rgate. Why should we compute Rgate(phase_variables[layer_number, 1, 0]) for q[1]? Does it mean that both q[0] and q[1] are each split into two beams, so that 4 beams come out of the BS?

Hey @SuFong_Chien,

I think this suggestion is for classification purposes. What I want is to predict 4 real values, which is not for classification.

Ah, for regression you could instead output the result of a continuous-valued homodyne measurement. For example, as shown in the earlier comment, the qnn() function returns two real numbers measuring the X-position on each mode. To get to four outputs, you could either increase the number of modes to four, or use two systems composed of two modes. Also, what we often do is append a classical neural network onto the end (see the concept of a dressed quantum circuit). This lets us not worry so much about how we decode the results from the quantum circuit.

The code you shared looks like a snippet from Strawberry Fields that calculates the fidelity between a state and a target state.

From the diagram above, q[0] passes through the BS and then the rotation gate, while q[1] doesn't pass through an Rgate. Why should we compute Rgate(phase_variables[layer_number, 1, 0]) for q[1]? Does it mean that both q[0] and q[1] are each split into two beams, so that 4 beams come out of the BS?

:thinking: This is likely a difference between the code and the diagram. The diagram shows applying just an R gate on the first mode, which could be viewed as the “minimum” we need to do in this transformation. We are also able to add an R gate onto the second mode for a bit more flexibility, knowing that the case with zero phase on the second mode approximates the case with no R gate on the second mode.

In other words, I think both options are fine, but adding an R gate onto the second mode might add a bit more flexibility.

Hi Tom

I am trying to run the program fraud_detection.py. The error message that comes out is “AttributeError: module ‘tensorflow’ has no attribute ‘placeholder’”. I solved it by adding “import tensorflow.compat.v1 as tf” and “tf.disable_v2_behavior()”. This problem doesn’t come up anymore, but another one does: “from .circuit import Circuit …trawberryfields\backends\tfbackend\circuit.py”, line 304, “alpha = self._maybe_batch(alpha)”, IndentationError: expected an indented block. How can I solve this? Thanks

Do you mean I set either classification = tf.placeholder(shape=[batch_size, 2, 2], dtype=tf.int32) OR classification = tf.placeholder(shape=[batch_size, 4], dtype=tf.int32), which gives prob = tf.abs(ket[i, classification[i, 0, 0], classification[i, 0, 1], classification[i, 1, 0], classification[i, 1, 1]]) ** 2 OR prob = tf.abs(ket[i, classification[i, 0], classification[i, 1], classification[i, 2], classification[i, 3]]) ** 2? Following that, I just need to take out the one-hot part, and the probability I get will be the prediction values, am I right? Does {classification: one_hot_input} also have to be taken out?

Thanks

Hi @SuFong_Chien,

Ah, one thing to mention about this repo is that it was made with older versions of Strawberry Fields and TensorFlow, and we unfortunately have not had the chance to update to be compatible with current versions. In the Requirements section, you can find some details on how to set up the correct environment.

Regarding your second question: there are probably multiple ways you could get the idea to work. For me, classification = tf.placeholder(shape=[batch_size, 2], dtype=tf.int32) would probably be sufficient - we do not need to change the shape, we just know that terms in the classification tensor may now have value 2 instead of being restricted to {0, 1}. I believe that prob would then remain unchanged. Remember, the reason for the 2 in shape [batch_size, 2] is to reflect the number of modes rather than the number of classes. The remainder might be something like:

        for i in range(batch_size):
            if int(classes[i]) == 0:
                # Encoded such that genuine transactions should be outputted as a photon in the first mode
                one_hot_input[i] = [1, 0]
            elif int(classes[i]) == 1:
                one_hot_input[i] = [0, 1]
            elif int(classes[i]) == 2:
                one_hot_input[i] = [0, 2]
            elif int(classes[i]) == 3:
                one_hot_input[i] = [2, 0]

This might work, but I haven’t tried directly.

Thanks. Need to do installation stuff for the said version :frowning:

Hi Tom

I finally managed to run fraud_detection.py by installing tensorflow etc. according to the requirements. I found that there are two folders in the outputs folder, i.e. models and tensorboard. In “models”, there are files as shown below (image). However, I cannot view these files. How do I view the sess.ckpt-xxxxx files? Is sess.ckpt-26000.data-0000-of-00001 the “trained model” that will be used for testing/verifying? Thank you.