Question about the use of amplitude embedding mentioned in PennyLane's page

This page explains the definition of amplitude embedding and also provides an example of how to use it. In this example, all samples in a classical dataset are concatenated into one vector, and we obtain a quantum state |D> by encoding this vector into its amplitudes, so the state |D> contains the information of the whole dataset. I think this encoding method could significantly speed up model training, since we use only one quantum state to represent the whole dataset. But I don't know how to use this state |D> to train a machine learning model. For example, if we evolve this state with a variational quantum circuit and finally perform a measurement, what do the outputs mean, and how can we use these outputs together with the dataset's labels to define the loss function for a supervised learning model? I would appreciate it if anyone could give me a hint here. Thanks!

Hey @gojiita_ku! Thanks for your question. I'm not sure I understand how you want to use amplitude embedding. But, as you said, amplitude embedding (AE'ing) is just a way to encode a dataset into a quantum state. Typically, we encode it into a quantum circuit in the interest of performing a classification task / feature mapping. The measurements one performs after the variational parts of the circuit must be such that we can compare them to the data's original labels and minimize some loss function, in hopes of training our circuit to "learn" the features.

Maria Schuld puts it nicely in her paper:

"We interpret the process of encoding inputs in a quantum state as a nonlinear feature map that maps data to quantum Hilbert space. A quantum computer can now analyse the input data in this feature space. Based on this link, we discuss two approaches for building a quantum model for classification…

By doing AE’ing, we can leave it to a quantum computer to deal with our classical input data (learn its features and classify). Hope this helps! If you’re having trouble implementing it yourself for your particular task, please post your source code here!

Hi, @isaacdevlugt! Thanks for your reply. I think we usually encode one data sample at a time into a quantum state using amplitude embedding, so that the output of the variational circuit can be compared with the label of that sample. But if we encode all data samples of the dataset into one quantum state using amplitude embedding, like how the state |D> is obtained on this page, I don't know how to compare the output of the circuit with the labels of these samples. This is what I'm confused about. So do you have any idea about the motivation for generating this quantum state |D> that encodes the information of all data samples, and how to use it? Or I guess the author of this documentation might know?

Let's say your data's labels are binary: -1 represents class A and +1 represents class B. Your circuit output that will be used to compare to those labels should therefore live on the same scale, e.g., a Pauli-Z expectation value in this case, which lies in [-1, 1] and whose sign can be read as the predicted class.
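
To make that concrete, here is a minimal sketch of how the output-vs-label comparison could look in PennyLane. The single-qubit circuit, the RY gates, and the square loss are all just illustrative choices, not a prescribed recipe:

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def classifier(x, theta):
    qml.RY(x, wires=0)      # hypothetical encoding of a single feature
    qml.RY(theta, wires=0)  # hypothetical variational rotation
    # <Z> lies in [-1, 1]; its sign can be read as the predicted class
    return qml.expval(qml.PauliZ(0))

def loss(theta, xs, labels):
    # square loss between the circuit outputs and the +/-1 labels
    preds = np.array([classifier(x, theta) for x in xs])
    return np.mean((preds - labels) ** 2)
```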

The motivation for doing this, in a sense, is to leverage the quantum Hilbert space as a feature space. The papers referenced here do a nice job of motivating this!

Hi @gojiita_ku, to complement @isaacdevlugt’s answer, if you encode your classical data using amplitude embedding you could end up with a quantum state that represents your data, however you cannot input quantum states into classical computers. This means that you will have to measure and re-encode your results into your classical computer.

For example, let's say your data is the following vector: [1,2,3,4]. You can encode it using Amplitude Encoding. PennyLane will add a few gates to your quantum circuit so that the amplitudes of this quantum state represent your data. Usually what happens is that you would encode your data one by one, so you would need 4 quantum circuits, one for each datapoint. From what I understand, you want a single quantum circuit where you encode all of these datapoints at the same time. You definitely can do this using amplitude embedding, but, as you have noticed, it becomes hard to interpret your measurement. When you have a single circuit per datapoint, your measurement can be directly compared with a label; however, if you encode all of your data at once, it becomes harder to know what the measurement means.
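
For reference, a minimal sketch of what that embedding step looks like in PennyLane (the two-qubit device is simply the smallest one that fits four amplitudes):

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def embed(features):
    # load the 4-element vector into the 4 amplitudes of a 2-qubit state;
    # normalize=True rescales it to unit norm, as quantum states require
    qml.AmplitudeEmbedding(features=features, wires=[0, 1], normalize=True)
    return qml.state()

print(embed(np.array([1.0, 2.0, 3.0, 4.0])))
# amplitudes are [1, 2, 3, 4] / sqrt(30)
```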

My suggestion would be to encode your data one datapoint at a time, followed by a variational circuit, so that the measurement can be compared with the label for each datapoint.
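
As a rough sketch of that per-datapoint approach, assuming StronglyEntanglingLayers as the variational block and a Pauli-Z expectation as the output (both arbitrary choices here):

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def model(weights, x):
    # one circuit evaluation per datapoint x (a 4-feature vector here)
    qml.AmplitudeEmbedding(x, wires=range(n_qubits), normalize=True)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))

shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=n_qubits)
weights = np.random.random(shape)

# each prediction can be compared directly with that datapoint's label
x_sample = np.array([1.0, 2.0, 3.0, 4.0])
prediction = model(weights, x_sample)
```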

I hope this helps!

Hi @CatalinaAlbornoz, thanks so much for your reply. I think you got what I mean. Yes, I agree that it is hard to interpret the measurement if we encode all datapoints at once with one variational circuit. But I am just curious about the motivation for this idea demonstrated in the tutorial, as shown below:

[screenshot of the tutorial's construction of the state |D> omitted]

Does the amplitude encoding implemented in this way have practical uses?

Hi @gojiita_ku, I now see what you mean. My colleague @Guillermo_Alonso had the following idea:

You can input your dataset D, your labels L, and an additional datapoint x into the quantum circuit by using amplitude embedding. You then add a variational circuit W. The measurement is an expectation value which you can compare with the label of x. This means that you can then train the variational circuit for different datapoints x so that when you get a new datapoint x_new, the expectation value of your measurement will help you predict the label of that new datapoint.

Note that in order for this to work you will have to separate the data you have into D and X, where X is the set of datapoints x that you will use for training.
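
A minimal sketch of how this could look in PennyLane is below. Everything here is an assumption for illustration: the qubit count, the use of StronglyEntanglingLayers as the variational block W, and padding the concatenated vector with zeros so it fits the 2^n amplitudes:

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4  # must satisfy 2**n_qubits >= len(D) + len(L) + len(x)
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def model(weights, D, L, x):
    # concatenate the dataset, its labels, and the query point x,
    # then load everything into the amplitudes of one state
    features = np.concatenate([D.flatten(), L, x])
    qml.AmplitudeEmbedding(features, wires=range(n_qubits),
                           normalize=True, pad_with=0.0)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))  # the variational block W
    # expectation value to be compared with the label of x
    return qml.expval(qml.PauliZ(0))
```

Training would then loop over the datapoints x in X, comparing model(weights, D, L, x) with the label of x in the loss.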

This is just an idea, so you should test whether it works in practice, but hopefully it does. Let us know how it goes!

Hi, @CatalinaAlbornoz, @Guillermo_Alonso, thanks so much for this clever idea, which combines both the dataset and the labels!

My understanding is as follows. The training dataset D and training labels L do not directly contribute to model training or parameter updates, as we don't use them to calculate the loss function; they participate in the training only indirectly. The only thing directly contributing to training is the new datapoint x, since we calculate the loss function by comparing its prediction with its label. So this can be considered supervised learning with a batch size of one. Am I correct?

One more question. Is it possible to leverage this method to address the problem raised in this post?

Hi @gojiita_ku, the idea behind this is a mix between supervised and unsupervised learning. Basically, we expect the algorithm to learn the circuit parameters that make the expectation value match the label of the new datapoint x (this is the supervised part). To learn this, the model can leverage the fact that it has the dataset D and labels L, except that we don't explicitly tell it that these are labels corresponding to a dataset (this is the unsupervised part).

We have no idea whether or not this will work for your problem, but it was the only option we could come up with for using Amplitude Embedding the way you wanted to use it.

Hi, @CatalinaAlbornoz. Thanks for your further explanation! I’ll try to implement this idea using PennyLane and see if it could work for my problem.

By the way, could you please share how you created the quantum circuit diagram you uploaded in your previous reply? Thanks!

Hi @gojiita_ku, do let us know how it goes!

I used Keynote for the diagram :sweat_smile:

Hi, @CatalinaAlbornoz, thanks! I think using Keynote is one of the most efficient ways to create a diagram like that :smile:.

Sure, I’ll let you know how it goes!