Hi, I was wondering whether the state preparation (feature embedding) used in the example below could be proven or conjectured to be hard to simulate classically, i.e. whether it provides some degree of quantum advantage.
I also noticed that the Iris dataset used (iris_classes1and2_scaled.txt) has already undergone some preprocessing; I would like to know what kind of preprocessing was applied to it.
This is amplitude encoding, which literally means you prepare an amplitude vector that matches your data input. It is of course not classically hard. In fact, it is a rather involved procedure for a quantum circuit, as you can see, while classically you would not have to do anything at all.
The Iris dataset is standardized (scaled to zero mean and unit variance per feature), and only classes 1 and 2 were selected.
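For reference, here is a minimal sketch of that preprocessing using scikit-learn. The exact file and column layout of iris_classes1and2_scaled.txt are assumptions on my part; the steps themselves are just "select two classes, then standardize":

```python
# Sketch of the preprocessing described above (assumed, not the exact script
# used to produce iris_classes1and2_scaled.txt).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X, y = iris.data, iris.target

# keep only two of the three classes (here labelled 1 and 2)
mask = np.isin(y, [1, 2])
X, y = X[mask], y[mask]

# standardize: zero mean and unit variance for each feature
X_scaled = StandardScaler().fit_transform(X)
```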
I’ve been reading the paper https://arxiv.org/pdf/1804.11326.pdf, “Supervised learning with quantum-enhanced feature spaces”, where they implement a quantum feature map based on circuits that are conjectured to be hard to simulate classically, specifically the second-order expansion feature map. Does PennyLane plan to implement this feature map on its roadmap?
I’ve been playing with the feature-embedding circuits available in the library, such as basis and amplitude encoding, but I haven’t found this specific class of feature maps.
Yes, we want to significantly extend the library of embeddings, and this would be one of the first ones to add. But in the meantime, feel free to code it up yourself and make a pull request.
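To get you started, here is a rough sketch of how the second-order (ZZ) expansion feature map from arXiv:1804.11326 could be written as an ordinary PennyLane quantum function. The phase convention φ_ij(x) = (π − x_i)(π − x_j) follows the paper; the function name `second_order_embedding` is my own and not part of the PennyLane API:

```python
import pennylane as qml
from pennylane import numpy as np
from itertools import combinations

def second_order_embedding(x, wires, repeats=2):
    """Second-order Pauli-Z expansion of the feature vector x
    (len(x) == len(wires)), repeated `repeats` times."""
    for _ in range(repeats):
        # layer of Hadamards
        for w in wires:
            qml.Hadamard(wires=w)
        # single-qubit phases: phi_i(x) = x_i
        for i, w in enumerate(wires):
            qml.RZ(2 * x[i], wires=w)
        # two-qubit phases: phi_ij(x) = (pi - x_i)(pi - x_j)
        for i, j in combinations(range(len(wires)), 2):
            qml.CNOT(wires=[wires[i], wires[j]])
            qml.RZ(2 * (np.pi - x[i]) * (np.pi - x[j]), wires=wires[j])
            qml.CNOT(wires=[wires[i], wires[j]])

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(x):
    second_order_embedding(x, wires=[0, 1])
    return qml.expval(qml.PauliZ(0))
```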
Thanks for your question. If you plan to use AmplitudeEmbedding as a way to input data to your quantum circuit, then yes, that is the correct function to use.
If your dataset is already in vector/array form (e.g., a numpy array), then that should be sufficient as input to the AmplitudeEmbedding function. If it’s not in numerical form, you’ll have to do that “data wrangling” beforehand, just as you would with any classical ML dataset.
Note that this embedding assumes a vector of dimension 2^N for N qubits; if your data dimension is not a power of two, you can use the pad option to “fill in” the missing entries with a numeric value of your choice (likely 0). Be mindful also of the normalize option, which should be used if you want your embedded data to be a properly normalized quantum state.
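A quick sketch of that padding/normalization behaviour (note that, depending on your PennyLane version, the padding keyword may be called `pad` or `pad_with`):

```python
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(features):
    # 2 qubits -> amplitude vector of length 2**2 = 4; the 3-dimensional
    # input below is padded with 0 and renormalized to a valid quantum state
    qml.AmplitudeEmbedding(features, wires=[0, 1], pad_with=0.0, normalize=True)
    return qml.probs(wires=[0, 1])

print(circuit(np.array([0.1, 0.2, 0.3])))
```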
As hinted in the documentation page you linked, there are also other possible choices for embedding classical data into a quantum computer. Depending on your needs, you might want to experiment with different options.
A final note: as indicated in the docs, AmplitudeEmbedding is currently not differentiable. This is no issue if your input is truly “data”, but if the input features come directly from some upstream model (e.g., in PyTorch or TensorFlow) that you want to train, this constraint would prevent those upstream layers from being trainable.
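If differentiability with respect to the inputs matters in your setup, one possible workaround (a sketch only, and not the only option) is to use a differentiable embedding such as AngleEmbedding, so that gradients can flow back through the feature inputs to the upstream layers:

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(features, weights):
    qml.AngleEmbedding(features, wires=[0, 1])  # differentiable w.r.t. features
    qml.templates.StronglyEntanglingLayers(weights, wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

weights = np.array(np.random.random((1, 2, 3)), requires_grad=True)
features = np.array([0.1, 0.2], requires_grad=True)

# gradient with respect to the input features themselves
grad_fn = qml.grad(circuit, argnum=0)
print(grad_fn(features, weights))
```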