Why does the embedding_metric_learning case not work?

We study the AA case and find that the training is ineffective: the trained parameters are the same as the initialized parameters. I wonder why?

Step 0 done.
[tensor([[ 0.22291946, -0.11248928, 0.04565372, …, 0.06703037,
-0.02766389, 0.08405803],
[ 0.08830776, 0.0598873 , 0.08355277, …, -0.03754032,
0.03460176, 0.02242945]], requires_grad=True), tensor([[ 0.11442182, -0.17605075, -0.01458899],
[-0.13463645, -0.11874351, -0.0132903 ],
[ 0.03468135, -0.05576273, 0.01454365],
[-0.10994996, -0.01509634, 0.07580086]], requires_grad=True)]
Step 1 done.
[tensor([[ 0.22291946, -0.11248928, 0.04565372, …, 0.06703037,
-0.02766389, 0.08405803],
[ 0.08830776, 0.0598873 , 0.08355277, …, -0.03754032,
0.03460176, 0.02242945]], requires_grad=True), tensor([[ 0.11442182, -0.17605075, -0.01458899],
[-0.13463645, -0.11874351, -0.0132903 ],
[ 0.03468135, -0.05576273, 0.01454365],
[-0.10994996, -0.01509634, 0.07580086]], requires_grad=True)]
Step 2 done.
[tensor([[ 0.22291946, -0.11248928, 0.04565372, …, 0.06703037,
-0.02766389, 0.08405803],
[ 0.08830776, 0.0598873 , 0.08355277, …, -0.03754032,
0.03460176, 0.02242945]], requires_grad=True), tensor([[ 0.11442182, -0.17605075, -0.01458899],
[-0.13463645, -0.11874351, -0.0132903 ],
[ 0.03468135, -0.05576273, 0.01454365],
[-0.10994996, -0.01509634, 0.07580086]], requires_grad=True)]
Step 3 done.
[tensor([[ 0.22291946, -0.11248928, 0.04565372, …, 0.06703037,
-0.02766389, 0.08405803],
[ 0.08830776, 0.0598873 , 0.08355277, …, -0.03754032,
0.03460176, 0.02242945]], requires_grad=True), tensor([[ 0.11442182, -0.17605075, -0.01458899],
[-0.13463645, -0.11874351, -0.0132903 ],
[ 0.03468135, -0.05576273, 0.01454365],
[-0.10994996, -0.01509634, 0.07580086]], requires_grad=True)]
Step 4 done.
[tensor([[ 0.22291946, -0.11248928, 0.04565372, …, 0.06703037,
-0.02766389, 0.08405803],
[ 0.08830776, 0.0598873 , 0.08355277, …, -0.03754032,
0.03460176, 0.02242945]], requires_grad=True), tensor([[ 0.11442182, -0.17605075, -0.01458899],
[-0.13463645, -0.11874351, -0.0132903 ],
[ 0.03468135, -0.05576273, 0.01454365],
[-0.10994996, -0.01509634, 0.07580086]], requires_grad=True)]
Step 5 done.
[tensor([[ 0.22291946, -0.11248928, 0.04565372, …, 0.06703037,
-0.02766389, 0.08405803],
[ 0.08830776, 0.0598873 , 0.08355277, …, -0.03754032,
0.03460176, 0.02242945]], requires_grad=True), tensor([[ 0.11442182, -0.17605075, -0.01458899],
[-0.13463645, -0.11874351, -0.0132903 ],
[ 0.03468135, -0.05576273, 0.01454365],
[-0.10994996, -0.01509634, 0.07580086]], requires_grad=True)]

We found that the training is ineffective: the parameters do not change from one iteration to the next, so the two classes cannot be separated.
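A quick way to check whether the cost produces any gradients at all is a sketch like the one below. This is toy code, not the demo's; it only illustrates the check, with a stand-in circuit and cost function.

```python
import pennylane as qml
from pennylane import numpy as np

# Toy stand-in for the demo's cost function, just to illustrate the gradient check.
dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(pars):
    qml.RY(pars[0], wires=0)
    qml.RY(pars[1], wires=1)
    return qml.expval(qml.PauliZ(0))

def cost(pars):
    return circuit(pars)

init_pars = np.array([0.1, 0.2], requires_grad=True)

# If this prints all zeros (or nothing at all), the optimizer has nothing to update,
# which matches the unchanged parameters printed above.
grads = qml.grad(cost)(init_pars)
print(grads)
```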
@Maria_Schuld
@manu_manohar
@Andre_Tavares

Hey @RX1! The demo was taken down years ago, I wonder where you found it? :slight_smile:

The reason was a bug: the training data and test data were accidentally identical. But when the bug was fixed, the results of the demo didn’t work out so easily any more (more precisely, the hybrid model overfitted). I simply didn’t have the time to rewrite the demo, so it was removed…

In other words, you are using this outdated code at your own risk :slight_smile:
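For reference, the class of bug described above (train and test sets accidentally being the same array) is what a single explicit split up front guards against. A minimal sketch with scikit-learn and dummy data, not code from the original demo:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data standing in for the demo's feature vectors and labels.
X = np.random.rand(100, 512)
y = np.random.randint(0, 2, size=100)

# One up-front split keeps the evaluation set disjoint from the training set,
# which is exactly what the buggy version of the demo did not do.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
```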

Emmm… well, thanks for the answer; I followed you on Google Scholar. Is there any task or demo on QNNs or VQAs for MNIST binary classification? I’m trying to study that.

Hey @RX1,

There are some PennyLane demos that use the MNIST dataset here:

- Using quantum convolutional neural networks
- A demo made by the community
- Using a quantum GAN

Hi @RX1 ,

I’ve been looking into this demo for a while now, and the reason it doesn’t train is that the most recent versions of PennyLane don’t seem to recognize the hybrid mix of parameters as trainable. If you use PennyLane version 0.18.0, it should train.
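If you want to try that, a pinned install plus an explicit trainability flag on the initial parameters would look roughly like the sketch below. The shapes and the flagging are my assumptions (loosely matching the parameter printouts above), not a confirmed fix.

```python
# Pin the version the demo is known to train with:
#   pip install pennylane==0.18.0
#
# On newer releases, one thing worth double-checking (an assumption, not a confirmed
# fix) is that every initial parameter is explicitly flagged as trainable:
import numpy as onp
from pennylane import numpy as np

init_pars = [
    np.array(onp.random.normal(0, 0.1, (2, 512)), requires_grad=True),  # classical linear part (shape illustrative)
    np.array(onp.random.normal(0, 0.1, (4, 3)), requires_grad=True),    # quantum embedding part (shape illustrative)
]
```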

However, as @Maria_Schuld pointed out, it does overfit greatly on the image data. I’ve tried omitting the ResNet-18 and carrying out various degrees of PCA on the pixel data, as well as carrying out PCA directly on the ResNet-18 output features, but this does not improve the generalization (the level of dimensional reduction is likely too extreme).
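The dimensional reduction mentioned above is plain PCA on the feature vectors; a minimal sketch with scikit-learn and dummy data (the component count is only an example):

```python
import numpy as np
from sklearn.decomposition import PCA

# Dummy stand-in for either flattened pixels or ResNet-18 output features.
features = np.random.rand(200, 512)

pca = PCA(n_components=8)                     # the "degree" of reduction; 8 is just an example
reduced = pca.fit_transform(features)
print(pca.explained_variance_ratio_.sum())    # fraction of variance kept after the reduction
```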

It actually does work pretty well on datasets that have a greater number of training samples than the number of features per sample, though - I used the UCI ML Breast Cancer Diagnostic dataset which has 30 clinical features associated with each sample and around 500 total samples (around 300 of which are training samples), and it generalizes pretty well with test set precision, recall and F1 scores all above 0.96 (if you apply PCA to the 30 features). Test set cost of course doesn’t go as low as it did for the original overfitting image data, but it’s not too bad - I’ve found it can go as low as about 0.25.
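Roughly what that preprocessing and evaluation looks like in code is sketched below, with scikit-learn only; the hybrid model and its predictions are left out, and the number of PCA components is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 30 clinical features per sample; take roughly 300 samples for training.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=300, random_state=0, stratify=y
)

# Standardize, then reduce the 30 features with PCA before feeding the embedding circuit.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=4).fit(scaler.transform(X_train))
X_train_red = pca.transform(scaler.transform(X_train))
X_test_red = pca.transform(scaler.transform(X_test))

# ... train the hybrid model on X_train_red / y_train as in the demo,
#     then predict labels y_pred for X_test_red ...

# precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="binary")
```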

As far as I can tell, with the current circuit, the method should work well with other datasets, again as long as you keep the number of features representing each sample to be substantially lower than the number of training samples - I imagine it could work with image data too, but perhaps only with a significant number of training images. Each image in the MNIST dataset is fairly simple, and the number of samples is quite large so I can see it possibly working quite well with this approach, particularly after moderate dimensional reduction.

I’ve submitted the PCA breast cancer version of the demo as a new pull request to the PennyLane qml repo, hoping it can maybe be used as a revived/revised version of the old demo. For now, you can find it in this fork.
