Quantum Transfer Learning Question

Dear andremari,

When unzipping hymenoptera.zip, does this subfolder have to be on my laptop, or somewhere else? I have extracted it into the subfolder, but on my laptop all subfolders are divided by “\”, not “/” like it says in the code. Is this Mac & Linux specific? I presume this is the reason I get FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘…/_data/hymenoptera_data\train’

Hi @_risto,

Thank you for your interest in the tutorial!

Could you check if your folder structure matches the following:

.
└── tutorials
    ├── _data
    │   └── hymenoptera_data
    └── quantum_transfer_learning
        └── tutorial_quantum_transfer_learning.ipynb

Let’s assume that we have a general (arbitrarily named) tutorials folder. This folder could then contain a subfolder quantum_transfer_learning with the tutorial_quantum_transfer_learning.ipynb file in it. The important thing would be to place the _data folder that we obtain after extraction into the general tutorials folder (on the same level as quantum_transfer_learning).

I was able to reproduce your error when my folder structure did not look as above. I tried it on a Windows machine; os.path.join is used for creating the path, which helps with platform independence.
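For illustration, here is a minimal sketch of how os.path.join builds a path with the separator of the platform it runs on (the folder names simply mirror the tutorial's layout):

```python
import os

# os.path.join inserts the path separator of the current platform:
# "\" on Windows, "/" on macOS and Linux.
data_dir = os.path.join("..", "_data", "hymenoptera_data")
train_dir = os.path.join(data_dir, "train")

# On Linux this prints ../_data/hymenoptera_data/train,
# on Windows ..\_data\hymenoptera_data\train
print(train_dir)
```

This is why the error message mixes separators: the hard-coded prefix uses "/" while the platform-dependent parts joined on Windows use "\".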

Let us know how it goes!

Hi antalszava

Yes, it matches. I still get: FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘…/_data/hymenoptera_data\train’
After extraction, I don’t get a “_data” folder, but “hymenoptera_data”. I have created the “_data” folder and put the “hymenoptera_data” folder inside it, according to the folder structure.

Hi @_risto,

Thanks for the follow-up!

The problem would, in that case, be related to the directory in which the jupyter notebook command was issued. That directory is used as the base when the (relative) path is constructed, which could lead to the error you are experiencing.

There could be two solutions to this:

  1. executing the jupyter notebook command in the quantum_transfer_learning folder
  2. specifying the absolute path to your data folder by changing the data_dir = "../_data/hymenoptera_data" line to: data_dir = os.path.abspath(r"C:\MyPath\pl_tutorials\_data\hymenoptera_data") where MyPath will be unique to your system.
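As a quick sanity check before re-running the notebook, one could verify that the folder actually exists at the path being used (a sketch; check_data_dir is a hypothetical helper, and the Windows path below is the same MyPath placeholder as in option 2):

```python
import os

def check_data_dir(path):
    """Return True if the data directory exists; print a hint otherwise."""
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        # This is exactly the situation behind the FileNotFoundError above
        print(f"Directory not found: {path}")
        return False
    return True

# MyPath is a placeholder -- replace it with the actual location of
# hymenoptera_data on your own machine.
check_data_dir(r"C:\MyPath\pl_tutorials\_data\hymenoptera_data")
```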

Let me know how it goes and if you have further questions!

Hi antalszava.

Thank you for your answer.
1.) I have tried that, but it doesn’t work, as recent Anaconda installers have this box unchecked by default. I have tried to fix this using multiple solutions from Stack Overflow, but none of them worked.
2.) Where do I even write this?

I have tried other methods for “running a jupyter notebook in a folder” from the web; none worked. When I tried this answer: https://stackoverflow.com/questions/35254852/how-to-change-the-jupyter-start-up-folder#40514875 , my laptop doesn’t even have a “C:\Users\username\.jupyter\jupyter_notebook_config.py” file.

Hi @_risto,

For 2., it would be this line in the Python script. Once you have a local copy of the script, you can modify this line and try running the script.

Should you have a notebook, you could modify this line there too.

Hope this helps!

Hi antalszava.

Thank you for suggestions.
I have managed to open and run the file in Jupyter Notebook, but I still get the same error. I have tried the second suggested method, but then get “FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘C:\MyPath\pl_tutorials\_data\hymenoptera_data\train’”.

Hi @_risto,

That’s great to hear! :slight_smile:

Just to double-check: I notice that MyPath is still contained in the error message. In my previous comment, MyPath was meant to serve as a placeholder inside the string for the path. You’ll have to use the absolute path of the hymenoptera_data folder on your system.

Were you using a valid absolute path when receiving this error?

Hi antalszava.

It worked :slight_smile: I got up to the step “model_hybrid = train_model(
model_hybrid, criterion, optimizer_hybrid, exp_lr_scheduler, num_epochs=num_epochs” when I get “SyntaxError: unexpected EOF while parsing”.

Hi @_risto,

That is really great news! :slight_smile:

Oh I see! Could you make sure that you have the following in that step:

  model_hybrid = train_model(
      model_hybrid, criterion, optimizer_hybrid, exp_lr_scheduler, num_epochs=num_epochs
  )

From your error message I’d suspect that a ) character is missing (which would also explain the SyntaxError you are getting).

Let me know how it is!

Hi antalszava.

It worked, thank you for the help! I went a step further and tested it with a dataset of brain tumors, but after 20 epochs I noticed the val_loss increasing. I assume it is due to the ResNet network being pre-trained on a non-brain-tumor dataset, although as I understand it, this particular ResNet18 is trained on the whole of ImageNet and should work on new data. Why is it overfitting the brain-tumor dataset?
How could I compare the same ResNet on the same data against a classical ResNet, i.e. plain ResNet vs. ResNet & quantum layer?

Hi @_risto,

That’s great, happy that it worked out!

It’s an interesting question! It is worth noting that questions of overfitting and generalisation are still unresolved in QML research. Therefore it would be difficult to say what exactly is amiss without delving more into your research problem. Could you perhaps elaborate further on how the comparison would be done between using ResNet with classical and quantum layers?

Hi @antalszava

I would like to compare the classical network used as a base in this tutorial (ResNet) with one where the last layer is replaced with a quantum circuit, and another where the last layer is replaced by quantum embedding and metric learning, in classifying different images (specific tumors, by type, grade and even by gene deletions). Is that possible based on the code provided in the tutorials? How do I test networks in terms of the % of accurately classified pictures, or is this the same as validation? How do I do testing/validation in the quantum embedding tutorial?
In addition, I looked at the other 3 classifier tutorials (which are based on the iris dataset), but I presume encoding only 4 features would not be enough to analyse more complex data?

Cheers,

Hi @_risto,

Yes, this should all be possible with slight modifications of the demo. If you use the NumPy interface, you can just write your own function to measure the accuracy. Many examples can be found in ML tutorials as well (e.g. Image classification or Training a Classifier). Hope this helps!

Hi @antalszava

Thank you for the help.
May I ask: the iris dataset (3 classes) in the other tutorials is preprocessed into this: https://raw.githubusercontent.com/XanaduAI/qml/master/demonstrations/variational_classifier/data/iris_classes1and2_scaled.txt . How can I process any picture into this format?
In addition, it keeps happening that the subfolder with the data is not found. I have tried to modify the code you gave me for this particular tutorial, but it didn’t work. Is there any general code / setup which I can modify and use to replace the original tutorial code?

Hi @_risto,

Generally, for the rescaling of the data, tools like https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html could be used.

This can be useful so that

  • Data can be rescaled to have zero mean and a variance of 1
  • For images, one may want to do some convolutional layers first and then flatten before passing to a QNode
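In NumPy terms, the zero-mean, unit-variance rescaling performed by sklearn.preprocessing.scale amounts to the following (a sketch that standardizes each feature column independently):

```python
import numpy as np

def standardize(X):
    """Rescale each feature column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
X_scaled = standardize(X)

print(X_scaled.mean(axis=0))  # ~[0. 0.]
print(X_scaled.std(axis=0))   # ~[1. 1.]
```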

Specifying the path for the data

One could use the appdirs library to specify the path to a file in a platform-independent way (it should be installed by default; alternatively, it can be installed, for example, using pip: pip install appdirs).

Using the user_data_dir function from the appdirs library, we can point to pre-defined folders on each operating system. For a Unix based system, for example, this folder is going to be the /home/user/.local/share/ folder.

On my Unix based system locally, I executed the following commands in a shell terminal:

mkdir /home/antal/.local/share/data
mv iris.csv /home/antal/.local/share/data/

First, I created a data folder in /home/antal/.local/share/ and then moved the Iris dataset file (iris.csv) into it. This second step assumes that iris.csv is located in the directory where the commands were executed.

After these changes, I could add in the following modifications to the tutorial at the related parts:

# Additional imports required
import os
from appdirs import user_data_dir
import numpy as np

# Querying the directory where data will be placed
# For example `/home/antal/.local/share/data` on a Unix based system
directory = user_data_dir("data")

def load_and_process_data():

    # Loading the data file
    # os.path.join(directory, "iris.csv") will output
    # for example `/home/antal/.local/share/data/iris.csv` on a Unix based system
    data = np.loadtxt(os.path.join(directory, "iris.csv"), delimiter=",")

    # ... (further processing as in the tutorial)
    return data

This is a potentially more general way of placing user data.

Having said that, the previously suggested approach of explicitly passing the absolute path of the data as a string should also work well (for this tutorial this should be passed as the first argument to the np.loadtxt function).

Another note on how the data file linked was created specifically (thanks to @Maria_Schuld for checking this!).

After loading the original iris data, the first two features and the first two classes of flowers are extracted. After that, all features are shifted to the positive subspace (by subtracting the smallest value of each feature across the data) and multiplied by 1/2.

The 1/2 multiplier is not of much importance here; it simply ensures that the data lies safely within the period [0, 2π], which matters because of the periodicity of a Pauli rotation.

In code:

data_original_feat1and2 = 0.5*(data_original_feat1and2 - np.min(data_original_feat1and2, axis=0)) 
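To see the effect on toy numbers (a sketch; the array below is a hypothetical stand-in for the extracted first two features, not the real iris data):

```python
import numpy as np

# Hypothetical stand-in for the first two features of the extracted examples
data_original_feat1and2 = np.array([[4.9, 3.0],
                                    [5.1, 3.5],
                                    [6.4, 3.2]])

# Shift each feature to the positive subspace, then multiply by 1/2
processed = 0.5 * (data_original_feat1and2 - np.min(data_original_feat1and2, axis=0))

# The smallest example of each feature is now exactly 0, and as long as the
# original per-feature range is below 4*pi, the result stays within [0, 2*pi)
print(processed.min(axis=0))  # [0. 0.]
```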

Hope this helps!

Hi @antalszava.

Thank you for the explanations. While running the quantum embedding tutorial, the following link is not valid anymore: https://github.com/PennyLaneAI/qml/blob/master/implementations/embedding_metric_learning/X_antbees.txt .

Hi @_risto,

Thank you for catching that! The file is now located at this address.

Hi @antalszava

Just to double-check: again, if I want to take any image dataset and make pre-extracted feature vectors of the images (like in the last link), do I need to use the already mentioned method: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html ? Or something else?