Generally, tools like https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html could be used to rescale the data.
This can be useful so that
- Data can be rescaled to have zero mean and a variance of 1
- For images, one may want to do some convolutional layers first and then flatten before passing to a
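As a rough illustration of the first point, here is a minimal sketch of column-wise standardization using only the Python standard library. The helper name standardize_columns and the sample values are made up for this example; to my understanding, sklearn.preprocessing.scale performs the same kind of rescaling by default (per column, dividing by the population standard deviation), so in practice one would simply call that function.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Rescale each column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation (divides by N); a constant column
    # has a standard deviation of 0, so it is only centred (divide by 1).
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two made-up feature columns on very different scales
samples = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize_columns(samples)
```

After rescaling, both columns hold the same values even though the raw features differed by a factor of 100.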
Specifying the path for the data
One could use the appdirs library to specify the path to a file in a platform-independent way (it should be installed by default; alternatively, it can be installed, for example, using pip: pip install appdirs).
Using the user_data_dir function from the appdirs library, we can point to pre-defined folders on each operating system. For a Unix-based system, for example, this folder is going to be the
On my Unix based system locally, I executed the following commands in a shell terminal:
mkdir /home/antal/.local/share/data
mv iris.csv /home/antal/.local/share/data/
First I’ve created a data folder in /home/antal/.local/share/ and then moved the file for the Iris dataset (iris.csv) to this folder. This second step assumes that iris.csv is located in the directory where the commands were executed.
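The two steps described above (creating the folder and moving the file) could also be done from Python, which avoids shell differences between operating systems. This is only a sketch: place_data_file is a hypothetical helper, and data_dir is assumed to be whatever folder you want the data in, for example the value returned by user_data_dir("data").

```python
import shutil
from pathlib import Path


def place_data_file(source, data_dir):
    """Create data_dir if needed and move the file at source into it."""
    data_dir = Path(data_dir)
    # Same effect as `mkdir -p`: no error if the folder already exists
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / Path(source).name
    # Same effect as `mv source data_dir/`
    shutil.move(str(source), str(target))
    return target


# Hypothetical usage, run from the folder that contains iris.csv:
# place_data_file("iris.csv", "/home/antal/.local/share/data")
```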
After these changes, I could make the following modifications to the relevant parts of the tutorial:
# Additional imports required
import os
from appdirs import user_data_dir
# Querying the directory where data will be placed
# For example `/home/antal/.local/share/data` on a Unix based system
directory = user_data_dir("data")
# Loading the data file
# os.path.join(directory, "iris.csv") will output
# for example `/home/antal/.local/share/data/iris.csv` on a Unix based system
data = np.loadtxt(os.path.join(directory, "iris.csv"), delimiter=",")
This is a potentially more general way of placing user data.
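If the file was not moved to the expected folder, the load will fail, so it may help to check for it first and report the full path that was tried. A minimal sketch, where resolve_data_file is a hypothetical helper and not part of the tutorial or of appdirs:

```python
import os


def resolve_data_file(directory, filename):
    """Return the full path to filename inside directory, or raise."""
    path = os.path.join(directory, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Expected data file at {path!r}; "
            "move it there or pass an absolute path instead."
        )
    return path


# Hypothetical usage in the tutorial:
# data = np.loadtxt(resolve_data_file(directory, "iris.csv"), delimiter=",")
```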
Having said that, the previously suggested approach of explicitly passing the absolute path of the data as a string should also work well (for this tutorial this should be passed as the first argument to the