Variational Classifier - problem with weights

Dear all, I have this code:

blocks = 1  # number of (embedding + entangling) blocks

@qml.qnode(dev)
def circuit(weights, x=None):
    for i in range(blocks):
        AngleEmbedding(x, wires = range(n_qubits))
        StronglyEntanglingLayers(weights[i], wires = range(n_qubits))
    return qml.expval(qml.PauliZ(0))

# variational quantum classifier
def variational_classifier(theta, x=None):
    weights = theta[0]
    bias = theta[1]
    return circuit(weights, x=x) + bias

# draw random quantum node weights
theta_weights = strong_ent_layers_uniform(layers, n_qubits, seed=42)
theta_bias = 0.0
theta_init = (theta_weights, theta_bias) # initial weights
  1. I would like to find a way to increase the number of blocks, but I have a problem defining the weights. Is there an easy way to do this?

  2. It would make more sense for each block to have different weights.

  3. Also, is this similar to the data-reuploading technique?

P.S. Does qml.enable_tape() make any difference in this sketch?

Not sure if it helps, but I use this code to update the weights:

for it, batch_index in enumerate(chain(*(n_epochs * [X_batches]))):
    # Update the weights by one optimizer step
    batch_cost = \
        lambda theta: cost(theta, X_train[batch_index], e_train[batch_index])
    theta = pennylane_opt.step(batch_cost, theta)

Many thanks!!

Hi @NikSchet,

Thanks for the question! :slight_smile:

1 & 2.

I would like to find a way to increase the number of blocks, but I have a problem defining the weights. Is there an easy way to do this?

It would make more sense for each block to have different weights.

Sure! One way would be to create a list of weights where the seed is dependent on the block number:

theta_weights = [strong_ent_layers_uniform(layers, n_qubits, seed=i+1) for i in range(blocks)]

This will produce different weights for each block. Another approach is simply to leave out the seeding and rely on randomness.
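
For instance (a minimal sketch reusing the names from your snippet), increasing blocks then only changes the length of this list, and each entry keeps the shape that StronglyEntanglingLayers expects:

from pennylane.init import strong_ent_layers_uniform

blocks, layers, n_qubits = 3, 2, 2
theta_weights = [strong_ent_layers_uniform(layers, n_qubits, seed=i + 1) for i in range(blocks)]
theta_bias = 0.0
theta_init = (theta_weights, theta_bias)  # one weight tensor per block, plus the bias
print(len(theta_weights), theta_weights[0].shape)  # 3 entries, each of shape (layers, n_qubits, 3)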

3.

Also, is this similar to the data-reuploading technique?

The structure is somewhat similar; however, in the multi-qubit case, the circuit with entanglement proposed in Pérez-Salinas et al. (2019) applies CZ gates instead of CNOT gates to create entanglement, and omits the entanglers on the last layer. You could try passing the imprimitive=qml.CZ keyword argument to StronglyEntanglingLayers to make this change.

Also, it’s worth noting that the cost function studied in the paper (that is also showcased in the Data-reuploading classifier demonstration) was related to the fidelity of the final state of the circuit.

P.S.

Does qml.enable_tape() make any difference in this sketch?

qml.enable_tape() is a remnant of the switch to the new core of PennyLane, which is the default as of our newest release, published this week. If you've updated to that version, calling this function doesn't change how PennyLane behaves. Otherwise, you may see better performance by enabling the new core.
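
If you are still on an older version, a minimal sketch of how it would be used (just call it once, before creating devices and QNodes):

import pennylane as qml

qml.enable_tape()  # switch to the tape-based core on older versions; has no effect on the newest release
dev = qml.device("default.qubit", wires=2)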

Note on StronglyEntanglingLayers

When using StronglyEntanglingLayers, the number of layers is determined implicitly by the weights you pass in (denoted as L in the docs): the shape of the weight array sets how many layers of StronglyEntanglingLayers are applied, so L layers will be applied per block.
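
For instance (a minimal sketch using the strong_ent_layers_uniform helper from above), the leading dimension of the weight array is L:

from pennylane.init import strong_ent_layers_uniform

layers, n_qubits = 2, 3
w = strong_ent_layers_uniform(layers, n_qubits, seed=42)
print(w.shape)  # (2, 3, 3): L = 2 layers, each with one general rotation (3 angles) per qubit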

Hope this helps!


Thank you very much for the detailed answer! I will try to implement the changes, especially the fidelity cost function.

So I used imprimitive=qml.CZ with worse results than qml.CNOT; actually, it didn't work at all and the network failed to capture an easy pattern. I guess it is a commutation problem with the RX gates in the angle embedding? Is there any other explanation? Thanks in advance.

Hi @NikSchet,

Yes, that will most likely be due to the fact that general single-qubit rotations were considered in the original proposal.

There are two possibilities to try:

  1. Use rotation='Y' when calling AngleEmbedding (see the sketch below).
  2. If the results are still poor, turn to layers with more general single-qubit rotations (qml.Rot). For this, the previously linked demonstration could give a good starting point.
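
Here is a minimal sketch of option 1 (reusing your variable names; the specific values of n_qubits and blocks are just placeholders), combining the Y-rotation embedding with CZ entanglers:

import pennylane as qml
from pennylane.templates import AngleEmbedding, StronglyEntanglingLayers

n_qubits = 2
blocks = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def circuit(weights, x=None):
    for i in range(blocks):
        # embed the features with Y rotations instead of the default X rotations
        AngleEmbedding(x, wires=range(n_qubits), rotation='Y')
        # CZ entanglers, as in the data-reuploading paper
        StronglyEntanglingLayers(weights[i], wires=range(n_qubits), imprimitive=qml.CZ)
    return qml.expval(qml.PauliZ(0))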

Hope this helps!


@antalszava Thank you very much for the reply! Indeed, if I try RY rotations with CZ, everything works great. I will try the general qml.Rot as the data-embedding layer!

I also have some more questions regarding the variational classifier code.

  1. How about a probability measurement instead of an expectation value?

  2. Does it make sense to add Hadamard gates before the angle-embedding layer? So far my simulations show no significant difference.

Thanks in advance

Hi @NikSchet,

Glad that it worked out! :slightly_smiling_face: Would be curious to see if you observe any differences with qml.Rot.

Good questions! When switching to outputting probabilities, some care should be taken, as the output dimension of the QNode will change. One option is to use the probabilities to define a cost function that outputs a single scalar.

We haven’t really delved into trying out these possibilities and their effectiveness will most probably also depend on your specific dataset. Would nonetheless be interesting to hear your findings on how such changes could influence training! :slightly_smiling_face:
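
For instance, a rough sketch (the circuit structure and the squared-error reduction are only assumptions for illustration): a QNode returning qml.probs outputs a vector of length 2**len(wires), which the cost then collapses to a single scalar.

import pennylane as qml
from pennylane import numpy as np
from pennylane.templates import AngleEmbedding, StronglyEntanglingLayers

n_qubits, blocks = 2, 1
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def prob_circuit(weights, x=None):
    for i in range(blocks):
        AngleEmbedding(x, wires=range(n_qubits), rotation='Y')
        StronglyEntanglingLayers(weights[i], wires=range(n_qubits))
    return qml.probs(wires=0)  # [P(qubit 0 = 0), P(qubit 0 = 1)]

def prob_cost(weights, X, labels01):
    # mean squared error between P(1) and the 0/1 labels gives a single scalar
    preds = np.array([prob_circuit(weights, x=x)[1] for x in X])
    return np.mean((preds - labels01) ** 2)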


@antalszava

Thanks! I already tried everything, and it seems that qml.Rot offers a richer prediction grid but needs slightly more training; I will come back with more conclusive evidence. (I will try the probabilities, but I first need to work out some math for the cost function.)

In the meantime I am trying to automate the new code and make it more compact, and I am again slightly confused about how to set the weights here, because I get many errors:

@qml.qnode(dev)
def circuit(weights, x):
    qml.Hadamard(wires = range(n_qubits))
    for i in range(blocks):
        qml.Rot(x, wires = range(n_qubits)))
        StronglyEntanglingLayers(weights[i], wires = range(n_qubits),imprimitive=entangler)
    return qml.expval(qml.PauliZ(0))

# draw random quantum node weights
theta_weights = [strong_ent_layers_uniform(layers, n_qubits, seed=randomseed+i+1) for i in range(blocks)]
theta_bias = 0.0
theta_init = (theta_weights, theta_bias) # initial weights

To be honest, setting the weights for each network is the hardest thing in PennyLane for me :slight_smile: Please help me with this one too. Thanks in advance!

Moreover, I am trying 3 different classical data scalings (normalizations): (-π, π), (0, 2π), (0, π).
It looks like (0, 2π) is the worst choice, even though it is similar to (-π, π) and definitely better than (0, π). Is there any valid explanation?
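
For reference, the three scalings are produced like this (a sketch with sklearn's minmax_scale; the variable names are placeholders):

from pennylane import numpy as np
from sklearn.preprocessing import minmax_scale

X_mpi_pi = minmax_scale(X, feature_range=(-np.pi, np.pi))  # (-π, π)
X_0_2pi = minmax_scale(X, feature_range=(0, 2 * np.pi))    # (0, 2π)
X_0_pi = minmax_scale(X, feature_range=(0, np.pi))         # (0, π)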

Hi @NikSchet.
Regarding the errors you get in the code you copied above, there are a few small issues, such as the extra ) in qml.Rot(x, wires = range(n_qubits))) and the way the qml.Rot parameters are defined. A working version of the code with some corrections is copied below. Please let us know if you have any questions. Could you also send us a minimal working example of your code that gives different results for the classical data scalings you tried? Thanks.

import pennylane as qml
from pennylane.templates.layers import StronglyEntanglingLayers
from pennylane.init import strong_ent_layers_uniform

blocks = 3 
n_qubits = 3
layers = 2
entangler = qml.CNOT 
randomseed = 42

dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev) 
def circuit(weights, x, y, z): 
    qml.Hadamard(wires=0) 
    for i in range(blocks): 
        qml.Rot(x, y, z, wires=0)
        StronglyEntanglingLayers(weights[i], wires=range(n_qubits), imprimitive=entangler) 
    return qml.expval(qml.PauliZ(0)) 
 
# draw random quantum node weights 
theta_weights = [strong_ent_layers_uniform(layers, n_qubits, seed=randomseed+i+1) for i in range(blocks)] 

circuit(theta_weights, 0.1, 0.2, 0.3)

Thank you very much for the fast reply. Regarding scaling:

  1. In simple datasets I see no major difference with scaling.
  2. The difference is more noticeable in complex, unbalanced, highly noisy datasets. On second thought, I believe it is not about the scaling intervals ((-π, π), (0, 2π), (0, π)); it is about how well you approximate a function.

Anyway, for simple datasets and the 3 different scalings mentioned above, the respective codes can be found here (the file name denotes the scaling interval). Again, no major differences are noticed.

(-π,π) [VCDR_ST2_-pi_pi.ipynb](https://github.com/nsansen/Quantum-Machine-Learning/blob/main/VCDR_ST2_-pi_pi.ipynb)

(0,2π ) VCDR_ST2_0_2pi.ipynb

(0,π ) VCDR_ST2_0_pi.ipynb

@sjahangiri Again, thank you very much. Does this code make sense to you? I am not sure if I applied the weights correctly to the circuit (please ignore the non-essential code lines). The code seems to run but I don't really get good results, and I am not sure if there is a problem with how I set the weights (the same code runs perfectly with single RX rotations instead of qml.Rot). Also, does the placement of the Hadamard gates make sense, or should I put them in the loop? :slight_smile:

@qml.qnode(dev)
def circuit(weights, x=None):
    qml.Hadamard(wires=0)
    qml.Hadamard(wires=1)
    for i in range(blocks):
        qml.Rot(theta[2], theta[3], theta[4], wires=0)
        qml.Rot(theta[5], theta[6], theta[7], wires=1)
        StronglyEntanglingLayers(weights[i], wires = range(n_qubits),imprimitive=entangler)
    return qml.expval(qml.PauliZ(0))

# variational quantum classifier
def variational_classifier(theta, x=None):
    weights = theta[0]
    bias = theta[1]
    return circuit(weights, x=x) + bias

# draw random quantum node weights
theta_weights = [strong_ent_layers_uniform(layers, n_qubits, seed=randomseed+i+1) for i in range(blocks)]
theta_bias = 0.0
theta_init = (theta_weights, theta_bias,0.1,0.2,0.3,0.1,0.2,0.3) # initial weights

def cost(theta, X, expectations):
    e_predicted = \
        np.array([variational_classifier(theta, x=x) for x in X])
    loss = np.mean((e_predicted - expectations)**2)    
    return loss


def accuracy(labels, predictions):

    loss = 0
    for l, p in zip(labels, predictions):
        if abs(l - p) < 1e-5:
            loss = loss + 1
    loss = loss / len(labels)

    return loss


# convert classes to expectations: -1 to -1, 1 to +1
e_train = np.empty_like(y_train)
e_train[y_train == -1] = -1
e_train[y_train == 1] = +1
# calculate the number of batches
batches = len(X_train) // batch_size

# train the variational classifier
theta = theta_init


# split training data into batches
X_batches = np.array_split(np.arange(len(X_train)), batches)



for it, batch_index in enumerate(chain(*(n_epochs * [X_batches]))):
    # Update the weights by one optimizer step
    batch_cost = \
        lambda theta: cost(theta, X_train[batch_index], e_train[batch_index])
    theta = opt.step(batch_cost, theta)

Hi @NikSchet,
The placement of the Hadamard gates looks fine. I could not run the code because the training and test data are not provided. However, I noticed that the x parameter passed to the circuit is not used anywhere. Also, you can use different parameters for the qml.Rot gates; consider taking them out of theta and passing them separately to the circuit. Hope it helps.
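
For illustration, a minimal sketch of one way to use the features (an assumption about the intended encoding, reusing dev, blocks, n_qubits, and entangler from your code): feed each sample's two features into the qml.Rot angles so that the output actually depends on x.

@qml.qnode(dev)
def circuit(weights, x=None):
    qml.Hadamard(wires=0)
    qml.Hadamard(wires=1)
    for i in range(blocks):
        # use the sample features as rotation angles (third angle fixed to 0.0)
        qml.Rot(x[0], x[1], 0.0, wires=0)
        qml.Rot(x[1], x[0], 0.0, wires=1)
        StronglyEntanglingLayers(weights[i], wires=range(n_qubits), imprimitive=entangler)
    return qml.expval(qml.PauliZ(0))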

I am still not able to make it work, and even when it runs, it fails to learn. One such code, with fixed angles in qml.Rot, is below, and I am not able to modify it.

This should be standalone working code using the moons dataset. Thanks in advance for your time and help!!

import pennylane as qml
import pandas as pd
from pennylane import numpy as np
from pennylane.templates.layers import StronglyEntanglingLayers
from pennylane.init import strong_ent_layers_uniform
from pennylane.optimize import GradientDescentOptimizer
from sklearn.datasets import make_moons , make_circles
from sklearn.preprocessing import StandardScaler , minmax_scale
from itertools import chain
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split # Import train_test_split function
from sklearn import metrics #Import scikit-learn metrics module for accuracy calculation

blocks=1
layers=2
batch_size = 16
n_epochs = 20
test_size=0.2 #(train/test split)
learning_rate = 0.6
entangler = qml.CNOT
opt = GradientDescentOptimizer(stepsize=learning_rate)
n_qubits =2 
dev = qml.device("default.qubit", wires=n_qubits)
randomseed = 1

X, y = make_moons(n_samples=400, noise=0)
X = minmax_scale(X, feature_range=(-np.pi, np.pi))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)

from sklearn.metrics import roc_auc_score
# quantum circuit
#
#
# draw random quantum node weights
theta_weights = [strong_ent_layers_uniform(layers, n_qubits, seed=randomseed+i+1) for i in range(blocks)]
theta_bias = 0.0
chi1=0.1
psi1=0.2
zet1=0.1
chi2=0.1
psi2=0.2
zet2=0.1
theta_init = (theta_weights, theta_bias,chi1,psi1,zet1,chi2,psi2,zet2) # initial weights

@qml.qnode(dev)
def circuit(weights):
    qml.Hadamard(wires=0)
    qml.Hadamard(wires=1)
    for i in range(blocks):
        qml.Rot(chi1, psi1, zet1, wires=0)
        qml.Rot(chi2, psi2, zet2, wires=1)
        StronglyEntanglingLayers(weights[i], wires = range(n_qubits),imprimitive=entangler)
    return qml.expval(qml.PauliZ(0))

# variational quantum classifier
def variational_classifier(theta):
    weights = theta[0]
    bias = theta[1]
    return circuit(weights) + bias + chi1 + psi1 + zet1 + chi1 + psi2 + zet2



# train the variational classifier
theta = theta_init

def cost(theta, X, expectations):
    e_predicted = \
        np.array([variational_classifier(theta) for x in X])
    loss = np.mean((e_predicted - expectations)**2)    
    return loss

def accuracy(labels, predictions):

    loss = 0
    for l, p in zip(labels, predictions):
        if abs(l - p) < 1e-5:
            loss = loss + 1
    loss = loss / len(labels)

    return loss


# calculate the number of batches
batches = len(X_train) // batch_size
# split training data into batches
X_batches = np.array_split(np.arange(len(X_train)), batches)
lossplot = []
aucrocplot = []
accuracytrainplot = []
accuracytestplot = []

for it, batch_index in enumerate(chain(*(n_epochs * [X_batches]))):
    # Update the weights by one optimizer step
    batch_cost = \
        lambda theta: cost(theta, X_train[batch_index],y_train[batch_index])
    theta = opt.step(batch_cost, theta)
   
    
    if it % 10 == 0:
        expectations = np.array([variational_classifier(theta) for x in X_train])
        prob_class_one = (expectations + 1) / 2.0
        prob_class_one = pd.DataFrame.from_dict(prob_class_one)
        prob_class_one = prob_class_one.iloc[:, :]
        prob_class_one = prob_class_one[0].apply(lambda x: -1 if x <= 0.5 else 1)
        prob_class_onet = prob_class_one.to_numpy()
           
        expectations = np.array([variational_classifier(theta) for x in X_test])
        prob_class_one = (expectations + 1) / 2.0
        prob_class_one = pd.DataFrame.from_dict(prob_class_one)
        prob_class_one = prob_class_one.iloc[:, :]
        prob_class_one = prob_class_one[0].apply(lambda x: -1 if x <= 0.5 else 1)
        prob_class_one = prob_class_one.to_numpy()
        #
        #
        #--------- GRID PLOT START
        #
        #
        plt.figure()
        cm = plt.cm.RdBu
        fig= plt.figure(figsize=(5,5))
        xx, yy = np.meshgrid(np.linspace(-np.pi, np.pi, 15), np.linspace(-np.pi, np.pi, 15))
        X_grid = [np.array([x,y]) for x, y in zip(xx.flatten(), yy.flatten())]
        predictions_grid = np.array([variational_classifier(theta) for x in X_grid])
        zminus = (predictions_grid + 1.0) / 2.0
        zminus = predictions_grid
        Z=np.reshape(zminus, xx.shape)
# plot decision regions
        cnt = plt.contourf(xx, yy,Z, levels=np.arange(-1, 1., 0.1), cmap=cm, alpha=0.8, extend="both")
        plt.contour(xx, yy,Z, levels=[0.0], colors=("black",), linestyles=("--",), linewidths=(0.8,))
        plt.show()
        #
        #
        #--------- GRID PLOT END
        #
        #
    #print("Acc test",metrics.accuracy_score(y_test, prob_class_one))
    #print(metrics.confusion_matrix(y_test, prob_class_one))
        lossplot.append(cost(theta, X_train[batch_index], y_train[batch_index]))
        aucrocplot.append(roc_auc_score(y_test, prob_class_one))
        accuracytrainplot.append(metrics.accuracy_score(y_train, prob_class_onet))
        accuracytestplot.append(metrics.accuracy_score(y_test, prob_class_one))
#    print("It",it+1,"out of",len(X_batches) *n_epochs)
        print("It",it+1,"out of",len(X_batches) *n_epochs, "loss: ",cost(theta, X_train[batch_index], y_train[batch_index]),
        " : Acc train: ",round(metrics.accuracy_score(y_train, prob_class_onet),2),
        " : Acc test : ",round(metrics.accuracy_score(y_test, prob_class_one),3),
        " : Auc : ",round(roc_auc_score(y_test, prob_class_one),3)
        )
        if metrics.accuracy_score(y_train, prob_class_onet) >= 0.97:
                break

plt.plot(lossplot)  # plot the training loss
plt.ylabel('Loss')
plt.show()

plt.subplot(2,1,1)
plt.plot(accuracytrainplot,'r',label="train")
plt.plot(accuracytestplot,'b',label="test")
plt.ylabel('accuracy')
plt.legend()
plt.show()

plt.plot(aucrocplot)  # plot the AUC-ROC score
plt.ylabel('auc roc score')
plt.show()

Hi @NikSchet,
Thanks for providing the code. It seems to me that the features X are not used in the circuit at all. Is there a reason for that? In other words, the e_predicted values computed in the cost function do not use any information about the sample features. For more information, you can see the two examples in this tutorial. I suggest starting from one of the examples provided in the tutorial and then modifying only the circuit part, using your desired gates as you have above. Please let me know if you need any help with building the new circuit. Thanks.
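
As a sketch of that suggestion (assuming the circuit is modified to accept x, as in the earlier example), the classifier and the cost would then thread each sample's features through:

def variational_classifier(theta, x=None):
    weights, bias = theta[0], theta[1]
    return circuit(weights, x=x) + bias

def cost(theta, X, expectations):
    # each prediction now depends on its own sample x
    e_predicted = np.array([variational_classifier(theta, x=x) for x in X])
    return np.mean((e_predicted - expectations) ** 2)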
Thanks for providing the code. It seems to me that the features X are not used in the circuit at all. Is there a reason for that? In other words, the e_predicted values that are computed in the function loss do not use any information about the sample features. For more information, you can see two examples in this tutorial. I suggest to start from one of the examples provided in the tutorial and then try to modify the circuit part only, by using your desired gates as you have above. Please let me know if you need any help with building the new circuit. Thanks.