Quantum Machine Learning in Feature Hilbert Spaces

Hey @hilarlia!

Your questions are all good ones. I often regret that we didn’t publish our code back then; that will hopefully never happen again.

I had a look into the simulation, and yes, we used an l2 regulariser with penalty coefficient 0.01, as well as a standard square loss. Furthermore, we used PennyLane’s AdagradOptimizer with an initial step size of 0.01, a batch size of 5, and four layers.
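In PennyLane terms those settings would look roughly like the sketch below. This is a reconstruction rather than our original code, so the helper name regularised_square_loss and the exact wiring are just for illustration:

import pennylane as qml
from pennylane import numpy as np

# Optimiser and training settings as described above (sketch only)
opt = qml.AdagradOptimizer(stepsize=0.01)   # initial step 0.01
batch_size = 5
n_layers = 4

def regularised_square_loss(labels, predictions, weights, reg=0.01):
    # standard square loss plus an l2 penalty with coefficient 0.01
    loss = np.mean((labels - predictions) ** 2)
    return loss + reg * np.sum(weights ** 2)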

Hope this helps? The experiment was not necessarily trying to propose a good classifier, but was more a demonstration that the principle works, so I wouldn’t put too much effort into reproducing it…

Hi @Maria_Schuld, thank you very much for your earlier response. I want to reproduce some papers to see their working examples. I tried it according to what you mentioned, but I couldn’t get good results: the accuracy was oscillating around 0.6. Could you point out where I am making a mistake?

import pennylane as qml
from pennylane import numpy as np
from sklearn.datasets import make_moons

dev = qml.device("strawberryfields.fock", wires=2, cutoff_dim=3, analytic=False)

np.random.seed(0)
batch_size = 5
n_steps = 5000
n_layers = 4
eta = 0.02
Theta = (0.05*np.random.rand(n_layers,2,4),eta)
fraction_train = 0.75
def layer(params, l):  # The variational circuit layer
    qml.Beamsplitter(params[l, 0, 0], params[l, 1, 0], wires=[0, 1])
    qml.Displacement(params[l, 0, 1], 0.0, wires=0)
    qml.Displacement(params[l, 1, 1], 0.0, wires=1)
    qml.QuadraticPhase(params[l, 0, 2], wires=0)
    qml.QuadraticPhase(params[l, 1, 2], wires=1)
    qml.CubicPhase(params[l, 0, 3], wires=0)
    qml.CubicPhase(params[l, 1, 3], wires=1)

@qml.qnode(dev)
def circuit_o0(params, x):  # Circuit to calculate p(2,0)
    qml.templates.embeddings.SqueezingEmbedding(x, wires=[0, 1], method='phase', c=1.5)
    for l in range(n_layers):
        layer(params, l)
    return qml.expval(qml.FockStateProjector(np.array([2, 0], requires_grad=False), wires=[0, 1]))

@qml.qnode(dev)
def circuit_o1(params, x):  # Circuit to calculate p(0,2)
    qml.templates.embeddings.SqueezingEmbedding(x, wires=[0, 1], method='phase', c=1.5)
    for l in range(n_layers):
        layer(params, l)
    return qml.expval(qml.FockStateProjector(np.array([0, 2], requires_grad=False), wires=[0, 1]))




def variational_classifier(Theta, x):  # Determine the probability: p(2,0)/(p(0,2)+p(2,0))
    params = Theta[0]
    bias = Theta[1]
    o0 = circuit_o0(params, x) + bias
    o1 = circuit_o1(params, x) + bias
    p0 = o0 / (o0 + o1)
    label = 0
    if p0 < 0.5:
        label = 1
    norm = (np.linalg.norm(Theta[0][0][0], ord=2)**2 + np.linalg.norm(Theta[0][0][1], ord=2)**2
            + np.linalg.norm(Theta[0][1][0], ord=2)**2 + np.linalg.norm(Theta[0][1][1], ord=2)**2
            + np.linalg.norm(Theta[0][2][0], ord=2)**2 + np.linalg.norm(Theta[0][2][1], ord=2)**2
            + np.linalg.norm(Theta[0][3][0], ord=2)**2 + np.linalg.norm(Theta[0][3][1], ord=2)**2)
    return p0, 1 - p0, label, norm

def square_loss(labels, prediction_probs):  # Calculate square loss
    loss = 0
    reg = 0.01
    for l, p in zip(labels, prediction_probs):
        loss = loss + (l - p[2]) ** 2
    loss = loss / len(labels)
    loss = loss + p[3] * reg
    return loss

def accuracy(labels, predictions): # Calculate Accuracy
    acc = 0
    for l, p in zip(labels, predictions):
        if abs(l - p[2]) < 1e-5:
            acc = acc + 1
    acc = acc / len(labels)
    return acc

def cost(Theta, X, Y): # Calculate cost using square loss
    prediction_probs = np.array([variational_classifier(Theta, x=x) for x in X])
    return square_loss(Y,prediction_probs)



# Extract dataset
X, Y = make_moons(n_samples=200, shuffle=True, noise=0.1, random_state=None)

num_data = len(Y) # Number of data

num_train = int(fraction_train * num_data) # Number of training samples

index = np.random.permutation(range(num_data)) # Permutation for shuffling
feats_train = X[index[:num_train]] # Features for training
Y_train = Y[index[:num_train]] # Labels of training features
feats_val = X[index[num_train:]] # Test features
Y_val = Y[index[num_train:]] # Labels of test features

opt = qml.AdagradOptimizer()  # Adagrad optimizer (I also want to try the QNG optimizer)
var = Theta
a = []
c = []
l = []
cnt = 0

for it in range(n_steps):

    # Randomly select the batches
    batch_index = np.random.randint(0, num_train, (batch_size,))
    feats_train_batch = feats_train[batch_index]
    Y_train_batch = Y_train[batch_index]

    # Optimize the theta
    var = opt.step(lambda v: cost(v, feats_train_batch, Y_train_batch), var)

    # Predict train and test samples
    predictions_train = np.array([variational_classifier(var, x) for x in feats_train])
    predictions_val = np.array([variational_classifier(var, x) for x in feats_val])

    # Calculate accuracies
    acc_train = accuracy(Y_train, predictions_train)
    acc_val = accuracy(Y_val, predictions_val)

    # Store cost, accuracy and square loss values
    c.append(cost(var, X, Y))
    a.append(acc_val)
    l.append(square_loss(Y_val, predictions_val))

    # Break the loop after reaching convergence
    if it > 0 and abs(a[it] - a[it - 1]) < 1e-3:
        cnt = cnt + 1
    else:
        cnt = 0
    if cnt >= 20 and it >= 100:
        break

    # Print results
    print(
        "Iter: {:5d} | Loss: {:0.7f} | Cost: {:0.7f} | Acc train: {:0.7f} | Acc validation: {:0.7f}"
        "".format(it + 1, square_loss(Y_val, predictions_val), cost(var, X, Y), acc_train, acc_val)
    )

Hey @hilarlia,

As far as I can see, the main settings are the same as in our experiment. But I am not sure I understand the regulariser, which you take from the third entry of the prediction, p[3]. We used the inner product of the trainable weights (arranged as a vector) with themselves, l2regularization = np.sum(np.inner(weights, weights)) [np is PennyLane’s numpy version].
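For reference, a minimal sketch of that penalty (the helper name l2_penalty is mine, not from our original code):

from pennylane import numpy as np

def l2_penalty(weights, reg=0.01):
    # weights has shape (n_layers, 2, 4); arranged as a vector, the inner
    # product with itself is just the sum of squared entries
    flat = weights.flatten()
    return reg * np.sum(np.inner(flat, flat))

# usage: loss = square_loss(labels, predictions) + l2_penalty(params)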

I am not sure if this would make a difference?

Usually, training QNNs is quite tricky, but I remember that in this case things were rather stable, since the model is not so different from a Gaussian kernel…
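If you want a feel for what accuracy is reasonable on this data, a classical RBF kernel SVM is a rough stand-in for a Gaussian-like kernel. A quick sanity check with scikit-learn (not part of the original experiment) could look like this:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Same moons data, simple train/test split, then fit an RBF kernel SVM
X, Y = make_moons(n_samples=200, shuffle=True, noise=0.1, random_state=0)
clf = SVC(kernel="rbf").fit(X[:150], Y[:150])
print("RBF baseline accuracy:", clf.score(X[150:], Y[150:]))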