Circuit not optimizing parameters

import pennylane as qml
from pennylane import AmplitudeEmbedding
from pennylane import numpy as np
from pennylane.optimize import AdamOptimizer
import tensorflow as tf
from tensorflow.keras.datasets import mnist

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Filter images and labels for digits 0 to 7
train_mask = y_train <= 7
test_mask = y_test <= 7

x_train = x_train[train_mask]
y_train = y_train[train_mask]
x_test = x_test[test_mask]
y_test = y_test[test_mask]
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
# Normalize pixel values to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

# Resize images to 16x16 using tf.image.resize
x_train_resized = tf.image.resize(x_train, size=(16, 16))
x_test_resized = tf.image.resize(x_test, size=(16, 16))

# Convert to numpy arrays
x_train_resized = x_train_resized.numpy()
x_test_resized = x_test_resized.numpy()

# Flatten the images
x_train_flat = x_train_resized.reshape(x_train_resized.shape[0], -1)
x_test_flat = x_test_resized.reshape(x_test_resized.shape[0], -1)

# Convert class labels to one-hot encoded vectors
y_train = tf.keras.utils.to_categorical(y_train, num_classes=8)  # 8 classes now
y_test = tf.keras.utils.to_categorical(y_test, num_classes=8)

# Convert 0s to -1s in the labels and cast to int8
y_train[y_train == 0] = -1
y_train = y_train.astype(np.int8)

y_test[y_test == 0] = -1
y_test = y_test.astype(np.int8)
# Use a subset of the training data to keep the runtime manageable
x_train = x_train_flat[0:1000]
y_train = y_train[0:1000]
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

import time
start = time.time()

num_qubits = 8

dev = qml.device('default.qubit', wires = num_qubits)

@qml.qnode(dev)
def circuit(parameters, data):
    #for i in range(num_qubits):
    #    qml.Hadamard(wires = i)
  
    AmplitudeEmbedding(features = data, wires = range(num_qubits), normalize=True)
        
    qml.BasicEntanglerLayers(weights = parameters, wires = range(num_qubits))
    
    return [qml.expval(qml.PauliZ(i)) for i in range(8)]

def variational_classifier(weights, bias, x):
    return circuit(weights, x) + bias


def square_loss(y_true, y_pred):
    return np.mean(np.square(y_true - y_pred))

# Define custom accuracy metric
def accuracy(y_true, y_pred):
    # Convert predicted probabilities to labels (-1 or 1)
    y_pred_labels = np.sign(y_pred)
    
    # Count correct predictions
    correct_predictions = np.sum(y_true == y_pred_labels)
    
    # Calculate accuracy
    accuracy = correct_predictions / y_true.size
    
    return accuracy



def cost(weights, bias, X, Y):
    predictions = [variational_classifier(weights, bias, x) for x in X]
    return square_loss(Y, predictions)

#basic
num_layers = 2
shape = qml.BasicEntanglerLayers.shape(n_layers=num_layers, n_wires=8)
weights_init  = np.random.random(size=shape)
bias_init = np.array(0.0, requires_grad=True)
weights_init

opt = AdamOptimizer(stepsize=0.1, beta1=0.9, beta2=0.99)
#opt = AdamOptimizer()
batch_size = 128

wbest = 0
bbest = 0
abest = 0
weights = weights_init
bias = bias_init

for it in range(10):

    # weights update by one optimizer step

    batch_index = np.random.randint(0, len(x_train), (batch_size,))
    X_batch = x_train[batch_index]
    Y_batch = y_train[batch_index]
    weights, bias, _, _ = opt.step(cost, weights, bias, X_batch, Y_batch)

    # Compute the accuracy
    predictions = [variational_classifier(weights, bias, x) for x in x_train]
    
    if accuracy(y_train, predictions) > abest:
        wbest = weights
        bbest = bias
        abest = accuracy(y_train, predictions)
        print('New best')

    acc = accuracy(y_train, predictions)
  
    print(
        "Iter: {:5d} | Cost: {:0.7f} | Accuracy: {:0.7f} ".format(
            it + 1, cost(weights, bias, x_train, y_train), acc
       )
    )

I tried the circuit for 8 classes, but the loss/accuracy remains constant. I increased the number of layers and still nothing changes. Is there anything I am missing? Can you please check?

Hi @Amandeep,
The first time you run this you will notice a warning saying “Output seems independent of input.”

This is an indication that your program is struggling with something that is non-differentiable.

Take a look at this section of the docs to see if some of your operations aren’t supported.
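For example, a quick (rough) check you could run, reusing the cost, weights, bias and batch names from your training loop, is:

grad_fn = qml.grad(cost, argnum=0)            # gradient of the cost w.r.t. the weights only
weight_grads = grad_fn(weights, bias, X_batch, Y_batch)
print(weight_grads)                           # an all-zero array here confirms the warning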

I would recommend a couple of different courses of action: if you don’t care too much about using NumPy and TensorFlow, you could try using Torch instead; it’s less likely that you’ll see these issues. We have demos on using Torch which can help you, and the rough sketch below gives an idea of what that could look like.
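Here is a very rough sketch (not a drop-in replacement; the batch names at the end are only illustrative, and it assumes a recent PennyLane version where the QNode returns a tuple of tensors) of what your classifier could look like with the Torch interface:

import torch
import pennylane as qml

num_qubits = 8
dev = qml.device("default.qubit", wires=num_qubits)

@qml.qnode(dev, interface="torch")
def circuit(parameters, data):
    qml.AmplitudeEmbedding(features=data, wires=range(num_qubits), normalize=True)
    qml.BasicEntanglerLayers(weights=parameters, wires=range(num_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(num_qubits)]

shape = qml.BasicEntanglerLayers.shape(n_layers=2, n_wires=num_qubits)
weights = torch.rand(shape, dtype=torch.float64, requires_grad=True)
bias = torch.zeros(1, dtype=torch.float64, requires_grad=True)
opt = torch.optim.Adam([weights, bias], lr=0.1)

def cost(X, Y):
    # circuit returns a tuple of scalar tensors, so torch.stack keeps everything differentiable
    preds = torch.stack([torch.stack(circuit(weights, x)) + bias for x in X])
    return torch.mean((Y - preds) ** 2)

# one optimization step (X_batch / Y_batch as in your NumPy loop)
opt.zero_grad()
loss = cost(torch.tensor(X_batch), torch.tensor(Y_batch, dtype=torch.float64))
loss.backward()
opt.step()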

If you want to keep your current setup, then the best approach is to start from a minimal example. Make the tiniest version of your code that works, for example using 2 classes instead of 8 (see the sketch below). Then you can start adding complexity until you reach the error; this will help you see where it lies. If you’re still struggling you can send us your minimal example and we can take another look to try to uncover the issue.
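For example, a minimal 2-class version could look roughly like this (only a sketch: y_train_int is a stand-in for your integer labels before one-hot encoding, and dev, num_qubits and x_train_flat come from your code above):

# Minimal 2-class sketch: same embedding and entangler, but a single expectation value
# and +/-1 labels.
mask = y_train_int <= 1
X_small = x_train_flat[mask][:100]
Y_small = np.where(y_train_int[mask][:100] == 0, -1.0, 1.0)

@qml.qnode(dev)
def tiny_circuit(parameters, data):
    qml.AmplitudeEmbedding(features=data, wires=range(num_qubits), normalize=True)
    qml.BasicEntanglerLayers(weights=parameters, wires=range(num_qubits))
    return qml.expval(qml.PauliZ(0))

def tiny_cost(parameters, X, Y):
    # qml.math.stack keeps the predictions differentiable end to end
    preds = qml.math.stack([tiny_circuit(parameters, x) for x in X])
    return np.mean((Y - preds) ** 2)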

Let us know if you have any further questions!

Hi @CatalinaAlbornoz, thank you for your response. I did this code for binary classification and it works perfectly.
But when I did it for multiclass, the loss/accuracy remains constant when taking measurements from 10 qubits for 10 classes.

Hi @Amandeep, can you please send me the minimal code that you made? And what it is that you changed when it stopped working? I can try to find a workaround.

@CatalinaAlbornoz The code works for classes 0 and 1, but I am trying it for 10 classes. It always gives a conflict between numpy and ArrayBox when measuring 10 qubits.

import numpy as np
import pennylane as qml
from pennylane import AmplitudeEmbedding
from pennylane.optimize import AdamOptimizer
import tensorflow as tf
from tensorflow.keras.datasets import mnist

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Pad images to 32x32 by placing the original 28x28 images in the center, surrounded by zeros
def resize_images(images, new_size=(32, 32)):
    resized_images = np.zeros((images.shape[0], new_size[0], new_size[1]))
    original_size = images.shape[1:]

    # Calculate padding
    pad_x = (new_size[0] - original_size[0]) // 2
    pad_y = (new_size[1] - original_size[1]) // 2

    # Insert original images into the center of the resized images
    resized_images[:, pad_x:pad_x+original_size[0], pad_y:pad_y+original_size[1]] = images

    return resized_images

# Normalize and then resize training and test images
x_train = x_train / 255.0
x_test = x_test / 255.0

x_train_resized = resize_images(x_train)
x_test_resized = resize_images(x_test)

# Flatten the images
x_train_flat = x_train_resized.reshape(x_train_resized.shape[0], -1)
x_test_flat = x_test_resized.reshape(x_test_resized.shape[0], -1)

print(x_train_flat.max())

# Update class labels
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)

# Limit the dataset size for demonstration
x_train = x_train_flat[:200]
y_train = y_train[:200]

x_test = np.array(x_test)
y_test = np.array(y_test)

print(type(x_train), x_train.shape, y_train.shape, x_test.shape, y_test.shape)
import time
start = time.time()

num_qubits = 10

dev = qml.device('default.qubit', wires = num_qubits)


def layer(W):
    # One Rot gate per qubit, followed by a ladder of CZ entangling gates
    for i in range(num_qubits):
        qml.Rot(W[i, 0], W[i, 1], W[i, 2], wires=i)

    for i in range(num_qubits - 1):
        qml.CZ(wires=[i, i + 1])
            
@qml.qnode(dev)
def circuit(parameters, data):
 
    AmplitudeEmbedding(features = data, wires = range(num_qubits), normalize=True)
        
    #qml.BasicEntanglerLayers(weights = parameters, wires = range(num_qubits))
    for i, block in enumerate(weights):
        layer(block)
    
    return [qml.expval(qml.PauliZ(i)) for i in range(10)]

def variational_classifier(weights, bias, x):
    return circuit(weights, x) + bias

import math
def loss(y_true,y_pred):
    if len(y_true.shape)==2:     #batch_size=1
        samples=y_true.shape[0]
        batch=1
    else:                        #batch_size>1  
        samples=y_true.shape[1]
        batch=y_true.shape[0]
    y_true=np.asarray(y_true).flatten()
    y_pred=np.asarray(y_pred).flatten()
    loss=0
    for i in range(len(y_true)):
        loss+= y_true[i]*(math.log(y_pred[i]))
    loss*=-1
    return loss/(samples*batch)

def accuracy(labels, predictions):
   
    predicted_classes = np.argmax(predictions, axis=1)
    true_classes = np.argmax(labels, axis=1)
    correct = np.sum(predicted_classes == true_classes)
    total = labels.shape[0]
    accuracy = correct / total
    return accuracy

def cost(weights, bias, X, Y):
    predictions = [variational_classifier(weights, bias, x) for x in X]
    return loss(Y, predictions)

#basic
num_layers = 4
from pennylane import numpy as np
weights_init  = np.random.random( size=(num_layers,num_qubits,3), requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)
weights_init

opt = AdamOptimizer(stepsize=0.01, beta1=0.9, beta2=0.99)
#opt = AdamOptimizer()
batch_size = 32

wbest = 0
bbest = 0
abest = 0
weights = weights_init
bias = bias_init

for it in range(10):

    # weights update by one optimizer step

    batch_index = np.random.randint(0, len(x_train), (batch_size,))
    X_batch = x_train[batch_index]
    Y_batch = y_train[batch_index]
    weights, bias, _, _ = opt.step(cost, weights, bias, X_batch, Y_batch)

    # Compute the accuracy
    predictions = [variational_classifier(weights, bias, x) for x in x_train]
    
    if accuracy(y_train, predictions) > abest:
        wbest = weights
        bbest = bias
        abest = accuracy(y_train, predictions)
        print('New best')

    acc = accuracy(y_train, predictions)
  
    print(
        "Iter: {:5d} | Cost: {:0.7f} | Accuracy: {:0.7f} ".format(
            it + 1, cost(weights, bias, x_train, y_train), acc
       )
    )

Hi @Amandeep,

The ArrayBox issue usually comes up in one of three situations:

  • A: you’re updating a global variable without realizing it,
  • B: you have a mismatch in the size or type of the data you’re handling, or
  • C: you’re using a forbidden functionality of NumPy, as described here in the documentation.

In this case I think your problem is a combination of B and C. The core of the issue is in the line loss += y_true[i]*(math.log(y_pred[i])): math.log cannot handle the ArrayBox objects that autograd passes through your cost function. I would recommend trying different loss functions, such as the one in this blog post, until you find one that works.
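For example, here is a rough, untested sketch that stays inside PennyLane’s wrapped NumPy so autograd can differentiate through the log (it reuses your variational_classifier):

from pennylane import numpy as pnp

def cross_entropy(y_true, y_pred):
    # y_true: one-hot labels, y_pred: stacked circuit outputs, both 2D arrays.
    # pnp.log stays differentiable, unlike math.log, which fails on ArrayBox objects.
    probs = pnp.exp(y_pred) / pnp.sum(pnp.exp(y_pred), axis=1, keepdims=True)  # softmax
    return -pnp.mean(pnp.sum(y_true * pnp.log(probs), axis=1))

def cost(weights, bias, X, Y):
    predictions = qml.math.stack([variational_classifier(weights, bias, x) for x in X])
    return cross_entropy(Y, predictions)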

Unfortunately your problem is complex enough that it’s hard to know exactly where the issue lies. Debugging is part of research, so I recommend that you simplify as much as possible until you find the solution. You may find this post helpful too: they were facing the same “Output seems independent of input” issue that you were facing before, and it was fixed by handling the data more carefully.

I hope this helps you.