Variational Classifiers and QNGOptimizer

I am trying to explore using different methods of optimisation with variational classifiers. I have built mine in the style shown here, where you have an optimisation step similar too:

for batch_index in tqdm(X_batches): 
     X_batch = X_train[batch_index] # grab out batches
     y_batch = y_train[batch_index]

     batch_cost = lambda v: cost(v, X_batch, y_batch) 
     theta = pennylane_opt.step(batch_cost, theta) 

This, until now, has worked great! However, I have tried switching out the optimiser for QNGOptimizer and now get the error:

ValueError: The objective function must either be encoded as a single QNode or a VQECost object for the natural gradient to be automatically computed. Otherwise, metric_tensor_fn must be explicitly provided to the optimizer.

Though the message is very precise, I’m not sure how to fix my code to solve the problem. This isn’t a VQE problem, so it would seem I cant create a VQECost object and have a metric_tensor method.

Therefore, do I need to restructure my classifier to use only 1 QNode? The example here uses 3 though (if number of QNodes = number of wires), while my code only has 2. Due to this, I’m not sure how to proceed this way either.

Or, have I misunderstood and I cannot use QNGOptimizer for classification problems?

If anyone has any advice on how to use QNGOptimizer with a variational classifier that would be appreciated, thanks!

Edit:
Added some code for clarity:

# quantum circuit
@qml.qnode(dev)
def circuit(weights, x=None):
    AngleEmbedding(x, wires = range(n_qubits))
    StronglyEntanglingLayers(weights, wires = range(n_qubits))
    return qml.expval(qml.PauliZ(0))

# variational quantum classifier
def variational_classifier(theta, x=None):
    weights = theta[0]
    bias = theta[1]
    return circuit(weights, x=x) + bias

def cost(theta, X, expectations):
    e_predicted = np.array([variational_classifier(theta, x=x) for x in X])
    loss = np.mean((e_predicted - expectations)**2)    
    return loss

Hi @andrew! The theory behind the quantum natural gradient as presented in https://arxiv.org/abs/1909.02108 only considers the optimization/natural gradient of a single QNode with no classical post-processing — hence the restriction you are seeing!

Extending the QNG optimization method to a model that includes both classical and quantum processing is therefore somewhat non-trivial, and involves some research.

For this particular cost function, however, it should be doable by hand, since your trainable weights appear only in the quantum portion of the model.

For example, consider your cost function:

\text{cost}(w, x, y) = \frac{1}{N} \sum_i (U(w,x_i) + b - y_i)^2

where w are the trainable weights, b the bias, x_i the input data, y_i the predicted expectations, and N the size of the input data. We can take the natural gradient of this expression:

\begin{align} \nabla^{(ng)}_w \text{cost}(w, x, y) &= \frac{1}{N} \sum_i \nabla^{(ng)}_w(U(w,x_i) + b - y_i)^2\\ &= \frac{1}{N} \sum_i 2(U(w,x_i) + b - y_i) \nabla^{(ng)}_wU(w,x_i) \end{align}
where we have applied the chain rule.

The quantum natural gradient of U(w,x_i) is given by g^{-1}(w,x_i) \nabla_w U(w,x_i) where g^{-1} is the inverse of the metric tensor; therefore

\begin{align} \nabla^{(ng)}_w \text{cost}(w, x, y) &= \frac{1}{N} \sum_i 2(U(w,x_i) + b - y_i) g^{-1}(w, x_i)\nabla_wU(w,x_i) \end{align}

So if we can construct the above function in Python, we can pass this to the qml.GradientDescentOptimizer as the gradient function! This could be done as follows:

quantum_grad = qml.grad(circuit)

def cost_ng(theta, X, expectations):
    """Calculate the natural gradient of the cost function"""
    qgrad = quantum_grad(theta, x, expectations)
    ginv = sp.linalg.pinvh(circuit.metric_tensor(theta, x=X))
    e_predicted = np.array([variational_classifier(theta, x=x) for x in X])
    loss_ng = np.mean(2*(e_predicted - expectations)*ginv*qgrad) 
    return loss_ng

You can then run the optimization as follows:

opt = qml.GradientDescentOptimizer(stepsize=0.4)


for i in range(steps):
    # update the circuit parameters
    weights = opt.step(lambda w: cost(w, X, Y), weights, grad_fn=lambda w:cost_ng(w, X, Y))

(where the lambda functions are just to force the cost/cost_ng functions to have a single parameter weights). Note: I have not tested the above code!

Let me know if you have any questions!

1 Like

Thank you Josh - this is great, its a very thorough answer, and has helped a lot!

So, just so I totally get it, in this particular example I can use QNG optimisation, however, the QNGOptimizer object is not designed for it. I can though create this optimisation manually, and pass it to GradientDescentOptimizer (which is what we have done here). For all intents and purposes, using this method, and directly using QNGOptimizer would give the same result?

For all intents and purposes, using this method, and directly using QNGOptimizer would give the same result?

Assuming my maths above is correct, then yes :slight_smile:

Internally, this is exactly what the QNGOptimizer class is doing; it is using g^{-1}(w)\nabla U(w) to perform the gradient update step. However it cannot keep track of additional classical processing, so here we do it manually.

This has given me an idea on how we can possibly incorporate the classical processing however!

1 Like

I’m trying to get cost_ng to run, and have a couple more questions. I’m going through each line, trying to ensure it will run, and have had to make some small changes. This is where I am at so far:

quantum_grad = qml.grad(circuit)
def cost_ng(theta, X, expectations):
    weights = theta[0]
    bias = theta[1] 

    print(circuit(weights, x=X[0]))
    qgrad = [quantum_grad(weights, x=x) for x in X]

    print(circuit.metric_tensor) #this runs, and returns an object
    print(circuit.metric_tensor(weights, x=X)) #this does not

cost_ng(theta, X_batch, y_batch)
  • I split theta into 2, assuming we won’t want to pass biases to the circuit at the moment.
  • qgrad does not run when I pass it x=X, so I’m treating it the same way I treated e_predicted, though I am unsure if this is correct.
  • circuit.metric_tensor is causing me some issues. If I pass it x=X it says it has got an unexpected keyword ‘x’. If I just pass it weights, then I get the error 'circuit() got multiple values for argument ‘x’. When I pass it nothing, and it runs, it then crashes when I pass it to pinvh().

Do you have any suggestions on how I should approach how to solve my problem with .metric_tensor?

Hi @andrew, the following should work:

import pennylane as qml
from pennylane.templates import AngleEmbedding, StronglyEntanglingLayers
from pennylane import numpy as np
import scipy as sp

n_qubits = 3
batch_size = 2

weights = qml.init.strong_ent_layers_normal(n_wires=n_qubits, n_layers=3)
bias = 0.5

theta = (weights, bias)
X = np.random.random(size=[batch_size, n_qubits])
Y = np.random.random(size=[batch_size])


dev = qml.device("default.qubit", wires=n_qubits)
dev.operations.remove("Rot")


@qml.qnode(dev)
def circuit(weights, x):
    """Variational quantum circuit"""
    AngleEmbedding(x, wires=range(n_qubits))
    StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))


quantum_grad = qml.grad(circuit, argnum=0)
print("Circuit evaluation:", circuit(weights, X[0]))
print("Gradient evaluation:", quantum_grad(weights, X[0]))


def variational_classifier(theta, x):
    """Variational quantum classifier"""
    weights = theta[0]
    bias = theta[1]
    return circuit(weights, x) + bias


def cost(theta, X, expectations):
    """Cost function for the variational quantum classifier"""
    e_predicted = np.array([variational_classifier(theta, x) for x in X])
    loss = np.mean((e_predicted - expectations) ** 2)
    return loss


print("Cost evaluation:", cost(theta, X, Y))


def cost_ng(theta, X, expectations):
    """Natural gradient of the cost function"""
    weights = theta[0]
    bias = theta[1]

    qnatgrad = np.empty((batch_size,) + weights.shape)
    e_predicted = np.empty([batch_size])

    for idx, x in enumerate(X):
        e_predicted[idx] = variational_classifier(theta, x)

        # compute the gradient of each QNode with repsect to `weights`
        qgrad = quantum_grad(weights, x)

        # compute the metric tensor of each QNode with respect to `weights`
        num_params = np.prod(qgrad.shape)
        g = circuit.metric_tensor([weights, x])[:num_params, :num_params]

        # compute g^{-1} \nabla U, and reshape it so it has the same shape as `weights`/`qgrad`
        qnatgrad[idx] = np.linalg.solve(g, qgrad.flatten()).reshape(*qgrad.shape)

    # Take the tensordot between the natural gradient and the loss,
    # and divide by the batch size (i.e., taking the mean).
    loss_ng = np.tensordot(2 * (e_predicted - expectations), qnatgrad, axes=1) / batch_size
    return loss_ng


cost_grad = qml.grad(cost, argnum=0)
print("Cost gradient:", cost_grad(theta, X, Y))
print("Cost natural gradient:", cost_ng(theta, X, Y))

Note that to get this to work, I’ve had to modify the devices to not support the qml.Rot operation. This is because qml.Rot() is not natively supported by metric_tensor()but if the device does not support it, then PennyLane will automatically decompose it into qml.RZ and qml.RY gates, which are supported by metric_tensor()!

To verify that this works correctly, you can replace the metric tensor calculation with the identity matrix:

# g = circuit.metric_tensor([weights, x])[:num_params, :num_params]
g = np.identity(num_params)

and you should find that the cost gradient and cost natural gradient are now equivalent.


Note: since the combination of quantum natural gradient + classical processing is an open question, I’m curious to hear if this trains better than standard gradient descent :slightly_smiling_face:

2 Likes

Again, thank you! I have implemented this, I am now able to run cost_ng(theta, X_batch, y_batch)
I had to make a subtle change, right at the start of cost_ng() I have said _batch_size = len(X), and used this object throughout the function instead of the “real” batch_size.

I have been looking at passing these gradients to the optimisation using the grad_fn argument. If I create an object:

classic = lambda v: cost_grad(v, X_batch, y_batch)

and passed this I find it works fine (I am assuming this is the “normal” gradient descent grad function). However, if I create something like:

grad_cost = lambda v: cost_ng(v, X_batch, y_batch)

and pass this I run into problems. I get an error:

index 0 is out of bounds for axis 0 with size 0

I believe its because cost_grad and cost_ng are creating objects that are different shapes - cost_grad is a tuple, where the 2nd entry in it (I think…) is related to the bias. It seems that grad_fn demands something of this shape. To get around this I have made cost_ng return (loss_ng, bias). This now allows optimisation to “work”. However, I feel I am definitely handling this incorrectly. For one, this value seems stuck at 0.0, what it is initialised at. I have tried to understand how to include the bias correctly in the optimisation, but I’m kinda coming up short - do you have any suggestions where I should go from here? I feel I will need to optimise it with respect to something in cost_ng (a line similar to: qgrad = quantum_grad(weights, x)), but can’t get something like that to work

Another small question - you mention that what I am doing is a combination of quantum natural gradient + classical processing. What exactly do you mean by this - are you meaning that since this is a QVC, I need a loss function like MSE, and this MSE part is “classical processing”, or, is there another thing at play here I am overlooking?

Hey @andrew,

I’m joining this thread a bit late but can have a go at helping!

I believe its because cost_grad and cost_ng are creating objects that are different shapes - cost_grad is a tuple, where the 2nd entry in it (I think…) is related to the bias. It seems that grad_fn demands something of this shape. To get around this I have made cost_ng return (loss_ng, bias) . This now allows optimisation to “work”. However, I feel I am definitely handling this incorrectly. For one, this value seems stuck at 0.0, what it is initialised at. I have tried to understand how to include the bias correctly in the optimisation, but I’m kinda coming up short - do you have any suggestions where I should go from here? I feel I will need to optimise it with respect to something in cost_ng (a line similar to: qgrad = quantum_grad(weights, x) ), but can’t get something like that to work

Yes I think this is on the right track! Have you tried handling theta as a flat array? I’ve found this helps in the past. For example, the following seemed to work:

import pennylane as qml
from pennylane.templates import AngleEmbedding, StronglyEntanglingLayers
from pennylane import numpy as np
import scipy as sp

n_qubits = 3
batch_size = 2

weights = qml.init.strong_ent_layers_normal(n_wires=n_qubits, n_layers=3)
bias = 0.5

theta = np.hstack([weights.flatten(), bias])
X = np.random.random(size=[batch_size, n_qubits])
Y = np.random.random(size=[batch_size])


dev = qml.device("default.qubit", wires=n_qubits)
try:
    dev.operations.remove("Rot")
except KeyError:
    pass


@qml.qnode(dev)
def circuit(weights, x):
    """Variational quantum circuit"""
    AngleEmbedding(x, wires=range(n_qubits))
    StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))


def variational_classifier(theta, x):
    """Variational quantum classifier"""
    weights = theta[:-1].reshape((3, n_qubits, 3))
    bias = theta[-1]
    return circuit(weights, x) + bias


quantum_grad = qml.grad(variational_classifier, argnum=0)


def cost(theta, X, expectations):
    """Cost function for the variational quantum classifier"""
    e_predicted = np.array([variational_classifier(theta, x) for x in X])
    loss = np.mean((e_predicted - expectations) ** 2)
    return loss


def cost_ng(theta, X, expectations):
    """Natural gradient of the cost function"""
    weights = theta[:-1].reshape((3, n_qubits, 3))
    bias = theta[-1]

    qnatgrad = np.empty((batch_size,) + weights.shape)
    biasgrad = np.empty((batch_size,))
    e_predicted = np.empty([batch_size])

    for idx, x in enumerate(X):
        e_predicted[idx] = variational_classifier(theta, x)

        # compute the gradient of each QNode with repsect to `weights`
        overall_grad = quantum_grad(theta, x)
        qgrad = overall_grad[:-1].reshape((3, n_qubits, 3))
        bgrad = overall_grad[-1]
        
        biasgrad[idx] = bgrad

        # compute the metric tensor of each QNode with respect to `weights`
        num_params = np.prod(qgrad.shape)
        g = circuit.metric_tensor([weights, x])[:num_params, :num_params]

        # compute g^{-1} \nabla U, and reshape it so it has the same shape as `weights`/`qgrad`
        qnatgrad[idx] = np.linalg.solve(g, qgrad.flatten()).reshape(*qgrad.shape)
        
    # Take the tensordot between the natural gradient and the loss,
    # and divide by the batch size (i.e., taking the mean).
    loss_ng = (np.tensordot(2 * (e_predicted - expectations), qnatgrad, axes=1) / batch_size).flatten()
    bias_avg = np.mean(biasgrad, axis=0)
    
    return np.hstack([loss_ng, bias_avg])


cost_fn = lambda theta: cost(theta, X, Y)
cost_fn_g = lambda theta: cost_ng(theta, X, Y)

opt = qml.GradientDescentOptimizer(stepsize=0.4)

opt.step(cost_fn, theta, grad_fn=cost_fn_g)

Here, the bias gradient is just concatenated onto the end of the flat weights gradient.

Another small question - you mention that what I am doing is a combination of quantum natural gradient + classical processing. What exactly do you mean by this - are you meaning that since this is a QVC, I need a loss function like MSE, and this MSE part is “classical processing”, or, is there another thing at play here I am overlooking?

:+1: My understanding of what we mean by classical post processing here is that we are taking the average of the QNode over batches of data, also with the addition of a bias. Maybe @josh has some more thoughts there?

Thank you, this has solved my problem. If you’re interested, I have found an edge case caused (I think) if 0 is in x. Then, g can contain a 0 in the diagonal, and np.linalg.solve will fail. I’m working around this by just continuing-ing over calculation if this condition of if 0 is in x is met, though I plan on either finding a different way to compute it or slightly changing each 0 value, shifting it slightly into a positive number.

I am curious about how you are able to call overall_grad = quantum_grad(theta, x) inside of cost_ng. I was under the impression that a gradient cannot be calculated inside of a cost function unless you move to the PyTorch backend (which I have done for another thing I’m trying) - does this not apply here as this isn’t the cost function, but the gradient function itself?

I see. So, if I moved to an architecture/ training setup where batch_size=0 and I removed the bias term, this would be considered a purely qNG problem?

Hey @andrew,

If you’re interested, I have found an edge case caused (I think) if 0 is in x. Then, g can contain a 0 in the diagonal, and np.linalg.solve will fail. I’m working around this by just continuing -ing over calculation if this condition of if 0 is in x is met, though I plan on either finding a different way to compute it or slightly changing each 0 value, shifting it slightly into a positive number.

Good spot! Yes I believe there can be issues with inverting the metric tensor and the QNGOptimizer provides an optional lam argument to regularize the matrix.

I am curious about how you are able to call overall_grad = quantum_grad(theta, x) inside of cost_ng . I was under the impression that a gradient cannot be calculated inside of a cost function unless you move to the PyTorch backend (which I have done for another thing I’m trying) - does this not apply here as this isn’t the cost function, but the gradient function itself?

Yes, that constraint comes from Autograd, which is the default interface in PennyLane. However, when we feed a callable function to the grad_fn argument of PennyLane optimizer methods, we are actually skipping the need to evaluate the gradient using Autograd, since the optimizer can simply call the function passed through the grad_fn argument. You can see this by checking out for example the source of the GradientDescentOptimizer.

I see. So, if I moved to an architecture/ training setup where batch_size=0 and I removed the bias term, this would be considered a purely qNG problem?

PennyLane should be good with anything that is directly the output of a QNode or VQECost object. If we have classical postprocessing, things get a bit more complicated. So yes, in this case I believe that removing the bias, batch average, and squared error postprocessing should be good. However, this also makes the model more restricted.

1 Like

I see it even skips looking at the original version of cost altogether

Handling the bias is an interesting problem. With what we’ve currently done, I realise that the bias gradient is always 1 or -1, and the bias term slowly grows (or shrinks) forever. The loss then also perpetually increases. I think its because the bias term is never being compared to the “truth”. Looking at qnatgrad, that in a sense is compared to the truth in the line we find loss_ng. I tried to “classically” handle the bias term - ie, do something like:

cost_grad = qml.grad(cost, argnum=0)
grad = cost_grad(theta, X, expectations)
bias_ng = grad[-1]

which removes the issue of the bias drifting larger and larger, but doesn’t train very well. I’ve settled on something like:

bias_ng = np.tensordot(2 * (e_predicted - expectations), biasgrad, axes=1) / batch_size

This seems to make the process of training better and is maybe “closer” to how we’ve treated the weights, but just without g^-1.

I suppose we’re kinda in uncharted territory here? I would like to include the bias term so as to not restrict my model, it seems this may be the way to do it?

Hi @andrew,

That sounds reasonable and it’s great you’ve found a way to get training working!

I suppose we’re kinda in uncharted territory here?

Yes, this topic definitely has some open research questions to sort out! It would be nice to increase support for hybrid optimization and if you’d like to get involved as a contributor then let us know.

Thanks,
Tom

1 Like