Quantum Natural Gradient Descent

Hi,

I am currently creating a hybrid neural network. Is applying quantum natural gradient descent something you could do to a hybrid neural network? If so, how would this be applied in pytorch?

Thanks for your help!

James

Hi @James_Ellis! Let me go through your two questions one by one.

Is applying quantum natural gradient descent something you could do to a hybrid neural network?

Yes, definitely! The natural gradient descent optimizer that comes with PennyLane performs the following update step:

\theta_{t+1} = \theta_t - \eta g^{+}(\theta_t)\nabla \mathcal{L}(\theta_t)

where g^{+}(\theta_t) is the pseudo-inverse of the metric tensor at the current value of the quantum parameters.
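
As a rough illustration of the update rule above, here is a minimal NumPy sketch. The function name qng_step and the toy metric tensor and gradient values are made up for this example; in practice both would be computed from the QNode.

```python
import numpy as np


def qng_step(theta, grad, g, lr=0.01):
    """One natural-gradient update: theta - lr * pinv(g) @ grad."""
    return theta - lr * np.linalg.pinv(g) @ grad


# Toy values (hypothetical, not taken from a real circuit)
theta = np.array([0.4, -0.1])
grad = np.array([1.0, 2.0])
g = np.diag([0.25, 0.5])  # a diagonal metric tensor

new_theta = qng_step(theta, grad, g, lr=0.1)
print(new_theta)
```

With g equal to the identity this reduces to ordinary gradient descent; a non-trivial g rescales the step along each direction in parameter space.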

In PennyLane currently, this works if your cost function is a single QNode. The metric tensor is available via the qnode.metric_tensor(params) method, and the QNGOptimizer uses this to apply the update step when using the autograd/numpy interface. You can check out the quantum natural gradient tutorial for more details on the inner workings.

Note that the built-in optimizer currently will only accept cost functions which consist of a single QNode. This optimization process however should continue to work for models consisting of multiple QNodes, as long as:

  • Each parameter in the model is used in a single QNode,
  • The parameter update step for each parameter uses the corresponding metric tensor (i.e., if parameters x[4] to x[6] are used in QNode b, then the metric tensor for QNode b must be used when updating these parameters),
  • If classical processing is performed on the QNode outputs, this must be taken into account via the chain rule.
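
The three conditions above could be sketched roughly as follows. This is only an illustration in plain NumPy: blockwise_qng_step, the slices, and the toy metric tensors are hypothetical, and the gradient is assumed to be the gradient of the full loss with the chain rule already applied (for example by an autodiff backward pass).

```python
import numpy as np


def blockwise_qng_step(x, grad, blocks, lr=0.01):
    """Update each parameter slice with the metric tensor of its own QNode.

    x      : flat array of all model parameters
    grad   : gradient of the full loss w.r.t. x (chain rule already applied)
    blocks : list of (slice, metric_tensor) pairs, one per QNode; the slices
             must not overlap, since each parameter belongs to a single QNode
    """
    new_x = x.copy()
    for sl, g in blocks:
        new_x[sl] = x[sl] - lr * np.linalg.pinv(g) @ grad[sl]
    return new_x


# Toy example: x[0:2] are used in QNode a, x[2:4] in QNode b
x = np.zeros(4)
grad = np.array([1.0, 1.0, 2.0, 2.0])
g_a = 0.5 * np.eye(2)
g_b = 0.25 * np.eye(2)

new_x = blockwise_qng_step(x, grad, [(slice(0, 2), g_a), (slice(2, 4), g_b)], lr=0.1)
print(new_x)
```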

If so, how would this be applied in pytorch?

This can be done by simply defining a PyTorch custom optimizer that implements the QNG update rule as above:

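import numpy as np
import torch


class QNGOptimizer(torch.optim.Optimizer):

    def __init__(self, params, lr=0.01, diag_approx=False, lam=0):
        # Pass the user-supplied hyperparameters through as the group defaults
        defaults = dict(lr=lr, diag_approx=diag_approx, lam=lam)
        super().__init__(params, defaults)

    def step(self, closure=None):
        # The closure must return both the loss and the metric tensor function
        if closure is None:
            raise ValueError("QNGOptimizer requires a closure returning (loss, metric_tensor)")

        loss, metric_tensor = closure()

        for group in self.param_groups:
            for p in group["params"]:

                if p.grad is None:
                    continue

                grad = p.grad.data.numpy()
                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state["step"] = 0

                # Metric tensor at the current parameters, regularized by lam * I
                g = metric_tensor([p.data.numpy()], diag_approx=group["diag_approx"])
                g += group["lam"] * np.identity(g.shape[0])

                state["step"] += 1

                # Solve g @ x = grad rather than explicitly inverting g
                d_p = torch.tensor(-group["lr"] * np.linalg.solve(g, grad), dtype=p.dtype)
                p.data.add_(d_p)

        return loss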

This can then be used like any other PyTorch optimizer. In this case, however, the closure function should also return the metric tensor function (not just the loss):

import torch
from torch.autograd import Variable

import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=3)


@qml.qnode(dev, interface="torch")
def circuit(params):
    # |psi_0>: state preparation
    qml.RY(np.pi / 4, wires=0)
    qml.RY(np.pi / 3, wires=1)
    qml.RY(np.pi / 7, wires=2)

    # V_0(theta0, theta1): Parametrized layer 0
    qml.RZ(params[0], wires=0)
    qml.RZ(params[1], wires=1)

    # W1: non-parametrized gates
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])

    # V_1(theta2, theta3): Parametrized layer 1
    qml.RY(params[2], wires=1)
    qml.RX(params[3], wires=2)

    # W2: non-parametrized gates
    qml.CNOT(wires=[0, 1])
    qml.CNOT(wires=[1, 2])

    return qml.expval(qml.PauliY(0))

steps = 200
init_params = torch.tensor([0.432, -0.123, 0.543, 0.233], requires_grad=True)

opt = QNGOptimizer([init_params], lr=0.01)

def closure():
    opt.zero_grad()
    loss = circuit(init_params)
    loss.backward()
    return loss, circuit.metric_tensor

for i in range(steps):
    opt.step(closure)

p_final = opt.param_groups[0]['params'][0]
print("Final parameters:", p_final)
print("Final circuit value:", circuit(p_final))

Thank you so much Josh! I really appreciate the help.

Would this custom optimiser work for the classical layers too? I.e., will it reduce to the Fisher information matrix?

Not as written above — the optimizer I wrote above simply expects a single QNode metric tensor function to be returned by the closure.

It should be relatively simple, however, to modify the step() method and the closure function so that it returns a hybrid Fisher information/quantum geometric tensor :slightly_smiling_face:

Would it be possible to get a brief example? Sorry, I’m quite new to pytorch and pennylane.

Unfortunately, this will likely be a lot more work than a brief example, and veers into potential new research territory — and would make a very good PennyLane QML Demonstration! :slightly_smiling_face:

But I will keep this in mind, and will think about how this might be added to PennyLane/see if I can get back to you.

Hi @josh,

What is g += group["lam"] * np.identity(g.shape[0]) doing in the code?

Also, where does the pseudo-inverse occur?

Thanks for your help

Hi @James_Ellis, here we are adding on \lambda I to the metric tensor, where \lambda is a small positive value (default value 0). This is to avoid issues taking the inverse of g when g is ill-conditioned.
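
To illustrate the effect of the \lambda I shift, here is a small NumPy sketch with a deliberately singular toy matrix. The values of g, grad and lam are made up for the example and do not come from a real circuit.

```python
import numpy as np

g = np.array([[1.0, 1.0],
              [1.0, 1.0]])  # singular: both rows are identical
grad = np.array([1.0, 0.0])

# Without the shift, the linear solve is expected to fail on a singular matrix
try:
    np.linalg.solve(g, grad)
    solved_without_shift = True
except np.linalg.LinAlgError:
    solved_without_shift = False

# With a small positive lam, the shifted matrix becomes invertible
lam = 1e-3
g_reg = g + lam * np.identity(g.shape[0])
step = np.linalg.solve(g_reg, grad)

print(solved_without_shift, np.linalg.cond(g_reg))
```

A larger lam gives a better-conditioned system but moves the update further away from the exact natural gradient direction, so it is usually kept small.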

With respect to the pseudo-inverse: rather than computing it directly (e.g. with scipy.linalg.pinvh), we use NumPy to solve the system of linear equations (np.linalg.solve), as this is more accurate and numerically stable than explicitly forming the pseudo-inverse :slightly_smiling_face:
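
As a quick sanity check of this point, the sketch below compares the two routes on a small, well-conditioned toy matrix (the numbers are invented for illustration). It uses np.linalg.pinv for the explicit pseudo-inverse. When g is invertible, both give the same update direction, but the solve never forms the inverse.

```python
import numpy as np

g = np.array([[0.25, 0.05],
              [0.05, 0.50]])  # toy symmetric positive-definite metric tensor
grad = np.array([1.0, -2.0])

via_solve = np.linalg.solve(g, grad)  # solves g @ x = grad directly
via_pinv = np.linalg.pinv(g) @ grad   # forms the pseudo-inverse first

print(via_solve, via_pinv)
```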

Thanks @josh for the reply!

How would I amend the code to get metric_tensor to work for a circuit that takes both parameters and an input, e.g. circuit(q_in, q_weights)?

Thanks!

Applying g = metric_tensor([q_in, weights]) appears to be including the input as a parameter in the matrix. Is there any way to declare q_in as a non-parameter for the metric tensor?

Also, in SGD the average gradient is calculated from the batch size. Can you perform something similar with natural gradient descent by getting the average metric tensor from the batch size?

Hi @James_Ellis, the way to do this is to define your QNode with the input as an auxiliary parameter:

@qml.qnode(dev)
def circuit(weights, q_in=None):

You will then need to call the QNode with q_in always as a keyword argument:

circuit(weights, q_in=q_in)

By default, PennyLane treats all auxiliary/keyword arguments as non-differentiable, so they won’t show up in the metric tensor as parameters.

Hi @josh, would you be familiar with how to apply the Fisher information matrix to a single nn.Linear() layer in pytorch?

Thanks again for all the help :smiley:

Hi @James_Ellis, I have to admit I’m not too sure. This sounds like a (classical) PyTorch question.

I did a quick search to see if there are resources I could point you to; unfortunately, this was the top result! https://discuss.pytorch.org/t/fisher-information-matrix/73429 (I assume this is you as well :laughing:)

I found some implementations of the natural gradient in PyTorch, perhaps these might help:

Thanks so much @josh!

I have read a lot about the empirical and true Fisher information matrices. Does the same distinction apply when we are calculating the inverse metric tensor for the circuit? I.e., is it true or empirical?

Thanks for all the help :smiley:

Hi @James_Ellis,

the metric tensor we calculate is a block-diagonal approximation to the true (Quantum) Fisher Information Matrix up to a constant factor. We can calculate the true matrix because we “know” the underlying probability distribution which is defined by the quantum gates.
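
For a small, classically simulable example, the relationship can be checked numerically. The sketch below computes the full Fubini-Study metric tensor, g_ij = Re[<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>], for a single-qubit toy state directly from the statevector using finite differences. For pure states the quantum Fisher information is 4 times this tensor. The helper names state and fubini_study are hypothetical, and this is not how PennyLane evaluates the tensor on a device.

```python
import numpy as np


def state(params):
    """|psi> = RZ(phi) RY(theta) |0> for a single qubit."""
    theta, phi = params
    return np.array([np.exp(-1j * phi / 2) * np.cos(theta / 2),
                     np.exp(1j * phi / 2) * np.sin(theta / 2)])


def fubini_study(params, eps=1e-6):
    """Fubini-Study metric tensor via central finite differences."""
    params = np.asarray(params, dtype=float)
    psi = state(params)
    n = len(params)

    # Numerical derivative of the statevector w.r.t. each parameter
    derivs = []
    for i in range(n):
        shift = np.zeros_like(params)
        shift[i] = eps
        derivs.append((state(params + shift) - state(params - shift)) / (2 * eps))

    g = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # np.vdot conjugates its first argument, giving <a|b>
            term = (np.vdot(derivs[i], derivs[j])
                    - np.vdot(derivs[i], psi) * np.vdot(psi, derivs[j]))
            g[i, j] = term.real
    return g


g = fubini_study([0.7, 0.3])
qfi = 4 * g  # quantum Fisher information of the pure state
print(np.round(g, 6))
```

For this particular state the analytic result is diag(1/4, sin^2(theta)/4), so the off-diagonal terms happen to vanish and the block-diagonal approximation would be exact here.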

Hi @johannesjmeyer,

Why do we calculate the block-diagonal approximation and not the full (non-block-diagonal) tensor? How would the two differ?

Thanks for the reply!

Hi @James_Ellis,

This is largely an artifact of what we knew when we were working on the original QNG paper. We recognized that we could do a block diagonal approximation with the same number of quantum device queries as the diagonal (though much more classical processing).

Since then, I have come to think it is possible to compute the full QNG tensor using a slightly different approach. We have not tested or implemented this at all, though.
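
To make the structural difference between the diagonal, block-diagonal and full tensors concrete, here is a small NumPy sketch. The 4x4 matrix is an invented stand-in for a full metric tensor of a circuit with two layers of two parameters each, and block_diagonal is a hypothetical helper; it only illustrates which entries each approximation keeps, not how they are measured on a device.

```python
import numpy as np


def block_diagonal(full, layer_sizes):
    """Keep only entries whose row and column belong to the same layer."""
    out = np.zeros_like(full)
    start = 0
    for size in layer_sizes:
        stop = start + size
        out[start:stop, start:stop] = full[start:stop, start:stop]
        start = stop
    return out


# Hypothetical full metric tensor: two layers with two parameters each
full = np.array([[0.25, 0.02, 0.03, 0.01],
                 [0.02, 0.25, 0.04, 0.02],
                 [0.03, 0.04, 0.20, 0.05],
                 [0.01, 0.02, 0.05, 0.15]])

block = block_diagonal(full, [2, 2])  # drops cross-layer terms only
diag = np.diag(np.diag(full))         # drops every off-diagonal term
print(block)
```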