Vectorization of gradient circuits

Hi all!

I want to take the gradient of a circuit that takes trainable parameters params and sampled inputs x. The inputs consist of batches of 2 number, say, x_1 and x_2. The batch size is 10,000, meaning x \in \mathbb{R}^{10} \times \mathbb{R}^2. The trainable parameters are p \in \mathbb{R}^3. That is, I want the gradient circuit to be vectorized so that it directly gives me a gradient over all the x I put into the network. Then, I want the optimizer to optimize the params, the x are fixed. The goal is to minimize \nabla_{x_1} \mathrm{circuit}(\mathrm{params}, x). The latter should be computed for 10 different inputs at once, I guess the optimizer then has to optimize the mean of these outputs.

How do I do this with input batches x? The code I used is

import pennylane as qml
from pennylane import numpy as np

dev = qml.device('default.qubit', wires=3)

x = np.random.random([10,2], requires_grad=True)
params = np.random.random([3], requires_grad=True)

#@qml.batch_params(all_operations=True)  # <-- did not help
@qml.qnode(dev, maxdiff=2)
def circuit(params, x):
    qml.RX(x[:,0], wires=0)
    qml.RY(x[:,1], wires=1)
    qml.RZ(x[:,0], wires=2)

    qml.broadcast(qml.CNOT, wires=[0, 1, 2], pattern="ring")

    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.RZ(params[2], wires=2)

    qml.broadcast(qml.CNOT, wires=[0, 1, 2], pattern="ring")
    
    return qml.expval(qml.PauliZ(0))

def loss(params, x):
    grad_circ = qml.grad(circuit)
    u_x = grad_circ(params, x)[0]
    
    return u_x

opt = qml.AdamOptimizer()

for i in range(4):
    params = opt.step(loss, params, x)
    print(f"Step {i}: cost = {loss(params, x):.2f}")

I also tried this with including np.mean() at several places and with jax as the interface and optimizer optax, to no avail. The error I get is (without the stack trace):

TypeError: Grad only applies to real scalar-output functions. Try jacobian, elementwise_grad or holomorphic_grad.

jax gives a similar error, but mentions the output shape of (10,).

qml.about() gives:

Summary
Name: PennyLane
Version: 0.30.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/XanaduAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /home/mielke/QPINNs/qpinns/lib/python3.11/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml
Required-by: PennyLane-Lightning

Platform info:           Linux-4.19.0-24-cloud-amd64-x86_64-with-glibc2.28
Python version:          3.11.3
Numpy version:           1.23.5
Scipy version:           1.10.1
Installed devices:
- default.gaussian (PennyLane-0.30.0)
- default.mixed (PennyLane-0.30.0)
- default.qubit (PennyLane-0.30.0)
- default.qubit.autograd (PennyLane-0.30.0)
- default.qubit.jax (PennyLane-0.30.0)
- default.qubit.tf (PennyLane-0.30.0)
- default.qubit.torch (PennyLane-0.30.0)
- default.qutrit (PennyLane-0.30.0)
- null.qubit (PennyLane-0.30.0)
- lightning.qubit (PennyLane-Lightning-0.30.0)

Am I doing something very wrong, or is this just not possible with pennylane?

As an addon question, I need to do this with the second derivative wrt to x_1 too. Is that possible?

Thanks!

Hey @shorthans! Welcome to the forum :tada:

Just want to clarify. You’re saying

I want to take the gradient of a circuit that takes trainable parameters params and sampled inputs x… Then, I want the optimizer to optimize the params, the x are fixed

But, then you want to take the gradient of a cost function with respect to \nabla_{x_1}(\texttt{params}, \texttt{x}) :thinking:. Does x_1 belong to params? I think that’s what you’re saying.

The issue is indeed with qml.grad not being so friendly with broadcasting in this case (nor does it work for non-scalar outputs, as the error is suggesting). When you call circuit(params, x) and x is a batch of parameters that circuit needs to broadcast over, qml.grad just sees it as a non-scalar and yells at you.

I’m not sure if this is intended or not, but based on your explanation, I don’t think you should call qml.grad in your loss function :slight_smile:. When using opt.step_and_cost, the loss function you pass into it just needs to be some \mathbb{R}^n \mapsto \mathbb{R} function. For instance:

dev = qml.device("default.qubit", wires=3)

params = np.random.random([3], requires_grad=True)

x = np.array([[0.1, 0.2], [0.3, 0.4], [1.1, 1.2], [1.3, 1.4]], requires_grad=False)
opt = qml.GradientDescentOptimizer(0.9)


@qml.qnode(dev)
def circuit(params, x):
    qml.RX(x[:, 0], wires=0)
    qml.RY(x[:, 1], wires=1)
    qml.RZ(x[:, 0], wires=2)

    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.RZ(params[2], wires=2)

    return qml.expval(qml.PauliZ(0))


def loss_good(params, x):
    return np.mean(circuit(params, x))


for i in range(10):
    (params, _,), l = opt.step_and_cost(loss_good, params, x)
    print(l)

Then, opt.step_and_cost takes care of differentiating :slight_smile:.

0.5878599651992247
0.1369228012390685
-0.5064351812534149
-0.8310702482677879
-0.8710145492675354
-0.8730975298108528
-0.8731936827478932
-0.8731980925571439
-0.8731982947411181
-0.8731983040108606

Hope this helps!

Hey @isaacdevlugt, thanks! Especially for the explanations :grinning:

I tried the np.mean() in the loss, it didn’t change anything unfortunately. I do need the gradient in the loss function, since I need to optimize for a minimal combination of the circuit gradients later on, e.g., minimize (the mean of) \nabla_{x_1}\mathrm{circ}(\mathbf{x};\mathrm{params}) - 0.4 \nabla_{x_2}\mathrm{circ}(\mathbf{x};\mathrm{params}) + \Delta_{x_1}\mathrm{circ}(\mathbf{x};\mathrm{params}). This is a part of my loss function. I need the optimizer to find the params that minimize this expression of the gradients wrt the x_i.

Judging by what you said, I guess this is just not possible in a vectorized form and I have to iterate over each sample of the x_i. The gradient circuit gives me a tuple containing two gradients, one wrt params, and a second wrt \mathbf{x}. That’s why I thought it would be possible to optimize the params while using the \mathbf{x} gradients differently.

I tried this with the tensorflow and pytorch backends too. It did kind of work with pytorch, but the process is extremely slow. That’s why I wanted to move to pure pennylane (or jax).

Thanks again!

Oh okay! If I understand correctly now, then you can use autograd.elementwise_grad:

import pennylane as qml
from pennylane import numpy as np

import autograd as ag

dev = qml.device("default.qubit", wires=3)

params = np.random.random([3], requires_grad=True)

x = np.array([[0.1, 0.2], [0.3, 0.4], [1.1, 1.2], [1.3, 1.4]], requires_grad=False)
opt = qml.GradientDescentOptimizer(0.9)


@qml.qnode(dev)
def circuit(params, x):
    qml.RX(x[:, 0], wires=0)
    qml.RY(x[:, 1], wires=1)
    qml.RZ(x[:, 0], wires=2)

    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.RZ(params[2], wires=2)

    return qml.expval(qml.PauliZ(0))


def loss(params, x):
    grad_circ = ag.elementwise_grad(circuit)
    u_x = grad_circ(params, x)[0]

    return u_x


(params,  _), l = opt.step_and_cost(loss, params, x)
print(l)
print(dev.num_executions)
-3.4838506140310317
1

Let me know if that works!

Awesome! I’ve been looking for an elementwise gradient method for a looong time. Apparently there used to be one in pennylane (qml.elementwise_grad(), not sure if I read it in an error message or some forum post), but it seems to be gone.

If I understand the code correctly, ag.elementwise_grad(circuit) gives the elementwise gradient with respect to the params. grad_circ(params, x) returns three numbers, one for each component of params.
From the code of autograd on github, the method computes the jacobian of the circuit, then sums over columns. I guess it only does so for the first argument, not for all of them. I need the gradients with respect to x_1 and x_2 for the loss.

Any further ideas? I could probably clone the params 10 times and concatenate them with the x, then use the 3rd component of the gradient circuit, but I’m not sure how efficient this is. Edit: Ah, no, I can’t, as the optimizer would then also update x.

I guess it only does so for the first argument, not for all of them. I need the gradients with respect to x_1 and x_2 for the loss.

By first argument, you mean params? So you need to differentiate w.r.t. the input data x as well?