Get gradient of quantum circuit with PyTorch interface

Hello! I’m trying to get the gradient values of my quantum circuit to do a barren plateau analysis. I am following this tutorial: Barren plateaus in quantum neural networks (PennyLane documentation). However, I am using the torch interface, and I run into a TypeError after tweaking the code slightly. I am also a bit confused about how the gradient in the tutorial is defined: I thought a gradient has to be taken with respect to some cost function, which does not appear in the tutorial code. Incidentally, I am using a classical cost function; how would I get the gradients with respect to that?

import pennylane as qml
import numpy as np
import matplotlib.pyplot as plt

import torch
from torch.autograd import Variable

# qcircuit
def rotation_layer(w):
    for i in range(num_qubits):
        qml.RY(w[i], wires=i)
def entangling_block(w):
    for i in range(num_qubits):
        qml.CZ(wires = [i, (i+1)%num_qubits])
def generator(w, num_qubits, num_layers = 3):
    # if init_strategy == "uniform":
    #     qml.Hadamard(wires=range(num_qubits)

    for i in range(1, num_layers*2 + 1, 2):
        entangling_block(w[num_qubits * (i) : num_qubits * (i+1)])
        rotation_layer(w[num_qubits * (i+1) : num_qubits * (i+2)])

    return qml.probs(wires=range(num_qubits))

num_qubits = 3 # arbitrary
num_layers = 3
params = np.random.uniform(0, np.pi, size=(num_layers * 2 + 1) * num_qubits)

dev = qml.device("default.qubit", wires=num_qubits)
qcircuit = qml.QNode(generator, dev, interface="torch") # Have to use torch for Pytorch based optimization
grad = qml.grad(qcircuit, argnum=0)
gradient = grad(params, num_qubits, num_layers)

Full error message below:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_120/ in <cell line: 35>()
     33 qcircuit = qml.QNode(generator, dev, interface="torch")
     34 grad = qml.grad(qcircuit, argnum=0)
---> 35 gradient = grad(params, num_qubits, num_layers)

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/ in __call__(self, *args, **kwargs)
    113             return ()
--> 115         grad_value, ans = grad_fn(*args, **kwargs)
    116         self._forward = ans

/opt/conda/envs/pennylane/lib/python3.9/site-packages/autograd/ in nary_f(*args, **kwargs)
     18             else:
     19                 x = tuple(args[i] for i in argnum)
---> 20             return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
     21         return nary_f
     22     return nary_operator

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/ in _grad_with_forward(fun, x)
    131         difference being that it returns both the gradient *and* the forward pass
    132         value."""
--> 133         vjp, ans = _make_vjp(fun, x)
    135         if not vspace(ans).size == 1:

/opt/conda/envs/pennylane/lib/python3.9/site-packages/autograd/ in make_vjp(fun, x)
      8 def make_vjp(fun, x):
      9     start_node = VJPNode.new_root()
---> 10     end_value, end_node =  trace(start_node, fun, x)
     11     if end_node is None:
     12         def vjp(g): return vspace(x).zeros()

/opt/conda/envs/pennylane/lib/python3.9/site-packages/autograd/ in trace(start_node, fun, x)
      8     with trace_stack.new_trace() as t:
      9         start_box = new_box(x, t, start_node)
---> 10         end_box = fun(start_box)
     11         if isbox(end_box) and end_box._trace == start_box._trace:
     12             return end_box._value, end_box._node

/opt/conda/envs/pennylane/lib/python3.9/site-packages/autograd/ in unary_f(x)
     13                 else:
     14                     subargs = subvals(args, zip(argnum, x))
---> 15                 return fun(*subargs, **kwargs)
     16             if isinstance(argnum, int):
     17                 x = args[argnum]

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/ in __call__(self, *args, **kwargs)
    845             return res
--> 847         res = qml.execute(
    848             [self.tape],
    849             device=self.device,

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/interfaces/ in execute(tapes, device, gradient_fn, interface, mode, gradient_kwargs, cache, cachesize, max_diff, override_shots, expand_fn, max_expansion, device_batch_transform)
    649     if gradient_fn == "backprop" or interface is None:
    650         return batch_fn(
--> 651             qml.interfaces.cache_execute(
    652                 batch_execute, cache, return_tuple=False, expand_fn=expand_fn
    653             )(tapes)

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/interfaces/ in wrapper(tapes, **kwargs)
    204         else:
    205             # execute all unique tapes that do not exist in the cache
--> 206             res = fn(execution_tapes.values(), **kwargs)
    208         final_res = []

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/interfaces/ in fn(tapes, **kwargs)
    129         def fn(tapes: Sequence[QuantumTape], **kwargs):  # pylint: disable=function-redefined
    130             tapes = [expand_fn(tape) for tape in tapes]
--> 131             return original_fn(tapes, **kwargs)
    133     @wraps(fn)

/opt/conda/envs/pennylane/lib/python3.9/ in inner(*args, **kwds)
     77         def inner(*args, **kwds):
     78             with self._recreate_cm():
---> 79                 return func(*args, **kwds)
     80         return inner

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/ in batch_execute(self, circuits)
    655             # TODO: Insert control on value here
--> 656             res = self.execute(circuit)
    657             results.append(res)

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/devices/ in execute(self, circuit, **kwargs)
    233                     )
--> 235         return super().execute(circuit, **kwargs)
    237     def _asarray(self, a, dtype=None):

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/ in execute(self, circuit, **kwargs)
    431         # apply all circuit operations
--> 432         self.apply(circuit.operations, rotations=circuit.diagonalizing_gates, **kwargs)
    434         # generate computational basis samples

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/devices/ in apply(self, operations, rotations, **kwargs)
    267                         self._debugger.snapshots[len(self._debugger.snapshots)] = state_vector
    268             else:
--> 269                 self._state = self._apply_operation(self._state, operation)
    271         # store the pre-rotated state

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/devices/ in _apply_operation(self, state, operation)
    295             return self._apply_ops[operation.base_name](state, axes, inverse=operation.inverse)
--> 297         matrix = self._asarray(self._get_unitary_matrix(operation), dtype=self.C_DTYPE)
    299         if operation in diagonal_in_z_basis:

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/devices/ in _get_unitary_matrix(self, unitary)
    307         if unitary in diagonal_in_z_basis:
    308             return self._asarray(unitary.eigvals(), dtype=self.C_DTYPE)
--> 309         return self._asarray(unitary.matrix(), dtype=self.C_DTYPE)
    311     def sample_basis_states(self, number_of_states, state_probability):

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/devices/ in _asarray(self, a, dtype)
    248                 res = torch.cat([torch.reshape(i, (-1,)) for i in res], dim=0)
    249         else:
--> 250             res = torch.as_tensor(a, dtype=dtype)
    252         res = torch.as_tensor(res, device=self._torch_device)

/opt/conda/envs/pennylane/lib/python3.9/site-packages/autograd/numpy/ in __len__(self)
     19     dtype = property(lambda self: self._value.dtype)
     20     T = property(lambda self: anp.transpose(self))
---> 21     def __len__(self): return len(self._value)
     22     def astype(self, *args, **kwargs): return anp._astype(self, *args, **kwargs)

TypeError: object of type 'numpy.complex128' has no len()

Output of qml.about():

Name: PennyLane
Version: 0.28.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
License: Apache License 2.0
Location: /opt/conda/envs/pennylane/lib/python3.9/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, retworkx, scipy, semantic-version, toml
Required-by: PennyLane-Cirq, PennyLane-Lightning, PennyLane-qiskit, pennylane-qulacs, PennyLane-SF

Platform info:           Linux-5.4.209-116.367.amzn2.x86_64-x86_64-with-glibc2.31
Python version:          3.9.15
Numpy version:           1.23.5
Scipy version:           1.10.0
Installed devices:
- default.gaussian (PennyLane-0.28.0)
- default.mixed (PennyLane-0.28.0)
- default.qubit (PennyLane-0.28.0)
- default.qubit.autograd (PennyLane-0.28.0)
- default.qubit.jax (PennyLane-0.28.0)
- (PennyLane-0.28.0)
- default.qubit.torch (PennyLane-0.28.0)
- default.qutrit (PennyLane-0.28.0)
- null.qubit (PennyLane-0.28.0)
- cirq.mixedsimulator (PennyLane-Cirq-0.28.0)
- cirq.pasqal (PennyLane-Cirq-0.28.0)
- cirq.qsim (PennyLane-Cirq-0.28.0)
- cirq.qsimh (PennyLane-Cirq-0.28.0)
- cirq.simulator (PennyLane-Cirq-0.28.0)
- lightning.qubit (PennyLane-Lightning-0.28.2)
- strawberryfields.fock (PennyLane-SF-0.20.1)
- strawberryfields.gaussian (PennyLane-SF-0.20.1)
- strawberryfields.gbs (PennyLane-SF-0.20.1)
- strawberryfields.remote (PennyLane-SF-0.20.1)
- (PennyLane-SF-0.20.1)
- qiskit.aer (PennyLane-qiskit-0.28.0)
- qiskit.basicaer (PennyLane-qiskit-0.28.0)
- qiskit.ibmq (PennyLane-qiskit-0.28.0)
- qiskit.ibmq.circuit_runner (PennyLane-qiskit-0.28.0)
- qiskit.ibmq.sampler (PennyLane-qiskit-0.28.0)
- qulacs.simulator (pennylane-qulacs-0.28.0)

Thanks for your question, @jkwan314.

qml.grad is for taking derivatives of circuits that use the autograd interface, i.e. circuits that accept PennyLane NumPy inputs and return PennyLane NumPy outputs.

Since you have requested the torch interface, you need to provide torch tensors as inputs and take gradients with torch.

For example,

>>> params = torch.tensor(params, requires_grad=True)
>>> res = torch.sum(qcircuit(params, num_qubits, num_layers))
>>> res.backward()
>>> params.grad
tensor([ 5.5511e-17,  2.7756e-16, -5.5511e-17,  0.0000e+00,  0.0000e+00,
         0.0000e+00,  1.1102e-16,  2.2204e-16,  5.5511e-17,  0.0000e+00,
         0.0000e+00,  0.0000e+00,  5.5511e-17,  0.0000e+00,  5.5511e-17,
         0.0000e+00,  0.0000e+00,  0.0000e+00, -1.1102e-16, -1.1102e-16,
        -1.1102e-16], dtype=torch.float64)

Once you specify interface="torch", you can treat the QNode as you would any other torch function.

Hope that helps clear things up.


Hi, I just had a conceptual follow-up about barren plateaus. In the tutorial, they track a single parameter across multiple random circuits, but my model does iterative optimization over time. Does this mean that I should be tracking the variance of the gradient of a single parameter in my model across time? What about the other parameters?

Hey @jkwan314!

It boils down to what you want to do with your code and what your research needs are :slight_smile:. The reason the tutorial tracks a single parameter’s gradient across multiple random circuits with more and more qubits is to empirically demonstrate the signature of barren plateaus: exponentially vanishing gradients with the number of qubits.

If you want to do this with your model, then you’d follow something similar to the demo. However, optimizing your model might be at the mercy of barren plateaus; it will be hard to optimize if they exist.

Hi @isaacdevlugt, thank you for your prompt reply! To give a bit more context, I am trying to reproduce the results of Zoufal et al. (2019), "Quantum Generative Adversarial Networks for learning and loading random distributions" (npj Quantum Information), but I am unable to do so; the output generated by my quantum generator seems to be totally random, so I suspected barren plateaus.

I am having trouble understanding exactly the role of the variance here. Is it the variance with respect to the other parameters in the model? Or the variance of the gradient of a single parameter across time?

Thank you.

Is it the variance with respect to the other parameters in the model? Or the variance of the gradient of a single parameter across time?

It’s the variance of the derivative with respect to each parameter, taken over random initializations of your circuit, and the question is how that variance depends on the number of qubits. (Formally, “randomly” here means Haar-random, i.e. randomly sampled unitaries, but barren plateaus can still happen otherwise.) There are some other nuances with barren plateaus (cost functions, entanglement, etc.); it’s a lot to cover :sweat_smile:.

Appendix I of this paper shows all of the math in gory detail :slight_smile:.