Cache accumulates when computing the gradient of a circuit

I want to record a gradient variable, and I used PyTorch autograd to do that.

However, as I repeat this process, memory usage constantly increases, and eventually the kernel dies.
My computer's memory does not keep increasing when I reduce N_sample from 10000 to 1000, so I think the cache is being freed more slowly than memory accumulates at the larger size.

I tried del L, del thetas, and a few other things, but none of them worked.
How can I remove all gradient information after L.backward()?

import torch as T
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=4)

def qc(thetas):
    # one RX rotation per qubit, then a chain of CZ entanglers
    for i in range(4):
        qml.RX(thetas[i], wires=i)
    for i in range(3):
        qml.CZ(wires=[i, i + 1])
    return qml.state()

qcirc = qml.QNode(qc, dev)

N_sample = 10000
# observable M = Z (x) Z (x) I (x) I, one copy per sample for batched bmm
Z = T.tensor([[1, 0], [0, -1]], dtype=T.cdouble)
M = T.kron(Z, T.kron(Z, T.eye(4, dtype=T.cdouble)))
M = M.repeat(N_sample, 1, 1)

while True:
    thetas = T.rand(4, N_sample, requires_grad=True)
    kets = qcirc(2 * np.pi * thetas).reshape(N_sample, -1, 1)
    # L = sum over samples of |<psi| M |psi>|
    L = T.abs(T.sum(T.bmm(T.conj(T.transpose(kets, 1, 2)), T.bmm(M, kets))))
    L.backward(retain_graph=True)
    # variance of the gradient across samples; the (2*pi)^2 factor undoes
    # the chain-rule scaling from feeding 2*pi*thetas into the circuit
    Var = (T.std(thetas.grad, dim=1) ** 2).mean().item() / (4 * np.pi ** 2)
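
For reference, this is roughly the cleanup variant I tried at the end of each iteration (a sketch of my attempt, not the exact code; it did not stop the memory growth):

import gc

while True:
    thetas = T.rand(4, N_sample, requires_grad=True)
    kets = qcirc(2 * np.pi * thetas).reshape(N_sample, -1, 1)
    L = T.abs(T.sum(T.bmm(T.conj(T.transpose(kets, 1, 2)), T.bmm(M, kets))))
    L.backward(retain_graph=True)
    Var = (T.std(thetas.grad, dim=1) ** 2).mean().item() / (4 * np.pi ** 2)
    # attempted cleanup: drop the gradient and all references, then collect
    thetas.grad = None
    del L, kets, thetas
    gc.collect()

Output of qml.about():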
Name: PennyLane
Version: 0.31.1
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: c:\users\user\anaconda3\lib\site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane-Lightning

Platform info:           Windows-10-10.0.19044-SP0
Python version:          3.8.17
Numpy version:           1.23.5
Scipy version:           1.10.1
Installed devices:
- default.gaussian (PennyLane-0.31.1)
- default.mixed (PennyLane-0.31.1)
- default.qubit (PennyLane-0.31.1)
- default.qubit.autograd (PennyLane-0.31.1)
- default.qubit.jax (PennyLane-0.31.1)
- default.qubit.tf (PennyLane-0.31.1)
- default.qubit.torch (PennyLane-0.31.1)
- default.qutrit (PennyLane-0.31.1)
- null.qubit (PennyLane-0.31.1)
- lightning.qubit (PennyLane-Lightning-0.31.0)

Hello @maar_hybrid, I tried running your code with a QNode decorator and diff_method="adjoint", but it didn't work because of the parameter broadcasting present. Running only the RX gates seemed more diagnostic, as RAM flatlines for a while.

Source: Gradients and training — PennyLane 0.31.1 documentation
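
For concreteness, here is roughly what I attempted (a sketch, not verbatim; I swapped qml.state() for the equivalent expectation value of Z on the first two wires, since as far as I know adjoint differentiation only supports expectation-value returns):

import torch as T
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev, interface="torch", diff_method="adjoint")
def qcirc(thetas):
    for i in range(4):
        qml.RX(thetas[i], wires=i)
    for i in range(3):
        qml.CZ(wires=[i, i + 1])
    # <Z0 Z1> equals <psi| Z (x) Z (x) I (x) I |psi>, matching M above
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

thetas = T.rand(4, 10, requires_grad=True)
# the batched thetas create parameter broadcasting, which is what made
# diff_method="adjoint" error out for me on PennyLane 0.31
out = qcirc(2 * np.pi * thetas)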

Hi @maar_hybrid, thank you for your question!

Can you please explain a bit further what you want to do? I'm not sure what you mean by "record a gradient variable". Do you mean you want to get the gradient of your program with respect to a specific variable?

If so, then our demo on qubit rotation, the demo on using Torch with PennyLane, and the documentation on our PyTorch interface could be useful, especially the latter.
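
For example, the basic pattern for differentiating a QNode through the Torch interface looks something like this (a minimal sketch, assuming you only need thetas.grad after each backward pass; it is not taken verbatim from the demos):

import torch as T
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev, interface="torch")
def circuit(thetas):
    for i in range(4):
        qml.RX(thetas[i], wires=i)
    for i in range(3):
        qml.CZ(wires=[i, i + 1])
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

thetas = T.rand(4, requires_grad=True)
loss = circuit(2 * np.pi * thetas)
loss.backward()        # no retain_graph=True, so the graph is freed here
print(thetas.grad)     # gradient of the expectation w.r.t. thetas
thetas.grad = None     # reset the gradient before the next backward pass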