Gradient of learnable circuit parameters is None when using torch interface

Hi, I am implementing a QCNN based on PennyLane. For flexibility, I chose the torch interface. The forward pass seems to go well. However, when I check the gradients of the circuit parameters, I find that some of them are None, which means those parameters will not get updated during optimization.

Here is my code example.

import math

import numpy as np
import pennylane as qml
import torch

def qconv_kernel(phi, params):
    '''
    Implementation of the quantum convolution kernel circuit.
    ------------------------------------------
    :param phi: image pixels, [n_qubits]
    :param params: learnable convolutional weights, [n_layers * n_qubits]
    '''
    # illustrative placeholder circuit (my actual circuit is omitted here)
    n_qubits = len(phi)
    qml.AngleEmbedding(phi, wires=range(n_qubits))
    qml.BasicEntanglerLayers(params.reshape(-1, n_qubits), wires=range(n_qubits))
    return qml.expval(qml.PauliZ(wires=0))

def qconv_torch(inputs, params, qnode, n_in_channels=1, kernel_size=[2, 2], stride=None, padding=False):
    """
    Convolves the input image with repeated applications of the same quantum circuit.
    ------------------------------------------------------------------------
    :param inputs: input image, [C_in, H, W]
    :param params: learnable convolutional weights, [n_out_channels, n_layers * n_qubits]
    :param qnode: QNode object
    :param n_in_channels: number of input channels from the previous layer
    :param kernel_size: size of the qconv kernel, [kernel_h, kernel_w]
    :param stride: step of the qconv, [stride_h, stride_w]
    :param padding: whether to pad, bool
    :return: a new image, [C_out, H_out, W_out]
    """
    if stride is None:
        stride = kernel_size
    in_h, in_w = inputs.shape[1], inputs.shape[2]
    out_h = (in_h - kernel_size[0]) // stride[0] + 1
    out_w = (in_w - kernel_size[1]) // stride[1] + 1
    out = torch.zeros(params.shape[0], out_h, out_w)
    for j in range(0, in_h - kernel_size[0] + 1, stride[0]):
        for k in range(0, in_w - kernel_size[1] + 1, stride[1]):
            channel_results = []
            for i in range(params.shape[0]):  # loop over output channels
                pixel = 0.0
                for c_in in range(inputs.shape[0]):
                    # gather the kernel_h x kernel_w patch as a flat list of pixels
                    img_patch = []
                    for h in range(kernel_size[0]):
                        for w in range(kernel_size[1]):
                            img_patch.append(inputs[c_in, j+h, k+w])
                    out_pixel = qnode(torch.tensor(img_patch), params[i])
                    pixel = pixel + out_pixel  # accumulate over input channels
                channel_results.append(pixel)
            # the tensor re-wrapping on assignment below is where I suspect
            # gradient propagation fails
            out[:, j//stride[0], k//stride[1]] = torch.tensor(channel_results)
    return out

class QCNN_torch(torch.nn.Module):
    def __init__(self, cfg):
        super().__init__()
        n_qubit = sum(cfg.MODEL.QCNN.KERNEL_SIZE)
        self.DIM = cfg.MODEL.QCNN.DIM
        dev = qml.device(cfg.CIRCUIT.BACKEND, wires=n_qubit)
        self.qnode = qml.QNode(qconv_kernel, dev, interface='torch')
        self.qconv_param = []
        for i in range(len(self.DIM)-1):
            # shape1 = n_out_channels, shape2 = n_layers * n_qubit; the exact
            # derivation from cfg is omitted here, assume a single circuit layer
            shape1, shape2 = self.DIM[i+1], n_qubit
            param_weight = np.random.uniform(0, 2*math.pi, shape1*shape2)
            param_weight = np.reshape(param_weight, (shape1, shape2))
            param_weight = torch.nn.Parameter(torch.tensor(param_weight, requires_grad=True))
            self.register_parameter('layer'+str(i+1), param_weight)
            self.qconv_param.append(param_weight)

        # FC
        n_feat = ((28//(2**(len(self.DIM)-1)))**2) * self.DIM[-1]
        stdv = 1./n_feat
        self.fc = np.random.uniform(-stdv, stdv, n_feat*10)
        self.fc = np.reshape(self.fc, (n_feat, 10))
        self.fc = torch.nn.Parameter(torch.FloatTensor(self.fc))

    def forward(self, x):
        out = []
        for b in range(len(x)):
            bx = x[b]
            for i in range(len(self.DIM)-1):
                bx = qconv_torch(bx, self.qconv_param[i], self.qnode, n_in_channels=self.DIM[i])
            out.append(bx.flatten().unsqueeze(0) @ self.fc)
        return torch.cat(out, dim=0)

if __name__ == '__main__':
    opt = get_opt()
    cfg = get_config(opt.config_file)

    loss_fn = torch.nn.CrossEntropyLoss()

    qcnn = QCNN_torch(cfg)
    optimizer = torch.optim.Adam(params=qcnn.parameters(), lr=0.1)
    optimizer.zero_grad()

    inputs = torch.rand(4, 1, 28, 28)
    out = qcnn(inputs)
    
    loss = loss_fn(out, torch.LongTensor([1, 2, 3, 4]))
    # debugging attempt: retain_grad() is a no-op on leaf parameters
    optimizer.param_groups[0]['params'][0].retain_grad()
    loss.backward()
    optimizer.step()

The problem is that after loss.backward() I only get a gradient for qcnn.fc; the grad of every qconv parameter is still None.
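
This is how I check the gradients (just inspecting .grad on every registered parameter):

for name, p in qcnn.named_parameters():
    print(name, p.grad)  # 'layer1', 'layer2', ... print None; only 'fc' has a gradient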

Any suggestions would be helpful. Thank you in advance.

Hi @Yang! Welcome to the forum :slightly_smiling_face:

Would you be able to reduce your code example above to a minimal, non-working example? That is, remove as much of the code as possible while still reproducing the problem.

This will help us pin down exactly what might be going wrong!

Yes, I have removed the unnecessary parts of the code. Since the failure of gradient propagation may come from the tensor assignment operations, I kept most of those. I hope it is clear and short enough.

Hi @josh, thank you for your help. I have solved this problem. It was about PyTorch operations rather than PennyLane. Could you please delete this topic?
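
Edit, for anyone who hits the same symptom: my understanding of the fix (the exact change is not shown above) is that torch.tensor() re-wraps an existing tensor by copying its values, which detaches the copy from the autograd graph, so every pixel written into out that way stops backpropagating into the circuit parameters. A minimal sketch of the difference, keeping the graph with torch.stack instead:

import torch

w = torch.nn.Parameter(torch.ones(3))

# breaks the graph: torch.tensor() copies values and drops grad_fn
bad = torch.tensor([(v * 2).item() for v in w])
print(bad.grad_fn)   # None

# keeps the graph: stack the result tensors instead of re-wrapping them
good = torch.stack([v * 2 for v in w])
good.sum().backward()
print(w.grad)        # tensor([2., 2., 2.])

In the code above, that means assigning torch.stack(channel_results) into out instead of torch.tensor(channel_results).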

No worries @Yang, I will close the topic. Feel free to open a new thread if you have any other questions!