Quantum chemistry operation gradient

Hello, I’ve recently started using PennyLane for quantum chemistry simulations. For a circuit containing a quantum chemistry operation, I’ve found that different devices give different results when differentiating with respect to a trainable parameter. What could be going wrong here?

The following is a comparison of two devices for the same circuit:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)
dev_lightning = qml.device("lightning.qubit", wires=2)

def circuit(phi):
    qml.Hadamard(0)
    qml.SingleExcitation(phi, wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

qnode1 = qml.QNode(circuit, dev, diff_method="parameter-shift", interface="autograd")
qnode2 = qml.QNode(circuit, dev_lightning, diff_method="parameter-shift", interface="autograd")

phi = np.array(0.1, requires_grad=True)

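# Compute the gradient on each device and compare them
grad1 = qml.grad(qnode1, argnum=0)(phi)
grad2 = qml.grad(qnode2, argnum=0)(phi)
print(grad1, grad2)
print(np.allclose(grad1, grad2))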
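
To have a device-independent reference, here is a minimal hand-written sketch of the same circuit in plain Python. It assumes the matrix convention given in the PennyLane documentation for SingleExcitation (a rotation by phi/2 in the {|01>, |10>} subspace) and that wire 0 is the leftmost bit; the helper names `expval_z0` and `reference_grad` are my own and not part of PennyLane. Under those assumptions the expectation value works out to (1 - cos(phi))/2, so the gradient should be sin(phi)/2.

```python
import math

def expval_z0(phi):
    """<Z_0> after Hadamard(0), then SingleExcitation(phi) on wires [0, 1]."""
    # Basis order |00>, |01>, |10>, |11> (wire 0 is the left bit).
    # Hadamard on wire 0 applied to |00> gives (|00> + |10>) / sqrt(2).
    amp = 1 / math.sqrt(2)
    state = [amp, 0.0, amp, 0.0]
    # Assumed convention: SingleExcitation rotates the {|01>, |10>} subspace by phi / 2.
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    a01, a10 = state[1], state[2]
    state[1] = c * a01 - s * a10
    state[2] = s * a01 + c * a10
    # Z on wire 0 has eigenvalue +1 on |00>, |01> and -1 on |10>, |11>.
    signs = [1, 1, -1, -1]
    # Amplitudes are real here, so the probability is just the square.
    return sum(sgn * a * a for sgn, a in zip(signs, state))

def reference_grad(phi, h=1e-6):
    """Central finite difference, used only as an independent check."""
    return (expval_z0(phi + h) - expval_z0(phi - h)) / (2 * h)

phi_ref = 0.1
# Finite-difference estimate next to the closed form sin(phi) / 2
print(reference_grad(phi_ref), math.sin(phi_ref) / 2)
```

Comparing each device's gradient against this value should show which of the two deviates, if the convention assumed above matches what the devices implement.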

Also, a warning sometimes appears when I run this. Thanks!