I couldn’t figure out how to get the expectation value of the all-zeros state without hitting support errors with the adjoint diff_method, so I just switched to finite-diff. Hopefully it is not that bad.

If you have a solution, that would be nice though.

I am using PennyLane 0.30 with CUDA 11.7. If parameter-shift is much better than finite-diff/adjoint, I will try it; maybe it isn’t that slow on a supercomputer.
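
For my own reference, here is a toy sketch of the accuracy difference between the two gradient rules. It is not the circuit below: it uses a single-parameter function, cos(theta), which is what `<Z>` looks like after one RX rotation on |0>. The function names and the step size `h` are my own choices for illustration. The two-term shift rule shown here applies to simple rotation gates like RX/RY/RZ; I am assuming nothing here about how CRZ is handled.

```python
import math


def expval_z_after_rx(theta):
    """Toy model: <Z> after RX(theta) applied to |0> equals cos(theta)."""
    return math.cos(theta)


def finite_diff_grad(f, theta, h=1e-7):
    """Forward finite difference; approximate, with error that depends on h."""
    return (f(theta + h) - f(theta)) / h


def param_shift_grad(f, theta, shift=math.pi / 2):
    """Two-term parameter-shift rule; exact for sinusoidal f like the toy model."""
    return (f(theta + shift) - f(theta - shift)) / (2 * math.sin(shift))


theta = 0.4
exact = -math.sin(theta)  # analytic derivative of cos(theta)
fd_error = abs(finite_diff_grad(expval_z_after_rx, theta) - exact)
ps_error = abs(param_shift_grad(expval_z_after_rx, theta) - exact)
print("finite-diff error:", fd_error)
print("parameter-shift error:", ps_error)
```

In this toy case parameter-shift is exact up to floating-point rounding, while finite-diff carries a small truncation error. The trade-off is cost: the two-term rule needs two circuit evaluations per parameter, whereas forward finite-diff needs one per parameter plus one shared baseline.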

Here is my circuit; I think it is just stolen from the kernel tutorial:

```
import pennylane as qml

num_wires = 5  # placeholder value (not given in the post); set to your qubit count


def layer(x, params, wires, i0=0, inc=1):
    """Building block of the embedding ansatz"""
    i = i0
    for j, wire in enumerate(wires):
        qml.Hadamard(wires=[wire])
        qml.RZ(x[i % len(x)], wires=[wire])
        i += inc
        qml.RY(params[0, j], wires=[wire])

    # Ring of controlled-RZ gates entangling neighbouring wires
    qml.broadcast(unitary=qml.CRZ, pattern="ring", wires=wires, parameters=params[1])


def ansatz(x, params, wires):
    """The embedding ansatz"""
    for j, layer_params in enumerate(params):
        layer(x, layer_params, wires, i0=j * len(wires))


# TODO: check if I need batch_obs
dev = qml.device("lightning.gpu", wires=num_wires, shots=None)
wires = dev.wires.tolist()


@qml.qnode(dev, interface="autograd", diff_method="finite-diff")
def kernel(x1, x2, params):
    ansatz(x1, params, wires=wires)
    qml.adjoint(ansatz)(x2, params, wires=wires)
    # Return the probability of the all-zeros state
    return qml.expval(qml.Projector([0] * num_wires, wires=wires))
```
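
One idea I have not tested on the device: the all-zeros projector can be expanded into products of Pauli-Z operators, since |0><0| = (I + Z)/2 on each qubit, so the projector is the average of all 2^n Z-strings. The sketch below only checks that identity numerically on a random state in plain Python. The helper names are made up for illustration, and whether lightning.gpu's adjoint method accepts such a sum of Z-strings is an assumption I have not verified.

```python
import random


def all_zero_probability(amps):
    """Probability of measuring |0...0>, i.e. <psi| P_0 |psi>."""
    return abs(amps[0]) ** 2


def z_string_expval(amps, mask):
    """<Z_S> for the Z-string acting on the qubits whose bits are set in mask."""
    total = 0.0
    for k, a in enumerate(amps):
        sign = -1 if bin(k & mask).count("1") % 2 else 1
        total += sign * abs(a) ** 2
    return total


def projector_via_z_strings(amps, n):
    """Average of <Z_S> over all 2**n subsets S; equals the all-zeros probability."""
    return sum(z_string_expval(amps, mask) for mask in range(2**n)) / 2**n


# Random normalized 3-qubit state (hypothetical test data)
random.seed(0)
n = 3
raw = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2**n)]
norm = sum(abs(a) ** 2 for a in raw) ** 0.5
amps = [a / norm for a in raw]

p_direct = all_zero_probability(amps)
p_z = projector_via_z_strings(amps, n)
print(p_direct, p_z)
```

The catch is that the number of Z-strings grows as 2^n, so this would only be practical for a small number of wires.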