qml GradientDescent very slow with lightning.gpu

I guess the problem is a compatibility issue between the PennyLane GradientDescent optimizer and lightning.gpu?

When I use default.qubit, the optimizer step runs in a few seconds; with lightning.gpu it still hasn't finished after 10 minutes, with or without a GPU.

I don't even know what to do. lightning.gpu without a GPU falls back to the CPU, so I thought it should perform about the same as default.qubit.

SOLUTION: the default parameter-shift differentiation was just extremely slow; switching to diff_method="adjoint" fixed it.

Hey @ilmars_kuhtarskis! Welcome to the forum :smiley:!

It really depends on your application. It sounds like you're running code that uses a relatively low number of qubits (<<20). In that regime, Lightning-GPU isn't going to outperform lightning.qubit, and possibly not even default.qubit. Lightning-GPU is really meant for high qubit counts :).
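If you want to see where the crossover happens on your machine, a quick timing loop like this works (a toy RY circuit here, not your actual workload; add "lightning.gpu" to the list if you have it installed):

import timeit

import pennylane as qml
from pennylane import numpy as np

num_wires = 10
params = np.random.uniform(0, np.pi, num_wires)

def make_circuit(device_name):
    dev = qml.device(device_name, wires=num_wires)

    @qml.qnode(dev)
    def circuit(p):
        # simple single-layer rotation circuit, just for timing
        for w in range(num_wires):
            qml.RY(p[w], wires=w)
        return qml.expval(qml.PauliZ(0))

    return circuit

for name in ["default.qubit", "lightning.qubit"]:
    circuit = make_circuit(name)
    runtime = timeit.timeit(lambda: circuit(params), number=100)
    print(f"{name}: {runtime:.3f} s for 100 executions")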


I could accept this answer if the speed difference weren't so big, but default.qubit takes 10 seconds for my code, while lightning.qubit hadn't finished after 30 minutes.

How come lightning.qubit is so much faster for everything else (several-times speedups, sometimes more than 10x), but here the explanation is just that it is 'lightning.qubit'? It is obviously better in general; I am just trying to find out what the problem is with the PennyLane optimizer.

Does setting diff_method="adjoint" help? It could be that lightning is defaulting to parameter-shift, which is going to be slower.
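For a rough sense of scale: parameter-shift evaluates two shifted circuits per trainable parameter, so one gradient costs about 2P circuit executions for P parameters, while adjoint differentiation needs roughly one forward pass plus one backward sweep of the state. Forcing the method looks like this (a toy circuit, just to show the keyword):

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("lightning.qubit", wires=4)  # or "lightning.gpu"

# diff_method="adjoint" overrides whatever default the device would pick
@qml.qnode(dev, interface="autograd", diff_method="adjoint")
def circuit(params):
    for w in range(4):
        qml.RY(params[w], wires=w)
    return qml.expval(qml.PauliZ(0))

params = np.random.uniform(0, np.pi, 4, requires_grad=True)
print(qml.grad(circuit)(params))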


I tried parameter-shift on default.qubit and it is also taking really long now. I need to change my code to get it working on lightning.gpu because of some support issues… and then check whether adjoint fixes the problem.

I think you are correct, I will update tomorrow!
Thank you, I definitely would not have figured this out by myself.


Great! Let us know if the behaviour still persists. At that point, it would be great to get a full code example and for you to share your package versions :slight_smile:
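For the versions, qml.about() prints everything in one go:

import pennylane as qml

qml.about()  # PennyLane version, installed device plugins, Python/platform info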


I couldn't figure out how to get the expectation value of the all-zeros state without support errors under the adjoint diff_method, so I just switched to finite-diff; hopefully it is not that bad.

If you have a solution it would be nice though :smiley:

I am using qml 0.30 with CUDA 11.7. I will also try parameter-shift on a supercomputer; maybe it isn't that slow there, if it turns out to be much better than finite-diff/adjoint.

Here is my circuit; I think it is basically taken from the kernels tutorial:

import pennylane as qml
from pennylane import numpy as np

num_wires = 5  # placeholder: set this to your actual qubit count

def layer(x, params, wires, i0=0, inc=1):
    """Building block of the embedding ansatz"""
    i = i0
    for j, wire in enumerate(wires):
        qml.Hadamard(wires=[wire])
        qml.RZ(x[i % len(x)], wires=[wire])
        i += inc
        qml.RY(params[0, j], wires=[wire])

    # entangle the wires with a ring of controlled-RZ rotations
    qml.broadcast(unitary=qml.CRZ, pattern="ring", wires=wires, parameters=params[1])

def ansatz(x, params, wires):
    """The embedding ansatz"""
    for j, layer_params in enumerate(params):
        layer(x, layer_params, wires, i0=j * len(wires))

# check if I need batch_obs
dev = qml.device("lightning.gpu", wires=num_wires, shots=None)
wires = dev.wires.tolist()

@qml.qnode(dev, interface="autograd", diff_method="finite-diff")
def kernel(x1, x2, params):
    ansatz(x1, params, wires=wires)
    qml.adjoint(ansatz)(x2, params, wires=wires)
    # probability of the all-zeros state, as a projector expectation
    return qml.expval(qml.Projector([0]*num_wires, wires=wires))
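For context, the tutorial then evaluates the kernel matrix via qml.kernels, roughly like this (X and params stand in for the dataset and the current parameters):

# bind the current parameters so the kernel is a function of two datapoints
init_kernel = lambda x1, x2: kernel(x1, x2, params)
K = qml.kernels.square_kernel_matrix(X, init_kernel, assume_normalized_kernel=True)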

I got adjoint to work, and it is just as fast as backprop on both the CPU and the GPU. It's amazing!

I replaced the kernel function with this:

@qml.qnode(dev, interface="autograd", diff_method="adjoint")
def kernel_circuit(x1, x2, params):
    ansatz(x1, params, wires=wires)
    qml.adjoint(ansatz)(x2, params, wires=wires)
    # adjoint differentiation supports expectation values of simple
    # observables, so return one PauliZ expectation per wire instead
    # of the unsupported Projector
    return [qml.expval(qml.PauliZ(w)) for w in wires]

def kernel(x1, x2, params):
    # combine the per-wire expectations into a single kernel value
    expectations = kernel_circuit(x1, x2, params)
    return np.prod(expectations)
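With that, an ordinary gradient-descent step runs at normal speed. A minimal sketch of driving it (x1, x2 and this toy cost are stand-ins for my real data and objective):

opt = qml.GradientDescentOptimizer(stepsize=0.2)

def cost(params):
    # toy objective: push the kernel value between two fixed points towards 1
    return -kernel(x1, x2, params)

for _ in range(10):
    params = opt.step(cost, params)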

Oh that's awesome! Glad you were able to get this resolved :grin:. If there's anything else, do let us know!