Running apply_operation on Lightning device

I have searched the internet, this forum, ChatGPT, etc., for a way to speed up apply_operation by running it on a Lightning device, but couldn’t find one.

Here is my minimal reproducer:

from pennylane import numpy as np
from pennylane.devices.qubit import apply_operation

import pennylane as qml


ket = np.asarray(
    [
        [9.82601808e-01 - 0.14850574j, 9.85890302e-02 + 0.01490027j],
        [7.45635195e-04 + 0.00493356j, 7.43148086e-03 - 0.04917107j],
    ]
)
op = qml.RX(np.array(0.4, requires_grad=True), wires=1)
dU = qml.QubitUnitary(qml.operation.operation_derivative(op), op.wires)

print(apply_operation(dU, ket))

dev = qml.device("lightning.qubit", wires=2)

@qml.qnode(dev)
def circuit(state):
    qml.QubitStateVector(state.reshape(-1), range(2))
    qml.apply(dU)
    return qml.state()

print(circuit(ket))

I tried to speed things up by wrapping the dU operation inside a circuit function, but I’m not sure it is worth it. I haven’t tested the performance yet because the two approaches do not give matching answers.
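For reference, the two computations should agree mathematically: acting with a single-qubit matrix M on wire 1 of a (2, 2) state tensor is the same as multiplying the flattened state by I ⊗ M. Below is a NumPy-only sketch of that equivalence. It does not call PennyLane or Lightning, the helper name apply_on_wire1 is made up for illustration, and it uses the plain RX matrix rather than dU.

```python
import numpy as np


def apply_on_wire1(matrix, state):
    """Apply a 2x2 matrix to wire 1 (the last axis) of a (2, 2) state tensor."""
    # new_state[a, b] = sum_c matrix[b, c] * state[a, c]
    return np.einsum("bc,ac->ab", matrix, state)


theta = 0.4
c, s = np.cos(theta / 2), np.sin(theta / 2)
rx = np.array([[c, -1j * s], [-1j * s, c]])  # matrix of RX(theta)

ket = np.array(
    [
        [9.82601808e-01 - 0.14850574j, 9.85890302e-02 + 0.01490027j],
        [7.45635195e-04 + 0.00493356j, 7.43148086e-03 - 0.04917107j],
    ]
)

# Tensor form: act on the last axis only.
tensor_result = apply_on_wire1(rx, ket)
# Flat form: wire 0 is the most significant index, so the full matrix is I (x) RX.
flat_result = np.kron(np.eye(2), rx) @ ket.reshape(-1)

print(np.allclose(tensor_result.reshape(-1), flat_result))
```

If the same comparison fails for a different matrix on a real device, the difference is more likely to come from how that device treats the operation than from the reshaping.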

Noting that the reason they don’t match is the requires_grad=True in the qml.RX param.

Hello @rht, the documentation for RX is on the qml.RX page of the PennyLane 0.31.0 documentation.

There are PennyLane demos that easily switch from default.qubit to lightning.qubit, which should help with the overall code.

The requires_grad=True is set on the tensor parameter (0.4) passed to qml.RX, so it is indirect rather than a direct parameter of qml.RX.

There are PennyLane demos that easily switch from default.qubit to lightning.qubit, which should help with the overall code.

I’m aware of this in the context of a circuit decorator. My question is about speeding up standalone apply_operation via Lightning.

Hello @rht ! Welcome to the forum!

Noting that the reason they don’t match is the requires_grad=True in the qml.RX param.

Are you sure about that? I just took your code example above, and changing to requires_grad=False didn’t make much difference.

Besides, dU is not a unitary operation, so I suspect this is the cause of the divergence in your code. :thinking:
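To make the non-unitarity concrete: since RX(θ) = exp(-iθX/2), its derivative with respect to θ is (-i/2) X RX(θ), and so dU†dU = I/4 rather than I. Here is a NumPy-only sketch of that check. It assumes operation_derivative returns this matrix derivative and does not call PennyLane itself.

```python
import numpy as np

theta = 0.4
c, s = np.cos(theta / 2), np.sin(theta / 2)

# RX(theta) = exp(-i * theta * X / 2)
rx = np.array([[c, -1j * s], [-1j * s, c]])

# Derivative of RX with respect to theta: (-i / 2) * X @ RX(theta)
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
d_rx = -0.5j * pauli_x @ rx

# For a unitary matrix this product would be the identity.
gram = d_rx.conj().T @ d_rx
is_unitary = np.allclose(gram, np.eye(2))

print(is_unitary)
print(np.round(gram.real, 6))
```

Because the product is I/4, a state with unit norm comes out of dU with norm 1/2, which is not something a circuit of unitary gates can produce.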

Hmmm, I am a bit confused about what you are trying to do. Could you give me a little more context?

My question is about speeding up standalone apply_operation via Lightning.

If I understand it correctly, I can say that the short answer to your question is no. :face_with_diagonal_mouth:

Basically, Lightning applies operations in a completely different way and uses different functions to do so.

Therefore, I would suggest using Lightning directly, as you did. However, since dU is not unitary, it is not exactly a valid operation in a circuit.

Anyway, if you would like to suggest enhancements or contribute directly, feel free to check the GitHub repos and open an issue or a PR. :slight_smile:

Does it help? :slight_smile:

Thank you for the clarifying replies so far.

If I understand it correctly, I can say that the short answer to your question is no. :face_with_diagonal_mouth:

I figured out a way to do the equivalent of apply_operation via Lightning-GPU:

def gpu_apply_operation(op, input_array, output_array):
    dev_gpu.apply(
        [qml.QubitStateVector(input_array, range(dev_gpu.num_wires)), op]
    )
    dev_gpu.syncD2H(output_array)

And so, my main question has been partially answered, except that I have to redefine the device with dev_gpu = qml.device("lightning.gpu", wires=num_wires) every time I call gpu_apply_operation(dU, input, output). Apparently dev_gpu.reset() is not sufficient, and there is probably some hidden state that accumulates.

Regarding the secondary problem, it does indeed seem to be mainly due to dU not being unitary. The error is silent, because there is no unitarity check on the ops inside the circuit function.
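One way to surface this kind of silent error is to check a matrix explicitly before handing it to a circuit. This is a minimal NumPy sketch of such a guard. The helper name require_unitary and its tolerance are hypothetical and not part of PennyLane.

```python
import numpy as np


def require_unitary(matrix, atol=1e-8):
    """Return the matrix unchanged, or raise ValueError if it is not unitary.

    Hypothetical helper for illustration; not a PennyLane function.
    """
    matrix = np.asarray(matrix)
    identity = np.eye(matrix.shape[0])
    if not np.allclose(matrix.conj().T @ matrix, identity, atol=atol):
        raise ValueError("matrix is not unitary")
    return matrix


hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
require_unitary(hadamard)  # a genuine unitary passes silently

try:
    require_unitary(0.5 * hadamard)  # rescaled, so no longer unitary
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

Calling the guard on the matrix of dU before building the QubitUnitary would turn the mismatch into an explicit exception.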

Hi @rht

While you can technically use the lightning device this way by directly calling through the Python-C++ bindings to the device, I would recommend keeping to the PennyLane UI.

Calling the device directly makes many assumptions and isn’t a supported way to use it, as it circumvents a lot of the existing PennyLane logic layers around gradients, Hamiltonian support, and output formats.

If you simply need to apply gates on the C++ device, you can create the device and apply them as follows:

dev = qml.device("lightning.qubit", wires=10)
dev.apply(...)

While the above can work, we recommend using the qnode approach where possible, which ensures efficient queuing of the operation, tracks any additional internal operations required for correct output given the basis your operation can apply in, and ensures end-to-end performance.

Also, note that using lightning.gpu with the above approach will cause everything to sync back to the host, and you will lose any performance gains. If you want to define a state-vector input for lightning.gpu, I recommend using the qnode directly and avoiding raw calls to the device.
