I have searched the internet, this forum, ChatGPT, etc., for a way to speed up apply_operation by running it on a Lightning device, but couldn't find one.
Here is my minimal reproducer:
from pennylane import numpy as np
from pennylane.devices.qubit import apply_operation
import pennylane as qml
ket = np.asarray(
[
[9.82601808e-01 - 0.14850574j, 9.85890302e-02 + 0.01490027j],
[7.45635195e-04 + 0.00493356j, 7.43148086e-03 - 0.04917107j],
]
)
op = qml.RX(np.array(0.4, requires_grad=True), wires=1)
dU = qml.QubitUnitary(qml.operation.operation_derivative(op), op.wires)
print(apply_operation(dU, ket))
dev = qml.device("lightning.qubit", wires=2)
@qml.qnode(dev)
def circuit(state):
    qml.QubitStateVector(state.reshape(-1), range(2))
    qml.apply(dU)
    return qml.state()
print(circuit(ket))
I attempted to speed things up by wrapping the dU operation inside a circuit function, but I'm not sure it is worth it. I haven't tested the performance because the two approaches do not give matching answers.
Hmm, I am a bit confused about what you are trying to do. Could you give me a little more context?
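As a device-independent cross-check for the two outputs above, here is a minimal NumPy sketch of what applying a single-qubit matrix to wire 1 of a two-qubit state means. The helper name `apply_to_wire1` is hypothetical (it is not part of PennyLane), and the matrix is written out by hand as the analytic derivative of RX, which I am assuming is what `operation_derivative` returns for this op.

```python
import numpy as np

theta = 0.4
c, s = np.cos(theta / 2), np.sin(theta / 2)

# Analytic d/dtheta of RX(theta) = [[c, -1j*s], [-1j*s, c]].
# Assumption: this matches qml.operation.operation_derivative(op) for RX.
dU_matrix = 0.5 * np.array([[-s, -1j * c], [-1j * c, -s]])

ket = np.array(
    [
        [9.82601808e-01 - 0.14850574j, 9.85890302e-02 + 0.01490027j],
        [7.45635195e-04 + 0.00493356j, 7.43148086e-03 - 0.04917107j],
    ]
)


def apply_to_wire1(matrix, state):
    """Contract a 2x2 matrix with the wire-1 axis of a (2, 2) state tensor."""
    # state[a, b]: a indexes wire 0, b indexes wire 1.
    return np.einsum("bc,ac->ab", matrix, state)


reference = apply_to_wire1(dU_matrix, ket)

# Same operation on the flattened state: kron(I, matrix) acting on a length-4 vector.
flat = np.kron(np.eye(2), dU_matrix) @ ket.reshape(-1)

print(reference)
print(np.allclose(reference, flat.reshape(2, 2)))
```

Whichever of the two PennyLane outputs agrees with `reference` is the one that applies the matrix as given. The norm of `reference` is also worth a look, since a unitary operation would leave it at 1.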
My question is about speeding up standalone apply_operation via Lightning.
If I understand correctly, the short answer to your question is no.
Basically, Lightning applies operations in a completely different way and uses its own functions to do so.
Therefore, I would suggest using Lightning directly, as you did. However, since dU is not unitary, it is not strictly a valid operation in a circuit.
Anyway, if you would like to suggest enhancements or contribute directly, feel free to check the GitHub repos and open an Issue or a PR.
And so, my main question has been partially answered, except that I have to redefine the device dev_gpu = qml.device("lightning.gpu", wires=num_wires) every time I call gpu_apply_operation(dU, input, output). Apparently dev_gpu.reset() is not sufficient, and there is probably some hidden state that accumulates.
Regarding the secondary problem, it does indeed seem to be mainly due to dU not being unitary. The error is silent, because there is no unitarity check on the ops inside the circuit function.
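To make the non-unitarity concrete, here is a small NumPy-only check, written as a sketch under the assumption that `operation_derivative` returns the analytic derivative of the RX matrix. It also suggests that, for RX specifically, the derivative is one half of a unitary, namely RX(theta + pi) / 2, so in principle one could apply the unitary RX(theta + pi) in a circuit and rescale the output state by 0.5. The helper `rx_matrix` is hypothetical and only mirrors the textbook RX definition.

```python
import numpy as np


def rx_matrix(theta):
    """Textbook RX(theta) = exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


theta = 0.4
c, s = np.cos(theta / 2), np.sin(theta / 2)

# Analytic derivative of RX(theta) with respect to theta.
dU_matrix = 0.5 * np.array([[-s, -1j * c], [-1j * c, -s]])

# Sanity check of the hand-written derivative against a central finite difference.
eps = 1e-6
finite_diff = (rx_matrix(theta + eps) - rx_matrix(theta - eps)) / (2 * eps)

# A unitary would give the identity here; the derivative does not.
gram = dU_matrix.conj().T @ dU_matrix
is_unitary = np.allclose(gram, np.eye(2))

# For RX, the derivative is a rescaled, shifted rotation.
is_half_shifted_rx = np.allclose(dU_matrix, 0.5 * rx_matrix(theta + np.pi))

print(np.allclose(dU_matrix, finite_diff))
print(is_unitary)
print(gram.real)
print(is_half_shifted_rx)
```

This rescaling trick is specific to gates whose derivative happens to be proportional to a unitary. For a general operation the derivative may not have that form, so treat it as an illustration rather than a general recipe.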
While you can technically use the lightning device this way by directly calling through the Python-C++ bindings to the device, I would recommend keeping to the PennyLane UI.
Calling the device directly makes many assumptions, and isn't a supported way to use the device, as it circumvents a lot of the existing PennyLane logic layers around gradients, Hamiltonian support, and output formats.
If you simply need to apply gates on the C++ device, creating the device and applying the operations can be done as follows:
dev = qml.device("lightning.qubit", wires=10)
dev.apply(...)
While the above can work, we recommend using the QNode approach where possible, which ensures efficient queuing of the operations, tracks any additional internal operations required for correct output given the basis your operation applies in, and ensures end-to-end performance.
Also, note that with lightning.gpu, the above approach will cause everything to sync back to the host, and you will lose any performance gains. If you want to define a state-vector input for lightning.gpu, I recommend using the QNode directly and avoiding raw calls to the device.