Hello,

Are there plans to support the backprop differentiation method with lightning.gpu? I can use lightning.gpu with adjoint diff and all is good. In my use case I have lots of memory and care about speed, so it would be nice to have lightning.gpu with the backprop diff method.
P.S. In some of my tests I see default.qubit + backprop being much faster than lightning.gpu + adjoint up until 18 qubits. Knowing how much faster lightning.gpu is than default.qubit on regular simulations, I would expect lightning.gpu + backprop to be much, much faster than lightning.gpu + adjoint.
P.P.S. Actually, it seems like adjoint doesn’t scale well with the number of output observables in the circuit:

- When I only have 1 observable expectation as the output of my quantum layer, lightning.gpu + adjoint is ~10x faster than default.qubit + backprop.
- However, when I have 18 observables as the output, default.qubit + backprop only slows down by ~3x, while lightning.gpu + adjoint slows down by ~30x, and they end up being roughly the same.
Hey @Hayk_Tepanyan!
In some of my tests I see default.qubit + backprop being much faster than lightning.gpu + adjoint up until 18 qubits.
Typically you’ll see improvements for >20 qubits with lightning.gpu. There are overheads that make it slower for smaller systems. If it isn’t faster for you in those regimes, it would be great if we could see your code and package versions to see what’s going on.
Currently we don’t support backprop here, as you mentioned. I’m not sure that this is on the horizon for us, but you can make a feature request to our GitHub repository if you want to put it on our radar more formally.
Hope this helps!
Thanks @isaacdevlugt,

We do see faster runtimes with lightning.gpu for >20 qubits, as you suggested.
On a related note, is there a way to configure lightning.gpu such that it can “treat” thousands of small circuits as one large circuit? In other words, if instead of a single 20-qubit circuit I have 1024 10-qubit circuits, all based on the same parametrized QNode just with different params, is there a way to use lightning.gpu here and achieve a similar runtime to a single 20-qubit circuit?
This scenario is very common in QML use cases, where the 1024 above is the batch_size. Currently we see a clear linear dependence of runtime on batch_size when using GPUs, as opposed to classical ML cases.
Nice!
On a related note, is there a way to configure lightning.gpu such that it can “treat” thousands of small circuits as 1 large circuit?
Yep! I think it makes more sense based on how our GPU-distributed support works to think about it as one circuit parallelized over many GPUs. Check out our blog post for more info on how to do that: PennyLane v0.31 released | PennyLane Blog
Let me know if that helps!