Which device is fastest?

I have multiple questions:

  1. It is mentioned that lightning.qubit is faster than default.qubit. Is there still any reason to use default.qubit?

  2. There is also lightning.gpu now, which is faster than vanilla lightning. So if you have GPU support, should you always use this device?

  3. Using adjoint differentiation is recommended on lightning.qubit for faster results. I believe that is true for the GPU device as well. So is adjoint with lightning.gpu the recommended way of doing things?

  4. Where does JAX-JIT come into this? default.qubit.jax also supports GPU computation, and combining that with jitting makes it very fast. Is it faster than the above option? JAX-JIT does not support everything yet, but for the things that are supported, is it the fastest?

Of course, I can test these out myself, but I would like to know the official word on this. Also, what is the recommended diff method for the JAX device?

PS- All these devices only support pure states. We only have default.mixed for noise, which is painfully slow. I hope we can have these faster advancements for noisy circuits too!

Also, I hope the results when using any device with any diff method are identical?

Hi @ankit27kh, great questions.

  1. Some things that are supported on default.qubit (such as the ‘backprop’ diff_method) are not currently supported on lightning.qubit.
  2. For small numbers of qubits the CPU version might be faster. I will ask a colleague for a more detailed answer, but my intuition is that the GPU is usually faster, though not always.
  3. If you want to run on hardware then you will need to use the parameter-shift rule, which is much slower. Compared to backprop, adjoint can reduce memory consumption but increases the computation time. However, this memory reduction can let you simulate more qubits.
  4. Good question, I’ll get back to you on this one.
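To illustrate point 3, here is a minimal sketch of the two-term parameter-shift rule in plain Python, with no PennyLane involved. It assumes a toy one-qubit circuit, RX(theta) applied to |0> followed by a measurement of Z; the function names and the `calls` counter are hypothetical and only there for illustration. The counter shows that each parameter costs two extra circuit executions, which is roughly why the method gets slow as the number of parameters grows.

```python
import math

calls = 0  # counts simulated "circuit executions" (illustrative only)


def expval_z(theta):
    """<Z> for the state RX(theta)|0>, computed from its two amplitudes."""
    global calls
    calls += 1
    # RX(theta)|0> = cos(theta/2)|0> - i*sin(theta/2)|1>
    amp0 = math.cos(theta / 2)
    amp1 = -1j * math.sin(theta / 2)
    return abs(amp0) ** 2 - abs(amp1) ** 2


def parameter_shift_grad(f, theta):
    """Two-term parameter-shift rule for a gate generated by a Pauli operator."""
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2


theta = 0.4
grad = parameter_shift_grad(expval_z, theta)
exact = -math.sin(theta)  # analytic derivative of cos(theta)

print("parameter-shift:", grad)
print("analytic:       ", exact)
print("circuit executions used:", calls)
```

With many trainable parameters the same rule is applied once per parameter, so the number of executions scales with twice the parameter count, whereas backprop and adjoint obtain all gradients in a single simulated pass.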

PS - Thanks for this feedback on the noise issue! Hopefully more developments will come that way.

When using analytical differentiation methods such as backprop and adjoint, the results should be identical, but in practice numerical precision or other issues can cause small differences. I would recommend trying out different device + diff_method combinations when you write your code, just in case.

Hi @ankit27kh

Here are more details provided by another member of the PennyLane team.

Currently lightning.gpu is faster only for more than 20 qubits (for a single CPU core vs a single GPU). jax.jit is currently the fastest for small numbers of qubits (usually fewer than 12) on CPU. For the GPU, we don’t have any benchmark results (vs lightning.gpu) yet, so it’s not clear. Still, we expect that any backpropagation-based gradient method will not work for more than 20 qubits, as it will exhaust the available memory.
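A rough back-of-the-envelope sketch of why memory becomes the bottleneck for backprop: a dense statevector holds 2**n complex amplitudes, and backprop has to keep intermediate results around for the backward pass. The code below assumes complex128 amplitudes (16 bytes each) and, as a deliberate simplification, one stored intermediate state per gate. The 1000-gate figure is a hypothetical example, and real simulators will differ in the details.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for one dense statevector of 2**n complex128 amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


def backprop_bytes_estimate(n_qubits, n_gates):
    """Crude estimate ASSUMING one stored intermediate state per gate."""
    return n_gates * statevector_bytes(n_qubits)


n_gates = 1000  # hypothetical circuit size, for illustration only
for n in (12, 20, 24):
    one_state_mib = statevector_bytes(n) / 2**20
    stored_gib = backprop_bytes_estimate(n, n_gates) / 2**30
    print(f"{n} qubits: {one_state_mib:.3f} MiB per state, "
          f"~{stored_gib:.2f} GiB for {n_gates} stored states")
```

Adjoint differentiation avoids storing one state per gate, which is consistent with the earlier remark that it trades some extra computation for lower memory use.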

Hi @CatalinaAlbornoz
I implemented a 6-qubit QAOA with both default.qubit.jax and default.qubit, and found that the JAX device is much slower than default.qubit, which was unexpected.

Hey @Yang! Welcome back!

It’s tough to say what’s going on here without seeing your code and your package versions. Could you attach a complete working example along with the output of qml.about()?

Another thing to note here is that default.qubit.jax uses our old device API. Using default.qubit will now seamlessly switch you to the right backend, including JAX. That might be behind the slowdown!
