PennyLane performance vs alternatives?

Some recent QML frameworks claim better performance than PennyLane:

  1. TorchQuantum claims a 246x speedup over PennyLane (CPU-only) in Figure 12 of their paper
  2. TensorCircuit also claims 10-100x speedups over PennyLane (both CPU and GPU) in Tables 6, 7, and 8 of their paper

I’m curious what the PennyLane developers think about these benchmarking claims, and whether there have been any significant performance gains in PennyLane since these papers came out.

Hey @schance995!

Interesting question. The short answer with a lot of comparative studies like these is that (1) they aren’t using the other libraries as efficiently as they could and (2) this is a two-year-old paper! A lot has improved in PennyLane since then :sweat_smile:. The long answer:

The first citation (i.e. Fig. 12) deals with training over data, i.e., the cost function is a sum over many data points, so cost(x) = sum_data cost(x, data). PennyLane can indeed be slow with a naive for-loop implementation in these cases (it is much better when used with Catalyst, though). However, using things like vmap speeds this up significantly, which is most likely what they are doing and what they mean by tensorized (vectorized) batch processing. Besides that, they seem to use parameter-shift, which further explains the huge performance gap.
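For concreteness, here is a minimal sketch of the loop-vs-batched difference, assuming default.qubit with the JAX interface (the circuit, shapes, and data are hypothetical, and exact vmap support can vary by PennyLane version):

```python
import pennylane as qml
import jax
import jax.numpy as jnp

# Hypothetical toy setup: 2 qubits, data encoded with AngleEmbedding
dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, interface="jax")
def circuit(weights, x):
    qml.AngleEmbedding(x, wires=[0, 1])
    qml.StronglyEntanglingLayers(weights, wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

weights = jnp.ones((1, 2, 3))   # (layers, wires, 3) for StronglyEntanglingLayers
data = jnp.ones((100, 2))       # 100 data points with 2 features each

# Naive approach: Python for-loop, one circuit evaluation per data point
def loop_cost(w):
    return sum(circuit(w, x) for x in data)

# Batched approach: vmap over the data axis, one vectorized evaluation
batched_circuit = jax.vmap(circuit, in_axes=(None, 0))

def batched_cost(w):
    return jnp.sum(batched_circuit(w, data))

grad = jax.grad(batched_cost)(weights)
```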

> Moreover, Pennylane can only use parameter shift to obtain gradients, which is inherently sequential, so no parallelization on batch and gate dimension can be achieved.

This is not generally true :point_up:. There are many differentiation methods available to use with PennyLane :slight_smile:.
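As an illustration (not exhaustive), the differentiation strategy is just a keyword on the QNode; which methods are available depends on the device and interface:

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("lightning.qubit", wires=2)

# "adjoint" here, but "parameter-shift", "finite-diff", or "backprop"
# (on default.qubit) are other options depending on the device
@qml.qnode(dev, diff_method="adjoint")
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

params = np.array([0.1], requires_grad=True)
grad = qml.grad(circuit)(params)
```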

Similar story for the second citation (Tables 6, 7, and 8), though they also seem to compare just value_and_grad evaluations on lightning and (I guess?) on default.qubit + JAX on GPU.
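For reference, this is roughly the kind of call being timed there; a sketch assuming default.qubit with the JAX interface and a made-up two-qubit circuit:

```python
import pennylane as qml
import jax
import jax.numpy as jnp

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, interface="jax")
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

params = jnp.array([0.1, 0.2])

# Cost and gradient in a single call, as in the benchmarked workload
value, grad = jax.value_and_grad(circuit)(params)
```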

Hope that helps!


Good points! I hope that future PennyLane and Catalyst benchmarks can be more consistent in their comparisons with other libraries.