Regarding Amazon's local simulator run time

I am running a variation of the max-cut problem using QAOA. For N=6 and p=1 layer, PennyLane’s local simulator gets the job done in under 10 minutes (~300 steps). Here’s the device definition:

dev = qml.device("default.qubit", wires=n_wires, analytic=True, shots=1)

On the other hand, when I use the Braket local simulator, each step takes about a minute or so. Here’s my device definition:

dev = qml.device("braket.local.qubit", wires=n_wires, shots=1)

What are the plausible reasons for this gap? Since both are local simulators, the only difference I can see is Braket’s simulator backend itself, which cannot be helped. But is there anything else that comes to mind?

Hi @kabirkhanna85, thanks for sharing your results!

This behaviour is to be expected, as there is an inherent performance cost associated with interfacing PennyLane with external devices. We are continuously working to minimise this! For now, default.qubit is well optimised, so it is a strong choice. A notable exception to this rule is the qulacs.simulator device, which is generally very fast!
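For example, swapping devices is a one-line change (this assumes the PennyLane-Qulacs plugin is installed, and n_wires is your existing variable):

dev = qml.device("qulacs.simulator", wires=n_wires)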

If you are interested in performing QAOA on remote devices and are finding it takes too long to run the full algorithm, a useful hybrid solution would be to train your parameters on default.qubit, save them and load them for circuit evaluation on a plugin device.
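As a rough sketch of that workflow (qaoa_circuit, cost_hamiltonian, init_params and n_wires stand in for whatever you already have, and the optimiser settings are just placeholders):

from pennylane import numpy as np
import pennylane as qml

# Train on the fast local simulator
train_dev = qml.device("default.qubit", wires=n_wires)
cost_fn = qml.ExpvalCost(qaoa_circuit, cost_hamiltonian, train_dev)

opt = qml.GradientDescentOptimizer(stepsize=0.1)
params = init_params
for _ in range(300):
    params = opt.step(cost_fn, params)

# Save the trained parameters to disk
np.save("qaoa_params.npy", params)

# Later: load them and evaluate the circuit on the plugin device
eval_dev = qml.device("braket.local.qubit", wires=n_wires)
eval_cost = qml.ExpvalCost(qaoa_circuit, cost_hamiltonian, eval_dev)
final_value = eval_cost(np.load("qaoa_params.npy"))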

Let us know if you have any more questions!

Thanks for the hybrid approach. I shall try it out.

However, I did try qulacs.simulator, and it is not nearly as fast as default.qubit on my algorithm, although it is definitely faster than braket.local.qubit. Is there anything wrong on my end, since you stated otherwise?

Hi @kabirkhanna85, thanks for the update!

Thinking about it, I had a similar experience recently: I swapped a qulacs.simulator device for a default.qubit device and found the optimisation step of QAOA was faster on default.qubit. So I don’t think this is an issue with your local code. It is an interesting result which we will look into further and share the details with you.

It may simply be a consequence of our recent effort to speed up the default behaviour in PennyLane.

Hi @kabirkhanna85, we have investigated this further and found that the apparent difference in speed is due to the default diff_method associated with the qml.ExpvalCost() function.

When calling this on default.qubit, the default is diff_method='backprop', whereas on qulacs.simulator the default is diff_method='parameter-shift', which is much slower.
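If you want to compare the two devices on an equal footing, you can set the method explicitly when building the cost function. A minimal sketch, assuming qaoa_circuit and cost_hamiltonian are your ansatz and cost Hamiltonian, and that the diff_method keyword is passed through to the underlying QNodes:

import pennylane as qml

dev = qml.device("qulacs.simulator", wires=n_wires)

# Pin the differentiation method rather than relying on the device default.
# backprop is only available on simulators that support it (e.g. default.qubit),
# so parameter-shift is chosen explicitly here.
cost_fn = qml.ExpvalCost(qaoa_circuit, cost_hamiltonian, dev, diff_method="parameter-shift")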

Hope this has shed some light on the situation. Let us know if you have any more questions!

For more details on the available differentiation methods, where they can be used, and how they differ, check out the gradients page in our documentation: https://pennylane.readthedocs.io/en/stable/introduction/interfaces.html

You are right. This is possibly why there’s such a difference. Thank you for looking into it! :slight_smile:

Thank you for the reference @josh! :slight_smile:

Glad we could help, let us know if you have any more questions!

Hey, another quick issue. For some reason, Amazon’s SV1 simulator takes 5-10 minutes for 5 steps in a basic 1-layer QAOA from PennyLane’s beginner demo. Do you think it might be because of the queuing system for tasks on simulators? Or is it generally almost instant, and there might be something else wrong? Please let me know what you think.

Hey @kabirkhanna85,

In these early days of quantum machine learning and quantum computing, remote simulation pipelines are still very slow.

Basically, PennyLane creates the circuits needed to compute the gradient of an optimisation step and sends them in a batch to the Amazon Braket service, where they get queued and simulated in parallel; the results are stored in an S3 bucket, retrieved, and sent back to your computer. The overhead of this pipeline is still large.
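For reference, the remote device setup looks roughly like this. The bucket name and prefix are placeholders for your own S3 location, and parallel=True asks the plugin to submit the batch of gradient circuits in parallel:

import pennylane as qml

dev = qml.device(
    "braket.aws.qubit",
    device_arn="arn:aws:braket:::device/quantum-simulator/amazon/sv1",
    wires=n_wires,
    s3_destination_folder=("my-example-bucket", "qaoa-results"),  # placeholder bucket/prefix
    parallel=True,  # submit the gradient circuits as a parallel batch
)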

Having said that, the power of the SV1 means that the simulation itself is very fast. In other words, you will find that the runtime hardly depends on the number of qubits (until you reach very large numbers of around 30). So there is a point where the remote simulation will beat your local one!

The good news is that speeding up this pipeline is one of the top priorities for the PennyLane dev team at the moment, and we hope to improve things a lot - so watch this space :slight_smile: