Regarding Amazon's local simulator run time

kabirkhanna85 · April 19, 2021, 7:35am

I am running a variation of the max-cut problem using QAOA. For N=6 and p=1 layer, PennyLane’s local simulator gets the job done in under 10 minutes (~300 steps). Here’s the device definition:

dev = qml.device("default.qubit", wires=n_wires, analytic=True, shots=1)

On the other hand, when I use the Braket local simulator, each step takes about a minute or so. Here’s my device definition:

dev = qml.device("braket.local.qubit", wires=n_wires, shots=1)

What are the plausible reasons for this mismatch? Since both are local simulators, the only difference seems to be Braket’s simulator backend, which cannot be helped. But is there anything else that comes to mind?

Ant_Hayes · April 19, 2021, 5:27pm

Hi @kabirkhanna85, thanks for sharing your results!

This behaviour is to be expected, as there is an inherent performance cost associated with interfacing PennyLane with external devices. We are continuously working to minimise this! For now, default.qubit is well optimised, so it is a strong choice. A notable exception to this rule is the qulacs.simulator device, which is generally very fast!

If you are interested in performing QAOA on remote devices and are finding it takes too long to run the full algorithm, a useful hybrid solution would be to train your parameters on default.qubit, save them and load them for circuit evaluation on a plugin device.

Let us know if you have any more questions!

kabirkhanna85 · April 20, 2021, 5:52am

Thanks for the hybrid approach. I shall try it out.

However, I did try qulacs.simulator, and it is not nearly as fast as default.qubit on my algorithm, though it is definitely faster than braket.local.qubit. Could something be wrong on my end, since you suggested otherwise?

Ant_Hayes · April 20, 2021, 10:00am

Hi @kabirkhanna85, thanks for the update!

Thinking about it, I had a similar experience recently when I swapped a qulacs.simulator device for a default.qubit device and found that the QAOA optimisation step was faster on default.qubit. So I don’t think this is an issue with your local code; it is an interesting result, which we will look into further, and we can share the details with you.

It may simply be a consequence of our recent effort to speed up the default behaviour in PennyLane.

Ant_Hayes · April 20, 2021, 7:43pm

Hi @kabirkhanna85, we have investigated this further and found that the apparent difference in speed is due to the default diff_method associated with the qml.ExpvalCost() function.

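When it is called on default.qubit, the default is diff_method='backprop', whereas on qulacs.simulator the default is diff_method='parameter-shift', which is much slower here: parameter-shift evaluates two shifted circuits for every trainable gate parameter, while backpropagation obtains the whole gradient from a single simulation.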

Hope this has shed some light on the situation. Let us know if you have any more questions!

josh · April 21, 2021, 5:34am

For more details on the available differentiation methods, where they can be used, and how they differ, check out the gradients page in our documentation: https://pennylane.readthedocs.io/en/stable/introduction/interfaces.html
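To make the cost difference concrete, here is a small NumPy-only sketch of the two-term parameter-shift rule for Pauli-rotation gates, f'(θ) = [f(θ + π/2) − f(θ − π/2)] / 2. The toy cost (independent qubits, each prepared as RX(θ_i)|0⟩ and measured in Z, so f = Σ cos θ_i) is a hypothetical example chosen because its exact gradient, −sin θ_i, is easy to check. The call counter shows that each trainable parameter costs two circuit evaluations.

```python
import numpy as np

Z = np.diag([1.0, -1.0])

def rx(theta):
    """Single-qubit RX(theta) = exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

calls = 0  # counts "circuit evaluations"

def cost(params):
    """Toy cost: sum of <Z> over independent qubits, each in RX(theta_i)|0>."""
    global calls
    calls += 1
    total = 0.0
    for theta in params:
        psi = rx(theta) @ np.array([1.0, 0.0])
        total += np.real(np.conj(psi) @ Z @ psi)
    return total

def parameter_shift_grad(f, params, shift=np.pi / 2):
    """Two-term parameter-shift gradient: two evaluations of f per parameter."""
    grad = np.zeros_like(params, dtype=float)
    for i in range(len(params)):
        plus, minus = params.copy(), params.copy()
        plus[i] += shift
        minus[i] -= shift
        grad[i] = (f(plus) - f(minus)) / (2 * np.sin(shift))
    return grad

params = np.array([0.1, 0.7, 1.3, 2.0])
calls = 0
g = parameter_shift_grad(cost, params)
print(g, calls)  # calls == 2 * len(params)
```

Backpropagation, by contrast, reuses the intermediate states of one simulation, so its cost does not grow with the number of parameters in the same way; it only works on simulators that expose those intermediate states, which is why default.qubit can use it and most plugin devices cannot.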

kabirkhanna85 · April 21, 2021, 8:41am

You are right. This is possibly why there’s such a difference. Thank you for looking into it!

kabirkhanna85 · April 21, 2021, 8:42am

Thank you for the reference @josh!

Ant_Hayes · April 21, 2021, 10:12am

Glad we could help, let us know if you have any more questions!

kabirkhanna85 · May 12, 2021, 3:57am

Hey, another quick issue. For some reason, Amazon’s SV1 simulator takes 5-10 minutes for 5 steps of a basic 1-layer QAOA from PennyLane’s beginner demo. Do you think it might be because of the queuing system for tasks on simulators? Or is it usually almost instant, meaning something else might be wrong? Please let me know what you think.

Maria_Schuld · May 12, 2021, 8:00am

Hey @kabirkhanna85,

In these early days of quantum machine learning and quantum computing, remote simulation pipelines are still very slow.

Basically, PennyLane creates the circuits needed to compute the gradient for an optimisation step and sends them in a batch to the Amazon Braket server, where they are queued and simulated in parallel; the results are then stored in a bucket, retrieved, and sent back to your computer. The overhead of this pipeline is still large.

Having said that, the power of SV1 means that the simulation itself is very fast. In particular, you will find that the runtime hardly depends on the number of qubits (until you reach very large numbers, around 30 qubits). So there is a point where the remote simulation will beat your local one!
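The trade-off described above can be illustrated with a back-of-the-envelope model. Everything here is an assumption for illustration: local simulation time is taken to scale with the 2**n statevector amplitudes, the remote side is modelled as a fixed round-trip overhead plus batches of circuits run in parallel (ignoring its qubit dependence, as the post suggests), and every constant below is a made-up placeholder, not a measurement of SV1 or any local machine.

```python
import math

def local_step_seconds(n_qubits, n_circuits, sec_per_amplitude):
    """Local statevector cost: grows with the 2**n amplitudes of each circuit (gate count ignored)."""
    return n_circuits * sec_per_amplitude * 2 ** n_qubits

def remote_step_seconds(n_circuits, overhead_s, sec_per_circuit, max_parallel):
    """Remote cost: fixed submit/queue/retrieve overhead plus parallel batches of circuits."""
    return overhead_s + math.ceil(n_circuits / max_parallel) * sec_per_circuit

def crossover_qubits(n_circuits, sec_per_amplitude, overhead_s,
                     sec_per_circuit, max_parallel, n_max=40):
    """Smallest qubit count at which the local step becomes slower than the remote one."""
    remote = remote_step_seconds(n_circuits, overhead_s, sec_per_circuit, max_parallel)
    for n in range(1, n_max + 1):
        if local_step_seconds(n, n_circuits, sec_per_amplitude) > remote:
            return n
    return None  # no crossover within n_max qubits

# Placeholder constants for illustration only (NOT measured values).
n = crossover_qubits(n_circuits=10, sec_per_amplitude=1e-7,
                     overhead_s=60.0, sec_per_circuit=5.0, max_parallel=10)
print(n)
```

The qualitative point is that a larger fixed overhead pushes the crossover to more qubits, while the exponential growth of local simulation guarantees a crossover eventually exists.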

The good news is that speeding up this pipeline is one of the top priorities for the PennyLane dev team at the moment, and we hope to improve things a lot, so watch this space!
