Tips on how to improve performance on IBM hardware

andrew · August 12, 2020, 12:57pm

Hello,
I am trying to get some examples (specifically, variations of the variational quantum classifier) working on IBM hardware. I am managing to send jobs to the simulated hardware, and “real” stuff too. However, the issue I am finding is that training time is extremely long.

I am assuming therefore, at the moment, it is only feasible to train very small models (~1000 training points, very small network architectures). Is this a correct assumption?

Does anyone have any suggestions about how to reduce the time it takes to train a model (the value I should use for shots, max amount of data, max network size, etc.)? Is it worth dropping validation data (I do not mean test data) from the training loop to reduce training time? Should I view this as a transfer learning-esque problem, where I train a network classically, then add a very small quantum circuit to the end, to take some pressure off the quantum network and hopefully reduce training time?

I am just curious to see how people are getting around what seems to be long network training times, even using ibmq_qasm_simulator.

Thanks! I’m looking forward to using the package more.

EDIT:
To be more specific, I have a variational quantum classifier with 2 qubits and 3 layers. I tried training for only 1 epoch, on just 1 piece of data. In this example, it sent ibmq_qasm_simulator around 36 jobs and took around 230 seconds. This is just for the optimisation step as well. Are these the sort of numbers I should be expecting to see? Is there a way I can calculate by hand how many jobs it will send?

Alain_Delgado_Gran · August 12, 2020, 8:57pm

Hi @andrew,

Thank you for your questions and welcome to the Xanadu discussion forum.

Please find below some recommendations that you may find useful:

Train on a simulator locally, not a cloud simulator.
Use IBM hardware that supports reservations rather than a queuing system (not sure if this is publicly available to everyone)
If you can’t reserve, choose a device that’s got an empty queue - sounds obvious but it really helps.
The number of shots noticeably changes the training time; using 1000 shots may be a reasonable trade-off between accuracy and speed.
Validation can definitely slow things down on each step, especially if it’s over a big validation set.
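
To give a rough feel for the shots trade-off mentioned above, here is a minimal sketch (not PennyLane code) of the statistical error you can expect when an expectation value of a ±1-valued observable, such as a Pauli-Z measurement, is estimated from a finite number of shots. The function name shot_noise_std is made up for illustration, and the sketch ignores hardware noise entirely.

```python
import math


def shot_noise_std(expval, shots):
    """Standard error of the sample mean of a +/-1-valued measurement.

    A single shot has variance 1 - expval**2, so the mean over `shots`
    independent shots has standard deviation sqrt((1 - expval**2) / shots).
    """
    return math.sqrt((1.0 - expval ** 2) / shots)


# Worst case is expval = 0, where single-shot variance is largest.
for shots in (100, 1000, 10000):
    print(shots, round(shot_noise_std(0.0, shots), 4))
# prints:
# 100 0.1
# 1000 0.0316
# 10000 0.01
```

Going from 1000 to 10000 shots only reduces the error by a factor of about 3, which is one way to see why around 1000 shots can be a sensible middle ground.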

I hope these general recommendations help. Please do not hesitate to get back to us if you have further questions.

Xanadu team.

andrew · August 13, 2020, 1:15pm

Thank you very much for the response! I am glad to know that the numbers I am getting during training aren’t out of the ordinary. My aim, then, will be to use a small network that uses small amounts of test data (<1000 points). I will take into account all of your advice, thank you.

Is there an intuitive explanation as to why 1 datapoint, during training, results in >30 jobs on the IBM machine? Is it just that optimisation for each point is a process that requires a lot of calculations?

nathan · August 13, 2020, 1:43pm

Hi @andrew,

If you are doing optimization (even for one data point), PennyLane will compute the gradient with respect to all relevant model parameters. So the number of jobs scales with the number of parameters, not just the number of datapoints.

andrew · August 13, 2020, 2:02pm

Hello Nathan, thanks for the reply. Sorry, I think I may be missing something. If I have a network with 2 qubits and three layers, including a bias value, would I not have 7 trainable parameters, therefore only 7 jobs per point?

nathan · August 13, 2020, 4:49pm

Hi @andrew,

The default scaling for N variable parameters is something like 2 * N + 1 (2 * N for the gradient computation via the parameter-shift rule, and the 1 represents a single evaluation to compute the cost function).

This is a rough estimate, though. If you wanted a more detailed resource count, you’d have to share the explicit code you used.
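
To illustrate where the 2 * N comes from, here is a minimal plain-Python sketch of the parameter-shift rule. It is not how PennyLane implements it internally. The toy cost function stands in for a circuit whose output is sinusoidal in each parameter (here the product cos(θ0)·cos(θ1), which is what measuring Z⊗Z gives after applying an RX rotation to each of two qubits starting in |00⟩). The names toy_cost and parameter_shift_grad are hypothetical.

```python
import math


def toy_cost(params):
    """Stand-in for a circuit evaluation: <Z x Z> after RX(p0), RX(p1) on |00>."""
    return math.cos(params[0]) * math.cos(params[1])


def parameter_shift_grad(f, params, shift=math.pi / 2):
    """Gradient of f via the parameter-shift rule.

    Each parameter needs two evaluations of f (shifted up and down),
    so the gradient costs 2 * len(params) evaluations in total.
    """
    grads = []
    evaluations = 0
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += shift
        minus[i] -= shift
        grads.append((f(plus) - f(minus)) / 2.0)
        evaluations += 2
    return grads, evaluations


params = [0.3, 1.1]
grads, evaluations = parameter_shift_grad(toy_cost, params)
print(grads)        # matches the analytic gradient of cos(p0) * cos(p1)
print(evaluations)  # 4, i.e. 2 * N for N = 2 parameters
```

On a remote device, each of those evaluations is a circuit execution that may be sent as its own job, which is why the job count grows with the number of parameters even for a single data point.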

andrew · August 14, 2020, 11:42am

Thank you! I think I misinterpreted for a moment what was meant by trainable parameters - sorry. Each gate has three trainable parameters, so 2*N+1 makes sense to me now.
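
As a back-of-the-envelope check of the 2 * N + 1 estimate against the numbers earlier in this thread, here is a small sketch. It assumes one three-parameter gate per qubit per layer and one job per circuit evaluation; both are assumptions, since the actual code was not shared. The function name is made up for illustration.

```python
def estimate_circuit_evaluations(n_qubits, n_layers, params_per_gate=3):
    """Rough count of circuit evaluations for one optimisation step.

    Assumes one parametrised gate per qubit per layer, 2 evaluations per
    parameter for the parameter-shift gradient, plus 1 for the cost itself.
    """
    n_params = n_qubits * n_layers * params_per_gate
    return 2 * n_params + 1


# 2 qubits, 3 layers, 3 parameters per gate -> N = 18
print(estimate_circuit_evaluations(2, 3))  # 37
```

That is close to the "around 36 jobs" reported above. A classical bias term would not add circuit evaluations under this estimate, since its gradient does not require running the circuit with shifted parameters.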
