Hi there,
I developed a new variational algorithm during my research. I tested it on simulators (CPU and GPU) and on an IBM Quantum System One with 127 qubits, using the plugin pennylane-qiskit
version 0.37 (0.38 gave me a bug with the qubit observables allocation).
Currently, my code runs 4000 iterations with 5 qubits in 1–2 hours, depending on the server (number of parallel executions). But when I use the IBM machine, execution becomes very slow: 20 iterations take 3 hours, even though I have no queue on the machine (it's a private system).
I used cProfile
to analyze where the code is running slow and understand what I need to modify.
In short, the results gave me this:
ncalls tottime percall cumtime percall filename:lineno(function)
...
28745 2020.299 0.070 2020.299 0.070 {method 'read' of '_ssl._SSLSocket' objects}
...
5845 608.242 0.104 608.242 0.104 {built-in method time.sleep}
I ran 4 iterations in ~50 minutes, and 33 of those minutes were spent in the SSLSocket
read method. I'm not an infrastructure person, so what does that mean, and can I bypass it? There is also a strange line where time.sleep consumes 10 minutes!
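For reference, a table like the one above can be produced with the standard-library profiler and pstats; a minimal sketch (the profiled function here is just a placeholder for the real optimization loop):

```python
import cProfile
import io
import pstats

def optimization_step():
    # Placeholder for one iteration of the variational loop
    return sum(i * i for i in range(50_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(4):
    optimization_step()
profiler.disable()

# Sort by total time spent inside each function ("tottime"),
# as in the table above, and print the top entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(5)
print(stream.getvalue())
```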
And, finally, as requested, here are the versions of my packages: the output of qml.about()
is below.
Name: PennyLane
Version: 0.39.0.dev7
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author:
Author-email:
License: Apache License 2.0
Location: /Users/christophe_pere/pennylane-no-probs/lib/python3.12/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, toml, typing-extensions
Required-by: PennyLane-qiskit, PennyLane_Lightning
Platform info: macOS-15.0-arm64-arm-64bit
Python version: 3.12.5
Numpy version: 1.26.4
Scipy version: 1.14.1
Installed devices:
- qiskit.aer (PennyLane-qiskit-0.37.0)
- qiskit.basicaer (PennyLane-qiskit-0.37.0)
- qiskit.basicsim (PennyLane-qiskit-0.37.0)
- qiskit.ibmq (PennyLane-qiskit-0.37.0)
- qiskit.ibmq.circuit_runner (PennyLane-qiskit-0.37.0)
- qiskit.ibmq.sampler (PennyLane-qiskit-0.37.0)
- qiskit.remote (PennyLane-qiskit-0.37.0)
- default.clifford (PennyLane-0.39.0.dev7)
- default.gaussian (PennyLane-0.39.0.dev7)
- default.mixed (PennyLane-0.39.0.dev7)
- default.qubit (PennyLane-0.39.0.dev7)
- default.qubit.autograd (PennyLane-0.39.0.dev7)
- default.qubit.jax (PennyLane-0.39.0.dev7)
- default.qubit.legacy (PennyLane-0.39.0.dev7)
- default.qubit.tf (PennyLane-0.39.0.dev7)
- default.qubit.torch (PennyLane-0.39.0.dev7)
- default.qutrit (PennyLane-0.39.0.dev7)
- default.qutrit.mixed (PennyLane-0.39.0.dev7)
- default.tensor (PennyLane-0.39.0.dev7)
- null.qubit (PennyLane-0.39.0.dev7)
- lightning.qubit (PennyLane_Lightning-0.38.0)
I’m using the 0.39 no-probs branch because of the qubit simulation support it provides for the QNG and QN-SPSA optimizers.
Hi @Christophe_Pere !
Thank you for posting this here; we’re investigating the time delays you mention.
I’m unable to replicate this at the moment. Do you get the same time delays when running the following code using the machine you have access to?
import pennylane as qml
from pennylane import numpy as pnp
from qiskit_ibm_runtime import QiskitRuntimeService

# `token` must hold your IBM Quantum API token (not shown here)
QiskitRuntimeService.save_account(channel="ibm_quantum", token=token, overwrite=True)
service = QiskitRuntimeService(channel="ibm_quantum")
backend = service.least_busy(operational=True, simulator=False)

dev = qml.device("qiskit.remote", wires=127, backend=backend)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=[0])
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

print(circuit(pnp.pi/3, shots=1024))
For anyone else looking at this post, please note that if you’re using real hardware you must specify the number of wires that the machine actually has; otherwise you’ll get the “qubits observables allocation” error mentioned by Christophe. So if, for example, you want to use hardware with 127 qubits, you must specify wires=127
when creating your PennyLane device.
Hi @Christophe_Pere,
My colleague Ali gave this suggestion, which might help you figure out the cause of your delay; it looks like a network issue.
As far as I know, cProfile
doesn’t directly surface network I/O delays. It’s the built-in Python profiler and useful for getting general insights about a program, but it is less effective for profiling I/O-bound or network-related bottlenecks. I’d suggest scalene
: it’s both a memory and CPU profiler, it breaks out system and I/O time, and it can give a more fine-grained report about the SSLSocket
process.
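One thing worth noting about the numbers themselves: cProfile charges blocked wall-clock time to the blocking call, so a function that merely waits can dominate tottime while the CPU sits idle. That is consistent with the _ssl._SSLSocket read and time.sleep lines being network waits and polling pauses rather than local computation. A toy stdlib illustration (not your actual workload):

```python
import cProfile
import io
import pstats
import time

def waiting_call():
    # Stands in for a blocking network read or a polling pause;
    # the process does no work here, it just waits
    time.sleep(0.05)

profiler = cProfile.Profile()
profiler.enable()
waiting_call()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats()
report = stream.getvalue()

# The idle wait is charged to time.sleep, mirroring the ssl/sleep
# lines in the profiles posted in this thread
print("time.sleep" in report)
```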
I hope this helps!
Hi @Catalina,
Thanks for your messages. I installed and tried scalene
. The report I got stays at the level of my own code: for this profiler, the optimizer line is the one with the most critical time consumption, but I can’t tell from it which protocols inside the optimizer are the problem. With cProfile
, I got information like time.sleep and SSL.
Regarding the number of qubits provided to the simulator or hardware, the behaviour differs between 0.37 and 0.38. With 0.37, using the 127 qubits, I extracted a bitstring with the same number of qubits as my circuit. But with 0.38, I got a bitstring containing all the qubits, i.e. 127 0s and 1s, even though my circuit only uses 5 qubits.
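To illustrate the difference: full-register counts like those from 0.38 can be reduced back to the circuit's own qubits by keeping only the relevant bits. A sketch assuming Qiskit's little-endian bit order (rightmost character is qubit 0); the counts dict is made up:

```python
from collections import Counter

def marginalize(counts, n_used):
    """Keep only the lowest n_used qubits of each bitstring,
    merging outcomes that agree on those bits."""
    out = Counter()
    for bitstring, n in counts.items():
        out[bitstring[-n_used:]] += n
    return dict(out)

# Hypothetical 127-qubit counts where only 5 qubits were used
full = {"0" * 122 + "01101": 600, "0" * 122 + "11010": 424}
print(marginalize(full, 5))  # {'01101': 600, '11010': 424}
```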
Hi @CatalinaAlbornoz,
I tested your code with the plugin pennylane-qiskit==0.37
; here are the results:
13045368 function calls (12858318 primitive calls) in 46.414 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
4203/1 0.049 0.000 46.424 46.424 {built-in method builtins.exec}
1 0.000 0.000 46.424 46.424 test_latency.py:1(<module>)
71 0.018 0.000 38.047 0.536 session.py:237(request)
69 0.001 0.000 35.676 0.517 sessions.py:593(get)
71 0.002 0.000 34.412 0.485 sessions.py:500(request)
71 0.008 0.000 34.347 0.484 sessions.py:673(send)
2191 0.023 0.000 33.687 0.015 {method 'readline' of '_io.BufferedReader' objects}
237 0.003 0.000 33.667 0.142 socket.py:706(readinto)
237 0.009 0.000 33.663 0.142 ssl.py:1236(recv_into)
237 0.002 0.000 33.653 0.142 ssl.py:1094(read)
237 33.651 0.142 33.651 0.142 {method 'read' of '_ssl._SSLSocket' objects}
71 0.002 0.000 32.505 0.458 adapters.py:613(send)
71 0.003 0.000 32.463 0.457 connectionpool.py:594(urlopen)
71 0.001 0.000 32.447 0.457 connectionpool.py:379(_make_request)
71 0.006 0.000 32.069 0.452 connection.py:438(getresponse)
71 0.001 0.000 32.023 0.451 client.py:1384(getresponse)
71 0.005 0.000 32.021 0.451 client.py:324(begin)
71 0.003 0.000 31.908 0.449 client.py:291(_read_status)
1 0.000 0.000 30.612 30.612 qnode.py:1017(__call__)
1 0.000 0.000 30.612 30.612 qnode.py:983(_impl_call)
1 0.000 0.000 30.612 30.612 qnode.py:900(_execution_component)
1 0.000 0.000 30.611 30.611 execution.py:367(execute)
118/116 0.000 0.000 30.610 0.264 tracer.py:35(f_wrapped)
1 0.000 0.000 30.610 30.610 autograd.py:101(autograd_execute)
1 0.000 0.000 30.610 30.610 autograd.py:150(_execute)
1 0.000 0.000 30.610 30.610 execution.py:191(inner_execute)
5 0.000 0.000 30.610 6.122 contextlib.py:78(inner)
1 0.000 0.000 30.609 30.609 single_tape_support.py:26(execute)
1 0.000 0.000 30.609 30.609 legacy_facade.py:443(execute)
1 0.000 0.000 30.609 30.609 qiskit_device.py:488(batch_execute)
2 0.000 0.000 28.042 14.021 runtime_job.py:136(result)
2 0.001 0.001 27.258 13.629 runtime_job.py:314(wait_for_final_state)
52 0.000 0.000 22.029 0.424 runtime_job.py:187(status)
52 0.002 0.000 22.029 0.424 base_runtime_job.py:204(_set_status_and_error_message)
51 0.001 0.000 22.026 0.432 runtime.py:106(job_get)
51 0.001 0.000 22.022 0.432 program_job.py:47(get)
481 0.007 0.000 10.671 0.022 __init__.py:1(<module>)
1 0.000 0.000 5.787 5.787 qiskit_runtime_service.py:69(__init__)
50 5.227 0.105 5.227 0.105 {built-in method time.sleep}
1 0.000 0.000 4.828 4.828 device_constructor.py:59(device)
1 0.000 0.000 4.305 4.305 remote.py:42(__init__)
1 0.000 0.000 4.305 4.305 qiskit_device.py:151(__init__)
21 0.001 0.000 4.304 0.205 ibm_backend.py:330(target)
2 0.000 0.000 4.304 2.152 backend.py:402(num_qubits)
71 0.000 0.000 3.562 0.050 inspect.py:1775(stack)
71 0.016 0.000 3.561 0.050 inspect.py:1745(getouterframes)
1778 0.037 0.000 3.539 0.002 inspect.py:1684(getframeinfo)
8 0.000 0.000 3.452 0.432 hub_group_project.py:52(backends)
6 0.000 0.000 3.452 0.575 hub_group_project.py:72(_discover_remote_backends)
6 0.000 0.000 3.452 0.575 runtime.py:277(list_backends)
...
As you can see, the results are the same: the SSL
reads and time.sleep
account for most of the execution time.
The test was run on ibm_quebec
.
Full output:
test_cProfile_cat.txt (1.1 MB)
Hi @Christophe_Pere , thank you!
We’ll keep investigating.