Long execution on a real IBM machine due to SSLSocket

Hi there,

I developed a new variational algorithm during my research. I tested it on simulators (CPU, GPU) and on a 127-qubit IBM Quantum System One, using the pennylane-qiskit plugin version 0.37 (0.38 gave me a bug with the qubit observables allocation).
Currently, my code runs 4000 iterations with 5 qubits in 1-2 hours on the simulators, depending on the server (number of parallel executions). On the IBM machine, however, execution is extremely slow: only 20 iterations complete in 3 hours (I have no queue on the machine - it’s a private system).
I used cProfile to find out where the code is spending its time and what I need to modify.
Here are the results (abridged):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
   28745 2020.299    0.070 2020.299    0.070 {method 'read' of '_ssl._SSLSocket' objects}
...
     5845  608.242    0.104  608.242    0.104 {built-in method time.sleep}

I ran 4 iterations in ~50 minutes, and about 33 of those minutes were spent in the SSLSocket read. I’m not an infrastructure person, so what does this mean, and can I bypass it? There is also a strange line where time.sleep takes 10 minutes!
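For reference, here is roughly how I collect the profile (a minimal sketch; the entry point is a placeholder for my actual training loop):

import cProfile
import pstats

# Equivalent to: python -m cProfile -s cumtime my_script.py
profiler = cProfile.Profile()
profiler.enable()
run_optimization()  # placeholder for the 4000-iteration variational loop
profiler.disable()

# Sort by cumulative time and print the most expensive calls
pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)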

Here is the output of qml.about():

Name: PennyLane
Version: 0.39.0.dev7
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author:
Author-email:
License: Apache License 2.0
Location: /Users/christophe_pere/pennylane-no-probs/lib/python3.12/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, toml, typing-extensions
Required-by: PennyLane-qiskit, PennyLane_Lightning

Platform info:           macOS-15.0-arm64-arm-64bit
Python version:          3.12.5
Numpy version:           1.26.4
Scipy version:           1.14.1
Installed devices:
- qiskit.aer (PennyLane-qiskit-0.37.0)
- qiskit.basicaer (PennyLane-qiskit-0.37.0)
- qiskit.basicsim (PennyLane-qiskit-0.37.0)
- qiskit.ibmq (PennyLane-qiskit-0.37.0)
- qiskit.ibmq.circuit_runner (PennyLane-qiskit-0.37.0)
- qiskit.ibmq.sampler (PennyLane-qiskit-0.37.0)
- qiskit.remote (PennyLane-qiskit-0.37.0)
- default.clifford (PennyLane-0.39.0.dev7)
- default.gaussian (PennyLane-0.39.0.dev7)
- default.mixed (PennyLane-0.39.0.dev7)
- default.qubit (PennyLane-0.39.0.dev7)
- default.qubit.autograd (PennyLane-0.39.0.dev7)
- default.qubit.jax (PennyLane-0.39.0.dev7)
- default.qubit.legacy (PennyLane-0.39.0.dev7)
- default.qubit.tf (PennyLane-0.39.0.dev7)
- default.qubit.torch (PennyLane-0.39.0.dev7)
- default.qutrit (PennyLane-0.39.0.dev7)
- default.qutrit.mixed (PennyLane-0.39.0.dev7)
- default.tensor (PennyLane-0.39.0.dev7)
- null.qubit (PennyLane-0.39.0.dev7)
- lightning.qubit (PennyLane_Lightning-0.38.0)

I’m using the 0.39 no-probs branch because of the qubit simulation behaviour of the QNG and QN-SPSA optimizers.

Hi @Christophe_Pere !

Thank you for posting this here; we’re investigating the time delays you mention.
I’m unable to replicate this at the moment. Do you get the same time delays when running the following code on the machine you have access to?

import pennylane as qml
from pennylane import numpy as pnp
from qiskit_ibm_runtime import QiskitRuntimeService

# Replace with your own IBM Quantum API token
token = "<YOUR_IBM_QUANTUM_TOKEN>"

QiskitRuntimeService.save_account(channel="ibm_quantum", token=token, overwrite=True)
service = QiskitRuntimeService(channel="ibm_quantum")
backend = service.least_busy(operational=True, simulator=False)

# The device must declare all of the wires of the hardware backend
dev = qml.device('qiskit.remote', wires=127, backend=backend)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=[0])
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

print(circuit(pnp.pi/3, shots=1024))

For anyone else looking at this post, please note that if you’re using real hardware you must specify the number of wires the machine actually has. Otherwise you’ll get the “qubit observables allocation” error mentioned by Christophe. So, for example, if you want to use hardware with 127 qubits you must pass wires=127 when creating your PennyLane device.

Hi @Christophe_Pere,

My colleague Ali shared the suggestion below, which might help you figure out the cause of your delay; it looks like a network issue.

As far as I know, cProfile doesn’t directly attribute time to network I/O. It’s the built-in Python profiler and is useful for getting general insight into a program, but it’s less effective for profiling I/O-bound or network-related bottlenecks. I’d suggest Scalene; it’s both a memory and CPU profiler. Scalene breaks out system/I/O time and can give a more fine-grained report on the SSLSocket time.
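For example, something like this (a minimal sketch; the script name is a placeholder for your own file):

# Run your script under Scalene from the command line:
#   pip install scalene
#   scalene --cpu --profile-all my_vqe_script.py
#
# Or, to profile only the optimization loop, launch with
# `scalene --off my_vqe_script.py` and toggle the profiler in code:
from scalene import scalene_profiler

scalene_profiler.start()
# ... optimization loop / hardware execution here ...
scalene_profiler.stop()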

I hope this helps!

Hi @Catalina,
Thanks for your messages. I installed and tried Scalene. The report it produces is specific to my code: for this profiler, the optimizer call is the line with the most critical time consumption, but it doesn’t tell me which calls inside the optimizer are the problem. With cProfile, I at least got information like time.sleep and SSL.
Regarding the number of qubits provided for the simulator or hardware, the behaviour differs between versions. In 0.37, using the 127 qubits, I extracted a bitstring with the same number of qubits as my circuit. But with 0.38 I got a bitstring containing all the qubits, i.e. 127 0s and 1s, even though my circuit only uses 5 qubits.

Hi @CatalinaAlbornoz,

I tested your code with the plugin pennylane-qiskit==0.37; here are the results:

         13045368 function calls (12858318 primitive calls) in 46.414 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   4203/1    0.049    0.000   46.424   46.424 {built-in method builtins.exec}
        1    0.000    0.000   46.424   46.424 test_latency.py:1(<module>)
       71    0.018    0.000   38.047    0.536 session.py:237(request)
       69    0.001    0.000   35.676    0.517 sessions.py:593(get)
       71    0.002    0.000   34.412    0.485 sessions.py:500(request)
       71    0.008    0.000   34.347    0.484 sessions.py:673(send)
     2191    0.023    0.000   33.687    0.015 {method 'readline' of '_io.BufferedReader' objects}
      237    0.003    0.000   33.667    0.142 socket.py:706(readinto)
      237    0.009    0.000   33.663    0.142 ssl.py:1236(recv_into)
      237    0.002    0.000   33.653    0.142 ssl.py:1094(read)
      237   33.651    0.142   33.651    0.142 {method 'read' of '_ssl._SSLSocket' objects}
       71    0.002    0.000   32.505    0.458 adapters.py:613(send)
       71    0.003    0.000   32.463    0.457 connectionpool.py:594(urlopen)
       71    0.001    0.000   32.447    0.457 connectionpool.py:379(_make_request)
       71    0.006    0.000   32.069    0.452 connection.py:438(getresponse)
       71    0.001    0.000   32.023    0.451 client.py:1384(getresponse)
       71    0.005    0.000   32.021    0.451 client.py:324(begin)
       71    0.003    0.000   31.908    0.449 client.py:291(_read_status)
        1    0.000    0.000   30.612   30.612 qnode.py:1017(__call__)
        1    0.000    0.000   30.612   30.612 qnode.py:983(_impl_call)
        1    0.000    0.000   30.612   30.612 qnode.py:900(_execution_component)
        1    0.000    0.000   30.611   30.611 execution.py:367(execute)
  118/116    0.000    0.000   30.610    0.264 tracer.py:35(f_wrapped)
        1    0.000    0.000   30.610   30.610 autograd.py:101(autograd_execute)
        1    0.000    0.000   30.610   30.610 autograd.py:150(_execute)
        1    0.000    0.000   30.610   30.610 execution.py:191(inner_execute)
        5    0.000    0.000   30.610    6.122 contextlib.py:78(inner)
        1    0.000    0.000   30.609   30.609 single_tape_support.py:26(execute)
        1    0.000    0.000   30.609   30.609 legacy_facade.py:443(execute)
        1    0.000    0.000   30.609   30.609 qiskit_device.py:488(batch_execute)
        2    0.000    0.000   28.042   14.021 runtime_job.py:136(result)
        2    0.001    0.001   27.258   13.629 runtime_job.py:314(wait_for_final_state)
       52    0.000    0.000   22.029    0.424 runtime_job.py:187(status)
       52    0.002    0.000   22.029    0.424 base_runtime_job.py:204(_set_status_and_error_message)
       51    0.001    0.000   22.026    0.432 runtime.py:106(job_get)
       51    0.001    0.000   22.022    0.432 program_job.py:47(get)
      481    0.007    0.000   10.671    0.022 __init__.py:1(<module>)
        1    0.000    0.000    5.787    5.787 qiskit_runtime_service.py:69(__init__)
       50    5.227    0.105    5.227    0.105 {built-in method time.sleep}
        1    0.000    0.000    4.828    4.828 device_constructor.py:59(device)
        1    0.000    0.000    4.305    4.305 remote.py:42(__init__)
        1    0.000    0.000    4.305    4.305 qiskit_device.py:151(__init__)
       21    0.001    0.000    4.304    0.205 ibm_backend.py:330(target)
        2    0.000    0.000    4.304    2.152 backend.py:402(num_qubits)
       71    0.000    0.000    3.562    0.050 inspect.py:1775(stack)
       71    0.016    0.000    3.561    0.050 inspect.py:1745(getouterframes)
     1778    0.037    0.000    3.539    0.002 inspect.py:1684(getframeinfo)
        8    0.000    0.000    3.452    0.432 hub_group_project.py:52(backends)
        6    0.000    0.000    3.452    0.575 hub_group_project.py:72(_discover_remote_backends)
        6    0.000    0.000    3.452    0.575 runtime.py:277(list_backends)
...

As you can see, the results are the same: the SSL reads and time.sleep account for most of the wall-clock time.
Test performed on ibm_quebec.

Full output:
test_cProfile_cat.txt (1.1 MB)
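Looking at the call stack, most of the SSL read and time.sleep time sits under runtime_job.wait_for_final_state, i.e. the client repeatedly polling the job status over HTTPS until the result is ready. Conceptually the pattern looks like this (a simplified sketch of the polling loop, not the actual qiskit-ibm-runtime code):

import time

def wait_for_final_state(job, poll_interval=5):
    # Query the job status over the network (the SSL reads in the profile)
    # and sleep between polls until the job reaches a final state.
    while job.status() not in ("DONE", "ERROR", "CANCELLED"):
        time.sleep(poll_interval)
    return job.result()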

Hi @Christophe_Pere , thank you!

We’ll keep investigating.