Mid-circuit measurement: slow and memory explosion

Hi,
I’m looking to implement a complex absorbing potential, as described in this article: Efficient solution of the non-unitary time-dependent Schrodinger equation on a quantum computer with complex absorbing potential – Quantum.
To achieve this, I need to apply non-unitary operations, which requires using a mid-circuit measurement on an ancilla qubit. However, I’m encountering two main issues:

  • The mid-circuit measurements significantly slow down the simulation. When I comment out the mid-circuit measurement line, the calculation runs 100 times faster.
  • I quickly receive a memory overflow error when I increase the number of propagation step. Again, when I comment out the mid-circuit measurement line, the error message does not appear anymore.

I’m wondering if there might be something about mid-circuit measurement that I am overlooking.

Thanks in advance!

Adrien

Here it is my code:

# Import package
import pennylane as qml
import numpy as np
import scipy

# Building position grid
n_pos=7
n_qubit=n_pos+1
x_max=100
x=np.linspace(-x_max/2,x_max/2,2**n_pos)

# Creation of the k-grid
k=2*np.pi*scipy.fftpack.fftfreq(2**n_pos,delta_x)
k=scipy.fftpack.fftshift(k)

# Preparation of the initial Gaussian Wave Packet

def state_vector_gaussian(x,x_0,delta,p_0):
    # x_0 : initial position of the wavepacket
    # delta: width of the wavepacket
    #p_0 : initial momentum of the wavepacket
    phi_0=(1/(2*np.pi*delta**2))**(1/4)*np.exp(-(x-x_0)**2/(4*delta**2))*np.exp(p_0*(x-x_0)*1j)
    return phi_0

Ini_gauss=state_vector_gaussian(x,0,2,0)

# Normalizing Initial Wavepacket
Ini_gauss=Ini_gauss/np.linalg.norm(Ini_gauss)

#Kinetic Energy Operator in the momentum representation
m=1
T=(k)**2/(2*m)*np.eye(2**n_pos)

# Complex absorbing potential
def CAP(x,xc,eta,b):
    W = eta*abs(x+xc)**b*np.heaviside(-(x+xc),0)+eta*abs(x-xc)**b*np.heaviside((x-xc),0)
    return W
xc=45
eta=1
b=2
W=CAP(x,xc,eta,b)
W=W*np.eye(2**n_pos)

# Wavepacket propagation
dev = qml.device("default.qubit", wires=n_qubit)

@qml.qnode(dev)
def Propagation_CAP(ini_state,T,W,N_step,delta_t):

    #Building matrice of propagator
    Prop_T=scipy.linalg.expm(-T*delta_t*1j)

    #Building dilation matrix
    Diag=scipy.linalg.expm(W*(-delta_t))
    Off_diag=scipy.linalg.sqrtm(np.eye(2**n_pos)-scipy.linalg.expm(W*(-2*delta_t)))
    U_dial=np.kron(np.array([[1,0],[0,-1]]),Diag)+np.kron(np.array([[0,1],[1,0]]),Off_diag)

    # Preparation of initial gaussian
    qml.MottonenStatePreparation(ini_state,wires=range(1,n_qubit))

    #Propagation loop
    for i in range(N_step):
        qml.PauliX(1)
        qml.QFT(wires=range(1,n_qubit))
        qml.PauliX(1)
        qml.QubitUnitary(Prop_T,wires=range(1,n_qubit))
        qml.PauliX(1)
        qml.adjoint(qml.QFT)(wires=range(1,n_qubit))
        qml.PauliX(1)
        qml.QubitUnitary(U_dial,wires=range(n_qubit))
        qml.measure(0, postselect=0)
    return qml.probs(wires=range(1,n_qubit))

delta_t=0.1
State=Propagation_CAP(Ini_gauss,T,W,15,delta_t)

The full error message is

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
Cell In[90], line 1
----> 1 State=Propagation_CAP(Ini_gauss,T,W,50,delta_t)

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\workflow\qnode.py:1020, in QNode.__call__(self, *args, **kwargs)
   1018 if qml.capture.enabled():
   1019     return qml.capture.qnode_call(self, *args, **kwargs)
-> 1020 return self._impl_call(*args, **kwargs)

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\workflow\qnode.py:1008, in QNode._impl_call(self, *args, **kwargs)
   1005 self._update_gradient_fn(shots=override_shots, tape=self._tape)
   1007 try:
-> 1008     res = self._execution_component(args, kwargs, override_shots=override_shots)
   1009 finally:
   1010     if old_interface == "auto":

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\workflow\qnode.py:957, in QNode._execution_component(self, args, kwargs, override_shots)
    951     warnings.filterwarnings(
    952         action="ignore",
    953         message=r".*argument is deprecated and will be removed in version 0.39.*",
    954         category=qml.PennyLaneDeprecationWarning,
    955     )
    956     # pylint: disable=unexpected-keyword-arg
--> 957     res = qml.execute(
    958         (self._tape,),
    959         device=self.device,
    960         gradient_fn=self.gradient_fn,
    961         interface=self.interface,
    962         transform_program=full_transform_program,
    963         inner_transform=inner_transform_program,
    964         config=config,
    965         gradient_kwargs=self.gradient_kwargs,
    966         override_shots=override_shots,
    967         **execute_kwargs,
    968     )
    969 res = res[0]
    971 # convert result to the interface in case the qfunc has no parameters

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\workflow\execution.py:660, in execute(tapes, device, gradient_fn, interface, transform_program, inner_transform, config, grad_on_execution, gradient_kwargs, cache, cachesize, max_diff, override_shots, expand_fn, max_expansion, device_batch_transform, device_vjp, mcm_config)
    658 # Exiting early if we do not need to deal with an interface boundary
    659 if no_interface_boundary_required:
--> 660     results = inner_execute(tapes)
    661     return post_processing(results)
    663 if (
    664     device_vjp
    665     and getattr(device, "short_name", "") in ("lightning.gpu", "lightning.kokkos")
    666     and interface in jpc_interfaces
    667 ):  # pragma: no cover

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\workflow\execution.py:212, in _make_inner_execute.<locals>.inner_execute(tapes, **_)
    209 transformed_tapes, transform_post_processing = transform_program(tapes)
    211 if transformed_tapes:
--> 212     results = device.execute(transformed_tapes, execution_config=execution_config)
    213 else:
    214     results = ()

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\devices\modifiers\simulator_tracking.py:30, in _track_execute.<locals>.execute(self, circuits, execution_config)
     28 @wraps(untracked_execute)
     29 def execute(self, circuits, execution_config=DefaultExecutionConfig):
---> 30     results = untracked_execute(self, circuits, execution_config)
     31     if isinstance(circuits, QuantumScript):
     32         batch = (circuits,)

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\devices\modifiers\single_tape_support.py:32, in _make_execute.<locals>.execute(self, circuits, execution_config)
     30     is_single_circuit = True
     31     circuits = (circuits,)
---> 32 results = batch_execute(self, circuits, execution_config)
     33 return results[0] if is_single_circuit else results

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\logging\decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\devices\default_qubit.py:630, in DefaultQubit.execute(self, circuits, execution_config)
    627 prng_keys = [self.get_prng_keys()[0] for _ in range(len(circuits))]
    629 if max_workers is None:
--> 630     return tuple(
    631         _simulate_wrapper(
    632             c,
    633             {
    634                 "rng": self._rng,
    635                 "debugger": self._debugger,
    636                 "interface": interface,
    637                 "state_cache": self._state_cache,
    638                 "prng_key": _key,
    639                 "mcm_method": execution_config.mcm_config.mcm_method,
    640                 "postselect_mode": execution_config.mcm_config.postselect_mode,
    641             },
    642         )
    643         for c, _key in zip(circuits, prng_keys)
    644     )
    646 vanilla_circuits = convert_to_numpy_parameters(circuits)[0]
    647 seeds = self._rng.integers(2**31 - 1, size=len(vanilla_circuits))

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\devices\default_qubit.py:631, in <genexpr>(.0)
    627 prng_keys = [self.get_prng_keys()[0] for _ in range(len(circuits))]
    629 if max_workers is None:
    630     return tuple(
--> 631         _simulate_wrapper(
    632             c,
    633             {
    634                 "rng": self._rng,
    635                 "debugger": self._debugger,
    636                 "interface": interface,
    637                 "state_cache": self._state_cache,
    638                 "prng_key": _key,
    639                 "mcm_method": execution_config.mcm_config.mcm_method,
    640                 "postselect_mode": execution_config.mcm_config.postselect_mode,
    641             },
    642         )
    643         for c, _key in zip(circuits, prng_keys)
    644     )
    646 vanilla_circuits = convert_to_numpy_parameters(circuits)[0]
    647 seeds = self._rng.integers(2**31 - 1, size=len(vanilla_circuits))

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\devices\default_qubit.py:896, in _simulate_wrapper(circuit, kwargs)
    895 def _simulate_wrapper(circuit, kwargs):
--> 896     return simulate(circuit, **kwargs)

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\logging\decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\devices\qubit\simulate.py:379, in simulate(circuit, debugger, state_cache, **execution_kwargs)
    376     return tuple(results)
    378 ops_key, meas_key = jax_random_split(prng_key)
--> 379 state, is_state_batched = get_final_state(
    380     circuit, debugger=debugger, prng_key=ops_key, **execution_kwargs
    381 )
    382 if state_cache is not None:
    383     state_cache[circuit.hash] = state

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\logging\decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\devices\qubit\simulate.py:199, in get_final_state(circuit, debugger, **execution_kwargs)
    196 if len(circuit) > 0 and isinstance(circuit[0], qml.operation.StatePrepBase):
    197     prep = circuit[0]
--> 199 state = create_initial_state(sorted(circuit.op_wires), prep, like=INTERFACE_TO_LIKE[interface])
    201 # initial state is batched only if the state preparation (if it exists) is batched
    202 is_state_batched = bool(prep and prep.batch_size is not None)

File ~\anaconda3\envs\Pen\Lib\site-packages\pennylane\devices\qubit\initialize_state.py:43, in create_initial_state(wires, prep_operation, like)
     41 if not prep_operation:
     42     num_wires = len(wires)
---> 43     state = np.zeros((2,) * num_wires, dtype=complex)
     44     state[(0,) * num_wires] = 1
     45     return qml.math.asarray(state, like=like)

MemoryError: Unable to allocate 4.00 EiB for an array with shape (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2) and data type complex128

Here it is the version that I use:

Name: PennyLane
Version: 0.38.0
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: C:\Users\adrid\anaconda3\envs\Pen\Lib\site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, toml, typing-extensions
Required-by: PennyLane_Lightning

Platform info:           Windows-10-10.0.22631-SP0
Python version:          3.11.10
Numpy version:           2.1.2
Scipy version:           1.14.1
Installed devices:
- default.clifford (PennyLane-0.38.0)
- default.gaussian (PennyLane-0.38.0)
- default.mixed (PennyLane-0.38.0)
- default.qubit (PennyLane-0.38.0)
- default.qubit.autograd (PennyLane-0.38.0)
- default.qubit.jax (PennyLane-0.38.0)
- default.qubit.legacy (PennyLane-0.38.0)
- default.qubit.tf (PennyLane-0.38.0)
- default.qubit.torch (PennyLane-0.38.0)
- default.qutrit (PennyLane-0.38.0)
- default.qutrit.mixed (PennyLane-0.38.0)
- default.tensor (PennyLane-0.38.0)
- null.qubit (PennyLane-0.38.0)
- lightning.qubit (PennyLane_Lightning-0.38.0)

Hi @adrien_dev , welcome to the Forum!

It’s normal for it to be slower using mid-circuit measurements. PennyLane currently offers three methods to simulate mid-circuit measurements on classical computers: the deferred measurements principle, dynamic one-shot sampling, and a tree-traversal approach. These methods differ in their memory requirements and computational cost, as well as their compatibility with other features such as shots and differentiation methods. Check the table here to see the details.

In your case, since you’re not using shots, the deferred measurements method is being used by default. This method adds an auxiliary qubit to the circuit for each mid-circuit measurement, leading to overheads of both memory and simulation time that scale exponentially with the number of measurements.

What I see in your code is that you’re adding one mid-circuit measurement per step. In the code you shared that would mean 15 mid-circuit measurements. This means that you’re inadvertently adding 15 extra qubits into your code.

I would recommend in this case that you change the mcm_method. If you’re ok with adding shots for example you can add the one-shot method as shown below.

dev = qml.device("default.qubit",shots=10)

@qml.qnode(dev, mcm_method="one-shot")

Let me know if this works for you.

Thank you Catalina!
That solves the memory issue.
It’s still a bit slow, but as you mentioned, that’s normal, and I can work on optimizing it.

1 Like

That’s great to hear @adrien_dev !