Memory error: am I memorizing all the gate instances?

Hi! I’m writing here because I’m having a problem with a custom gate involving 14 qubits, which I would like to apply several times (on different layers). When I try to apply it, I get an error message about allocated memory. The gate is actually very large, but I noticed that for the first layers it doesn’t cause problems. After about eight layers I get the error message instead. It therefore seems to me that the algorithm memorizes all the gate parameters every time. But I’m not interested in memorizing the entire circuit, but rather just the state vector resulting at each layer. Is my guess correct? How can I solve the problem? Do you have any further advice for me? Here’s the code and errors:

PennyLane version: 0.32.0
Python version: 3.8.8
Numpy version: 1.23.5

import pennylane as qml
import numpy as np

class QuantumDevice:
    def __init__(self, n_qubits):
        self.n_qubits = n_qubits
        self.dev = self._initialize_device()
        self.qnode = QNodeWrapper(self, self.n_qubits)

    def _initialize_device(self):
        return qml.device('default.qubit', wires=self.n_qubits) 
 
    def set_n_qubits(self, n_qubits):
        self.n_qubits = n_qubits
        self.dev = self._initialize_device()     


class QNodeWrapper:
    def __init__(self, device, n_qubits):
        self.device = device
        self.n_qubits = n_qubits
        self.qnode = qml.QNode(self.adiabatic_evolution, self.device.dev)
        self.T = 10
        self.dt = 0.01

    def evaluate(self, x, mu, sigma):
        return self.qnode(x, mu, sigma)

    def gate(self, δμ):

        n_qubits = self.n_qubits
        n_control_qubits = self.n_control_qubits
        n_state_qubits = n_qubits - n_control_qubits
        dt = self.dt
        T = self.T
        N = int(2**n_control_qubits) 

        M = [None]*N

        for k in range(N):
            matrices = [np.array([[np.exp(1j*δμ[i][k]*dt**2/T),    0                               ],
                       [0,                                np.exp(-1j*δμ[i][k]*dt**2/T)  ]]) for i in range(n_state_qubits) ]
        
            M[k] = matrices[0]
            for i in range(1, len(matrices)):
                M[k] = np.kron(M[k], matrices[i])

        # Create the block-diagonal matrix
        dimension = M[0].shape[0]
        D = np.zeros((N * dimension, N * dimension))

        for i in range(N):
            D[i * dimension:(i + 1) * dimension, i * dimension:(i + 1) * dimension] = M[i]

        return D

    def adiabatic_evolution(self, x, mu, sigma):
        
        n_qubits = self.n_qubits
        n_control_qubits = 7
        self.n_control_qubits = n_control_qubits
        n_state_qubits = n_qubits - n_control_qubits
        num_layers = int(self.T/self.dt)

        δμ = np.random.uniform(low= - np.sqrt(3) * sigma, high= + np.sqrt(3) * sigma, size=(n_state_qubits, 2**n_control_qubits))

        for layer in range(num_layers):
            print('layer = ' + str(layer))
            qml.QubitUnitary(self.gate(δμ), wires=range(n_qubits))

        return qml.expval((qml.PauliZ(n_qubits-1)))



quantum_device = QuantumDevice(n_qubits = 14)

mu = 0.
sigma = 0.
features = np.random.rand(2**14)

quantum_device.qnode.evaluate(features, mu, sigma)

Output:

layer = 0
layer = 1
<ipython-input-2-bc9040da9ae1>:51: ComplexWarning: Casting complex values to real discards the imaginary part
  D[i * dimension:(i + 1) * dimension, i * dimension:(i + 1) * dimension] = M[i]
layer = 2
layer = 3
layer = 4
layer = 5
layer = 6
layer = 7
layer = 8
layer = 9

And, finally, the error message:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-3-9604ab584822> in <module>
      5 features = np.random.rand(2**14)
      6 
----> 7 quantum_device.qnode.evaluate(features, mu, sigma)

<ipython-input-2-bc9040da9ae1> in evaluate(self, x, mu, sigma)
     22 
     23     def evaluate(self, x, mu, sigma):
---> 24         return self.qnode(x, mu, sigma)
     25 
     26     def gate(self, δμ, layer):

c:\ProgramData\Anaconda3\lib\site-packages\pennylane\qnode.py in __call__(self, *args, **kwargs)
    972 
    973         # construct the tape
--> 974         self.construct(args, kwargs)
    975 
    976         cache = self.execute_kwargs.get("cache", False)

c:\ProgramData\Anaconda3\lib\site-packages\pennylane\qnode.py in construct(self, args, kwargs)
    870             self.interface = qml.math.get_interface(*args, *list(kwargs.values()))
    871 
--> 872         self._tape = make_qscript(self.func, shots)(*args, **kwargs)
    873         self._qfunc_output = self.tape._qfunc_output
    874 

c:\ProgramData\Anaconda3\lib\site-packages\pennylane\tape\qscript.py in wrapper(*args, **kwargs)
   1529     def wrapper(*args, **kwargs):
   1530         with AnnotatedQueue() as q:
-> 1531             result = fn(*args, **kwargs)
   1532 
   1533         qscript = QuantumScript.from_queue(q, shots)

<ipython-input-2-bc9040da9ae1> in adiabatic_evolution(self, x, mu, sigma)
     68         for layer in range(num_layers):
     69             print('layer = ' + str(layer))
---> 70             qml.QubitUnitary(self.gate(δμ, layer), wires=range(n_qubits))
     71 
     72         return qml.expval((qml.PauliZ(n_qubits-1)))

<ipython-input-2-bc9040da9ae1> in gate(self, δμ, layer)
     46         # Create the block-diagonal matrix
     47         dimension = M[0].shape[0]
---> 48         D = np.zeros((N * dimension, N * dimension))
     49 
     50         for i in range(N):

MemoryError: Unable to allocate 2.00 GiB for an array with shape (16384, 16384) and data type float64

Hi @kBoltzmann,

It does look like there’s a memory overhead here. Let me see if someone in our team has any insights about this. In the meantime, here are some suggestions that might be helpful for you:

It doesn’t look like you’re using the standard way for creating a new device. You can look at the current recommended way here. Note that we’re evolving our device API soon so if you’re interested let me know and I can provide more advice on what to do.

If you prefer not to inherit from an existing device you could try following our guide on creating custom operators, inheriting from qml.operation.Operation.

If you prefer not to do that either then maybe you can try using circuit cutting. Depending on the circuit it can sometimes reduce your memory needs. You can learn more about circuit cutting in this demo or the compilation section of the documentation. For example, if you add qml.WireCut(wires=range(n_qubits)) within each layer and right after you add your qml.QubitUnitary, this will perform a cut on all wires after every layer and potentially reduce your memory needs.

Let me know if any of my suggestions help you!

Hey @kBoltzmann!

The issue seems to be resolved by making this change in your adiabatic_evolution method:

    def adiabatic_evolution(self, x, mu, sigma):
        
        n_qubits = self.n_qubits
        n_control_qubits = 7
        self.n_control_qubits = n_control_qubits
        n_state_qubits = n_qubits - n_control_qubits
        num_layers = int(self.T/self.dt)

        δμ = np.random.uniform(low= - np.sqrt(3) * sigma, high= + np.sqrt(3) * sigma, size=(n_state_qubits, 2**n_control_qubits))

        D = self.gate(δμ)

        for layer in range(num_layers):
            qml.QubitUnitary(D, wires=range(n_qubits))

        return qml.expval((qml.PauliZ(n_qubits-1)))

You were originally re-computing self.gate(δμ) like this:

            qml.QubitUnitary(self.gate(δμ), wires=range(n_qubits))

I think the error you were seeing was an artifact of calling the function too many times and having to allocate more and more memory to simply compute D. Let me know if that helps!

1 Like

Thank you, you are right. But I’ve not been totally sincere. Actually, the definition of the gate I showed before is not the one I am truly concern to use. The difference is that my truly gate depends on the layer in:

matrices = [np.array([[np.exp(1j*δμ[i][k]*dt**2/T*layer),    0                               ],
                       [0,                                np.exp(-1j*δμ[i][k]*dt**2/T*layer)  ]]) for i in range(n_state_qubits) ]

Therefore I need to recalculate D at each iteration of the for loop depending on the current layer, since it also depend on the variable ‘layer’. I didn’t write the original gate here because I wanted to highlight the fact that the code seems to work for about ten layers in a row, regardless of what they are (and therefore the problem I have doesn’t depend on the layer number, only on the quantity).
In any case, I also think that the problem here resides in the allocation of more and more memory each time I compute D. Is there a way to “forget” the previously calculated gate to free up memory space for the next gate?

(Should I edit the original post?)

Hi @kBoltzmann ,

You don’t need to edit the original post. Let us check if there’s a way to forget the previous value. We’ll come back to you soon.

1 Like

@kBoltzmann thanks for the clarification! I’ll get back to you next week to see if there’s a solution. But, keep in mind that what you’re asking to do is quite large (creating a 2^{14} \times 2^{14} matrix, giving that to QubitUnitary, repeating 1000 times, and doing statevector simulation). I’m not saying that this isn’t (or shouldn’t be) possible in PennyLane — there could very well be something under the hood that we’re doing that’s causing this error :slight_smile:.

Will get back to you soon!

1 Like

Hey @kBoltzmann! I don’t have an update at the moment, but just wanted to let you know that we’re still looking into it. Will provide an update ASAP!

1 Like

Update!

It doesn’t have anything to do with PennyLane :sweat_smile:. A unitary matrix created from calling your self.gate function needs new memory allocated to it every time it’s called. It’s akin to this:

matrices = []
for _ in range(layers):
    matrices.append(create_matrix())

Every element of matrices must have its own place in memory. So, if it’s the case that self.gate must be given as the input to QubitUnitary for every layer, then you will hit a memory-allocation wall eventually.

One of our devs recommended that you create a custom operator in PennyLane. This will defer calculating the matrix until it’s needed for the simulation. You might still run into the same problem, though.

Let me know if that helps!

1 Like

Isn’t there a more immediate way to forget the matrix instances from time to time?

In order to create a custom operator in this way, should I decompose it in basic operations? Or can I continue to simply define the matrix form?

Isn’t there a more immediate way to forget the matrix instances from time to time?

I’m afraid not :sweat_smile:. That’s just how memory allocation works under the hood. Every time something new is created at runtime, it needs to go somewhere!

In order to create a custom operator in this way, should I decompose it in basic operations? Or can I continue to simply define the matrix form?

It doesn’t have to be basic operations, just any Operator that is supported on the device you want to run your circuit on :slight_smile:. For instance, you could put a qml.QFT instance in your custom operators decomposition and that’ll work!

1 Like