Out of memory after upgrading to 0.26

We recently upgraded from PennyLane v0.24 to v0.26 and our QAOA code running on lightning.gpu now returns this error:

2022-10-05 11:01:02,128 [ERROR] Error during benchmark run: Unable to allocate 16.0 PiB for an array with shape (33554432, 33554432) and data type complex128
Traceback (most recent call last):
  File "/home/user1/dev/PennylaneQAOA.py", line 308, in run
    params, cost_before = optimizer.step_and_cost(cost_function, params)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/optimize/gradient_descent.py", line 59, in step_and_cost
    g, forward = self.compute_grad(objective_fn, args, kwargs, grad_fn=grad_fn)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/optimize/gradient_descent.py", line 117, in compute_grad
    grad = g(*args, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/_grad.py", line 115, in __call__
    grad_value, ans = grad_fn(*args, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/wrap_util.py", line 20, in nary_f
    return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/_grad.py", line 133, in _grad_with_forward
    vjp, ans = _make_vjp(fun, x)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/core.py", line 10, in make_vjp
    end_value, end_node =  trace(start_node, fun, x)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/tracer.py", line 10, in trace
    end_box = fun(start_box)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/wrap_util.py", line 15, in unary_f
    return fun(*subargs, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/qnode.py", line 661, in __call__
    res = qml.execute(
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/interfaces/execution.py", line 443, in execute
    res = _execute(
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/interfaces/autograd.py", line 66, in execute
    return _execute(
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/tracer.py", line 44, in f_wrapped
    ans = f_wrapped(*argvals, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/tracer.py", line 48, in f_wrapped
    return f_raw(*args, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/interfaces/autograd.py", line 110, in _execute
    res, jacs = execute_fn(tapes, **gradient_kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/_device.py", line 566, in execute_and_gradients
    res.append(self.batch_execute([circuit])[0])
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/_qubit_device.py", line 586, in batch_execute
    res = self.execute(circuit)
  File "/home/user1/dev/quark/src/solvers/PennylaneQAOA.py", line 419, in ret_fun
    returned_value = fun(*args, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/_qubit_device.py", line 330, in execute
    results = self.statistics(circuit.observables, circuit=circuit)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 164, in statistics
    return super().statistics(observables, shot_range, bin_size, circuit)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/_qubit_device.py", line 740, in statistics
    results.append(self.expval(obs, shot_range=shot_range, bin_size=bin_size))
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 436, in expval
    device_wires, qml.matrix(observable).ravel(order="C")
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/transforms/op_transforms.py", line 213, in __call__
    return self._create_wrapper(obj, *targs, **tkwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/transforms/op_transforms.py", line 412, in _create_wrapper
    wrapper = self.fn(obj, *targs, **tkwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/transforms/op_transforms.py", line 274, in fn
    raise e1 from None
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/transforms/op_transforms.py", line 258, in fn
    return self._fn(obj, *args, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/ops/functions/matrix.py", line 125, in matrix
    return qml.utils.sparse_hamiltonian(op, wires=wire_order).toarray()
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/scipy/sparse/_compressed.py", line 1051, in toarray
    out = self._process_toarray_args(order, out)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/scipy/sparse/_base.py", line 1291, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 16.0 PiB for an array with shape (33554432, 33554432) and data type complex128
2022-10-05 11:01:02,139 [ERROR] Wrong number of items passed 0, placement implies 1

The same code runs fine on v0.24.
The error happens in the step_and_cost() function when we call it for a 25-qubit problem.
Is there any explanation for this?

Hey @leolettuce

Thanks for your question!

Are you running this on your own computer? If so, do you know how much RAM your CPU and GPU have? 25 qubits is quite a bit to simulate, but if it was working in v0.24 and now it isn’t, then we’ll definitely take a closer look at this.

Are you able to provide your code? I’m tagging @mlxd here as well, since he is our GPU & performance guru.

Hi @leolettuce, thanks for reporting this. I’d like to ask for some information first:

  • Which gradient method are you using, “parameter-shift” or “adjoint”?
  • Are you taking the expval of a Hamiltonian as part of your circuit?
  • Can you share a minimum working example of your script to replicate the issue?

In v0.24, qml.expval(H) for this case relied on a CPU-backed code path, which we moved to the GPU as part of v0.26. I suspect the error comes from this change in how we handle Hamiltonians in LightningGPU, but I’ll wait for your answers to the above before confirming.
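
For context, the 16.0 PiB in your error message is what a dense matrix over 25 qubits costs. A quick back-of-the-envelope check (illustrative only, using just the shape and dtype reported in the traceback):

import numpy as np

# scipy's .toarray() in the traceback tries to allocate a dense (2**25, 2**25) complex128 array
dim = 2 ** 25                                           # 33_554_432, matching the reported shape
nbytes = dim * dim * np.dtype(np.complex128).itemsize   # 16 bytes per entry
print(nbytes / 2 ** 50)                                 # 16.0, i.e. 16.0 PiB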

Hi @isaacdevlugt and @mlxd!

Thank you for your quick answer.

  • I am running this on an NVIDIA DGX A100 station.
  • I use the “adjoint” method, as I have found it to be the fastest.
  • Yes, I am taking the expval of a Hamiltonian.
  • I have provided a minimal working example below. I copied an example problem directly; it is not pretty, but it does the job.

Code:

import pennylane as qml
import numpy as np
from pennylane import numpy as npqml
from matplotlib import pyplot as plt
import networkx as nx
import time

depth = 1
steps = 1

J = np.array([[ 0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588,
         6.81965294,  2.16541777,  0.        ,  0.        ,  0.        ,
         6.81965294,  2.0869443 ,  0.        ,  0.        ,  0.        ,
         6.81965294,  2.29850113,  0.        ,  0.        ,  0.        ,
         6.81965294,  2.53121915,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588,
         0.        ,  6.81965294,  2.16541777,  0.        ,  0.        ,
         0.        ,  6.81965294,  2.0869443 ,  0.        ,  0.        ,
         0.        ,  6.81965294,  2.29850113,  0.        ,  0.        ,
         0.        ,  6.81965294,  2.53121915,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , 13.63930588, 13.63930588,
         0.        ,  0.        ,  6.81965294,  2.16541777,  0.        ,
         0.        ,  0.        ,  6.81965294,  2.0869443 ,  0.        ,
         0.        ,  0.        ,  6.81965294,  2.29850113,  0.        ,
         0.        ,  0.        ,  6.81965294,  2.53121915,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        , 13.63930588,
         0.        ,  0.        ,  0.        ,  6.81965294,  2.16541777,
         0.        ,  0.        ,  0.        ,  6.81965294,  2.0869443 ,
         0.        ,  0.        ,  0.        ,  6.81965294,  2.29850113,
         0.        ,  0.        ,  0.        ,  6.81965294,  2.53121915],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         2.16541777,  0.        ,  0.        ,  0.        ,  6.81965294,
         2.0869443 ,  0.        ,  0.        ,  0.        ,  6.81965294,
         2.29850113,  0.        ,  0.        ,  0.        ,  6.81965294,
         2.53121915,  0.        ,  0.        ,  0.        ,  6.81965294],
       [ 6.81965294,  2.16541777,  0.        ,  0.        ,  0.        ,
         0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588,
         6.81965294,  0.13367576,  0.        ,  0.        ,  0.        ,
         6.81965294,  0.16552135,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.27684974,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  6.81965294,  2.16541777,  0.        ,  0.        ,
         0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588,
         0.        ,  6.81965294,  0.13367576,  0.        ,  0.        ,
         0.        ,  6.81965294,  0.16552135,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.27684974,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  6.81965294,  2.16541777,  0.        ,
         0.        ,  0.        ,  0.        , 13.63930588, 13.63930588,
         0.        ,  0.        ,  6.81965294,  0.13367576,  0.        ,
         0.        ,  0.        ,  6.81965294,  0.16552135,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.27684974,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  6.81965294,  2.16541777,
         0.        ,  0.        ,  0.        ,  0.        , 13.63930588,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.13367576,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.16552135,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.27684974],
       [ 2.16541777,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.13367576,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.16552135,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.27684974,  0.        ,  0.        ,  0.        ,  6.81965294],
       [ 6.81965294,  2.0869443 ,  0.        ,  0.        ,  0.        ,
         6.81965294,  0.13367576,  0.        ,  0.        ,  0.        ,
         0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588,
         6.81965294,  0.212079  ,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.38660695,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  6.81965294,  2.0869443 ,  0.        ,  0.        ,
         0.        ,  6.81965294,  0.13367576,  0.        ,  0.        ,
         0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588,
         0.        ,  6.81965294,  0.212079  ,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.38660695,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  6.81965294,  2.0869443 ,  0.        ,
         0.        ,  0.        ,  6.81965294,  0.13367576,  0.        ,
         0.        ,  0.        ,  0.        , 13.63930588, 13.63930588,
         0.        ,  0.        ,  6.81965294,  0.212079  ,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.38660695,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  6.81965294,  2.0869443 ,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.13367576,
         0.        ,  0.        ,  0.        ,  0.        , 13.63930588,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.212079  ,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.38660695],
       [ 2.0869443 ,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.13367576,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.212079  ,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.38660695,  0.        ,  0.        ,  0.        ,  6.81965294],
       [ 6.81965294,  2.29850113,  0.        ,  0.        ,  0.        ,
         6.81965294,  0.16552135,  0.        ,  0.        ,  0.        ,
         6.81965294,  0.212079  ,  0.        ,  0.        ,  0.        ,
         0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588,
         6.81965294,  1.38249074,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  6.81965294,  2.29850113,  0.        ,  0.        ,
         0.        ,  6.81965294,  0.16552135,  0.        ,  0.        ,
         0.        ,  6.81965294,  0.212079  ,  0.        ,  0.        ,
         0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588,
         0.        ,  6.81965294,  1.38249074,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  6.81965294,  2.29850113,  0.        ,
         0.        ,  0.        ,  6.81965294,  0.16552135,  0.        ,
         0.        ,  0.        ,  6.81965294,  0.212079  ,  0.        ,
         0.        ,  0.        ,  0.        , 13.63930588, 13.63930588,
         0.        ,  0.        ,  6.81965294,  1.38249074,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  6.81965294,  2.29850113,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.16552135,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.212079  ,
         0.        ,  0.        ,  0.        ,  0.        , 13.63930588,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.38249074],
       [ 2.29850113,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.16552135,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.212079  ,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         1.38249074,  0.        ,  0.        ,  0.        ,  6.81965294],
       [ 6.81965294,  2.53121915,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.27684974,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.38660695,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.38249074,  0.        ,  0.        ,  0.        ,
         0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588],
       [ 0.        ,  6.81965294,  2.53121915,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.27684974,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.38660695,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.38249074,  0.        ,  0.        ,
         0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588],
       [ 0.        ,  0.        ,  6.81965294,  2.53121915,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.27684974,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.38660695,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.38249074,  0.        ,
         0.        ,  0.        ,  0.        , 13.63930588, 13.63930588],
       [ 0.        ,  0.        ,  0.        ,  6.81965294,  2.53121915,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.27684974,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.38660695,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.38249074,
         0.        ,  0.        ,  0.        ,  0.        , 13.63930588],
       [ 2.53121915,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.27684974,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.38660695,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.38249074,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

t = np.array([100.        , 100.        , 100.        , 100.        ,
       100.        ,  89.31876451,  89.31876451,  89.31876451,
        89.31876451,  89.31876451,  89.47444728,  89.47444728,
        89.47444728,  89.47444728,  89.47444728,  89.95301974,
        89.95301974,  89.95301974,  89.95301974,  89.95301974,
        94.99016845,  94.99016845,  94.99016845,  94.99016845,
        94.99016845])

sigzsigz_arr = np.array([[qml.PauliZ(i) @ qml.PauliZ(j) for i in range(len(J))] for j in range(len(J))])
sigz_arr = [qml.PauliZ(i) for i in range(len(t))]

cost_h = qml.Hamiltonian([*t, *J.flatten()], [*sigz_arr, *sigzsigz_arr.flatten()], simplify=True)
mixer_h = -1 * qml.qaoa.mixers.x_mixer(range(len(J)))

qubits = len(J)

def qaoa_layer(gamma, alpha):
    qml.qaoa.cost_layer(gamma, cost_h)
    qml.qaoa.mixer_layer(alpha, mixer_h)

def circuit(params, **kwargs):
    for w in range(qubits):
        qml.Hadamard(wires=w)
    qml.layer(qaoa_layer, depth, params[0], params[1])

dev = qml.device("lightning.gpu", wires=range(qubits), custom_decomps={'PauliRot': qml.PauliRot.compute_decomposition})

@qml.qnode(dev, diff_method='adjoint')
def cost_function(params):
    circuit(params)
    return qml.expval(cost_h)

optimizer = qml.GradientDescentOptimizer()
params = npqml.array([[0.5]*depth, [0.5]*depth], requires_grad=True)

print("Optimization loop started!")

for i in range(steps):
    t0 = time.time()
    params, cost_before = optimizer.step_and_cost(cost_function, params)
    t1 = time.time()
    print(f"Cost at step {i}: {cost_before} \t time: {t1-t0} seconds")

print("Optimal Parameters")
print(params)

@qml.qnode(dev)
def probability_circuit(gamma, alpha):
    circuit([gamma, alpha])
    return qml.probs(wires=range(qubits))

probs_raw = probability_circuit(params[0], params[1])
indx = np.ndindex(*[2] * qubits)
probs = {p: probs_raw[i] for i, p in enumerate(indx)}
best_bitstring = max(probs, key=probs.get)

print(f"Best bitstring: {best_bitstring} with prob: {probs[best_bitstring]}")

It is interesting that you handle qml.expval(H) differently in v0.26. That would explain it, since my example code works fine in v0.24.

Thanks @leolettuce, it seems to be as I expected.

We are currently working on an update that avoids the intermediate dense-matrix construction causing the problem. I’ll post here once the fix is merged into the main branch of the lightning.gpu repository.

Hi @leolettuce

Just confirming that we have a fix internally and can report that your script above works fine with the changes. We ran a 10-step, 2-layer problem, which concluded with the following results:

Optimal Parameters
[[5.51110721 3.28775179]
 [0.53031989 0.48307691]]
Best bitstring: (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) with prob: 1.2300820777116298e-06

We expect to have this available for use sometime early next week. However, just so we can offer the best suggestions for running your problem, do you have exclusive access to all cards on the DGX-A100 system, or are you restricted to a single card?

We have redefined how we handle the terms of a Hamiltonian, and this update works best on systems with multiple GPUs running the same pipeline in parallel. It should be transparent to you and can be enabled with some additional keyword arguments. Let us know, and we can post again with some suggestions once the fix is released.

Hi @mlxd, that’s great news!

I have access to all cards on the DGX-A100 and would be interested in using them :slight_smile:

Hi @leolettuce

We can now confirm an updated release of Lightning GPU has been pushed to PyPI. You should be able to get the newest version by upgrading with:

python -m pip install --upgrade pennylane-lightning pennylane-lightning[gpu]

or by creating a new Python virtualenv.

For your provided example, it may be worth taking note of the feature for parallelization over observables described at the end of https://docs.pennylane.ai/projects/lightning-gpu/en/latest/devices.html

I have made some small modifications to your script to help with this, namely telling the device to batch the observables over multiple GPUs (or, if memory is an issue, to use the commented-out explicit batch size), and disabling differentiation of your output probabilities to avoid the overheads there:

import pennylane as qml
import numpy as np
from pennylane import numpy as npqml
from matplotlib import pyplot as plt
import networkx as nx
import time

depth = 1
steps = 1

J = np.array([[ 0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588,
         6.81965294,  2.16541777,  0.        ,  0.        ,  0.        ,
         6.81965294,  2.0869443 ,  0.        ,  0.        ,  0.        ,
         6.81965294,  2.29850113,  0.        ,  0.        ,  0.        ,
         6.81965294,  2.53121915,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588,
         0.        ,  6.81965294,  2.16541777,  0.        ,  0.        ,
         0.        ,  6.81965294,  2.0869443 ,  0.        ,  0.        ,
         0.        ,  6.81965294,  2.29850113,  0.        ,  0.        ,
         0.        ,  6.81965294,  2.53121915,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , 13.63930588, 13.63930588,
         0.        ,  0.        ,  6.81965294,  2.16541777,  0.        ,
         0.        ,  0.        ,  6.81965294,  2.0869443 ,  0.        ,
         0.        ,  0.        ,  6.81965294,  2.29850113,  0.        ,
         0.        ,  0.        ,  6.81965294,  2.53121915,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        , 13.63930588,
         0.        ,  0.        ,  0.        ,  6.81965294,  2.16541777,
         0.        ,  0.        ,  0.        ,  6.81965294,  2.0869443 ,
         0.        ,  0.        ,  0.        ,  6.81965294,  2.29850113,
         0.        ,  0.        ,  0.        ,  6.81965294,  2.53121915],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         2.16541777,  0.        ,  0.        ,  0.        ,  6.81965294,
         2.0869443 ,  0.        ,  0.        ,  0.        ,  6.81965294,
         2.29850113,  0.        ,  0.        ,  0.        ,  6.81965294,
         2.53121915,  0.        ,  0.        ,  0.        ,  6.81965294],
       [ 6.81965294,  2.16541777,  0.        ,  0.        ,  0.        ,
         0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588,
         6.81965294,  0.13367576,  0.        ,  0.        ,  0.        ,
         6.81965294,  0.16552135,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.27684974,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  6.81965294,  2.16541777,  0.        ,  0.        ,
         0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588,
         0.        ,  6.81965294,  0.13367576,  0.        ,  0.        ,
         0.        ,  6.81965294,  0.16552135,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.27684974,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  6.81965294,  2.16541777,  0.        ,
         0.        ,  0.        ,  0.        , 13.63930588, 13.63930588,
         0.        ,  0.        ,  6.81965294,  0.13367576,  0.        ,
         0.        ,  0.        ,  6.81965294,  0.16552135,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.27684974,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  6.81965294,  2.16541777,
         0.        ,  0.        ,  0.        ,  0.        , 13.63930588,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.13367576,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.16552135,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.27684974],
       [ 2.16541777,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.13367576,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.16552135,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.27684974,  0.        ,  0.        ,  0.        ,  6.81965294],
       [ 6.81965294,  2.0869443 ,  0.        ,  0.        ,  0.        ,
         6.81965294,  0.13367576,  0.        ,  0.        ,  0.        ,
         0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588,
         6.81965294,  0.212079  ,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.38660695,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  6.81965294,  2.0869443 ,  0.        ,  0.        ,
         0.        ,  6.81965294,  0.13367576,  0.        ,  0.        ,
         0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588,
         0.        ,  6.81965294,  0.212079  ,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.38660695,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  6.81965294,  2.0869443 ,  0.        ,
         0.        ,  0.        ,  6.81965294,  0.13367576,  0.        ,
         0.        ,  0.        ,  0.        , 13.63930588, 13.63930588,
         0.        ,  0.        ,  6.81965294,  0.212079  ,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.38660695,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  6.81965294,  2.0869443 ,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.13367576,
         0.        ,  0.        ,  0.        ,  0.        , 13.63930588,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.212079  ,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.38660695],
       [ 2.0869443 ,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.13367576,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.212079  ,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.38660695,  0.        ,  0.        ,  0.        ,  6.81965294],
       [ 6.81965294,  2.29850113,  0.        ,  0.        ,  0.        ,
         6.81965294,  0.16552135,  0.        ,  0.        ,  0.        ,
         6.81965294,  0.212079  ,  0.        ,  0.        ,  0.        ,
         0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588,
         6.81965294,  1.38249074,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  6.81965294,  2.29850113,  0.        ,  0.        ,
         0.        ,  6.81965294,  0.16552135,  0.        ,  0.        ,
         0.        ,  6.81965294,  0.212079  ,  0.        ,  0.        ,
         0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588,
         0.        ,  6.81965294,  1.38249074,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  6.81965294,  2.29850113,  0.        ,
         0.        ,  0.        ,  6.81965294,  0.16552135,  0.        ,
         0.        ,  0.        ,  6.81965294,  0.212079  ,  0.        ,
         0.        ,  0.        ,  0.        , 13.63930588, 13.63930588,
         0.        ,  0.        ,  6.81965294,  1.38249074,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  6.81965294,  2.29850113,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.16552135,
         0.        ,  0.        ,  0.        ,  6.81965294,  0.212079  ,
         0.        ,  0.        ,  0.        ,  0.        , 13.63930588,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.38249074],
       [ 2.29850113,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.16552135,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.212079  ,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         1.38249074,  0.        ,  0.        ,  0.        ,  6.81965294],
       [ 6.81965294,  2.53121915,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.27684974,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.38660695,  0.        ,  0.        ,  0.        ,
         6.81965294,  1.38249074,  0.        ,  0.        ,  0.        ,
         0.        , 13.63930588, 13.63930588, 13.63930588, 13.63930588],
       [ 0.        ,  6.81965294,  2.53121915,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.27684974,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.38660695,  0.        ,  0.        ,
         0.        ,  6.81965294,  1.38249074,  0.        ,  0.        ,
         0.        ,  0.        , 13.63930588, 13.63930588, 13.63930588],
       [ 0.        ,  0.        ,  6.81965294,  2.53121915,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.27684974,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.38660695,  0.        ,
         0.        ,  0.        ,  6.81965294,  1.38249074,  0.        ,
         0.        ,  0.        ,  0.        , 13.63930588, 13.63930588],
       [ 0.        ,  0.        ,  0.        ,  6.81965294,  2.53121915,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.27684974,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.38660695,
         0.        ,  0.        ,  0.        ,  6.81965294,  1.38249074,
         0.        ,  0.        ,  0.        ,  0.        , 13.63930588],
       [ 2.53121915,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.27684974,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.38660695,  0.        ,  0.        ,  0.        ,  6.81965294,
         1.38249074,  0.        ,  0.        ,  0.        ,  6.81965294,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

t = np.array([100.        , 100.        , 100.        , 100.        ,
       100.        ,  89.31876451,  89.31876451,  89.31876451,
        89.31876451,  89.31876451,  89.47444728,  89.47444728,
        89.47444728,  89.47444728,  89.47444728,  89.95301974,
        89.95301974,  89.95301974,  89.95301974,  89.95301974,
        94.99016845,  94.99016845,  94.99016845,  94.99016845,
        94.99016845])

sigzsigz_arr = np.array([[qml.PauliZ(i) @ qml.PauliZ(j) for i in range(len(J))] for j in range(len(J))])
sigz_arr = [qml.PauliZ(i) for i in range(len(t))]

cost_h = qml.Hamiltonian([*t, *J.flatten()], [*sigz_arr, *sigzsigz_arr.flatten()], simplify=True)
mixer_h = -1 * qml.qaoa.mixers.x_mixer(range(len(J)))

qubits = len(J)

def qaoa_layer(gamma, alpha):
    qml.qaoa.cost_layer(gamma, cost_h)
    qml.qaoa.mixer_layer(alpha, mixer_h)

def circuit(params, **kwargs):
    for w in range(qubits):
        qml.Hadamard(wires=w)
    qml.layer(qaoa_layer, depth, params[0], params[1])

# batch_obs=True tells the device to batch the Hamiltonian's observable terms across the available GPUs
dev = qml.device("lightning.gpu", wires=range(qubits), custom_decomps={'PauliRot': qml.PauliRot.compute_decomposition}, batch_obs=True)

# You may also explicitly set this to use at most 1..n observables per GPU to ensure memory limits are not hit, at the expense of more compute time, e.g.:
#dev = qml.device("lightning.gpu", wires=range(qubits), custom_decomps={'PauliRot': qml.PauliRot.compute_decomposition}, batch_obs=8)

@qml.qnode(dev, diff_method='adjoint')
def cost_function(params):
    circuit(params)
    return qml.expval(cost_h)

optimizer = qml.GradientDescentOptimizer()
params = npqml.array([[0.5]*depth, [0.5]*depth], requires_grad=True)

print("Optimization loop started!")

for i in range(steps):
    t0 = time.time()
    params, cost_before = optimizer.step_and_cost(cost_function, params)
    t1 = time.time()
    print(f"Cost at step {i}: {cost_before} \t time: {t1-t0} seconds")

print("Optimal Parameters")
print(params)

# diff_method=None disables differentiation for this QNode, avoiding gradient overheads when only probabilities are needed
@qml.qnode(dev, diff_method=None)
def probability_circuit(gamma, alpha):
    circuit([gamma, alpha])
    return qml.probs(wires=range(qubits))

probs_raw = probability_circuit(params[0], params[1])
indx = np.ndindex(*[2] * qubits)
probs = {p: probs_raw[i] for i, p in enumerate(indx)}
best_bitstring = max(probs, key=probs.get)

print(f"Best bitstring: {best_bitstring} with prob: {probs[best_bitstring]}")

Feel free to try this out and let us know how it goes!

Hi @leolettuce just a quick update. We have identified another issue with the new batching support, so I suggest waiting before making use of this new functionality. I’ll report back once we have a solution in place.

Thank you @mlxd

I am looking forward to the new batching support :slight_smile:

I upgraded pennylane-lightning and pennylane-lightning-gpu to v0.26.1 and tested your modified code without setting batch_obs=True. I still get a memory error:

Optimization loop started!
Traceback (most recent call last):
  File "/home/user1/dev/qaoa_026.py", line 179, in <module>
    params, cost_before = optimizer.step_and_cost(cost_function, params)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/optimize/gradient_descent.py", line 59, in step_and_cost
    g, forward = self.compute_grad(objective_fn, args, kwargs, grad_fn=grad_fn)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/optimize/gradient_descent.py", line 117, in compute_grad
    grad = g(*args, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/_grad.py", line 115, in __call__
    grad_value, ans = grad_fn(*args, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/wrap_util.py", line 20, in nary_f
    return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/_grad.py", line 133, in _grad_with_forward
    vjp, ans = _make_vjp(fun, x)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/core.py", line 10, in make_vjp
    end_value, end_node =  trace(start_node, fun, x)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/tracer.py", line 10, in trace
    end_box = fun(start_box)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/wrap_util.py", line 15, in unary_f
    return fun(*subargs, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/qnode.py", line 661, in __call__
    res = qml.execute(
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/interfaces/execution.py", line 443, in execute
    res = _execute(
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/interfaces/autograd.py", line 66, in execute
    return _execute(
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/tracer.py", line 44, in f_wrapped
    ans = f_wrapped(*argvals, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/autograd/tracer.py", line 48, in f_wrapped
    return f_raw(*args, **kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/interfaces/autograd.py", line 110, in _execute
    res, jacs = execute_fn(tapes, **gradient_kwargs)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane/_device.py", line 567, in execute_and_gradients
    jacs.append(gradient_method(circuit, **kwargs))
  File "/home/user1/anaconda3/envs/my_env/lib/python3.9/site-packages/pennylane_lightning_gpu/lightning_gpu.py", line 379, in adjoint_jacobian
    jac = adj.adjoint_jacobian(self._gpu_state, obs_serialized, ops_serialized, tp_shift)
pennylane_lightning_gpu.lightning_gpu_qubit_ops.PLException: [/project/pennylane_lightning_gpu/src/util/DataBuffer.hpp][Line:51][Method:DataBuffer]: Error in PennyLane Lightning: out of memory

Did you observe a similar error?

Hi @leolettuce

Unfortunately, your example will not work without the batching support. We explicitly aim to preallocate all of the memory for a given workload up front, which in this case is too much: each observable in the adjoint pipeline requires its own copy of the statevector, and the Hamiltonian is decomposed into many separate sub-terms.

As such, just as one might expect with backprop in the ML frameworks, memory becomes the bottleneck, and I would not expect this example to work without batching the observables into a series of chunks.
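
To give a rough sense of scale (approximate numbers of my own, not measured on the device): a single 25-qubit statevector in complex128 is about 0.5 GiB, so holding one copy per observable term at the same time adds up very quickly.

# Rough, illustrative estimate of the adjoint-pipeline memory pressure
n_qubits = 25
per_copy_gib = (2 ** n_qubits * 16) / 2 ** 30   # one complex128 statevector copy
print(per_copy_gib)                             # 0.5 GiB per copy
print(300 * per_copy_gib)                       # e.g. ~150 GiB if ~300 terms were resident at once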

The batch support in the LightningGPU v0.26.2 update is now complete and should enable your example to run (I clocked the initial optimization step at 90 s on 4 cards). I’d suggest pulling in the latest update once more and setting batch_obs=True. If you wish to run this without batching support, you should also be able to explicitly convert the dense Hamiltonian to a sparse PennyLane Hamiltonian and run it with the parameter-shift pipeline; a minimal sketch follows below.
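
If you want to try the sparse route, here is a rough sketch. It reuses cost_h, circuit, qubits, dev, depth, and npqml from your script above; the helper and observable names reflect the v0.26-era API as I understand it, so please treat them as assumptions to verify against your installed release rather than a definitive recipe:

import pennylane as qml

# Convert the dense qml.Hamiltonian to a scipy sparse matrix and wrap it as a
# SparseHamiltonian observable. qml.utils.sparse_hamiltonian is the same helper
# that appears in your original traceback.
sparse_mat = qml.utils.sparse_hamiltonian(cost_h, wires=range(qubits))
H_sparse = qml.SparseHamiltonian(sparse_mat, wires=range(qubits))

# Illustrative QNode name; parameter-shift avoids the per-observable
# statevector copies used by the adjoint pipeline.
@qml.qnode(dev, diff_method="parameter-shift")
def sparse_cost_function(params):
    circuit(params)
    return qml.expval(H_sparse)

params = npqml.array([[0.5] * depth, [0.5] * depth], requires_grad=True)
print(sparse_cost_function(params))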

Let me know if you need further assistance.
