Error faced in training the quantum network for estimating parameters

Hello, whenever I try to run my program I hit an error while training the model, as shown below. As I am new to PennyLane and QML, I am having a hard time understanding what the error means. The problem I have implemented is a regression (parameter estimation) problem, and I am not sure whether this circuit can be used for regression. I went through the photonic-architecture function fitting demo in PennyLane, but that architecture is too slow for my requirements, and I feel the one below gives more flexibility in designing circuits (I may be wrong). Also, in the former I can’t give a (900,1000) array as input; the only way I can do that is with Amplitude Embedding, hence I resorted to this. Please go through this and let me know what can be done. Thank you.

  • I have attached the code below. I embed the inputs into 10 qubits and need only 5 outputs to estimate 5 parameters, hence I have used separate CNOT gates to entangle neighboring qubits.
  • I have provided random x_train and y_train inputs here, but of the same size as the actual data. If required, I can provide the actual dataset as well.
  • Note: I am new to QML and PennyLane, I am running the whole code on Google Colab, and all the packages are up to date.
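import numpy as np
import pennylane as qml
from pennylane import numpy as pnp

n_qubits = 10
layers = 2
dev = qml.device("lightning.qubit", wires=n_qubits)

@qml.qnode(dev)
def qnn(weights, inputs):
    qml.AmplitudeEmbedding(inputs, wires=range(n_qubits), pad_with=0.5)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    # Entangle neighboring pairs (0,1), (2,3), ... with separate CNOT gates
    for i in range(n_qubits - 1):
        if i % 2 == 0:
            qml.CNOT(wires=[i, i + 1])
    # qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    # qml.RandomLayers(weights,wires=range(n_qubits))
    # Measure every second qubit: 5 expectation values for 5 parameters
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)[1::2]]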

var_init = pnp.random.randn(layers, n_qubits, 3,requires_grad=True)

def mse(observed, predictions):
    loss = 0
    for l, p in zip(observed, predictions):
        loss = loss + (l - p) ** 2

    loss = loss / len(observed)
    return loss

def cost(var, features, observed):
    preds = [qnn(var,x) for x in features]
    return mse(observed, preds)

x_train = np.random.randn(900,1000)
y_train = np.random.randn(900,5)

opt = qml.AdamOptimizer(0.1, beta1=0.9, beta2=0.999)

var = var_init
for it in range(100):
    (var, _, _), _cost = opt.step_and_cost(cost, var, x_train, y_train)
    print("Iter: {:5d} | Cost: {:0.7f} ".format(it, _cost))

This is the error I get when I try to train my model:

TypeError                                 Traceback (most recent call last)
<ipython-input-45-100808a25dcc> in <cell line: 4>()
      3 var = var_init
      4 for it in range(100):
----> 5     (var, _, _), _cost = opt.step_and_cost(cost, var, x_train, y_train)
      6     print("Iter: {:5d} | Cost: {:0.7f} ".format(it, _cost))

4 frames
/usr/local/lib/python3.10/dist-packages/pennylane/ in _grad_with_forward(fun, x)
    138         if not vspace(ans).size == 1:
--> 139             raise TypeError(
    140                 "Grad only applies to real scalar-output functions. "
    141                 "Try jacobian, elementwise_grad or holomorphic_grad."

TypeError: Grad only applies to real scalar-output functions. Try jacobian, elementwise_grad or holomorphic_grad.

The output of qml.about().

Name: PennyLane
Version: 0.33.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/PennyLaneAI/pennylane
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane-Lightning

Platform info: Linux-5.15.120+-x86_64-with-glibc2.35
Python version: 3.10.12
Numpy version: 1.23.5
Scipy version: 1.11.3

I noticed that the problem was in my error function and found the cause, but fixing it leads to another error.

def mse(observed, predictions):
    loss = 0
    for l, p in zip(observed, predictions):
        loss = loss + (l - p) ** 2

    loss = loss / len(observed)
    return np.mean(loss)

I realized that the mse function was previously returning an array, which isn’t what I need, and solved it by returning the mean of the loss array.
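
As an aside on the shape issue described here: the sketch below uses made-up data and the hypothetical names `mse_vector` and `mse_scalar`. It shows why the loop on its own leaves one value per output column, while a gradient-based optimizer needs a single real number.

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(size=(8, 5))      # 8 samples, 5 target parameters
predictions = rng.normal(size=(8, 5))   # stand-in for the circuit outputs

def mse_vector(observed, predictions):
    # Same loop as in the first post: each (l - p) ** 2 has shape (5,),
    # so the running sum never collapses to a single number.
    loss = 0
    for l, p in zip(observed, predictions):
        loss = loss + (l - p) ** 2
    return loss / len(observed)

def mse_scalar(observed, predictions):
    # Average over samples and over outputs to get one real number.
    return np.mean((observed - predictions) ** 2)

print(mse_vector(observed, predictions).shape)     # (5,)
print(np.ndim(mse_scalar(observed, predictions)))  # 0
```

Inside a PennyLane cost function the same arithmetic would have to go through `pennylane.numpy` rather than plain NumPy so that autograd can trace it.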

The error I am currently facing is again due to the loss function, and I think it is because I am now using np.mean(loss). Can someone please help me with this?

TypeError                                 Traceback (most recent call last)
TypeError: float() argument must be a string or a real number, not 'ArrayBox'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-14-100808a25dcc> in <cell line: 4>()
      3 var = var_init
      4 for it in range(100):
----> 5     (var, _, _), _cost = opt.step_and_cost(cost, var, x_train, y_train)
      6     print("Iter: {:5d} | Cost: {:0.7f} ".format(it, _cost))

12 frames
/usr/local/lib/python3.10/dist-packages/pennylane/optimize/ in step_and_cost(self, objective_fn, grad_fn, *args, **kwargs)
     57         """
---> 59         g, forward = self.compute_grad(objective_fn, args, kwargs, grad_fn=grad_fn)
     60         new_args = self.apply_grad(g, args)

/usr/local/lib/python3.10/dist-packages/pennylane/optimize/ in compute_grad(objective_fn, args, kwargs, grad_fn)
    115         """
    116         g = get_gradient(objective_fn) if grad_fn is None else grad_fn
--> 117         grad = g(*args, **kwargs)
    118         forward = getattr(g, "forward", None)

/usr/local/lib/python3.10/dist-packages/pennylane/ in __call__(self, *args, **kwargs)
    116             return ()
--> 118         grad_value, ans = grad_fn(*args, **kwargs)  # pylint: disable=not-callable
    119         self._forward = ans

/usr/local/lib/python3.10/dist-packages/autograd/ in nary_f(*args, **kwargs)
     18             else:
     19                 x = tuple(args[i] for i in argnum)
---> 20             return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
     21         return nary_f
     22     return nary_operator

/usr/local/lib/python3.10/dist-packages/pennylane/ in _grad_with_forward(fun, x)
    134         difference being that it returns both the gradient *and* the forward pass
    135         value."""
--> 136         vjp, ans = _make_vjp(fun, x)
    138         if not vspace(ans).size == 1:

/usr/local/lib/python3.10/dist-packages/autograd/ in make_vjp(fun, x)
      8 def make_vjp(fun, x):
      9     start_node = VJPNode.new_root()
---> 10     end_value, end_node =  trace(start_node, fun, x)
     11     if end_node is None:
     12         def vjp(g): return vspace(x).zeros()

/usr/local/lib/python3.10/dist-packages/autograd/ in trace(start_node, fun, x)
      8     with trace_stack.new_trace() as t:
      9         start_box = new_box(x, t, start_node)
---> 10         end_box = fun(start_box)
     11         if isbox(end_box) and end_box._trace == start_box._trace:
     12             return end_box._value, end_box._node

/usr/local/lib/python3.10/dist-packages/autograd/ in unary_f(x)
     13                 else:
     14                     subargs = subvals(args, zip(argnum, x))
---> 15                 return fun(*subargs, **kwargs)
     16             if isinstance(argnum, int):
     17                 x = args[argnum]

<ipython-input-13-d579b6ae7b17> in cost(var, features, observed)
      2     preds = [qnn(var,x) for x in features]
      3     # mse = tf.keras.losses.MeanSquaredError()
----> 4     return mse(observed, preds)

<ipython-input-12-cc48b631e2ca> in mse(observed, predictions)
      6     loss = loss / len(observed)
----> 7     return np.mean(loss)

/usr/local/lib/python3.10/dist-packages/numpy/core/ in mean(*args, **kwargs)

/usr/local/lib/python3.10/dist-packages/numpy/core/ in mean(a, axis, dtype, out, keepdims, where)
   3430             return mean(axis=axis, dtype=dtype, out=out, **kwargs)
-> 3432     return _methods._mean(a, axis=axis, dtype=dtype,
   3433                           out=out, **kwargs)

/usr/local/lib/python3.10/dist-packages/numpy/core/ in _mean(a, axis, dtype, out, keepdims, where)
    188             ret = arr.dtype.type(ret / rcount)
    189         else:
--> 190             ret = ret.dtype.type(ret / rcount)
    191     else:
    192         ret = ret / rcount

ValueError: setting an array element with a sequence.

Update 2
Below is the updated cost function and training loop.

def cost(var, features, observed):
    preds = qnn(var, features)
    preds_np = np.array(preds).T

    mse = sklearn.metrics.mean_squared_error(observed, preds_np)
    return mse

opt = qml.AdamOptimizer(0.1, beta1=0.9, beta2=0.999)
num_train = len(x_train)
batch_size = 256
var = var_init
for it in range(100):
    batch_index = np.random.randint(0, num_train, (batch_size,))
    x_train_batch = x_train[batch_index]
    y_train_batch = y_train[batch_index]
    var, _cost = opt.step_and_cost(lambda v: cost(v,x_train_batch,y_train_batch), var)
    print("Iter: {:5d} | Cost: {:0.7f} ".format(it, _cost))

The remaining code pretty much remains the same.

I have tried everything I could for now, but the TypeError: float() argument must be a string or a real number, not ‘ArrayBox’ still persists. I have added the error below as well.

What I could understand is that my predictions are somehow coming back as array[array(value), array(value), … and so on], which I don’t understand since I have made sure to make everything numpy.float. The output below shows that my prediction (preds_np) is an array of ArrayBox objects, so the error lies in the calculation of the mean squared error. Apart from that, I tried writing an mse function by hand, which gave the same error, and I tried other approaches as well, and everything returned the same error.
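
A rough illustration of why plain NumPy and scikit-learn fail on such predictions, assuming nothing about autograd's internals: `Box` below is a hypothetical stand-in for a tracer object such as ArrayBox. NumPy can only store objects like this in an `object` array, and any routine that then asks for a float array cannot convert them.

```python
import numpy as np

class Box:
    """Hypothetical stand-in for an autograd tracer: it wraps a float but
    deliberately offers no conversion to a plain Python float."""
    def __init__(self, value):
        self.value = value

# NumPy has no numeric dtype for Box, so it falls back to dtype=object.
preds = np.array([[Box(0.1), Box(0.2)], [Box(0.3), Box(0.4)]])
print(preds.dtype)  # object

# Libraries such as scikit-learn validate inputs by requesting a float array.
try:
    np.asarray(preds, dtype=float)
    converted = True
except (TypeError, ValueError) as err:
    converted = False
    print(type(err).__name__)
```

Casting the inputs to `numpy.float` beforehand does not help in this situation, because the traced values only appear once the optimizer starts differentiating the cost.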

Shapes: observed=(256, 5), preds_np=(256, 5)
Observed: [[ 0.07636211  0.01042428  1.68223176  0.3865876   2.59282841]
 [ 0.85141523 -1.76938919  1.65395276 -1.24754269 -1.01027803]
 [ 0.48676007 -0.00888649 -0.03757296  1.75628312 -0.31670208]
 [-1.24609538  0.82554933 -0.01931672 -1.10620168  0.14732945]
 [ 0.35853445  0.06638076  0.49601253 -0.73526325 -0.77044028]
 [ 0.4579714   0.18093778 -0.10214571 -0.22933184 -1.03835437]]
Preds_np: [[<autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35f557af40>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7ed54c0>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35f4fb6ec0>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7b8cdc0>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7badfc0>]
 [<autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35fbec6d80>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7ed5440>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7c58b80>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7b8d740>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7bad5c0>]
 [<autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35fcd60700>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7ed4d40>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7c58c00>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7bba600>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7bafbc0>]
 [<autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7ed5180>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35fcd8d840>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7b8c480>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7bae200>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35f8cbaa80>]
 [<autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7ed5280>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35fcd8c7c0>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7b8d100>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7bae000>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35f8cb9780>]
 [<autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7ed4cc0>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35f4fb5fc0>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7b8c040>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35e7bae900>
  <autograd.numpy.numpy_boxes.ArrayBox object at 0x7c35f8cbb480>]]
TypeError                                 Traceback (most recent call last)
TypeError: float() argument must be a string or a real number, not 'ArrayBox'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-239-956141b9c0ef> in <cell line: 5>()
      7     x_train_batch = x_train[batch_index]
      8     y_train_batch = y_train[batch_index]
----> 9     var, _cost = opt.step_and_cost(lambda v: cost(v,x_train_batch,y_train_batch), var)
     10     print("Iter: {:5d} | Cost: {:0.7f} ".format(it, _cost))

13 frames
/usr/local/lib/python3.10/dist-packages/pennylane/optimize/ in step_and_cost(self, objective_fn, grad_fn, *args, **kwargs)
     57         """
---> 59         g, forward = self.compute_grad(objective_fn, args, kwargs, grad_fn=grad_fn)
     60         new_args = self.apply_grad(g, args)

/usr/local/lib/python3.10/dist-packages/pennylane/optimize/ in compute_grad(objective_fn, args, kwargs, grad_fn)
    115         """
    116         g = get_gradient(objective_fn) if grad_fn is None else grad_fn
--> 117         grad = g(*args, **kwargs)
    118         forward = getattr(g, "forward", None)

/usr/local/lib/python3.10/dist-packages/pennylane/ in __call__(self, *args, **kwargs)
    116             return ()
--> 118         grad_value, ans = grad_fn(*args, **kwargs)  # pylint: disable=not-callable
    119         self._forward = ans

/usr/local/lib/python3.10/dist-packages/autograd/ in nary_f(*args, **kwargs)
     18             else:
     19                 x = tuple(args[i] for i in argnum)
---> 20             return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
     21         return nary_f
     22     return nary_operator

/usr/local/lib/python3.10/dist-packages/pennylane/ in _grad_with_forward(fun, x)
    134         difference being that it returns both the gradient *and* the forward pass
    135         value."""
--> 136         vjp, ans = _make_vjp(fun, x)
    138         if not vspace(ans).size == 1:

/usr/local/lib/python3.10/dist-packages/autograd/ in make_vjp(fun, x)
      8 def make_vjp(fun, x):
      9     start_node = VJPNode.new_root()
---> 10     end_value, end_node =  trace(start_node, fun, x)
     11     if end_node is None:
     12         def vjp(g): return vspace(x).zeros()

/usr/local/lib/python3.10/dist-packages/autograd/ in trace(start_node, fun, x)
      8     with trace_stack.new_trace() as t:
      9         start_box = new_box(x, t, start_node)
---> 10         end_box = fun(start_box)
     11         if isbox(end_box) and end_box._trace == start_box._trace:
     12             return end_box._value, end_box._node

/usr/local/lib/python3.10/dist-packages/autograd/ in unary_f(x)
     13                 else:
     14                     subargs = subvals(args, zip(argnum, x))
---> 15                 return fun(*subargs, **kwargs)
     16             if isinstance(argnum, int):
     17                 x = args[argnum]

<ipython-input-239-956141b9c0ef> in <lambda>(v)
      7     x_train_batch = x_train[batch_index]
      8     y_train_batch = y_train[batch_index]
----> 9     var, _cost = opt.step_and_cost(lambda v: cost(v,x_train_batch,y_train_batch), var)
     10     print("Iter: {:5d} | Cost: {:0.7f} ".format(it, _cost))

<ipython-input-238-515f0ead4b63> in cost(var, features, observed)
      7     print("Preds_np:", preds_np)
----> 9     mse = sklearn.metrics.mean_squared_error(observed, preds_np)
     10     return mse

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/ in mean_squared_error(y_true, y_pred, sample_weight, multioutput, squared)
    440     0.825...
    441     """
--> 442     y_type, y_true, y_pred, multioutput = _check_reg_targets(
    443         y_true, y_pred, multioutput
    444     )

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/ in _check_reg_targets(y_true, y_pred, multioutput, dtype)
    100     check_consistent_length(y_true, y_pred)
    101     y_true = check_array(y_true, ensure_2d=False, dtype=dtype)
--> 102     y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
    104     if y_true.ndim == 1:

/usr/local/lib/python3.10/dist-packages/sklearn/utils/ in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    877                     array = xp.astype(array, dtype, copy=False)
    878                 else:
--> 879                     array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp)
    880             except ComplexWarning as complex_warning:
    881                 raise ValueError(

/usr/local/lib/python3.10/dist-packages/sklearn/utils/ in _asarray_with_order(array, dtype, order, copy, xp)
    183     if xp.__name__ in {"numpy", "numpy.array_api"}:
    184         # Use NumPy API to support order
--> 185         array = numpy.asarray(array, order=order, dtype=dtype)
    186         return xp.asarray(array, copy=copy)
    187     else:

ValueError: setting an array element with a sequence.

Hey @G_Akash, welcome to the forum, and QML for that matter :cowboy_hat_face:! Really appreciate you giving PennyLane a try :raised_hands:

I’ll get back to you tomorrow :grin:


Hey @isaacdevlugt, thank you for the welcome.

With regards to the problem, I was able to track down the issue based on similar issues other people have reported in the forum: I had not imported numpy from pennylane, which was causing this error. So now the code is running, but it still doesn’t behave the way I want. First, it takes a huge amount of time to optimize compared to a hybrid model or a classical model. Second, for reasons I don’t understand, the weights are not really being optimized. I have attached the revised code below.


def gauss(A,mu,s,x):
  return A*(np.exp(-(x-mu)**2/(2*(s)**2)))

def line(m,c,x):
  return m*x+c

N = 1000 #no. of curves
x = np.linspace(-10,10,1000)

#Generating parameters from a normal distribution
A =  3*np.random.normal(size=N)
A = A-np.min(A)
mu = 0.5*np.random.normal(size=N)
mu = mu-np.min(mu)
s = 0.4*np.random.normal(size=N)
s = s - np.min(s)
m = 25*np.random.normal(size=N)
m = m-np.min(m)
c = 250*np.random.normal(size=N)

# Creating 1000 Gaussian functions with different parameters
X_gauss = []
for i in range(N):
    X_gauss.append(gauss(A[i], mu[i], s[i], x))
X_gauss = np.asarray(X_gauss)
# Creating 1000 linear functions with different parameters
X_line = []
for i in range(N):
    X_line.append(line(m[i], c[i], x))
X_line = np.asarray(X_line)

X_data = X_gauss + X_line #Complete input dataset
Y_data = np.array([A,mu,s,m,c]).T  #Output Dataset

## Normalizing
scaler_x =  StandardScaler()
X = scaler_x.fit_transform(X_data)

scaler_y = StandardScaler()
Y = scaler_y.fit_transform(Y_data)

#Train & test data split
x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size = 0.1,random_state = 2)
x_train, y_train = shuffle(x_train, y_train, random_state=2)


from pennylane import numpy as np

n_qubits = 10
layers = 3
dev = qml.device("lightning.qubit", wires=n_qubits)

def qnn(weights, inputs):
    qml.AmplitudeEmbedding(inputs, wires=range(n_qubits),pad_with=0.5)
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    # qml.RandomLayers(weights,wires=range(n_qubits))
    # qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    for i in range(n_qubits-1):
      if i%2==0:
    out =  [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)[1::2]]
    return out

var_init = np.random.randn(layers, n_qubits,requires_grad=True)

def mse(observed,predictions):
  loss = 0
  for l, p in zip(observed, predictions):
      loss = loss + (l - p) ** 2

  loss = loss / len(observed)
  return np.mean(np.array(loss))

def cost(var, features, observed):
    preds = qnn(var, features)
    preds_np = np.array(preds,requires_grad=False).T

    # print("Shapes: observed={}, preds_np={}".format(observed.shape, preds_np.shape))
    # print("Observed:", observed)
    # print("Preds_np:", preds_np)

    cost = mse(observed, preds_np)
    return cost

def predict(var,features):
    preds = qnn(var, features)
    preds_np = np.array(preds,requires_grad=False).T
    return preds_np

x_train = np.array(x_train,requires_grad=False)
y_train = np.array(y_train,requires_grad=False)
x_test = np.array(x_test,requires_grad=False)
y_test = np.array(y_test,requires_grad=False)

opt = qml.GradientDescentOptimizer(0.5)
num_train = len(x_train)
batch_size = 256
var = var_init
for it in range(100):
    batch_index = np.random.randint(0, num_train, (batch_size,))
    x_train_batch = x_train[batch_index]
    y_train_batch = y_train[batch_index]
    (var,_,_), _cost = opt.step_and_cost(cost, var,x_train_batch,y_train_batch)
    print("Iter: {:5d} | Cost: {:0.7f} ".format(it, _cost))

Below is the output of the optimization. As you can see, the cost is not decreasing but just fluctuating about a mean value. I interrupted the run because 47 iterations alone took about 1.5 hours. I am not sure whether the problem is the way I am feeding the inputs or the optimizer.

Iter:     0 | Cost: 1.0262603 
Iter:     1 | Cost: 1.0714397 
Iter:     2 | Cost: 0.9919349 
Iter:     3 | Cost: 1.0263416 
Iter:     4 | Cost: 0.9702063 
Iter:     5 | Cost: 1.0137099 
Iter:     6 | Cost: 0.9921724 
Iter:     7 | Cost: 0.9564291 
Iter:     8 | Cost: 1.0681145 
Iter:     9 | Cost: 1.0018073 
Iter:    10 | Cost: 1.0337959 
Iter:    11 | Cost: 0.9845097 
Iter:    12 | Cost: 1.0391747 
Iter:    13 | Cost: 0.9958522 
Iter:    14 | Cost: 1.0203588 
Iter:    15 | Cost: 0.9757819 
Iter:    16 | Cost: 1.0648614 
Iter:    17 | Cost: 1.0381451 
Iter:    18 | Cost: 0.9311377 
Iter:    19 | Cost: 0.9928702 
Iter:    20 | Cost: 1.0282486 
Iter:    21 | Cost: 0.9701088 
Iter:    22 | Cost: 1.0893348 
Iter:    23 | Cost: 1.0686738 
Iter:    24 | Cost: 1.0427799 
Iter:    25 | Cost: 0.9902046 
Iter:    26 | Cost: 1.0055220 
Iter:    27 | Cost: 1.0186820 
Iter:    28 | Cost: 1.0431788 
Iter:    29 | Cost: 0.9940230 
Iter:    30 | Cost: 1.0011947 
Iter:    31 | Cost: 1.0071937 
Iter:    32 | Cost: 0.9964447 
Iter:    33 | Cost: 0.9874292 
Iter:    34 | Cost: 0.9739903 
Iter:    35 | Cost: 1.0177391 
Iter:    36 | Cost: 1.0043821 
Iter:    37 | Cost: 1.0131872 
Iter:    38 | Cost: 1.0328467 
Iter:    39 | Cost: 1.0112351 
Iter:    40 | Cost: 1.0668956 
Iter:    41 | Cost: 0.9830537 
Iter:    42 | Cost: 0.9398889 
Iter:    43 | Cost: 0.9663632 
Iter:    44 | Cost: 0.9398969 
Iter:    45 | Cost: 1.0388253 
Iter:    46 | Cost: 1.0033963 
Iter:    47 | Cost: 1.0774254 

If there is a better way to perform the computation for this type of dataset, do let me know. Thank you
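
One possible reading of a cost that hovers around 1.0, offered as an assumption rather than a diagnosis: the targets were standardized to unit variance, so a model whose outputs stay near zero scores a mean squared error of about 1. In addition, Pauli-Z expectation values are confined to [-1, 1], so standardized targets outside that range can never be matched exactly. The sketch below uses made-up data to show both effects.

```python
import numpy as np

rng = np.random.default_rng(1)
raw_targets = rng.normal(loc=3.0, scale=2.0, size=(900, 5))  # made-up data

# Standardize each column to zero mean and unit variance,
# which is what StandardScaler does.
y = (raw_targets - raw_targets.mean(axis=0)) / raw_targets.std(axis=0)

# Baseline 1: a model that always outputs 0.
baseline_mse = np.mean((y - 0.0) ** 2)

# Baseline 2: the best any output confined to [-1, 1] could do on these targets.
bounded_floor = np.mean((y - np.clip(y, -1.0, 1.0)) ** 2)

print(round(baseline_mse, 3))  # 1.0
print(bounded_floor > 0)       # True
```

If this reading applies, a cost near 1.0 would mean the circuit has not yet learned anything beyond the mean of the targets.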

@G_Akash glad to hear you fixed your original issue!

Regarding performance issues — steps to optimized parameters — it’s a tough thing to narrow down. I’m not familiar with your dataset or the problem you’re trying to solve, but the question of getting better performance out of your code is hard to answer. It comes down to several things, including your choice of ansatz, the optimizer, hyperparameters (e.g., learning rate, batch size, etc.), and your choice of cost function. Getting the right mix of these things for your particular problem can be challenging, but that’s the way it goes in a lot of machine learning problems.

I would try tinkering with the optimizer first, then your parameter initialization strategy, then you can start thinking about tweaking your ansatz and more serious parts to your code. Let me know if this helps!

@G_Akash you can also try using lightning.qubit and setting diff_method="adjoint" in your QNode :slight_smile:. Here are the details you’ll need to know to get started with lightning :zap: : Lightning Qubit device — Lightning 0.33.1 documentation

Hi @isaacdevlugt. Thank you for the suggestion, I will try those and get back to you. :grin:


Hey @isaacdevlugt, regarding the optimizers, would QNGOptimizer (quantum natural gradient descent) perform better than this one for this specific problem?

Also, if knowing the problem in more detail would help you narrow down the issue, please let me know and I can explain it.

Hey @isaacdevlugt. I used the adjoint method and it helped in reducing the run time by a lot but the weights are still not being optimized. The loss is still fluctuating between a value and is not reducing sequentially.

Hey @G_Akash,

Glad to see that adjoint-diff helped a bit!

but the weights are still not being optimized. The loss is still fluctuating between a value and is not reducing sequentially.

I’m not familiar with your dataset or the problem you’re trying to solve. It sounds like your model is working — it’s changing the parameters according to an update rule based on differentiating your cost function — just not working well. It’s going to be trial and error on your end to see what will end up working well.

It might be beneficial to reduce the size of your problem to see why your algorithm isn’t working well. Using the same data that you’re currently using but smaller (e.g., if the data is images, use smaller images, say) is a good place to start. Then, you can reduce the size of your model, too, and start picking it apart piece by piece.

Hope this helps!


Thank you @isaacdevlugt. I’ll let you know how it goes :grin:


Hey @isaacdevlugt, I hope you are doing well, and by the way, happy holidays. :grin:

So, regarding the above, I tried working with a reduced version of the dataset. To do so, I used a feed-forward autoencoder to obtain a latent representation of the dataset, i.e. going from (2000,512) to (2000,64). As suggested, I used the adjoint differentiation method, and it saved a lot of time. Apart from that, I altered the training process a little by reducing the batch size (from 256 to 32), training for more epochs (from 100 to 500), and adding validation as well. Now the network is able to learn, unlike last time. I also checked whether the network performs better with more layers: it does, but I suspect it runs into barren plateaus, since increasing the number of layers beyond a certain value doesn’t seem to benefit the learning. The issue now is that the learning is not effective enough; for example, the expected loss at the end of training should be around 0.01, but the minimum it reaches is about 0.34 or more. Do you have any suggestions to improve on this? I tried using a more complex circuit (more multi-qubit gates), but it seems to perform worse than the present circuit (if required, I can give you the details of this alternate circuit).
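
The link between feature count and qubit count in amplitude embedding can be sketched in plain NumPy. The helper `amplitude_prepare` is a hypothetical name, and the code only mimics the padding and normalization step; it is not PennyLane's implementation. It also shows a side effect worth keeping in mind for regression: normalization discards the overall scale of the input.

```python
import math
import numpy as np

def amplitude_prepare(x, pad_with=0.5):
    """Pad x to the next power of two and L2-normalize it, mimicking what
    amplitude embedding needs before the data can become a quantum state."""
    n_qubits = max(1, math.ceil(math.log2(len(x))))
    padded = np.full(2 ** n_qubits, pad_with, dtype=float)
    padded[: len(x)] = x
    return n_qubits, padded / np.linalg.norm(padded)

n64, state64 = amplitude_prepare(np.arange(1.0, 65.0))
n1000, state1000 = amplitude_prepare(np.ones(1000))
print(n64, n1000)  # 6 10

# Normalization removes the overall scale: x and 3 * x give the same state
# when no padding is needed.
x = np.arange(1.0, 65.0)
_, scaled = amplitude_prepare(3 * x)
print(np.allclose(state64, scaled))  # True
```

So 64 latent features need 6 qubits, while the original 1000-point curves needed 10 qubits with 24 padded entries.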


n_qubits = int(np.log2(x_train.shape[1]))
n_layers = 10
dev = qml.device("lightning.qubit", wires=n_qubits)

@qml.qnode(dev, diff_method="adjoint")
def qnn(inputs,weights):
    qml.AmplitudeEmbedding(inputs, wires=range(n_qubits),pad_with=0.5)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    # for i in range(n_qubits-1):
    #   if i%2 == 0:
    #     qml.CZ(wires=[i,i+1])
    #     qml.PauliX(wires=i)
    #     qml.CY(wires=[i,i+1])
    out =  [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)[1::2]]
    return out


var_init = np.random.randn(n_layers, n_qubits,3, requires_grad=True)


def mse(observed,predictions):
  loss = 0
  for l, p in zip(observed, predictions):
      loss = loss + (l - p) ** 2

  loss = loss / len(observed)
  return np.mean(np.array(loss))

def cost(var, features, observed):
    preds = qnn(features,var)
    preds_np = np.array(preds).T
    cost = mse(observed, preds_np)
    return cost


opt = qml.AdamOptimizer(0.1,beta1=0.9,beta2=0.999)
num_train = len(x_train)
num_val =len(x_val)
batch_size = 32
var = var_init
for it in range(500):
    batch_train = np.random.randint(0, num_train, (batch_size,))
    x_train_batch = x_train[batch_train]
    y_train_batch = y_train[batch_train]
    batch_val = np.random.randint(0, num_val, (batch_size,))
    x_val_batch = x_val[batch_val]
    y_val_batch = y_val[batch_val]
    (var,_,_), _cost = opt.step_and_cost(cost, var,x_train_batch,y_train_batch)
    val_cost = cost(var,x_val_batch,y_val_batch)
    print("Iter: {:5d} | Cost: {:0.7f} | Val_Cost: {:0.7f} ".format(it, _cost,val_cost))


test_loss = cost(var,x_test,y_test)


This is what I got for the loss: in the end, it learns for a while and then starts to oscillate about a mean value, just like earlier, but this time it has at least learned a bit more.

I am able to achieve a prediction accuracy of just over 53%.


  1. Used an autoencoder to reduce the dimensionality.
  2. Changed a few hyperparameters.
  3. Increased the number of layers to check for further gains in performance.
  4. The network is learning, but not efficiently enough, and I am guessing it suffers from barren plateaus.
  5. For unknown reasons, a more complex network (more multi-qubit gates) performed worse than the model used above.
  6. Most importantly, since my output is the values of the 3 parameters defining a Gaussian function, I would like the network to give me measurements from only 3 qubits. What I have done is simply measure alternate qubits, which I don’t think is intuitive or a good strategy. If you have suggestions for that, please let me know.
  7. Finally, for optimization or for finding better initial conditions, I was wondering about using an RNN, based on Learning to learn with quantum neural networks, and another method using the Grover algorithm for efficient parameter search. Do let me know if those methods would be good options to look into.
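
On point 6, the number of measured wires in the posted circuits is tied to the qubit count through the slice `range(n_qubits)[1::2]`. A small sketch of what that slice selects, next to a hypothetical alternative (`first_wires`) that fixes the number of outputs directly:

```python
def alternate_wires(n_qubits):
    # What the posted circuits measure: every second wire, starting at 1.
    return list(range(n_qubits))[1::2]

def first_wires(n_outputs):
    # Hypothetical alternative: fix the number of measured wires directly.
    return list(range(n_outputs))

print(alternate_wires(10))  # [1, 3, 5, 7, 9]
print(alternate_wires(6))   # [1, 3, 5]
print(first_wires(3))       # [0, 1, 2]
```

Whether measuring the first three wires trains any better than measuring alternate wires is untested here; the sketch only decouples the output count from the qubit count.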

Thank you and once again happy holidays :heart:

Hey @G_Akash, happy New Year!

Glad to see that you’re getting somewhere! I’m not familiar with your dataset or the problem you’re trying to solve, but also figuring out the right combination of every hyperparameter, loss function, etc., is the age-old problem in machine learning — it’s a lot of trial and error!

Now that your code is at least working, it might be worthwhile to try different cost functions or different parameter initialization strategies. Beyond that, I’m not sure that I can help you more :sweat_smile:

Hey @isaacdevlugt, Happy New Year!!

So, I am trying to perform regression, in a sense, to predict/estimate the parameters that define the Gaussian signal. The input is different realizations of Gaussian signals, and the output is its corresponding Gaussian parameters. As you said, it will be a lot of trial and error.

So, could you suggest any other cost functions that can be used for regression purpose? I have been using Mean-squared-error for now. Thank you for your help :heart:

Hey @G_Akash,

I’m honestly not too sure here :sweat_smile:. It might be a bit of work on your end to figure this one out, but sometimes that’s the way it goes! If you encounter any issues along the way that are related to pennylane (e.g., bugs, questions about how to use features) then please write back to us :slight_smile:

Hey @isaacdevlugt, I hope you are doing well. I just wanted to let you know that I figured out how to make my circuit perform much better. It involved some preprocessing of the data, use of an autoencoder, addition of a classical bias, and most importantly, I switched from lightning.qubit to default.qubit with “best” diff_method, and to my surprise, the latter was much faster than the former. I don’t understand why that is the case. I am now able to achieve an accuracy of 96.3%, as opposed to 53% before.
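
The post does not show how the classical bias was added. One common pattern, given here purely as an assumption and not as the author's code, is a trainable affine map applied to the expectation values, which lets outputs leave the [-1, 1] range of a Pauli-Z measurement. The function `readout` and all numbers below are hypothetical.

```python
import numpy as np

def readout(expvals, scale, bias):
    # expvals: (batch, n_outputs) values in [-1, 1]
    # scale, bias: (n_outputs,) classical parameters trained with the circuit
    return scale * expvals + bias

expvals = np.array([[-1.0, 0.0, 1.0],
                    [0.5, -0.5, 0.25]])   # made-up circuit outputs
scale = np.array([2.0, 1.0, 4.0])          # hypothetical trainable parameters
bias = np.array([0.5, -3.0, 1.0])

out = readout(expvals, scale, bias)
print(out.min(), out.max())  # -3.5 5.0
```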

Currently, I am using autograd framework. I am thinking of using JAX instead for faster computation, or should I give some other framework a shot?

Finally, I was wondering if there is a way to carry out multiprocessing (i.e. parallel processing using different cores), but I highly doubt that would be possible. If not, is there any way to compute on a GPU (other than lightning.gpu)? Do let me know your thoughts on this.

Hey @G_Akash,

Glad to hear that you were able to make things perform better!

I switched from lightning.qubit to default.qubit with “best” diff_method, and to my surprise, the latter was much faster than the former.

For some regimes of qubit counts, lightning can be similar in speed to default.qubit. But it really shouldn’t be slower to the extent that you’re describing :thinking:… I’d make sure that you’re using the latest version of PennyLane and that you use adjoint differentiation with lightning:

dev = qml.device("lightning.qubit", wires=4)

@qml.qnode(dev, diff_method="adjoint")

Currently, I am using autograd framework. I am thinking of using JAX instead for faster computation, or should I give some other framework a shot?

JAX-JIT is a great interface to use for performance. I would even go a step further and try the PennyLane Catalyst plugin (see the Catalyst 0.4.1 documentation).

If you’re using default.qubit, you don’t need to specify the interface explicitly, btw :slight_smile:. It will use the correct interface automatically!

I was wondering if there is a way to carry out multiprocessing

There are ways to leverage parallel computation with PennyLane :):

Let me know if this helps!

Hey @isaacdevlugt

I’d make sure that you’re using the latest version of PennyLane and that you use adjoint differentiation with lightning

I did that, but I am not sure why lightning.qubit underperformed. For now, I am sticking with default.qubit with backprop differentiation and the JAX-JIT interface. The JAX-JIT interface gives me about the same result in under a quarter of the previous training time, although I can decrease it further by using my system’s GPU. However, I am not entirely sure whether my code is consistent and efficient. I have included it below. Kindly let me know your suggestions.

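import numpy as np
import jax
import jax.numpy as jnp
from jax import random
import optax
import pennylane as qml

n_qubits = int(np.log2(x_train.shape[1]))  # which is 7 wires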
n_layer = 25
dev = qml.device("default.qubit", wires=n_qubits)

def circ(weights,inputs):
    qml.AmplitudeEmbedding(features=inputs, wires=range(n_qubits),normalize=True,pad_with=0.5)
    StronglyEntanglingLayers(weights, wires=range(n_qubits),imprimitive=qml.ops.CNOT)
    out =  [qml.expval(qml.PauliZ(i)+qml.PauliZ(i+1)) for i in range(n_qubits-1)[0::2]]
    return out

def qnn(params,inputs):
    weights = params["weights"]
    bias = params["bias"]
    circ_out = circ(weights,inputs)
    out = []
    for i in range(len(circ_out)):
        out.append(circ_out[i] + bias[i])
    return out

key = jax.random.PRNGKey(1234)
var_init = jax.random.uniform(key,(n_layer,n_qubits,6),minval=0,maxval=1)
bias_init = jnp.zeros(3)
params = {"weights": var_init, "bias": bias_init}

def mse(observed, predictions):
    # Squared error summed over all outputs and averaged over the batch.
    return jnp.sum((observed - predictions) ** 2) / len(observed)

def predict(params,features):
    preds = qnn(params,features)
    preds_np = jnp.asarray(preds).T
    return preds_np

batched_predict = jax.vmap(predict, in_axes=(None, 0))

def cost(params, features, observed):
    preds = batched_predict(params,features)
    cost = mse(observed, preds)
    return cost

opt = optax.adam(learning_rate=0.01)
opt_state = opt.init(params)

def update_step(params, opt_state, features, observed):
    train_cost, grads = jax.value_and_grad(cost)(params, features, observed)
    updates, opt_state = opt.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return params, opt_state, train_cost

train_loss = []
val_loss = []
num_train = len(x_train)
num_val =len(x_val)
batch_size = 1024
epoch = 200
key_t = random.PRNGKey(0)
for i in range(epoch):
    key_t, key_v = random.split(key_t)
    idx_train = random.choice(key_t,num_train,shape=(batch_size,))
    idx_val = random.choice(key_v,num_val,shape=(batch_size,))
    x_train_batch = jnp.asarray(x_train[idx_train])
    y_train_batch = jnp.asarray(y_train[idx_train])
    x_val_batch = jnp.asarray(x_val[idx_val])
    y_val_batch = jnp.asarray(y_val[idx_val])
    params, opt_state, train_cost = update_step(params, opt_state, x_train_batch, y_train_batch)
    val_cost = cost(params,x_val_batch,y_val_batch)
    print("Epoch: {:5d} | Loss: {:0.7f} | Val_Loss: {:0.7f} ".format(i+1, train_cost,val_cost))


Note that I have changed StronglyEntanglingLayers class to include a few more gates.

Interesting! I’ll have to defer to @mlxd here on the lightning slowdown observation :thinking: