Lightning gpu with TensorFlow interface

I can only get the lightning gpu backend working with the Tf interface if I force everything to be a tf.float64 and use the command “tf.keras.backend.set_floatx(‘float64’)”. If I try the same but fixing everything as tf.float32 and using “tf.keras.backend.set_floatx(‘float32’)” I get type errors. If I change the backend away from lightning gpu I can use tf.float32. Is this behaviour expected?

Thanks

Hi @Bnesh , welcome to the Forum!

Yes, lightning.gpu works together with cuStateVec library and I believe that library only supports 64-bit floats.

Thank you for posting it here though! It’s good to have additional visibility over this.

Thanks for the response. Part of my code takes second order derivatives of circuits, this works as expected using default.qubit however when swapping to lightning.gpu all of my second order derivatives are within machine error of 0. Is this expected? I.e can the lightning.gpu backend only take first order derivatives?

Ahh I see I have to use a different differentiation method like “parameter-shift” or “finite-diff” which are both pretty slow. Do you have any advice for this? I initially moved away from Yao hoping to take advantage of some A100s and cuQuantum but even with gpu acceleration my code runs considerably slower.

Hi @Bnesh,

I’m not sure about the first part of the issue but I can check about this.

About the second part, performance will indeed be slower with “parameter-shift”. Comparing CPU vs GPU performance will largely depend on the number of qubits and the depth of your circuits. You only start seeing a performance improvement when using GPUs on circuits that have over 20 qubits.

This paper has some nice performance comparisons between different devices.

If you’re ok with sharing your code and the performance comparison that you experienced I can try making some suggestions that could help.

Hi @Bnesh ,

I’m coming back regarding your question about your issues with lightning.gpu. In principle there shouldn’t be any issues. However this can depend on what exactly you’re running. Have you set max_diff=2 when instantiating your QNode? You can see an example in this post here, where they calculate second order derivatives in two different ways and then compare them.

I hope this helps you!

Hello,

Thanks a lot for the response, apologies for my delayed response, I was on annual leave.

I’ll try the suggestion you gave and see if it fixes the issue. I have a different issue when trying to use lightning.gpu or lightning with the TensorFlow interface and creating a Keras layer out of the quantum circuit. My code works fine with the default.qubit interface and trains as expected however when swapping to the lightning interface I get “ValueError: Cannot convert a partially known TensorShape to a Tensor” as an error.

If lightning should work fine with the TensorFlow interface I can produce a MWE and create a new thread.

Thanks in advance

Hi @Bnesh ,

Lightning (both CPU and GPU) does indeed work with the TensorFlow interface and KerasLayer.

For example I ran the first example in the Keras Layers demo with no issues.

I do want to note that this only works with Keras 2 and not with Keras 3. If you’re using TensorFlow >= 2.16 then you’ll need to follow the instructions here.
In my case for example in Colab I ran:

!pip install tf_keras
import os
os.environ["TF_USE_LEGACY_KERAS"] = "1"

Aside from this I used this line to ignore all of the TensorFlow warnings: tf.get_logger().setLevel('ERROR')

Remember that if you’re getting a “float32 vs float64” error or similar you should add this line before you start using Keras:

tf.keras.backend.set_floatx('float64')

I hope this helps!
Let us know if you’re having any further issues.

Hello,

Thanks for the response and confirming that. I produced a MWE and it turns out the issue is not with lightning but using adjoint differentiation method, even with default.qubit using the diff_method = “adjoint” argument causes issues. I traced it back to decorating some of the functions with “@tf.function”.

If I remove the @tf.function lines I can use the diff_method = “adjoint” without issues and therefore lightning.gpu. If the @tf.function lines are included the code will execute properly provided default.qubit is used alongside the default differentiation method. I’ve attached the MWE. Is this behaviour expected?

For my full use case (combing classical and quantum) using @tf.function massively speeds up the code. I have removed lots of code here to produce the MWE so some of the choices may seem odd but I left them in to keep the structure of the program the same.

!python -m pip install pennylane pennylane-lightning pennylane-lightning-gpu --upgrade
!pip install custatevec-cu12
!pip install tensorflow==2.15
import numpy as np
import tensorflow as tf
import pennylane as qml
import keras
from keras import layers
tf.get_logger().setLevel('ERROR')
tf.keras.backend.set_floatx('float64')

@tf.numpy_function(Tout=tf.float64)
def true_sol(x):
    t = x[:,0]
    x = x[:,1]
    return (t + x).astype(np.float64)



xlength = 2
L1 = 1
n_wires = 4
dev = qml.device("lightning.qubit", wires=n_wires)


@qml.qnode(dev, interface='tf',diff_method="adjoint")
def circuit(inputs, weights):
    inputs = tf.transpose(inputs)

    for l in range(L1):
      for i in range(n_wires):
        qml.CNOT(wires=[i, (i + 1) % n_wires])

      for i in range(n_wires):
        qml.RZ(weights[l,i,xlength+1] +2*tf.einsum('i,ik->k', weights[l,i,0:xlength], inputs),wires=i)
        qml.RX(2*weights[l,i,xlength],wires=i)

      for i in range(n_wires):
        qml.CNOT(wires=[i, (i + 1) % n_wires])

    return [qml.expval(qml.PauliZ(i)) for i in range(n_wires)]



weights = tf.Variable(tf.random.normal([L1,n_wires, xlength+2],dtype=tf.float64))
weight_shapes = {"weights": (L1,n_wires, xlength+2)}
qlayer = qml.qnn.KerasLayer(circuit, weight_shapes, output_dim=n_wires) 


def modelx():
  inputs = keras.Input(shape=(2,))
  x = qlayer(inputs)
  outputclass = layers.Dense(1)(x)
  model = keras.Model(inputs=inputs, outputs=outputclass)
  return model



#@tf.function
def loss(xr):
  pred = tf.squeeze(model(xr)) 
  true = true_sol(xr)
  loss = (1/Nr)*tf.reduce_sum(tf.math.square(pred - true))
  return loss


#@tf.function
def grad(xr):
  with tf.GradientTape() as tape:
    #loss_value = loss_PINN(xr,xb,i)
    loss_value = loss(xr)
  return loss_value, tape.gradient(loss_value, model.trainable_variables)

#@tf.function
def train_step(xr):
    loss_value, grads = grad(xr)
    clipped_grads = [tf.clip_by_value(g,-1,1) for g in grads]
    optimizer.apply_gradients(zip(clipped_grads, model.trainable_variables))
    return loss_value

def training(steps):
  for i in range(steps):
    lossl = train_step(full_samples(Nr))
    print(lossl.numpy())

def full_samples(N):
  return tf.convert_to_tensor(np.random.uniform(0, 1, size=(N, 2)).astype(np.float64)) 




Nr = 64
model = modelx()
optimizer = keras.optimizers.Adam(1e-3)
steps = 100
training(steps)
print("Training Done")

Thanks in advance

Hi @Bnesh ,

Thank you for making the MWE. I can replicate the behaviour. When you comment the tf.function decorators the code runs but when I uncomment them I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-3d54eda5944b> in <cell line: 93>()
     91 optimizer = keras.optimizers.Adam(1e-3)
     92 steps = 100
---> 93 training(steps)
     94 print("Training Done")

10 frames
<ipython-input-4-3d54eda5944b> in training(steps)
     78 def training(steps):
     79   for i in range(steps):
---> 80     lossl = train_step(full_samples(Nr))
     81     print(lossl.numpy())
     82 

/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/traceback_utils.py in error_handler(*args, **kwargs)
    151     except Exception as e:
    152       filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153       raise e.with_traceback(filtered_tb) from None
    154     finally:
    155       del filtered_tb

/tmp/__autograph_generated_fileddnlkzds.py in tf__train_step(xr)
      8                 do_return = False
      9                 retval_ = ag__.UndefinedReturnValue()
---> 10                 (loss_value, grads) = ag__.converted_call(ag__.ld(grad), (ag__.ld(xr),), None, fscope)
     11                 clipped_grads = [ag__.converted_call(ag__.ld(tf).clip_by_value, (ag__.ld(g), -1, 1), None, fscope) for g in ag__.ld(grads)]
     12                 ag__.converted_call(ag__.ld(optimizer).apply_gradients, (ag__.converted_call(ag__.ld(zip), (ag__.ld(clipped_grads), ag__.ld(model).trainable_variables), None, fscope),), None, fscope)

/tmp/__autograph_generated_filerddf9hx9.py in tf__grad(xr)
      9                 retval_ = ag__.UndefinedReturnValue()
     10                 with ag__.ld(tf).GradientTape() as tape:
---> 11                     loss_value = ag__.converted_call(ag__.ld(loss), (ag__.ld(xr),), None, fscope)
     12                 try:
     13                     do_return = True

/tmp/__autograph_generated_fileoyk4ym58.py in tf__loss(xr)
      8                 do_return = False
      9                 retval_ = ag__.UndefinedReturnValue()
---> 10                 pred = ag__.converted_call(ag__.ld(tf).squeeze, (ag__.converted_call(ag__.ld(model), (ag__.ld(xr),), None, fscope),), None, fscope)
     11                 true = ag__.converted_call(ag__.ld(true_sol), (ag__.ld(xr),), None, fscope)
     12                 loss = 1 / ag__.ld(Nr) * ag__.converted_call(ag__.ld(tf).reduce_sum, (ag__.converted_call(ag__.ld(tf).math.square, (ag__.ld(pred) - ag__.ld(true),), None, fscope),), None, fscope)

/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68             # To get the full stack trace, call:
     69             # `tf.debugging.disable_traceback_filtering()`
---> 70             raise e.with_traceback(filtered_tb) from None
     71         finally:
     72             del filtered_tb

/usr/local/lib/python3.10/dist-packages/pennylane/qnn/keras.py in tf__call(self, inputs)
     35                 batch_dims = ag__.Undefined('batch_dims')
     36                 ag__.if_stmt(ag__.ld(has_batch_dim), if_body, else_body, get_state, set_state, ('batch_dims', 'inputs'), 2)
---> 37                 results = ag__.converted_call(ag__.ld(self)._evaluate_qnode, (ag__.ld(inputs),), None, fscope)
     38 
     39                 def get_state_1():

/usr/local/lib/python3.10/dist-packages/pennylane/qnn/keras.py in tf___evaluate_qnode(self, x)
     59                         do_return = False
     60                         raise
---> 61                 ag__.if_stmt(ag__.converted_call(ag__.ld(isinstance), (ag__.ld(res), (ag__.ld(list), ag__.ld(tuple))), None, fscope), if_body_1, else_body_1, get_state_1, set_state_1, ('do_return', 'retval_', 'res'), 2)
     62                 return fscope.ret(retval_, do_return)
     63         return tf___evaluate_qnode

/usr/local/lib/python3.10/dist-packages/pennylane/qnn/keras.py in if_body_1()
     43                         nonlocal res
     44                         pass
---> 45                     ag__.if_stmt(ag__.converted_call(ag__.ld(len), (ag__.ld(x).shape,), None, fscope) > 1, if_body, else_body, get_state, set_state, ('res',), 1)
     46                     try:
     47                         do_return = True

/usr/local/lib/python3.10/dist-packages/pennylane/qnn/keras.py in if_body()
     38                     def if_body():
     39                         nonlocal res
---> 40                         res = [ag__.converted_call(ag__.ld(tf).reshape, (ag__.ld(r), (ag__.converted_call(ag__.ld(tf).shape, (ag__.ld(x),), None, fscope)[0], ag__.converted_call(ag__.ld(tf).reduce_prod, (ag__.ld(r).shape[1:],), None, fscope))), None, fscope) for r in ag__.ld(res)]
     41 
     42                     def else_body():

/usr/local/lib/python3.10/dist-packages/pennylane/qnn/keras.py in <listcomp>(.0)
     38                     def if_body():
     39                         nonlocal res
---> 40                         res = [ag__.converted_call(ag__.ld(tf).reshape, (ag__.ld(r), (ag__.converted_call(ag__.ld(tf).shape, (ag__.ld(x),), None, fscope)[0], ag__.converted_call(ag__.ld(tf).reduce_prod, (ag__.ld(r).shape[1:],), None, fscope))), None, fscope) for r in ag__.ld(res)]
     41 
     42                     def else_body():

ValueError: in user code:

    File "<ipython-input-3-78b4de627fee>", line 73, in train_step  *
        loss_value, grads = grad(xr)
    File "<ipython-input-4-3d54eda5944b>", line 68, in grad  *
        loss_value = loss(xr)
    File "<ipython-input-4-3d54eda5944b>", line 58, in loss  *
        pred = tf.squeeze(model(xr))
    File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler  **
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_file85erbwpk.py", line 37, in tf__call
        results = ag__.converted_call(ag__.ld(self)._evaluate_qnode, (ag__.ld(inputs),), None, fscope)
    File "/tmp/__autograph_generated_filexj74w4xm.py", line 61, in tf___evaluate_qnode
        ag__.if_stmt(ag__.converted_call(ag__.ld(isinstance), (ag__.ld(res), (ag__.ld(list), ag__.ld(tuple))), None, fscope), if_body_1, else_body_1, get_state_1, set_state_1, ('do_return', 'retval_', 'res'), 2)
    File "/tmp/__autograph_generated_filexj74w4xm.py", line 45, in if_body_1
        ag__.if_stmt(ag__.converted_call(ag__.ld(len), (ag__.ld(x).shape,), None, fscope) > 1, if_body, else_body, get_state, set_state, ('res',), 1)
    File "/tmp/__autograph_generated_filexj74w4xm.py", line 40, in if_body
        res = [ag__.converted_call(ag__.ld(tf).reshape, (ag__.ld(r), (ag__.converted_call(ag__.ld(tf).shape, (ag__.ld(x),), None, fscope)[0], ag__.converted_call(ag__.ld(tf).reduce_prod, (ag__.ld(r).shape[1:],), None, fscope))), None, fscope) for r in ag__.ld(res)]
    File "/tmp/__autograph_generated_filexj74w4xm.py", line 40, in <listcomp>
        res = [ag__.converted_call(ag__.ld(tf).reshape, (ag__.ld(r), (ag__.converted_call(ag__.ld(tf).shape, (ag__.ld(x),), None, fscope)[0], ag__.converted_call(ag__.ld(tf).reduce_prod, (ag__.ld(r).shape[1:],), None, fscope))), None, fscope) for r in ag__.ld(res)]

    ValueError: Exception encountered when calling layer 'keras_layer_2' (type KerasLayer).
    
    in user code:
    
        File "/usr/local/lib/python3.10/dist-packages/pennylane/qnn/keras.py", line 414, in call  *
            results = self._evaluate_qnode(inputs)
        File "/usr/local/lib/python3.10/dist-packages/pennylane/qnn/keras.py", line 442, in _evaluate_qnode  *
            res = [tf.reshape(r, (tf.shape(x)[0], tf.reduce_prod(r.shape[1:]))) for r in res]
    
        ValueError: Cannot convert a partially known TensorShape <unknown> to a Tensor.
    
    
    Call arguments received by layer 'keras_layer_2' (type KerasLayer):
      • inputs=tf.Tensor(shape=(64, 2), dtype=float64)

I’m wondering if it has anything to do with this bug. I’m not sure.

I don’t know if it’s possible in your case but something that can help with speedups is using Jax instead of TensorFlow. I understand that you probably have a whole workflow using TensorFlow and it’s not guaranteed that it will work with Jax, but if the speed issue is really blocking you maybe you can try it.

I got some additional info from my colleague Christina.

Using TensorFlow autograph (decorating with tf.function ) only works with default.qubit, and doesn’t really work for any non-numpy based device. Switching to a different device is likely to cause issues. This is probably why you’re seeing the issue with lightning and adjoint.