Gradients of quantum generator with TF interface are None

I’m trying to make a QGAN with a quantum generator and a classical NN as the discriminator, but I’m having trouble fetching the gradients of gen_weights, the trainable parameters of the generator circuit.

However, finding the gradients of the discriminator’s trainable variables works. I’ve tested each loss function and they work as well (I’m using tf.keras.losses.BinaryCrossentropy). I have a hunch that the problem is that the generator loss isn’t being differentiated beyond the classical NN, but I’m not sure why this is the case. This is what the structure of each loss looks like:

  • disc_loss: generate an array of data from gen_circuit(gen_weights) --> feed both the generated data and the real data into the discriminator NN --> sum the cross-entropies between the discriminator’s outputs for the real and fake data and their ideal labels
  • gen_loss: generate an array of data from gen_circuit(gen_weights) --> feed the generated data into the discriminator NN --> compute the cross-entropy between the discriminator’s output for the fake data and a matrix of ones

tf can fetch the gradients of each loss with respect to the classical NN weights, but it can’t reach further back, as in the case of gen_loss, to calculate the gradients of gen_weights. So how can I fetch these gradients as well? Here is the relevant code sample, which returns None when fetching the generator’s gradients but returns the expected tensor when fetching the discriminator’s gradients:

Thank you!

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(fake_output):
    """Compute generator loss: fake outputs should be classified as real."""
    return cross_entropy(tf.ones_like(fake_output), fake_output)

def discriminator_loss(real_output, fake_output):
    """Compute discriminator loss: real outputs vs. ones, fake outputs vs. zeros."""
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

def train_step(equity_data, gen_weights):
    """Run a train step on the provided batch of equity data."""
    with tf.GradientTape() as disc_tape, tf.GradientTape() as gen_tape:
        generated_prices = [equity_data[0], gen_circuit(equity_data[0], gen_weights)]

        # Reshape equity arrays to feed into the discriminator
        gen_prices_in_one_arr = reshape_to_one_axis(generated_prices)
        real_prices_in_one_arr = reshape_to_one_axis(equity_data)

        # Get outputs from the discriminator
        real_output = discriminator(real_prices_in_one_arr)
        fake_output = discriminator(gen_prices_in_one_arr)

        # Calculate losses
        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, gen_weights)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    print(gradients_of_generator)
#     generator_optimizer.apply_gradients(
#         zip(gradients_of_generator, gen_weights))
#     discriminator_optimizer.apply_gradients(
#         zip(gradients_of_discriminator, discriminator.trainable_variables))

    return gen_loss, disc_loss

Hi @Pavan, thank you for your question.

Looking at your code, this seems like a completely classical model built using TensorFlow and Keras, right? So unless I’m missing something, the difficulties you are observing are not related to PennyLane, correct?

I can still try to help out, but I want to make sure that I understand exactly what the problem is.

Hi,

I should have included the gen_circuit function too, since it evaluates a QNode. Here is the QNode being evaluated in the train_step function:

def gen_ansatz(w):
    qml.broadcast(unitary=qml.RY, pattern='single', wires=wires, parameters=w[0:15])
    for k in range(1, len(w) // 15):
        qml.broadcast(unitary=qml.CZ, pattern='ring', wires=wires)
        qml.broadcast(unitary=qml.RY, pattern='single', wires=wires, parameters=w[(15*k):(15*(k+1))])

@qml.qnode(dev, interface="tf")
def gen_circuit(b_seq, gen_weights):
    qml.templates.AngleEmbedding(b_seq, wires, rotation='X')
    gen_ansatz(gen_weights)
    return [qml.expval(qml.PauliZ(i)) for i in range(4)]

Hi @Pavan!

I went through your code and I think I identified the issues. I was seeing errors when calling gen_ansatz(), so I played with that function on its own and found a couple of problems related to the shape of the parameter.

I believe you need to make sure that w is a one-dimensional array of size 15**2. In your case you are passing init_gen_weights, but initializing it with only 15 elements. When I change the line to init_gen_weights = np.random.normal(0, 1, 15**2), I don’t see any errors.

Let me know if that fixes your problem!

Hi Juan,

I just tried that and it still returns None for the generator’s gradients. I don’t think the parameter’s size is what matters here, since I wrote the ansatz so that any size that is a multiple of 15 works (15 params per layer).

I think one problem could be that gen_ansatz() isn’t a QNode function, making TF unable to differentiate its weights. However, when I tried moving the contents of gen_ansatz into the broader gen_circuit() QNode with the TF interface, I still got the same None gradient when evaluating its gradients in the train_step() function.
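
For reference, the inlined version I tried looked roughly like this (same wires and 15-parameters-per-layer structure as before; this is just a sketch of that attempt, not the exact notebook code):

@qml.qnode(dev, interface="tf")
def gen_circuit(b_seq, gen_weights):
    qml.templates.AngleEmbedding(b_seq, wires, rotation='X')
    # gen_ansatz inlined directly into the QNode body
    qml.broadcast(unitary=qml.RY, pattern='single', wires=wires, parameters=gen_weights[0:15])
    for k in range(1, len(gen_weights) // 15):
        qml.broadcast(unitary=qml.CZ, pattern='ring', wires=wires)
        qml.broadcast(unitary=qml.RY, pattern='single', wires=wires, parameters=gen_weights[(15*k):(15*(k+1))])
    return [qml.expval(qml.PauliZ(i)) for i in range(4)]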

Thank you!
Pavan

Hmmm, this may be a tough nut to crack. I should mention that I also changed the gen_ansatz() function to the following:

def gen_ansatz(w):
    qml.broadcast(unitary=qml.RY, pattern='single', wires=wires, parameters=w[0:15])
    for k in range(1, 15):
        qml.broadcast(unitary=qml.CZ, pattern='ring', wires=wires)
        qml.broadcast(unitary=qml.RY, pattern='single', wires=wires, parameters=w[(15*k):(15*(k+1))])

This, together with the change to the dimension I mentioned above, makes it so that when I run your gen_grad() function I don’t see any immediate issues.

I am sharing my notebook below. Please let me know if you still see any issues.
QGAN_JM.ipynb (28.9 KB)

Thank you for that, but gen_grad() wasn’t actually what I was trying to fix. That is a toy function I made to test whether it’s possible to calculate gradients of the quantum circuit, and I found that it is.

My question has to do with the train_step() function below that, and why it isn’t calculating the generator’s gradients with respect to the actual loss function. When I call it, the generator’s gradients come back as None.

Thanks!

Got it! It helps a lot to have a clear explanation of the precise issue. Let me dig deeper and see what’s going on.


Hi @Pavan! This is a puzzling problem; I’d like to get to the bottom of it :thinking: Unfortunately, it’s a bit hard to debug, as the provided Jupyter notebook is quite big and includes a lot of auxiliary code and functions!

Do you think you might be able to reduce it down to a minimal non-working Python script example? That is, remove as much of the code as possible such that the strange behaviour (gradients being returned as None) remains?

That would be a huge help, and allow us to easily work out what is going wrong :slight_smile:

In the meantime, since the problem seems to be occurring within gen_circuit, I have some ideas/questions that might shed some light on it.

  1. Is gen_circuit, isolated by itself, differentiable? That is, can you do

    with tf.GradientTape() as tape:
        res = np.sum(gen_circuit(b_seq, gen_weights))
    
    grad = tape.gradient(res, gen_weights)
    
  2. If grad remains None, it might be worth investigating the internals of the QNode to see where the differentiability breaks. For example, what happens if you simplify gen_ansatz bit by bit, or replace qml.broadcast with a manual for loop, something like the sketch below?
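
For reference, a rough sketch of what that manual replacement could look like (assuming wires is the same list of 15 wires used in your circuit; the function name here is just illustrative):

def gen_ansatz_manual(w):
    # First layer of single-qubit RY rotations
    for i, wire in enumerate(wires):
        qml.RY(w[i], wires=wire)
    # Alternate ring-pattern CZ entanglers with further RY layers
    for k in range(1, len(w) // 15):
        for i in range(len(wires)):
            qml.CZ(wires=[wires[i], wires[(i + 1) % len(wires)]])
        for i, wire in enumerate(wires):
            qml.RY(w[15 * k + i], wires=wire)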

Hi @josh, thanks for helping :slight_smile: The forum says I can’t attach anything since I’m a new user, so I’ll link the minimal non-working .py file as a GitHub gist here. It’s basically just a parameterized RY gate on a single-wire quantum circuit that feeds the expectation value into a single-neuron neural network.

After trying the first suggestion, it does still return None when calling the gradient, but I’m going to experiment with it more now. Thanks for the suggestion!

Hey, quick update: I was experimenting with the block you sent and found that it returns a gradient only if the import statement is import numpy as np, rather than importing NumPy from PennyLane with from pennylane import numpy as np.

In the latter case, you get an error: TypeError: iteration over a 0-d array. This still doesn’t fix the minimal non-working example, but it is a good thing to note :slight_smile:

Hi @josh, thanks so much for your help, but I think I figured it out! The problem in the minimal non-working example was that I didn’t feed a tf.Variable to tape.gradient(). When extending this to my actual code, I ran into more problems, but I fixed them after realizing that all data transformations have to be done with tf functions instead of np functions like np.reshape(), so that the operations stay on the tape and remain differentiable.
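
For anyone running into the same thing, here is a minimal sketch of the fix applied to the toy example from my gist (a single parameterized RY on one wire; the names are just illustrative):

import pennylane as qml
import tensorflow as tf

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev, interface="tf")
def circuit(weight):
    qml.RY(weight, wires=0)
    return qml.expval(qml.PauliZ(0))

# The key fix: the trainable parameter has to be a tf.Variable so the tape
# tracks it, and any post-processing has to use tf ops rather than numpy ones.
weight = tf.Variable(0.3)
with tf.GradientTape() as tape:
    res = circuit(weight)

print(tape.gradient(res, weight))  # a tensor now, not None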

This is what the final train_step function looks like!

def train_step(equity_data, gen_weights):
    """Run a train step on the provided batch of equity data."""
    with tf.GradientTape() as disc_tape, tf.GradientTape() as gen_tape:
        generated_prices = tf.concat([equity_data[0], gen_circuit(equity_data[0], gen_weights)], 0)
        generated_prices = tf.reshape(generated_prices, (1, 1, 19))

        real_prices = tf.concat([equity_data[0], equity_data[1]], 0)
        real_prices = tf.reshape(real_prices, (1, 1, 19))

        # Get outputs from the discriminator
        real_output = discriminator(real_prices)
        fake_output = discriminator(generated_prices)

        # Calculate losses
        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    print(f"Generator loss: {gen_loss} \n Discriminator loss: {disc_loss}")
    gradients_of_generator = gen_tape.gradient(gen_loss, gen_weights)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(
        zip([gradients_of_generator], [gen_weights]))
    discriminator_optimizer.apply_gradients(
        zip(gradients_of_discriminator, discriminator.trainable_variables))

    return gen_loss, disc_loss, gen_weights
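
And for completeness, a rough sketch of how I call it, with gen_weights created as a tf.Variable (the dataset and epoch names here are just placeholders for my own setup):

# gen_weights has to be a tf.Variable so the tape can track it
gen_weights = tf.Variable(np.random.normal(0, 1, 15**2))

# n_epochs and equity_dataset are placeholders for the number of epochs
# and the batched dataset in my notebook
for epoch in range(n_epochs):
    for equity_data in equity_dataset:
        gen_loss, disc_loss, gen_weights = train_step(equity_data, gen_weights)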

@Pavan great to hear you got it working! Let us know if you have any other questions :slight_smile:
