Batch circuit training with a QCBM-style circuit and TorchLayer

Hi, I’m trying to train a QCBM-style quantum circuit model. Since it is a QCBM circuit, it requires no inputs (for my purposes there are no embedding layers). However, TorchLayer requires the qnode to have inputs as a parameter, so currently, just to add a batch dimension, I’ve used qml.batch_input together with an embedding layer.

Is the correct approach to use qml.batch_params? For a given batch (let’s say 16), I would like the 16 batched circuits to execute individually, resulting in different bitstrings (due to qml.sample()), and then during loss calculation/backprop to update the weights of the qnode.

Overall, I essentially want the same behaviour I currently have, but without having to use ‘inputs’.

import math
import torch
from torch import nn, Tensor
import pennylane as qml
from pennylane import AngleEmbedding
from pennylane.qnn import TorchLayer

pi = math.pi
class Binary_Generator(nn.Module):
    def __init__(
        self,
    ) -> None:
        super(Binary_Generator, self).__init__()
        self.n_qubits = 16
        self.depth = 4
        self.device = qml.device('default.qubit', wires = self.n_qubits, shots = 1)
        q_weight_shapes = {"q_weights_y": (self.depth * self.n_qubits),
                           "q_weights_z": (self.depth * self.n_qubits)}
        init_method = {"q_weights_y": lambda x : torch.nn.init.uniform_(x, -pi, pi),
                        "q_weights_z": lambda x : torch.nn.init.uniform_(x, -pi, pi)}
        self.q_generator = qml.QNode(self._circuit, self.device, interface="torch")
        batch_q_circuit = qml.batch_input(self.q_generator, argnum = 0)
        self.batch_q_generator = TorchLayer(batch_q_circuit, q_weight_shapes, init_method = init_method)

    def __str__(self):
        return f"QuantumGenerator({self.n_qubits}) "

    def _embed_features(self, features):
        wires = range(self.n_qubits)
        AngleEmbedding(features, wires=wires, rotation="X")

    def _circuit(self, inputs, q_weights_y, q_weights_z):
        """Builds the circuit to be fed to the connector as a QML node"""
        self._embed_features(inputs)
        q_weights_y = q_weights_y.reshape(self.depth, self.n_qubits)
        q_weights_z = q_weights_z.reshape(self.depth, self.n_qubits)
        # Repeated layer
        for i in range(self.depth):
            for y in range(self.n_qubits):
                qml.RY(q_weights_y[i][y], wires = y)
                qml.RZ(q_weights_z[i][y], wires = y)
            for y in range(self.n_qubits - 1):
                qml.CNOT(wires=[y, y + 1])

        return qml.sample()

    def forward(self, inputs: Tensor):
        return self.batch_q_generator(inputs)

Hi @Aaron_Thomas, welcome to the forum!

Just so that we’re on the same page, could you tell me what QCBM stands for here?

We built qml.qnn.TorchLayer to live on top of Torch’s Module class, which can be seen as a transformation of an input vector x (what we call inputs in TorchLayer).
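For reference, the standard pattern looks roughly like this (a minimal sketch; the sizes and templates here are just for illustration):

import torch
import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def qnode(inputs, weights):
    # inputs is the batched data vector that TorchLayer feeds in on each forward pass
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

layer = qml.qnn.TorchLayer(qnode, {"weights": (3, n_qubits)})
x = torch.rand(5, n_qubits)   # a batch of 5 input vectors
print(layer(x).shape)         # torch.Size([5, 2])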

I’d like to understand more what you’re trying to do for your use case. Is your objective to return a batch of bitstrings from a parametrized circuit (with no input parameters)? Do you intend to optimize the circuit, and if so what is your cost function?

Hi there @Tom_Bromley, thank you for taking the time to respond to my question.

Here by QCBM, I am referring to a quantum circuit Born machine; my only point being that there are no inputs. In my case, I want to start with the initial qubit register, apply my circuit and then measure out the bitstring at the end using qml.sample().

Essentially, the behavior I currently have is that for a batch of size 1, i.e. just a single execution of the circuit, I output a bitstring using something like the following.

pi = math.pi
class Generator(nn.Module):
    def __init__(
        self,
    ) -> None:
        super(Generator, self).__init__()
        self.n_qubits = 16
        self.depth = 4
        self.device = qml.device('default.qubit', wires = self.n_qubits, shots = 1)
        q_weight_shapes = {"q_weights_y": (self.depth * self.n_qubits),
                           "q_weights_z": (self.depth * self.n_qubits)}
        init_method = {"q_weights_y": lambda x : torch.nn.init.uniform_(x, -pi, pi),
                        "q_weights_z": lambda x : torch.nn.init.uniform_(x, -pi, pi)}
        q_generator = qml.QNode(self._circuit, self.device, interface="torch")
        self.q_generator = TorchLayer(q_generator, q_weight_shapes, init_method = init_method)

    def _circuit(self, inputs, q_weights_y, q_weights_z):
        """Builds the circuit to be fed to the connector as a QML node"""
    
        q_weights_y = q_weights_y.reshape(self.depth, self.n_qubits)
        q_weights_z = q_weights_z.reshape(self.depth, self.n_qubits)
        # Repeated layer
        for i in range(self.depth):
            for y in range(self.n_qubits):
                qml.RY(q_weights_y[i][y], wires = y)
                qml.RZ(q_weights_z[i][y], wires = y)
            for y in range(self.n_qubits - 1):
                qml.CNOT(wires=[y, y + 1])

        return qml.sample()

    def forward(self, inputs: Tensor):
        return self.q_generator(inputs)

I run an instance of the class by doing the following; note how in the _circuit() method inputs is unused.

generator = Generator()
inputs = torch.tensor([])
output_bitstring = generator(inputs)

Now, for some given data (represented as bitstrings), I want to train the parameters theta of the Generator class using a BCE loss function such that the outputs of the generator align with the data. I’ve got the loss function/optimization working, but only for a single circuit, i.e. batch_size = 1. I would like to train this class/circuit in a way analogous to batch training of inputs in a typical ML application. So if I set batch_size = 64, on the forward pass the circuit executes 64 times, all updating the same parameters theta of the network upon backpropagation.

The issue is that because there is no ‘input’ in the typical sense, training in this way needs to be set up differently: I believe qml.batch_input requires the parameters along the axis defined through argnum to be non-trainable parameters, and here there are no such parameters to start with, since I never use inputs when calling the generator object.
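For clarity, this is roughly the training pattern I am after. It is only a minimal sketch: q_generator below is a classical placeholder standing in for the quantum generator above (just to show the shapes and the loop structure), and the data are random bitstrings.

import torch
from torch.nn.functional import binary_cross_entropy

batch_size = 64
n_qubits = 16

# placeholder for the quantum generator: any callable returning a
# (batch_size, n_qubits) tensor of values in [0, 1] that carries a grad_fn
theta = torch.rand(n_qubits, requires_grad=True)

def q_generator(batch_size):
    return torch.sigmoid(theta).expand(batch_size, n_qubits)

data = torch.randint(2, size=(batch_size, n_qubits), dtype=torch.float32)

opt = torch.optim.Adam([theta], lr=0.01)
opt.zero_grad()
fake = q_generator(batch_size)           # one forward pass produces the whole batch
loss = binary_cross_entropy(fake, data)  # same BCE loss as in the batch_size = 1 case
loss.backward()                          # with the real quantum generator, all 64 executions would update the same theta
opt.step()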

Hi @Aaron_Thomas!

I’ve had a go at replicating what I think you are trying to do:

import pennylane as qml
import torch
from math import pi
from torch.nn.functional import binary_cross_entropy

batches = 10
batch_size = 100

n_qubits = 2
depth = 4
wires = range(n_qubits)

dev = qml.device("default.qubit")

@qml.qnode(dev)
def circuit(q_weights_y, q_weights_z):
    for i in range(depth):
        for y in range(n_qubits):
            qml.RY(q_weights_y[i][y], wires=y)
            qml.RZ(q_weights_z[i][y], wires=y)
        for y in range(n_qubits - 1):
            qml.CNOT(wires=[y, y + 1])
    return qml.probs(wires)

q_weights_y = torch.rand((depth, n_qubits), requires_grad=True, dtype=torch.float64) * 2 * pi - pi
q_weights_z = torch.rand((depth, n_qubits), requires_grad=True, dtype=torch.float64) * 2 * pi - pi
q_weights_y.retain_grad()
q_weights_z.retain_grad()

# Generate some random data to train against
data = torch.randint(2, size=(batches, batch_size, n_qubits))

# Convert to probabilities
probs = torch.tensor(qml.probs().process_samples(data, wires))

# Probability distribution sampled from PennyLane circuit
circuit_probs = torch.stack(circuit(q_weights_y, q_weights_z, shots=[batch_size] * batches))

loss = binary_cross_entropy(circuit_probs, probs)
loss.backward()
gradients = (q_weights_y.grad, q_weights_z.grad)

Let me know if this makes sense to you or whether you’re looking for something else. I’m not sure if using a TorchLayer is needed here. One potential issue with the above is that the size of the trained probability distribution will scale exponentially with the number of qubits.

Hi @Tom_Bromley, thank you for your response.
I may not have been clear: for my use case I’m using a hybrid model, hence the inclusion of the TorchLayer for the quantum circuit. The classical network is a discriminator that looks something like the following:

class Network(nn.Module):
    """Fully connected classical network"""

    def __init__(self, n_qubits):
        super(Network, self).__init__()

        self.model = nn.Sequential(
            # Inputs to first hidden layer (num_input_features -> 64)
            nn.Linear(n_qubits, 64),
            nn.LeakyReLU(),
            nn.Sigmoid(),
        )
    def forward(self, input):
        return self.model(input)

I am training the output of the quantum network adversarially, using the above network to discriminate fake/real data (the outputs are bitstrings, as for my use case I use qml.sample()).

I’ve attempted to do something like the following to execute the circuits in parallel:

from typing import Callable, Sequence

import pennylane as qml
from pennylane import transform
from pennylane.tape import QuantumScript, QuantumTape
# _nested_stack is the private PennyLane helper I use to stack the nested results
from pennylane.transforms.batch_params import _nested_stack


@transform
def batch_circ(tape: QuantumTape, batch_size: int) -> (Sequence[QuantumTape], Callable):

    output_tapes = []

    def _split_operations(ops, num_tapes):
        # copy the full list of operations once per tape in the batch
        new_ops = [[] for _ in range(num_tapes)]

        for op in ops:
            for b in range(num_tapes):
                new_ops[b].append(op)

        return new_ops

    for ops in _split_operations(tape.operations, batch_size):
        new_tape = QuantumScript(
            ops, tape.measurements, shots=tape.shots, trainable_params=tape.trainable_params
        )
        output_tapes.append(new_tape)

    def processing_fn(res):
        return _nested_stack(res)

    return output_tapes, processing_fn

This outputs the correct shape (length: batch_size) when you apply the batch_circ function to a QNode. However, all of the output bitstrings are the same! I would expect, given there is a probability distribution over all the different basis states, that this wouldn’t/couldn’t be the case.
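For context, I apply the transform along these lines (the sizes here are just illustrative, not my actual 16-qubit setup):

import torch
import pennylane as qml
from math import pi

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits, shots=1)

@qml.qnode(dev, interface="torch")
def qnode(weights):
    for w in range(n_qubits):
        qml.RY(weights[w], wires=w)
    return qml.sample()

batched_qnode = batch_circ(qnode, batch_size=8)  # batch_circ as defined above
weights = torch.rand(n_qubits) * 2 * pi - pi
print(batched_qnode(weights))  # shape (8, n_qubits), but every row comes out identical for me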

Hi @Aaron_Thomas, thanks for the extra detail! I just have a few questions:

im using a hybrid model hence the inclusion of the Torchlayer for the quantum circuit

I’m not sure why a TorchLayer is needed to make the model hybrid; is this just for convenience?

I am training the output of the quantum network (bitstrings as for my use case i use qml.sample())

For a single bitstring sample, how will you differentiate through the quantum circuit? This is technically possible in some cases, as we discuss in the note on this page, but I just want to make sure that I understand correctly that this is what you want to do. An alternative could be to return the probability distribution from the quantum circuit, which is deterministic and can be differentiated through.

Hi @Tom_Bromley, upon reflection I do believe you are correct: I do not need a TorchLayer here, so thank you for pointing that out.

And yes, I am differentiating through the bitstrings using some of the caveats mentioned on that page (1-shot sampling + parameter-shift differentiation). I am using bitstrings due to the size of my circuit, 16 qubits, which would output a 2 ** 16-length probability distribution over all the basis states. This is too large to process for my use case due to RAM requirements when loading in my dataset (which would then also have 2 ** 16 elements) plus training. That approach does, however, work on smaller circuits/datasets (I’ve tested up to 6 qubits this way).

This function batch_circ (although it could do with cleaning up) does operate almost how I want it to, outputting shape (batch_size, n_qubits) and replacing batch_input. I am just not sure why each bitstring in the list is the same: this is a stochastic process, so even though it is the same circuit with the same params being executed, the output should vary. I ran a for-loop over the circuit to confirm that the outputs do vary, e.g.:

q_network = Generator() #Instantiate class/quantum network with bitstring output

bitstring_list = []
for i in range(batch_size):
      input = torch.tensor([])
      bitstring = q_network(input)
      bitstring_list.append(bitstring)

And as expected, the resulting list of bitstrings is all different. I suspect there is something deeper going on in the callable processing_fn, specifically in the _nested_stack function, that is causing this behavior?

Hi @Aaron_Thomas, sorry for the late reply.

And yes I am differentiating through the bitstrings using some of the caveats mentioned on that page (1 shot sampling + parameter shift differentiation). I am using bitstrings due to the size of my circuit, 16 qubits, which would output a 2 ** 16 length probability distribution over all the bases. This is too large to process for my use case due to RAM requirements when loading in my dataset (which would then have 2 **16 elements additionally) + training. That does however work on smaller circuits/datasets (I’ve tested up to 6 qubits this way).

Makes sense now, got it! I’m not sure if a transform is the way to go here - if we want to generate a collection of samples from runs of the circuit with the same parameters, we can use shot-batching, e.g., by setting shots=[1]*batch_size to get us batch_size-many samples.

I’m also a bit cautious because torch is not compatible with differentiating samples. Instead, one can return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)] to get shots-like results that are torch-compatible and differentiable.

Does the script below help?

import pennylane as qml
import torch
from math import pi
from torch.nn.functional import binary_cross_entropy

batch_size = 5

n_qubits = 10
depth = 4
wires = range(n_qubits)

dev = qml.device("default.qubit")

@qml.qnode(dev, diff_method="parameter-shift")
def circuit(q_weights_y, q_weights_z):
    for i in range(depth):
        for y in range(n_qubits):
            qml.RY(q_weights_y[i][y], wires=y)
            qml.RZ(q_weights_z[i][y], wires=y)
        for y in range(n_qubits - 1):
            qml.CNOT(wires=[y, y + 1])
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

q_weights_y = torch.rand((depth, n_qubits), requires_grad=True, dtype=torch.float64) * 2 * pi - pi
q_weights_z = torch.rand((depth, n_qubits), requires_grad=True, dtype=torch.float64) * 2 * pi - pi
q_weights_y.retain_grad()
q_weights_z.retain_grad()

# Generate some random data to train against
data = torch.randint(2, size=(batch_size, n_qubits), dtype=torch.float64)

# Single-shot expectation values sampled from the PennyLane circuit
samples = circuit(q_weights_y, q_weights_z, shots=[1] * batch_size)
samples = (torch.stack([torch.stack(s) for s in samples]) + 1) / 2

loss = binary_cross_entropy(samples, data)
loss.backward()
gradients = (q_weights_y.grad, q_weights_z.grad)

Hi there @Tom_Bromley, apologies for the long time to reply. May I ask: why is torch not compatible with differentiating w.r.t. samples? My training function does run, so this surprises me.

Thank you for the idea of shot batching; I can see if this implementation will work for my use case. However, to expand on my idea to use a transform: why would that not be okay for generating a collection of samples from runs of the circuit? Once you _split_operations (create batch_size-many copies of the circuit), why does the processing_fn then return the same output for all circuits?

Another test I did was to use the batch_input decorator with angle encoding, but I set the parameters of the angle encoding to all have the same constant value (so each circuit is again the same). This also results in the same output for all circuits. I would expect the batch operations to naturally execute the circuits independently so you DON’T get the same outputs; these are quantum circuits, so the output shouldn’t be deterministic. However, if this is the expected behaviour, why is this?
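Roughly what that test looked like (a reconstruction with illustrative sizes and names, not my exact code):

import torch
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits, shots=1)

@qml.qnode(dev, interface="torch")
def qnode(inputs, weights):
    # constant "inputs" so every circuit in the batch is identical
    qml.AngleEmbedding(inputs, wires=range(n_qubits), rotation="X")
    for w in range(n_qubits):
        qml.RY(weights[w], wires=w)
    return qml.sample()

batched = qml.batch_input(qnode, argnum=0)

inputs = torch.full((8, n_qubits), 0.5)  # 8 identical, constant input rows
weights = torch.rand(n_qubits, requires_grad=True)
print(batched(inputs, weights))  # every row comes out the same for me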

Hi @Aaron_Thomas, sorry we’re a bit slow to reply at the moment; it’s been a busy month with QHack and now preparation for our PennyLane 0.35 release next week :tada: I’ll try to get back to you with an answer as soon as possible!

Hi @Aaron_Thomas, just coming back to this now. Do you think it would make sense to have a brief call to discuss more about what you’re trying to do with PennyLane? We’re happy to help, and it might be easier than doing this via messages in the forum.

Hi @Tom_Bromley, thank you for getting back to me and for the kind offer of a quick call to hash out the problems. I think, however, that your idea of shot batching is exactly the suggestion I was looking for! I believe the code you put above is what I was aiming for, and the batching approach I was attempting with the transform is overkill. I can also see qml.probs() doesn’t have a grad_fn() and hence there would be no gradient propagation. I recreated your code but utilising a TorchLayer instead, as you can see below; sometimes I want to do the same computation you put above, but I find using a TorchLayer more convenient for things like initialization and other purposes.

import torch
import pennylane as qml
from pennylane.qnn import TorchLayer

batch_size = 2
n_qubits = 3
depth = 4

dev = qml.device("default.qubit", shots=[1] * batch_size)

def circuit():

    @qml.qnode(dev, diff_method="parameter-shift", interface='torch')
    def _qnode(inputs, q_weights_y, q_weights_z):
        for i in range(depth):
            for y in range(n_qubits):
                qml.RY(q_weights_y[i][y], wires=y)
                qml.RZ(q_weights_z[i][y], wires=y)
            for y in range(n_qubits - 1):
                qml.CNOT(wires=[y, y + 1])
        return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

    q_weight_shapes = {"q_weights_y": (depth, n_qubits),
                       "q_weights_z": (depth, n_qubits)}

    q_layer = TorchLayer(_qnode, q_weight_shapes)

    return q_layer


q_layer = circuit()
inputs = torch.tensor([])  # dummy input
output = q_layer(inputs)
print(output)

However, recreating shot batching with a TorchLayer unfortunately isn’t possible, as this example shows. An error occurs, namely:

in _evaluate_qnode
    return torch.hstack(res).type(x.dtype)
           ^^^^^^^^^^^^^^^^^
TypeError: expected Tensor as element 0 in argument 0, but got list

As far as I can diagnose, this is simply an issue with the way the function _evaluate_qnode is coded, i.e. it can’t handle shot batching. The res argument you get just before this line in _evaluate_qnode is the same result that you get as the output of:

samples = circuit(q_weights_y, q_weights_z, shots=[1] * batch_size)

from your code above. I’m not quite sure what the fix is for this, however I wanted to point it out.
Thank you so much for all your help.

Also, just to note a consequence I’ve just run into: since we can no longer use the TorchLayer, when later doing optimization of an ML model with standard optimizers (in my case the Adam optimizer, torch.optim.Adam), you can no longer pass the optimizer something along the lines of model.parameters(). Instead, you must explicitly give it the parameters of your model!
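i.e., instead of passing model.parameters(), you end up writing something like this (where q_weights_y / q_weights_z are the hand-made leaf tensors from the script above):

import torch

opt = torch.optim.Adam([q_weights_y, q_weights_z], lr=0.01)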

This makes changing models around a bit of a hassle if you want to change your circuit/structure.

So overall it seems there are pros and cons of using the TorchLayer as currently implemented. If you want to do shot batching, or change the number of shots at run time, TorchLayer can’t handle that. However, the standard QNode doesn’t interface as well, due to things like .parameters() that you can no longer use (to the best of my knowledge), and in my opinion at least I prefer the init_method that comes with TorchLayer.
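To make that trade-off concrete, this is the kind of convenience I mean, as a small self-contained sketch (fixed shots=1 device, so it avoids the shot-batching error above):

from math import pi
import torch
import pennylane as qml
from pennylane.qnn import TorchLayer

n_qubits, depth = 3, 4
dev = qml.device("default.qubit", wires=n_qubits, shots=1)

@qml.qnode(dev, diff_method="parameter-shift", interface="torch")
def _qnode(inputs, q_weights_y, q_weights_z):
    for i in range(depth):
        for y in range(n_qubits):
            qml.RY(q_weights_y[i][y], wires=y)
            qml.RZ(q_weights_z[i][y], wires=y)
        for y in range(n_qubits - 1):
            qml.CNOT(wires=[y, y + 1])
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

q_weight_shapes = {"q_weights_y": (depth, n_qubits),
                   "q_weights_z": (depth, n_qubits)}
# the init_method convenience: uniform initialization in [-pi, pi] per weight tensor
init_method = {name: lambda x: torch.nn.init.uniform_(x, -pi, pi) for name in q_weight_shapes}

q_layer = TorchLayer(_qnode, q_weight_shapes, init_method=init_method)
opt = torch.optim.Adam(q_layer.parameters(), lr=0.01)  # .parameters() just works here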

Just putting this bug report here for the paper trail: [BUG] Shot-Batching does not work with TorchLayers · Issue #5319 · PennyLaneAI/pennylane · GitHub :slight_smile:

Hi @Tom_Bromley, so I’ve implemented the quantum circuit as we discussed, obtaining differentiable bitstring-like results using [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)] and then shifting the results as you outlined:

samples = circuit(q_weights_y, q_weights_z, shots=[1] * batch_size)
samples = (torch.stack([torch.stack(s) for s in samples]) + 1) / 2

When implemented this runs/works smoothly; however, I didn’t achieve the results I expected (I’m trying to recreate results from a particular paper). The paper in question is the following: [2309.12681] Tight and Efficient Gradient Bounds for Parameterized Quantum Circuits, which uses a QGAN (QCBM generator) to produce samples (bitstrings) that are then fed to a discriminator network along with synthetic data. I contacted the authors and they confirmed that they used sampling of the basis states to generate the bitstrings, not the [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)] + shifting method implemented here. They didn’t expand on how they did this in a differentiable way, however they did use the Adam optimizer for their generator and parameter-shift rules to obtain gradients. From what I can see in PennyLane, qml.sample() returns bitstrings but no gradient function for back-propagation. Is there another way in which we can sample bitstrings with a grad_fn, as they have done here? Perhaps a manual implementation of parameter shift, although I am unsure.

I hope this was clear as to what I am trying to achieve with pennylane.

Hi @Aaron_Thomas,
Thanks for your comment here. Tom will take a look in the next few days. He’ll get back to you next week.