Batch circuit training with a QCBM-style circuit and TorchLayer

Hi, I’m trying to train a QCBM-style quantum circuit model. Being a QCBM, the circuit requires no inputs (for my purposes there are no embedding layers). However, TorchLayer requires the QNode to have inputs as a parameter, so currently, just to add a batch dimension, I’ve used qml.batch_input together with an embedding layer.

Is the correct approach to use qml.batch_params? For a given batch (let’s say 16), I would like the 16 batched circuits to execute individually, producing different bitstrings (due to qml.sample()), and then during loss calculation/backprop the weights of the QNode should be updated.

Overall, I essentially want the same behaviour I have currently, but without having to use ‘inputs’.

import math

import pennylane as qml
import torch
from torch import nn, Tensor
from pennylane.qnn import TorchLayer
from pennylane.templates import AngleEmbedding

pi = math.pi
class Binary_Generator(nn.Module):
    def __init__(
        self,
    ) -> None:
        super(Binary_Generator, self).__init__()
        self.n_qubits = 16
        self.depth = 4
        self.device = qml.device('default.qubit', wires = self.n_qubits, shots = 1)
        q_weight_shapes = {"q_weights_y": (self.depth * self.n_qubits),
                           "q_weights_z": (self.depth * self.n_qubits)}
        init_method = {"q_weights_y": lambda x : torch.nn.init.uniform_(x, -pi, pi),
                        "q_weights_z": lambda x : torch.nn.init.uniform_(x, -pi, pi)}
        self.q_generator = qml.QNode(self._circuit, self.device, interface="torch")
        batch_q_circuit = qml.batch_input(self.q_generator, argnum = 0)
        self.batch_q_generator = TorchLayer(batch_q_circuit, q_weight_shapes, init_method = init_method)

    def __str__(self):
        return f"QuantumGenerator({self.n_qubits}) "

    def _embed_features(self, features):
        wires = range(self.n_qubits)
        AngleEmbedding(features, wires=wires, rotation="X")

    def _circuit(self, inputs, q_weights_y, q_weights_z):
        """Builds the circuit to be fed to the connector as a QML node"""
        self._embed_features(inputs)
        q_weights_y = q_weights_y.reshape(self.depth, self.n_qubits)
        q_weights_z = q_weights_z.reshape(self.depth, self.n_qubits)
        # Repeated layer
        for i in range(self.depth):
            for y in range(self.n_qubits):
                qml.RY(q_weights_y[i][y], wires = y)
                qml.RZ(q_weights_z[i][y], wires = y)
            for y in range(self.n_qubits - 1):
                qml.CNOT(wires=[y, y + 1])

        return qml.sample()

    def forward(self, inputs: Tensor):
        return self.batch_q_generator(inputs)
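
For reference, what I had in mind with qml.batch_params is roughly the following (an untested sketch: I’ve just tiled the same trainable weights along a leading batch axis, and sliced with [:, i, y] so that each gate parameter keeps that batch axis for the transform to split):

import pennylane as qml
import torch
from math import pi

n_qubits, depth, batch_size = 16, 4, 16
dev = qml.device("default.qubit", wires=n_qubits, shots=1)

@qml.batch_params
@qml.qnode(dev, interface="torch")
def batched_circuit(q_weights_y, q_weights_z):
    # q_weights_* have shape (batch_size, depth, n_qubits); each gate parameter
    # keeps the leading batch axis so batch_params can split the tape into one
    # single-shot execution per batch element
    for i in range(depth):
        for y in range(n_qubits):
            qml.RY(q_weights_y[:, i, y], wires=y)
            qml.RZ(q_weights_z[:, i, y], wires=y)
        for y in range(n_qubits - 1):
            qml.CNOT(wires=[y, y + 1])
    return qml.sample()

# the same trainable weights, repeated batch_size times along axis 0
theta_y = (torch.rand(depth, n_qubits) * 2 * pi - pi).requires_grad_()
theta_z = (torch.rand(depth, n_qubits) * 2 * pi - pi).requires_grad_()
bitstrings = batched_circuit(theta_y.repeat(batch_size, 1, 1),
                             theta_z.repeat(batch_size, 1, 1))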

Hi @Aaron_Thomas, welcome to the forum!

Just so that we’re on the same page, could you tell me what QCBM stands for here?

We built qml.qnn.TorchLayer to live on top of Torch’s Module class, so the layer is viewed as a transformation of an input vector x, which we call inputs in TorchLayer.
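
For example, a standard TorchLayer setup looks roughly like this (a schematic sketch with a toy embedding and ansatz):

import pennylane as qml
import torch

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def qnode(inputs, weights):
    # "inputs" is the (batched) input vector supplied by the surrounding Torch model
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

weight_shapes = {"weights": (3, n_qubits)}
layer = qml.qnn.TorchLayer(qnode, weight_shapes)

x = torch.rand(5, n_qubits)  # a batch of 5 input vectors
out = layer(x)               # output of shape (5, n_qubits)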

I’d like to understand more about what you’re trying to do in your use case. Is your objective to return a batch of bitstrings from a parametrized circuit (with no input parameters)? Do you intend to optimize the circuit, and if so, what is your cost function?

Hi there @Tom_Bromley, thank you for taking the time to respond to my question.

Here by QCBM, I am referring to a quantum circuit Born machine; my only point is that there are no inputs. In my case, I want to start with the initial qubit register, apply my circuit, and then measure out the bitstring at the end using qml.sample().

Essentially, the behaviour I currently have is that for a batch of size 1 (i.e. a single execution of the circuit), I output a bitstring using something like the following.

pi = math.pi
class Generator(nn.Module):
    def __init__(
        self,
    ) -> None:
        super(Generator, self).__init__()
        self.n_qubits = 16
        self.depth = 4
        self.device = qml.device('default.qubit', wires = self.n_qubits, shots = 1)
        q_weight_shapes = {"q_weights_y": (self.depth * self.n_qubits),
                           "q_weights_z": (self.depth * self.n_qubits)}
        init_method = {"q_weights_y": lambda x : torch.nn.init.uniform_(x, -pi, pi),
                        "q_weights_z": lambda x : torch.nn.init.uniform_(x, -pi, pi)}
        q_generator = qml.QNode(self._circuit, self.device, interface="torch")
        self.q_generator = TorchLayer(q_generator, q_weight_shapes, init_method = init_method)

    def _circuit(self, inputs, q_weights_y, q_weights_z):
        """Builds the circuit to be fed to the connector as a QML node"""
    
        q_weights_y = q_weights_y.reshape(self.depth, self.n_qubits)
        q_weights_z = q_weights_z.reshape(self.depth, self.n_qubits)
        # Repeated layer
        for i in range(self.depth):
            for y in range(self.n_qubits):
                qml.RY(q_weights_y[i][y], wires = y)
                qml.RZ(q_weights_z[i][y], wires = y)
            for y in range(self.n_qubits - 1):
                qml.CNOT(wires=[y, y + 1])

        return qml.sample()

    def forward(self, inputs: Tensor):
        return self.q_generator(inputs)

I run an instance of the class by doing the following; note how in the _circuit() method inputs is unused.

generator = Generator()
inputs = torch.tensor([])
output_bitstring = generator(inputs)

Now, for some given data (represented as bitstrings), I want to train the parameters theta of the Generator class using a BCE loss function such that the outputs of the generator align with the data. I’ve got the loss function/optimization working, but only for one circuit at a time, i.e. batch_size = 1. I would like to train this class/circuit analogously to how you do batch training of inputs in a typical ML application: if I use batch_size = 64, then on the forward pass the circuit executes 64 times, all updating the same parameters theta of the network upon backpropagation.

The issue is that because there is no ‘input’ in the typical sense, training in this way needs to be set up differently: as I understand it, qml.batch_input needs the parameters along the axis defined through argnum to be non-trainable, and here there are no such parameters to begin with, since I never use inputs when calling the generator object.

Hi @Aaron_Thomas!

I’ve had a go at replicating what I think you are trying to do:

import pennylane as qml
import torch
from math import pi
from torch.nn.functional import binary_cross_entropy

batches = 10
batch_size = 100

n_qubits = 2
depth = 4
wires = range(n_qubits)

dev = qml.device("default.qubit")

@qml.qnode(dev)
def circuit(q_weights_y, q_weights_z):
    for i in range(depth):
        for y in range(n_qubits):
            qml.RY(q_weights_y[i][y], wires=y)
            qml.RZ(q_weights_z[i][y], wires=y)
        for y in range(n_qubits - 1):
            qml.CNOT(wires=[y, y + 1])
    return qml.probs(wires)

q_weights_y = torch.rand((depth, n_qubits), requires_grad=True, dtype=torch.float64) * 2 * pi - pi
q_weights_z = torch.rand((depth, n_qubits), requires_grad=True, dtype=torch.float64) * 2 * pi - pi
q_weights_y.retain_grad()
q_weights_z.retain_grad()

# Generate some random data to train against
data = torch.randint(2, size=(batches, batch_size, n_qubits))

# Convert to probabilities
probs = torch.tensor(qml.probs().process_samples(data, wires))

# Probability distribution sampled from PennyLane circuit
circuit_probs = torch.stack(circuit(q_weights_y, q_weights_z, shots=[batch_size] * batches))

loss = binary_cross_entropy(circuit_probs, probs)
loss.backward()
gradients = (q_weights_y, q_weights_z)

Let me know if this makes sense to you or whether you’re looking for something else. I’m not sure that using a TorchLayer is needed here. One potential issue with the above is that the trained probability distribution scales exponentially with the number of qubits (it has 2 ** n_qubits entries).

Hi @Tom_Bromley, thank you for your response.
I may not have been as clear as I could have been: for my use case I’m using a hybrid model, hence the inclusion of the TorchLayer for the quantum circuit. The classical network is a discriminator that looks something like the following:

class Network(nn.Module):
    """Fully connected classical network"""

    def __init__(self, n_qubits):
        super(Network, self).__init__()

        self.model = nn.Sequential(
            # Inputs to first hidden layer (num_input_features -> 64)
            nn.Linear(n_qubits, 64),
            nn.LeakyReLU(),
            nn.Sigmoid(),
        )
    def forward(self, input):
        return self.model(input)

I am training the output of the quantum network (bitstrings, since for my use case I use qml.sample()) adversarially, using the above network to discriminate fake/real data.
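
Roughly, one training step looks like the following (a simplified, illustrative sketch rather than my exact code: generator and discriminator are instances of the two networks above, while real_bitstrings and generated_bitstrings are placeholders for a batch of dataset bitstrings and a batch of generator outputs):

import torch
from torch import nn

bce = nn.BCELoss()
disc_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

real = real_bitstrings.float()       # placeholder: (batch_size, n_qubits) bitstrings from the dataset
fake = generated_bitstrings.float()  # placeholder: (batch_size, n_qubits) bitstrings from the quantum generator

# Discriminator step: push real samples towards label 1 and fake samples towards label 0
disc_opt.zero_grad()
d_real = discriminator(real)
d_fake = discriminator(fake.detach())
d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
d_loss.backward()
disc_opt.step()

# Generator step: try to make the discriminator output 1 on fake samples
gen_opt.zero_grad()
d_fake_for_gen = discriminator(fake)
g_loss = bce(d_fake_for_gen, torch.ones_like(d_fake_for_gen))
g_loss.backward()
gen_opt.step()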

I’ve attempted to do something like the following to execute the circuits in parallel:

from typing import Callable, Sequence

import pennylane as qml
from pennylane import transform
from pennylane.tape import QuantumScript, QuantumTape

# _nested_stack is assumed to be the nested result-stacking helper used by
# PennyLane's batch transforms (import not shown)

@transform
def batch_circ(tape: QuantumTape, batch_size: int) -> (Sequence[QuantumTape], Callable):

    output_tapes = []

    def _split_operations(ops, num_tapes):
        # copy the full list of operations once per requested tape
        new_ops = [[] for _ in range(num_tapes)]

        for op in ops:
            for b in range(num_tapes):
                new_ops[b].append(op)

        return new_ops

    for ops in _split_operations(tape.operations, batch_size):
        new_tape = QuantumScript(
            ops, tape.measurements, shots=tape.shots, trainable_params=tape.trainable_params
        )
        output_tapes.append(new_tape)

    def processing_fn(res):
        return _nested_stack(res)

    return output_tapes, processing_fn

This outputs the correct shape (length batch_size) when you apply the batch_circ function to a QNode. However, all of the output bitstrings are the same! I would expect, given there is a probability distribution over all the different basis states, that this wouldn’t/couldn’t be the case.

Hi @Aaron_Thomas, thanks for the extra detail! I just have a few questions:

im using a hybrid model hence the inclusion of the Torchlayer for the quantum circuit

I’m not sure why a TorchLayer is needed to make the model hybrid; is this just for convenience?

I am training the output of the quantum network (bitstrings as for my use case i use qml.sample())

For a single bitstring sample, how will you differentiate through the quantum circuit? This is technically possible in some cases, as we discuss in the note on this page, but I just want to make sure that I understand correctly that this is what you want to do. An alternative could be to return the probability distribution from the quantum circuit, which is deterministic and can be differentiated through.

Hi @Tom_Bromley, upon reflection I believe you are correct that I do not need a TorchLayer here, so thank you for pointing that out.

And yes, I am differentiating through the bitstrings using some of the caveats mentioned on that page (1-shot sampling + parameter-shift differentiation). I am using bitstrings due to the size of my circuit, 16 qubits, which would output a 2 ** 16-length probability distribution over all the basis states. This is too large to process for my use case due to the RAM requirements when loading in my dataset (which would then additionally have 2 ** 16 elements per sample) + training. That approach does, however, work on smaller circuits/datasets (I’ve tested up to 6 qubits this way).
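
For context, without the TorchLayer the sampling setup is essentially the following (a simplified sketch of my circuit):

import pennylane as qml

n_qubits, depth = 16, 4
dev = qml.device("default.qubit", wires=n_qubits, shots=1)

@qml.qnode(dev, interface="torch", diff_method="parameter-shift")
def generator_circuit(q_weights_y, q_weights_z):
    for i in range(depth):
        for y in range(n_qubits):
            qml.RY(q_weights_y[i, y], wires=y)
            qml.RZ(q_weights_z[i, y], wires=y)
        for y in range(n_qubits - 1):
            qml.CNOT(wires=[y, y + 1])
    # a single shot, so one bitstring of length n_qubits per execution
    return qml.sample()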

This function batch_circ (although it could do with cleaning up) does operate almost how I want it to, outputting shape (batch_size, n_qubits) and replacing batch_input. I’m just not sure why each bitstring in the list is the same: it’s a stochastic process, so even though it is the same circuit with the same params being executed, the output should vary. I ran a for-loop over the circuit to test this, e.g.:

q_network = Generator()  # instantiate the class/quantum network with bitstring output

bitstring_list = []
for i in range(batch_size):
    input = torch.tensor([])
    bitstring = q_network(input)
    bitstring_list.append(bitstring)
And as expected, the resulting list of bitstrings is all different. I suspect there is something deeper going on in the callable processing_fn, specifically in the _nested_stack function, that is causing this behaviour?

Hi @Aaron_Thomas, sorry for the late reply.

And yes, I am differentiating through the bitstrings using some of the caveats mentioned on that page (1-shot sampling + parameter-shift differentiation). I am using bitstrings due to the size of my circuit, 16 qubits, which would output a 2 ** 16-length probability distribution over all the basis states. This is too large to process for my use case due to the RAM requirements when loading in my dataset (which would then additionally have 2 ** 16 elements per sample) + training. That approach does, however, work on smaller circuits/datasets (I’ve tested up to 6 qubits this way).

Makes sense now, got it! I’m not sure a transform is the way to go here: if we want to generate a collection of samples from runs of the circuit with the same parameters, we can use shot batching, e.g., by setting shots=[1] * batch_size to get batch_size-many single-shot samples.

I’m also a bit cautious because torch is not compatible with differentiating samples. Instead, one can return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)] to get shots-like results that are torch-compatible and differentiable.

Does the script below help?

import pennylane as qml
import torch
from math import pi
from torch.nn.functional import binary_cross_entropy

batch_size = 5

n_qubits = 10
depth = 4
wires = range(n_qubits)

dev = qml.device("default.qubit")

@qml.qnode(dev, diff_method="parameter-shift")
def circuit(q_weights_y, q_weights_z):
    for i in range(depth):
        for y in range(n_qubits):
            qml.RY(q_weights_y[i][y], wires=y)
            qml.RZ(q_weights_z[i][y], wires=y)
        for y in range(n_qubits - 1):
            qml.CNOT(wires=[y, y + 1])
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

q_weights_y = torch.rand((depth, n_qubits), requires_grad=True, dtype=torch.float64) * 2 * pi - pi
q_weights_z = torch.rand((depth, n_qubits), requires_grad=True, dtype=torch.float64) * 2 * pi - pi
q_weights_y.retain_grad()
q_weights_z.retain_grad()

# Generate some random data to train against
data = torch.randint(2, size=(batch_size, n_qubits), dtype=torch.float64)

# Samples from the PennyLane circuit: one single-shot execution per batch element
samples = circuit(q_weights_y, q_weights_z, shots=[1] * batch_size)
samples = (torch.stack([torch.stack(s) for s in samples]) + 1) / 2

loss = binary_cross_entropy(samples, data)
loss.backward()
gradients = (q_weights_y.grad, q_weights_z.grad)

Hi there @Tom_Bromley, apologies for the long time to reply. May I ask, why is torch not compatible with differentiating w.r.t. samples? My training function does run, so this surprises me.

Thank you for the idea of shot batching; I can see if this implementation will work with my use case. However, to expand on my idea to use a transform, why would that not be okay for generating a collection of samples from runs of the circuit? Once you _split_operations (create batch_size copies of the circuit), why does the processing_fn then return the same output for all circuits?

Another test I did was to use the batch_input decorator with angle encoding, but with the parameters of the angle encoding all set to the same constant value (so each circuit is again identical). This also results in the same output for all circuits. I would expect the batch operations to naturally execute the circuits independently so you DON’T get the same outputs; these are quantum circuits, so the output shouldn’t be deterministic. If this is the expected behaviour, though, why is that?
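
For concreteness, that test looked roughly like the following (a simplified sketch from memory, with fewer qubits than my real circuit):

import pennylane as qml
import torch
from math import pi

n_qubits, depth, batch_size = 4, 2, 8
dev = qml.device("default.qubit", wires=n_qubits, shots=1)

@qml.batch_input(argnum=0)
@qml.qnode(dev, interface="torch")
def circuit(inputs, q_weights_y, q_weights_z):
    qml.AngleEmbedding(inputs, wires=range(n_qubits), rotation="X")
    for i in range(depth):
        for y in range(n_qubits):
            qml.RY(q_weights_y[i, y], wires=y)
            qml.RZ(q_weights_z[i, y], wires=y)
        for y in range(n_qubits - 1):
            qml.CNOT(wires=[y, y + 1])
    return qml.sample()

# every row of "inputs" is the same constant vector, so all batched circuits are identical
inputs = torch.full((batch_size, n_qubits), 0.5)
wy = torch.rand(depth, n_qubits) * 2 * pi - pi
wz = torch.rand(depth, n_qubits) * 2 * pi - pi
bitstrings = circuit(inputs, wy, wz)  # each row of the output comes back identical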

Hi @Aaron_Thomas, sorry we’re a bit slow to reply at the moment; it’s been a busy month with QHack and now preparation for our PennyLane 0.35 release next week :tada: I’ll try to get back to you with an answer as soon as possible!