Noise? Using shots vs no shots?

Hi, I am working on a hybrid classical-quantum model using PennyLane and PyTorch:

class QuantumModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_layer = qml.qnn.TorchLayer(full_circuit, weight_shapes)
        self.fc = nn.Linear(n_qubits, n_classes, dtype=torch.float64)

    def forward(self, x):
        q_out = self.q_layer(x)
        logits = self.fc(q_out)  # raw scores, no softmax here
        return logits

where full_circuit measures all qubits, followed by a linear layer from 10 nodes down to 3 nodes to classify the 3 classes. However, I am using this device:

dev = qml.device("default.qubit", wires=n_qubits)

which, from what I’ve read, is an ideal simulator with no noise.
My professor requires me to train my circuit with noise in any way possible, so I have two questions:

  1. I have tried the default.mixed and qiskit.aer devices, but they take too long. Are there any alternatives for simulating noise, such as using actual hardware?
  2. What is the difference between using no shots vs. shots? I have read that using no shots means you are working analytically, but can someone be more specific about the difference?

Hi @hiimhoanglam , welcome to the Forum!

You may find the answer to your question on Noise in our Codebook module on Noisy Quantum Theory. Note that using default.mixed with over 16 qubits is not recommended.

You may find the answer to your question on Shots on our Codebook topic on Measurements in PennyLane.

I hope this helps!

I have read the Codebook on Shots but I am still a bit unsure about the difference between no shots and shots.

So no shots = analytic = ideal simulation. But for larger circuits, will the simulation take exponentially longer? And with shots, is this simulating actual hardware? Because on actual quantum hardware we have to take the average of a certain number of circuit runs?

Hi @hiimhoanglam ,

Simulation of large circuits gets exponentially harder because the state gets exponentially bigger: a 10-qubit state vector already has 2^10 = 1024 amplitudes, and a density-matrix simulator like default.mixed stores a 2^10 × 2^10 matrix. This is regardless of whether or not you use shots, hence the reason why we’re building actual quantum computers.

Note that our simulators model the answer that a quantum computer would produce, but not the process of the quantum computer itself. We basically calculate a lot of vector and matrix multiplications, with very smart math under the hood, so that you can simulate your quantum algorithms. You can then decide whether you want the answer as shots, by taking a number of samples based on the probabilities given by the state, or as an expectation value (amongst other results), which gives you the analytical value based on those probabilities, basically equivalent to having a very, very large number of shots.

Your choice of whether to use shots then depends on whether you specifically want to take into account the statistics given by a certain number of shots.
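
For instance, here’s a minimal sketch contrasting the two modes (the single-qubit circuit is just an arbitrary example):

import pennylane as qml

dev_analytic = qml.device("default.qubit", wires=1)            # shots=None: analytic mode
dev_sampled = qml.device("default.qubit", wires=1, shots=100)  # finite-shot mode

@qml.qnode(dev_analytic)
def circuit_analytic():
    qml.Hadamard(wires=0)
    return qml.expval(qml.PauliZ(0))

@qml.qnode(dev_sampled)
def circuit_sampled():
    qml.Hadamard(wires=0)
    return qml.expval(qml.PauliZ(0))

print(circuit_analytic())  # exactly 0.0
print(circuit_sampled())   # a sample mean that fluctuates around 0.0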

I hope this helps clarify some things!

Thank you so much for the explanation!

Hi, I also have another question about training my hybrid PennyLane-Torch model.

import torch
import torch.nn as nn
# from qiskit_aer.noise import NoiseModel, depolarizing_error
import pennylane as qml
from pennylane import numpy as np
n_qubits = 10
n_classes = 3
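# shots=1 means each expectation value is estimated from a single sample, so outputs are very noisy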
dev = qml.device("default.mixed", wires=n_qubits, shots = 1)

def state_preparation(features):
    qml.AmplitudeEmbedding(features, wires=range(n_qubits), normalize=True)

def embedding_layer(params):
    # params shape: (10,)
    for i in range(n_qubits):
        qml.H(wires=i)
    for i in range(n_qubits):
        qml.RY(params[i], wires=i)

def vqc(params, index): # Main VQC
    # params shape: (12,)
    qml.RX(params[0], wires=index)
    qml.RY(params[1], wires=index + 1)
    qml.RY(params[2], wires=index)
    qml.RZ(params[3], wires=index + 1)
    qml.RZ(params[4], wires=index)
    qml.RX(params[5], wires=index + 1)
    qml.CZ(wires=[index, index + 1])
    qml.RZ(params[6], wires=index)
    qml.RX(params[7], wires=index + 1)
    qml.RY(params[8], wires=index)
    qml.RZ(params[9], wires=index + 1)
    qml.RX(params[10], wires=index)
    qml.RY(params[11], wires=index + 1)
    qml.CNOT(wires=[index + 1, index])

@qml.qnode(dev, interface="torch")
def full_circuit(inputs, all_params):
    embed_len = 10
    vqc1_len = 12 * (n_qubits // 2)  # 60
    vqc2_len = 12 * (n_qubits // 2)  # another 60
    sel_len = embed_len + vqc1_len + vqc2_len  # offset of the final 30 StronglyEntanglingLayers params

    embedding_params = all_params[:embed_len]
    vqc1_params = all_params[embed_len:embed_len + vqc1_len]
    vqc2_params = all_params[embed_len + vqc1_len:sel_len]
    sel_params = all_params[sel_len:]

    vqc1_params = vqc1_params.view(n_qubits // 2, 12)
    vqc2_params = vqc2_params.view(n_qubits // 2, 12)
    sel_params = sel_params.view(1, n_qubits, 3)

    state_preparation(inputs)
    embedding_layer(embedding_params)

    for i in range(n_qubits // 2):
        vqc(vqc1_params[i], i * 2)

    for i in range(n_qubits // 2):
        vqc(vqc2_params[i], i * 2)

    for i in range(n_qubits):
        qml.CNOT(wires=[i, (i + 1) % n_qubits])
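    # inject depolarizing noise on a single wire (channels are supported on default.mixed)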
    qml.DepolarizingChannel(0.01, wires=0)

    qml.StronglyEntanglingLayers(weights=sel_params, wires=range(n_qubits))

    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

# Total number of parameters
total_num_params = (
    10 +                  # embedding_layer
    12 * (n_qubits // 2) +  # vqc1
    12 * (n_qubits // 2) +  # vqc2 (repeated)
    1 * n_qubits * 3        # StronglyEntanglingLayers
)
weight_shapes = {"all_params": total_num_params}

# PyTorch model wrapper
class QuantumModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_layer = qml.qnn.TorchLayer(full_circuit, weight_shapes, torch.nn.init.normal_)
        self.fc = nn.Linear(n_qubits, n_classes, dtype=torch.float64)

    def forward(self, x):
        q_out = self.q_layer(x)
        logits = self.fc(q_out)  # raw scores, no softmax here
        return logits

import torch.optim as optim
import matplotlib.pyplot as plt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = QuantumModel().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=0.5, weight_decay = 0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

num_epochs = 20

# Track losses and accuracies
train_losses = []
train_accuracies = []
val_losses = []
val_accuracies = []

for epoch in range(1, num_epochs + 1):
    model.train()
    train_loss = 0.0
    correct_train = 0
    total_train = 0

    for inputs, labels in train_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        logits = model(inputs)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()

        train_loss += loss.item() * inputs.size(0)
        pred_labels = torch.argmax(logits, dim=1)
        correct_train += (pred_labels == labels).sum().item()
        total_train += labels.size(0)

    train_loss /= total_train
    train_acc = correct_train / total_train

    model.eval()
    val_loss = 0.0
    correct_val = 0
    total_val = 0

    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            logits = model(inputs)
            loss = criterion(logits, labels)

            val_loss += loss.item() * inputs.size(0)
            pred_labels = torch.argmax(logits, dim=1)
            correct_val += (pred_labels == labels).sum().item()
            total_val += labels.size(0)

    val_loss /= total_val
    val_acc = correct_val / total_val

    # Store metrics
    train_losses.append(train_loss)
    train_accuracies.append(train_acc)
    val_losses.append(val_loss)
    val_accuracies.append(val_acc)

    print(f"Epoch {epoch}/{num_epochs} — Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
    scheduler.step()

My data samples have length 1024 and I have been training with around 1000 samples for 250 minutes, but no result has been printed out (like for each epoch). I want to ask some questions:

  1. What differentiation method is used for the default.mixed device? I believe it is no longer backpropagation, so would that make it take longer?

  2. Can you suggest how I can add noise to my circuit while still being able to run the training process with my model? My problem is a multiclass problem.

Hi @hiimhoanglam ,

You could try running a profiler, maybe something like SnakeViz, so that you can see where the big slowdown is happening.

My guess is that it’s happening in the state preparation.
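
If it helps, here’s a minimal sketch using Python’s built-in cProfile (model and sample_batch are placeholders for your own objects; SnakeViz can then visualize the output file):

import cProfile

# profile one forward pass; "model" and "sample_batch" are placeholders
cProfile.run("model(sample_batch)", "profile.out")
# then, from a terminal: snakeviz profile.out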

There’s unfortunately no easy answer to your question. default.mixed is just slow and there’s nothing to do about it, so if you really want to simulate noise then that complicates things.

The easiest thing you can do is just reduce the size of your problem (to a handful of features) and see if you’re at least able to run something. Then you can try to eliminate slowdowns by trying a different state preparation, etc. In fact, efficient state preparation is an active area of research so this is not a fully solved problem.
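
As one illustration (not a drop-in replacement for your model), here’s a sketch that shrinks the problem and swaps AmplitudeEmbedding for AngleEmbedding, which needs only one feature per qubit instead of 2**n_qubits amplitudes:

import pennylane as qml

n_qubits = 4  # reduced size, just for testing
dev_small = qml.device("default.mixed", wires=n_qubits)

@qml.qnode(dev_small, interface="torch")
def small_circuit(inputs, weights):
    # AngleEmbedding takes n_qubits features rather than 2**n_qubits amplitudes
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]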

You could also try using qiskit.aer (see this demo for example) but it may have other limitations so I recommend that you try it with a small example first before changing your full code to work with this device.
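
For reference, here’s a minimal sketch of attaching an Aer noise model to that device (the error rates and gate names below are just illustrative):

import pennylane as qml
from qiskit_aer.noise import NoiseModel, depolarizing_error

noise_model = NoiseModel()
# depolarizing noise on the listed one- and two-qubit gates
noise_model.add_all_qubit_quantum_error(depolarizing_error(0.01, 1), ["rx", "ry", "rz"])
noise_model.add_all_qubit_quantum_error(depolarizing_error(0.05, 2), ["cx", "cz"])

dev_noisy = qml.device("qiskit.aer", wires=2, noise_model=noise_model, shots=1000)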

I hope this helps and let us know if you have any further questions!

Thank you for your information.

I have tested a reduced version of my problem as you suggested, varying the number of shots (1, 10, 100, 1000), the number of samples (1, 10, 50, 100, 500), and the device type (qiskit.aer, default.mixed, cirq.mixedsimulator). The worst running time came from the Cirq mixed simulator and the best from default.mixed.

I have also tested SPSA as the differentiation method and it is much faster than parameter-shift or finite-difference. I have read that the latter two methods are closer to actual quantum hardware execution because the circuit has to be run twice per parameter.

However, is SPSA a differentiation method that is close to actual quantum hardware execution?

I want to simulate my circuit close to actual quantum hardware execution.

Hi @hiimhoanglam ,

I’m glad to hear you found an option that worked! spsa_grad gives you a coarse approximation of the gradient. It’s based on finite differences so errors may accumulate. However, it is indeed compatible with actual quantum hardware.

All three methods you mentioned (parameter-shift, finite-difference, and SPSA grad) are compatible with actual quantum hardware. There’s not one that is “closer” or “further” to quantum hardware execution. The main differences are:

  • parameter-shift allows you to calculate gradients analytically (exactly), with no approximations.
  • finite-difference is not exact; it can lead to errors.
  • SPSA grad is a statistical estimator that uses a number of function evaluations that does not depend on the number of parameters. While this bounds the cost of the estimation, it also implies that the returned values are not exact (even with shots=None).

This hopefully helps you see why parameter-shift can be great for small circuits but SPSA grad might be your best option for larger ones.
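
As a minimal sketch, the differentiation method is chosen on the QNode (the device settings and circuit below are placeholders, not your model):

import pennylane as qml

dev = qml.device("default.mixed", wires=2, shots=1000)

# diff_method can be e.g. "parameter-shift", "finite-diff", or "spsa"
@qml.qnode(dev, interface="torch", diff_method="spsa")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2))
    qml.StronglyEntanglingLayers(weights, wires=range(2))
    return qml.expval(qml.PauliZ(0))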

If your goal is to run something that’s as close as possible to running on quantum hardware, then using shots, noise models, and one of these hardware-compatible differentiation methods is a good way to go. There are, however, more nuances and details you would need to cover, such as which gates are available on your target hardware, what the connectivity of the qubits is, etc. I don’t know if you want to go that deep. For all practical purposes most of this compilation happens under the hood, so it’s not something you need to specify manually.

Let me know if this answers your question!

Thank you so much for your answer. I understand what I can do now.

As you said, there are more details I would need to go deeper into to simulate quantum hardware execution. Where and how can I learn about that information?

Hi @hiimhoanglam , there’s a whole world of knowledge that will largely depend on how thorough you want to be at each stage and what kind of quantum architecture you want to target.

At the highest level you will find programs like the ones you have, which define everything in terms of logical gates. You can then go into compilation which may or may not involve error correction depending on what you want to do. Once your program is compiled it will run in a runtime. This will make calls to a quantum device interface which is itself connected to a specific device. Once you’re on the “device side” if I may call it that, there are still a lot of stages which are often also called compilation. This includes routing, control, and more. Finally, you have something running on a specific hardware, which may include microwave pulses, lasers, etc. Then you decode your result and send it back to the user through another long series of steps. All of this is very device-dependent, and mostly still work-in-progress for many architectures. This is why there’s no easy recipe for running things “close to hardware”, at least yet.

That being said, there are indeed resources which you can use to learn about the different components I mentioned.

To learn about compilation you can go to our Quantum Compilation hub, go to the Catalyst docs, and explore our blogs and demos on compilation. Note that Catalyst has different components including a compiler and a runtime.

Our page on Fault-Tolerant Quantum Computing (FTQC) includes info on device implementations, Quantum Error Correction, and more. I would also recommend our demo on the game of surface codes.

I know this is a lot so I would recommend starting with one step at a time, maybe just compilation with Catalyst, or using decompositions to decompose to a target gate-set, and then go from there if you want.

I hope this helps!

Thank you. I have all the information now.
