Backpropagation when using AmplitudeEmbedding

Hello everyone,

I’m new to quantum machine learning and I have a question about PennyLane. I’ve read the documentation, and it states that AmplitudeEmbedding is not differentiable. However, when I train with the code below and monitor the gradients and weights with WandB, I see that the weights and gradients are not constant.

This is confusing to me. Can I use AmplitudeEmbedding in a model during training without breaking backpropagation?

This is my code:

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import math
import wandb
import pennylane as qml
from tqdm import tqdm
from datetime import datetime

class QuantumLayer(torch.nn.Module):
    def __init__(self, in_features, out_features, num_qlayers=1, qdevice="default.qubit", diff_method="best"):
        super().__init__()

        def circuit(inputs, weights):
            # Encode the (normalized, zero-padded) input vector into the qubit amplitudes
            qml.AmplitudeEmbedding(inputs, wires=range(num_qubits), normalize=True, pad_with=0.0)
            qml.StronglyEntanglingLayers(weights, wires=range(num_qubits))
            return qml.probs(wires=range(num_qubits))

        # Use enough qubits so that 2**num_qubits covers the larger of the in/out feature counts
        max_features = max(in_features, out_features)
        num_qubits = math.ceil(math.log2(max_features))

        dev = qml.device(qdevice, wires=num_qubits)
        qlayer = qml.QNode(circuit, dev, interface="torch", diff_method=diff_method)
        self.linear = qml.qnn.TorchLayer(qlayer, {"weights": (num_qlayers, num_qubits, 3)})
        self.in_features = in_features
        self.out_features = out_features

    def forward(self, inputs):
        # Keep only the first out_features probabilities as this layer's output
        return self.linear(inputs)[..., :self.out_features]

class twolayer_model(torch.nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.layer1 = QuantumLayer(in_features, hidden_features, num_qlayers=2)
        self.layer2 = QuantumLayer(hidden_features, out_features, num_qlayers=2)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        return x


current_datetime = datetime.now()
formatted_datetime = current_datetime.strftime("%Y_%m_%d_%H_%M_%S")
wandb.init(
    project="QuantumTrans",
    name=f"ckp_{formatted_datetime}_mnist_amplitude_gradient",
)
# Hyperparameters
batch_size = 4
learning_rate = 0.0001
num_epochs = 10

# Transform to flatten the images and normalize
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.view(-1))  # Flatten the images
])

# Load MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_dataset_small = torch.utils.data.Subset(train_dataset, range(0, 10000))  # Use a smaller subset for quick testing
train_loader = DataLoader(dataset=train_dataset_small, batch_size=batch_size, shuffle=False)


# Initialize model, loss function, and optimizer
model = twolayer_model(28*28, 64, 10)
wandb.watch(model, log="all")
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Training loop
steps = 0 
for epoch in range(num_epochs):
    for images, labels in tqdm(train_loader):
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        wandb.log({"steps": steps, "loss": loss.item()})
        steps += 1
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

print("Training complete!")
wandb.finish()

These are my gradient and weight histogram plots; the histograms change across training steps:

Hi @Trung_Pham ,

Welcome to the Forum!

There’s a tricky nuance in the wording of the documentation. When we say that the features argument of AmplitudeEmbedding is in general not differentiable, we mean that it’s not always differentiable; in some cases it can be.

That being said, I’m not sure I understand the core of your question. I would definitely expect the weights and gradients to change after every iteration; this is what allows you to train. Do you have a smaller example, maybe something closer to the example in the docs for TorchLayer, that shows the confusing/unexpected behaviour?
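For reference, a stripped-down sketch along the lines of the TorchLayer docs example could look like the following (the qubit count, layer shape, and toy batch are placeholders, not taken from your setup):

import torch
import pennylane as qml

n_qubits = 2  # placeholder qubit count for a minimal reproducer
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def qnode(inputs, weights):
    qml.AmplitudeEmbedding(inputs, wires=range(n_qubits), normalize=True, pad_with=0.0)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

qlayer = qml.qnn.TorchLayer(qnode, {"weights": (1, n_qubits, 3)})

x = torch.rand(4, 2**n_qubits)  # toy batch of 4 samples
qlayer(x).sum().backward()      # backward pass through the quantum layer
for name, param in qlayer.named_parameters():
    print(name, param.grad)     # the quantum weights should receive gradients

Something of that size makes it much easier to pin down where any unexpected behaviour comes from.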

Thanks for the reply,

So, in this case the AmplitudeEmbedding layers are differentiable, right? Can you give me more information on when that layer is and isn’t differentiable? As for the confusion in my original question: the AmplitudeEmbedding documentation says it’s non-differentiable, so I expected that using it in training would leave all of the model’s weights and gradients constant, but my experiments showed the opposite.

Hi @Trung_Pham ,

qml.AmplitudeEmbedding uses qml.StatePrep under the hood.

If the StatePrep operation is not supported natively on the target device, PennyLane will attempt to decompose the operation using the method developed by Möttönen et al. (Quantum Info. Comput., 2005). This is done with qml.MottonenStatePreparation.

Due to non-trivial classical processing of the state vector, this template is not always fully differentiable. The method is rather involved, so I don’t know exactly why or when it fails to be differentiable.
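If you’re curious, you can also call the template directly to experiment with the decomposed route; this is only a sketch with arbitrary values on default.qubit, and as noted above, gradients through it aren’t guaranteed for every input state:

import torch
import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch", diff_method="backprop")
def mottonen_circuit(state):
    # Prepare the state via the Mottonen decomposition instead of StatePrep
    qml.MottonenStatePreparation(state, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

state = torch.rand(2**n_qubits, dtype=torch.float64)
state = (state / torch.linalg.norm(state)).requires_grad_(True)  # the template expects a normalized state

mottonen_circuit(state)[0].backward()
print(state.grad)  # may be missing or inaccurate for some states, per the warning above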

Note that this decomposition is used only when StatePrep is not supported on the target device. Since you’re using default.qubit, where StatePrep is supported, you don’t need to worry about MottonenStatePreparation here, and AmplitudeEmbedding should be differentiable.
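As a quick sanity check on your side, you can ask Torch for the gradient with respect to the inputs directly; this is just a sketch with backprop on default.qubit and placeholder shapes, but a populated inputs.grad is the signal that the embedding is being differentiated:

import torch
import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch", diff_method="backprop")
def circuit(inputs, weights):
    qml.AmplitudeEmbedding(inputs, wires=range(n_qubits), normalize=True, pad_with=0.0)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

inputs = torch.rand(2**n_qubits, requires_grad=True)
weights = torch.rand((1, n_qubits, 3), requires_grad=True)

# Differentiate a single output probability with respect to both arguments
circuit(inputs, weights)[0].backward()
print(inputs.grad)   # populated => gradients flow through AmplitudeEmbedding on this device
print(weights.grad)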