Backpropagation when using AmplitudeEmbedding

Hello everyone,

I’m new to quantum machine learning and I have a question about PennyLane. I’ve read the documentation, and it states that AmplitudeEmbedding is not differentiable. However, when I train with the code below and monitor the gradients and weights with WandB, I see that the weights and gradients are not constant.

This is confusing to me. Can I use AmplitudeEmbedding in a model during training without breaking backpropagation?

This is my code:

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import math
import wandb
import pennylane as qml
from tqdm import tqdm
from datetime import datetime

class QuantumLayer(torch.nn.Module):
    def __init__(self, in_features, out_features, num_qlayers=1, qdevice="default.qubit", diff_method="best"):
        super().__init__()

        def circuit(inputs, weights):
            # Encode the (normalized, zero-padded) input vector into the qubit amplitudes
            qml.AmplitudeEmbedding(inputs, wires=range(num_qubits), normalize=True, pad_with=0.0)
            qml.StronglyEntanglingLayers(weights, wires=range(num_qubits))
            return qml.probs(wires=range(num_qubits))

        # Use enough qubits so that 2**num_qubits covers the larger of the in/out feature counts
        max_features = max(in_features, out_features)
        num_qubits = math.ceil(math.log2(max_features))

        dev = qml.device(qdevice, wires=num_qubits)
        qlayer = qml.QNode(circuit, dev, interface="torch", diff_method=diff_method)
        self.linear = qml.qnn.TorchLayer(qlayer, {"weights": (num_qlayers, num_qubits, 3)})
        self.in_features = in_features
        self.out_features = out_features

    def forward(self, inputs):
        # Keep only the first out_features probabilities as this layer's output
        return self.linear(inputs)[..., :self.out_features]

class twolayer_model(torch.nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.layer1 = QuantumLayer(in_features, hidden_features, num_qlayers=2)
        self.layer2 = QuantumLayer(hidden_features, out_features, num_qlayers=2)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        return x


current_datetime = datetime.now()
formatted_datetime = current_datetime.strftime("%Y_%m_%d_%H_%M_%S")
wandb.init(
    project="QuantumTrans",
    name=f"ckp_{formatted_datetime}_mnist_amplitude_gradient",
)
# Hyperparameters
batch_size = 4
learning_rate = 0.0001
num_epochs = 10

# Transform to flatten the images and normalize
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.view(-1))  # Flatten the images
])

# Load MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_dataset_small = torch.utils.data.Subset(train_dataset, range(0, 10000))  # Use a smaller subset for quick testing
train_loader = DataLoader(dataset=train_dataset_small, batch_size=batch_size, shuffle=False)


# Initialize model, loss function, and optimizer
model = twolayer_model(28*28, 64, 10)
wandb.watch(model, log="all")
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Training loop
steps = 0 
for epoch in range(num_epochs):
    for images, labels in tqdm(train_loader):
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        wandb.log({"steps": steps, "loss": loss.item()})
        steps += 1
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

print("Training complete!")
wandb.finish()

These are my gradient and weight histogram plots; the histograms change across training steps:

Hi @Trung_Pham ,

Welcome to the Forum!

There’s a tricky nuance in the wording of the documentation. When we say that the features argument of AmplitudeEmbedding is in general not differentiable, we mean that it’s not always differentiable; in some cases it can be.

That being said, I’m not sure I understand the core of your question. I would definitely expect the weights and gradients to change after every iteration; this is what allows you to train. Do you have a smaller example, maybe something closer to the example in the docs for TorchLayer, that shows the confusing/unexpected behaviour?
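For reference, a stripped-down sketch along the lines of the TorchLayer docs example could look like the following (the qubit count, layer shape, and toy batch are placeholders, not taken from your setup):

import torch
import pennylane as qml

n_qubits = 2  # placeholder qubit count for a minimal reproducer
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def qnode(inputs, weights):
    qml.AmplitudeEmbedding(inputs, wires=range(n_qubits), normalize=True, pad_with=0.0)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

qlayer = qml.qnn.TorchLayer(qnode, {"weights": (1, n_qubits, 3)})

x = torch.rand(4, 2**n_qubits)  # toy batch of 4 samples
qlayer(x).sum().backward()      # backward pass through the quantum layer
for name, param in qlayer.named_parameters():
    print(name, param.grad)     # the quantum weights should receive gradients

Something of that size makes it much easier to pin down where any unexpected behaviour comes from.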

Thanks for the reply,

So, in this case the AmplitudeEmbedding layers are differentiable, right? Can you give me more information on when that layer is and isn’t differentiable? As for the confusion in my original question: the AmplitudeEmbedding documentation says it’s non-differentiable, so I expected that using it in training would leave all of the model’s weights and gradients constant, but my experiments showed the opposite.

Hi @Trung_Pham ,

qml.AmplitudeEmbedding uses qml.StatePrep under the hood.

If the StatePrep operation is not supported natively on the target device, PennyLane will attempt to decompose the operation using the method developed by Möttönen et al. (Quantum Info. Comput., 2005). This is done with qml.MottonenStatePreparation.

Due to non-trivial classical processing of the state vector, this template is not always fully differentiable. The method is rather involved, so I don’t know exactly why or when it fails to be differentiable.
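If you’re curious, you can also call the template directly to experiment with the decomposed route; this is only a sketch with arbitrary values on default.qubit, and as noted above, gradients through it aren’t guaranteed for every input state:

import torch
import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch", diff_method="backprop")
def mottonen_circuit(state):
    # Prepare the state via the Mottonen decomposition instead of StatePrep
    qml.MottonenStatePreparation(state, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

state = torch.rand(2**n_qubits, dtype=torch.float64)
state = (state / torch.linalg.norm(state)).requires_grad_(True)  # the template expects a normalized state

mottonen_circuit(state)[0].backward()
print(state.grad)  # may be missing or inaccurate for some states, per the warning above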

Note that this decomposition is used only when StatePrep is not supported on the target device. Since you’re using default.qubit, where StatePrep is supported, you don’t need to worry about MottonenStatePreparation here, and AmplitudeEmbedding should be differentiable.
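As a quick sanity check on your side, you can ask Torch for the gradient with respect to the inputs directly; this is just a sketch with backprop on default.qubit and placeholder shapes, but a populated inputs.grad is the signal that the embedding is being differentiated:

import torch
import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch", diff_method="backprop")
def circuit(inputs, weights):
    qml.AmplitudeEmbedding(inputs, wires=range(n_qubits), normalize=True, pad_with=0.0)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

inputs = torch.rand(2**n_qubits, requires_grad=True)
weights = torch.rand((1, n_qubits, 3), requires_grad=True)

# Differentiate a single output probability with respect to both arguments
circuit(inputs, weights)[0].backward()
print(inputs.grad)   # populated => gradients flow through AmplitudeEmbedding on this device
print(weights.grad)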