Title: RuntimeError: shape '[32, 256]' is invalid for input of size 2048 from PennyLane TorchLayer despite correct input shape
Body:
Hello,
I am building a hybrid quantum-classical model for time-series classification using PennyLane and PyTorch. The pipeline involves using a classical CNN to extract features, which are then fed into a Variational Quantum Classifier (VQC).
I am stuck on a persistent RuntimeError during the VQC training. A diagnostic print statement confirms that the data entering the training function has the correct shape, but the error points to an internal shape mismatch inside the TorchLayer forward pass. I have been going around in a loop of errors for a while and would be very grateful for some help.
The Main Problem
The VQC training fails with the following error, even though my diagnostic print shows the input data (Xtr_q) has the correct shape of (1600, 8):
[VQC] Training failed on fold; falling back to classical for this fold.
Error: shape '[32, 256]' is invalid for input of size 2048
This error implies that a tensor with 2048 elements (i.e., a shape of (32, 64) for a batch of 32) is being created and incorrectly reshaped, but my circuit is defined to output a vector of 256 probabilities (2^8).
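For context, the 256 comes from the fact that qml.probs over 8 wires returns a length-256 vector for a single (unbatched) sample. Here is a minimal, self-contained sketch of that single-sample check, using the same ansatz structure as the full script below:
Python
import torch
import pennylane as qml

n_qubits, depth = 8, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def single_sample_circuit(inputs, weights):
    # Same structure as the ansatz in the full script below
    for i in range(n_qubits):
        qml.RY(inputs[i], wires=i)
    for d in range(depth):
        for i in range(n_qubits):
            qml.Rot(*weights[d, i], wires=i)
        for i in range(n_qubits - 1):
            qml.CNOT(wires=[i, i + 1])
    return qml.probs(wires=range(n_qubits))

out = single_sample_circuit(torch.rand(n_qubits), torch.rand(depth, n_qubits, 3))
print(out.shape)  # I expect torch.Size([256]) for one sample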
Minimal Reproducible Code
Here is a self-contained script that reproduces the error. It uses dummy data with the same shapes as my actual pipeline.
Python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import pennylane as qml
from pennylane.qnn import TorchLayer
from sklearn.model_selection import StratifiedKFold

# ---------- 1. CONFIGURATION AND DUMMY DATA ----------
CONFIG = {
    "seed": 42,
    "vqc": {
        "n_qubits": 8,
        "ansatz_depth": 2,
        "shots": 512,
        "backend": "default.qubit",
        "opt_steps": 4,  # Reduced for quick testing
        "learning_rate": 0.05,
    },
    "cv_folds": 5,
    "device": "cpu",
}

# Generate dummy data with the same shapes as the real pipeline
np.random.seed(CONFIG["seed"])
X_q_ready = np.random.rand(2000, CONFIG["vqc"]["n_qubits"]) * np.pi
X_classical = np.random.rand(2000, 16)
y = np.random.randint(0, 2, 2000)
cls_probs_cv = np.random.rand(2000)

# ---------- 2. VQC DEFINITIONS (The part that is failing) ----------
def get_circuit(n_qubits, depth):
    def circuit(inputs, weights):
        for i in range(n_qubits):
            qml.RY(inputs[i], wires=i)
        for d in range(depth):
            for i in range(n_qubits):
                qml.Rot(*weights[d, i], wires=i)
            for i in range(n_qubits - 1):
                qml.CNOT(wires=[i, i + 1])
        # This should return a vector of 2^8 = 256 probabilities
        return qml.probs(wires=range(n_qubits))

    weight_shapes = {"weights": (depth, n_qubits, 3)}
    return circuit, weight_shapes


class VQCModule(nn.Module):
    def __init__(self, qlayer, n_qubits):
        super().__init__()
        self.qlayer = qlayer
        self.fc = nn.Linear(2**n_qubits, 2)

    def forward(self, x):
        q_out = self.qlayer(x)
        logits = self.fc(q_out)
        return F.log_softmax(logits, dim=1)


def train_vqc(model, X_train, y_train, cfg):
    opt = torch.optim.Adam(model.parameters(), lr=cfg["vqc"]["learning_rate"])
    loss_fn = nn.NLLLoss()
    ds = TensorDataset(torch.tensor(X_train, dtype=torch.float32), torch.tensor(y_train, dtype=torch.long))
    dl = DataLoader(ds, batch_size=32, shuffle=True)
    model.train()
    print(" [VQC Training] Starting...")
    for step in range(cfg["vqc"]["opt_steps"]):
        for X_batch, y_batch in dl:
            X_batch, y_batch = X_batch.to(cfg["device"]), y_batch.to(cfg["device"])
            opt.zero_grad()
            loss = loss_fn(model(X_batch), y_batch)
            loss.backward()
            opt.step()
    return model


# ---------- 3. EXECUTION RUNNER ----------
def run_vqc_cv(X_q, y, cls_probs, cfg):
    skf = StratifiedKFold(n_splits=cfg["cv_folds"], shuffle=True, random_state=cfg["seed"])
    for fold, (train_idx, test_idx) in enumerate(skf.split(X_q, y), 1):
        print(f"\n--- Fold {fold}/{cfg['cv_folds']} ---")
        Xtr_q, ytr = X_q[train_idx], y[train_idx]
        # Diagnostic print to confirm input shape
        print(f"[DIAGNOSTIC] VQC training initiated with X_train shape: {Xtr_q.shape}")
        try:
            vqc_cfg = cfg["vqc"]
            circuit, w_shapes = get_circuit(vqc_cfg["n_qubits"], vqc_cfg["ansatz_depth"])
            dev_train = qml.device(vqc_cfg["backend"], wires=vqc_cfg["n_qubits"])
            qnode_train = qml.QNode(circuit, dev_train, interface="torch", diff_method="parameter-shift")
            qlayer_train = TorchLayer(qnode_train, w_shapes)
            model_train = VQCModule(qlayer_train, n_qubits=vqc_cfg["n_qubits"]).to(cfg["device"])
            trained_model = train_vqc(model_train, Xtr_q, ytr, cfg)
            print(f"[VQC] Fold {fold} completed successfully.")
        except Exception as e:
            print(f"[VQC] Training failed on fold. Error: {e}")
            # In my full code, this falls back to a classical model
            continue


# ---------- 4. RUN THE CODE ----------
run_vqc_cv(X_q_ready, y, cls_probs_cv, CONFIG)
My Debugging Journey & The Error Loop
I have been trying to solve this for a long time and have encountered a series of errors that seem to lead in a circle.
Initial Error: probabilities do not sum to 1.
Fix: Added a Linear → Softmax head to the VQCModule.
Shape Mismatch Errors: shape '[16, 2]' is invalid for input of size 16.
Fix: Tried various ways to match the nn.Linear layer’s input dimension to the TorchLayer output.
PennyLane API Errors: … no attribute 'num_wires' and 'TorchLayer' object has no attribute 'out_features'.
Fix: Updated the code to avoid these deprecated attributes and pass n_qubits manually.
The Error Loop: This led me to two conflicting approaches for the quantum circuit’s return statement:
If I use return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)], I get the error:
RuntimeError: shape '[32, -1]' is invalid for input of size 8
(This implies TorchLayer is incorrectly outputting a single sample’s result instead of a batch).
If I use return qml.math.stack(…) to force a correctly shaped output, I get the error:
RuntimeError: Can't call numpy() on Tensor that requires grad.
(This implies a conflict with PyTorch’s gradient calculation).
This loop led me to the current qml.probs() approach, which should theoretically avoid both issues, but it results in the shape '[32, 256]' is invalid for input of size 2048 error shown above.
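If it helps with reproduction, the quantum layer can also be exercised in isolation with a dummy batch (reusing get_circuit, TorchLayer and CONFIG from the script above). This is the check whose output shape I would expect to be torch.Size([32, 256]):
Python
# Build the layer exactly as in run_vqc_cv, then call it on a dummy batch
circuit, w_shapes = get_circuit(CONFIG["vqc"]["n_qubits"], CONFIG["vqc"]["ansatz_depth"])
dev = qml.device(CONFIG["vqc"]["backend"], wires=CONFIG["vqc"]["n_qubits"])
qnode = qml.QNode(circuit, dev, interface="torch", diff_method="parameter-shift")
qlayer = TorchLayer(qnode, w_shapes)

dummy_batch = torch.rand(32, CONFIG["vqc"]["n_qubits"])  # one batch of 32 samples, 8 features each
print(qlayer(dummy_batch).shape)  # I would expect torch.Size([32, 256]) from qml.probs on 8 wires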
My Question:
Given that the input data to the training function verifiably has shape (1600, 8), why would the TorchLayer using qml.probs produce a tensor that appears to have 64 features (2048 / 32 = 64) instead of the expected 256 features (2^8)? Is this a known issue with this PennyLane/PyTorch version, or is there a subtle error I am still missing in my VQC implementation?
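To spell out the arithmetic behind that question:
Python
batch_size = 32
expected_features = 2 ** 8              # 256 probabilities from qml.probs over 8 wires
elements_in_error = 2048                # total size reported by the RuntimeError
print(elements_in_error // batch_size)  # 64, i.e. 2**6 features per sample instead of 2**8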
My Environment:
pennylane: 0.42.3
torch: 2.8.0+cu126
