qml.RZ does not work with torch.vmap (but other gates do?)

Hello,

I have a use-case wherein I need to process inputs with large shapes (e.g. [64, 128, 51]) through a variational quantum circuit (VQC). To do this, I’ve been using torch.vmap to speed things up.

I noticed something a little curious about the qml.RZ operation. In the example below, I attempt to process a tensor with shape [2, 2, 2] through a single-gate Z rotation circuit and return expectations on each entry in the tensor.

import pennylane as qml
import torch

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev, interface="torch")
def ansatz(x):
    qml.RZ(x, wires=0)
    return qml.expval(qml.PauliZ(0))

x = torch.tensor([[[0.1, 0.2], [0.3, 0.4]],[[0.1, 0.2], [0.3, 0.4]]])
res = torch.vmap(
    lambda x_i: torch.vmap(
        lambda x_j: ansatz(x_j), in_dims=0)(x_i),
        in_dims=0
    )(x)

print(res)

I get a huge error message, which culminates in “RuntimeError: Cannot access data pointer of Tensor that doesn’t have storage”, as follows:

Traceback (most recent call last):
  File "c:\Users\ouyzh\Documents\Quantum Activations\repo\quantum-activations\test.py", line 13, in <module>
    res = torch.vmap(
          ^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\_functorch\apis.py", line 202, in wrapped
    return vmap_impl(
           ^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\_functorch\vmap.py", line 334, in vmap_impl
    return _flat_vmap(
           ^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\_functorch\vmap.py", line 484, in _flat_vmap
    batched_outputs = func(*batched_inputs, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ouyzh\Documents\Quantum Activations\repo\quantum-activations\test.py", line 14, in <lambda>
    lambda x_i: torch.vmap(
                ^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\_functorch\apis.py", line 202, in wrapped
    return vmap_impl(
           ^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\_functorch\vmap.py", line 334, in vmap_impl
    return _flat_vmap(
           ^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\_functorch\vmap.py", line 484, in _flat_vmap
    batched_outputs = func(*batched_inputs, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ouyzh\Documents\Quantum Activations\repo\quantum-activations\test.py", line 15, in <lambda>
    lambda x_j: ansatz(x_j), in_dims=0)(x_i),
                ^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\workflow\qnode.py", line 895, in __call__
    return self._impl_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\workflow\qnode.py", line 868, in _impl_call
    res = execute(
          ^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\workflow\execution.py", line 238, in execute
    results = run(tapes, device, config, inner_transform)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\workflow\run.py", line 298, in run
    results = inner_execute(tapes)
              ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\workflow\run.py", line 263, in inner_execute
    results = device.execute(transformed_tapes, execution_config=execution_config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\modifiers\simulator_tracking.py", line 28, in execute
    results = untracked_execute(self, circuits, execution_config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\modifiers\single_tape_support.py", line 30, in execute
    results = batch_execute(self, circuits, execution_config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\logging\decorators.py", line 61, in wrapper_entry
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\default_qubit.py", line 823, in execute
    return tuple(
           ^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\default_qubit.py", line 824, in <genexpr>
    _simulate_wrapper(
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\default_qubit.py", line 1189, in _simulate_wrapper
    return simulate(circuit, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\logging\decorators.py", line 61, in wrapper_entry
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\qubit\simulate.py", line 370, in simulate
    state, is_state_batched = get_final_state(
                              ^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\logging\decorators.py", line 61, in wrapper_entry
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\qubit\simulate.py", line 201, in get_final_state
    state = apply_operation(
            ^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\qubit\apply_operation.py", line 237, in apply_operation
    return _apply_operation_default(op, state, is_state_batched, debugger)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\qubit\apply_operation.py", line 263, in _apply_operation_default
    return apply_operation_einsum(op, state, is_state_batched=is_state_batched)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\devices\qubit\apply_operation.py", line 84, in apply_operation_einsum
    mat = op.matrix() + 0j
          ^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\operation.py", line 831, in matrix
    canonical_matrix = self.compute_matrix(*self.parameters, **self.hyperparameters)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\ops\qubit\parametric_ops_single_qubit.py", line 465, in compute_matrix
    return diags[:, :, np.newaxis] * qml.math.cast_like(qml.math.eye(2, like=diags), diags)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\math\utils.py", line 291, in cast_like
    dtype = ar.to_numpy(tensor2).dtype.type
            ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\autoray\autoray.py", line 1132, in to_numpy
    return do("to_numpy", x)
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\autoray\autoray.py", line 81, in do
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages\pennylane\math\single_dispatch.py", line 641, in _to_numpy_torch
    return x.detach().cpu().numpy()
           ^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Cannot access data pointer of Tensor that doesn't have storage

However, these errors completely disappear if I simply change the gate to an X rotation:

qml.RX(x, wires=0)

I successfully get the expectation values on each entry in the tensor:

tensor([[[0.9950, 0.9801],
         [0.9553, 0.9211]],

        [[0.9950, 0.9801],
         [0.9553, 0.9211]]], dtype=torch.float64)

This is also not an issue for qml.RY.

I’m wondering whether this is expected behaviour, and if so, why that might be the case. Here’s the output of qml.about():

Name: pennylane
Version: 0.43.1
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: 
Author: 
Author-email: 
License-Expression: Apache-2.0
Location: C:\Users\ouyzh\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: appdirs, autograd, autoray, cachetools, diastatic-malt, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, tomlkit, typing_extensions
Required-by: pennylane_lightning

Platform info:           Windows-10-10.0.26200-SP0
Python version:          3.11.9
Numpy version:           2.3.4
Scipy version:           1.16.2
JAX version:             0.8.1
Installed devices:
- default.clifford (pennylane-0.43.1)
- default.gaussian (pennylane-0.43.1)
- default.mixed (pennylane-0.43.1)
- default.qubit (pennylane-0.43.1)
- default.qutrit (pennylane-0.43.1)
- default.qutrit.mixed (pennylane-0.43.1)
- default.tensor (pennylane-0.43.1)
- null.qubit (pennylane-0.43.1)
- reference.qubit (pennylane-0.43.1)
- lightning.qubit (pennylane_lightning-0.43.0)

Thanks so much for your help!

Hi @z.ouyang , thank you for reporting this. It does seem strange. Can you please share the version of Torch you’re using?

Thanks for your response. I’m using version 2.8.0+cu129 on Windows 11 Home.

Hi @z.ouyang ,

I think this might be related to this PyTorch issue.

Based on this post, it looks like you may be able to use a circuit identity to circumvent the problem. Instead of using RZ you could write the equivalent circuit in terms of RX and RY.

I know this isn’t the ideal solution, but it’s a workaround that may help in the meantime.

import pennylane as qml
import torch

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def ansatz(x):
    # Equivalent to qml.RZ(x, wires=0), with gates applied in circuit order
    qml.RX(-torch.pi/2, wires=0)
    qml.RY(x, wires=0)
    qml.RX(torch.pi/2, wires=0)
    return qml.expval(qml.PauliZ(0))

x = torch.tensor([[[0.1, 0.2], [0.3, 0.4]],[[0.1, 0.2], [0.3, 0.4]]])
res = torch.vmap(
    lambda x_i: torch.vmap(
        lambda x_j: ansatz(x_j), in_dims=0)(x_i),
        in_dims=0
    )(x)

print(res)

Note that I haven’t verified the expression, so you may want to do the math to validate it just in case.
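For anyone who wants to check the identity without setting up a device: applying RX(−π/2), then RY(θ), then RX(π/2) in circuit order reproduces RZ(θ) exactly. Here's a quick sanity check written with plain NumPy and hand-written single-qubit rotation matrices:

```python
import numpy as np

def rx(phi):
    # Standard single-qubit X rotation: exp(-i*phi*X/2)
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(phi):
    # Standard single-qubit Y rotation: exp(-i*phi*Y/2)
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -s], [s, c]])

def rz(phi):
    # Standard single-qubit Z rotation: exp(-i*phi*Z/2)
    return np.diag([np.exp(-1j * phi / 2), np.exp(1j * phi / 2)])

theta = 0.73
# Circuit order RX(-pi/2), RY(theta), RX(pi/2) means the matrix product is reversed
composite = rx(np.pi / 2) @ ry(theta) @ rx(-np.pi / 2)
assert np.allclose(composite, rz(theta))  # exact, no global phase needed
```

The identity holds exactly (not just up to a global phase), since it is a conjugation of the RY generator by RX.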

I hope this helps!

Hi @z.ouyang , it looks like the problem might be in a cast somewhere.
We’ve opened an issue on the PennyLane repo.
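To illustrate with a torch-only sketch (this mimics, but is not, PennyLane's actual code path): inside torch.vmap the parameters are functorch BatchedTensors, which have no real storage, so the NumPy conversion at the bottom of your traceback (x.detach().cpu().numpy()) cannot succeed:

```python
import torch

x = torch.tensor([0.1, 0.2, 0.3])
errors = []

def f(x_i):
    # Inside vmap, x_i is a functorch BatchedTensor without storage.
    # This mimics the x.detach().cpu().numpy() call in the traceback.
    try:
        x_i.detach().cpu().numpy()
    except RuntimeError as e:
        errors.append(str(e))
    return x_i

torch.vmap(f, in_dims=0)(x)
print(errors[0])  # e.g. "Cannot access data pointer of Tensor that doesn't have storage"
```

Gates like RX and RY don't hit this path because their matrix construction doesn't require converting the parameter to NumPy, which is why they work under vmap while RZ fails.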

Thanks for bringing this to our attention!

Hi @z.ouyang ,

Here’s the PR with the fix!

One of our developers noticed that you are using both torch.vmap and native parameter broadcasting, which seems potentially unintentional.

Once the fix is merged (probably by mid-January), you should be able to use the code example in the PR with no problems.

import pennylane as qml
import torch

x = torch.tensor([[0.1, 0.2, 0.3]])
res = torch.vmap(qml.RZ.compute_matrix, in_dims=0)(x)

print(res)

Let me know if you have any questions about this!

Hi @CatalinaAlbornoz,

Thanks so much for your responses, and for the PennyLane team’s prompt addressing of this issue! I can give some more context as to why I chose to use vmap alongside PennyLane’s native parameter broadcasting.

In short, I want to see if I can use PennyLane VQCs as activation functions within a traditional neural network structure. I’m at a point where I’m able to do this, but I want to optimize the circuit runs as much as possible. Currently, the inputs to the neurons all have shape [A, B, 51].

Ideally, I would be able to use the native broadcasting on the entire shape, but I haven’t found a way to do this, since I believe the native broadcasting implementation infers the batch dimension. Therefore, I use torch.vmap over the first two dimensions and broadcast over the final [51,].

I should let you know that I’m not that experienced with PyTorch or PennyLane! If there is a more idiomatic way to do this, please let me know.

Thanks again!

Hi @z.ouyang ,

Thanks for sharing this context.

I guess the most idiomatic way would be to flatten the inputs and only use vmap. I don’t know if this would work for your specific application but you could give it a try! I’ve added some example code below.

I hope this helps!

import pennylane as qml
import torch

# --- Device ---
dev = qml.device("default.qubit", wires=1)

# --- qnode ---
@qml.qnode(dev)
def ansatz(x):
    # RX/RY equivalent of qml.RZ(x, wires=0), gates applied in circuit order
    qml.RX(-torch.pi/2, wires=0)
    qml.RY(x, wires=0)
    qml.RX(torch.pi/2, wires=0)
    return qml.expval(qml.PauliZ(0))

# --- Parameters ---
A = 2 # First batch size
B = 3 # Second batch size
Features = 3 # Feature size (you use 51)

# Create the structured input (Original Shape: [A, B, Features])
input_data = torch.randn(A, B, Features, requires_grad=True)
print(f"\nOriginal Input Shape: {input_data.shape}")

# Flatten the batch dimensions (New Shape: [A*B, Features])
# The '-1' tells PyTorch to calculate the total size of the first two dimensions
batched_input = input_data.reshape(-1, Features)
print(f"Vmap Input Shape:     {batched_input.shape}")

# Apply torch.vmap
# in_dims=0 tells vmap to iterate over the first dimension (the batch dimension)
vmap_qnode = torch.vmap(ansatz, in_dims=0)

# Run the batched operation
batched_results_flat = vmap_qnode(batched_input)
print(f"Vmap Output Shape (Flat): {batched_results_flat.shape}")

# Reshape the output back (Final Shape: [A, B, Features])
# We reshape to [A, B, -1] to tell PyTorch to calculate the last dimension
final_results = batched_results_flat.reshape(A, B, -1)
print(f"Final Output Shape:     {final_results.shape}")

Hi @z.ouyang,

I wanted to confirm that the issue with vmap + broadcasting + RZ has now been fixed!

You can now use RZ normally with the newest stable version of PennyLane (v0.44 at the moment).

Thank you for reporting this and helping us improve PennyLane :heart:
