Torch Tensor Batching Speed-up

Hello. I am using the PennyLane-Torch interface.
I was expecting a linear speed-up with respect to the number of batch elements when batching with a Torch tensor (as long as my device can run all the batches in parallel), but it seems like I am only getting a constant speed-up. Is there any way I can get around this?

Here is a minimal working example.

import pennylane as qml
from pennylane import numpy as np
import torch
import time

dev = qml.device('default.mixed', wires=1)

# rho = |+><+|, a pure state written as a density matrix
rho = np.array([[0.5, 0.5], [0.5, 0.5]])

@qml.qnode(dev, interface="torch")
def circ_torch(theta):
    qml.QubitDensityMatrix(rho, wires=0)
    qml.RZ(-1 * theta, wires=0)
    return [qml.expval(qml.Identity(0)), qml.expval(qml.PauliZ(0))]


@qml.qnode(dev)
def circ(theta):
    qml.QubitDensityMatrix(rho, wires=0)
    qml.RZ(-1 * theta, wires=0)
    return [qml.expval(qml.Identity(0)), qml.expval(qml.PauliZ(0))]

thetas = torch.linspace(0, 1, 1000)

# One batched call with all 1000 angles at once
time1 = time.time()
circ_torch(thetas)
time2 = time.time()

# 1000 individual calls in a Python loop
for i in range(len(thetas)):
    x = circ(thetas[i])
time3 = time.time()

print(time2 - time1)
print(time3 - time2)

Output is

0.5288441181182861
0.7281041145324707

I expected a ~1000x speed-up from the single batched Torch call.

Hey @takh2324, welcome back!

default.mixed actually doesn't support parameter broadcasting natively; you can pass an operator an argument with a leading batch dimension, but what happens under the hood is a glorified for loop.

You can see this with the following example:

import pennylane as qml
import torch

# Swap in default.qubit here to compare the tracker output below
# dev = qml.device('default.qubit', wires=1)
dev = qml.device('default.mixed', wires=1)

@qml.qnode(dev)
def circ_torch(theta):
    qml.RZ(theta, wires=0)
    return qml.expval(qml.Z(0))

# The tracker records what the device actually executes
with qml.Tracker(dev) as tracker:
    circ_torch(torch.tensor([0.1, 0.2, 0.3]))

print(tracker.totals)

With default.qubit, the number of times the device executes something can be determined with the simulations key:

{'batches': 1, 'simulations': 1, 'executions': 3}

With default.mixed, the number of times the device executes something can be determined with the executions key:

{'executions': 3, 'batches': 1, 'batch_len': 3}

As you can see, the device executes something three times, whereas default.qubit only needs a single broadcast simulation. At the moment, default.mixed is still using our old device API. It will be updated soon! At that point, broadcasting should be natively supported.
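
In the meantime, since the rho in your example is pure (it's |+><+|), one workaround is to prepare the corresponding state vector on default.qubit, which does broadcast natively. Here is a minimal sketch under that assumption; circ_sv and dev_sv are just illustrative names, and StatePrep stands in for your QubitDensityMatrix call:

import pennylane as qml
import numpy as np
import torch

# default.qubit supports native parameter broadcasting
dev_sv = qml.device('default.qubit', wires=1)

@qml.qnode(dev_sv, interface="torch")
def circ_sv(theta):
    # |+> is the pure state whose density matrix is the rho above
    qml.StatePrep(np.array([1.0, 1.0]) / np.sqrt(2), wires=0)
    qml.RZ(-1 * theta, wires=0)
    return qml.expval(qml.Identity(0)), qml.expval(qml.PauliZ(0))

# All 1000 angles run as one broadcast simulation
circ_sv(torch.linspace(0, 1, 1000))

This only works when your density matrix is rank one, of course; for genuinely mixed states you'd have to wait for the default.mixed update.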

Now, as for why you're seeing a speed-up at all, I'm not 100% sure :thinking:. It's probably better to use time.process_time() instead of time.time(). The former is CPU time, so it measures only the time your process spends computing. The latter is wall-clock time, so it also depends on whatever else is running on your laptop :sweat_smile:.
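
For example, here is a minimal sketch of the same comparison using time.process_time(), reusing circ_torch, circ, and thetas from your example:

import time

# CPU time counts only what this process spends computing,
# so other programs running on the machine won't skew the numbers
start = time.process_time()
circ_torch(thetas)  # single batched call
print("batched:", time.process_time() - start)

start = time.process_time()
for i in range(len(thetas)):  # one device execution per angle
    x = circ(thetas[i])
print("looped:", time.process_time() - start)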

Let me know if this helps!