Hello. I am using the PennyLane-Torch interface.

I was expecting a linear speed-up with respect to the batch size when batching with a Torch tensor (as long as my device can run all the batches in parallel), but it seems I am only getting a constant speed-up. Is there any way to get around this?

Here is a minimal working example.

```
import time

import pennylane as qml
import torch
from pennylane import numpy as np

dev = qml.device('default.mixed', wires=1)
rho = np.array([[0.5, 0.5], [0.5, 0.5]])

@qml.qnode(dev, interface="torch")
def circ_torch(theta):
    qml.QubitDensityMatrix(rho, wires=0)
    qml.RZ(-1 * theta, wires=0)
    return [qml.expval(qml.Identity(0)), qml.expval(qml.PauliZ(0))]

@qml.qnode(dev)
def circ(theta):
    qml.QubitDensityMatrix(rho, wires=0)
    qml.RZ(-1 * theta, wires=0)
    return [qml.expval(qml.Identity(0)), qml.expval(qml.PauliZ(0))]

thetas = torch.linspace(0, 1, 1000)

# Batched: pass all 1000 angles at once through the Torch interface
time1 = time.time()
circ_torch(thetas)
time2 = time.time()

# Looped: evaluate the plain QNode one angle at a time
for i in range(len(thetas)):
    x = circ(thetas[i])
time3 = time.time()

print(time2 - time1)  # batched
print(time3 - time2)  # looped
```

The output is

```
0.5288441181182861
0.7281041145324707
```

I expected a ~1000× speed-up when batching through the Torch interface, not a roughly constant one.
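For reference, here is what I mean by a linear speed-up from batching, sketched in plain NumPy with no PennyLane involved: the same `RZ(-theta)` conjugation of `rho` and the `⟨Z⟩` expectation value, computed once in a Python loop and once as a single vectorized `einsum` contraction over all 1000 angles. (The `rz` helper and the `einsum` index layout are just my own sketch of the math, not anything from PennyLane internals.)

```python
import time

import numpy as np

# Same state and observable as in the MWE above
rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
thetas = np.linspace(0, 1, 1000)

def rz(theta):
    """RZ(theta) = diag(exp(-i*theta/2), exp(i*theta/2))."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

# Looped: conjugate rho by RZ(-theta) one angle at a time
t0 = time.perf_counter()
looped = np.array(
    [np.trace(rz(-t) @ rho @ rz(-t).conj().T @ Z).real for t in thetas]
)
t1 = time.perf_counter()

# Batched: build all unitaries at once, then one einsum contraction
t2 = time.perf_counter()
U = np.zeros((len(thetas), 2, 2), dtype=complex)
U[:, 0, 0] = np.exp(0.5j * thetas)   # diagonal entries of RZ(-theta)
U[:, 1, 1] = np.exp(-0.5j * thetas)
# rho'_b = U_b @ rho @ U_b^dagger for every batch index b
rho_out = np.einsum('bij,jk,blk->bil', U, rho, U.conj())
# <Z>_b = Tr(rho'_b @ Z)
batched = np.einsum('bij,ji->b', rho_out, Z).real
t3 = time.perf_counter()

print(f"looped:  {t1 - t0:.4f} s")
print(f"batched: {t3 - t2:.4f} s")
```

On my machine the vectorized version is faster by well over an order of magnitude, which is the kind of scaling I was hoping the Torch interface would give me.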