Backward function takes long and batches

Thanks for the reply @Tom_Bromley

I am using pennylane-0.28.0, and for a circuit with batched inputs on the qiskit.ibmq device I still get dev.num_executions == 10 when the batch size is 10. Your example gives dev.num_executions == 1 with the default.qubit device, and I can reproduce that.

The transpose in the return statement of the forward function rearranges the output, changing the first dimension from the batch size to the number of wires. Removing the .T seems correct to me, but I may be wrong.

I see that the TorchLayer forward pass is unstacking the first dimension:

        if len(inputs.shape) > 1:
            # If the input size is not 1-dimensional, unstack the input along its first dimension,
            # recursively call the forward pass on each of the yielded tensors, and then stack the
            # outputs back into the correct shape
            reconstructor = [self.forward(x) for x in torch.unbind(inputs)]
            return torch.stack(reconstructor)

while in your example you call qlayer.qnode on the entire batch.
That does what I expect: a single job is submitted to the IBMQ backend, containing a number of circuits equal to the batch size.
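
To make the difference in execution counts concrete, here is a minimal plain-Python sketch of the two call patterns. It does not use PennyLane or torch: CountingDevice, forward_unstacked and forward_batched are hypothetical stand-ins I made up for illustration, and the "circuit" is just a sum. It only mimics how a per-sample loop triggers one execution per batch element, while a whole-batch call triggers one.

```python
class CountingDevice:
    """Hypothetical stand-in for a device (not PennyLane API).

    It counts how often execute() is called, loosely mimicking
    what dev.num_executions reports.
    """

    def __init__(self):
        self.num_executions = 0

    def execute(self, batch):
        # One call counts as one execution, however many inputs it holds.
        self.num_executions += 1
        # Placeholder "circuit": sum the features of each input.
        return [sum(x) for x in batch]


def forward_unstacked(dev, inputs):
    # Same pattern as the unbind/stack loop quoted above:
    # split along the first dimension and call once per element.
    return [dev.execute([x])[0] for x in inputs]


def forward_batched(dev, inputs):
    # Pass the entire batch in a single call.
    return dev.execute(inputs)


# Batch of 10 inputs with 2 features each.
inputs = [[0.1 * i, 0.2 * i] for i in range(10)]

dev_a = CountingDevice()
out_a = forward_unstacked(dev_a, inputs)

dev_b = CountingDevice()
out_b = forward_batched(dev_b, inputs)

print(dev_a.num_executions, dev_b.num_executions)  # prints: 10 1
```

Both paths return the same values in this toy version; only the number of device calls differs, which is what I see with the real devices.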

Can you explain (some of) the reason(s) why this is not the default behavior?

Thank you again for the quick reply!