Although TorchLayer accepts batched inputs, no batch-level optimization is going on under the hood; each sample in the batch is evaluated by the circuit. You can check out how things work in the forward method of TorchLayer.
There might be a couple of reasons why the hybrid model you are using is taking longer to train than a simple fully connected classical layer. From a fundamental perspective, we do expect training times on a simulator to increase exponentially as we scale the number of qubits. This is part of what motivates building quantum hardware in the first place.
On the other hand, for a small number of qubits we can still try a couple of things to extract more performance. One approach is to optimize the way we differentiate the circuit. In older versions of PennyLane, the diff_method="parameter-shift" method was used for the Torch interface; you can check out more details here. Luckily, in the new version of PennyLane released a few days ago, we added support for backpropagation in the Torch interface. This simulator-only approach can provide a big speedup! In fact, I just tried running this tutorial and it took 8 seconds to train with the latest version of PennyLane, versus 44 seconds with an older version.
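As a rough sketch of the difference: with parameter-shift, each parameter costs extra circuit evaluations per gradient, while backprop differentiates through the simulation itself. The circuit below is a toy example (not the tutorial's model) showing how to request backprop explicitly on a QNode:

```python
import torch
import pennylane as qml

dev = qml.device("default.qubit", wires=2)

# diff_method="backprop" works only on simulators, but avoids the
# two-evaluations-per-parameter cost of the parameter-shift rule.
@qml.qnode(dev, interface="torch", diff_method="backprop")
def circuit(weights):
    qml.RX(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

weights = torch.tensor([0.1, 0.2], requires_grad=True)
loss = circuit(weights)
loss.backward()  # gradients flow through the simulator
print(weights.grad)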
So in summary, although there are some fundamental reasons why we might expect training to be tough on quantum simulators, you could try upgrading your PennyLane version and you might get a speedup without having to change any code!