I have a question about using PennyLane in a quantum reinforcement learning (QRL) framework.
Specifically, I am testing the Variational Soft Actor-Critic method implemented by Lan et al. (2021). The tests run in the Pendulum-v0 environment. The quantum part is a 3-qubit, 41-parameter, hardware-efficient ansatz PQC, followed by a linear classical layer that outputs the mean and standard deviation of a Gaussian policy.
What I find is that the tests take a very long time. For example, after 5 epochs of the exploration phase (each taking about 3 seconds of wall-clock time), the 6th epoch takes around 30 minutes. That is a huge jump, and not something I would expect for a model with only 41 parameters.
The differentiation method is ‘best’, and I have tried both the built-in default.qubit simulator and Qulacs; the results are the same.
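One way to reason about the jump is to count circuit executions. According to the PennyLane documentation, ‘best’ uses backpropagation or a device-provided gradient when available and otherwise falls back to the parameter-shift rule, where each trainable gate parameter costs two extra circuit runs per sample. The sketch below is a back-of-the-envelope count under several assumptions, none of which come from the setup above: the exploration epochs perform no gradient updates, all 41 parameters sit in shifted gates (an upper bound, since linear-layer weights are classical), and the batch size (64) and updates per epoch (1000) are hypothetical.

```python
def executions_per_update(n_shifted_params: int, batch_size: int) -> int:
    """Circuit runs for one parameter-shift gradient step over a minibatch.

    Assumes each trainable gate uses the two-term shift rule (two extra runs
    per parameter) and that the circuit is run once per sample, without
    parameter broadcasting.
    """
    forward = batch_size                          # one unshifted run per sample
    shifted = 2 * n_shifted_params * batch_size   # a +s and a -s run per parameter per sample
    return forward + shifted


# Hypothetical training settings, not taken from the post.
N_PARAMS = 41
BATCH_SIZE = 64
UPDATES_PER_EPOCH = 1000

per_update = executions_per_update(N_PARAMS, BATCH_SIZE)
per_epoch = per_update * UPDATES_PER_EPOCH
print(per_update, per_epoch)  # prints 5312 5312000
```

If these assumptions hold, explicitly setting diff_method="backprop" on default.qubit (where supported) is one way to check whether the gradient method is responsible, since backpropagation computes the full gradient at a cost comparable to a small multiple of one forward pass.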
So my question is: does anybody know why this is happening, and whether it should intuitively be expected?