So part of the problem comes from the number of gates: although there are only O(10) parameters, O(100) gates use them.
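To make the gate-count point concrete, here is a minimal NumPy sketch of a generic state-vector simulation. It is not pyQuil or PyQVM code, and I am not claiming this is how the PyQVM works internally; the names `apply_single_qubit_gate`, `rx`, and `run_circuit` are made up for illustration. It only shows that the work per circuit evaluation follows the number of gates applied, not the number of distinct parameters, and that each gate application touches all 2^n amplitudes.

```python
import numpy as np


def apply_single_qubit_gate(state, gate, target, n_qubits):
    """Apply a 2x2 gate to qubit `target` of an n-qubit state vector."""
    # View the 2**n amplitudes as a tensor with one axis of size 2 per qubit.
    psi = state.reshape([2] * n_qubits)
    # Contract the gate's input index with the target axis; the gate's
    # output index ends up as axis 0, so move it back into place.
    psi = np.tensordot(gate, psi, axes=([1], [target]))
    psi = np.moveaxis(psi, 0, target)
    return psi.reshape(-1)


def rx(theta):
    """Matrix of a single-qubit rotation about the X axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def run_circuit(thetas, n_qubits, n_layers):
    """Hypothetical layered circuit: a few parameters reused by many gates."""
    state = np.zeros(2 ** n_qubits, dtype=complex)
    state[0] = 1.0  # start in |00...0>
    gate_count = 0
    for _ in range(n_layers):
        for q in range(n_qubits):
            theta = thetas[q % len(thetas)]  # parameters are shared across gates
            state = apply_single_qubit_gate(state, rx(theta), q, n_qubits)
            gate_count += 1
    return state, gate_count


# 3 parameters, but n_qubits * n_layers gate applications,
# each acting on a vector of 2**n_qubits amplitudes.
state, gate_count = run_circuit([0.1, 0.2, 0.3], n_qubits=4, n_layers=10)
print(len(state), gate_count)
```

In this toy setup the three parameters are reused by 40 gates, so the cost per evaluation is set by the 40 gate applications (times the 16 amplitudes each one updates), not by the 3 parameters.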
The exponential slowdown with the PyQVM is less obvious to me. Maybe it comes from computing U†? I appreciate the description above, but I can’t immediately see why it would slow down the computation so much.