Gradient Question for RL Demo

Hello!

I was looking at the Gate calibration with reinforcement learning demo and was wondering whether the functions reinforce_gradient_with_baseline and compute_baseline are specific to logic built on qml.state(), or whether it’d be possible to use expectation values (qml.expval) instead.

I was also curious how the gradients would need to be modified to select multiple actions at the same time, for example when driving multiple qubits. In particular, how could score_funcs be updated so that the agent can learn from several actions selected at once?

Thanks ahead for the assistance!

Hi @apple_jay ,

You’re asking some good questions! I don’t know the answer but I’ll try to get some answers for you.

Thanks for looking into it and also having awesome demos to learn off of!

Hi @apple_jay , I got some answers from my colleague Korbinian!

The Gate calibration with reinforcement learning demo roughly reproduces the paper in its first reference, which uses the full state because the Hilbert space is very small, so state tomography is feasible.

That being said, you should in principle be able to do the same with just expectation values, which is what “model free reinforcement learning / gate calibration” is all about. Note however that the authors in this other paper seem to argue that this has practical problems, though I haven’t read the paper so I’m not sure.
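
To make the expectation-value route concrete, here is a minimal sketch (not the demo’s code) for the simplest case: a single qubit and a pure target state. It assumes the reward is the state fidelity, which for one qubit can be written using only Pauli expectation values as F = (1 + r·t)/2, where r and t are the measured and target Bloch vectors. The helper names bloch_vector and fidelity_from_expvals are hypothetical, and the expectation values are simulated with NumPy; in PennyLane they would come from a QNode returning qml.expval of PauliX, PauliY and PauliZ.

```python
import numpy as np

# Pauli matrices, used here only to simulate the "measured" expectation values.
PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # X
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # Y
    np.array([[1, 0], [0, -1]], dtype=complex),    # Z
]


def bloch_vector(state):
    """Return (<X>, <Y>, <Z>) for a normalized single-qubit state vector."""
    return np.array([np.real(np.vdot(state, p @ state)) for p in PAULIS])


def fidelity_from_expvals(measured, target):
    """Fidelity with a pure target state from Bloch vectors: (1 + r.t) / 2.

    Hypothetical helper: only valid for one qubit and a pure target.
    """
    return 0.5 * (1.0 + float(np.dot(measured, target)))


plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # |+>
zero = np.array([1, 0], dtype=complex)               # |0>

# Reward when the prepared state matches the target, and when it does not.
reward_same = fidelity_from_expvals(bloch_vector(plus), bloch_vector(plus))
reward_diff = fidelity_from_expvals(bloch_vector(zero), bloch_vector(plus))
```

Because the reward is just a number, the REINFORCE update itself does not need to change; only the way the reward is computed does. With finite shots the expectation values become noisy estimates, which may be related to the practical problems mentioned above.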

Before going into the question of modifying the gradients, I would first test the demo by modifying it to use expectation values. If it works, please let us know here and we can then look into any other questions you may have!

I hope this helps!

Hi Catalina,

Thank you for the resources and paper, they were very insightful! I was able to update the work to use expectation values, so thank you for letting me know that it’s possible.

I’m now working on tying in multiple action choices and wanted to get an opinion on how to set up a policy that can drive the different actions for the qubits. Thank you in advance for the help and research!

Hi @apple_jay ,

I’m glad these resources helped! Unfortunately, setting up a policy for different actions is beyond my expertise. We may have something on the Quantum Compilation hub, but I’m not sure what to look for there. If you find any good resources on this, please share them here if you can! I’m curious to see what you find.
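
One common starting point for multiple simultaneous actions, offered as a sketch rather than as the demo’s method, is a factorized policy: each qubit gets its own softmax over actions, the joint probability is the product of the per-qubit probabilities, and so the joint score function is just the per-qubit score functions stacked together. The names below (joint_score, reinforce_gradient, the mean-reward baseline and the toy reward) are hypothetical and are not taken from the demo.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; logits has shape (n_qubits, n_actions)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sample_joint_action(logits, rng):
    """Sample one action per qubit from independent softmax policies."""
    probs = softmax(logits)
    return np.array([rng.choice(len(p), p=p) for p in probs])


def joint_score(logits, actions):
    """Gradient of log pi(a_1, ..., a_n) with respect to the logits.

    For a factorized policy, log pi = sum_i log pi_i(a_i), so row i is
    onehot(a_i) - softmax(logits_i).
    """
    probs = softmax(logits)
    onehot = np.zeros_like(probs)
    onehot[np.arange(len(actions)), actions] = 1.0
    return onehot - probs


def reinforce_gradient(logits, actions_batch, rewards):
    """REINFORCE estimate with a mean-reward baseline (an assumed choice)."""
    baseline = rewards.mean()
    scores = np.stack([joint_score(logits, a) for a in actions_batch])
    return ((rewards - baseline)[:, None, None] * scores).mean(axis=0)


rng = np.random.default_rng(0)
logits = np.zeros((2, 3))  # 2 qubits, 3 candidate actions each
actions_batch = np.stack([sample_joint_action(logits, rng) for _ in range(8)])
# Toy reward: number of qubits that picked action 0 (illustration only).
rewards = (actions_batch == 0).sum(axis=1).astype(float)
grad = reinforce_gradient(logits, actions_batch, rewards)
```

The design choice here is that a single shared reward multiplies every qubit’s score function, so no change to the reward is needed; whether independent per-qubit policies are expressive enough for correlated multi-qubit drives is an open question for the specific problem.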