Gradient Question for RL Demo

Hello!

I was looking at the Gate calibration with reinforcement learning demo and was wondering whether the functions reinforce_gradient_with_baseline and compute_baseline are specific to the logic being built around qml.state(), or whether it would be possible to use qml.expval() instead.

I was also curious how the gradients would need to be modified to select multiple actions at the same time, for example if we were driving multiple qubits. In other words, how could score_funcs be updated so that it can learn from multiple actions being selected at once? A rough sketch of what I mean is below.
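For concreteness (all names here are placeholders I made up, not the demo's actual functions), I'm imagining something like independent per-qubit policies, where the joint log-probability, and hence the score function, just sums over qubits:

```python
import numpy as np

# Hypothetical sketch (placeholder names, not from the demo): a factorized
# policy that picks one discrete action per qubit, e.g. a drive amplitude each.
n_qubits, n_actions = 2, 3
rng = np.random.default_rng(0)
theta = np.zeros((n_qubits, n_actions))  # one row of logits per qubit

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sample_actions(theta):
    """Sample one action per qubit from independent softmax policies."""
    probs = softmax(theta)
    actions = [rng.choice(n_actions, p=p) for p in probs]
    # The joint log-probability factorizes over qubits, so the score
    # function (gradient of log pi) is the sum of per-qubit score functions.
    log_prob = sum(np.log(probs[q, a]) for q, a in enumerate(actions))
    return actions, log_prob

def score_function(theta, actions):
    """d(log pi)/d(theta) for the softmax policy: one-hot minus probs."""
    grad = -softmax(theta)
    for q, a in enumerate(actions):
        grad[q, a] += 1.0
    return grad  # multiply by (return - baseline) for a REINFORCE-style update
```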

Thanks in advance for the assistance!

Hi @apple_jay,

You’re asking some good questions! I don’t know the answers offhand, but I’ll try to find out for you.

Thanks for looking into it, and for the awesome demos to learn from!


Hi @apple_jay, I got some answers from my colleague Korbinian!

The Gate calibration with reinforcement learning demo essentially reproduces the paper in its first reference, which uses the state because the Hilbert space is very small, so state tomography is feasible.

That being said, you should in principle be able to do the same with just expectation values, which is what “model-free reinforcement learning / gate calibration” is all about. Note, however, that the authors of this other paper seem to argue that this has practical problems, though I haven’t read the paper so I’m not sure.
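As a rough illustration of what that swap could look like (this is a minimal sketch, not the demo's actual code), one could measure Pauli expectation values and compute a fidelity-style reward from the resulting Bloch vector:

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

# Illustration only: return Pauli expectation values instead of qml.state(),
# and build a fidelity-style reward from them. On an analytic simulator all
# three expvals come from one evaluation; on hardware they would correspond
# to separate measurement settings.
@qml.qnode(dev)
def measure_paulis(params):
    qml.RX(params[0], wires=0)
    qml.RZ(params[1], wires=0)
    return (
        qml.expval(qml.PauliX(0)),
        qml.expval(qml.PauliY(0)),
        qml.expval(qml.PauliZ(0)),
    )

def reward(params, target_bloch):
    """For a pure single-qubit target with Bloch vector r_t, the fidelity
    is F = (1 + r . r_t) / 2, where r is the measured Bloch vector."""
    r = np.array(measure_paulis(params))
    return 0.5 * (1.0 + r @ np.asarray(target_bloch))
```

Something like this could then stand in wherever the demo computes its reward from the returned state.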

Before going into the question of modifying the gradients, I would first try modifying the demo to use expectation values. If it works, please let us know here, and we can then look into any other questions you may have!

I hope this helps!