Gradient Question for RL Demo

Hello!

I was looking at the Gate calibration with reinforcement learning demo and was wondering whether the functions reinforce_gradient_with_baseline and compute_baseline are specific to the logic being built around qml.state(), or whether it would be possible to use qml.expval() instead.

I was also curious how the gradients would need to be modified to select multiple actions at the same time, for example if we were driving multiple qubits. In other words, how could score_funcs be updated so that it can learn from multiple actions being selected at once? A rough sketch of what I mean is below.
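For concreteness (all names here are placeholders I made up, not the demo's actual functions), I'm imagining something like independent per-qubit policies, where the joint log-probability, and hence the score function, just sums over qubits:

```python
import numpy as np

# Hypothetical sketch (placeholder names, not from the demo): a factorized
# policy that picks one discrete action per qubit, e.g. a drive amplitude each.
n_qubits, n_actions = 2, 3
rng = np.random.default_rng(0)
theta = np.zeros((n_qubits, n_actions))  # one row of logits per qubit

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sample_actions(theta):
    """Sample one action per qubit from independent softmax policies."""
    probs = softmax(theta)
    actions = [rng.choice(n_actions, p=p) for p in probs]
    # The joint log-probability factorizes over qubits, so the score
    # function (gradient of log pi) is the sum of per-qubit score functions.
    log_prob = sum(np.log(probs[q, a]) for q, a in enumerate(actions))
    return actions, log_prob

def score_function(theta, actions):
    """d(log pi)/d(theta) for the softmax policy: one-hot minus probs."""
    grad = -softmax(theta)
    for q, a in enumerate(actions):
        grad[q, a] += 1.0
    return grad  # multiply by (return - baseline) for a REINFORCE-style update
```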

Thanks in advance for the assistance!

Hi @apple_jay,

You’re asking some good questions! I don’t know the answers offhand, but I’ll try to find out for you.

Thanks for looking into it, and for the awesome demos to learn from!


Hi @apple_jay, I got some answers from my colleague Korbinian!

The Gate calibration with reinforcement learning demo essentially reproduces the paper in its first reference, which uses the state because the Hilbert space is very small, so state tomography is feasible.

That being said, you should in principle be able to do the same with just expectation values, which is what “model-free reinforcement learning / gate calibration” is all about. Note, however, that the authors of this other paper seem to argue that this has practical problems, though I haven’t read the paper so I’m not sure.
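As a rough illustration of what that swap could look like (this is a minimal sketch, not the demo's actual code), one could measure Pauli expectation values and compute a fidelity-style reward from the resulting Bloch vector:

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

# Illustration only: return Pauli expectation values instead of qml.state(),
# and build a fidelity-style reward from them. On an analytic simulator all
# three expvals come from one evaluation; on hardware they would correspond
# to separate measurement settings.
@qml.qnode(dev)
def measure_paulis(params):
    qml.RX(params[0], wires=0)
    qml.RZ(params[1], wires=0)
    return (
        qml.expval(qml.PauliX(0)),
        qml.expval(qml.PauliY(0)),
        qml.expval(qml.PauliZ(0)),
    )

def reward(params, target_bloch):
    """For a pure single-qubit target with Bloch vector r_t, the fidelity
    is F = (1 + r . r_t) / 2, where r is the measured Bloch vector."""
    r = np.array(measure_paulis(params))
    return 0.5 * (1.0 + r @ np.asarray(target_bloch))
```

Something like this could then stand in wherever the demo computes its reward from the returned state.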

Before going into the question of modifying the gradients, I would first try modifying the demo to use expectation values. If it works, please let us know here, and we can then look into any other questions you may have!

I hope this helps!