You can use the QNodeCollection class to create a collection of independent QNodes that can be evaluated together. The collection can be created as:
qnode = qml.QNodeCollection([qnode1, qnode2])
The QNodes within the QNodeCollection are executed sequentially by default, but you can pass the parallel=True keyword argument to activate asynchronous evaluation. However, the best speedup is achieved with external hardware devices or external simulators, as explained in further detail here under the “Asynchronous Evaluation” section. You may also find this previous discussion helpful. Please feel free to let us know if you have any further questions.
Thanks for your advice. It works for multiple circuits. I notice that the QNodeCollection class takes QNodes as input. What about qml.grad? How could I calculate the gradient for multiple inputs in parallel, as demonstrated in the example code above?
If I am not mistaken, computing the gradient in parallel is not possible with the QNodeCollection. But PennyLane integrates nicely with libraries like dask (which is actually used by the QNodeCollection). You should be able to evaluate the qml.grad function asynchronously with this library…
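To make the idea concrete, here is a rough sketch of the pattern: map a gradient function over several inputs concurrently. It is not PennyLane or dask code. To keep it self-contained it uses Python's standard concurrent.futures module instead of dask, and circuit and grad_fn are hypothetical stand-ins (a cosine playing the role of a one-parameter QNode, and a parameter-shift formula playing the role of qml.grad(circuit)). The helper name parallel_grad is also made up for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def circuit(x):
    """Hypothetical stand-in for a QNode: RX(x) on |0> then <Z> gives cos(x)."""
    return math.cos(x)


def grad_fn(x):
    """Hypothetical stand-in for qml.grad(circuit), via the parameter-shift rule."""
    shift = math.pi / 2
    return (circuit(x + shift) - circuit(x - shift)) / 2


def parallel_grad(fn, inputs, max_workers=4):
    """Evaluate fn on every input concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, inputs))


inputs = [0.1, 0.5, 1.2]
grads = parallel_grad(grad_fn, inputs)  # one gradient value per input
```

In a real setup you would presumably replace grad_fn with the function returned by qml.grad and swap the thread pool for dask if you prefer it. As with the QNodeCollection, threads are only likely to help when each evaluation spends its time waiting on an external device or simulator.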