Hello,
I am trying to run a machine learning experiment using a small circuit. In my optimization loop, I will execute the same circuit many times, but with different input parameters (the argument x in the following code snippet). I want to run on real hardware using the pennylane-qiskit plugin. I have access to a Heron QPU with 156 qubits, so I could fit multiple copies of my small circuit, each with different parameters, into the same execution to save time.
I was hoping that one of these two transforms would achieve that, but without success so far.
Here is a simple example of what I tried.
import pennylane as qml
import numpy as np
from qiskit_ibm_runtime import QiskitRuntimeService
service = QiskitRuntimeService()
backend = service.least_busy()
device = qml.device("qiskit.remote", wires=2, backend=backend)
@qml.batch_input(argnum=[0, 1])
@qml.set_shots(32)
@qml.qnode(device)
def circuit(x, y):
    qml.RX(x[0], wires=0)
    qml.RX(x[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(y[0], wires=0)
    return qml.expval(qml.Z(wires=0) + qml.Z(wires=1))
x = np.random.random((2, 5))
print(f"x = {x}")
y = np.random.random(1)
print(f"y = {y}")
print(f"Outcome = {circuit(x, y)}")
print(qml.draw(circuit)(x=x, y=y))
I get the following output:
$ python main.py
x = [[0. 0.1 0.2 0.3 0.4]
[0.5 0.6 0.7 0.8 0.9]]
y = [2.]
Outcome = [0.47190482 0.40949323 0.37722648 0.44045717 0.09474952]
0: ββRX(0.00)βββββRY(2.00)ββ€ β<π>
1: ββRX(0.50)ββ°Xββββββββββββ€ β°<π>
0: ββRX(0.10)βββββRY(2.00)ββ€ β<π>
1: ββRX(0.60)ββ°Xββββββββββββ€ β°<π>
0: ββRX(0.20)βββββRY(2.00)ββ€ β<π>
1: ββRX(0.70)ββ°Xββββββββββββ€ β°<π>
0: ββRX(0.30)βββββRY(2.00)ββ€ β<π>
1: ββRX(0.80)ββ°Xββββββββββββ€ β°<π>
0: ββRX(0.40)βββββRY(2.00)ββ€ β<π>
1: ββRX(0.90)ββ°Xββββββββββββ€ β°<π>
Also, in the IBM Quantum Cloud dashboard, I see that 5 jobs were executed, each for a single 2-qubit circuit, instead of one larger 10-qubit circuit that would batch all the inputs.
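To make the packing idea concrete, here is a small sketch of the bookkeeping it would need: assigning each batch entry to its own block of wires, and splitting the batch into several executions when it does not fit on the device. The helper name `plan_packing` and its arguments are hypothetical and not part of PennyLane or the plugin. It ignores device connectivity and qubit quality entirely.

```python
def plan_packing(batch_size, wires_per_circuit, device_wires):
    """Group batch entries into executions that fit on the device.

    Returns a list of executions. Each execution is a list of
    (batch_index, wires) pairs, where wires is the block of wire
    labels reserved for that copy of the circuit.
    """
    copies_per_execution = device_wires // wires_per_circuit
    if copies_per_execution == 0:
        raise ValueError("The circuit does not fit on the device.")

    executions = []
    for start in range(0, batch_size, copies_per_execution):
        stop = min(start + copies_per_execution, batch_size)
        execution = []
        for slot, batch_index in enumerate(range(start, stop)):
            first_wire = slot * wires_per_circuit
            wires = tuple(range(first_wire, first_wire + wires_per_circuit))
            execution.append((batch_index, wires))
        executions.append(execution)
    return executions


# The example from this post: 5 inputs, 2 wires each, 156-qubit device.
plan = plan_packing(batch_size=5, wires_per_circuit=2, device_wires=156)
print(len(plan))  # number of executions needed
print(plan[0])    # wire blocks used by the first execution
```

With 2 wires per copy, a 156-qubit device has room for 78 copies per execution under this naive layout, so the 5 inputs above would fit in a single 10-qubit circuit.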
I know I can do it manually, but I was hoping there was already a built-in way to achieve this behavior. It would be even nicer if it were smart enough to batch the inputs only when running on real hardware, and not when executing on a simulator.