Different running Time w/wo parameter broadcast and how to generate random clifford gate

Hi! I just started to use pennylane in a few months. I’m running two different quantum functions as follows:

import pennylane as qml
from qiskit.quantum_info import random_clifford, Operator

device1 = qml.device("default.qubit", wires=20, shots=100)
@qml.qnode(device1)
def GHZState1(n_qubits, clifford):
    # -----------  GHZ State -------------- #
    qml.Hadamard(wires=0)
    for qbit in range(n_qubits - 1):
        qml.CNOT(wires=[qbit, qbit+1])
    # -----------  GHZ State -------------- #
    qml.QubitUnitary(clifford, wires=n_qubits-1)
    return [qml.sample(qml.PauliZ(i)) for i in range(n_qubits)]
clifford = Operator(random_clifford(1)).data
out = GHZState1(20, clifford)

And another function is:

import pennylane as qml
from qiskit.quantum_info import random_clifford, Operator

device2 = qml.device("default.qubit", wires=20, shots=1)
@qml.qnode(device2)
def GHZState2(n_qubits, clifford, random_unitary):
    # -----------  GHZ State -------------- #
    qml.Hadamard(wires=0)
    for qbit in range(n_qubits - 1):
        qml.CNOT(wires=[qbit, qbit+1])
    # -----------  GHZ State -------------- #
    qml.QubitUnitary(clifford, wires=n_qubits-1)
    return [qml.sample(qml.PauliZ(i)) for i in range(n_qubits)]
sampled_cliffords = []
for _ in range(n_rho):
    sampled_clifford = random_clifford(1)
    sampled_clifford = Operator(sampled_clifford).data 
    sampled_cliffords.append(sampled_clifford)
sampled_cliffords = np.stack(sampled_cliffords, axis=0)
out = GHZState2(20, sampled_cliffords)

You can see that the circuits are the same, the only difference is whether to use parameter broadcast. I calculated the running time and I found the first one without parameter broadcasting is much faster than the other, e.g., 0.2s vs 6.4s. I wonder the reason.
Besides, what should I do if there is an if statement in my function when I try to use parameter broadcast? For example, the circuit is:

def GHZState_if(n_qubits, use_clifford, clifford, random_unitary):
    # -----------  GHZ State -------------- #
    qml.Hadamard(wires=0)
    for qbit in range(n_qubits - 1):
        qml.CNOT(wires=[qbit, qbit+1])
    # -----------  GHZ State -------------- #
    if use_clifford:
        qml.QubitUnitary(clifford, wires=n_qubits-1)
    return [qml.sample(qml.PauliZ(i)) for i in range(n_qubits)]

The variable use_clifford is a shape (n,) array if clifford is a shape (n,2,2) array, and they correspond to n different circuits. I want to use parameter broadcast to simulate circuits under different situations since I don’t want to use for loop in my function. Is there a possible solution to achieve it?

Moreover, does pennylane have the function to generate a Clifford gate like qiskit.quantum_info.random_clifford?

Hi @Peng_Mi,

Welcome to the Forum!

I’m actually not sure so let me check with the team.

Hi @Peng_Mi and welcome to the forum! :slight_smile:

Regarding your first question:
Parameter broadcasting can offer speedups when there are expensive parts of your circuit that are being done redundantly for different inputs (or the same input over and over again). In general, those are operations that happen before the (first) broadcasted operation.
If you run your two versions for 10 qubits and n_rho=100 you will see that broadcasting provides a significant speedup over running the non-broadcasted code 100 times.
However, if we increase the qubit count, the speed up gradually goes away. This tells us that the tasks executed after the broadcasted QubitUnitary start taking the majority of the computation time. Here, this is the computation of marginal probabilities and sampling from them.
So of course broadcasting can not speed numerical calculations themselves up, and the sampling has to be performed for each input separately, so broadcasting only can get rid of overheads, not of actual computations after QubitUnitary.
Did you observe an actual slow down between running the broadcasted code and running the non-broadcasted code for the different inputs in a for loop?

Regarding your second question, there might be a way to make this work using qml.cond, potentially with Catalyst, and jax.vmap. However, before we go there, would running the following before your circuit also work?

clifford = qml.math.stack([cliff if use_cliff else qml.math.eye(2) for cliff, use_cliff in zip(clifford, use_clifford)])

Applying an identity sometimes across the broadcasting dimension probably currently is faster than making Python conditionals work with a broadcasting dimension :slight_smile:
If you’re interested in the qml.cond use case instead, let me know!

Regarding your third question, I am not aware of a random_clifford generator in PennyLane that implements the method by Bravyi and Maslov, but I will double check once more and get back to you!

I hope this helps, let me know if you have any other questions! :slight_smile:

1 Like

So thanks for your really comprehensive reply and interpretation. I really appreciate it. Regarding the solution to the second question, it works if I set the clifford to the identity matrix if its corresponding use_clifford=False; I used numpy to realize it before instead of qml.math.stack, but I believe the results are the same with both. Moreover, I have checked the qml.cond in docs and I’m interested in it. Will it be faster than the before realization? For the third question, it confuses me a lot when I try to sample a gate from one qubit Clifford Group and I can not find the related function in demos or docs, so here I am. Again, thanks for your detailed reply.

1 Like

Hi @Peng_Mi,

I used numpy to realize it before instead of qml.math.stack, but I believe the results are the same with both.

Yep, in this case it probably does not make a difference, you might as well use vanilla numpy instead of PennyLane’s numpy or qml.math here.

Moreover, I have checked the qml.cond in docs and I’m interested in it. Will it be faster than the before realization?

This is hard to predict for me personally. Important factors here would be:

  • How large will your circuits/qubit counts be? (For a small state, applying an identity matrix is really cheap so introducing qml.cond probably does not save much)
  • Are you planning to use just-in-time (JIT) compilation on your code? (This requires to use JAX) (Overhead introduced through qml.cond can be reduced. Internally, there will simply be two compiled programs, one for applying the Clifford and one for not doing so. )

I tried to make my idea with qml.cond and jax.vmap work, but ran into a problem. I consulted the Catalyst team to see whether this is possible at the moment :slight_smile:

For the third question, it confuses me a lot when I try to sample a gate from one qubit Clifford Group and I can not find the related function in demos or docs, so here I am. Again, thanks for your detailed reply.

It is a nice functionality, so I agree that it would be great to have it in PennyLane as well. I’ll pass on this feedback and we will see whether/when to add it! Thank you for the feedback :pray:

1 Like

Hi again @Peng_Mi

I heard back from the Catalyst team, and the following adaptation of your code should work for what you want to do! (courtesy of @David_Ittah :raised_hands: )

from catalyst import qjit, vmap
from qiskit.quantum_info import random_clifford, Operator

n_qubits=10

@qjit
@vmap(in_axes=(0, 0))
@qml.qnode(qml.device("lightning.qubit", wires=20, shots=100))
def GHZState_if(use_clifford, clifford):
    # -----------  GHZ State -------------- #
    qml.Hadamard(wires=0)
    for qbit in range(n_qubits - 1):
        qml.CNOT(wires=[qbit, qbit + 1])

    # -----------  GHZ State -------------- #
    def true_fn():
        qml.QubitUnitary(clifford, wires=n_qubits-1)
    qml.cond(use_clifford, true_fn)()

    return [qml.sample(wires=i) for i in range(n_qubits)]

use_clifford = jax.numpy.array([True, False] * 10)
cliffords = jax.numpy.array([
    Operator(random_clifford(1)).data for _ in range(20)
])

GHZState_if(use_clifford, cliffords)

Note that there are a few changes compared to your code:

  1. The variable n_qubits is not passed explicitly to the main function (this is due to a minor technical issue which will soon be resolved. Note,however, that it is not really necessary to pass it explicitly). Also random_unitary was removed, it did not do anything.
  2. The function is qjitted, i.e. just-in-time compiled by Catalyst, and vmapped, i.e. vectorized by Catalyst’s version of jax.vmap. The latter replaces the native broadcasting.
  3. The device is switched to "lightning.qubit" for compatibility (and performance).
  4. The Python if clause is replaced by qml.cond.
  5. The circuit samples wires instead of qml.Z operators due to compatibility reasons. This is essentially equivalent, though, and can easily be converted to qml.Z samples in postprocessing.

I hope this helps and encourages you to look further into advanced functionalities that PennyLane and Catalyst provide :slight_smile: Let me know in case you have further questions!

Happy vmapping!

1 Like