Efficient manipulation of pauli operators in pennylane?

What is the right way to do symbolic matrix multiplication and tensor products of Pauli strings in pennylane? Certain operations that I would expect to evaluate very fast take a long time.

For example , the following code snippet takes a very long time (>1 minute) to do a trivial calculation (side question - is the operation @ the correct way for me to do ordinary matrix multiplication when the observables are defined on the same system?)

out = qml.Identity(0)
for _ in range(1000):
    out @= qml.PauliX(0)
out.matrix

For comparison the analogous code snippet using cirq’s pauli operators evaluates basically instantaneously

qubit = cirq.LineQubit(0)
out = cirq.I(qubit)
for _ in range(1000):
    out *= cirq.X(qubit)
out.matrix([qubit])

Hi @EvanPeters! I did some digging, and it seems like the for loop itself is relatively fast,

out = qml.Identity(0)
for _ in range(1000):
    out @= qml.PauliX(0)

However, the computation of the matrix at the end is a massive bottleneck,

out.matrix

likely related to an inefficient implementation:

(if I had to guess, maybe the inefficiency is from the list comprehension in L1416?)

We have been building up more efficient Pauli group-based arithmetic in the qml.grouping module, and plan to integrate this into the Operator class ASAP. At the moment, if you are restricted to Pauli words, here is a quick function that should be much more efficient:

>>> def matrix(word):
...     phase = 1
...     prod = word[0]
...     for o2 in word:
...         prod, ph = qml.grouping.pauli_mult_with_phase(prod, o2)
...         phase *= ph
...     return ph * prod.matrix
>>> matrix([qml.PauliX(0) for i in range(1000)])
array([[0, 1],
       [1, 0]])
>>> %timeit matrix([qml.PauliX(0) for i in range(1000)])
65.1 ms ± 4.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

This seems to match the behaviour of Cirq:

>>> %%timeit
... qubit = cirq.LineQubit(0)
... out = cirq.I(qubit)
... for _ in range(1000):
...     out *= cirq.X(qubit)
... out.matrix([qubit])
79 ms ± 5.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Depending on how many qubits you are using, a direct matrix approach could also work?

>>> %%timeit
... out = qml.Identity(0).matrix
... for _ in range(1000):
...     out = out @ qml.PauliX(0).matrix
9.72 ms ± 794 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

(although, my above solutions only make sense if you are looking for Pauli multiplication only; if you need a mixture of multiplication and/or tensor products, that makes it more complicated)