Increase the simulation speed of Tensor Network

mchau · June 18, 2025, 10:41pm

My TN simulation is reasonably fast for only 6 qubits. Increasing the number of qubits makes it even slower than default.qubit. Here is what I tried:

Adding these at the very top of my script:

import os

os.environ["OPENBLAS_NUM_THREADS"] = "1" # also tried with "8" for all of them
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["NUMBA_NUM_THREADS"] = "8"

I am using OpenBLAS, but I set the others too, just in case.
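One thing worth checking with settings like these is ordering: BLAS/OpenMP backends and Numba typically read these variables once, when the library is first loaded, so values set after `import numpy` (or after importing a package that pulls it in) may be ignored. The following is a minimal sketch using only the standard library; the helper name `set_thread_limits` is hypothetical and not part of PennyLane or NumPy.

```python
import os
import sys

# Environment variables commonly read by numeric backends at load time.
THREAD_VARS = (
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OMP_NUM_THREADS",
    "NUMBA_NUM_THREADS",
)


def set_thread_limits(num_threads, variables=THREAD_VARS):
    """Set the thread-count variables (hypothetical helper).

    Returns the names of numeric libraries that were already imported,
    since those may have read the old values and ignore the new ones.
    """
    for name in variables:
        os.environ[name] = str(num_threads)
    return [mod for mod in ("numpy", "scipy", "numba") if mod in sys.modules]


already_loaded = set_thread_limits(1)
if already_loaded:
    print(f"Warning: {already_loaded} were imported before the limits were set.")
```

If the warning fires, moving the call above every other import in the script is the usual remedy.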

Setting

  max_bond_dim: 2
  cutoff: 0.0000000001

for device dev
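To get a feel for what a bond dimension of 2 implies, here is a rough sketch that counts the numbers stored in a matrix product state, assuming the device runs with the MPS method (where max_bond_dim applies) and that each site tensor has shape (left bond, 2, right bond). The function names are made up for illustration. It also shows the bond dimension needed to represent an arbitrary n-qubit state exactly, which is the relevant figure for a generic amplitude-encoded input.

```python
def mps_parameter_count(num_qubits, max_bond_dim, phys_dim=2):
    """Count tensor entries in an open-boundary MPS with a capped bond dimension."""
    total = 0
    for site in range(num_qubits):
        # A bond can never exceed the Hilbert-space dimension on either side of it.
        left = min(max_bond_dim, phys_dim**site, phys_dim ** (num_qubits - site))
        right = min(
            max_bond_dim,
            phys_dim ** (site + 1),
            phys_dim ** (num_qubits - site - 1),
        )
        total += left * phys_dim * right
    return total


def exact_bond_dim(num_qubits, phys_dim=2):
    """Bond dimension at the middle cut needed for an arbitrary state."""
    return phys_dim ** (num_qubits // 2)


for n in (6, 8, 25):
    print(
        f"n={n}: {mps_parameter_count(n, 2)} MPS entries at bond dim 2, "
        f"{2**n} dense amplitudes, exact bond dim {exact_bond_dim(n)}"
    )
```

A cap of 2 keeps the MPS tiny, but a generic dense input vector needs a much larger bond dimension, so truncating it this hard will generally discard most of the encoded data.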

Creating my qnode as qml.QNode(circuit, dev, interface=None) # no gradient; I plan to use a different kind of optimizer since I don’t think default.tensor supports gradients.

My circuit looks like

   def circuit(feature, theta, gate_gens):
       qml.AmplitudeEmbedding(feature, wires=self.dev.wires, pad_with=0.0)
       for idx, gen in enumerate(gate_gens):
           # gen is a SparseHamiltonian
           qml.TrotterProduct(
               -1j * theta[idx] * gen, time=2, order=2, check_hermitian=False
           )
       return qml.state()

What else can I try to make it simulate a circuit with fewer than about 25 wires?

CatalinaAlbornoz · June 19, 2025, 10:25pm

Hi @mchau ,

default.tensor allows differentiability via the parameter-shift rule.

Check out the demos that use this device and try to replicate the behaviour. If it’s too slow on your system there might be an installation issue.

Let us know if you’re able to run these demos!

mchau · June 20, 2025, 1:48pm

I can run those demos smoothly. Specifically, when running "How to simulate quantum circuits with tensor networks | PennyLane Demos", the time it takes for TN is:

Number of qubits: 100
Result: 1.0000000000000002
Execution time: 1.0385 seconds
Number of qubits: 125
Result: 1.0000000000000002
Execution time: 1.3499 seconds
Number of qubits: 150
Result: 1.0000000000000002
Execution time: 1.6048 seconds
Number of qubits: 175
Result: 1.0000000000000002
Execution time: 2.1320 seconds
Number of qubits: 200
Result: 1.0000000000000002
Execution time: 2.6462 seconds

I bumped up these numbers of qubits.

I noticed that the circuits there apply individual gates, while I am doing a Trotterization of a Hamiltonian that has ~40 terms.
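To put a number on that difference, the sketch below counts the exponentials in the textbook Suzuki-Trotter product formula: one per term at first order, two per term at second order, and five times the previous even order after that. The function name is hypothetical, and the count follows the standard recursion rather than any library's internals, so the gate count a given simulator actually executes may differ.

```python
def trotter_exponential_count(num_terms, order=1, steps=1):
    """Number of term exponentials in a Suzuki-Trotter product formula.

    order must be 1 or a positive even integer; steps is the number of
    repetitions of the formula.
    """
    if order != 1 and (order < 2 or order % 2 != 0):
        raise ValueError("order must be 1 or a positive even integer")
    if order == 1:
        per_step = num_terms
    else:
        # Second order sweeps the terms forwards then backwards (2 * num_terms);
        # each higher even order is built from five copies of the previous one.
        per_step = 2 * num_terms * 5 ** (order // 2 - 1)
    return per_step * steps


print(trotter_exponential_count(40, order=2))  # 80
print(trotter_exponential_count(40, order=4))  # 400
```

With ~40 terms at order 2, each Trotterized generator already contributes 80 exponentials per step, and that is multiplied again by the number of generators in the circuit.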

mchau · June 22, 2025, 8:31pm

Here is an MRE.

import pennylane as qml
from pennylane import numpy as np
import time

def run_circuit(n):
    dev = qml.device(
        "default.tensor",
        wires=n,
        method="tn",
        local_simplify="DCRS",
        contraction_optimizer=None,
    )

    @qml.qnode(dev)
    def circuit(input_data):
        qml.AmplitudeEmbedding(input_data, wires=range(n), normalize=True)
        coeffs = [0.5]*n
        obs = [qml.PauliX(i) for i in range(n)]
        hamiltonian = qml.Hamiltonian(coeffs, obs)
        print(f"Hamiltonian {n} qubits: {hamiltonian}")
        qml.TrotterProduct(hamiltonian, time=0.2)
        return qml.state()

    input_data = np.random.rand(2 ** n)
    start_time = time.time()
    state = circuit(input_data)
    end_time = time.time()

    execution_time = end_time - start_time
    return execution_time, state

for n in range(6, 10):
    execution_time, state = run_circuit(n)
    print(f"Execution time for n={n}: {execution_time:.4f} seconds")

The result is

Hamiltonian 6 qubits: 0.5 * X(0) + 0.5 * X(1) + 0.5 * X(2) + 0.5 * X(3) + 0.5 * X(4) + 0.5 * X(5)
Execution time for n=6: 0.1082 seconds
Hamiltonian 7 qubits: 0.5 * X(0) + 0.5 * X(1) + 0.5 * X(2) + 0.5 * X(3) + 0.5 * X(4) + 0.5 * X(5) + 0.5 * X(6)
Execution time for n=7: 0.6015 seconds
Hamiltonian 8 qubits: 0.5 * X(0) + 0.5 * X(1) + 0.5 * X(2) + 0.5 * X(3) + 0.5 * X(4) + 0.5 * X(5) + 0.5 * X(6) + 0.5 * X(7)
Execution time for n=8: 78.9230 seconds
Hamiltonian 9 qubits: 0.5 * X(0) + 0.5 * X(1) + 0.5 * X(2) + 0.5 * X(3) + 0.5 * X(4) + 0.5 * X(5) + 0.5 * X(6) + 0.5 * X(7) + 0.5 * X(8).

and then my computer freezes

I have some warnings too

/python3.11/site-packages/cotengra/hyperoptimizers/hyper.py:54: UserWarning: Couldn't find `optuna`, `cmaes`, or `nevergrad` so will use completely random sampling in place of hyper-optimization.

CatalinaAlbornoz · June 24, 2025, 10:14pm

Hi @mchau ,

The issue here seems to be caused by AmplitudeEmbedding using up all of your RAM. It is a known problem in quantum computing that loading large datasets is a major bottleneck, especially when things aren’t optimized.

AmplitudeEmbedding uses qml.StatePrep under the hood. This is optimized for some devices, such as default.qubit, but not for default.tensor. So it ends up running the Mottonen state preparation (qml.MottonenStatePreparation), which quickly uses up all of the RAM.

So my recommendation would be to switch to default.qubit if you want to keep AmplitudeEmbedding, or to use less input data and a different embedding if you want to keep using default.tensor.
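For a sense of how quickly the input grows, here is a small sketch of the size of a dense n-qubit state vector, assuming 16 bytes per amplitude (complex128). It covers only the vector that AmplitudeEmbedding takes in or qml.state() returns; it does not model the extra memory used when the state preparation is decomposed into gates, which is the part identified above as the problem on default.tensor.

```python
def dense_state_bytes(num_qubits, bytes_per_amplitude=16):
    """Bytes needed to store 2**num_qubits complex amplitudes."""
    return (2**num_qubits) * bytes_per_amplitude


for n in (6, 8, 16, 25):
    mib = dense_state_bytes(n) / 2**20
    print(f"n={n:2d}: {2**n:>10,d} amplitudes, {mib:,.3f} MiB")
```

At 25 wires the vector alone is 512 MiB, so any approach that feeds in or reads out the full state gives up the main memory advantage of a tensor network.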
