Hello! I’m trying to train a 4-bit special unitary circuit (SU(16)). The structure of the circuit is from this article. The training process is very slow, and the CPU Usage is low (about 10% to 20%) during the training process. The phenomenon becomes more severe when I use the circuit as components of a large circuit. In a 6-bit or 7-bit circuit containing about 10 to 15 SU(16), The CPU usage cannot even reach 10%, and the training is extremely slow. Here is my code:
import cmath
import math
import matplotlib
import matplotlib.pyplot as plt
import pennylane as qml
from pennylane import numpy as np
matplotlib.use('TkAgg')
dev=qml.device('lightning.qubit', wires=4)
# dev=qml.devices.DefaultQubit(wires=4)
def getstates(n, complex=False):
# generate an arbitrary quantum state
state_r=np.random.rand(2**n)
while np.linalg.norm(state_r)<10**(-8):
state_r=np.random.rand(2**n)
if complex:
state_i=np.random.rand(2**n)
while np.linalg.norm(state_i)<10**(-8):
state_i=np.random.rand(2**n)
state=state_r+1j*state_i
else:
state=state_r
state=state/np.linalg.norm(state)
return state
def SU_N(n):
"""
Generate an n-order special unitary matrix (SU(n)) as the target to learn.
Args:
n (int): The order of the matrix.
Returns:
np.ndarray: An n x n unitary matrix with determinant 1.
"""
# Step 1: Generate a random complex matrix
z = (np.random.randn(n, n) + 1j * np.random.randn(n, n)) / np.sqrt(2)
# Step 2: Perform QR decomposition to get a unitary matrix
q, r = np.linalg.qr(z)
# Step 3: Adjust the diagonal of R to ensure Q is unitary
d = np.diagonal(r)
ph = d / np.abs(d) # Normalize to unit modulus
q = q @ np.diag(ph)
# Step 4: Ensure determinant is 1
det_q = np.linalg.det(q)
m, p=cmath.polar(det_q)
c=cmath.rect(math.pow(m, -1/n), -p/n)
for i in range(n):
q[i] = q[i] * c
return q
def uniform_CR(params, c_wires, target, op):
# the component of the SU(16) circuit from the paper, which is a multi-qubit controlled rotation.
if type(c_wires) == int or type(c_wires) == np.int64:
c_wires=[c_wires]
n = 2 ** len(c_wires)
if len(params) < n:
raise ValueError(f'The number of parameters is not enough. Need {n} parameters, given {len(params)}.')
counter = 0
for i in range(n):
table = bin(i)[2:].zfill(len(c_wires))
match op:
case 'X':
qml.ctrl(qml.RX(params[counter], target), c_wires, [bool(int(x)) for x in table])
case 'Y':
qml.ctrl(qml.RY(params[counter], target), c_wires, [bool(int(x)) for x in table])
case 'Z':
qml.ctrl(qml.RZ(params[counter], target), c_wires, [bool(int(x)) for x in table])
case _:
raise ValueError('Operations should be X, Y, or Z.')
counter += 1
def Fmk(params, m, k, wires, op):
# determine the non-trainable parameters of uniform_CR based on two non-trainable parameters m and k.
m=m-1
target = wires[m]
c_wires = [wires[m-i] for i in range(1, k+1)]
uniform_CR(params, c_wires, target, op)
def SU16(params, wires):
# the SU(16) circuit, which is a 4-qubit circuit with 255 trainable parameters.
if len(params)<255:
raise ValueError('The number of parameters should be 255 to construct a SU(16).')
qml.RZ(params[0], wires[0])
#B8
Fmk(params[1:3], 2, 1, wires, 'Z')
Fmk(params[3:7], 3, 2, wires, 'Z')
Fmk(params[7:15], 4, 3, wires, 'Z')
Fmk(params[15:23], 4, 3, wires, 'Y')
Fmk(params[247:255], 4, 3, wires, 'Z')
#BA7
Fmk(params[23:31], 1, 3, wires, 'Y')
Fmk(params[31:39], 1, 3, wires, 'Z')
Fmk(params[39:47], 4, 3, wires, 'Y')
Fmk(params[47:55], 4, 3, wires, 'Z')
#BA6
Fmk(params[55:63], 2, 3, wires, 'Y')
Fmk(params[63:71], 2, 3, wires, 'Z')
Fmk(params[71:79], 4, 3, wires, 'Y')
Fmk(params[79:87], 4, 3, wires, 'Z')
#BA5
Fmk(params[87:95], 1, 3, wires, 'Y')
Fmk(params[95:103], 1, 3, wires, 'Z')
Fmk(params[103:111], 4, 3, wires, 'Y')
Fmk(params[111:119], 4, 3, wires, 'Z')
#BA4
Fmk(params[119:127], 3, 3, wires, 'Y')
Fmk(params[127:135], 3, 3, wires, 'Z')
Fmk(params[135:143], 4, 3, wires, 'Y')
Fmk(params[143:151], 4, 3, wires, 'Z')
#BA3
Fmk(params[151:159], 1, 3, wires, 'Y')
Fmk(params[159:167], 1, 3, wires, 'Z')
Fmk(params[167:175], 4, 3, wires, 'Y')
Fmk(params[175:183], 4, 3, wires, 'Z')
#BA2
Fmk(params[183:191], 2, 3, wires, 'Y')
Fmk(params[191:199], 2, 3, wires, 'Z')
Fmk(params[199:207], 4, 3, wires, 'Y')
Fmk(params[207:215], 4, 3, wires, 'Z')
#BA1
Fmk(params[215:223], 1, 3, wires, 'Y')
Fmk(params[223:231], 1, 3, wires, 'Z')
Fmk(params[231:239], 4, 3, wires, 'Y')
Fmk(params[239:247], 4, 3, wires, 'Z')
@qml.qnode(dev)
def circuit(params, ini_state, tar_state):
# the quantum circuit to train, using the fidelity between the initial state and the target state as cost function.
qml.StatePrep(ini_state, range(4))
SU16(params, range(4))
return qml.expval(qml.Hermitian(qml.math.dm_from_state_vector(tar_state), range(4)))
def cost(params, ini_state_batch, tar_state_batch):
c = 0
for ini_state, tar_state in zip(ini_state_batch, tar_state_batch):
c += 1 - circuit(params, ini_state, tar_state)
return c / len(ini_state_batch)
if __name__ == '__main__':
batch_size = 16
target = SU_N(16)
ini_state_batch = []
tar_state_batch = []
for i in range(batch_size):
ini_state = getstates(4, True)
tar_state = target @ ini_state
ini_state_batch.append(ini_state.tolist()) # The tolist() is to ensure that Pennylane regard the quantum states as non-trainable parameters, because I often receive warnings saying that parameters in qml.Hermitian cannot be trained even though I marked the 'requires_grad' as False.
# ini_state_batch.requires_grad=False
tar_state_batch.append(tar_state.tolist())
params=np.random.uniform(0, 2*np.pi, 255)
opt=qml.AdamOptimizer(0.04)
converge_count=0 # This variable is used to check whether the cost converges
i=1
cur_cost=cost(params, ini_state_batch, tar_state_batch)
print(f'Initial fidelity is {1-cur_cost}')
loop=[0]
F=[1-cur_cost]
prev_cost=1
while converge_count<5 and i<400:
if abs(cur_cost-prev_cost)<0.003 and cur_cost<0.1:
converge_count+=1
else:
converge_count=0
prev_cost=cur_cost
ini_state_batch=[]
tar_state_batch=[]
for j in range(batch_size):
ini_state = getstates(4, True)
ini_state_batch.append(ini_state.tolist())
tar_state_batch.append((target @ ini_state).tolist())
params, cur_cost=opt.step_and_cost(lambda p: cost(p, ini_state_batch, tar_state_batch), params)
loop.append(i)
F.append(1-cur_cost)
i += 1
if not (i % 10):
print(f'Epoch {i}, current fidelity is {1-cur_cost}')
print(f'Total round number is {i}, the final fidelity is {1-cur_cost}')
fig, ax=plt.subplots()
ax.plot(loop, F)
ax.set_xlabel('Training epoch')
ax.set_ylabel('1-Fidelity')
ax.set_title('Training of SU(16) Block')
plt.show()
plt.clf()
The method I tried including the following:
- Using default.qubit, default.mixed, qml.devices.DefaultQubit and lightning.qubit as device. The lightning.qubit is a bit faster than the others, but is still slow with low CPU usage.
- Using JAX and trying to parallel the batch by jax.vmap in the cost function, and use @jax.jit to decorate the qnode, which is of no use in both training time and CPU usage.
- Using Joblib to parallel the batch, which will constantly raise errors.
- Using lightning.kokkos as device. It significantly increased the CPU usage to about 75%, but the training time is not reduced much.
I read the discussions in Topic 1304, where a Pennylane developer said that parameter-shift can be slow in some case, and backprop differentiation may be much faster. I also noticed that the lightning.qubit device supports adjoint differentiation method. However, I did not find how to choose which differentiation method to use.
The CPU I’m using is AMD Ryzen R7 5800H with 16 cores. My qml.about()
(The lightning.kokkos device is run in WSL environment, so the following output does not include it):
Name: PennyLane
Version: 0.41.1
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemi
stry. Train a quantum computer the same way as a neural network. Home-page: https://github.com/PennyLaneAI/pennylane
Author:
Author-email:
License: Apache License 2.0
Location: C:\Users\Timmy\AppData\Local\Programs\Python\Python312\Lib\site-packages
Requires: appdirs, autograd, autoray, cachetools, diastatic-malt, networkx, numpy, packaging, pennylane-lightning, reque
sts, rustworkx, scipy, tomlkit, typing-extensions Required-by: PennyLane_Lightning
Platform info: Windows-11-10.0.26200-SP0
Python version: 3.12.7
Numpy version: 1.26.4
Scipy version: 1.12.0
Installed devices:
- default.clifford (PennyLane-0.41.1)
- default.gaussian (PennyLane-0.41.1)
- default.mixed (PennyLane-0.41.1)
- default.qubit (PennyLane-0.41.1)
- default.qutrit (PennyLane-0.41.1)
- default.qutrit.mixed (PennyLane-0.41.1)
- default.tensor (PennyLane-0.41.1)
- null.qubit (PennyLane-0.41.1)
- reference.qubit (PennyLane-0.41.1)
- lightning.qubit (PennyLane_Lightning-0.41.1)