Dask parallelization

Hi, I was trying to use Dask to parallelize the expectation values of groups of sub-Hamiltonian terms (taking as a basis the demo "VQE with parallel QPUs with Rigetti"), and I get unexpected behavior. As I'm a novice with Dask, I don't know what I'm doing wrong or what I should do to resolve it. The code that I use is below, and at the end I plot the weird behavior.

symbols = ["H", "H"]
coordinates = np.array([0.0, 0.0, -0.6614, 0.0, 0.0, 0.6614], requires_grad=True)
hamil, qubits = qchem.molecular_hamiltonian( symbols= symbols, coordinates= coordinates, charge=1)
coeff, terms = hamil.terms()
terms, coeff = qml.pauli.group_observables(observables=terms, coefficients=coeff, grouping_type='qwc', method='rlf')

dev = qml.device('default.qubit', wires=4)
begin_state = qml.qchem.hf_state(electrons=2, orbitals=qubits)
singles, doubles = qml.qchem.excitations(2, qubits)
singles, doubles = qml.qchem.excitations_to_wires(singles, doubles)

def circuit2(theta, index):
    qml.UCCSD(theta, range(qubits), singles, doubles, begin_state)
    return [qml.expval(u) for u in terms[index]]


node = qml.QNode(circuit2, dev)

def process_group(theta, i):
    result_probs = node(theta=theta, index=i)
    return np.sum( coeff[i]*np.array(result_probs) )

def cost_function(theta):
    results = []
    for i in range(len(terms)):
        results.append( dask.delayed(process_group)(theta, i) )
    num_workers = 1
    result = dask.compute(*results, scheduler="processes", num_workers=num_workers)
    return np.sum( result )

number = len(singles) + len(doubles)
theta = np.random.random( size=number )*(np.pi/180.0)
theta_optimizer = qml.GradientDescentOptimizer(stepsize=0.3)
energy = [cost_function(theta)]
theta_evol = [theta]
for _ in range(40):
    theta.requires_grad = True
    theta = theta_optimizer.step(cost_function, theta)
    energy.append(cost_function(theta))
    theta_evol.append(theta)
    prev_energy = energy[len(energy)-2]
    print(energy[-1])
    conv = np.abs(energy[-1] - prev_energy)
    if conv <= 1e-6:
        break

[Plot: energy vs. optimization step showing the unexpected behavior]

Thanks in advance and have a nice day.

Hey @jnorambu, could you share a minimal, self-contained version of your code, and also tell me which package version(s) you’re using?

When I try setting up the same system, I get a warning telling me that open-shell systems aren’t supported in qchem.molecular_hamiltonian.


Sure. The PennyLane version that I’m using is 0.31.0 and the Dask version is 2023.7.1 (at the end of the post I’ll paste all the versions).
One other thing: I had a little typo in the code that I provided (the charge should be 0), sorry about that. Here’s a corrected version (I run it in a Jupyter notebook cell and it works without problems).

import pennylane as qml
from pennylane import numpy as np
from pennylane import qchem
import dask

symbols = ["H", "H"]
coordinates = np.array([0.0, 0.0, -0.6614, 0.0, 0.0, 0.6614], requires_grad=True)
# Build the molecular Hamiltonian and split its terms into qubit-wise commuting groups
hamil, qubits = qchem.molecular_hamiltonian(symbols=symbols, coordinates=coordinates, charge=0)
coeff, terms = hamil.terms()
terms, coeff = qml.pauli.group_observables(observables=terms, coefficients=coeff, grouping_type='qwc', method='rlf')

dev = qml.device('default.qubit', wires=4)
begin_state = qml.qchem.hf_state(electrons=2, orbitals=qubits)
singles, doubles = qml.qchem.excitations(2, qubits)
singles, doubles = qml.qchem.excitations_to_wires(singles, doubles)

def circuit2(theta, index):
    # UCCSD ansatz; measure every observable in the commuting group `index`
    qml.UCCSD(theta, range(qubits), singles, doubles, begin_state)
    return [qml.expval(u) for u in terms[index]]


node = qml.QNode(circuit2, dev)

# Expectation-value contribution from one qubit-wise commuting group
def process_group(theta, i):
    result_probs = node(theta=theta, index=i)
    return np.sum( coeff[i]*np.array(result_probs) )

# Total energy: build one delayed task per group and sum the results with Dask
def cost_function(theta):
    results = []
    for i in range(len(terms)):
        results.append( dask.delayed(process_group)(theta, i) )
    num_workers = 1
    result = dask.compute(*results, scheduler="processes", num_workers=num_workers)
    return np.sum( result )

number = len(singles) + len(doubles)
theta = np.random.random( size=number )*(np.pi/180.0)
theta_optimizer = qml.GradientDescentOptimizer(stepsize=0.3)
energy = [cost_function(theta)]
theta_evol = [theta]
for _ in range(40):
    theta.requires_grad = True
    theta = theta_optimizer.step(cost_function, theta)
    energy.append(cost_function(theta))
    theta_evol.append(theta)
    prev_energy = energy[-2]
    print(energy[-1])
    conv = np.abs(energy[-1] - prev_energy)
    if conv <= 1e-6:
        break

I will be attentive in case you require more information.

Package versions:

appdirs 1.4.4
appnope 0.1.3
asttokens 2.2.1
autograd 1.5
autoray 0.6.6
backcall 0.2.0
cachetools 5.3.1
certifi 2023.7.22
cffi 1.15.1
charset-normalizer 3.2.0
cirq-core 1.2.0
cirq-google 1.2.0
click 8.1.6
cloudpickle 2.2.1
comm 0.1.3
contourpy 1.1.0
cryptography 41.0.3
cycler 0.11.0
Cython 3.0.0
dask 2023.7.1
debugpy 1.6.7
decorator 5.1.1
deprecation 2.1.0
dill 0.3.7
duet 0.2.9
exceptiongroup 1.1.2
executing 1.2.0
fonttools 4.41.1
fsspec 2023.6.0
future 0.18.3
google-api-core 2.11.1
google-auth 2.22.0
googleapis-common-protos 1.60.0
grpcio 1.56.2
grpcio-status 1.56.2
h5py 3.9.0
ibm-cloud-sdk-core 3.16.7
ibm-platform-services 0.38.0
idna 3.4
importlib-metadata 6.8.0
importlib-resources 6.0.0
iniconfig 2.0.0
ipykernel 6.25.0
ipython 8.14.0
jedi 0.19.0
jupyter_client 8.3.0
jupyter_core 5.3.1
kiwisolver 1.4.4
locket 1.0.0
matplotlib 3.7.2
matplotlib-inline 0.1.6
mpmath 1.3.0
mthree 2.5.1
nest-asyncio 1.5.7
networkx 3.1
ntlm-auth 1.5.0
numpy 1.23.5
orjson 3.9.2
packaging 23.1
pandas 2.0.3
parso 0.8.3
partd 1.4.0
pbr 5.11.1
PennyLane 0.31.0
PennyLane-IonQ 0.28.0
PennyLane-Lightning 0.31.0
PennyLane-qiskit 0.31.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 10.0.0
pip 23.2.1
platformdirs 3.10.0
pluggy 1.2.0
ply 3.11
prompt-toolkit 3.0.39
proto-plus 1.22.3
protobuf 4.23.4
psutil 5.9.5
ptyprocess 0.7.0
PubChemPy 1.0.4
pure-eval 0.2.2
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
Pygments 2.15.1
PyJWT 2.8.0
pyparsing 3.0.9
pytest 7.4.0
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0.1
pyzmq 25.1.0
qiskit 0.44.0
qiskit-ibm-provider 0.6.2
qiskit-ibm-runtime 0.11.2
qiskit-ibmq-provider 0.20.2
qiskit-terra 0.25.0
requests 2.31.0
requests-ntlm 1.1.0
rsa 4.9
rustworkx 0.13.1
scipy 1.10.0
semantic-version 2.10.0
setuptools 58.0.4
six 1.16.0
sortedcontainers 2.4.0
stack-data 0.6.2
stevedore 5.1.0
symengine 0.9.2
sympy 1.12
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
typing_extensions 4.7.1
tzdata 2023.3
urllib3 1.26.16
wcwidth 0.2.6
websocket-client 1.6.1
websockets 11.0.3
zipp 3.16.2

Hey @jnorambu!

I’m able to replicate the behaviour. Anytime you see unfavourable training metrics, it could be any one or combination of the multitude of things that affect a machine learning model’s performance. Here, I suspect the biggest issue is your choice of optimizer (gradient descent with a learning rate of 0.3). It looks like your model gets to an energy that’s around -0.136 and then starts to increase as a result of the updates being scaled by a large learning rate — in other words, you’re overshooting the result.

Adaptive methods (like ADAM) are great alternatives to ensure that learning rates get scaled properly so as to not overshoot the desired result. Here’s a good summary of common machine learning optimization techniques: https://aitechtrend.com/pytorch-optimizers-which-one-should-you-use-for-your-deep-learning-project/
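
For example, a minimal (untested) sketch of what I mean, only swapping the optimizer line and keeping the rest of your script unchanged:

# Adaptive learning rates instead of plain gradient descent (sketch; tweak stepsize as needed)
theta_optimizer = qml.AdamOptimizer(stepsize=0.01)

# The optimization loop itself stays exactly the same
theta = theta_optimizer.step(cost_function, theta)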

Let me know if this helps!

Hi Isaac, I tested the same generic gradient descent with smaller values of step_size (0.1, 0.15) and I got the same behaviour. I also tested with ADAM (the one implemented in PennyLane, qml.AdamOptimizer — PennyLane 0.31.1 documentation) with learning rates of 0.001 and 0.01 and nothing changed. Here are some plots:
[Plot: energy vs. optimization step]
This one is with ADAM and a learning rate of 0.01.

[Plot: energy vs. optimization step]
And this one is also with ADAM, but with a learning rate of 0.001.

I just changed the theta_optimizer variable to use qml.AdamOptimizer(stepsize=0.01), for example.

Interesting! My hypothesis was wrong then :nerd_face:.

I still stand by this:

Anytime you see unfavourable training metrics, it could be any one or combination of the multitude of things that affect a machine learning model’s performance.

This may include your ansatz, cost function, parameter initialization, etc. Your ansatz seems fine, but maybe your cost function is causing the issue. If you’re just trying to minimize the energy (i.e. the expectation value of the hydrogen Hamiltonian is the cost function), then your results should match what you get by simply doing that without Dask. So, something seems to be up with how you’re splitting up the expectation values of each term in the Hamiltonian, passing things to Dask, and then recombining everything at the end. I’m not familiar at all with Dask, so when it comes to debugging this, your guess is as good as mine :sweat_smile:.
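
One quick sanity check (a rough, untested sketch using the hamil, dev, and ansatz you already defined; full_energy is just a placeholder name): compare your Dask-based cost against the expectation value of the full Hamiltonian evaluated in a single QNode, with no grouping at all.

@qml.qnode(dev)
def full_energy(theta):
    # Sketch: same UCCSD ansatz, but measure the full Hamiltonian directly
    qml.UCCSD(theta, range(qubits), singles, doubles, begin_state)
    return qml.expval(hamil)

# If the grouping and recombination are correct, these two numbers should agree
print(cost_function(theta))
print(full_energy(theta))

If the two values differ, the problem is in how the groups are split and recombined; if they agree, the issue is somewhere else in the training loop.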

With that said, there are some native PennyLane multiprocessing features coming out in the next release :shushing_face:. If getting this example to work with Dask isn’t a time-sensitive priority, I might hold out and wait for the new PL release :slight_smile:.


Well, I will wait for the new release; for the moment the calculation will just take a little longer to complete.

Sounds good! We’ll let you know when it drops :grin: