Qulacs device and qml.grad() timings

I want to run simulations with a larger number of qubits, so I realized that I might need to switch from the default qubit to some other device. Looking through the device options, I thought that Qulacs might suit my needs, so I tried it. However, I am quickly finding that qml.grad() seems significantly slower on this device. I ran an N = 16 qubit example where I simply plot the gradient to see how it looks, and a gradient calculation that took 6 seconds using backprop now takes 102.68 seconds. Using finite differences instead, the calculation is faster (3.55 seconds instead of ~5 seconds using the default device). I am not familiar with Qulacs, so I am wondering why this is. Is it because it is highly optimized for forward calls and not for gradient-calculations like the default qubit is? I also tried this with the lightning qubit, where I find similar trends (qml.grad() takes 84 seconds, finite differences take 2.72 seconds).

My goal is essentially to run MaxCut using QAOA for qubit counts of around 20+, with some classical post-processing using a neural network. Which qubit device would be recommended for such needs? Note that I also wish to calculate the gradient, either through backprop or other methods implemented in PennyLane, or using finite differences.

In some parts, the cost function is given by qml.expval(H), while in others the output is qml.sample(), which is post-processed by a neural network. I wish to optimize the parts with qml.expval() using the gradient functionality in PennyLane, while the qml.sample() part is handled through finite differences.

I tried to upload the Python file that reproduces the findings I noted, but I cannot seem to attach any files to the topic (it is a .py file, so I can't see why I would get any errors).

Any help in understanding why qml.grad() gives such different timings would be appreciated, so that I can choose the right device.

Hi @Viro, thanks for the question.

Is it because it is highly optimized for forward calls and not for gradient-calculations like the default qubit is?

I believe you have nailed the issue here. PennyLane’s built-in simulator can leverage backpropagation and hence can be much faster for pipelines that require gradients. Qulacs, from my last recollection, supports parameter-shift rule gradients only.

@Viro if you’d like to learn more about the performance differences between backprop and parameter-shift in the context of variational algorithms, you might find this demo interesting: https://pennylane.ai/qml/demos/tutorial_backprop.html
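
To make the cost difference a bit more concrete, here is a minimal pure-Python sketch of the parameter-shift rule on a toy one-qubit circuit (RX followed by RY, measuring Z). This is not PennyLane or Qulacs code: the helper names (expval_z, parameter_shift_grad) and the evaluation counter are made up for illustration. What it tries to show is that parameter-shift needs two circuit executions per parameter, so the gradient cost grows with the number of parameters, whereas backprop obtains all partial derivatives from roughly one forward and one backward pass.

```python
import math

def rx(theta):
    """2x2 matrix of a single-qubit RX rotation."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(theta):
    """2x2 matrix of a single-qubit RY rotation."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply(gate, state):
    """Multiply a 2x2 gate into a 2-component statevector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]

evaluations = 0  # illustrative counter of "circuit executions"

def expval_z(params):
    """Toy circuit: RX(params[0]) then RY(params[1]) on |0>, returning <Z>."""
    global evaluations
    evaluations += 1
    state = [1 + 0j, 0j]
    state = apply(rx(params[0]), state)
    state = apply(ry(params[1]), state)
    return abs(state[0]) ** 2 - abs(state[1]) ** 2

def parameter_shift_grad(f, params):
    """Gradient via the two-term shift rule: two evaluations per parameter."""
    grad = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += math.pi / 2
        minus[i] -= math.pi / 2
        grad.append((f(plus) - f(minus)) / 2)
    return grad

params = [0.4, 1.1]
grad = parameter_shift_grad(expval_z, params)
print("gradient:", grad)
print("circuit evaluations:", evaluations)  # 2 per parameter
```

For this toy circuit the expectation value is cos(a)·cos(b), so the result can be checked against the analytic gradient. With a QAOA circuit that has many parameters and 16+ qubits, each of those extra evaluations is a full statevector simulation, which is one plausible reason the timings diverge so much.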

In the meantime, you could also try out the lightning.qubit device. By default, this device will also use the parameter-shift rule, but if you are using shots=None, it also supports an optimized gradient method called the ‘adjoint’ method.

You can use it like so:

dev = qml.device("lightning.qubit", wires=2)

@qml.qnode(dev, diff_method="adjoint")
def circuit(weights):

Seems interesting. Hoping to give it a spin, I changed the diff_method to adjoint, but I get this error:

OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/

I also tried the suggested workaround of setting os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE", but this does not solve the issue and causes a crash. Is this a known issue, and how do I go about solving it?

Note that this issue does not appear when I don’t specify the diff_method.

Hi @Viro thanks for letting us know. Can I ask for some information that may help us identify the cause of this:

  • Can you provide us with a minimum working example of the script that replicates the issue?
  • Are you using a conda env or virtualenv for your Python environment?
  • Are you running on an M1 or Intel Mac?
  • Do you have brew installed, and if so, is clang or libomp installed through brew?

As I mentioned earlier, I cannot seem to attach any files for some weird reason, so I copy pasted the example. It is really short, so I hope it’s fine that I copy pasted it.

import pennylane as qml
from pennylane import numpy as np

dev = qml.device('lightning.qubit',wires = 2,shots = 10000)
devAnalytical = qml.device('lightning.qubit',wires = 2)

def circuit(params):
    qml.Hadamard(wires = 0)
    qml.RX(params,wires =  0)
    return qml.expval(qml.PauliZ(wires = 0))

waht = qml.QNode(circuit,devAnalytical,diff_method='adjoint')
params = np.array([0.4])

This does indeed replicate the issue I mentioned earlier.

I am running a conda environment. My packages are as follows:

I have an Intel Mac.

Apple clang version 13.0.0 (clang-1300.0.27.3)
Target: x86_64-apple-darwin21.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I think these are the stats that you are looking for? I'm not really all too familiar with all of this, hehe. As a side note, I'd like to mention that I have encountered similar issues when running Qiskit, and had to use os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" to get those simulations to run, so that might be an indicator as to where the issue is. I hope this is the information that you are looking for; let me know if anything else is missing.

Thanks @Viro
I suspect the issue here is that another package in your environment has already brought in libomp (or another OpenMP variant). Unfortunately, this seems to be a known issue on macOS, as you mention, so the best option is for us to find a mitigation strategy that works.

I think there are several options we can try that may help solve this:

  • Attempting to use the environment variable trick (but you say this does not help in your case).
  • Avoid using Conda for the python environment, and simply create a bare Python3 env using python3 -m venv pyenv && source ./pyenv/bin/activate along with pip to install all packages. This will ensure the PyPI builds of packages are brought in, and in many cases tend to be better supported than the conda versions. If a conda package is causing the OpenMP library issue, this one will likely solve that.
  • If you wish to stay with conda, maybe try creating a new conda environment and attempt to only install the bare minimum packages; in this case, conda install nomkl before any others may be required to favour use of OpenBLAS over MKL, which should avoid bringing in any additional OpenMP libraries. I suspect the culprit here may be libiomp5 being brought in through numpy, so this may help.
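
To narrow down which package is bringing in a second OpenMP runtime, a rough check like the one below might help. It is only a sketch using the standard library: the function name find_openmp_runtimes is made up, and matching on file names starting with libomp, libiomp or libgomp is a heuristic assumption, not an official diagnostic.

```python
import os

# File-name prefixes of common OpenMP runtimes (heuristic, assumed list)
OPENMP_PREFIXES = ("libomp", "libiomp", "libgomp")

def find_openmp_runtimes(root):
    """Return sorted paths under `root` whose file name looks like an OpenMP runtime."""
    hits = []
    # os.walk silently skips directories it cannot read
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().startswith(OPENMP_PREFIXES):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling find_openmp_runtimes(sys.prefix) (after import sys) inside the active environment and printing the result lists the candidate files. If more than one copy shows up, for example one under a package's own directory and one in the environment's lib folder, the parent directories give a hint about which packages are involved.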

Can you try these options and let us know if it helps?