Error when running Catalyst: no compiler found

Hi there,

I have installed PennyLane Catalyst in an Anaconda virtual environment on my university’s cluster. What is strange to me is that on some nodes Catalyst runs perfectly fine, but on other nodes I always face this error when trying to execute the code.


I am also facing another issue: when running Catalyst, JAX sometimes uses all of the CPU cores on the node, but at other times it only uses a single core. Why is this happening?

Thank you for the support.

Here is the version information for my PennyLane installation:

Name: PennyLane
Version: 0.35.0
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /work/acslab/users/baobach/anaconda3/envs/pennylane_catalyst/lib/python3.11/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane-Catalyst, PennyLane_Lightning

Platform info:           Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-glibc2.17
Python version:          3.11.7
Numpy version:           1.26.4
Scipy version:           1.12.0
Installed devices:
- nvidia.custatevec (PennyLane-Catalyst-0.5.0)
- nvidia.cutensornet (PennyLane-Catalyst-0.5.0)
- softwareq.qpp (PennyLane-Catalyst-0.5.0)
- default.clifford (PennyLane-0.35.0)
- default.gaussian (PennyLane-0.35.0)
- default.mixed (PennyLane-0.35.0)
- default.qubit (PennyLane-0.35.0)
- default.qubit.autograd (PennyLane-0.35.0)
- default.qubit.jax (PennyLane-0.35.0)
- default.qubit.legacy (PennyLane-0.35.0)
- default.qubit.tf (PennyLane-0.35.0)
- default.qubit.torch (PennyLane-0.35.0)
- default.qutrit (PennyLane-0.35.0)
- null.qubit (PennyLane-0.35.0)
- lightning.qubit (PennyLane_Lightning-0.35.0)

Hi @Bach_Bao! That is definitely odd that the compiler can be found on some nodes, but not others :thinking: Do you know if the nodes in the cluster might have differing environments/packages installed?

Hi @josh, to my knowledge the nodes do share the same environment and packages. I suspect the error comes from binutils/2.38, where crti.o and -lpthread are not found. The binutils/2.38 module is added to the environment when I use the cluster’s package manager to load the gcc 12.2 compiler.
Also, do you have any idea about the other issue? When running Catalyst, the program sometimes uses all of the CPU cores on the node, but at other times it only uses a single core.

Hi @Bach_Bao, sorry to hear about your issue. Catalyst uses the first available system compiler (in order: clang, gcc, c99, c89, cc) to link together the compiled user program and Catalyst’s runtime libraries.
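If it helps narrow things down, a quick way to check which of these compilers are actually visible on a given node (just a generic Python check, nothing Catalyst-specific) is something like:

import shutil

# Check which of the compilers Catalyst looks for are on this node's PATH;
# shutil.which returns None when a program cannot be found
for compiler in ["clang", "gcc", "c99", "c89", "cc"]:
    print(compiler, "->", shutil.which(compiler))

Running this on a node where Catalyst works and on one where it fails would tell us whether different compilers are being picked up.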

The relevant error messages for your issue are indeed the first two lines:

ld: cannot find crti.o
ld: cannot find -lpthread

So a compiler is definitely found, but the linker (ld) can’t find some of the required system libraries. My guess is there is a problem with the C toolchain installation on your system (normally included in most Linux distributions), but I’m not familiar with your particular environment.

Could you verify that gcc and associated development libraries (if available on your package manager) are indeed getting installed? It may also be worth verifying whether a separate pthread package is available.
Alternatively, you could try installing Clang (if available on your system) to see if that resolves the issue for you.
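If gcc is the compiler being picked up, you can also ask it directly where it resolves the files the linker complained about (a rough sketch that assumes gcc is on the PATH); if the output is just the bare file name rather than a full path, gcc cannot find that file:

import subprocess

# Ask gcc where it would look for the objects/libraries the linker reported
# missing. A bare name in the output (e.g. "crti.o") means gcc could not
# resolve it.
for name in ["crti.o", "libpthread.so"]:
    result = subprocess.run(
        ["gcc", "-print-file-name=" + name],
        capture_output=True,
        text=True,
    )
    print(name, "->", result.stdout.strip())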

If the relevant libraries are actually installed but just not found, it may be worth trying to point the library search path towards their location (e.g. export LD_LIBRARY_PATH=<path>).
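To find candidate directories to point the search path at, a rough filesystem search could look like the following (the patterns below are guesses for a typical Linux layout, so adjust them for your cluster):

import glob

# Search a few common system locations for the files the linker could not find
patterns = [
    "/usr/lib*/crti.o",
    "/usr/lib/*/crti.o",
    "/usr/lib*/libpthread.so*",
    "/usr/lib/*/libpthread.so*",
]
for pattern in patterns:
    for match in sorted(glob.glob(pattern)):
        print(match)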

One last thing that may help resolve the issue is inspecting the link command used by Catalyst. You can do this by enabling verbose mode in the qjit decorator (@qjit(verbose=True)), which should print something like the following near the end:

[SYSTEM]: gcc -shared ...
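For reference, a minimal sketch of what this could look like (the circuit and the lightning.qubit device below are just placeholders; your own qjit-compiled function works the same way):

import pennylane as qml
from catalyst import qjit

dev = qml.device("lightning.qubit", wires=2)

# verbose=True makes Catalyst print each compilation step, including the
# final link command issued to the system compiler
@qjit(verbose=True)
@qml.qnode(dev)
def circuit(theta):
    qml.RX(theta, wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

circuit(0.5)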

For this particular issue, would you mind sharing some additional details, for example about how you are observing this? An additional log or screen cap might be helpful here, since we would need to know which step or process is triggering this behaviour.

Hi David,
Currently I am executing this block to calculate the gradient for QAOA

import pennylane as qml
import catalyst
from catalyst import qjit

@qjit
def grad_loss(params):
    @qml.qnode(dev)
    def circuit(params):
        # Prepare the uniform superposition, then apply the QAOA layers
        qml.broadcast(qml.Hadamard, range(num_qubits), pattern="single")
        for i in range(n_layer):
            U_C(params[i])
            U_B(params[i + n_layer])
        return qml.expval(cost_h)
    # qml.draw_mpl(circuit)(params)

    # Objective function: cost expectation normalized by the number of edges
    def objective(params):
        num_edges = len(edges_set)
        return circuit(params) / num_edges

    return catalyst.grad(objective, argnum=0)(params)

What I am observing is that for 12 or more qubits, JAX utilizes all of my CPU cores (shown in the image below), while for fewer than 12 qubits JAX only uses a single core.

Hi @Bach_Bao, thank you for getting back to me with the extra bits of information. If you don’t mind, we have a few follow-up questions that would help us nail down the problem:

  • What device are you using (e.g. lightning.qubit or lightning.kokkos)?
  • There is a simple test you can run that would tell us whether the cores are being used during compilation or by the compiled program itself. Could you run the same function with the same inputs twice in a single Python session, and monitor the core usage during each call?
    @qjit
    def grad_loss(params):
        ...
    
    grad_loss(params) # monitor core usage
    # pause
    grad_loss(params) # monitor core usage again
    
    If the core usage is high during both calls, this implies that it is the compiled program that exhibits the behaviour during execution. If it is only high during the first call, it is the compilation stage that exhibits the behaviour.
  • The example snippet is very helpful. Would you be able to share the missing variables in the program so that we can run it ourselves? From what I see we would need: dev, n_layer, U_C, U_B, cost_h, edges_set, num_edges, and some example input for params; something along the lines of the sketch after this list would be plenty.
    (I can also understand if the code is sensitive, although without a reproducible example we can only provide limited advice.)
  • Lastly, a more high-level question about the problem: is the high core usage actually an issue for you, and if so, in what way?
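For illustration, here is one hypothetical set of definitions that would make the grad_loss snippet above runnable end to end. All names, values, and the graph below are placeholders, not assumptions about your actual problem:

import pennylane as qml
import jax.numpy as jnp

# Placeholder problem instance: a 4-node ring graph for MaxCut
num_qubits = 4
n_layer = 2
edges_set = [(0, 1), (1, 2), (2, 3), (3, 0)]

dev = qml.device("lightning.qubit", wires=num_qubits)

# MaxCut-style cost Hamiltonian: sum of Z_i Z_j over the edges
cost_h = qml.Hamiltonian(
    [1.0] * len(edges_set),
    [qml.PauliZ(i) @ qml.PauliZ(j) for i, j in edges_set],
)

def U_C(gamma):
    # Cost unitary: exp(-i * gamma * Z_i Z_j) on every edge
    for i, j in edges_set:
        qml.IsingZZ(2 * gamma, wires=[i, j])

def U_B(beta):
    # Mixer unitary: exp(-i * beta * X) on every qubit
    for w in range(num_qubits):
        qml.RX(2 * beta, wires=w)

# One gamma and one beta per layer, matching params[i] and params[i + n_layer]
params = jnp.full(2 * n_layer, 0.1)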