Multi-core computation with lightning.qubit and OpenMP support

minn-bj · July 18, 2025, 9:19am

Hello there,

I searched a lot but cannot find a proper explanation of how multi-core parallelization works in specific situations. I am using the JAX interface in combination with catalyst.vmap and qml.qjit, as well as lightning.qubit with OpenMP support.

While testing different setups I noticed that the qml.probs measurement does not properly use multiple cores, whereas the measurement used in the demo "How to optimize a QML model using Catalyst and quantum just-in-time (QJIT) compilation" does:

qml.expval(qml.sum(*[qml.PauliZ(i) for i in range(n_wires)]))

When I use qml.probs as the measurement in that PennyLane demo, only one core runs at 100%. PennyLane opens other pods, but their usage stays in the range of ~2-8%.

Is there a way to truly parallelize over multiple cores with qml.probs or use multiple cores for gradient calculations? Which measurements can utilize multi-core parallelization?

A personal comment: As far as I can tell, such information can’t be found in the PennyLane documentation. Investigating this topic without proper documentation is very time-consuming and can be quite frustrating. I could imagine that many users are interested in this topic as the community starts to develop hybrid algorithms for real-world scenarios that often need more computational resources. Some additional documentation on this topic could be interesting and beneficial for the community.

Best regards

minn-bj

mlxd · July 18, 2025, 6:05pm

Hi @minn-bj

Thank you for the feedback on the documentation for the multicore execution.
We’ll chat internally about making this more visible in future.

On the topic of qml.probs: since this is a simple loop over a vector with limited compute per step, multithreading may actually hurt performance rather than help. It is equivalent to a BLAS-1-like operation, and such operations are inherently memory-bandwidth bound. Given the small size compared to, let’s say, a BLAS-2 call (matrix-vector) or a BLAS-3 call (matrix-matrix), serial execution (with implicit autovectorization) may be faster than spinning up threads.
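To make the "simple loop over a vector" point concrete, here is a minimal pure-Python sketch of what a probabilities kernel computes. It is not Lightning's implementation. The names `probs` and `marginal_probs` are made up for illustration, and wire 0 is assumed to be the most significant bit of the basis-state index.

```python
import math

def probs(state):
    """Full probability vector: one multiply-add per amplitude."""
    return [a.real * a.real + a.imag * a.imag for a in state]

def marginal_probs(state, n_wires, keep):
    """Probabilities on the wires in `keep`, summing out all other wires."""
    out = [0.0] * (2 ** len(keep))
    for index, amp in enumerate(state):
        # Collect the bits of `index` that belong to the kept wires
        # (ASSUMPTION: wire 0 is the most significant bit).
        sub = 0
        for wire in keep:
            bit = (index >> (n_wires - 1 - wire)) & 1
            sub = (sub << 1) | bit
        out[sub] += amp.real * amp.real + amp.imag * amp.imag
    return out

# Bell state (|00> + |11>) / sqrt(2) on two wires.
amp = complex(1 / math.sqrt(2))
bell = [amp, 0j, 0j, amp]
print(probs(bell))
print(marginal_probs(bell, n_wires=2, keep=[0]))
```

Each amplitude is read once and contributes a couple of floating-point operations, so the work per byte of memory traffic is very low. That is the sense in which the kernel behaves like a BLAS-1 operation.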

That being said, it is possible to explicitly force the use of OpenMP threading for the qml.probs kernels in LightningQubit by building with the options listed here. In this case, you’ll need to install Lightning from source and provide the additional CMake argument -DLQ_ENABLE_KERNEL_OMP=ON. To do this in an existing environment, you can make sure pip pulls the sdist release and pass the CMake arguments directly:

# regular install of pennylane and catalyst
python -m pip install pennylane pennylane-catalyst

# adapted install of LightningQubit with the new build
CMAKE_ARGS="-DLQ_ENABLE_KERNEL_OMP=ON" python -m pip \
     install pennylane_lightning --no-binary "pennylane_lightning" \
     --force-reinstall --no-cache-dir --verbose

While the above is definitely a mouthful, we tend to avoid turning this on by default for the reasons mentioned above. This will also enable multithreading support for all LightningQubit gates, which should thread over their application to the statevector. This can cause oversubscription when combined with multithreaded gradient workloads, so we designed the forward pass to be single-threaded, allowing execution to be optimal for batched workloads.
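One common way to limit oversubscription is to cap the OpenMP thread count so that the number of workers times the threads per worker stays within the core count. The sketch below uses the standard `OMP_NUM_THREADS` environment variable. The helper `threads_per_worker` and the 28-core, 4-worker numbers are hypothetical and only serve as an illustration.

```python
import os

def threads_per_worker(n_workers, n_cores=None):
    """Split the available cores evenly across workers (at least one thread each)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    return max(1, n_cores // n_workers)

# Example: 4 parallel gradient workers on a hypothetical 28-core machine.
omp_threads = threads_per_worker(n_workers=4, n_cores=28)

# OpenMP reads OMP_NUM_THREADS when its runtime starts, so set it before
# importing any library that uses OpenMP.
os.environ["OMP_NUM_THREADS"] = str(omp_threads)
print(os.environ["OMP_NUM_THREADS"])
```

Whether a split like this helps is workload dependent. It only guarantees that the thread count requested from OpenMP does not exceed the cores assigned to each worker.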

For HPC-specific installs, we will generally validate workloads on the machine and adapt the flags for said machines to yield the best runtime. Since this is a workload- and system-dependent process, it isn’t something we’ve exposed much yet (though we will aim to make this easier).

I’d anticipate the above to yield better performance in regimes where qml.probs is evaluated on very large statevectors, and running on machines with high memory bandwidth, such as AMD Epycs or Xeons. In that way, the extra threading can help to saturate the available memory bus, and potentially yield better performance than serial, but it is still very likely to be workload dependent.
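A back-of-the-envelope estimate shows why statevector size matters here. The sketch below assumes double-precision complex amplitudes (16 bytes each) and uses a placeholder bandwidth of 100 GB/s, which is not a measured value for any particular machine. The function names are hypothetical.

```python
BYTES_PER_AMPLITUDE = 16  # ASSUMPTION: complex128, two 8-byte floats

def statevector_bytes(n_qubits):
    """Memory needed for one double-precision statevector."""
    return BYTES_PER_AMPLITUDE * 2 ** n_qubits

def stream_seconds(n_qubits, bandwidth_gb_per_s):
    """Lower bound on the time to read the statevector once from memory."""
    return statevector_bytes(n_qubits) / (bandwidth_gb_per_s * 1e9)

# ASSUMPTION: 100 GB/s is a placeholder bandwidth, not a measured value.
for n in (16, 24, 30):
    size_mib = statevector_bytes(n) / 2 ** 20
    micros = stream_seconds(n, bandwidth_gb_per_s=100.0) * 1e6
    print(f"{n} qubits: {size_mib:.0f} MiB, >= {micros:.1f} us per pass")
```

Under these assumptions a 16-qubit statevector is 1 MiB and a single pass takes on the order of microseconds, a regime where thread start-up and synchronization can plausibly dominate. At 30 qubits the vector is 16 GiB, and one pass is long enough for extra threads to have a chance of paying off.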

As for where OpenMP acceleration is currently supported, the adjoint differentiation pipeline (turned on by default for LightningQubit) will scale across observables when using qml.expval. This scaling was used mostly with Hamiltonian expectation values, and offered a useful way to run parallel gradient evaluations. As this can spawn a lot of copies for large Hamiltonians, we also recommend the batch_obs keyword argument discussed here.

You can also try out lightning.kokkos (pip install pennylane-lightning-kokkos), which should offer full OpenMP execution on all representative operations by default on Linux machines, though it will have less efficient use of vectorization than lightning.qubit.

Feel free to try out the above, and let us know if it helps any, and we’ll do our best to improve the documentation of the above.

minn-bj · July 21, 2025, 8:51am

Hey @mlxd,

Thank you for your detailed answer. I will try to follow your instructions :). I have a further question regarding the potential loss of performance due to multi-core execution.

> I’d anticipate the above to yield better performance in regimes where qml.probs is evaluated on very large statevectors, and running on machines with high memory bandwidth, such as AMD Epycs or Xeons. In that way, the extra threading can help to saturate the available memory bus, and potentially yield better performance than serial, but it is still very likely to be workload dependent.

I want to execute a 16-qubit quantum circuit with an intermediate number of gates, a batch size of ~1000 and roughly 6-10 readout qubits for qml.probs. From my perspective this could be an attractive use case for multi-core execution if the forward and backward passes can be parallelized. Would you agree, or did I misunderstand something? Are both passes in fact parallelizable?

My hardware: 2x Intel Xeon Gold 6132 CPUs and ~100 GB of RAM (in principle up to 300 GB if necessary). In addition, I have access to 2 NVIDIA V100 16GB GPUs.

What would be your best guess for how to utilize this hardware and optimize performance?
(I already gave lightning.gpu a chance, but it is surprisingly slow compared to lightning.qubit.) Is that behavior expected even with large batch sizes?

Best regards

minn-bj

minn-bj · July 21, 2025, 3:00pm

Here are some benchmark results for a circuit with 36 gates.
Only the device changes between runs.

N=12 Qubits
Batch_size=1000

Single Core JIT lightning.qubit:

Forward: 0.16 sec
Update params: 6.3 sec
100% usage

JIT lightning.gpu:

Forward: 3.9 sec
Update params: 146.8 sec
50% usage
warning: 2025-07-21 14:21:21.104025: W external/xla/xla/service/gpu/nvptx_compiler.cc:760] The NVIDIA driver’s CUDA version is 12.4 which is older than the ptxas CUDA version (12.9.86). Because the driver is older than the ptxas version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.

N=12 Qubits
Batch_size=500

Single Core JIT lightning.qubit:

Forward: 0.06 sec
Update params: 2.8 sec
100% usage

JIT lightning.gpu:

Forward: 1.9 sec
Update params: 79.0 sec
50% usage
warning: 2025-07-21 14:21:21.104025: W external/xla/xla/service/gpu/nvptx_compiler.cc:760] The NVIDIA driver’s CUDA version is 12.4 which is older than the ptxas CUDA version (12.9.86). Because the driver is older than the ptxas version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.

CatalinaAlbornoz · July 22, 2025, 5:20pm

Hi @minn-bj ,

I noticed a few things.

On one hand, 16 qubits is below our recommendation for lightning.gpu. It’s normal to see lightning.qubit performing better than lightning.gpu when using fewer than 20 qubits. You can find this info and more in our performance page.

On the other hand, you have a warning saying that your NVIDIA driver is old, meaning there could be slowdowns. This could further explain the performance difference.

Finally, 12 qubits with 36 gates shouldn’t be a computationally intensive problem, but having thousands of datapoints is. Every time you try to load new data you have overheads so this looks to be the core of your speed problems. Quantum computers are known to be bad at handling large amounts of input data and parameters, so the solution here is to limit yourself to only a few datapoints.

I hope this helps!

minn-bj · July 23, 2025, 8:08am

Hey @CatalinaAlbornoz,

> On one hand, 16 qubits is below our recommendation for lightning.gpu. It’s normal to see lightning.qubit performing better than lightning.gpu when using fewer than 20 qubits. You can find this info and more in our performance page.

Yeah, I already thought that this could be an issue. My hope was that lightning.gpu would parallelize the computation efficiently since the batch size is quite large.

> On the other hand, you have a warning saying that your NVIDIA driver is old, meaning there could be slowdowns. This could further explain the performance difference.

I am already working on this; I just thought you might have some experience with it.

> Finally, 12 qubits with 36 gates shouldn’t be a computationally intensive problem, but having thousands of datapoints is. Every time you try to load new data you have overheads so this looks to be the core of your speed problems. Quantum computers are known to be bad at handling large amounts of input data and parameters, so the solution here is to limit yourself to only a few datapoints.

Unfortunately, limiting the input data is (at the moment) not really possible. Anyway, I was wondering why default.qubit (torch interface, executed on a GPU) can be quite fast (Forward: 0.064 sec, Update params: 0.22 sec) while the other GPU-accelerated devices like lightning.gpu and lightning.kokkos on a GPU are much slower. My guess was that this is related to the adjoint diff_method (lightning.gpu and lightning.kokkos) as opposed to the backprop diff_method (default.qubit). Is that correct? Is there a faster alternative to default.qubit on a GPU for the scenario explained above (maybe a JAX version where one can use backprop and jit the execution on a GPU)? It is quite hard to guess which simulator is best for my scenario since I don’t really know what happens inside the different simulators.

I also noticed that the update time (gradient calculation) increases approximately linearly with the batch size. Is this behavior expected?

mlxd · July 23, 2025, 1:49pm

Hi @minn-bj

To answer your questions above: for your use cases, with moderate qubit counts and batch dimensions, the default.qubit pipelines will almost always be faster. The Lightning devices are generally optimised for HPC (large qubit counts, very deep circuits), and so do not have batch execution built into the backends for gates, as would be the case in a classical ML context.

We do this because each new batch dimension requires an additional statevector. Since the lightning.gpu device is built for HPC workloads where memory availability is the largest constraint, we iterate over the batch dimension to avoid creating intermediate statevectors where possible. The adjoint diff gradient pipeline requires some copies here, but they should be limited, and still more efficient than a backpropagation pipeline’s memory use beyond a 20 qubit threshold.
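The memory trade-off described above can be sketched with simple arithmetic. The function below counts only the statevectors that are alive at the same time and ignores the extra copies needed by gradient pipelines. The names are hypothetical, and 16 bytes per amplitude (double-precision complex) is an assumption.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Size of one statevector (ASSUMPTION: complex128 amplitudes)."""
    return bytes_per_amplitude * 2 ** n_qubits

def peak_bytes(n_qubits, batch_size, batched):
    """Statevector memory held at once: all samples if batched, else one."""
    live = batch_size if batched else 1
    return live * statevector_bytes(n_qubits)

for n_qubits in (16, 30):
    for batched in (True, False):
        gib = peak_bytes(n_qubits, batch_size=1000, batched=batched) / 2 ** 30
        mode = "batched" if batched else "iterated"
        print(f"{n_qubits} qubits, batch 1000, {mode}: {gib:.3f} GiB")
```

With these assumptions, holding 1000 statevectors of 16 qubits at once costs roughly 1 GiB, which is modest, while doing the same at 30 qubits would need thousands of GiB. Iterating keeps the footprint at a single statevector at the price of run time that grows with the batch size.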

We may consider batching support for smaller qubit registers in future, but for now this isn’t part of our plans.

In this scenario, if you want to execute in the 10-16 qubit regime with batch dimensions, using default.qubit with the jax or torch backends and the backpropagation gradient pipeline would likely be the best pipeline for your needs, as this should offer the best performance.
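The difference between iterating over the batch and batching inside the simulator can be illustrated with a small NumPy sketch. It is not how default.qubit or the Lightning devices are implemented. It only simulates a toy circuit, one RY rotation per wire applied to |0...0>, and the function names are hypothetical.

```python
import numpy as np

def probs_single(angles):
    """Probabilities after RY(angles[i]) on wire i of |0...0>, one sample."""
    state = np.ones(1)
    for theta in angles:
        qubit = np.array([np.cos(theta / 2), np.sin(theta / 2)])
        state = np.kron(state, qubit)  # wire 0 ends up as the leading bit
    return state ** 2  # amplitudes are real for this circuit

def probs_batched(data):
    """Same circuit for a whole (batch, n_wires) array in one vectorized pass."""
    batch, n_wires = data.shape
    state = np.ones((batch, 1))
    for i in range(n_wires):
        half = data[:, i] / 2
        qubit = np.stack([np.cos(half), np.sin(half)], axis=1)
        # Row-wise Kronecker product: (batch, d) x (batch, 2) -> (batch, 2d)
        state = (state[:, :, None] * qubit[:, None, :]).reshape(batch, -1)
    return state ** 2

rng = np.random.default_rng(0)
data = rng.random((100, 8))

looped = np.stack([probs_single(row) for row in data])  # iterate over the batch
batched = probs_batched(data)                           # batch inside the kernel
print(np.allclose(looped, batched), batched.shape)
```

Both versions return the same probabilities. The batched one pays the per-gate overhead once for all samples and holds the whole batch of statevectors in memory, which is the trade-off that favours batched backends at small qubit counts.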

minn-bj · July 25, 2025, 9:00am

I just gave it a try, thanks again. Currently it uses multiple cores and works properly without jit and vmap.

> # regular install of pennylane and catalyst
> python -m pip install pennylane pennylane-catalyst
> # adapted install of LightningQubit with the new build
> CMAKE_ARGS="-DLQ_ENABLE_KERNEL_OMP=ON" python -m pip \
>      install pennylane_lightning --no-binary "pennylane_lightning" \
>      --force-reinstall --no-cache-dir --verbose

Unfortunately, it’s not possible to jit or vmap the computation since an issue with the compiler occurs. Do you have some more information on this specific error?

Error:

---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/catalyst/compiler.py:444, in Compiler.run_from_ir(self, ir, module_name, workspace)
    443     print(f"[SYSTEM] {' '.join(cmd)}", file=self.options.logfile)
--> 444 result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    445 if self.options.verbose or os.getenv("ENABLE_DIAGNOSTICS"):

File /opt/conda/lib/python3.11/subprocess.py:571, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    570     if check and retcode:
--> 571         raise CalledProcessError(retcode, process.args,
    572                                  output=stdout, stderr=stderr)
    573 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['/opt/conda/bin/catalyst', '-o', '/tmp/grad.loss_fn4z94rjgu/grad.loss_fn.ll', '--module-name', 'grad.loss_fn', '--workspace', '/tmp/grad.loss_fn4z94rjgu', '-verify-each=false', '--catalyst-pipeline', 'EnforceRuntimeInvariantsPass(split-multiple-tapes;builtin.module(apply-transform-sequence);inline-nested-module),HLOLoweringPass(canonicalize;func.func(chlo-legalize-to-hlo);stablehlo-legalize-to-hlo;func.func(mhlo-legalize-control-flow);func.func(hlo-legalize-to-linalg);func.func(mhlo-legalize-to-std);func.func(hlo-legalize-sort);convert-to-signless;canonicalize;scatter-lowering;hlo-custom-call-lowering;cse;func.func(linalg-detensorize{aggressive-mode});detensorize-scf;canonicalize),QuantumCompilationPass(annotate-function;lower-mitigation;lower-gradients;adjoint-lowering),BufferizationPass(one-shot-bufferize{dialect-filter=memref};inline;gradient-preprocess;gradient-bufferize;scf-bufferize;convert-tensor-to-linalg;convert-elementwise-to-linalg;arith-bufferize;empty-tensor-to-alloc-tensor;func.func(bufferization-bufferize);func.func(tensor-bufferize);catalyst-bufferize;func.func(linalg-bufferize);func.func(tensor-bufferize);quantum-bufferize;func-bufferize;func.func(finalizing-bufferize);canonicalize;gradient-postprocess;func.func(buffer-hoisting);func.func(buffer-loop-hoisting);func.func(buffer-deallocation);convert-arraylist-to-memref;convert-bufferization-to-memref;canonicalize;cp-global-memref),MLIRToLLVMDialect(expand-realloc;convert-gradient-to-llvm;memrefcpy-to-linalgcpy;func.func(convert-linalg-to-loops);convert-scf-to-cf;expand-strided-metadata;lower-affine;arith-expand;convert-complex-to-standard;convert-complex-to-llvm;convert-math-to-llvm;convert-math-to-libm;convert-arith-to-llvm;memref-to-llvm-tbaa;finalize-memref-to-llvm{use-generic-functions};convert-index-to-llvm;convert-catalyst-to-llvm;convert-quantum-to-llvm;emit-catalyst-py-interface;canonicalize;reconcile-unrealized-casts;gep-inbounds;register-inactive-callback),', 
'/tmp/grad.loss_fn4z94rjgu/tmpjp6o4e1l.mlir']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

CompileError                              Traceback (most recent call last)
Cell In[7], line 135
    131     grad = catalyst.grad(loss_fn,  method="fd")
    133 ########################################################
--> 135 print(grad(params, data)['weights'].shape)
    138 def update_step(params, opt_state, opt, data):
    139     grads = grad(params, data)

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/jit.py:594, in QJIT.__call__(self, *args, **kwargs)
    590         kwargs = {"static_argnums": self.compile_options.static_argnums, **kwargs}
    592     return self.user_function(*args, **kwargs)
--> 594 requires_promotion = self.jit_compile(args, **kwargs)
    596 # If we receive tracers as input, dispatch to the JAX integration.
    597 if any(isinstance(arg, jax.core.Tracer) for arg in tree_flatten(args)[0]):

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/jit.py:670, in QJIT.jit_compile(self, args, **kwargs)
    667     self.jaxpr, self.out_type, self.out_treedef, self.c_sig = self.capture(args, **kwargs)
    669     self.mlir_module = self.generate_ir()
--> 670     self.compiled_function, _ = self.compile()
    672     self.fn_cache.insert(self.compiled_function, args, self.out_treedef, self.workspace)
    674 elif self.compiled_function is not cached_fn.compiled_fn:
    675     # Restore active state from cache.

File ~/.local/lib/python3.11/site-packages/catalyst/debug/instruments.py:145, in instrument.<locals>.wrapper(*args, **kwargs)
    142 @functools.wraps(fn)
    143 def wrapper(*args, **kwargs):
    144     if not InstrumentSession.active:
--> 145         return fn(*args, **kwargs)
    147     with ResultReporter(stage_name, has_finegrained) as reporter:
    148         self = args[0]

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/jit.py:799, in QJIT.compile(self)
    793     shared_object, llvm_ir = self.compiler.run_from_ir(
    794         self.overwrite_ir,
    795         str(self.mlir_module.operation.attributes["sym_name"]).replace('"', ""),
    796         self.workspace,
    797     )
    798 else:
--> 799     shared_object, llvm_ir = self.compiler.run(self.mlir_module, self.workspace)
    801 compiled_fn = CompiledFunction(
    802     shared_object, func_name, restype, self.out_type, self.compile_options
    803 )
    805 return compiled_fn, llvm_ir

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/compiler.py:486, in Compiler.run(self, mlir_module, *args, **kwargs)
    470 @debug_logger
    471 def run(self, mlir_module, *args, **kwargs):
    472     """Compile an MLIR module to a shared object.
    473 
    474     .. note::
   (...)    483         (str): filename of shared object
    484     """
--> 486     return self.run_from_ir(
    487         mlir_module.operation.get_asm(
    488             binary=False, print_generic_op_form=False, assume_verified=True
    489         ),
    490         str(mlir_module.operation.attributes["sym_name"]).replace('"', ""),
    491         *args,
    492         **kwargs,
    493     )

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/compiler.py:451, in Compiler.run_from_ir(self, ir, module_name, workspace)
    449             print(result.stderr.strip(), file=self.options.logfile)
    450 except subprocess.CalledProcessError as e:  # pragma: nocover
--> 451     raise CompileError(f"catalyst failed with error code {e.returncode}: {e.stderr}") from e
    453 if os.path.exists(output_ir_name):
    454     with open(output_ir_name, "r", encoding="utf-8") as f:

CompileError: catalyst failed with error code 1: Compilation failed:
grad.loss_fn:221:7: error: 'quantum.device' op requires attribute 'device_name'
      "quantum.device"(%0) {kwargs = "{'shots': 0, 'mcmc': False, 'num_burnin': 0, 'kernel_name': None}", lib = "/home/jovyan/.local/lib/python3.11/site-packages/pennylane_lightning/liblightning_qubit_catalyst.so", name = "LightningSimulator"} : (i64) -> ()
      ^
grad.loss_fn:221:7: note: see current operation: "quantum.device"(%1) <{kwargs = "{'shots': 0, 'mcmc': False, 'num_burnin': 0, 'kernel_name': None}", lib = "/home/jovyan/.local/lib/python3.11/site-packages/pennylane_lightning/liblightning_qubit_catalyst.so"}> {name = "LightningSimulator"} : (i64) -> ()
grad.loss_fn: grad.loss_fn:1:8: error: expected 'module asm'
module @grad.loss_fn {
       ^
Failed to parse module as LLVM or MLIR source

CatalinaAlbornoz · July 25, 2025, 9:37pm

Thanks for sharing this @minn-bj .
We’ll take a look and get back to you next week.

CatalinaAlbornoz · July 28, 2025, 2:54pm

Hi @minn-bj ,

Can you please share a minimal reproducible example so that we can try to reproduce this issue? It looks like there’s an issue with the loss function.
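When preparing such a report it can help to include the installed package versions. In the traceback above, the `catalyst` executable lives under /opt/conda/bin while the Python packages are loaded from ~/.local; whether that is related to the error is not established in this thread. The sketch below gathers versions using only the standard library. The package list is an assumption about what is relevant, and `qml.about()` prints similar information.

```python
import shutil
import sys
from importlib import metadata

# ASSUMPTION: these distributions are the relevant ones for this report.
PACKAGES = (
    "pennylane",
    "pennylane-lightning",
    "pennylane-catalyst",
    "jax",
    "jaxlib",
)

def collect_versions(packages=PACKAGES):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

report = collect_versions()
report["python"] = sys.version.split()[0]
# Path of the `catalyst` executable found on PATH, if any.
report["catalyst executable"] = shutil.which("catalyst") or "not found"

for key, value in report.items():
    print(f"{key}: {value}")
```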

minn-bj · July 29, 2025, 8:07am

Hey @CatalinaAlbornoz.

Here is a minimal example and the corresponding error message. The error message is different since the script executes the circuit, not the gradient. Thank you for your support.

import os
os.environ["JAX_PLATFORMS"] = "cpu"
os.environ["OMP_PROC_BIND"] = 'spread'
os.environ["OMP_PLACES"] = "threads"
import pennylane as qml
import jax
import catalyst

n_wires = 8
batch_size=100

data=qml.numpy.random.rand(batch_size, n_wires)

dev = qml.device("lightning.qubit", wires=n_wires, batch_obs=True)

@qml.qnode(dev, interface='jax')
def circuit(data):
    
    @qml.for_loop(0,n_wires,1)
    def encode(i):
        qml.RY(data[i], wires=i)
    
    encode()
    return qml.probs(wires=dev.wires)#[qml.expval(qml.PauliZ(i)) for i in range(n_wires)]


circuit = catalyst.vmap(circuit, in_axes=(0))
circuit = qml.qjit(circuit, verbose=True)

circuit(data)

error msg:

---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/catalyst/compiler.py:444, in Compiler.run_from_ir(self, ir, module_name, workspace)
    443     print(f"[SYSTEM] {' '.join(cmd)}", file=self.options.logfile)
--> 444 result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    445 if self.options.verbose or os.getenv("ENABLE_DIAGNOSTICS"):

File /opt/conda/lib/python3.11/subprocess.py:571, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    570     if check and retcode:
--> 571         raise CalledProcessError(retcode, process.args,
    572                                  output=stdout, stderr=stderr)
    573 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['/opt/conda/bin/catalyst', '-o', '/tmp/vmap.circuit45ibaj4b/vmap.circuit.ll', '--module-name', 'vmap.circuit', '--workspace', '/tmp/vmap.circuit45ibaj4b', '-verify-each=false', '--catalyst-pipeline', 'EnforceRuntimeInvariantsPass(split-multiple-tapes;builtin.module(apply-transform-sequence);inline-nested-module),HLOLoweringPass(canonicalize;func.func(chlo-legalize-to-hlo);stablehlo-legalize-to-hlo;func.func(mhlo-legalize-control-flow);func.func(hlo-legalize-to-linalg);func.func(mhlo-legalize-to-std);func.func(hlo-legalize-sort);convert-to-signless;canonicalize;scatter-lowering;hlo-custom-call-lowering;cse;func.func(linalg-detensorize{aggressive-mode});detensorize-scf;canonicalize),QuantumCompilationPass(annotate-function;lower-mitigation;lower-gradients;adjoint-lowering),BufferizationPass(one-shot-bufferize{dialect-filter=memref};inline;gradient-preprocess;gradient-bufferize;scf-bufferize;convert-tensor-to-linalg;convert-elementwise-to-linalg;arith-bufferize;empty-tensor-to-alloc-tensor;func.func(bufferization-bufferize);func.func(tensor-bufferize);catalyst-bufferize;func.func(linalg-bufferize);func.func(tensor-bufferize);quantum-bufferize;func-bufferize;func.func(finalizing-bufferize);canonicalize;gradient-postprocess;func.func(buffer-hoisting);func.func(buffer-loop-hoisting);func.func(buffer-deallocation);convert-arraylist-to-memref;convert-bufferization-to-memref;canonicalize;cp-global-memref),MLIRToLLVMDialect(expand-realloc;convert-gradient-to-llvm;memrefcpy-to-linalgcpy;func.func(convert-linalg-to-loops);convert-scf-to-cf;expand-strided-metadata;lower-affine;arith-expand;convert-complex-to-standard;convert-complex-to-llvm;convert-math-to-llvm;convert-math-to-libm;convert-arith-to-llvm;memref-to-llvm-tbaa;finalize-memref-to-llvm{use-generic-functions};convert-index-to-llvm;convert-catalyst-to-llvm;convert-quantum-to-llvm;emit-catalyst-py-interface;canonicalize;reconcile-unrealized-casts;gep-inbounds;register-inactive-callback),', 
'--verbose', '/tmp/vmap.circuit45ibaj4b/tmpfpiv5329.mlir']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

CompileError                              Traceback (most recent call last)
Cell In[1], line 36
     33 circuit = catalyst.vmap(circuit, in_axes=(0))
     34 circuit = qml.qjit(circuit, verbose=True)
---> 36 circuit(data)

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/jit.py:594, in QJIT.__call__(self, *args, **kwargs)
    590         kwargs = {"static_argnums": self.compile_options.static_argnums, **kwargs}
    592     return self.user_function(*args, **kwargs)
--> 594 requires_promotion = self.jit_compile(args, **kwargs)
    596 # If we receive tracers as input, dispatch to the JAX integration.
    597 if any(isinstance(arg, jax.core.Tracer) for arg in tree_flatten(args)[0]):

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/jit.py:670, in QJIT.jit_compile(self, args, **kwargs)
    667     self.jaxpr, self.out_type, self.out_treedef, self.c_sig = self.capture(args, **kwargs)
    669     self.mlir_module = self.generate_ir()
--> 670     self.compiled_function, _ = self.compile()
    672     self.fn_cache.insert(self.compiled_function, args, self.out_treedef, self.workspace)
    674 elif self.compiled_function is not cached_fn.compiled_fn:
    675     # Restore active state from cache.

File ~/.local/lib/python3.11/site-packages/catalyst/debug/instruments.py:145, in instrument.<locals>.wrapper(*args, **kwargs)
    142 @functools.wraps(fn)
    143 def wrapper(*args, **kwargs):
    144     if not InstrumentSession.active:
--> 145         return fn(*args, **kwargs)
    147     with ResultReporter(stage_name, has_finegrained) as reporter:
    148         self = args[0]

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/jit.py:799, in QJIT.compile(self)
    793     shared_object, llvm_ir = self.compiler.run_from_ir(
    794         self.overwrite_ir,
    795         str(self.mlir_module.operation.attributes["sym_name"]).replace('"', ""),
    796         self.workspace,
    797     )
    798 else:
--> 799     shared_object, llvm_ir = self.compiler.run(self.mlir_module, self.workspace)
    801 compiled_fn = CompiledFunction(
    802     shared_object, func_name, restype, self.out_type, self.compile_options
    803 )
    805 return compiled_fn, llvm_ir

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/compiler.py:486, in Compiler.run(self, mlir_module, *args, **kwargs)
    470 @debug_logger
    471 def run(self, mlir_module, *args, **kwargs):
    472     """Compile an MLIR module to a shared object.
    473 
    474     .. note::
   (...)    483         (str): filename of shared object
    484     """
--> 486     return self.run_from_ir(
    487         mlir_module.operation.get_asm(
    488             binary=False, print_generic_op_form=False, assume_verified=True
    489         ),
    490         str(mlir_module.operation.attributes["sym_name"]).replace('"', ""),
    491         *args,
    492         **kwargs,
    493     )

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/compiler.py:451, in Compiler.run_from_ir(self, ir, module_name, workspace)
    449             print(result.stderr.strip(), file=self.options.logfile)
    450 except subprocess.CalledProcessError as e:  # pragma: nocover
--> 451     raise CompileError(f"catalyst failed with error code {e.returncode}: {e.stderr}") from e
    453 if os.path.exists(output_ir_name):
    454     with open(output_ir_name, "r", encoding="utf-8") as f:

CompileError: catalyst failed with error code 1: Compilation failed:
vmap.circuit:108:7: error: 'quantum.device' op requires attribute 'device_name'
      "quantum.device"(%0) {kwargs = "{'shots': 0, 'mcmc': False, 'num_burnin': 0, 'kernel_name': None}", lib = "/home/jovyan/.local/lib/python3.11/site-packages/pennylane_lightning/liblightning_qubit_catalyst.so", name = "LightningSimulator"} : (i64) -> ()
      ^
vmap.circuit:108:7: note: diagnostic emitted with trace:
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  catalyst  0x000000000ab8194b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 43
1  catalyst  0x000000000a9a95d0
2  catalyst  0x000000000a9a9761 mlir::emitError(mlir::Location, llvm::Twine const&) + 17
3  catalyst  0x000000000aa00e19 mlir::Operation::emitError(llvm::Twine const&) + 25
4  catalyst  0x000000000aa01164 mlir::Operation::emitOpError(llvm::Twine const&) + 52
5  catalyst  0x000000000aa026ac mlir::OpState::emitOpError(llvm::Twine const&) + 12
6  catalyst  0x000000000646704f catalyst::quantum::DeviceInitOp::verifyInvariantsImpl() + 175
7  catalyst  0x00000000064371ff
8  catalyst  0x0000000006440fbc
9  catalyst  0x000000000aa304cf
10 catalyst  0x000000000aa31b56
11 catalyst  0x000000000aa2f8f0
12 catalyst  0x000000000aa31b56
13 catalyst  0x000000000aa2f8f0
14 catalyst  0x000000000aa31b56
15 catalyst  0x000000000aa2f8f0
16 catalyst  0x000000000aa32bc4 mlir::verify(mlir::Operation*, bool) + 36
17 catalyst  0x0000000009aaeed2 mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) + 5458
18 catalyst  0x0000000003ddff9d
19 catalyst  0x0000000003de83f7 QuantumDriverMain(catalyst::driver::CompilerOptions const&, catalyst::driver::CompilerOutput&, mlir::DialectRegistry&) + 1095
20 catalyst  0x0000000003ded623 QuantumDriverMainFromCL(int, char**) + 10771
21 libc.so.6 0x00007f941b64cd90
22 libc.so.6 0x00007f941b64ce40 __libc_start_main + 128
23 catalyst  0x0000000003dc75ae _start + 46

vmap.circuit:108:7: note: see current operation: "quantum.device"(%1) <{kwargs = "{'shots': 0, 'mcmc': False, 'num_burnin': 0, 'kernel_name': None}", lib = "/home/jovyan/.local/lib/python3.11/site-packages/pennylane_lightning/liblightning_qubit_catalyst.so"}> {name = "LightningSimulator"} : (i64) -> ()
vmap.circuit: vmap.circuit:1:8: error: expected 'module asm'
module @vmap.circuit {
       ^
Failed to parse module as LLVM or MLIR source

CatalinaAlbornoz · July 30, 2025, 9:40pm

Thanks for adding your code @minn-bj . We will investigate and get back to you.

maliasadi · July 31, 2025, 8:08pm

Hi @minn-bj, Thanks for providing the example! You should be able to QJIT compile and execute it on Lightning devices using proper versions of PennyLane, Catalyst, and Lightning.

From the error, it appears you’re using a development version of Catalyst with some incompatibilities in the compiler. If you installed Catalyst from source, please ensure you’ve carefully followed the installation guidelines; you’ll need to run make all after installing the dependencies.

Otherwise, I suggest creating a new environment and trying the following commands to pull and install the official versions of these packages from PyPI:

python -m pip install pennylane pennylane-catalyst
CMAKE_ARGS="-DLQ_ENABLE_KERNEL_OMP=ON" python -m pip install pennylane_lightning --no-binary "pennylane_lightning" --force-reinstall --no-cache-dir --verbose

You can also run this to check the installed versions:

python -c "import pennylane as qml; qml.about()"

Note that PennyLane and Lightning v0.41 and v0.42 require Catalyst v0.11 and v0.12, respectively.
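As a rough illustration of the version pairing above, the check can be scripted. This is only a sketch: the table holds just the two pairings quoted in this thread, the helper names (`minor`, `catalyst_matches`, `installed_version`) are made up for the example, and the distribution names passed to `importlib.metadata` are assumed to be the PyPI names `pennylane` and `pennylane-catalyst`.

```python
from importlib import metadata

# Pairings quoted in this thread: PennyLane/Lightning minor -> Catalyst minor.
# Other releases are deliberately left out rather than guessed.
REQUIRED_CATALYST = {"0.41": "0.11", "0.42": "0.12"}


def minor(version):
    """Reduce a version such as '0.41.1' to its 'major.minor' part ('0.41')."""
    return ".".join(version.split(".")[:2])


def catalyst_matches(pennylane_version, catalyst_version):
    """Return True/False for a known pairing, or None if the pairing is not in the table."""
    expected = REQUIRED_CATALYST.get(minor(pennylane_version))
    if expected is None:
        return None
    return minor(catalyst_version) == expected


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    pl = installed_version("pennylane")
    cat = installed_version("pennylane-catalyst")
    if pl is None or cat is None:
        print(f"Not installed here: pennylane={pl}, pennylane-catalyst={cat}")
    else:
        print(f"PennyLane {pl} with Catalyst {cat}: match={catalyst_matches(pl, cat)}")
```

A result of `None` only means the installed release is outside the two pairings listed here, not that the combination is wrong.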

If these suggestions don’t resolve your issue, feel free to report back with the versions you’ve tried.

minn-bj · August 5, 2025, 3:35pm

Hello @maliasadi ,

Thanks for your reply. I followed your instructions but still have the same issue. If you need more information than the qml.about() output and the error message below, feel free to ask.

qml.about()

Name: PennyLane
Version: 0.41.1
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /home/jovyan/.local/lib/python3.11/site-packages
Requires: appdirs, autograd, autoray, cachetools, diastatic-malt, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, tomlkit, typing-extensions
Required-by: PennyLane-Catalyst, PennyLane_Lightning, PennyLane_Lightning_GPU, PennyLane_Lightning_Kokkos

Platform info:           Linux-5.15.0-142-generic-x86_64-with-glibc2.35
Python version:          3.11.10
Numpy version:           2.3.2
Scipy version:           1.16.1
Installed devices:
- nvidia.custatevec (PennyLane-Catalyst-0.11.0)
- nvidia.cutensornet (PennyLane-Catalyst-0.11.0)
- oqc.cloud (PennyLane-Catalyst-0.11.0)
- softwareq.qpp (PennyLane-Catalyst-0.11.0)
- lightning.gpu (PennyLane_Lightning_GPU-0.41.1)
- default.clifford (PennyLane-0.41.1)
- default.gaussian (PennyLane-0.41.1)
- default.mixed (PennyLane-0.41.1)
- default.qubit (PennyLane-0.41.1)
- default.qutrit (PennyLane-0.41.1)
- default.qutrit.mixed (PennyLane-0.41.1)
- default.tensor (PennyLane-0.41.1)
- null.qubit (PennyLane-0.41.1)
- reference.qubit (PennyLane-0.41.1)
- lightning.qubit (PennyLane_Lightning-0.41.1)
- lightning.kokkos (PennyLane_Lightning_Kokkos-0.41.1)

Error:

[LIB] Running compiler driver in /tmp/vmap.circuitudc846z1
[SYSTEM] /opt/conda/bin/catalyst -o /tmp/vmap.circuitudc846z1/vmap.circuit.ll --module-name vmap.circuit --workspace /tmp/vmap.circuitudc846z1 -verify-each=false --catalyst-pipeline EnforceRuntimeInvariantsPass(split-multiple-tapes;builtin.module(apply-transform-sequence);inline-nested-module),HLOLoweringPass(canonicalize;func.func(chlo-legalize-to-hlo);stablehlo-legalize-to-hlo;func.func(mhlo-legalize-control-flow);func.func(hlo-legalize-to-linalg);func.func(mhlo-legalize-to-std);func.func(hlo-legalize-sort);convert-to-signless;canonicalize;scatter-lowering;hlo-custom-call-lowering;cse;func.func(linalg-detensorize{aggressive-mode});detensorize-scf;canonicalize),QuantumCompilationPass(annotate-function;lower-mitigation;lower-gradients;adjoint-lowering),BufferizationPass(one-shot-bufferize{dialect-filter=memref};inline;gradient-preprocess;gradient-bufferize;scf-bufferize;convert-tensor-to-linalg;convert-elementwise-to-linalg;arith-bufferize;empty-tensor-to-alloc-tensor;func.func(bufferization-bufferize);func.func(tensor-bufferize);catalyst-bufferize;func.func(linalg-bufferize);func.func(tensor-bufferize);quantum-bufferize;func-bufferize;func.func(finalizing-bufferize);canonicalize;gradient-postprocess;func.func(buffer-hoisting);func.func(buffer-loop-hoisting);func.func(buffer-deallocation);convert-arraylist-to-memref;convert-bufferization-to-memref;canonicalize;cp-global-memref),MLIRToLLVMDialect(expand-realloc;convert-gradient-to-llvm;memrefcpy-to-linalgcpy;func.func(convert-linalg-to-loops);convert-scf-to-cf;expand-strided-metadata;lower-affine;arith-expand;convert-complex-to-standard;convert-complex-to-llvm;convert-math-to-llvm;convert-math-to-libm;convert-arith-to-llvm;memref-to-llvm-tbaa;finalize-memref-to-llvm{use-generic-functions};convert-index-to-llvm;convert-catalyst-to-llvm;convert-quantum-to-llvm;emit-catalyst-py-interface;canonicalize;reconcile-unrealized-casts;gep-inbounds;register-inactive-callback), --verbose /tmp/vmap.circuitudc846z1/tmpb1ay7f59.mlir
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/catalyst/compiler.py:444, in Compiler.run_from_ir(self, ir, module_name, workspace)
    443     print(f"[SYSTEM] {' '.join(cmd)}", file=self.options.logfile)
--> 444 result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    445 if self.options.verbose or os.getenv("ENABLE_DIAGNOSTICS"):

File /opt/conda/lib/python3.11/subprocess.py:571, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    570     if check and retcode:
--> 571         raise CalledProcessError(retcode, process.args,
    572                                  output=stdout, stderr=stderr)
    573 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['/opt/conda/bin/catalyst', '-o', '/tmp/vmap.circuitudc846z1/vmap.circuit.ll', '--module-name', 'vmap.circuit', '--workspace', '/tmp/vmap.circuitudc846z1', '-verify-each=false', '--catalyst-pipeline', 'EnforceRuntimeInvariantsPass(split-multiple-tapes;builtin.module(apply-transform-sequence);inline-nested-module),HLOLoweringPass(canonicalize;func.func(chlo-legalize-to-hlo);stablehlo-legalize-to-hlo;func.func(mhlo-legalize-control-flow);func.func(hlo-legalize-to-linalg);func.func(mhlo-legalize-to-std);func.func(hlo-legalize-sort);convert-to-signless;canonicalize;scatter-lowering;hlo-custom-call-lowering;cse;func.func(linalg-detensorize{aggressive-mode});detensorize-scf;canonicalize),QuantumCompilationPass(annotate-function;lower-mitigation;lower-gradients;adjoint-lowering),BufferizationPass(one-shot-bufferize{dialect-filter=memref};inline;gradient-preprocess;gradient-bufferize;scf-bufferize;convert-tensor-to-linalg;convert-elementwise-to-linalg;arith-bufferize;empty-tensor-to-alloc-tensor;func.func(bufferization-bufferize);func.func(tensor-bufferize);catalyst-bufferize;func.func(linalg-bufferize);func.func(tensor-bufferize);quantum-bufferize;func-bufferize;func.func(finalizing-bufferize);canonicalize;gradient-postprocess;func.func(buffer-hoisting);func.func(buffer-loop-hoisting);func.func(buffer-deallocation);convert-arraylist-to-memref;convert-bufferization-to-memref;canonicalize;cp-global-memref),MLIRToLLVMDialect(expand-realloc;convert-gradient-to-llvm;memrefcpy-to-linalgcpy;func.func(convert-linalg-to-loops);convert-scf-to-cf;expand-strided-metadata;lower-affine;arith-expand;convert-complex-to-standard;convert-complex-to-llvm;convert-math-to-llvm;convert-math-to-libm;convert-arith-to-llvm;memref-to-llvm-tbaa;finalize-memref-to-llvm{use-generic-functions};convert-index-to-llvm;convert-catalyst-to-llvm;convert-quantum-to-llvm;emit-catalyst-py-interface;canonicalize;reconcile-unrealized-casts;gep-inbounds;register-inactive-callback),', 
'--verbose', '/tmp/vmap.circuitudc846z1/tmpb1ay7f59.mlir']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

CompileError                              Traceback (most recent call last)
Cell In[2], line 1
----> 1 circuit(data)

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/jit.py:594, in QJIT.__call__(self, *args, **kwargs)
    590         kwargs = {"static_argnums": self.compile_options.static_argnums, **kwargs}
    592     return self.user_function(*args, **kwargs)
--> 594 requires_promotion = self.jit_compile(args, **kwargs)
    596 # If we receive tracers as input, dispatch to the JAX integration.
    597 if any(isinstance(arg, jax.core.Tracer) for arg in tree_flatten(args)[0]):

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/jit.py:670, in QJIT.jit_compile(self, args, **kwargs)
    667     self.jaxpr, self.out_type, self.out_treedef, self.c_sig = self.capture(args, **kwargs)
    669     self.mlir_module = self.generate_ir()
--> 670     self.compiled_function, _ = self.compile()
    672     self.fn_cache.insert(self.compiled_function, args, self.out_treedef, self.workspace)
    674 elif self.compiled_function is not cached_fn.compiled_fn:
    675     # Restore active state from cache.

File ~/.local/lib/python3.11/site-packages/catalyst/debug/instruments.py:145, in instrument.<locals>.wrapper(*args, **kwargs)
    142 @functools.wraps(fn)
    143 def wrapper(*args, **kwargs):
    144     if not InstrumentSession.active:
--> 145         return fn(*args, **kwargs)
    147     with ResultReporter(stage_name, has_finegrained) as reporter:
    148         self = args[0]

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/jit.py:799, in QJIT.compile(self)
    793     shared_object, llvm_ir = self.compiler.run_from_ir(
    794         self.overwrite_ir,
    795         str(self.mlir_module.operation.attributes["sym_name"]).replace('"', ""),
    796         self.workspace,
    797     )
    798 else:
--> 799     shared_object, llvm_ir = self.compiler.run(self.mlir_module, self.workspace)
    801 compiled_fn = CompiledFunction(
    802     shared_object, func_name, restype, self.out_type, self.compile_options
    803 )
    805 return compiled_fn, llvm_ir

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/compiler.py:486, in Compiler.run(self, mlir_module, *args, **kwargs)
    470 @debug_logger
    471 def run(self, mlir_module, *args, **kwargs):
    472     """Compile an MLIR module to a shared object.
    473 
    474     .. note::
   (...)    483         (str): filename of shared object
    484     """
--> 486     return self.run_from_ir(
    487         mlir_module.operation.get_asm(
    488             binary=False, print_generic_op_form=False, assume_verified=True
    489         ),
    490         str(mlir_module.operation.attributes["sym_name"]).replace('"', ""),
    491         *args,
    492         **kwargs,
    493     )

File ~/.local/lib/python3.11/site-packages/pennylane/logging/decorators.py:61, in log_string_debug_func.<locals>.wrapper_entry(*args, **kwargs)
     54     s_caller = "::L".join(
     55         [str(i) for i in inspect.getouterframes(inspect.currentframe(), 2)[1][1:3]]
     56     )
     57     lgr.debug(
     58         f"Calling {f_string} from {s_caller}",
     59         **_debug_log_kwargs,
     60     )
---> 61 return func(*args, **kwargs)

File ~/.local/lib/python3.11/site-packages/catalyst/compiler.py:451, in Compiler.run_from_ir(self, ir, module_name, workspace)
    449             print(result.stderr.strip(), file=self.options.logfile)
    450 except subprocess.CalledProcessError as e:  # pragma: nocover
--> 451     raise CompileError(f"catalyst failed with error code {e.returncode}: {e.stderr}") from e
    453 if os.path.exists(output_ir_name):
    454     with open(output_ir_name, "r", encoding="utf-8") as f:

CompileError: catalyst failed with error code 1: Compilation failed:
vmap.circuit:108:7: error: 'quantum.device' op requires attribute 'device_name'
      "quantum.device"(%0) {kwargs = "{'shots': 0, 'mcmc': False, 'num_burnin': 0, 'kernel_name': None}", lib = "/home/jovyan/.local/lib/python3.11/site-packages/pennylane_lightning/liblightning_qubit_catalyst.so", name = "LightningSimulator"} : (i64) -> ()
      ^
vmap.circuit:108:7: note: diagnostic emitted with trace:
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  catalyst  0x000000000ab8194b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 43
1  catalyst  0x000000000a9a95d0
2  catalyst  0x000000000a9a9761 mlir::emitError(mlir::Location, llvm::Twine const&) + 17
3  catalyst  0x000000000aa00e19 mlir::Operation::emitError(llvm::Twine const&) + 25
4  catalyst  0x000000000aa01164 mlir::Operation::emitOpError(llvm::Twine const&) + 52
5  catalyst  0x000000000aa026ac mlir::OpState::emitOpError(llvm::Twine const&) + 12
6  catalyst  0x000000000646704f catalyst::quantum::DeviceInitOp::verifyInvariantsImpl() + 175
7  catalyst  0x00000000064371ff
8  catalyst  0x0000000006440fbc
9  catalyst  0x000000000aa304cf
10 catalyst  0x000000000aa31b56
11 catalyst  0x000000000aa2f8f0
12 catalyst  0x000000000aa31b56
13 catalyst  0x000000000aa2f8f0
14 catalyst  0x000000000aa31b56
15 catalyst  0x000000000aa2f8f0
16 catalyst  0x000000000aa32bc4 mlir::verify(mlir::Operation*, bool) + 36
17 catalyst  0x0000000009aaeed2 mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) + 5458
18 catalyst  0x0000000003ddff9d
19 catalyst  0x0000000003de83f7 QuantumDriverMain(catalyst::driver::CompilerOptions const&, catalyst::driver::CompilerOutput&, mlir::DialectRegistry&) + 1095
20 catalyst  0x0000000003ded623 QuantumDriverMainFromCL(int, char**) + 10771
21 libc.so.6 0x00007fe8c2803d90
22 libc.so.6 0x00007fe8c2803e40 __libc_start_main + 128
23 catalyst  0x0000000003dc75ae _start + 46

vmap.circuit:108:7: note: see current operation: "quantum.device"(%1) <{kwargs = "{'shots': 0, 'mcmc': False, 'num_burnin': 0, 'kernel_name': None}", lib = "/home/jovyan/.local/lib/python3.11/site-packages/pennylane_lightning/liblightning_qubit_catalyst.so"}> {name = "LightningSimulator"} : (i64) -> ()
vmap.circuit: vmap.circuit:1:8: error: expected 'module asm'
module @vmap.circuit {
       ^
Failed to parse module as LLVM or MLIR source

maliasadi · August 5, 2025, 4:29pm

@minn-bj Could you please confirm whether you have only one installation of Catalyst in your Python environment?

Based on the error message, it looks like your program isn’t running inside an isolated environment (like a Conda or virtual environment). The logs show references to both /home/jovyan/.local/lib/python3.11/ and /opt/conda/lib/python3.11/, but every path should come from a single Python installation.

To fix this, please uninstall PennyLane and Catalyst from both your Conda base environment and your user-level Python (.local/bin and .local/lib). Then, create a clean environment using either venv or conda, and pip install these packages inside there.

Once you have a single, clean installation of PennyLane and Catalyst, you can compile using the verbose compilation option: @qml.qjit(verbose=True) in your code. This will show you which Python binary is being used during compilation and execution, and it should point to only one location.
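One way to look for duplicate installations before cleaning up is to ask `importlib.metadata` which directories on the search path contain metadata for a given distribution. The sketch below is not part of PennyLane or Catalyst; `install_locations` is a hypothetical helper, and the distribution names are assumed to be the PyPI names. More than one directory in the result suggests a mixed setup like the one in the logs above.

```python
import shutil
import sys
from importlib import metadata


def _normalize(name):
    """Compare distribution names case-insensitively, treating '_' and '-' alike."""
    return name.lower().replace("_", "-")


def install_locations(dist_name, search_paths=None):
    """Return every directory in search_paths that holds metadata for dist_name."""
    if search_paths is None:
        search_paths = sys.path
    wanted = _normalize(dist_name)
    found = []
    for path in search_paths:
        for dist in metadata.distributions(path=[path]):
            try:
                name = dist.metadata["Name"]
            except Exception:
                # Unreadable or broken metadata: skip this entry.
                continue
            if name and _normalize(name) == wanted and path not in found:
                found.append(path)
    return found


if __name__ == "__main__":
    for dist_name in ("pennylane", "pennylane-catalyst", "pennylane-lightning"):
        print(dist_name, "->", install_locations(dist_name))
    # The logs above show the compiler binary being run from /opt/conda/bin;
    # this prints whichever `catalyst` executable is first on PATH, if any.
    print("catalyst executable:", shutil.which("catalyst"))
    print("python prefix:", sys.prefix)
```

This only inspects the paths visible to the running interpreter, so it should be run with the same Python that executes the failing notebook or script.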

Let us know if you need help with any of these steps.

minn-bj · August 8, 2025, 7:51am

Hey @maliasadi,

Thanks, you were totally right. I was able to fix the issue.

Many thanks and best regards.

minn-bj
