Quantum circuits cannot utilize multiple CPUs for computation

We are integrating classical encoders with quantum circuits to implement a QNN. We find that including the quantum circuits prevents the use of multiple CPUs on our server: only a single CPU is used for computation. What could be the reason for this? Note that we need to use the TensorFlow interface.

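import pennylane as qml
import tensorflow as tf

# Sizes were not given in the original post; these values are placeholders.
num_wires = 4
num_layers = 2
batch_size = 8

def layer(x, params, wires, i0=0, inc=1):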
    i = i0
    for j, wire in enumerate(wires):
        qml.Hadamard(wires=[wire])
        qml.RZ(x[i % len(x)], wires=[wire])
        i += inc
        qml.RY(params[0, j], wires=[wire])

    qml.broadcast(unitary=qml.CRZ, pattern="ring", wires=wires, parameters=params[1])

def ansatz(x, params, wires):
    for _ in range(num_layers):
        for j, layer_params in enumerate(params):
            layer(x, layer_params, wires, i0=j * len(wires))


dev = qml.device("default.qubit", wires=num_wires)
wires = list(range(num_wires))

@qml.qnode(dev, interface='tf', diff_method='backprop')
def kernel(x1, x2, params):
    ansatz(x1, params, wires)
    
    qml.adjoint(ansatz)(x2, params, wires)
    return qml.expval(qml.Projector([0]*num_wires, wires=wires))


def compute_kernel_matrix(image_embeddings, text_embeddings, params):
    kernel_matrix = tf.zeros((batch_size, batch_size), dtype=tf.float32)

    for i in tf.range(batch_size):
        for j in tf.range(batch_size):
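            val = kernel(image_embeddings[i], text_embeddings[j], params)
            indices = [[i, j]]
            # The QNode output is typically float64; cast it to match the float32 kernel matrix
            updates = [tf.cast(val, tf.float32)]
            kernel_matrix = tf.tensor_scatter_nd_update(kernel_matrix, indices, updates)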
    
    return kernel_matrix

Hey @zj-lucky,

I’m not sure I understand the issue. It sounds like a server issue rather than a PennyLane issue. Are you able to run this code as expected on, say, your laptop?

The issue isn’t with the server. I want to know how to make PennyLane-based code use multiple CPU cores during training; currently it only uses one.

Thanks! You can try setting a max_workers value when creating a default.qubit device (see the qml.devices.default_qubit.DefaultQubit page in the PennyLane 0.35.0 documentation). Let me know if that helps!

Thanks very much!

This could help, but unfortunately I didn’t fully understand the details in the documentation. Would you mind helping me adapt the code above accordingly? Thank you for your help.

For sure! It’s as simple as specifying the keyword argument when creating a device:

import pennylane as qml

dev = qml.device("default.qubit", max_workers=2)

@qml.qnode(dev)
def circuit():
    qml.Hadamard(0)
    qml.Hadamard(1)
    return qml.state()

circuit()

max_workers is the maximum number of worker processes the device uses to execute circuits. On my laptop, running this gives:

UserWarning: The device requested 16 threads (2 processes
                times 8 threads per process), but the processor only has
                8 logical cores. The processor is likely oversubscribed, which may
                lead to performance deterioration. Consider decreasing the number of processes,
                setting the device or execution config argument `max_workers=1`
                for example, or decreasing the number of threads per process by setting the
                environment variable `OMP_NUM_THREADS=4`.

tensor([0.5+0.j, 0.5+0.j, 0.5+0.j, 0.5+0.j], requires_grad=True)
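The arithmetic behind this warning can be reproduced with a small helper. This is a hypothetical sketch, not PennyLane's internal code: it multiplies the number of processes by the threads per process and compares the total with the number of logical cores.

```python
def requested_threads(max_workers: int, threads_per_process: int) -> int:
    """Total threads requested: each worker process runs its own pool of threads."""
    return max_workers * threads_per_process


def is_oversubscribed(max_workers: int, threads_per_process: int, logical_cores: int) -> bool:
    """True when the requested threads exceed the available logical cores."""
    return requested_threads(max_workers, threads_per_process) > logical_cores


# The situation from the warning: 2 processes x 8 threads on 8 logical cores.
print(requested_threads(2, 8))     # 16
print(is_oversubscribed(2, 8, 8))  # True
# With 4 threads per process: 2 x 4 = 8 threads, which fits on 8 cores.
print(is_oversubscribed(2, 4, 8))  # False
```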

To make the number of workers fit the number of cores on my laptop, all I need to do is put this at the start of my code:

import os

# Set this before importing PennyLane (and NumPy) so the OpenMP runtime picks it up
os.environ["OMP_NUM_THREADS"] = "4"

The OMP_NUM_THREADS environment variable sets the number of threads each process uses. My machine has 8 logical cores, so 2 worker processes × 4 threads per process = 8 threads, which matches. All good! Let me know if this helps!
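To avoid hard-coding the thread count, one option is to derive OMP_NUM_THREADS from the number of logical cores and the chosen max_workers. This is a minimal sketch, assuming os.cpu_count() reports the cores you actually want to use (on shared servers or in containers the usable count can be lower); the helper threads_per_worker and the constant MAX_WORKERS are hypothetical names.

```python
import os
from typing import Optional


def threads_per_worker(max_workers: int, logical_cores: Optional[int]) -> int:
    """Split the logical cores evenly among worker processes, with at least 1 thread each."""
    if logical_cores is None:  # os.cpu_count() can return None
        return 1
    return max(1, logical_cores // max_workers)


MAX_WORKERS = 2  # hypothetical choice; pass the same value to qml.device(..., max_workers=...)

# Set before importing PennyLane/NumPy so the OpenMP runtime sees the value.
os.environ["OMP_NUM_THREADS"] = str(threads_per_worker(MAX_WORKERS, os.cpu_count()))

# import pennylane as qml
# dev = qml.device("default.qubit", max_workers=MAX_WORKERS)
```

On an 8-core machine with MAX_WORKERS = 2 this sets OMP_NUM_THREADS to 4, the same value used above.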