CPU faster than GPU; GPU utilization not increasing

I’m comparing the performance of lightning.qubit with diff_method="adjoint"
on CPU (Intel Xeon W-2245 @ 3.90 GHz) and on GPU (NVIDIA GeForce RTX 2080 Ti). I’m training a hybrid QNN for 100 epochs in two environments: one using tensorflow (CPU only) and the other using tensorflow-gpu.

With tensorflow (CPU):
batch_size = 4 takes ~300 s per epoch.
batch_size = 64 takes ~247 s per epoch.

With tensorflow-gpu:
batch_size = 4 takes ~268 s per epoch.
batch_size = 64 takes ~254 s per epoch.
batch_size = 2000 takes ~272 s per epoch.

I’m wondering why the CPU is faster than the GPU.
Also, GPU utilization only goes as high as 3-5% (memory: 10417 MiB / 11019 MiB) even after increasing the batch size to 1024. Is it no longer possible to speed training up by pushing GPU utilization to 60% or more?

A summary of my untrained model looks like this…
Model: "sequential_1"

Layer (type)              Output Shape   Param #
sequential (Sequential)   (None, 24)     14232
dense_1 (Dense)           (None, 10)     250
keras_layer (KerasLayer)  (None, 10)     0 (unused)
dense_2 (Dense)           (None, 24)     264

Total params: 14,746
Trainable params: 14,746
Non-trainable params: 0

The QNN KerasLayer contains IQPEmbedding and StronglyEntanglingLayers,
and it should have 90 trainable parameters once the model is built.
Please shed some light on this. Thank you.

Hi @qubi, welcome to the Forum!

Using a GPU can have large overheads for small numbers of qubits. As you go above 20 qubits you can start seeing an improvement vs CPU.

Does the comparative performance change if you increase the number of qubits?

Hi! Thank you for the response. That helps 🙂

Upon increasing the number of qubits, I can see a speedup of up to 10%. But the GPU utilization is still quite low. It only reaches about 12%. Is there anything more that I can do with PennyLane in order to increase this?

My batch_size is already set to 512.

Hi @qubi, there may be a bottleneck somewhere. What is your CPU utilization?