It may need a larger number of qubits
My understanding is that your code uses only 4 qubits and therefore needs very few resources. To check whether the GPU is working as expected, you could increase the number of qubits to 20 to 25 using a simple test circuit. You should then see GPU utilization increase. For example, a 21-qubit circuit could use about 1.6 GB of memory, and a 24-qubit circuit about 2.9 GB.
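
To get a feel for why 4 qubits barely registers, here is a rough sketch that estimates the size of the raw state vector. It assumes a dense state-vector simulator that stores one complex128 amplitude (16 bytes) per basis state; your simulator may use complex64 or a different method, so treat the numbers as a lower bound only. The function name `statevector_bytes` is just an illustrative helper, not part of any library.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Estimate memory for a dense state vector.

    Assumption: 2**n_qubits amplitudes, each stored as complex128
    (16 bytes). Pass bytes_per_amplitude=8 for complex64.
    """
    if n_qubits < 0:
        raise ValueError("n_qubits must be non-negative")
    return (2 ** n_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    # Compare the original 4-qubit case with the suggested 20 to 25 range.
    for n in (4, 20, 21, 24, 25):
        mib = statevector_bytes(n) / 2 ** 20
        print(f"{n:>2} qubits: {mib:.6f} MiB")
```

Under these assumptions a 4-qubit state is only 256 bytes, while 21 qubits is 32 MiB and 24 qubits is 256 MiB, and the size doubles with every added qubit. The memory figures quoted above are larger than this, presumably because they also include the GPU runtime context and the simulator's working buffers.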