Running quantum circuits in GPU

NikSchet · December 20, 2021, 1:09pm

Hello all,

I am having some difficulties trying to run a standard qnode in a hybrid NN (same as in the QML/Turning quantum nodes into Keras Layers tutorial) using GPU instead of CPU. Is there an easy way to do this?

P.S. I have tried using CUDA, but the @jit decorator doesn't work for quantum nodes.

Thank you in advance

mlxd · December 20, 2021, 2:37pm

Hi @NikSchet, thanks for the question. Do you have some example code we can look at to help identify the problem here?

NikSchet · December 20, 2021, 3:51pm

Thank you for your reply. Yes please check my code here:

github.com

nsansen/Quantum-Machine-Learning/blob/main/Pennylane DEMO v4.ipynb


The problem I am ultimately trying to solve is to increase the number of qubits I am using to 25, because my new dataset has 25 features. The maximum I am currently able to use is 17 qubits, on a desktop computer (GPU: Nvidia 3060, CPU: AMD Ryzen 7).

mlxd · December 20, 2021, 6:32pm

Hi @NikSchet I was able to get your code to run on my GPU (1060) with minimal changes:

I will attach a modified notebook with some comments (rename it from .txt to .ipynb): PennyLane_GPU_Example.txt (22.1 KB).

Some of the changes are as follows:

I added the following to ensure there can be selective control of the GPU device. 0 should be your default GPU, and -1 should allow you to hide the GPU from TensorFlow

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
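As a minimal sketch of the same idea, the selection can be wrapped in a small helper. The name `select_gpu` is hypothetical and not part of the attached notebook; the sketch assumes the variable is set before TensorFlow is imported, since the CUDA runtime reads it when it initializes.

```python
import os


def select_gpu(index=None):
    """Expose a single GPU by index, or hide all GPUs when index is None.

    Hypothetical helper for illustration only.
    """
    value = "-1" if index is None else str(index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


select_gpu(0)  # expose only the first GPU; select_gpu() would hide all GPUs

# Import TensorFlow only after the variable has been set, e.g.:
# import tensorflow as tf
```

Changing the variable after TensorFlow has already initialized its devices is not expected to have any effect, so the call belongs at the very top of the notebook.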

Next, I add @tf.function to ensure the QNode JIT compiles

@tf.function
@qml.qnode(dev, interface="tf", diff_method="backprop")
def qnode(inputs, weights):

I have converted all data in Section 6 to np.float32.
In addition, the following lines allow selective choice between CPU and GPU (pick whichever you wish to run on):

with tf.device('/device:CPU:0'):
with tf.device('/device:GPU:0'):

This should allow you to get the code running on your GPU. Now, you mentioned having memory issues as well. Backprop is a notoriously memory-hungry algorithm for derivatives, and unfortunately may not allow you to run large-qubit algorithms without access to a large workstation / cluster. Generally, we try to use high-memory systems and supercomputer / cloud-grade GPUs for the upper-20-qubit regime.

Though, if you are willing to wait a little longer for results, you can run larger optimization problems on the CPU using the lightning.qubit device and the adjoint differentiation method. This uses less memory at the cost of more compute time, and will be OpenMP-parallelized on Linux / macOS machines (if you are running Windows, you can use WSL to get the parallelized version). It should be possible to swap one device for the other if the memory wall becomes a problem on the way to the 25-qubit regime with backprop.
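To put rough numbers on the memory wall: a dense statevector on n qubits holds 2^n complex amplitudes, so a single 25-qubit state in complex128 (16 bytes per amplitude) occupies 2^25 × 16 bytes = 512 MiB. The sketch below is a back-of-the-envelope estimate only. It assumes that backprop keeps about one intermediate state per gate and that the adjoint method needs a small constant number of states; the gate count and that constant are hypothetical values chosen for illustration, not measurements of either device.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for one dense statevector: 2**n_qubits complex amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


def estimate_gib(n_qubits, n_states, bytes_per_amplitude=16):
    """Rough footprint in GiB when n_states statevectors are alive at once."""
    return n_states * statevector_bytes(n_qubits, bytes_per_amplitude) / 2 ** 30


N_GATES = 100        # hypothetical gate count, not taken from the notebook
ADJOINT_STATES = 3   # assumed small constant number of working statevectors

for n in (17, 21, 25):
    one_state_mib = statevector_bytes(n) / 2 ** 20
    backprop = estimate_gib(n, N_GATES + 1)  # assumption: one stored state per gate
    adjoint = estimate_gib(n, ADJOINT_STATES)
    print(f"{n} qubits: {one_state_mib:8.1f} MiB per state, "
          f"backprop ~{backprop:6.2f} GiB, adjoint ~{adjoint:5.2f} GiB")
```

Passing `bytes_per_amplitude=8` models complex64 and halves every figure. Either way the cost per state doubles with each added qubit, which is why the step from 21 to 25 qubits is much harder than the step from 17 to 21.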

Let us know if you require any further assistance.

NikSchet · December 20, 2021, 7:27pm

Thank you very much for your time, that worked!!!
P.S. The code runs smoothly for 21 qubits.

jackaraz · December 24, 2021, 1:46am

Hi @mlxd, I just wanted to follow up on this thread since it’s related. I’m trying to implement something similar in my framework, and it works nicely with "default.qubit.tf" + backprop, but when I switch to qiskit.aer + parameter-shift I cannot get any gradient: tf.GradientTape() gives me zero all the time. Is there another way to use Qiskit’s simulator with tf.vectorized_map?

Thanks

mlxd · December 24, 2021, 9:17am

Hi @jackaraz, if you have a minimum working example of your code we can take a look. Feel free to drop it here.

jackaraz · December 24, 2021, 2:00pm

Hi @mlxd here is a minimal example. I believe the problem occurs due to parameter-shift and since Qiskit does not allow it, it doesn’t work properly.

import tensorflow as tf
import pennylane as qml
from pennylane import numpy as np

dev1 = qml.device("qiskit.aer", wires = 2, shots=10, backend='qasm_simulator')
dev2 = qml.device("default.qubit.tf", wires = 2, shots=None)

@qml.qnode(dev2, diff_method="backprop", interface="tf")
def circuit2(inputs, weights):
    qml.AngleEmbedding(inputs, wires = range(2), rotation="Y")

    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires = [0, 1])

    return qml.probs(op=qml.PauliZ(1))

@qml.qnode(dev1, diff_method="parameter-shift", interface="tf")
def circuit1(inputs, weights):
    qml.AngleEmbedding(inputs, wires = range(2), rotation="Y")

    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires = [0, 1])

    return qml.probs(op=qml.PauliZ(1))

weights = tf.Variable(tf.random.uniform((2,), dtype=tf.float64), trainable=True)
inputs = tf.random.uniform((10,2), dtype=tf.float64)

circ = tf.function(circuit2)
contract = lambda inpts : tf.vectorized_map(lambda vec: circ(vec, weights), inpts)
with tf.GradientTape() as tape:
    yhat = tf.reduce_mean(contract(inputs))
tape.gradient(yhat, weights)

# Output : <tf.Tensor: shape=(2,), dtype=float64, numpy=array([ 1.38777878e-17, -2.77555756e-17])>

circ = tf.function(circuit1)
contract = lambda inpts : tf.vectorized_map(lambda vec: circ(vec, weights), inpts)
with tf.GradientTape() as tape:
    yhat = tf.reduce_mean(contract(inputs))
tape.gradient(yhat, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([0., 0.])>

My yhat is poorly chosen, but I believe it shows what’s happening here. circuit1 always gives a zero gradient no matter what, but I can get proper results from circuit2. Also, if I use the batching function I wrote here instead of vectorized_map, it gives me a good gradient result as well, but this does not parallelize the execution on the GPU. So I believe I need to use vectorized_map to parallelize the batch execution. Or is there any other way that you can suggest?

Thanks
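A note on the particular yhat in this example: the two probabilities returned for wire 1 always sum to 1, so their mean is the constant 0.5 and its gradient is zero for any device or differentiation method. Both printed results are consistent with that (1e-17 is floating-point noise). The sketch below reproduces the circuit's wire-1 probabilities in closed form with plain Python and applies the two-term parameter-shift rule by hand. The function names and the input values are hypothetical, and the sketch says nothing about how the qiskit.aer device behaves with other loss functions.

```python
import math


def probs_wire1(inputs, weights):
    """Outcome probabilities [P(0), P(1)] on wire 1 for the circuit above.

    RY(inputs[i]) followed by RY(weights[i]) on wire i acts as one rotation
    by the summed angle; the CNOT flips wire 1 whenever wire 0 reads 1.
    """
    a = math.sin((inputs[0] + weights[0]) / 2) ** 2  # P(wire 0 = 1) before CNOT
    b = math.sin((inputs[1] + weights[1]) / 2) ** 2  # P(wire 1 = 1) before CNOT
    p1 = a * (1 - b) + (1 - a) * b                   # P(wire 1 = 1) after CNOT
    return [1 - p1, p1]


def parameter_shift(f, weights, index, shift=math.pi / 2):
    """Two-term parameter-shift derivative of f with respect to weights[index]."""
    plus = list(weights)
    minus = list(weights)
    plus[index] += shift
    minus[index] -= shift
    return (f(plus) - f(minus)) / 2


x = [0.3, 0.7]  # hypothetical input sample
w = [0.1, 0.2]  # hypothetical weights

# Loss from the example: mean over both outcome probabilities (constant 0.5).
grad_mean = parameter_shift(lambda v: sum(probs_wire1(x, v)) / 2, w, index=1)

# A loss that does depend on the weights: the probability of outcome 1.
grad_p1 = parameter_shift(lambda v: probs_wire1(x, v)[1], w, index=1)

print("d mean / d w1 =", grad_mean)
print("d P(1) / d w1 =", grad_p1)
```

Using a single outcome probability (or an expectation value) as the quantity to differentiate would make a comparison between the two devices more informative than the mean over all outcomes.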

mlxd · December 24, 2021, 3:03pm

Hi @jackaraz I had a quick look at your example but it is not clear to me what should be happening on the GPU side. I think this may better be created as an issue in the PennyLane repo, as it will allow the rest of the team to have a look.

jackaraz · December 24, 2021, 3:22pm

Hi @mlxd, thanks, I’ll move it to GitHub then.
