PennyLane running on torch.cuda?

Hello everyone! Thanks for reading this. Does anyone know how to run PennyLane code on torch.cuda?
Thank you if you know how, much appreciated!

Hi @oceaneyyys

You can create your state-vector using the Torch backend on a CUDA device with

dev = qml.device("default.qubit.torch", wires=2, torch_device="cuda:0")

and you can place your circuit parameters there too with something like

data = torch.rand(2, 2, device="cuda:0")

which should ensure the data is all allocated on the GPU at index 0. Hope this helps!
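
For instance, here is a minimal sketch of a QNode that keeps everything on the GPU (assuming a CUDA-capable GPU and a PennyLane version that provides the default.qubit.torch device):

import pennylane as qml
import torch

dev = qml.device("default.qubit.torch", wires=2, torch_device="cuda:0")

@qml.qnode(dev, interface="torch")
def circuit(params):
    # both the rotation parameter and the simulated state live on cuda:0
    qml.RY(params[0], wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

params = torch.rand(1, device="cuda:0", requires_grad=True)
print(circuit(params))  # the result is a torch tensor on cuda:0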

Hi, thank you for your reply, much appreciated!
However, I get an error when I run this code. How can I make it work? I would really appreciate it if you know the solution.

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I have encountered this error before. In my case there was some kind of conflict in how resources were allocated.

It was a while ago, but the way I eventually resolved the issue was:

a. installing a new conda environment from scratch, and
b. ensuring CUDA was properly installed and was the right version (see the quick check below).
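
A quick way to run that check from Python (just a sanity check, assuming torch is already installed):

import torch

print(torch.cuda.is_available())   # should be True if CUDA is set up correctly
print(torch.version.cuda)          # the CUDA version PyTorch was built against
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of GPU 0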

I hope this helps

Berend

Thank you so much for your reply, much appreciated!

Do you use this code:
dev = qml.device("default.qubit.torch", wires=2, torch_device="cuda:0")

or this one:
dev = qml.device("lightning.gpu", wires=n_qubits)

It's a bit confusing to use CUDA with both PyTorch and PennyLane.

The code that caused my problem was very different. My bet would be that the first line is the correct one. I could be wrong, but you can only learn through trial and error.

dev = qml.device("default.qubit.torch", wires=2, torch_device="cuda:0")
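
For what it's worth, here is my rough understanding of the difference between the two (a sketch, assuming the lightning.gpu plugin is installed; I could be wrong on the details):

import pennylane as qml

# default.qubit.torch: the simulator state is a torch tensor, so you control
# its placement through the torch_device argument
dev_torch = qml.device("default.qubit.torch", wires=2, torch_device="cuda:0")

# lightning.gpu: a separate plugin built on NVIDIA cuQuantum; it manages its
# own GPU memory and does not take a torch_device argument
dev_lightning = qml.device("lightning.gpu", wires=2)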

Hi @oceaneyyys

If you can give us a minimum working example of your code (a script that executes end-to-end and reproduces your error), that should help us to identify where the problem occurs.

Thank you for your reply!

Thank you for your help! My code is below:

import pennylane as qml
import torch
import torch.nn as nn

# dev = qml.device('default.qubit', wires=4)
dev = qml.device('default.qubit.torch', wires=4, torch_device='cuda')


def circuit(inputs, weights):
    qml.AngleEmbedding(features=inputs, wires=range(4), rotation='Y')
    for i in range(4):
        qml.RX(weights[i], wires=i)
    return qml.expval(qml.PauliZ(0))


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        weight_shapes = {"weights": (4)}
        qnode = qml.QNode(circuit, dev, interface='torch')
        self.ql1 = qml.qnn.TorchLayer(qnode, weight_shapes)

    def forward(self, X):
        X = self.ql1(X)
        X = torch.nn.functional.sigmoid(X)
        return X


torch_cuda = torch.device("cuda")
inputs = torch.rand(10, 4)
labels = torch.randint(0, 2, (10,))

network = Net()
criterion = nn.BCELoss()  # loss function
optimizer = torch.optim.SGD(network.parameters(), lr=0.01)  # optimizer

inputs.to(torch_cuda)
labels.to(torch_cuda)
network.to(torch_cuda)
epochs = 10
for epoch in range(epochs):
    tr_loss = 0
    labels = labels.type(torch.FloatTensor)
    optimizer.zero_grad()
    outputs = network(inputs)

    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    tr_loss = loss.data.numpy()

    print(tr_loss)

and I got this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Thank you for your help again, much appreciated!

Hi @oceaneyyys,
Thank you for sharing this example.

Could you please post the output of qml.about()?
And please also post your full error traceback. It will help us find the issue :smiley:

Hi @oceaneyyys

The issue here seems to be related to where your torch data lives. Casting the labels in your loop with labels.type(torch.FloatTensor) converts them to the CPU float tensor type, so the result resides on the CPU. In addition, Tensor.to is not in-place: inputs.to(torch_cuda) returns a new tensor, so without assigning the result your inputs stay on the CPU as well.

I made some modifications to the locality of your data in the provided script, and this allows it to run end to end:

import pennylane as qml
import torch
import torch.nn as nn

dev = qml.device('default.qubit.torch', wires=4, torch_device='cuda')

def circuit(inputs,weights):
    qml.AngleEmbedding(features=inputs, wires=range(4), rotation='Y')
    for i in range(4):
        qml.RX(weights[i], wires=i)
    return qml.expval(qml.PauliZ(0))

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        weight_shapes = {"weights": (4)}
        qnode = qml.QNode(circuit, dev, interface='torch')
        self.ql1 = qml.qnn.TorchLayer(qnode, weight_shapes)

    def forward(self, X):
        X=self.ql1(X)
        X=torch.nn.functional.sigmoid(X)

        return X

torch_cuda = torch.device("cuda")
inputs = torch.rand(10, 4, device=torch_cuda)  # create the data directly on the GPU
labels = torch.randint(0, 2, (10,), device=torch_cuda)

network = Net()
network.to(torch_cuda)
epochs = 10

criterion = nn.BCELoss()  # loss function
optimizer = torch.optim.SGD(network.parameters(), lr=0.01)  # optimizer

for epoch in range(epochs):
    tr_loss = 0
    labels = labels.type(torch.FloatTensor)  # BCELoss needs floats; this cast lands on the CPU...
    labels = labels.to("cuda")               # ...so move the labels back to the GPU

    optimizer.zero_grad()
    outputs = network(inputs)

    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    cpu_loss = loss.to("cpu")  # bring the loss to the CPU before calling .numpy()
    tr_loss = cpu_loss.data.numpy()

    print(tr_loss)

With Torch, to ensure end-to-end execution on the GPU, you often need to be explicit about where the data lives: even if a buffer is initially placed on the GPU, operations on it may sometimes give you CPU data back. Hope this helps.
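
For example, this is the difference that bit you here (a small illustration, assuming a CUDA device is available):

import torch

labels = torch.randint(0, 2, (10,), device="cuda:0")
a = labels.type(torch.FloatTensor)  # converts to the CPU float tensor type: lands on the CPU
b = labels.float()                  # dtype-only cast: stays on cuda:0
print(a.device, b.device)           # prints: cpu cuda:0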


Thank you so much, it worked! I'm so grateful for your constant help, much appreciated!


Thank you for your reply!
