Hi @charliechiou,
At the moment you don't have any broadcasting in your circuit, nor anything that would make it run in parallel.
Broadcasting itself doesn't parallelize anything, but it makes the looping (e.g. the for loops) happen in C instead of Python, which is more efficient.
The code below shows how to do actual broadcasting.
```python
import torch
import torch.nn as nn
import pennylane as qml

class QNN(nn.Module):
    def __init__(self):
        super(QNN, self).__init__()
        self.qubit_num = 4
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        if torch.cuda.is_available():
            print("Using GPU")
            q_device = qml.device("default.qubit.torch", wires=self.qubit_num, torch_device="cuda")
        else:
            print("Using CPU")
            # Set max_workers depending on the number of cores you have available
            q_device = qml.device("default.qubit", wires=self.qubit_num, max_workers=2)
        # Set diff_method="adjoint" because backprop doesn't work with max_workers
        self.QNode = qml.QNode(self.qnn_circuit, q_device, interface="torch", diff_method="adjoint")
        self.measure_set = [0, 2]
        self.p_rotation = nn.Parameter(torch.rand(12) * torch.pi, requires_grad=True)

    def qnn_circuit(self, embedding):
        qml.AngleEmbedding(features=embedding, wires=range(4))
        for i in range(4):
            qml.RY(self.p_rotation[i], wires=i)
        return [qml.expval(qml.PauliZ(w)) for w in self.measure_set]

    def forward(self, x):
        x = x / torch.norm(x, dim=1, keepdim=True)  # normalize each input
        # You don't need to split the batch into its individual datapoints:
        # outputs = [torch.tensor(self.QNode(xi), device=self.device) for xi in x]
        # Instead, pass the whole batch so the looping happens in C via broadcasting
        outputs = self.QNode(x)
        return torch.stack(outputs, dim=1)  # stack along dim 1 so each row is one datapoint

model = QNN()
x = torch.randn(5, 4)
output = model(x)
# print("Output shape:", output.shape)
print(output)
```
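With broadcasting, a single `QNode` call evaluates the whole batch at once: each `qml.expval` comes back with the batch dimension attached, so for the `(5, 4)` input above the stacked output should have shape `(5, 2)`, i.e. 5 datapoints by 2 measured wires.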
In addition to broadcasting …

Since you're using `default.qubit`, you can set `max_workers` as described in the docs to use a pool of at most `max_workers` processes asynchronously. This doesn't guarantee that they will all be used; it's just a maximum. Note that backprop doesn't work with this, so you'll need to change the diff method to `diff_method="adjoint"`.
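If it helps, here's a minimal standalone sketch of just that device setup, stripped of the torch layer. The circuit itself is only illustrative, and the worker count of 2 is an arbitrary example you should tune to your available cores:

```python
import pennylane as qml
from pennylane import numpy as np

# A pool of at most 2 worker processes; backprop is incompatible with
# max_workers, so we switch to adjoint differentiation
dev = qml.device("default.qubit", wires=2, max_workers=2)

@qml.qnode(dev, diff_method="adjoint")
def circuit(theta):
    qml.RY(theta, wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

theta = np.array(0.5, requires_grad=True)
print(circuit(theta))            # expectation value
print(qml.grad(circuit)(theta))  # gradient via the adjoint method
```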
Take a look at the code above and let me know if you have any further questions. You can also check out our performance page to learn about other simulators we have.
I hope this helps!