Hi. I’m looking to leverage PyTorch + CUDA to explore some QAOA simulations, but I’m unsure whether some aspects of qml.layer are GPU-amenable.
For example, I am attempting some sort of on-GPU calculation similar to this example.
My code is below:
Software versions:
pennylane == 0.19.1
torch == 1.10.0
QAOA circuit setup:
import pennylane as qml
from pennylane import qaoa

def qaoa_circuit_from_graph(graph, n_layers):
    n_wires = len(graph.nodes)
    cost_h, mixer_h = qaoa.maxcut(graph)

    # One QAOA block: cost layer followed by mixer layer.
    def qaoa_layer(params):
        gamma, beta = params[0], params[1]
        qaoa.cost_layer(gamma, cost_h)
        qaoa.mixer_layer(beta, mixer_h)

    dev = qml.device("default.qubit", wires=n_wires)

    @qml.qnode(dev, interface='torch', diff_method="backprop")
    def circuit(params):
        for w in range(n_wires):
            qml.Hadamard(wires=w)
        qml.layer(qaoa_layer, n_layers, params)
        # cost_h.terms is (coeffs, ops); measure each cost term.
        return [qml.expval(term) for term in cost_h.terms[1]]

    # Offset: coefficient of the identity term in the cost Hamiltonian.
    return circuit, cost_h.terms[0][-1]
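For context, my mental model of qml.layer is that it simply re-applies the template once per row of params. A plain-Python analogue of that repetition (my sketch, not PennyLane's actual implementation):

```python
def repeat_layer(template, depth, params):
    # Plain-Python analogue of qml.layer: call `template` once per
    # parameter row. In PennyLane the gates are queued as a side effect.
    for i in range(depth):
        template(params[i])

applied = []
repeat_layer(lambda p: applied.append(tuple(p)), 3,
             [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
print(applied)  # [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)]
```

So each of the n_layers rows of my (n_layers, 2) parameter tensor should feed one cost/mixer block.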
Test code (CPU):
import torch
from torch.profiler import profile, ProfilerActivity

n_layers = 10
params_shape = [n_layers, 2]
params = torch.rand(params_shape)
circuit, offset = qaoa_circuit_from_graph(graphs[0], n_layers)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    circuit(params)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
------------------------- ------------ ------------ ------------ ------------ ------------ ------------
aten::mul 10.26% 6.465ms 28.59% 18.018ms 16.350us 1102
aten::to 7.03% 4.431ms 26.76% 16.863ms 7.194us 2344
aten::_to_copy 13.96% 8.797ms 19.73% 12.432ms 7.655us 1624
aten::slice 14.06% 8.862ms 16.03% 10.100ms 5.404us 1869
aten::einsum 5.46% 3.438ms 15.12% 9.526ms 56.035us 170
aten::cat 1.18% 743.000us 11.15% 7.025ms 27.657us 254
aten::_cat 5.08% 3.204ms 9.97% 6.282ms 24.732us 254
aten::roll 1.56% 982.000us 9.35% 5.894ms 46.778us 126
aten::stack 1.44% 909.000us 9.28% 5.847ms 46.405us 126
aten::div 2.91% 1.834ms 7.85% 4.945ms 14.544us 340
------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 63.022ms
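Even on CPU, conversion/copy ops take a large share of the run; reading the share of aten::to straight off the table (values copied from above):

```python
# Figures copied from the CPU profile above (milliseconds).
to_total_ms = 16.863        # aten::to, "CPU total" column
self_cpu_total_ms = 63.022  # Self CPU time total

share = to_total_ms / self_cpu_total_ms
print(f"aten::to accounts for {share:.1%} of the run")  # 26.8%
```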
Test code (CUDA):
n_layers = 10
params_shape = [n_layers, 2]
params_cuda = torch.rand(params_shape, device='cuda')
circuit, offset = qaoa_circuit_from_graph(graphs[0], n_layers)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    circuit(params_cuda)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::mul 13.12% 10.282ms 21.98% 17.220ms 15.626us 2.570ms 48.93% 2.570ms 2.332us 1102
aten::to 2.03% 1.593ms 17.66% 13.835ms 15.338us 0.000us 0.00% 349.000us 0.387us 902
cudaLaunchKernel 16.38% 12.831ms 16.38% 12.831ms 5.535us 0.000us 0.00% 0.000us 0.000us 2318
aten::einsum 4.89% 3.833ms 15.99% 12.529ms 73.700us 0.000us 0.00% 510.000us 3.000us 170
aten::_to_copy 4.23% 3.317ms 15.63% 12.242ms 32.472us 0.000us 0.00% 349.000us 0.926us 377
aten::stack 1.29% 1.013ms 12.87% 10.080ms 80.000us 0.000us 0.00% 549.000us 4.357us 126
aten::cat 0.59% 463.000us 10.28% 8.057ms 62.945us 0.000us 0.00% 559.000us 4.367us 128
aten::_cat 4.45% 3.488ms 9.69% 7.594ms 59.328us 559.000us 10.64% 559.000us 4.367us 128
aten::copy_ 3.05% 2.388ms 8.53% 6.683ms 17.180us 385.000us 7.33% 385.000us 0.990us 389
aten::slice 7.17% 5.621ms 8.50% 6.659ms 4.404us 0.000us 0.00% 0.000us 0.000us 1512
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 78.345ms
Self CUDA time total: 5.252ms
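What worries me is how little of the CUDA run actually executes on the GPU; comparing the two totals the profiler reports above:

```python
# Totals reported by the profiler for the CUDA run (milliseconds).
self_cpu_ms = 78.345
self_cuda_ms = 5.252

print(f"GPU kernel time is {self_cuda_ms / self_cpu_ms:.1%} of CPU-side time")  # 6.7%
```

So the run appears dominated by CPU-side work and host/device copies (aten::to, aten::_to_copy, cudaLaunchKernel) rather than by GPU compute.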
Any insights are appreciated. Thanks!