GPU underusage for hybrid QNN using lightning.gpu for research

Hi @daniela.murcillo!
Really appreciate the reply! I figured it out: the error was in the forward method of the QuantumLayer class, specifically in the way it invoked the quantum circuit.

I changed it from

def forward(self, x):
    # Loop over the batch and call the quantum circuit once per sample
    batch_results = []
    for i in range(x.shape[0]):
        result = quantum_circuit(x[i], self.weights)
        # QNodes returning multiple expectation values give back a list of tensors
        result_tensor = torch.stack(result) if isinstance(result, list) else result
        batch_results.append(result_tensor)
    return torch.stack(batch_results, dim=0)

To

def forward(self, inputs):
    # Pass the whole batch to the QNode in a single call
    exp_vals = quantum_circuit(inputs, self.weights)
    # Stack the per-observable results into shape (batch_size, n_measurements)
    return torch.stack(exp_vals, dim=1)

Instead of iterating over the batch manually and invoking the quantum circuit once per sample, I now pass the whole batch to the circuit in a single call (with the batch_obs=True parameter set on the device) and stack the results into a torch tensor. Since torch.device is set to cuda, the output ends up on the GPU, which fixed the issue.
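For anyone running into the same thing, here is a minimal sketch of how I understand the pieces fit together. The circuit itself (AngleEmbedding + BasicEntanglerLayers), the qubit/layer counts, and the weight initialisation are placeholder assumptions for illustration; only the batched forward call and the batch_obs=True device option reflect the change described above.

import torch
import torch.nn as nn
import pennylane as qml

n_qubits = 2
n_layers = 2

# batch_obs=True lets the lightning device parallelise observable
# evaluation during adjoint differentiation
dev = qml.device("lightning.gpu", wires=n_qubits, batch_obs=True)

@qml.qnode(dev, interface="torch", diff_method="adjoint")
def quantum_circuit(inputs, weights):
    # Placeholder circuit: encode the inputs, then apply entangling layers
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

class QuantumLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # Trainable circuit weights, shaped for BasicEntanglerLayers
        self.weights = nn.Parameter(0.01 * torch.randn(n_layers, n_qubits))

    def forward(self, inputs):
        # One batched QNode call instead of a Python loop over samples
        exp_vals = quantum_circuit(inputs, self.weights)
        return torch.stack(exp_vals, dim=1)  # (batch_size, n_qubits)

# Example usage: a batch of 8 samples on the GPU
layer = QuantumLayer().to("cuda")
out = layer(torch.rand(8, n_qubits, device="cuda"))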

Earlier it took ~60 min to train a 2-qubit, 2-layer model; now, with GPU support, the same model takes ~15 min, a 4x speedup! This will be really meaningful when running multiple seeds and larger models.

I would still like to know if there are any further improvements that could be made to the code!