PennyLane Shots error with CUDA

Hi, I have read the PennyLane tutorial on quantum GANs given in Quantum GANs | PennyLane Demos.

So, when I set shots=1000:

dev = qml.device("lightning.qubit", wires=n_qubits, shots=1000)
...... generator
...... discriminator
##Training Loop
#Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0
noise = torch.Tensor()
noise2 = torch.Tensor()
#fixed_noise = torch.rand(8, n_qubits, device=device) * math.pi / 2
#noise = torch.rand(batch_size, 800, device=device) * math.pi / 2
#noise2 = torch.rand(batch_size, 800, device=device) * math.pi / 2
#real_labels = torch.full((batch_size,), 1.0, dtype=torch.float, device=device)
#fake_labels = torch.full((batch_size,), 0.0, dtype=torch.float, device=device)
print("Starting Training Loop...")
#For each epoch
x=0
for epoch in range(num_epochs):
    x=0
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, device=device)
        # Generate batch of Spectra,  latent vectors, and Properties     
        for j in range(batch_size):
            excelIndex = x*batch_size+j
            try:
                gotdata = excelDataTensor[excelIndex]
            except IndexError:
                break
            tensorA = excelDataTensor[excelIndex].view(1,4)
            noise2 = torch.cat((noise2,tensorA),0)
            
            tensor1 = torch.cat((excelDataTensor[excelIndex],torch.rand(latent)))
            tensor2 = tensor1.unsqueeze(1) 
            tensor3 = tensor2.permute(1,0)
            noise = torch.cat((noise,tensor3),0)         
                              
        noise = noise.to(device)
        noise2 = noise2.to(device)
        
        # Forward pass real batch through D
        real_cpu = real_cpu.reshape(int(b_size), image_size*image_size*nc)
        output_real = netD.forward(real_cpu,noise2).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output_real, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output_real.mean().item()
              
        ## Train with all-fake batch                
        # Generate fake image batch with G
        fake_data = netG.forward(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output_fake = netD.forward(fake_data.detach(),noise2).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output_fake, label)
        # Calculate the gradients for this batch
        errD_fake.backward()
        D_G_z1 = output_fake.mean().item()
        # Add the gradients from the all-real and all-fake batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output_fake = netD.forward(fake_data,noise2).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output_fake, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output_fake.mean().item()
        # Update G
        optimizerG.step()

        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2), file=f)

        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on testTensor
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(testTensor).view(100,nc,image_size,image_size).detach().cpu()
            img_list.append(vutils.make_grid(fake, nrow=10, padding=2, normalize=True))

        iters += 1
        noise = torch.Tensor()
        noise2 = torch.Tensor()     
        x += 1
    if epoch % 2 == 0:
        ##Update folder location
        torch.save(netG, save_dir + 'netG' + str(epoch) + '.pt')
        torch.save(netD, save_dir + 'netD' + str(epoch) + '.pt')

The generator is the same as the one used in the tutorial.

The discriminator is:

class Discriminator(nn.Module):
    """Fully connected classical discriminator"""

    def __init__(self):
        super().__init__()
                
        self.l1 = nn.Linear(4, image_size*image_size*nc, bias=False)

        self.model = nn.Sequential(
            # Inputs to first hidden layer (num_input_features -> 64)
            nn.Linear(2 * image_size * image_size * nc, 64),
            nn.ReLU(),
            # First hidden layer (64 -> 16)
            nn.Linear(64, 16),
            nn.ReLU(),
            # Second hidden layer (16 -> output)
            nn.Linear(16, 1),
            nn.Sigmoid(),
        )


    def forward(self, inputs, label):
        x1 = inputs
        x2 = self.l1(label)
        #x2 = x2.reshape(int(b_size),nc,image_size,image_size)
        combine = torch.cat((x1,x2),1)
        combine = self.model(combine)
        return combine

#Create the Discriminator
netD = Discriminator().to(device)

#Print the model
print(netD)

I get the following error:

Starting Training Loop...
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[18], line 64
     62 output = netD.forward(fake.detach(),noise2).view(-1)
     63 # Calculate D's loss on the all-fake batch
---> 64 errD_fake = criterion(output, label)
     65 # Calculate the gradients for this batch
     66 errD_fake.backward()

File ~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/loss.py:619, in BCELoss.forward(self, input, target)
    618 def forward(self, input: Tensor, target: Tensor) -> Tensor:
--> 619     return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)

File ~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/functional.py:3098, in binary_cross_entropy(input, target, weight, size_average, reduce, reduction)
   3095     new_size = _infer_size(target.size(), weight.size())
   3096     weight = weight.expand(new_size)
-> 3098 return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)

RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

The output of qml.about() is:

Name: PennyLane
Version: 0.31.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /dgxb_home/se21pphy004/miniconda3/envs/myenv/lib/python3.8/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml
Required-by: PennyLane-Lightning, PennyLane-Lightning-GPU

Platform info:           Linux-5.4.0-144-generic-x86_64-with-glibc2.17
Python version:          3.8.17
Numpy version:           1.24.3
Scipy version:           1.10.0
Installed devices:
- default.gaussian (PennyLane-0.31.0)
- default.mixed (PennyLane-0.31.0)
- default.qubit (PennyLane-0.31.0)
- default.qubit.autograd (PennyLane-0.31.0)
- default.qubit.jax (PennyLane-0.31.0)
- default.qubit.tf (PennyLane-0.31.0)
- default.qubit.torch (PennyLane-0.31.0)
- default.qutrit (PennyLane-0.31.0)
- null.qubit (PennyLane-0.31.0)
- lightning.qubit (PennyLane-Lightning-0.31.0)
- lightning.gpu (PennyLane-Lightning-GPU-0.31.0)

Also, I am using CUDA 11.7.
The dataset I am using contains RGB images and their absorption-spectra details. So, in the discriminator's forward method,

netD.forward(self, inputs, label)

inputs is the 8x8x3 RGB image and label is the absorption-spectra data.

Without specifying shots, everything works perfectly. However, as soon as I set a finite number of shots, the code does not complete even one epoch. What do you think is the issue?

Hello @mass_of_15,

The default for dev = qml.device("lightning.qubit", wires=n_qubits) is shots=None, which means you receive the exact expectation value from this simulator without needing to enter a number of shots. In certain workflows you can't change the number of shots.

There is an example of a Hadamard circuit at the bottom of the "Using PennyLane" Measurements page:

# fix the seed to make results reproducible
np.random.seed(1)

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit():
    qml.Hadamard(wires=0)
    return qml.expval(qml.PauliZ(0))

# Running the simulator with shots=None returns the exact expectation value.
circuit(shots=None)
0.0

# If you have the option to change the number of shots:
circuit(shots=10)
0.2
circuit(shots=1000)
-0.062
circuit(shots=100000)
0.00056

Note that even 100,000 shots still do not return the exact value of 0.0.

Reference:
https://docs.pennylane.ai/en/stable/introduction/measurements.html

Thank you. Now I understand why, in some cases, shots don't matter. So, when I used shots=1000 for the QGAN given in the PennyLane tutorial (Quantum GANs | PennyLane Demos), the losses for the generator and discriminator were:

Iteration: 10, Discriminator Loss: 0.331, Generator Loss: 2.873
Iteration: 20, Discriminator Loss: 0.010, Generator Loss: 5.933
Iteration: 30, Discriminator Loss: 0.654, Generator Loss: 4.190
Iteration: 40, Discriminator Loss: 0.101, Generator Loss: 5.487
Iteration: 50, Discriminator Loss: 0.044, Generator Loss: 4.667
Iteration: 60, Discriminator Loss: 0.047, Generator Loss: 17.864
Iteration: 70, Discriminator Loss: 0.006, Generator Loss: 6.919
Iteration: 80, Discriminator Loss: 0.002, Generator Loss: 15.510
Iteration: 90, Discriminator Loss: 0.016, Generator Loss: 9.094
Iteration: 100, Discriminator Loss: 0.007, Generator Loss: 7.298
Iteration: 110, Discriminator Loss: 0.013, Generator Loss: 8.736
Iteration: 120, Discriminator Loss: 0.001, Generator Loss: 13.343
Iteration: 130, Discriminator Loss: 0.001, Generator Loss: 26.183
Iteration: 140, Discriminator Loss: 0.001, Generator Loss: 16.978
Iteration: 150, Discriminator Loss: 0.001, Generator Loss: 25.154
Iteration: 160, Discriminator Loss: 0.011, Generator Loss: 6.185
Iteration: 170, Discriminator Loss: 0.001, Generator Loss: 14.403
Iteration: 180, Discriminator Loss: 0.013, Generator Loss: 9.359
Iteration: 190, Discriminator Loss: 0.001, Generator Loss: 12.472
Iteration: 200, Discriminator Loss: 0.014, Generator Loss: 21.037
Iteration: 210, Discriminator Loss: 0.001, Generator Loss: 22.464
Iteration: 220, Discriminator Loss: 0.009, Generator Loss: 20.998
Iteration: 230, Discriminator Loss: 0.034, Generator Loss: 12.281
Iteration: 240, Discriminator Loss: 0.509, Generator Loss: 3.713
Iteration: 250, Discriminator Loss: 0.729, Generator Loss: 0.594
Iteration: 260, Discriminator Loss: 0.449, Generator Loss: 2.778
Iteration: 270, Discriminator Loss: 0.326, Generator Loss: 6.632
Iteration: 280, Discriminator Loss: 0.002, Generator Loss: 48.856
Iteration: 290, Discriminator Loss: 0.002, Generator Loss: 17.971
Iteration: 300, Discriminator Loss: 0.000, Generator Loss: 45.363
Iteration: 310, Discriminator Loss: 0.030, Generator Loss: 29.401
Iteration: 320, Discriminator Loss: 0.002, Generator Loss: 55.807
Iteration: 330, Discriminator Loss: 0.001, Generator Loss: 55.701
Iteration: 340, Discriminator Loss: 0.003, Generator Loss: 55.690
Iteration: 350, Discriminator Loss: 0.000, Generator Loss: 54.338
Iteration: 360, Discriminator Loss: 0.001, Generator Loss: 51.659
Iteration: 370, Discriminator Loss: 0.000, Generator Loss: 49.874
Iteration: 380, Discriminator Loss: 0.000, Generator Loss: 55.381
Iteration: 390, Discriminator Loss: 0.002, Generator Loss: 52.047
Iteration: 400, Discriminator Loss: 0.000, Generator Loss: 52.939
Iteration: 410, Discriminator Loss: 0.001, Generator Loss: 53.262
Iteration: 420, Discriminator Loss: 0.000, Generator Loss: 53.103
Iteration: 430, Discriminator Loss: 0.001, Generator Loss: 53.952
Iteration: 440, Discriminator Loss: 0.000, Generator Loss: 55.112
Iteration: 450, Discriminator Loss: 0.000, Generator Loss: 54.127
Iteration: 460, Discriminator Loss: 0.000, Generator Loss: 55.490
Iteration: 470, Discriminator Loss: 0.000, Generator Loss: 54.949
Iteration: 480, Discriminator Loss: 0.000, Generator Loss: 52.649
Iteration: 490, Discriminator Loss: 0.000, Generator Loss: 53.203
Iteration: 500, Discriminator Loss: 0.000, Generator Loss: 57.233

It can be visualized like this: [plot of the discriminator and generator losses omitted]

So, as you can see, I don’t think the discriminator is doing its job. I want to understand from this example how shots affect the process. Can you clarify using this example?

Also, will I have to use shots if I use hardware instead of a simulator? Or will it be irrelevant for this example?

Hey @mass_of_15! And thank you for that example @kevinkawchak :+1:

Glad your code is working! I’m not sure what you did to fix it, but sometimes

RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

means that there is just a "normal" (i.e., not CUDA-related) error in your code, and the usual course of action is to run the code on the CPU instead of the GPU, debug the error there for simplicity, and then move back to the GPU.
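
For example, a minimal sketch of that debugging workflow (the tensor values below are made up to mimic your fake_data output; CUDA_LAUNCH_BLOCKING, BCELoss, and isfinite are standard PyTorch):

import os
import torch
import torch.nn as nn

# Make CUDA errors surface at the exact failing call (affects error reporting only;
# must be set before any CUDA work happens).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Debugging switch: fall back to the CPU so failures come with readable stack traces.
device = torch.device("cpu")

criterion = nn.BCELoss()

# Stand-in for the discriminator output; a NaN (or any value outside [0, 1])
# is exactly the kind of input that trips the device-side assert in Loss.cu.
output = torch.tensor([0.4863, float("nan"), 0.4838])
label = torch.ones_like(output)

# Checking the inputs before calling BCELoss makes the failure obvious:
if not torch.isfinite(output).all():
    print("discriminator output contains NaN/Inf")
elif not ((output >= 0) & (output <= 1)).all():
    print("BCELoss inputs must lie in [0, 1]")
else:
    loss = criterion(output.to(device), label.to(device))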

So, as you can see, I don’t think the discriminator is doing its job. I want to understand from this example how shots affect the process. Can you clarify using this example?

Having a finite number of shots will simply introduce some statistical noise into your results, which can have all sorts of rippling effects on how your model performs on the given task. Some crucial parts of your code are missing, so it's hard to give a better answer than that.
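
To illustrate just the shot-noise part on its own, here's a standalone sketch with a single-qubit circuit (not your GAN generator):

import numpy as np
import pennylane as qml

dev = qml.device("lightning.qubit", wires=1, shots=1000)

@qml.qnode(dev)
def circuit(theta):
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

# With 1000 shots, repeated evaluations of the same circuit fluctuate around the
# exact value cos(theta); in a QGAN this sampling noise propagates through the
# generator's output and into the gradients the optimizers see.
theta = 0.5
estimates = [circuit(theta) for _ in range(5)]
print("exact value:", np.cos(theta))
print("shot-based estimates:", estimates)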

Also, will I have to use shots if I use hardware instead of a simulator? Or will it be irrelevant for this example?

If you use hardware, you’ll need a finite value of shots, yes.
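
As a sketch (the device below is a simulator placeholder; with a hardware plugin you would swap in that plugin's device name), the number of shots is typically fixed when the device is created:

import pennylane as qml

# Placeholder: replace "default.qubit" with the hardware plugin's device name.
# On real hardware every expectation value is estimated from a finite number of
# samples, so a shots value must be supplied (1000 here is arbitrary).
dev = qml.device("default.qubit", wires=5, shots=1000)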

Hope this helps!

As I said, when I didn't specify shots, i.e. shots=None, the code worked perfectly fine. However, with shots=1000 I got the above error. I just want to know how shots changes the results.

The generator output from print(fake_data) is:

tensor([   nan,    nan, 0.4863,    nan, 0.4838, 0.4855, 0.4809,    nan, 0.4791,
        0.4831, 0.4830,    nan, 0.4775,    nan, 0.4814,    nan],
       device='cuda:0', grad_fn=<ViewBackward0>)

As you can see, NaN values appear in the output.
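
A minimal way to check for this (sketched with the first few values of the tensor above) is:

import torch

# The generator output printed above contains NaN entries, so the check returns
# True, and BCELoss then fails because its inputs must lie in [0, 1].
fake_data = torch.tensor([float("nan"), float("nan"), 0.4863, float("nan")])
print(torch.isnan(fake_data).any())   # tensor(True)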

Oh! Sorry for the misunderstanding. In that case, it would be extremely helpful to have a minimal but complete working example that replicates the error you're seeing. Once you send that, I can try to see what's going on :slight_smile:

Hey @mass_of_15, just wanted to check in and see if you've fixed your issue. If/when it's fixed, we have a new PennyLane survey. Let us know your thoughts about PennyLane so that we can keep bringing you amazing features :sparkles:.