Hi there!
We've implemented a trainable Quanvolutional layer for PyTorch:
import numpy as np
import torch
import torch.nn as nn
import pennylane as qml
from pennylane.templates import RandomLayers


class QonvLayer(nn.Module):
    def __init__(self, stride=2, device="default.qubit", wires=4,
                 circuit_layers=4, n_rotations=8, out_channels=4, seed=None):
        super(QonvLayer, self).__init__()

        # init device
        self.wires = wires
        self.dev = qml.device(device, wires=self.wires)

        self.stride = stride
        self.out_channels = min(out_channels, wires)

        if seed is None:
            seed = np.random.randint(low=0, high=10_000_000)

        print("Initializing Circuit with random seed", seed)

        # random circuit
        @qml.qnode(device=self.dev)
        def circuit(inputs, weights):
            n_inputs = 4
            # Encoding of 4 classical input values
            for j in range(n_inputs):
                qml.RY(inputs[j], wires=j)
            # Random quantum circuit
            RandomLayers(weights, wires=list(range(self.wires)), seed=seed)
            # Measurement producing 4 classical output values
            return [qml.expval(qml.PauliZ(j)) for j in range(self.out_channels)]

        weight_shapes = {"weights": [circuit_layers, n_rotations]}
        self.circuit = qml.qnn.TorchLayer(circuit, weight_shapes=weight_shapes)

    def draw(self):
        # build the circuit by sending dummy data through it
        _ = self.circuit(inputs=torch.from_numpy(np.zeros(4)))
        print(self.circuit.qnode.draw())
        self.circuit.zero_grad()

    def forward(self, img):
        bs, h, w, ch = img.size()
        if ch > 1:
            img = img.mean(axis=-1).reshape(bs, h, w, 1)

        kernel_size = 2
        h_out = (h - kernel_size) // self.stride + 1
        w_out = (w - kernel_size) // self.stride + 1

        out = torch.zeros((bs, h_out, w_out, self.out_channels))

        # Loop over the coordinates of the top-left pixel of 2x2 squares
        for b in range(bs):
            for j in range(0, h_out, self.stride):
                for k in range(0, w_out, self.stride):
                    # Process a 2x2 region of the image with the quantum circuit
                    q_results = self.circuit(
                        inputs=torch.Tensor([
                            img[b, j, k, 0],
                            img[b, j, k + 1, 0],
                            img[b, j + 1, k, 0],
                            img[b, j + 1, k + 1, 0]
                        ])
                    )
                    # Assign expectation values to different channels of the output pixel (j/2, k/2)
                    for c in range(self.out_channels):
                        out[b, j // kernel_size, k // kernel_size, c] = q_results[c]

        return out
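For reference, the output spatial size follows the usual convolution formula h_out = (h - kernel_size) // stride + 1, which is how we arrived at the in_features of the Linear layers in the experiments below. A quick dependency-free sanity check (the helper name conv_out is just for illustration, not part of our code):

def conv_out(size, kernel_size=2, stride=2):
    # Same arithmetic as in QonvLayer.forward
    return (size - kernel_size) // stride + 1

h1 = conv_out(28)       # one QonvLayer on 28x28 MNIST -> 14
h2 = conv_out(h1)       # a second QonvLayer on top    -> 7

print(h1, h1 * h1 * 4)  # 14 784  (in_features, Experiment I)
print(h2, h2 * h2 * 4)  # 7 196   (in_features, Experiment II)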
Experiment I: Training with 1 Quanvolutional Layer
Net:
model = torch.nn.Sequential(
    QonvLayer(stride=2, circuit_layers=2, n_rotations=4, out_channels=4),
    torch.nn.Flatten(),
    torch.nn.Linear(in_features=14*14*4, out_features=10)
)
Training output:
Epoch: 0 	Step: 0 	Accuracy: 0.25 	Loss: 2.353778839111328
Gradients Layer 0
tensor([[-4.6585e-03, -1.5023e-01,  1.0962e-17,  3.5731e-18],
	[-4.6677e-03,  2.9001e-02, -9.6852e-19,  0.0000e+00]])
Current Circuit:
 0: ──RY(0.0)──RZ(0.397)──RZ(1.178)──────────────────────────────┤ ⟨Z⟩ 
 1: ──RY(0.0)──RZ(4.088)──╭X─────────RZ(3.61)───╭X───────────────┤ ⟨Z⟩ 
 2: ──RY(0.0)─────────────╰C─────────RY(4.173)──╰C──RY(5.072)────┤ ⟨Z⟩ 
 3: ──RY(0.0)──RZ(1.785)──RZ(5.903)──────────────────────────────┤ ⟨Z⟩ 
---------------------------------------
Epoch: 0 	Step: 1 	Accuracy: 0.0 	Loss: 3.284860610961914
Gradients Layer 0
tensor([[-2.2039e-02, -4.9558e-01,  2.5899e-16, -1.4661e-16],
	[-1.2097e-02,  2.3364e-01,  1.9031e-16,  0.0000e+00]])
---------------------------------------
Epoch: 0 	Step: 2 	Accuracy: 0.25 	Loss: 2.0575411319732666
Gradients Layer 0
tensor([[-1.3089e-02, -9.6094e-02,  8.8986e-17,  9.1110e-17],
	[-7.3473e-03,  6.8553e-02,  6.4072e-18,  0.0000e+00]])
---------------------------------------
Epoch: 0 	Step: 3 	Accuracy: 0.25 	Loss: 3.791848659515381
Gradients Layer 0
tensor([[-8.5180e-02, -6.7926e-01,  2.9336e-16, -6.0067e-16],
	[-5.4367e-02,  3.8367e-01, -2.3463e-16,  0.0000e+00]])
---------------------------------------
Epoch: 0 	Step: 4 	Accuracy: 0.0 	Loss: 4.429379463195801
Gradients Layer 0
tensor([[-5.6071e-02, -9.5350e-01, -4.0188e-16, -5.2387e-16],
	[-4.3445e-02,  4.8110e-01,  8.5049e-18,  0.0000e+00]])
---------------------------------------
Epoch: 0 	Step: 5 	Accuracy: 0.0 	Loss: 2.415179967880249
Gradients Layer 0
tensor([[-3.7586e-02, -3.1990e-01,  6.1641e-19, -7.2152e-17],
	[-2.7385e-02,  1.1129e-01,  2.5546e-18,  0.0000e+00]])
Current Circuit:
 0: ──RY(0.0)──RZ(0.397)──RZ(1.178)─────────────────────────────┤ ⟨Z⟩ 
 1: ──RY(0.0)──RZ(4.13)───╭X─────────RZ(3.653)──╭X──────────────┤ ⟨Z⟩ 
 2: ──RY(0.0)─────────────╰C─────────RY(4.216)──╰C──RY(5.03)────┤ ⟨Z⟩ 
 3: ──RY(0.0)──RZ(1.785)──RZ(5.903)─────────────────────────────┤ ⟨Z⟩ 
---------------------------------------
Epoch: 0 	Step: 6 	Accuracy: 0.25 	Loss: 2.0272059440612793
Gradients Layer 0
tensor([[-1.3096e-03, -1.6318e-01, -5.6946e-18,  1.0381e-17],
	[-2.0847e-03,  3.5787e-02, -5.7634e-18,  0.0000e+00]])
---------------------------------------
Epoch: 0 	Step: 7 	Accuracy: 0.25 	Loss: 3.111910820007324
Gradients Layer 0
tensor([[-4.7392e-02, -5.0553e-01,  1.0673e-16,  3.0912e-17],
	[-3.8066e-02,  2.5993e-01, -1.4139e-16,  0.0000e+00]])
---------------------------------------
Epoch: 0 	Step: 8 	Accuracy: 0.0 	Loss: 2.9227261543273926
Gradients Layer 0
tensor([[-5.8086e-02, -3.5329e-01,  3.0340e-17,  1.1894e-16],
	[-4.1156e-02,  1.4573e-01,  1.4530e-16,  0.0000e+00]])
---------------------------------------
Epoch: 0 	Step: 9 	Accuracy: 0.0 	Loss: 2.6818065643310547
Gradients Layer 0
tensor([[-1.1859e-01, -2.2195e-01,  8.9172e-18, -3.7483e-17],
	[-8.8680e-02,  2.1465e-01,  4.7048e-18,  0.0000e+00]])
---------------------------------------
Epoch: 0 	Step: 10 	Accuracy: 0.0 	Loss: 2.707582950592041
Gradients Layer 0
tensor([[-6.6730e-03, -3.5080e-01,  9.0117e-18, -9.0980e-19],
	[ 4.0098e-03,  6.0043e-02,  1.0396e-17,  0.0000e+00]])
Current Circuit:
 0: ──RY(0.0)──RZ(0.397)──RZ(1.178)──────────────────────────────┤ ⟨Z⟩ 
 1: ──RY(0.0)──RZ(4.172)──╭X─────────RZ(3.695)──╭X───────────────┤ ⟨Z⟩ 
 2: ──RY(0.0)─────────────╰C─────────RY(4.258)──╰C──RY(4.991)────┤ ⟨Z⟩ 
 3: ──RY(0.0)──RZ(1.785)──RZ(5.903)──────────────────────────────┤ ⟨Z⟩ 
Training takes a very long time, but the net at least reaches roughly 70-80% accuracy on MNIST.
Experiment II: Training with 2 Quanvolutional Layers
Net:
model = torch.nn.Sequential(
    QonvLayer(stride=2, circuit_layers=2, n_rotations=4, out_channels=4),
    QonvLayer(stride=2, circuit_layers=2, n_rotations=4, out_channels=4),
    torch.nn.Flatten(),
    torch.nn.Linear(in_features=7*7*4, out_features=10)
)
Training output:
Epoch: 0 	Step: 0 	Accuracy: 0.25 	Loss: 2.3005757331848145
Gradients Layer 0:
None
Gradients Layer 1:
tensor([[-9.2107e-03, -3.4147e-02, -9.6166e-03, -1.6073e-02],
	[-9.2107e-03, -3.4147e-02, -3.1461e-03,  7.0380e-18]])
Current Circuit Layer 0:
 0: ──RY(0.0)───RX(3.225)──RX(3.592)──RX(5.593)──RX(5.953)────────────────────────┤ ⟨Z⟩ 
 1: ──RY(0.0)──╭C──────────RX(2.63)───RY(4.176)─╭C──────────RX(2.43)──RY(2.163)───┤ ⟨Z⟩ 
 2: ──RY(0.0)──╰X─────────╭C────────────────────╰X─────────╭C─────────────────────┤ ⟨Z⟩ 
 3: ──RY(0.0)─────────────╰X───────────────────────────────╰X─────────────────────┤ ⟨Z⟩ 
Current Circuit Layer 1:
 0: ──RY(0.0)───RY(1.79)───RY(1.847)─────────────────┤ ⟨Z⟩ 
 1: ──RY(0.0)───RY(2.06)───RY(1.588)─────────────────┤ ⟨Z⟩ 
 2: ──RY(0.0)──╭X──────────RZ(2.124)──╭X──RZ(4.867)──┤ ⟨Z⟩ 
 3: ──RY(0.0)──╰C──────────RY(1.193)──╰C──RY(3.918)──┤ ⟨Z⟩ 
---------------------------------------
Epoch: 0 	Step: 1 	Accuracy: 0.0 	Loss: 2.500396251678467
Gradients Layer 0:
None
Gradients Layer 1:
tensor([[-4.7077e-02, -1.2101e-01, -2.1126e-01,  1.4875e-02],
	[-4.7077e-02, -1.2101e-01, -1.1339e-02, -5.2808e-18]])
---------------------------------------
Epoch: 0 	Step: 2 	Accuracy: 0.25 	Loss: 2.1083250045776367
Gradients Layer 0:
None
Gradients Layer 1:
tensor([[ 1.0099e-01, -6.7789e-03,  1.0940e-01, -3.1570e-02],
	[ 1.0099e-01, -6.7789e-03, -5.4767e-03, -1.1648e-17]])
---------------------------------------
Epoch: 0 	Step: 3 	Accuracy: 0.25 	Loss: 2.5666348934173584
Gradients Layer 0:
None
Gradients Layer 1:
tensor([[-1.6059e-01, -1.4263e-01, -3.8268e-01,  6.0618e-02],
	[-1.6059e-01, -1.4263e-01, -1.1166e-02, -9.7793e-19]])
---------------------------------------
Epoch: 0 	Step: 4 	Accuracy: 0.0 	Loss: 2.981722593307495
Gradients Layer 0:
None
Gradients Layer 1:
tensor([[-4.0437e-01, -3.6849e-01, -6.9833e-01,  1.3052e-01],
	[-4.0437e-01, -3.6849e-01, -6.2979e-03,  1.3407e-17]])
---------------------------------------
Epoch: 0 	Step: 5 	Accuracy: 0.25 	Loss: 2.1014046669006348
Gradients Layer 0:
None
Gradients Layer 1:
tensor([[ 1.9919e-03,  1.2177e-02, -2.0729e-02, -1.3597e-02],
	[ 1.9919e-03,  1.2177e-02, -8.4981e-03, -2.5948e-19]])
Current Circuit Layer 0:
 0: ──RY(0.0)───RX(3.225)──RX(3.592)──RX(5.593)──RX(5.953)────────────────────────┤ ⟨Z⟩ 
 1: ──RY(0.0)──╭C──────────RX(2.63)───RY(4.176)─╭C──────────RX(2.43)──RY(2.163)───┤ ⟨Z⟩ 
 2: ──RY(0.0)──╰X─────────╭C────────────────────╰X─────────╭C─────────────────────┤ ⟨Z⟩ 
 3: ──RY(0.0)─────────────╰X───────────────────────────────╰X─────────────────────┤ ⟨Z⟩ 
Current Circuit Layer 1:
 0: ──RY(0.0)───RY(1.81)───RY(1.867)─────────────────┤ ⟨Z⟩ 
 1: ──RY(0.0)───RY(2.099)──RY(1.627)─────────────────┤ ⟨Z⟩ 
 2: ──RY(0.0)──╭X──────────RZ(2.116)──╭X──RZ(4.867)──┤ ⟨Z⟩ 
 3: ──RY(0.0)──╰C──────────RY(1.223)──╰C──RY(3.964)──┤ ⟨Z⟩ 
---------------------------------------
Epoch: 0 	Step: 6 	Accuracy: 0.25 	Loss: 2.009097099304199
Gradients Layer 0:
None
Gradients Layer 1:
tensor([[ 5.2628e-02,  2.8042e-02,  1.7530e-01, -4.5965e-02],
	[ 5.2628e-02,  2.8042e-02, -2.1108e-03, -8.3491e-18]])
---------------------------------------
Epoch: 0 	Step: 7 	Accuracy: 0.0 	Loss: 2.7671358585357666
Gradients Layer 0:
None
Gradients Layer 1:
tensor([[-4.1022e-01, -2.2605e-01, -4.9493e-01,  1.0710e-01],
	[-4.1022e-01, -2.2605e-01,  1.3541e-02,  2.0550e-17]])
---------------------------------------
Epoch: 0 	Step: 8 	Accuracy: 0.0 	Loss: 2.595287799835205
Gradients Layer 0:
None
Gradients Layer 1:
tensor([[-2.7742e-01, -1.4687e-01, -4.3393e-01,  1.1318e-01],
	[-2.7742e-01, -1.4687e-01, -3.8294e-03,  1.2389e-17]])
---------------------------------------
As you can see, only Quanvolutional Layer 1 receives gradients. Layer 0 receives none at all, so the optimizer never updates it.
Now my question is: why? What am I missing, what am I doing wrong, or am I facing a bug?
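One detail we are unsure about (a pure-PyTorch sketch, independent of PennyLane, so not necessarily the cause): in forward() we rebuild the circuit inputs element-wise with the torch.Tensor constructor, and in isolation that seems to produce a fresh tensor with no grad_fn, unlike torch.stack:

import torch

x = torch.tensor([0.1, 0.2, 0.3, 0.4], requires_grad=True)

# Rebuilding element-wise via the torch.Tensor constructor
# (as QonvLayer.forward does) copies the values into a brand-new
# tensor that is detached from the autograd graph.
rebuilt = torch.Tensor([x[0], x[1], x[2], x[3]])
print(rebuilt.requires_grad, rebuilt.grad_fn)  # False None

# torch.stack keeps the autograd history instead.
stacked = torch.stack([x[0], x[1], x[2], x[3]])
print(stacked.requires_grad)  # True

Not sure whether that matters here, since the circuit weights are parameters of the TorchLayer itself, but it would explain why no gradient flows back through the first layer's output.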
Thanks in advance!
Denny
PS: we are using PyTorch 1.4.0 with PennyLane v0.12.0.