Let me take a shot at your question:
Probabilities of photon counting events
In the examples listed in the attached document it seems that you're missing the rescaling constant when calculating \bar A. From the paper you cited:
the rescaling constant c is chosen so that 0 < c < 1/s_{max}, and s_{max} is the maximum singular value of A.
Specifically, the rescaling constant below is chosen as c = 1/(s_{max} + 10^{-8}), which satisfies that condition.
The following code calculates the prefactor 1/\sqrt{\det Q} accordingly for both examples in the attached file:
import numpy as np

# Example 1
# A = np.array([[0, 1, 1],
#               [1, 0, 1],
#               [1, 1, 0]])

# Example 2
A = np.array([[0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 0],
              [1, 1, 1, 0, 0]])

# maximum singular value and rescaling constant c = 1 / (s_max + 1e-8)
_, s, _ = np.linalg.svd(A, full_matrices=True)
c = 1 / (np.max(s) + 1e-8)
print(f"Max singular value: {np.max(s)}")
print(f"Rescaling constant value: {c:.2E}\n")

# build the rescaled doubled matrix Abar = c (A ⊕ A), the permutation X,
# and Q = (I - X Abar)^(-1), then take its determinant
Ab = c * np.block([
    [A, np.zeros(A.shape)],
    [np.zeros(A.shape), A]
])
X = np.block([
    [np.zeros(A.shape), np.identity(A.shape[0])],
    [np.identity(A.shape[0]), np.zeros(A.shape)]
])
I = np.identity(Ab.shape[0])
Q = np.linalg.inv(I - X @ Ab)
detQ = np.linalg.det(Q)
print(f"det(Q) = {detQ:.2E}")
print(f"1/sqrt(det(Q)) = {1/np.sqrt(detQ):.2E}")
For example 1 the output is
Max singular value: 2.0
Rescaling constant value: 5.00E-01
det(Q) = 1.78E+08
1/sqrt(det(Q)) = 7.50E-05
and for example 2
Max singular value: 2.481194304092015
Rescaling constant value: 4.03E-01
det(Q) = 4.94E+08
1/sqrt(det(Q)) = 4.50E-05
As seen in the results, the prefactors are below one. In particular, for example 2 the probability of observing the sample [1,1,1,1,0] is p = 4.5\times 10^{-5}, since both the Hafnian of the corresponding submatrix and \boldsymbol n! are equal to one.
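If you want to double-check that claim numerically, here is a minimal sketch that evaluates the Hafnian of the induced submatrix with a brute-force recursion (the hafnian helper below is only for illustration and only practical for small matrices; an optimized implementation is available in a dedicated library such as The Walrus):
import numpy as np
from math import factorial

def hafnian(M):
    # recursion: match vertex 0 with every neighbour j and recurse on the
    # matrix with rows/columns 0 and j removed; the empty matrix has hafnian 1
    n = M.shape[0]
    if n == 0:
        return 1
    if n % 2 == 1:
        return 0
    total = 0
    for j in range(1, n):
        if M[0, j] != 0:
            keep = [k for k in range(1, n) if k != j]
            total += M[0, j] * hafnian(M[np.ix_(keep, keep)])
    return total

# adjacency matrix of example 2 (same A as above)
A = np.array([[0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 0],
              [1, 1, 1, 0, 0]])

# sample [1,1,1,1,0]: keep the vertices whose detector clicked once
sample = [1, 1, 1, 1, 0]
idx = [i for i, n_i in enumerate(sample) if n_i == 1]
A_S = A[np.ix_(idx, idx)]

print("Haf(A_S) =", hafnian(A_S))                           # 1 (a single perfect matching)
print("n! =", np.prod([factorial(n_i) for n_i in sample]))  # 1, since every n_i is 0 or 1
Both factors come out to 1, consistent with the value of p quoted above.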
Similarity measures
There are many different notions of similarity, and they depend on which property of the graph is used to compare one graph to another.
The idea of the paper is to use the properties of the samples from the GBS machine to assign to each graph a vector in \mathbb{R}^N. These vectors are called feature vectors because they encode the properties of each graph in a familiar space: one in which we can define a measure of “closeness” or “similarity”.
The idea is the following: in \mathbb{R}^N, two vectors \boldsymbol f and \boldsymbol f' point in a similar direction when the dot product
\langle \boldsymbol f, \boldsymbol f'\rangle between them is large, and they are orthogonal (maximally dissimilar) when it is zero. If the proposed encoding is capable of generating a vector \boldsymbol f in \mathbb{R}^N for each graph G then, by association, we can say that two graphs are similar if the vectors that represent them are close to each other. Since the dot product is a good measure of similarity between vectors, we can use it to compare different graphs when they are represented as vectors:
\kappa(G, G') = \langle \boldsymbol f, \boldsymbol f'\rangle.
The above means that, thanks to the feature map, we can express the similarity of two graphs as the inner product of two vectors.
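As a quick toy illustration of that kernel (the feature vectors below are made up for the example; the actual feature map is the one defined in the paper):
import numpy as np

# hypothetical feature vectors standing in for the GBS feature map of the paper
f  = np.array([0.70, 0.25, 0.05])   # feature vector of graph G
fp = np.array([0.65, 0.30, 0.05])   # feature vector of graph G'

# linear kernel: kappa(G, G') = <f, f'>
print(f"kappa(G, G') = {np.dot(f, fp):.3f}")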
However, the kernel defined above is linear and might not be able to learn non-linear decision boundaries in classification tasks. In that case one can define a different kernel, such as the radial basis function kernel \kappa_{rbf} mentioned in the paper.
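For reference, a common parameterization of the RBF kernel on the same kind of feature vectors looks like this (the exact form and the bandwidth sigma used in the paper may differ):
import numpy as np

def rbf_kernel(f, fp, sigma=1.0):
    # radial basis function kernel: exp(-||f - f'||^2 / (2 sigma^2))
    return np.exp(-np.sum((f - fp) ** 2) / (2 * sigma ** 2))

f  = np.array([0.70, 0.25, 0.05])
fp = np.array([0.65, 0.30, 0.05])
print(f"kappa_rbf(G, G') = {rbf_kernel(f, fp):.3f}")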
Hope this helps answer your question!