I understand that it is the second smallest eigenvalue of the Hamiltonian, which the dataset library provided. However, if we are to write a circuit to find that for H_2, it would be something like

and optimizing for \theta would only gives the ground state energy. How do I turn this circuit into finding the 1st excitation energy? I expect it should give this result.

Do VQE on the Hamiltonian H to get the ground state \vert \psi_0 \rangle with energy E_0 (as you normally would)

Create a new “effective” Hamiltonian that has the ground state “projected out” of it: H_1 = H + \beta \vert \psi_0 \rangle \langle \psi_0 \vert, where \beta > E_1 - E_0 (i.e., \beta needs to be larger than the gap between the ground state energy and the first excited state energy).

Agreed. Then <H_1> equals to the equation (2) in the paper you cited.

I just have a question. The term \beta \left| {\left\langle {{\Psi}\left( {{{\mathbf{\theta }}}} \right)\left| {{\Psi}_0} \right.} \right\rangle } \right|^2 minimizes when \Psi(\theta) is orthogonal to \Psi_0. It is like we are confident that the state (hence the HF representation) of the 1st excitation is orthogonal? Is this based on some quantum mechanics rule, or it is an artificial optimization scheme?

It is like we are confident that the state (hence the HF representation) of the 1st excitation is orthogonal

Yep! This is a guarantee. Eigenstates form an orthonormal basis, so each eigenstate is orthogonal to the others. This isn’t any fancy quantum mechanics thing, just linear algebra