Issue with GBS for Molecular Vibronic Spectra: Formic Acid

Hello!

To test Strawberry Fields, I’ve decided to compute the vibronic spectra of formic acid using GBS based on the electronic structure obtained from Firefly (or GAMESS). However, I would get two issues:

  • Compared with the data provided in the strawberryfields.apps.data package, the number of modes are not the same;
  • The vibronic spectra are very different.

More explicitly, on my side, Firefly gives a total of 15 modes, so 9 vibronic modes, compared with 7 for data in strawberryfields.apps.data. As for the spectrum, I get the following:

As with my other post, I wish I could attach the data but I can’t since I’m a new user. Let me know how I should share the Firefly output files.

Thank you!

Andre

Hello @andre,

Thank you for your question!

Regarding your first point:

Compared with the data provided in the strawberryfields.apps.data package, the number of modes are not the same

The data for formic acid provided in strawberryfields.apps.data was taken directly from this paper by Joonsuk Huh et al. Please, see Sec. 2 of the supporting information.

You are right, formic acid has 9 vibrational modes. However, due to the symmetry of the molecule, the Duschinsky matrix for this vibronic transition is block diagonal with two blocks, one of dimension 7x7 (totally symmetric block) and the other with dimension 2x2 (non-totally symmetric block). If the overall shape of the spectrum is dominated by the totally symmetric block you can neglect the other two modes.

Second point:

  • The vibronic spectra are very different.

There may be several aspects related to this.

1- The vibronic spectrum you want to simulate corresponds to the photoelectron emission of formic acid. So, you need to compute the vibrational modes of the ground states of the neutral and positively charged molecule in order to create the input data for the GBS simulations.

2- Which type of electronic structure calculations are you performing to optimize the molecular geometries and compute the vibrational frequencies? Could you please provide us with the frequencies you have obtained? If they are different with respect to the ones reported in the above mentioned paper, the spectrum will be different.

3- In principle, you don’t need to account for the symmetry to obtain these values. I would expect that using the B3LYP functional with a Pople-type basis set e.g., 6-311++G** would be enough to reproduce the frequencies in the data set. GAMESS and Firefly should give you similar results.

4- Assuming correctness of points 1-3, how many samples are you taking to reconstruct the vibronic spectrum?

Hope this helps.

Thank you!

Hi @Alain_Delgado_Gran,

Thank you very much for the detailed answer.

You are right, formic acid has 9 vibrational modes. However, due to the symmetry of the molecule, the Duschinsky matrix for this vibronic transition is block diagonal with two blocks, one of dimension 7x7 (totally symmetric block) and the other with dimension 2x2 (non-totally symmetric block). If the overall shape of the spectrum is dominated by the totally symmetric block you can neglect the other two modes.

Makes sense. For the data I use, however, it doesn’t look like there is a 7x7 totally symmetric block (data is included below). Would a non-symmetric Duschinsky matrix cause issues in the back-end?

1- The vibronic spectrum you want to simulate corresponds to the photoelectron emission of formic acid. So, you need to compute the vibrational modes of the ground states of the neutral and positively charged molecule in order to create the input data for the GBS simulations.

That is what we do. However, we are relaxing the geometry of the excited state to avoid imaginary frequencies. Is this something you do too?

As for points 2-3, we do a Hartree-Fock-type calculation with the basis set being TZV or similar to 6-311+G**. However, we don’t expect the spectrum to be radically different; peaks may shift but the overall shape should be similar.

4- Assuming correctness of points 1-3, how many samples are you taking to reconstruct the vibronic spectrum?

In the spectrum shown in the last post, there are exactly 4675 samples. Referring to the vibronic spectra tutorial, it seems that with 10 samples you can already find the main peaks of the spectrum, pointing towards the fact that 4675 samples is more than enough to recover the overall shape.

Moreover, using a conventional method, we were able to use the same data and compute a spectrum that is approximately correct, compared to experimental data.

This may indicate that I’m simply not using Strawberry Fields correctly. Thus, I’m including the code used to run the simulation along with the data files.

I’d very much appreciate if you could help me fix this issue.

Thank you!

test.py (952 Bytes) formicAcidHF_Hess.txt (122.4 KB) formicAcidCationHF_Hess.txt (125.0 KB)

Hi @andre,

Thank you for your message and the attached files.

Just to be clear. In order to compute the vibrational frequencies of the initial and final states you have to optimize the geometries of the neutral and the cation molecules. I noted that your electronic structure calculations were performed at the level of Hartree-Fock. That’s why your frequencies are blueshifted w.r.t. the values used in the Vibronic spectra tutorial. If you want to refine these numbers you can run density functional theory (DFT) simulations using the B3LYP functional. This way you would include electronic correlation effects and refine your frequencies. I assume that in Firefly you can do this by adding DFTTYP=b3lyp to the $CONTRL group in your input file.

Having said that, I would also expect that this refinement should not change drastically the relative intensities of the vibronic spectrum.

We will take a look at the python script and get back to you soon!

Thank you.

Hi @andre. Please also note that the normal modes used to compute the Duschinsky with the duschinsky function need to be mass-weighted. I am not sure if Firefly prints out mass-weighted data but you can use:

Li = Li[6:].T * m[:, np.newaxis]**0.5
Lf = Lf[6:].T * m[:, np.newaxis]**0.5

and then call the duschinsky function. For more information please check the documentation here.

I also highly recommend using DFT or a post-HF method with a basis set with proper size to get the input vibrational data. Please let us know if you face any issues.

Hi @sjahangiri,

Thanks for the tip for mass-weighting the normal modes, if made explicitly in the documentation I’m sure it would help others. However, I still believe the resulting spectrum (shown below, with 1000 samples) is far from being in agreement with the experimental results (and with the results obtained using the conventional method as shown in my second post):

If you believe the discrepancies could be solved by simply using DFT, please let me know. Thanks again!

Hi @andre. Please see a detailed explanation of the input format, including the normal modes format, for the duschinsky function here. I recommend using DFT, for example PBE0 or B3LYP functionals, to compute the input data rather than HF. That, very likely, will give you the correct spectrum. Please let us know about the results. Thanks.

Hi @sjahangiri,

Concerning the lack of proper documentation, I meant that it would be beneficial to have the process of using GAMESS data documented (i.e., skip over the first 5-6 modes and mass-weight the normal modes).

As for the vibronic spectrum calculation of formic acid, we’ve tried with DFT (B3LYP) and although the spectrum does look better (spectrum and data attached below), I am not entirely satisfied and would like some pointers on how to obtain better results.

Also, if possible, could you please explain the procedure of removing the two independent modes to perform the computation with 7 modes (as described above by @Alain_Delgado_Gran)?

Thank you very much!

formicAcidDFTHESS.TXT (190.4 KB) formicAcidCationDFTHESS.TXT (194.0 KB)

Hi @andre.

Thanks for the suggestion on documenting the process of using GAMESS data, we will consider that in the next refactor.

The fact that the DFT spectrum has a better agreement with the experimental one, compared to the one you obtained with HF, suggests that the shape of the spectrum is very sensitive to the vibrational input data obtained from electronic structure calculations. So, my recommendation is to further improve the electronic structure calculations by using a functional that gives more accurate vibrational frequencies, using a better basis set or using post-HF ab initio methods.

Regarding your last question, if you look at the vibrational modes of formic acid, you can see two groups of vibrations: one group of in-plane vibrations that contains 7 modes and a group of out-of-plane vibrations with two modes. In this paper, the authors have observed that the total vibronic spectrum is dominated by the modes of the first group and the second one can be safely neglected because of the lack of structural displacement. The vibrational frequencies for these two out-of-plane modes are 687.27, 1056.82 cm^-1, which are obtained from a DFT calculation using PBE0 functional. For more information, you can have a look at page 8 of the reference paper.

Thank you very much @sjahangiri and @Alain_Delgado_Gran, I’ve marked the thread as solved.

Hello!
I find it works well when I use GAMESS and qchem.read_gamess. However, when I use the result of ORCA there are some trouble. Do you know the difference between the output files of the two softwares.

Hi @GSQ, welcome to the forum!

Could you please share your code and the error or problem you’re getting?
You can paste it here.

Hi @GSQ,

In addition to what @CatalinaAlbornoz mentioned, could you please also send us the raw vibrational frequencies you obtained with GAMESS and ORCA? I will take a close look as soon as I have the code and the data.

I planned to attach the codes and data at first, but I got Sorry, new users can not upload attachments. I try to upload them somewhere else .

https://rec.ustc.edu.cn/share/fb2ac8a0-ba39-11ec-adc5-e586ded18233
password:wysz

The raw vibrational frequencies of CH2O2 ground state from ORCA is
$vibrational_frequencies
15
0 0.000000
1 0.000000
2 0.000000
3 0.000000
4 0.000000
5 0.000000
6 618.095481
7 663.230409
8 1033.405635
9 1099.153775
10 1275.317167
11 1376.923912
12 1780.978680
13 2978.688601
14 3646.386262

I think it’s good but the r I got is [ 0.74963814 0.48295573 0.29191516 0.19995845 -0.01301668 -0.38882838 -0.89582196 -1.68706532 -1.81765809]

Thanks @GSQ. The problem is very likely due to a format mismatch between the results extracted from GAMESS and ORCA. Could you please compare the data returned by read_gamess for atomic coordinates, atomic masses, normal mode frequencies and normal modes with those you extracted from ORCA? You might copy the data here, so we can have a look as well. Thanks!

The normal modes are just a part

ORCA

atomic coordinates
array([ 1.01247043, 0.93593077, 1.64887887, -0.76188184, -0.15625193,
0.1967257 , -0.45044505, -0.87018322, -1.92474336, 2.55700023,
1.03050053, 0.66971448, -2.53818898, -0.29233976, 1.25398599])
atomic masses
array([15.999, 15.999, 15.999, 12.011, 12.011, 12.011, 15.999, 15.999,
15.999, 1.008, 1.008, 1.008, 1.008, 1.008, 1.008])
normal mode frequencies
array([ 618.095481, 663.230409, 1033.405635, 1099.153775, 1275.317167,
1376.923912, 1780.97868 , 2978.688601, 3646.386262])
normal modes
array([[ 1.64270821e-02, 2.56421713e-02, 9.88458379e-03,
-2.01771416e-01, 8.14087573e-02, 2.67346591e-02,
-3.11053966e-03, -6.14263007e-04, 5.21353442e-02],
[ 8.26718324e-02, -8.43765333e-02, -3.22910447e-02,
-1.40453604e-01, 6.32196974e-03, -5.33726192e-03,
1.98767488e-02, 3.09322513e-04, 3.84049704e-03],
[ 2.02383672e-01, 3.26529198e-02, 1.24072657e-02,
-2.07682377e-01, -4.81094308e-02, -3.52355761e-02,
5.45858998e-02, 1.29824393e-03, -3.13564185e-02],
[ 1.93972892e-01, -2.31462067e-02, -5.88059996e-02,
2.30311237e-01, -2.06303454e-01, 2.63511527e-02,
6.42201854e-02, 7.34463287e-02, 3.96697685e-03],
[ 1.01125310e-02, 7.77109970e-02, 1.93460178e-01,
9.34585910e-02, -7.65157464e-02, 1.06005903e-02,
-2.06665848e-01, 4.66282722e-03, 7.46730494e-04],

GAMESS

atomic coordinates
array([ 0.53578, 0.49527, 0.87255, -0.40317, -0.08268, 0.1041 ,
-0.23836, -0.46048, -1.01853, 1.35311, 0.54532, 0.3544 ,
-1.34815, -0.1547 , 0.66358])
atomic masses
array([15.99491, 15.99491, 15.99491, 12. , 12. , 12. ,
15.99491, 15.99491, 15.99491, 1.00782, 1.00782, 1.00782,
1.00782, 1.00782, 1.00782])
normal mode frequencies
array([ 609.64324, 691.20728, 1026.82388, 1130.20324, 1292.5794 ,
1375.11708, 1822.88544, 3000.72956, 3594.64996])
normal modes
array([[ 2.11995871e-02, 9.06417789e-02, -2.98958036e-02,
4.02115943e-01, -2.99801941e-01, -8.51944855e-02,
-1.68709958e-02, 3.40449834e-03, -2.00595205e-01],
[ 1.68721876e-01, -3.03408208e-01, 9.86100711e-02,
2.98048780e-01, -5.78408775e-02, 3.09464764e-02,
-3.88442199e-02, -6.73892783e-04, -1.34855744e-02],
[ 4.29706713e-01, 1.13118403e-01, -3.76691268e-02,
4.63605320e-01, 8.73708593e-02, 1.49055445e-01,
-8.86674930e-02, -4.49164538e-03, 1.23927963e-01],
[ 3.71839993e-01, -7.69091873e-02, 1.60193084e-01,
-3.45513444e-01, 5.93445917e-01, -1.00158055e-01,
-4.88233946e-02, -2.46219094e-01, -1.30349294e-02],
[ 2.54594148e-02, 2.49505002e-01, -5.29221681e-01,
-1.37340994e-01, 2.13133459e-01, -3.82807824e-02,
2.61007170e-01, -1.52571852e-02, -1.94162896e-03],

Hi @GSQ. The ORCA atomic coordinates seem to be in Bohr while the GAMESS ones seem to be in Å. You can convert the ORCA values to Å with the conversion factor given here.

Please let me know if this fixes the issue or if you have any problems with the conversion.

It seems not work.
m=np.repeat(Ai[:,0],3)
ri=(Ai[:,1:4]).flatten()*0.529
Af=ReadAtoms(ES)
rf=(Af[:,1:4]).flatten()*0.529
Li=ReadNM(GS)
Li=Li[:,6:]
Lf=ReadNM(ES)
Lf=Lf[:,6:]
wi=ReadFre(GS)
wi=wi[6:]
wf=ReadFre(ES)
wf=wf[6:]
The r is array([ 0.74963814, 0.48295573, 0.29191516, 0.19995845, -0.01301668, -0.38882838, -0.89582196, -1.68706532, -1.81765809])

I notice that

NORMAL MODES
These modes are the Cartesian displacements weighted by the diagonal matrix M(i,i)=1/sqrt(m[i]) where m[i] is the mass of the displaced atom. Thus, these vectors are normalized but not orthogonal

Does it matter?

Hi @GSQ. When I multiplies the atomic coordinates you provided with the conversion factor I get the following array which is different than the r you copied in the last comment.

r = np.array([ 1.01247043, 0.93593077, 1.64887887, -0.76188184, -0.15625193,
0.1967257 , -0.45044505, -0.87018322, -1.92474336, 2.55700023,
1.03050053, 0.66971448, -2.53818898, -0.29233976, 1.25398599])

Could you please double check to make sure that the r values you are using is correct?

In terms of the normal modes, they need to be mass-weighted as explained here. Could you please also double check to see if you are using the mass-weighted normal modes?

There should be only 9 values in r. I can’t get the r values like yours. I think the normal modes are mass-weighted in the ORCA output file as I have mentioned.

Do you use the data from