Where to find the xyz file library for molecular hamiltonians?

I am using the Pennylane notebook on the Xanadu website.

I am having trouble with the read_structure function used to import the molecular structure into the Hamiltonian for VQE. It looks like I have to physically have the .xyz files stored somewhere on my device, but I’m not sure where people go to find such files and I don’t want to download something sketchy from the web.

Where does Xanadu recommend downloading a library of basic molecular xyz files from to use in read_structure?

Here is my code:

import pennylane as qml
import numpy as np
from pennylane import qchem

symbols, coordinates = qchem.read_structure("h2o.xyz")

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_51/752628394.py in <cell line: 1>()
----> 1 symbols, coordinates = qchem.read_structure('h2o.xyz')

/opt/conda/envs/pennylane/lib/python3.9/site-packages/pennylane/qchem/structure.py in read_structure(filepath, outpath)
     55     file_out = os.path.join(outpath, "structure.xyz")
     56 
---> 57     copyfile(file_in, file_out)
     58 
     59     symbols = []

/opt/conda/envs/pennylane/lib/python3.9/shutil.py in copyfile(src, dst, follow_symlinks)
    262         os.symlink(os.readlink(src), dst)
    263     else:
--> 264         with open(src, 'rb') as fsrc:
    265             try:
    266                 with open(dst, 'wb') as fdst:

FileNotFoundError: [Errno 2] No such file or directory: 'h2o.xyz'

And, finally, make sure to include the versions of your packages. Specifically, show us the output of qml.about().

qml.about()
Name: PennyLane
Version: 0.28.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: GitHub - PennyLaneAI/pennylane: PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural network.
Author:
Author-email:
License: Apache License 2.0
Location: /opt/conda/envs/pennylane/lib/python3.9/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, requests, retworkx, scipy, semantic-version, toml
Required-by: PennyLane-Cirq, PennyLane-Lightning, PennyLane-qiskit, pennylane-qulacs, PennyLane-SF

Platform info: Linux-5.4.209-116.367.amzn2.x86_64-x86_64-with-glibc2.31
Python version: 3.9.15
Numpy version: 1.23.5
Scipy version: 1.10.0
Installed devices:

  • default.gaussian (PennyLane-0.28.0)
  • default.mixed (PennyLane-0.28.0)
  • default.qubit (PennyLane-0.28.0)
  • default.qubit.autograd (PennyLane-0.28.0)
  • default.qubit.jax (PennyLane-0.28.0)
  • default.qubit.tf (PennyLane-0.28.0)
  • default.qubit.torch (PennyLane-0.28.0)
  • default.qutrit (PennyLane-0.28.0)
  • null.qubit (PennyLane-0.28.0)
  • cirq.mixedsimulator (PennyLane-Cirq-0.28.0)
  • cirq.pasqal (PennyLane-Cirq-0.28.0)
  • cirq.qsim (PennyLane-Cirq-0.28.0)
  • cirq.qsimh (PennyLane-Cirq-0.28.0)
  • cirq.simulator (PennyLane-Cirq-0.28.0)
  • lightning.qubit (PennyLane-Lightning-0.28.2)
  • strawberryfields.fock (PennyLane-SF-0.20.1)
  • strawberryfields.gaussian (PennyLane-SF-0.20.1)
  • strawberryfields.gbs (PennyLane-SF-0.20.1)
  • strawberryfields.remote (PennyLane-SF-0.20.1)
  • strawberryfields.tf (PennyLane-SF-0.20.1)
  • qiskit.aer (PennyLane-qiskit-0.28.0)
  • qiskit.basicaer (PennyLane-qiskit-0.28.0)
  • qiskit.ibmq (PennyLane-qiskit-0.28.0)
  • qiskit.ibmq.circuit_runner (PennyLane-qiskit-0.28.0)
  • qiskit.ibmq.sampler (PennyLane-qiskit-0.28.0)
  • qulacs.simulator (pennylane-qulacs-0.28.0)

Hi @Joan,

It’s great to see you again here!

From the conversation on Slack I got the impression that you don’t specifically need an .xyz file, what you need are the coordinates of a molecule. Please correct me if I’m wrong.

You have tapped into what I’d say is one of the best well-kept secrets of the internet.

Let’s say we want to get the eigenvalues for the H2 molecule. You would do something like the following:

# Import your favourite libraries
import pennylane as qml
from pennylane import numpy as np

# Define the molecule that you want to use
symbols = ["H", "H"]

# Choose the position of the nuclei of the molecule
geometry = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.735*1.8897259885789]])

# Calculate the Hamiltonian and the number of qubits needed
H, qubits = qml.qchem.molecular_hamiltonian(symbols, geometry)

# Calculate the eigenvalues
qml.eigvals(H)

However the key is that nobody tells you where the geometry comes from. Here are my insights on this:

You can find the optimal geometry of several molecules in PennyLane’s quantum chemistry datasets. For H2 for instance, the optimal bond length is 0.742 Angstrom.

The experimental geometry data can be found in the Computational Chemistry Comparison and Benchmark DataBase released by the National Institute of Standards and Technology.

Note that this experimental geometry is not necessarily the optimal geometry that you will need to use for qchem.molecular_hamiltonian. This is because the optimal geometry depends on the specific basis set that you choose and the specific method that you choose. The basis set is the set of orbitals that will be combined together. The specific combination of orbitals for a given basis set is what we will try to optimize to find the ground state energy.

The qchem.molecular_hamiltonian function in PennyLane has a ‘basis’ keyword argument which has ‘sto-3g’ as the default value. You can keep this value for now. The ‘method’ keyword argument specifies how we want to calculate the molecular hamiltonian. The default is ‘dhf’ which means Differentiable Hartree Fock. This is a PennyLane-native method that is differentiable. The other option is using the ‘pyscf’ method, which uses an OpenFermion backend but which is not differentiable. You can keep ‘dhf’ for now.

Now, how do you find the optimal geometry that you will need to use for qchem.molecular_hamiltonian? You can look for the calculated geometries in the Computational Chemistry Comparison and Benchmark DataBase. Write H2 as your chemical formula, look for the method you’re using in the rows, and then look for the column corresponding to the basis set you’re using. The CISD method will show results similar to ‘dhf’, so we will look at this row. On this row you can look for the ‘sto-3g’ basis set (1st column), which is the one we’re using. You will notice that the calculated geometry for the CISD method and the sto-3g basis set is 0.735 Angstrom.

The last step we need to do is converting this bondlength in Angstrom to atomic units, which are the units that qchem.molecular_hamiltonian takes. The conversion is: 1 Angstrom = 1.8897259885789 a.u. Since the H2 molecule is comprised of only two nuclei, we can decide that one of these nuclei will lie in the origin (coordinates [0.0, 0.0, 0.0]), and the other one will lie on coordinates [0.0, 0.0, 0.735*1.8897259885789].

Finally we define the geometry as a numpy array of the two points in space.
geometry = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.735*1.8897259885789]])

Now that we know how to find the optimal geometry, we can use qchem.molecular_hamiltonian to calculate the Hamiltonian and the eigenvalues for such Hamiltonian. The details of how the eigenvalues are classically calculated can be found in the SciPy documentation.

Let me know if this helps!