Using AutoGrad/Automatic Differentiation Along a Direction

Hello all! I'm wondering if you would know how to use the AutoGrad library to do differentiation along a particular line. For example, in the "Optimizing Molecular Geometries" tutorial, we could modify the finite-difference method so that, instead of shifting each of the 9 x-coordinates by a slight amount, we shift all 9 coordinates together so that the finite-diff Hamiltonian is evaluated along a specific line, r.

So the original code was:

def finite_diff(f, x, delta=0.01):
    """Compute the gradient of a function using central finite differences."""
    gradient = []
    for i in range(len(x)):
        shift = np.zeros_like(x)
        shift[i] += 0.5 * delta
        res = (f(x + shift) - f(x - shift)) * delta**-1  # dH/dx_i
        gradient.append(res)

    return gradient

And the modified code is now:

def finite_diff_r(f, r, angle, delta=0.01):
    """Compute the central finite difference of a function along the bond length, r"""
    gradient = []
    shift = 0.5 * delta

    temp1 = build_xyz(r[0] + shift, angle[0])
    temp2 = build_xyz(r[0] - shift, angle[0])

    res = (f(temp1) - f(temp2)) * delta**-1  # dH/dr

    gradient.append(res)

    return gradient

where “build_xyz” just constructs the geometry along a particular line ‘r’, given an angle ‘theta’.
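
For concreteness, a stripped-down sketch of what my build_xyz does would be something like this (assuming water, with O at the origin, H1 along x, and the bond angle given in degrees):

import numpy as np

# Stripped-down sketch of build_xyz: O at the origin, H1 at (r, 0, 0), and
# H2 at distance r from O, rotated by the bond angle in the xy-plane.
def build_xyz(r, angle):
    theta = angle * np.pi / 180  # degrees -> radians
    return np.array([[0.0, 0.0, 0.0],
                     [r, 0.0, 0.0],
                     [r * np.cos(theta), r * np.sin(theta), 0.0]])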

I'm wondering if something similar has been done using the AutoGrad method of automatic differentiation. Reason being, the finite-difference calculation is not robust enough and causes errors for me, but using the AutoGrad library works just fine.

Hey @ImranNasrullah,

Interesting :thinking:. Implementing this in PennyLane as a proper differentiation method is something that would take a good chunk of time. But you can most certainly define the function like you have outside of PennyLane and use it as a differentiation method.
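
For example, here's a rough sketch of a hand-rolled optimization loop that uses your finite_diff_r as the gradient (energy_fn here is just a dummy stand-in for your actual cost, so the snippet runs):

import numpy as np

# Dummy stand-in for the real cost (e.g., the energy as a function of the
# Cartesian geometry returned by build_xyz):
energy_fn = lambda xyz: np.sum(xyz**2)

r, angle = np.array([1.0]), np.array([104.5])
for step in range(20):
    g = np.array(finite_diff_r(energy_fn, r, angle))  # dE/dr along the line
    r = r - 0.1 * g                                   # gradient-descent update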

The other thing you can look at is trying to set some coordinates to have requires_grad=False, i.e., tell PennyLane not to differentiate certain coordinates, corresponding to a line/plane that you're interested in (your r).
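
For example, a minimal sketch (the array values here are placeholders):

from pennylane import numpy as np

# Freeze the raw Cartesian coordinates so PennyLane won't differentiate them...
coordinates = np.array([[0.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0]], requires_grad=False)

# ...and only mark the parameter(s) you actually want gradients for, e.g. a
# bond length r, as trainable.
r = np.array([1.0], requires_grad=True)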

the ‘finite-difference’ calculation is not robust enough and causes errors for me, but using the AutoGrad library works just fine

Yep, that sounds right! Finite-difference differentiation is nice because it's hardware-compatible, simple to understand, and makes essentially no assumptions about the function being differentiated. But you're right that it's not the greatest for numerically accurate results.
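
To see the accuracy issue concretely, here's a toy comparison in plain NumPy: the central-difference estimate of d/dx sin(x) = cos(x) loses accuracy as the step grows, and for very small steps floating-point cancellation takes over:

import numpy as np

# Central-difference estimate of d/dx sin(x) at x = 1 versus the exact cos(1).
# The error first shrinks with delta, then grows again once floating-point
# cancellation dominates.
x = 1.0
for delta in [1e-1, 1e-2, 1e-4, 1e-8, 1e-12]:
    approx = (np.sin(x + 0.5 * delta) - np.sin(x - 0.5 * delta)) / delta
    print(f"delta = {delta:.0e}, error = {abs(approx - np.cos(x)):.2e}")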

Let me know if this helps!

Hello Isaac!
Thanks for the response! I'm now trying to do that with an autograd function, but I'm not having any success:

def energy(mol):
    @qml.qnode(dev, interface="autograd")
    def circuit(*args):
        # note that active_electrons=8, active_orbitals=6 in this example
        qml.BasisState([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0], wires=range(12))

        # Just apply one gate for simplicity:
        qml.DoubleExcitation(*args[0][0], wires=[0, 1, 10, 11])

        H = qml.qchem.molecular_hamiltonian(
            mol.symbols, build_geometry(*args[1][0], *args[1][1]),
            active_electrons=8, active_orbitals=6,
            args=[build_geometry(*args[1][0], *args[1][1])])[0]

        return qml.expval(H)
    return circuit

That is the function I'm trying to use automatic differentiation (dhf) on, where build_geometry plays the role of the build_xyz I described in my question. I've tried to put the Hamiltonian in terms of the second element of args (args[1]), which is an array of the parameters I'm trying to optimize, r and theta.

However, with this setup, I find that my energy(mol) function is independent of r and theta.

Would you have any idea how to proceed in light of:

But, you can most certainly define the function like you have outside of PennyLane and use it as a differentiation method.

How can I transform the above code so that what autograd "sees" is an optimization over r and theta, while what actually happens is that all the Euclidean coordinates get updated, if that makes sense? That's precisely what I did for the finite_diff() method, just applied now to autograd/automatic differentiation. I'm not even sure how to proceed.

Thanks so much for your help!

Hey @ImranNasrullah,

Can you share a complete and working bit of code here that I can copy-paste? It’ll help me figure out how to get this to work :slight_smile:.

Hello @isaacdevlugt ,
Thanks for the response!

I was able to get a non-zero output, and here is the self-contained code:

# Minimally working and self-contained example:
# imports and functions:
import pennylane as qml
from pennylane import qchem
from pennylane import numpy as np
from autograd import grad
import time
import math
bohr_angs = 0.529177210903  

#################### Functions: ####################
def unit_vector(vector):
    """Returns the unit vector of the vector."""
    return vector / np.linalg.norm(vector)

def angle_between(v1, v2):
    """Returns the angle in degrees between vectors 'v1' and 'v2'."""
    v1_u = unit_vector(v1)
    v2_u = unit_vector(v2)
    return np.arccos(np.clip(np.dot(v1_u, v2_u), -1.0, 1.0)) * 180 / math.pi  # convert to degrees

def build_geometry(r, angle): # prepares water's xyz coordinates given a bond length and angle  
    r = r/bohr_angs # converts from units of angstroms back to units of Bohrs
    
    x = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]], requires_grad=True)
    
    # O1's nuclear coordinates are at x,y,z: (0,0,0)
    # H1's nuclear coordinates are at x,y,z: (r,0,0)
    # x:
    zeros = np.zeros((x.shape)) # Doing it this way works with 'autograd':
    zeros[1][0] = 1
    x = x + (zeros * r)
    
    # H2's nuclear coordinates are at x,y,z: (r*cos(angle), r*sin(angle), 0)
    # x:
    zeros = np.zeros((x.shape)) # Doing it this way works with 'autograd':
    zeros[2][0] = 1
    x = x + (zeros * r * np.cos(angle * np.pi/180))
    
    # y:
    zeros = np.zeros((x.shape)) # Doing it this way works with 'autograd':
    zeros[2][1] = 1
    x = x + (zeros * r * np.sin(angle * np.pi/180))
        
    return x 

dev = qml.device("default.qubit")
def energy(mol):
    @qml.qnode(dev, interface="autograd")
    def circuit(*args):
        # re-define the geometry according to our parametrization, r and theta:
        geometry = build_geometry(args[1][0][0], args[1][1][0])
        
        # note that active_electrons=8, active_orbitals=6 in this example
        qml.BasisState([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0], wires=range(12))
        
        # Just apply one gate for simplicity:
        qml.DoubleExcitation(*args[0][0], wires=[0, 1, 10, 11])
        
        H = qml.qchem.molecular_hamiltonian(
            mol.symbols, mol.coordinates,
            active_electrons=8, active_orbitals=6, args=[geometry])[0]

        return qml.expval(H)
    return circuit
############################################################

##################### Initializations: #####################
symbols = ["O", "H", "H"]

# construct the z_matrix vector (which will store 'r' and 'theta') according to the geometry below:
geometry = np.array([[0.10000000,  0.10000000,  0.28377432],
                     [0.10000000,  1.45278171, -1.00662237],
                     [0.10000000, -1.45278171, -1.00662237]], requires_grad=True)
# Initialize starting r and starting theta
OH1 = geometry[0] - geometry[1]
OH2 = geometry[0] - geometry[2]
r = np.linalg.norm(geometry[0] - geometry[1]) * bohr_angs
theta = angle_between(OH1, OH2)
# Finally, initialize z_matrix, which we will use to construct the "args" vector for the first time:
z_matrix = [np.array([r], requires_grad=True)] + [np.array([theta], requires_grad=True)]

##################### Begin optimization: #######################################
circuit_param = [np.array([0.0], requires_grad=True)]  # one circuit parameter
for n in range(10): # run 10 optimization iterations
    args = [circuit_param, z_matrix]

    geometry = build_geometry(args[1][0][0], args[1][1][0]) # re-initialize geometry:
    mol = qml.qchem.Molecule(symbols, geometry)

    # Update circuit parameters:
    g_param = grad(energy(mol), argnum = 0)(*args)
    circuit_param = circuit_param - 0.25 * g_param[0]

    # Update z_matrix:
    forces = -np.array(grad(energy(mol), argnum = 1)(*args))
    z_matrix = z_matrix + (0.5 * forces)
    
    print(f'n: {n}, E: {energy(mol)(*args):.8f}')
    print(f'r gradient: {abs(forces[0]).max():.8f}, Bond-length: {z_matrix[0][0]:.5f}')
    print(f'theta gradient: {abs(forces[1]).max():.8f}, Bond-angle: {z_matrix[1][0]:.5f}')
    print("")

I was able to get the optimization in terms of r and theta. However, I think something is slightly off with the code, as the energy doesn't consistently decrease across the steps (sometimes it goes up slightly before heading back down).

Thanks! One thing to note: you don’t need to import autograd or specify it as an interface anywhere :slight_smile:. Instead of autograd.grad, you can use qml.grad (qml.grad — PennyLane 0.35.1 documentation).
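
Concretely, the two gradient calls in your loop would become:

# Same updates as in your loop, but with qml.grad instead of autograd.grad:
g_param = qml.grad(energy(mol), argnum=0)(*args)
forces = -np.array(qml.grad(energy(mol), argnum=1)(*args))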

Indeed, I ran your code (with the changes above :point_up:) and I got:

n: 0, E: -74.96536778
r gradient: 0.00923322, Bond-length: 0.99393
theta gradient: 0.00031603, Bond-angle: 96.62487

n: 1, E: -74.96616750
r gradient: 0.01046767, Bond-length: 0.98870
theta gradient: 0.00030533, Bond-angle: 96.62502

n: 2, E: -74.96649075
r gradient: 0.01348520, Bond-length: 0.99544
theta gradient: 0.00032018, Bond-angle: 96.62518

n: 3, E: -74.96661630
r gradient: 0.01585724, Bond-length: 0.98751
theta gradient: 0.00030284, Bond-angle: 96.62533

n: 4, E: -74.96665859
r gradient: 0.01973438, Bond-length: 0.99738
theta gradient: 0.00032420, Bond-angle: 96.62550

n: 5, E: -74.96665934
r gradient: 0.02363922, Bond-length: 0.98556
theta gradient: 0.00029833, Bond-angle: 96.62565

n: 6, E: -74.96664084
...
n: 9, E: -74.96642749
r gradient: 0.05142733, Bond-length: 0.97833
theta gradient: 0.00028159, Bond-angle: 96.62627

I think this is due to your choice of how to carry out the optimization, your hyperparameters, etc.
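
As a toy illustration of the step-size effect (plain Python, nothing quantum):

# Gradient descent on f(x) = x**2, where f'(x) = 2x. With a step size above
# 1.0 the iterates overshoot the minimum and oscillate with growing amplitude,
# much like the non-monotonic energies above.
def descend(lr, steps=10, x0=3.0):
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(descend(0.25))  # converges toward 0
print(descend(1.10))  # overshoots and diverges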

Thank you! Also another thing, I tried out the algorithm with this initialization:

geometry = np.array([[0.028,  0.054,  0.0],
                     [0.986,  1.610, 0.0],
                     [1.855, 0.002, 0.0]], requires_grad=True)

which starts at a bond angle of ~60 degrees. From that starting point, the algorithm diverges rapidly:

n: 0, E: -74.87225771
r gradient: 0.34180591, Bond-length: 1.30875
theta gradient: 0.00505364, Bond-angle: 60.01554

n: 1, E: -74.80189373
r gradient: 0.48782516, Bond-length: 0.82093
theta gradient: 0.00127304, Bond-angle: 60.01681

n: 2, E: -74.74439740
r gradient: 1.58715977, Bond-length: 2.40809
theta gradient: 0.00717866, Bond-angle: 60.02399

n: 3, E: -72.97484216
r gradient: 0.28989702, Bond-length: 2.11819
theta gradient: 0.00346159, Bond-angle: 60.02745

n: 4, E: -73.07495014
r gradient: 0.41695014, Bond-length: 1.70124
theta gradient: 0.00433193, Bond-angle: 60.03178

n: 5, E: -73.77912973
r gradient: 4.50314834, Bond-length: -2.80191
theta gradient: 0.01691892, Bond-angle: 60.04870

n: 6, E: -72.87921796
r gradient: 0.20521733, Bond-length: -2.59669
theta gradient: 0.00287212, Bond-angle: 60.05157

n: 7, E: -72.92498358
r gradient: 0.24257733, Bond-length: -2.35411
theta gradient: 0.00313382, Bond-angle: 60.05471

n: 8, E: -72.99106160
r gradient: 0.30711616, Bond-length: -2.04700
theta gradient: 0.00357515, Bond-angle: 60.05828

n: 9, E: -73.16458795
r gradient: 0.75791374, Bond-length: -1.28908
theta gradient: 0.00494037, Bond-angle: 60.06322

n: 10, E: -74.81164989
r gradient: 0.47515373, Bond-length: -0.81393
theta gradient: 0.00148895, Bond-angle: 60.06471

n: 11, E: -74.73339209
r gradient: 1.67577546, Bond-length: -2.48970
theta gradient: 0.00728323, Bond-angle: 60.07199

n: 12, E: -72.95227669
r gradient: 0.26743503, Bond-length: -2.22227
theta gradient: 0.00330208, Bond-angle: 60.07530

n: 13, E: -73.03488103
r gradient: 0.35963092, Bond-length: -1.86264
theta gradient: 0.00393178, Bond-angle: 60.07923

n: 14, E: -73.36361106
r gradient: 1.54620987, Bond-length: -0.31643
theta gradient: 0.00769892, Bond-angle: 60.08693

n: 15, E: -67.08655214
r gradient: 58.57749010, Bond-length: -58.89392
theta gradient: 0.02540338, Bond-angle: 60.11233

I didn't expect this, as I thought autograd was supposed to be very accurate. Would you have any explanation for this, and any possible fixes?
Thanks!

This observation isn't due to Autograd :slight_smile:. You can think of Autograd as a black box that differentiates things exactly (up to floating-point precision); it doesn't make your optimization landscape any friendlier. The divergence you're seeing is due to your geometry initialization choice. How you initialize parameters in QML will have a drastic effect on its ability to train.

To clarify: by different geometry initializations, do you mean, given a water molecule, trying out different bond lengths/angles, or different rotations and translations of the water molecule?

Try playing around with any of the parameters you have to define up front to get your optimization going :slight_smile:
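
For instance, something along these lines (just a sketch; the grid values are arbitrary):

from pennylane import numpy as np

# Sketch: scan a few starting bond lengths (angstroms) and angles (degrees),
# run the optimization from each, and compare the converged energies.
for r0 in [0.90, 1.00, 1.10]:
    for theta0 in [95.0, 104.5, 115.0]:
        z_matrix = [np.array([r0], requires_grad=True),
                    np.array([theta0], requires_grad=True)]
        # ... run your optimization loop from this starting point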

Hello Isaac, another quick question. Would it be possible to use the finite-diff/parameter-shift approach from this video for the finite-diff functions in the Optimization of molecular geometries tutorial?

My concern is that in the video tutorial, parameter-shift and finite-diff are used to differentiate the circuit and then evaluate it at the observables. However, in the Optimization of Molecular Geometries tutorial, it seems the observable itself is differentiated rather than the circuit, so the reverse.

Hey @ImranNasrullah,

Would it be possible to use the finite-diff/parameter-shift approach from this video for the finite-diff functions in the Optimization of molecular geometries tutorial?

The parameter-shift rule for something like the DoubleExcitation operation is well defined, so in principle you can do this, yes.
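
For example, here's a minimal circuit whose DoubleExcitation parameter is differentiated with the parameter-shift rule (toy wires and observable, just to show the mechanics):

import pennylane as qml
from pennylane import numpy as np

dev_ps = qml.device("default.qubit", wires=4)

@qml.qnode(dev_ps, diff_method="parameter-shift")
def circuit(theta):
    qml.BasisState(np.array([1, 1, 0, 0]), wires=range(4))
    qml.DoubleExcitation(theta, wires=[0, 1, 2, 3])
    return qml.expval(qml.PauliZ(0))

theta = np.array(0.3, requires_grad=True)
print(qml.grad(circuit)(theta))  # gradient computed via parameter shifts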

it seems the parameter-shift and finite-diff are used to differentiate the circuit and then evaluate it at the observables. However, in the Optimization of Molecular Geometries tutorial, it seems the observable itself is differentiated rather than the circuit, so the reverse.

A circuit alone can't be differentiated. Circuits have to output a measurement, and only then can we talk about differentiating. There isn't any differentiating "at the observables" in the way you're suggesting :slight_smile:. In the geometry tutorial it's still the measured expectation value that gets differentiated; the nuclear coordinates simply enter the cost through the Hamiltonian's coefficients rather than through the gates.
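
Here's a small example of what I mean: the cost is still an expectation value, but one trainable parameter enters through the observable's coefficient rather than through a gate (a toy Hamiltonian, assuming a recent PennyLane where Hamiltonian coefficients are trainable):

import pennylane as qml
from pennylane import numpy as np

dev_h = qml.device("default.qubit", wires=1)

@qml.qnode(dev_h)
def cost(theta, coeff):
    qml.RY(theta, wires=0)
    # coeff parametrizes the observable H = coeff * Z, but what gets
    # differentiated is still the measured expectation value <H>.
    return qml.expval(qml.Hamiltonian([coeff], [qml.PauliZ(0)]))

theta = np.array(0.5, requires_grad=True)
coeff = np.array(1.0, requires_grad=True)
print(qml.grad(cost)(theta, coeff))  # d<H>/dtheta and d<H>/dcoeff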