Molecular Dynamics

The scientific question

The scientific question that we address in this tutorial is the following: Does the 3D-"structure" of the amino acid Lysine in aqueous solution depend on the protonation state of the amino group of its side chain? Before you start to work technically on the answer to the question, you may think a bit more about the question and its implications.

Question 48: 1) What is meant by the 3D-"structure" of Lys? Is this notion reasonable here?
2. What answer to our scientific question would you expect?
3. Could you describe the answer in terms of the two distances dNN (atom N to atom Nζ ), and dCN (atom C to atom Nζ )? See the figure below.
4. Which factors could influence the answer? How about T, ion concentration, force field, simulation length, ... ?
5. Do you expect the protonation to be also dynamic?

Answer

Figure 45: Definition of two distances that can be used to characterize the structure of Lys.

Gromacs et al.

In this tutorial we will work with the molecular dynamics (MD) program suite gromacs. Gromacs is not a monolithic program, but a large collec- tion of programs for all steps of a computational experiment with MD, including preparation, running MD simulations, and analysis. We use it since it is a fast, versatile and free system of open source tools. It is well-documented and used by a large and lively community. If you are interested in using gromacs, you can access everything (program, docu- mentation) at http://www.gromacs.org. If you are using Linux, there is a good chance that gromacs is part of your Linux distribution and can be installed with a key-click. In this practical, gromacs should have been installed already. In particular, we assume that we are working with gro- macs 4.5 (other versions may require adaptations of the scripts). In this course we will also use YASARA for MD purposes, but not today because YASARA is already too automatic for educational purposes. You will notice that we will not use a graphical interface but control the experiment by shell scripts, i.e. series of commands that can be studied and changed transparently. Additionally, we can use tools such as xmgrace or R for plotting, and any molecular viewer that can read and render PDB files and measure atomic distances. Here we will use vmd as an example. By now you should have already generated MD trajectories of your system (be it Lys with protonated or de-protonated side chain) with gro- macs using one of the shell script runAll.sh in one of the directories protonated or deprotonated. There should be several output files from that MD run in the chosen directory. In the following we assume that you have opened a window shell and have changed to that directory. If that is the case and you type ls you should obtain the list of files, e.g. lys.pdb, solvated.gro, etc.

Analysis of computational experiment. Task 1: Check the input

The following files in your directory protonated or deprotonated make up the input:

  1. A shell script that provides a complete computational experiment from preparation over equilibration, production to plotting of results:
    runAll.sh
    Can you spot differences between the shell scripts runAll.sh in the two directories protonated and unprotonated? (Hint: Open them with a text editor).
  2. The initial structure of Lys:
    lys.pdb
    Why are there not two initial structures, one for the protonated, and one for the deprotonated version? Is something missing in the PDB file?
  3. Parameter files for the preparation (including energy minimizing), equilibration, and production stage of the computational experiment:
    gromppWithWater.mdp
    gromppMinAll.mdp
    gromppEquilMD.mdp
    gromppProdMD.mdp
  4. So-called index files used to tell the distance extraction program g_dist between which pairs of atoms the distances should be computed from the MD trajectory:
    indexNN.ndx
    indexCN.ndx
  5. A script for the program R that is here used to plot energy and distance data into PDF-files:
    plot.R

When your MD run is complete, you should find the two files plotEnergies.pdf and plotDistances.pdf in the chosen directory, and when you open them with a PDF-viewer, they should show meaningful plots. If the two PDF files are not present, the run has not (yet) finished. If they are present but unreadable or empty, the run has failed.

Analysis of computational experiment. Task 2: Make sense of the shell script

All steps of the computational experiment are listed in the files runAll.sh in directories protonated and deprotonated, respectively. Most of the steps essentially call a gromacs program with certain input files and op- tions, and generate from this input output files. For instance in file runAll in directory protonated you will find:

pdb2gmx -lys -f lys.pdb -o lys.gro <<EOF >& pdb2gmx.out
9
1
1
EOF

The gromacs command pdb2gmx translates our initial structure in file lys.pdb into a slightly different format ("gro"-file) needed by gromacs. We have to provide to pdb2gmx also information about the force field we are intending to use ("9" means that we will use the 9th force field from a list of several force fields that are available), and about the water model we are intending to use (the first "1" tells gromacs that we would like to use the first water model from a list). Additionally, we ask with the flag -lys the program pdb2gmx to inquire from the user the protonation state of our Lys side chain; the last "1" says that the Lys should be protonated. If you call pdb2gmx not from a shell script but interactively, you will see these selection lists and have to manually type 9 and 1 and 1 on the keyboard. The <<EOF tells pdb2gmx do not wait for manual input, but take instead the data on the following lines in the script until EOF (=End Of File) occurs. Finally, >& pdb2gmx.out sends output of pdb2gmx, such as error messages or warnings, to the file pdb2gmx.out. We could thus check whether the command has finished successfully.

  1. How do you know that there is a program called pdb2gmx and which input it takes and which output it produces? A good starting point for answering questions like this is the gromacs flowchart shown here. If you compare this flowchart with the structure of the shell script: Is there some agreement?
  2. The flowchart also contains hyperlinks to the description of the pro- grams, such as pdb2gmx and to the files that are needed and pro- duced, such as .gro files. Note that not all programs used in the shell-script are mentioned in the flowchart; e.g. find out what the program genion is doing! For that purpose click on the flowchart page on "Main Table of Contents" in the top part of the page and check whether you can find something about genion there.

Analysis of computational experiment. Task 3: Check the output

  1. The largest and most important output file of the computational experiment is the trajectory file, i.e. a list of all atomic coordinates of the molecular system at many consecutive time points. Try to find this trajectory file; you may again consult the flowchart which of the files it could be.
  2. Another important file contains various energy related quantities of the system over time, such as the total energy, the kinetic energy, the temperature. Its file name should have a characteristic extension. Try to locate this file!
  3. At several stages of the computational experiment we have written the current structure of the system into a .pdb or .gro file. Use a viewer, e.g. vmd to check the structures in .gro-files at various stages. Inspect also some of the .gro files with a text editor. Can you see there coordinates of Lys, water, and ions?

Analysis of computational experiment. Task 4: Analysis of MD trajectories

In the last part of runAll.sh you can see that the production trajectory and the energies are extracted from the respective files. These data are put into .xvg files that can be plotted with the xmgrace program. Here, we have instead used an R script to do all plotting automatically. The results are found in files plotEnergies.pdf and plotDistances.pdf. You should see something as in the next two figures.

Question 49: What happens to the total energy in the course of the equilibration?
What could be the purpose of the equilibration?
What happens to the total energy in the course of the production run?

Answer

Compare your results with your colleagues from the protonated and deprotonated groups! Where do the average distances dNN and dCN lie in each group? You can find out by using xmgrace. Type:
xmgrace distCN.xvg
Then choose menu Edit ->  Data sets. Then click on the shown name of the data set so that it is selected. In this way you obtain a list of values below in the striped area called Statistics.

Figure 46: Energy traces (energy over time) from equilibration and produc- tion MDs. Note the differently scaled axes in the two graphs.

Figure 47: Traces of distances dNN and dCN from production run (top panels), and corresponding histograms (bottom panels).

Figure 48: Distance data loaded into xmgrace. The striped area shows statistical measures of the distance trace, e.g. the mean distance.

Analysis of computational experiment. Task 5: Compare protonated and deprotonated

In the last point of the previous task we have extracted data from our tra- jectories that can be used to address our initial scientific question: How does protonation affect the distances dNN and dCN? The figure below shows boxplots that summarize the data.

Figure 49. Boxplots comparing the dNN and dCN distances from two runs (one for each protonation state). The ends of the two whiskers mark the minimum and maximum distances (except for the deprotonated dCN where we have a few outliers at low distances). The bottom and top edges of the boxes mark the positions of the first and third quartile, and the bold central line is the median.

Question 50: How would you interpret your data or the boxplots?
In which way should we improve our computational experiment(s)?
Do we have ions included in the computational experiments? If yes, what would be the effect of removing these ions?

Answer

Enjoy the movie

The trajectories can be analyzed numerically, but sometimes it is also in- structive to watch what is going on. A trajectory is nothing but a movie, and you can watch that movie with a viewer, such as vmd etc. Alternatively, the tool ngmx is a part of the gromacs suite and can also be used to visualize movies, albeit in a simpler fashion.

Figure 50: This should be shown by vmd once you have loaded molecule and trajectory, and picked two atoms (here Nζ and C of Lys) to show their distance.