Computations on PDB files

- Geometry

Elementary geometrical aspects

The lists described in this section deal with very elementary geometric aspects of proteins such as angles, torsion angles, accessibility areas, and secondary structure.

Code	Type of data
chi	Torsion angles
tau	Backbone bond angles
acc	Molecular surface area
asa	Accessible surface area
dsp	Secondary structure overview

Torsion angles (chi)

Given all torsion angles, one can (barring small bond-length differences) recalculate the entire structure of a protein. The average deviation of the torsion angles from their ideal values is a measure of the quality of the structure solution. The backbone torsion angles in particular are used in many computational applications, for example to make a Ramachandran plot. Feel free to look at the ′Torsion Angle′ section of the SFB course to learn about the definitions.

The torsion angle files typically look like:

 1 ALA (   2 )A     -  999.9  104.0  178.2  999.9  999.9  999.9  999.9  999.9
 2 ASP (   3 )A     T  -81.1  116.3 -179.1  175.1 -162.7  999.9  999.9  999.9
 3 LYS (   4 )A     T  -59.3   -0.2  179.2  -73.4  -61.4  169.3  -64.7  999.9
 4 GLU (   5 )A     H -110.8   22.9  178.5  -70.4  -58.7  123.9  999.9  999.9
 5 LEU (   6 )A     H  -68.5  127.2 -179.2  179.0   59.4  999.9  999.9  999.9
....

From left to right each line holds the sequential number of the residue in the PDB file, the residue type, in brackets the residue number and insertion code found in the PDB file, the chain identifier of the chain in which this residue is found, the secondary structure according to DSSP translated to 4 states by WHAT IF (H,S,T,-), and the 8 torsion angles: Φ, Ψ, Ω, and Χ1 through Χ5. Angles are in degrees. Nonexistent angles, such as Φ of the N-terminal residue or Χ2 and higher in serine, get the value 999.9; this also holds for residues that are pseudo termini because they sit next to a chain break. WHAT IF associates with residue i the Ω angle of the peptide plane between the residues i and i+1; this is not the IUPAC convention; sorry.
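The column description above can be turned into a small parser. The following is a minimal sketch in Python, not part of WHAT IF: the regular expression is inferred from the sample lines only, it assumes a non-blank chain identifier, and the names `ChiRecord`, `parse_chi_line`, and `phi_psi_pairs` are made up for this illustration.

```python
import re
from typing import NamedTuple, Optional

MISSING = 999.9  # value written for angles that do not exist

# Layout inferred from the sample lines, not from a format specification.
# Assumes the chain identifier is a single non-blank character.
CHI_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\(\s*(-?\d+)\s*([^\s)]?)\s*\)(\S)\s+(\S)\s+(.+)$"
)


class ChiRecord(NamedTuple):
    seq_no: int       # sequential number in the PDB file
    res_type: str     # residue type, e.g. "ALA"
    res_no: int       # residue number as found in the PDB file
    ins_code: str     # insertion code, "" if absent
    chain: str        # chain identifier
    sec_struct: str   # one of H, S, T, -
    angles: tuple     # (phi, psi, omega, chi1..chi5); None where missing


def parse_chi_line(line: str) -> Optional[ChiRecord]:
    """Parse one line; return None for lines that do not fit (e.g. '....')."""
    m = CHI_RE.match(line)
    if m is None:
        return None
    values = [float(v) for v in m.group(7).split()]
    if len(values) != 8:
        return None
    angles = tuple(None if v == MISSING else v for v in values)
    return ChiRecord(int(m.group(1)), m.group(2), int(m.group(3)),
                     m.group(4), m.group(5), m.group(6), angles)


def phi_psi_pairs(records):
    """Collect (phi, psi) pairs, skipping residues where either is missing."""
    return [(r.angles[0], r.angles[1]) for r in records
            if r.angles[0] is not None and r.angles[1] is not None]


sample = [
    " 1 ALA (   2 )A     -  999.9  104.0  178.2  999.9  999.9  999.9  999.9  999.9",
    " 2 ASP (   3 )A     T  -81.1  116.3 -179.1  175.1 -162.7  999.9  999.9  999.9",
    "....",
]
records = [r for r in map(parse_chi_line, sample) if r is not None]
print(phi_psi_pairs(records))
```

The (Φ, Ψ) pairs returned by `phi_psi_pairs` are the kind of input a Ramachandran plot needs; the N-terminal alanine is skipped because its Φ is 999.9.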

Backbone bond angles (tau)

For each amino acid the following four angles are listed:

N-Cα-C
Cα-C-N_i+1
Cα-C-O
O-C-N_i+1

Here the subscript i+1 indicates an atom (the backbone N) in the next residue. The first and the last residue of each chain are not used.

The backbone bond angle file typically looks like:

....
    7 PRO (   5 )E       109.5970  120.0587  117.9323  122.0064
    8 LEU (   6 )E       115.9793  114.9306  122.1006  122.9435
    9 THR (   7 )E       113.2147  117.4240  123.0386  119.4641
   10 ASN (   8 )E       109.0025  116.2505  120.7527  122.9428
....

From left to right each line holds the sequential number of the residue in the PDB file, the residue type, in brackets the residue number and (if present) the insertion code found in the PDB file, the chain identifier of the chain in which this residue is found (E in this example), and the four bond angles in degrees, the first one being τ.
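As an illustration of how these four columns can be used, here is a minimal Python sketch that parses a line and adds up the three angles around the carbonyl carbon (Cα-C-N, Cα-C-O, O-C-N). For a planar atom these three angles sum to 360 degrees, so the sum is a simple sanity check. The check itself is an addition made here, not something the WHAT IF lists provide, and the regular expression and the names `parse_tau_line` and `carbonyl_angle_sum` are assumptions based on the sample lines.

```python
import re

# Layout inferred from the sample lines, not from a format specification.
TAU_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\(\s*(-?\d+)\s*([^\s)]?)\s*\)(\S)\s+(.+)$"
)


def parse_tau_line(line):
    """Parse one line; return None for lines that do not fit (e.g. '....')."""
    m = TAU_RE.match(line)
    if m is None:
        return None
    angles = [float(v) for v in m.group(6).split()]
    if len(angles) != 4:
        return None
    return {
        "seq_no": int(m.group(1)),
        "res_type": m.group(2),
        "res_no": int(m.group(3)),
        "ins_code": m.group(4),
        "chain": m.group(5),
        "tau": angles[0],      # N-CA-C
        "ca_c_n": angles[1],   # CA-C-N of the next residue
        "ca_c_o": angles[2],   # CA-C-O
        "o_c_n": angles[3],    # O-C-N of the next residue
    }


def carbonyl_angle_sum(rec):
    """Sum of the three angles around the backbone C; about 360 if planar."""
    return rec["ca_c_n"] + rec["ca_c_o"] + rec["o_c_n"]


line = "    7 PRO (   5 )E       109.5970  120.0587  117.9323  122.0064"
rec = parse_tau_line(line)
print(rec["res_type"], rec["tau"], round(carbonyl_angle_sum(rec), 3))
```

A sum that is clearly below 360 degrees would point to a pyramidal, and therefore suspicious, carbonyl carbon.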

Residue molecular accessible surface areas (acc)

Solvent accessibilities of residues are used in many protein structure bioinformatics applications. The ′acc′ directory of the ′Lists′ area holds for every PDB file an entry in which per residue the accessible molecular surface is listed. This surface is smaller than the solvent accessible surface (by a factor of 2.5 - 3, roughly). Feel free to look at the ′Accessibility′ section of the SFB course to learn about these definitions.

The accessibility files typically look like:

....
  171 PRO ( 171 )A     H    2.33
  172 LEU ( 172 )A     H   24.70
  173 VAL ( 173 )A     H   40.00
  174 GLY ( 174 )A     T   12.50
  175 TRP ( 175 )A     T   25.87
  176 SER ( 176 )A     T    0.53
  177 ARG ( 177 )A     -    8.92
  178 TYR ( 178 )A     S    2.95
  179 ILE ( 179 )A     S    0.00
  180 PRO ( 180 )A     S    0.00
  181 GLU ( 181 )A     S    0.60
....
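The sample suggests the same leading columns as in the torsion angle files (sequential number, residue type, residue number in brackets, chain, secondary structure), followed by a single surface value. A minimal Python sketch under that assumption is shown below; the column interpretation is read off the sample rather than taken from a specification, the units are not stated in this section, and `parse_acc_line` and `total_area_by_chain` are names invented for the example.

```python
import re
from collections import defaultdict

# Layout inferred from the sample lines, not from a format specification.
ACC_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\(\s*(-?\d+)\s*([^\s)]?)\s*\)(\S)\s+(\S)"
    r"\s+(-?\d+(?:\.\d+)?)\s*$"
)


def parse_acc_line(line):
    """Parse one line; return None for lines that do not fit (e.g. '....')."""
    m = ACC_RE.match(line)
    if m is None:
        return None
    return {
        "seq_no": int(m.group(1)),
        "res_type": m.group(2),
        "res_no": int(m.group(3)),
        "ins_code": m.group(4),
        "chain": m.group(5),
        "sec_struct": m.group(6),
        "area": float(m.group(7)),
    }


def total_area_by_chain(lines):
    """Add up the per-residue values for every chain identifier."""
    totals = defaultdict(float)
    for line in lines:
        rec = parse_acc_line(line)
        if rec is not None:
            totals[rec["chain"]] += rec["area"]
    return dict(totals)


sample = [
    "....",
    "  171 PRO ( 171 )A     H    2.33",
    "  172 LEU ( 172 )A     H   24.70",
    "  173 VAL ( 173 )A     H   40.00",
    "  177 ARG ( 177 )A     -    8.92",
]
print(total_area_by_chain(sample))
```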

Residue accessible surface areas (asa)

The previous ′Lists′ entry (acc) gave accessible molecular surface areas. This ′Lists′ entry (asa) gives for all residues their accessible surface areas. These are the values that articles often call ′solvent accessibility areas′. Feel free to look at the ′Accessibility′ section of the SFB course to learn about these definitions, and about the difference between the accessible surface area and the molecular (accessible) surface area.

Secondary structure and a few more things (dsp)

The secondary structure has always been an intriguing aspect of protein structures. That is funny, because there are hardly any questions to which the answer includes the words "secondary structure". Nevertheless, after Pauling predicted the secondary structure elements, Ramachandran invented his plot, and Kabsch and Sander wrote DSSP, secondary structure prediction has probably remained the topic of greatest interest to aspiring bioinformaticians world-wide.

We ran DSSP on all PDB files, and you find the corresponding output files in the DSSP database. Here we use DSSP too, but we convert the output to 4 states with easier names than those used by DSSP, and we added a few things.

This set of files lists for each protein its sequence and secondary structure in an easy-to-parse format.

The files typically look like:

101m
                     10        20        30        40     ...
                      |         |         |         |     ...
   1 -  154  MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFD...
   1 -  154      HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH HHHHHH T...
   1 -  154    * **         ** ** *  *                    ...
   1 -  154  AA AAA  AA AA  AA AAAA     A   A  AAAAA  AA A...

At the left, the residue range is given. This range always starts with 1 (sorry), and ends with the sequence length.

For this range the secondary structure is shown as an equally long string of secondary structure codes (H=Helix, S=Strand, T=Turn, " "=Coil).

Every residue involved in a (close; atoms touching each other) symmetry contact is labeled with an asterisk.

Every residue that is clearly solvent accessible is labeled with a capital A. A residue counts as clearly exposed if its accessibility is either more than 10 square Ångström, or more than 33% of its maximally possible accessibility in the unfolded state.
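The layout and the exposure rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not WHAT IF code: the separator of exactly two spaces between the range and each string, and the order of the four strings (sequence, secondary structure, symmetry contacts, accessibility), are read off the sample. The demo lines are the first ten columns of that sample with the range edited to 1 - 10, and the maximal accessibility passed to `clearly_exposed` must come from the caller because this section does not list those values. All function names are hypothetical.

```python
import re

# "<first> - <last>", two spaces, then one character per residue (assumed).
TRACK_RE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s\s(.*)$")


def parse_track(line):
    """Return (first, last, track) or None if the line is not a track line."""
    m = TRACK_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    first, last = int(m.group(1)), int(m.group(2))
    length = last - first + 1
    # Pad in case trailing blanks (coil, no contact) were stripped.
    track = m.group(3).ljust(length)[:length]
    return first, last, track


def parse_dsp_tracks(lines):
    """Map the first four track lines to names; the order is an assumption."""
    tracks = [t for t in map(parse_track, lines) if t is not None]
    if len(tracks) < 4:
        raise ValueError("expected four track lines")
    names = ("sequence", "sec_struct", "symmetry", "exposed")
    return {name: tracks[i][2] for i, name in enumerate(names)}


def labeled_positions(track, symbol):
    """1-based residue positions that carry the given label."""
    return [i + 1 for i, c in enumerate(track) if c == symbol]


def clearly_exposed(area, max_area):
    """Rule from the text: more than 10 square Angstrom, or more than 33%
    of the maximally possible accessibility (max_area, caller-supplied)."""
    return area > 10.0 or area > 0.33 * max_area


demo = [
    "101m",
    "   1 -   10  MVLSEGEWQL",
    "   1 -   10      HHHHHH",
    "   1 -   10    * **",
    "   1 -   10  AA AAA  AA",
]
tracks = parse_dsp_tracks(demo)
print(labeled_positions(tracks["symmetry"], "*"))
print(labeled_positions(tracks["exposed"], "A"))
```

Padding each string to the length implied by the range keeps the four strings aligned residue by residue, even when an editor or transfer step has removed trailing blanks.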
