The lists described in this section deal with very elementary geometric aspects of proteins such as angles, torsion angles, accessibility areas, and secondary structure.
Code | Type of data |
chi | Torsion angles |
tau | Backbone bond angles |
acc | Molecular surface area |
asa | Accessible surface area |
dsp | Secondary structure overview |
Given all torsion angles, one can (barring small bond-length differences) recalculate the entire structure of a protein. The average deviations of the torsion angles from their ideal values are a measure of the quality of the structure solution. The backbone torsion angles in particular are useful in many computational applications, for example making a Ramachandran plot. Feel free to look at the ′Torsion Angle′ section of the SFB course to learn about the definitions.
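As a reminder of what a torsion angle is, the following minimal Python sketch computes the angle defined by four atom positions. The function name `torsion_angle` and the tuple-based coordinates are assumptions made for this illustration; they are not part of WHAT IF. The sign follows the common convention in which a clockwise rotation, seen along the central bond from atom 2 to atom 3, is positive.

```python
import math

def _sub(a, b):
    """Vector a - b for 3D tuples."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    """Cross product of two 3D tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    """Dot product of two 3D tuples."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def torsion_angle(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range -180 to 180."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)   # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)   # normal of the plane through p2, p3, p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

For a backbone Φ angle one would pass the positions of C of the previous residue, then N, CA, and C of the residue itself.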
The torsion angle files typically look like:
1 ALA ( 2 )A - 999.9 104.0 178.2 999.9 999.9 999.9 999.9 999.9
2 ASP ( 3 )A T -81.1 116.3 -179.1 175.1 -162.7 999.9 999.9 999.9
3 LYS ( 4 )A T -59.3 -0.2 179.2 -73.4 -61.4 169.3 -64.7 999.9
4 GLU ( 5 )A H -110.8 22.9 178.5 -70.4 -58.7 123.9 999.9 999.9
5 LEU ( 6 )A H -68.5 127.2 -179.2 179.0 59.4 999.9 999.9 999.9
....
From left to right each line holds the sequential number of the residue in the PDB file, the residue type, in brackets the residue number and insertion code found in the PDB file, the chain identifier of the chain in which this residue is found, the secondary structure according to DSSP translated to 4 states by WHAT IF (H,S,T,-), and the 8 torsion angles: Φ, Ψ, Ω, and Χ1 to Χ5. Nonexistent angles, such as Φ of the N-terminal residue or Χ2 and higher in serine, get the value 999.9; this also holds for residues that are pseudo termini because they sit next to a chain break. Angles are in degrees. WHAT IF associates with residue i the Ω angle of the peptide plane between the residues i and i+1; this is not the IUPAC convention; sorry.
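The line layout described above can be read with a few lines of Python. This is a minimal sketch that assumes whitespace-separated fields exactly as in the sample, with a non-blank chain identifier; the real files may use fixed columns, so the regular expression may need adjusting. The names `parse_chi_line`, `ANGLE_NAMES`, and the dictionary keys are hypothetical and chosen for this example only.

```python
import re

MISSING = 999.9  # value used in the file for angles that do not exist
ANGLE_NAMES = ("phi", "psi", "omega", "chi1", "chi2", "chi3", "chi4", "chi5")

# sequential number, residue type, (PDB number + optional insertion code),
# chain identifier, 4-state secondary structure, then the angle values
_CHI_LINE = re.compile(
    r"^\s*(\d+)\s+(\w+)\s*\(\s*(-?\d+)\s*([A-Za-z]?)\s*\)\s*"
    r"(\S)\s+([HST-])\s+(.+)$"
)

def parse_chi_line(line):
    """Parse one torsion-angle line into a dictionary (assumed layout)."""
    match = _CHI_LINE.match(line)
    if match is None:
        raise ValueError("unrecognised torsion angle line: %r" % line)
    seq, res, num, ins, chain, ss, rest = match.groups()
    values = [float(v) for v in rest.split()]
    if len(values) != len(ANGLE_NAMES):
        raise ValueError("expected 8 angles, found %d" % len(values))
    # Map the 999.9 placeholder to None so it cannot be mistaken for an angle.
    angles = {name: (None if value == MISSING else value)
              for name, value in zip(ANGLE_NAMES, values)}
    return {
        "sequential_number": int(seq),
        "residue": res,
        "pdb_number": int(num),
        "insertion_code": ins,
        "chain": chain,
        "secondary_structure": ss,
        "angles": angles,
    }
```

Note that the `omega` value is kept exactly as stored, so it belongs to the peptide plane between residue i and i+1.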
For each amino acid the following four angles are listed:
In which the subscript i+1 indicates an atom (the backbone N) in the next residue. The first and the last residue of each chain are not used.
The backbone bond angle file typically looks like:
....
7 PRO ( 5 )E 109.5970 120.0587 117.9323 122.0064
8 LEU ( 6 )E 115.9793 114.9306 122.1006 122.9435
9 THR ( 7 )E 113.2147 117.4240 123.0386 119.4641
10 ASN ( 8 )E 109.0025 116.2505 120.7527 122.9428
....
From left to right each line holds the sequential number of the residue in the PDB file, the residue type, in brackets the residue number and (optionally) the insertion code found in the PDB file, the chain identifier of the chain in which this residue is found (E in this example), and the four bond angles in degrees, the first one being τ.
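A bond angle line can be read in much the same way. The sketch below again assumes whitespace-separated fields as in the sample and a non-blank chain identifier. The function name `parse_tau_line` is hypothetical, and because only the first angle is named here (τ), the remaining three are returned simply in file order.

```python
import re

# sequential number, residue type, (PDB number + optional insertion code),
# chain identifier, then the bond angle values
_TAU_LINE = re.compile(
    r"^\s*(\d+)\s+(\w+)\s*\(\s*(-?\d+)\s*([A-Za-z]?)\s*\)\s*(\S)\s+(.+)$"
)

def parse_tau_line(line):
    """Parse one backbone bond angle line (assumed layout)."""
    match = _TAU_LINE.match(line)
    if match is None:
        raise ValueError("unrecognised bond angle line: %r" % line)
    seq, res, num, ins, chain, rest = match.groups()
    angles = [float(v) for v in rest.split()]
    if len(angles) != 4:
        raise ValueError("expected 4 bond angles, found %d" % len(angles))
    return {
        "sequential_number": int(seq),
        "residue": res,
        "pdb_number": int(num),
        "insertion_code": ins,
        "chain": chain,
        "tau": angles[0],            # first angle in the file
        "other_angles": angles[1:],  # remaining three, in file order
    }
```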
Solvent accessibilities of residues are used in many protein structure bioinformatics applications. The ′acc′ directory of the ′Lists′ area holds for every PDB file an entry in which per residue the accessible molecular surface is listed. This surface is smaller than the solvent accessible surface (by a factor of 2.5 - 3, roughly). Feel free to look at the ′Accessibility′ section of the SFB course to learn about these definitions.
The accessibility files typically look like:
....
171 PRO ( 171 )A H 2.33
172 LEU ( 172 )A H 24.70
173 VAL ( 173 )A H 40.00
174 GLY ( 174 )A T 12.50
175 TRP ( 175 )A T 25.87
176 SER ( 176 )A T 0.53
177 ARG ( 177 )A - 8.92
178 TYR ( 178 )A S 2.95
179 ILE ( 179 )A S 0.00
180 PRO ( 180 )A S 0.00
181 GLU ( 181 )A S 0.60
....
From left to right each line holds the sequential number of the residue in the PDB file, the residue type, in brackets the residue number and insertion code found in the PDB file, the chain identifier of the chain in which this residue is found, the secondary structure according to DSSP translated to 4 states by WHAT IF (H,S,T,-), and the total residue molecular accessible surface area in square Ångström.
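As an illustration of how these per-residue values might be used, the following Python sketch parses accessibility lines and sums the area per chain. It assumes whitespace-separated fields as in the sample and a non-blank chain identifier; `parse_acc_line` and `area_per_chain` are hypothetical names for this example, and lines that do not match (such as the `....` markers) are skipped.

```python
import re
from collections import defaultdict

# sequential number, residue type, (PDB number + optional insertion code),
# chain identifier, 4-state secondary structure, surface area
_ACC_LINE = re.compile(
    r"^\s*(\d+)\s+(\w+)\s*\(\s*(-?\d+)\s*([A-Za-z]?)\s*\)\s*"
    r"(\S)\s+([HST-])\s+(-?\d+(?:\.\d+)?)\s*$"
)

def parse_acc_line(line):
    """Return a dictionary for one accessibility line, or None if no match."""
    match = _ACC_LINE.match(line)
    if match is None:
        return None
    seq, res, num, ins, chain, ss, area = match.groups()
    return {
        "sequential_number": int(seq),
        "residue": res,
        "pdb_number": int(num),
        "insertion_code": ins,
        "chain": chain,
        "secondary_structure": ss,
        "area": float(area),  # square Angstrom
    }

def area_per_chain(lines):
    """Sum the per-residue areas for every chain identifier."""
    totals = defaultdict(float)
    for line in lines:
        record = parse_acc_line(line)
        if record is not None:
            totals[record["chain"]] += record["area"]
    return dict(totals)
```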
The previous ′Lists′-entry (acc) gave accessible molecular surface areas. This ′Lists′ entry (asa) gives for all residues their accessible surface areas. These are the values one often finds in articles called ′solvent accessibility areas′. Feel free to look at the ′Accessibility′ section of the SFB course to learn about these definitions, and about the difference between the accessible surface area and the molecular (accessible) surface area.
The secondary structure has always been an intriguing aspect of protein structures. That is funny, because there are hardly any questions to which the answer includes the words "secondary structure". Nevertheless, after Pauling predicted the secondary structure elements, Ramachandran invented his plot, and Kabsch and Sander wrote DSSP, secondary structure prediction has probably remained the topic of most interest to aspiring bioinformaticians world-wide.
We used DSSP on all PDB files, and you find the corresponding output files in the DSSP database. Here we use DSSP too, but convert the output to 4-state with easier names than in use by DSSP; and we added a few things.
This set of files lists for each protein its sequence and secondary structure in an easy-to-parse format.
The files typically look like:
101m
                10        20        30        40 ...
                 |         |         |         | ...
1 - 154 MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFD...
1 - 154 HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH HHHHHH T...
1 - 154 * ** ** ** * * ...
1 - 154 AA AAA AA AA AA AAAA A A AAAAA AA A...
At the left, the residue range is given. This range always starts with 1 (sorry), and ends with the sequence length.
For this range the secondary structure is shown as an equally long string of secondary structure codes (H=Helix, S=Strand, T=Turn, " "=Coil).
Every residue involved in a (close; atoms touching each other) symmetry contact will be labeled with an asterisk.
Every residue that is clearly solvent accessible will be labeled with a capital A. A residue counts as clearly exposed if its accessibility is either more than 10 square Ångström or more than 33% of its maximally possible accessibility in the unfolded state.
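The exposure rule can be written down as a small Python sketch. The function names, the parameter names, and the maximum accessibilities passed in are assumptions for this illustration; the reference values for the unfolded state that WHAT IF uses are not listed here, so they have to be supplied by the caller.

```python
def is_clearly_exposed(area, max_area, abs_cutoff=10.0, rel_cutoff=0.33):
    """True if a residue is 'clearly exposed' by either criterion.

    area      -- accessibility of the residue, in square Angstrom
    max_area  -- its maximally possible accessibility in the unfolded
                 state (caller-supplied; hypothetical values in tests)
    """
    return area > abs_cutoff or area > rel_cutoff * max_area

def exposure_string(areas, max_areas):
    """Build an 'A'/' ' label string, one character per residue."""
    return "".join(
        "A" if is_clearly_exposed(area, max_area) else " "
        for area, max_area in zip(areas, max_areas)
    )
```

Because the two criteria are joined by "or", a small residue can be labeled A even when its absolute accessibility is well below 10 square Ångström.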