Output formats

The WHAT IF program uses the famous 'SHOSOU' command to analyze the contents of a PDB entry. Inside WHAT IF / WHAT_CHECK this content is called 'the SOUP', because after all a PDB file and a cup of soup both consist of water with proteins in it. The only differences are the order and the taste.

A typical result from the extended SHOSOU command looks like:

    Contents of the SOUP:                                      *1
Protein .................... : 2                               *2
Drug, ligand or co-factor .. : 1
DNA or RNA ................. : 0
Single atom entity ......... : 7
(Groups of) water .......... : 1
Drug with known topology ... : 0
 Molecule      Range              Type              Set name   *3
     1    1 (    1)  316 (  316)E Protein           set        *4
     2  317 (  322)  318 (  323)D Protein           set        *4
     3  319 (  O2 )  319 (  O2 )E K O2 <-           set        *5
     4  320 (  317)  320 (  317)   CA               set        *6
     5  321 (  318)  321 (  318)   CA               set
     6  322 (  319)  322 (  319)   CA               set
     7  323 (  320)  323 (  320)   CA               set
     8  324 (  321)  324 (  321)   ZN               set
     9  325 (  324)  325 (  324)  DMS               set        *7
    10  326 (  O2 )  326 (  O2 )D L O2 <-           set        *8
    11  327 ( HOH )  327 ( HOH )  water   ( 157)    set        *9
   *10  *11   *12    *13    *14   *15               *16

  1. This is the header of the SHOSOU output.
  2. First the contents of the soup are counted. This table is only produced when the debug flag is switched on; normally the *2 output is skipped.
  3. This is the header of the actual SHOSOU table.
  4. Molecule one is a protein with chain identifier E. This protein has 316 amino acids. The second protein is a two-residue peptide with chain identifier D.
  5. The third molecule is the C-terminal oxygen of chain E. It is attached to a Lysine (that is indicated by the character K) and the arrow indicates that it is bound to something.
  6. Molecules 4 to 8 are single atom entities (together with the two C-terminal oxygens they form the seven single atom entities mentioned in the top half of the output).
  7. DMS probably stands for DMSO, and is a drug, ligand or co-factor. For WHAT IF drug, ligand, and co-factor are all the same thing.
  8. This is the C-terminal oxygen of the second molecule. You can see that because the O2 indicates that it is a C-terminal oxygen. The D indicates that it is part of the D chain and the arrow indicates that it is bound to something. The L indicates that it is bound to a Leucine.
  9. This is a group of 157 water molecules.
  10. The 'molecule' number.
  11. The WHAT IF number of the first residue in this molecule.
  12. The PDB number of the first residue in this molecule.
  13. The WHAT IF number of the last residue in this molecule.
  14. The PDB number of the last residue in this molecule.
  15. A short description of this molecule.
  16. The set-name is the name the user gave to the ensemble of molecules added to the soup with one single GETMOL or GETGRO, etc., command. This set-name is only relevant when WHAT IF is used interactively.
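
The column layout described above can be picked apart with a regular expression. The following is only a minimal sketch in Python: the names SoupRow and parse_soup_row are hypothetical and not part of WHAT IF, and the pattern assumes that the rows look like the example above, that the set-name is a single word, and that the *1, *2, etc labels added here for explanation have been removed.

```python
import re
from typing import NamedTuple, Optional


class SoupRow(NamedTuple):
    """One row of the SHOSOU table (hypothetical container, not a WHAT IF type)."""
    molecule: int       # column *10
    first_whatif: int   # column *11
    first_pdb: str      # column *12 (kept as text: it can be 'O2' or 'HOH')
    last_whatif: int    # column *13
    last_pdb: str       # column *14
    chain: str          # chain identifier directly after ')', may be empty
    description: str    # column *15
    set_name: str       # column *16, assumed to be one word


# Assumed layout: number, number (pdb), number (pdb), chain, description, set.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s*\(\s*(\S+)\s*\)\s*(\d+)\s*\(\s*(\S+)\s*\)"
    r"(.)\s*(.*?)\s+(\S+)\s*$"
)


def parse_soup_row(line: str) -> Optional[SoupRow]:
    """Return the parsed row, or None if the line is not a table row."""
    m = ROW.match(line)
    if m is None:
        return None
    mol, w1, p1, w2, p2, chain, desc, set_name = m.groups()
    return SoupRow(int(mol), int(w1), p1, int(w2), p2,
                   chain.strip(), desc.strip(), set_name)


row = parse_soup_row("     1    1 (    1)  316 (  316)E Protein           set")
print(row)
```

Lines that do not fit the assumed layout, such as the counting table under *2, simply give None.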

Some notes regarding the PDB file content

After showing the content of the PDB file (which in WHAT IF / WHAT_CHECK terms is 'the SOUP') you get some counts, like the number of residues, the number of waters, and the numbers of those that have unlikely or missing atoms. WHAT_CHECK also looks for residues with a negative (or zero) residue number, and it looks for consecutive residues with decreasing residue numbers.
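
The two residue number checks mentioned above are easy to illustrate. The sketch below is not the WHAT_CHECK implementation; the function name numbering_problems is hypothetical, and it assumes that the residue numbers of one chain have already been read from the PDB file, in file order, as a list of integers.

```python
from typing import List, Tuple


def numbering_problems(
    residue_numbers: List[int],
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (numbers that are zero or negative, consecutive pairs that decrease)."""
    non_positive = [n for n in residue_numbers if n <= 0]
    decreasing = [
        (a, b)
        for a, b in zip(residue_numbers, residue_numbers[1:])
        if b < a
    ]
    return non_positive, decreasing


# Residues numbered -1 and 0 are reported, and so is the step from 5 back to 4.
print(numbering_problems([-1, 0, 1, 2, 5, 4]))
```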

In this section you also find some statistics about the use of chain identifiers. There is nothing wrong if a series of molecules have the chain identifiers A, B, C, E, F, G, respectively. But the missing chain D might be indicative of an administrative problem that the experimentalist might immediately recognize.
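
A skipped chain identifier is simple to detect. The sketch below only illustrates the idea and is not taken from WHAT_CHECK; the function name missing_chain_ids is hypothetical, and it assumes chain identifiers are single upper-case letters.

```python
import string
from typing import Iterable, List


def missing_chain_ids(chain_ids: Iterable[str]) -> List[str]:
    """Return the letters skipped between the first and last identifier in use."""
    # Keep only single upper-case letters; blanks and digits are ignored.
    used = {c for c in chain_ids if len(c) == 1 and c in string.ascii_uppercase}
    if not used:
        return []
    lo, hi = ord(min(used)), ord(max(used))
    return [chr(k) for k in range(lo, hi + 1) if chr(k) not in used]


print(missing_chain_ids(["A", "B", "C", "E", "F", "G"]))  # ['D']
```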

This list is, just like the SHOSOU table, meant mainly for the experimentalist who might see something in his/her PDB file that isn't supposed to be there.

In case ions are found that have the wrong chain identifier, they are listed in a table. An ion is said to have the wrong chain identifier if its chain identifier is the same as that of a protein, nucleic acid, or sugar chain, while it makes more contacts with a protein, nucleic acid, or sugar chain with another chain identifier. Obviously, this isn't wrong, but it surely doesn't help the end-users. An example is found in 1ET1:

JRNL        TITL 2 1-34 AT 0.9-A RESOLUTION.
JRNL        REF    J.BIOL.CHEM.                  V. 275 27238 2000

WHAT_CHECK reports for this file:

# 22 # Warning: Ions bound to the wrong chain
The ions listed in the table have a chain identifier that
is the same as one of the protein, nucleic acid, or sugar chains.
However, the ion seems bound to protein, nucleic acid, or sugar,
with another chain identifier.
Obviously, this is not wrong, but it is confusing for users of this
PDB file.
  71  NA   ( 101-)  A  -
  72  NA   ( 102-)  B  -

Figure 2. 1ET1: everything with chain identifier A is in yellow, and everything with chain identifier B is in purple. The small balls are water molecules. The two big balls are sodium ions.
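
The contact-counting idea behind this warning can be sketched as follows. This is an illustration only, not the WHAT_CHECK algorithm: the function names are hypothetical, a contact is assumed to be any atom within a distance cutoff of the ion, and the 3.5 Angstrom default is an arbitrary assumption rather than the value WHAT_CHECK uses.

```python
import math
from collections import Counter
from typing import Iterable, Sequence, Tuple

Atom = Tuple[str, Sequence[float]]  # (chain identifier, (x, y, z))


def contacts_per_chain(ion_xyz: Sequence[float],
                       atoms: Iterable[Atom],
                       cutoff: float = 3.5) -> Counter:
    """Count, per chain, the atoms within `cutoff` of the ion (assumed definition)."""
    counts: Counter = Counter()
    for chain, xyz in atoms:
        if math.dist(ion_xyz, xyz) <= cutoff:
            counts[chain] += 1
    return counts


def ion_chain_is_suspect(ion_chain: str,
                         ion_xyz: Sequence[float],
                         atoms: Iterable[Atom],
                         cutoff: float = 3.5) -> bool:
    """True if some other chain makes more contacts with the ion than its own chain."""
    counts = contacts_per_chain(ion_xyz, atoms, cutoff)
    own = counts[ion_chain]
    return any(n > own for chain, n in counts.items() if chain != ion_chain)


# Made-up coordinates: one contact with chain A, two with chain B.
atoms = [("A", (0.0, 0.0, 3.0)), ("B", (0.0, 0.0, 2.0)),
         ("B", (2.5, 0.0, 0.0)), ("A", (10.0, 0.0, 0.0))]
print(ion_chain_is_suspect("A", (0.0, 0.0, 0.0), atoms))  # True
```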

Residue and atom nomenclature

In the box below we illustrate the residue and atom nomenclature with one warning as an example.

Warning: Unusual bond angles
The bond angles listed in the table below were found to deviate
more than 4 sigma from standard bond angles (both standard values
and sigma for protein residues have been taken from Engh and
Huber [REF], for DNA/RNA from Parkinson et al [REF]). In the
table below for each strange angle the bond angle and the number
of standard deviations it differs from the standard values is
given. Please note that disulphide bridges are neglected. Atoms
starting with "-" belong to the previous residue in the sequence.
   1 THR   (   2-)  A  -   CA   CB   OG1 103.06   -4.4
   1 THR   (   2-)  A  -   CG2  CB   OG1 117.41    4.1
  12 ASN   (  13-)  A  -   ND2  CG   OD1 127.61    5.0
  14 ASN   (  15-)  A  -   ND2  CG   OD1 128.63    6.0
  39 THR   (  40-)  B  -   CA   CB   OG1 103.59   -4.0
  45 ALA   (  46-)  B  -   N    CA   CB  103.98   -4.3
  *1  *2      *3   *4 *5   *6   *6   *6    *7      *8

The box shows a warning that lists a series of bond angles considered unusual. The *1, *2, etc are not part of the output but were added here to label the columns.

  1. The first number is the sequential number of the residue in the PDB file.
  2. The second column holds the three letter code of the residue (this can be any three letter code used in the PDB, and can include amino acids, nucleic acids, co-factors, drugs, sugars, lipids, ions, water, etc).
  3. The third column holds the residue number as given in the PDB file. In this case the first residue of the chain could not be seen in the electron density so that the first residue in the PDB file actually is residue two of the molecule. The minus sign after the residue number indicates
  4. Column 4 holds the chain identifier of the residue.
  5. Column 5 holds minus signs. If this were an NMR structure this column would hold the NMR MODEL number.
  6. The columns labeled 6 hold the three atoms involved in this check.
  7. Column 7 holds the observed value.
  8. Column 8 indicates how many standard deviations the value in column 7 is away from the WHAT_CHECK target value.
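
The columns described above can be read back from a line of the table with a regular expression. The sketch below is a hedged illustration, not an official WHAT_CHECK parser: the name parse_angle_row and the dictionary keys are hypothetical, and the pattern assumes the layout of the example box, with a non-blank chain identifier and exactly three atom names.

```python
import re
from typing import Optional

# Assumed layout: seq, residue name, (PDB number + one character), chain,
# model, three atom names, observed value, number of standard deviations.
ANGLE_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\(\s*(-?\d+)(.)\)\s+(\S)\s+(\S+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*$"
)


def parse_angle_row(line: str) -> Optional[dict]:
    """Return the columns of one bond angle row, or None if the line does not fit."""
    m = ANGLE_ROW.match(line)
    if m is None:
        return None
    seq, res, num, suffix, chain, model, a1, a2, a3, value, z = m.groups()
    return {
        "seq": int(seq),          # column *1
        "resname": res,           # column *2
        "pdb_number": int(num),   # column *3
        "suffix": suffix,         # character after the residue number
        "chain": chain,           # column *4
        "model": model,           # column *5, '-' when there is no NMR model
        "atoms": (a1, a2, a3),    # columns *6
        "value": float(value),    # column *7
        "z": float(z),            # column *8
    }


row = parse_angle_row("   1 THR   (   2-)  A  -   CA   CB   OG1 103.06   -4.4")
print(row["atoms"], row["value"], row["z"])
```

Atom names that start with "-" (atoms of the previous residue) are kept as they appear in the table.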