File-content

The WHAT IF program uses the famous 'SHOSOU' command to analyze the contents of a PDB entry. Inside WHAT IF / WHAT_CHECK this content is called 'the SOUP', because after all a PDB file and a cup of soup both consist of water with proteins in it. The only differences are the order and the taste.

A typical result from the extended SHOSOU command looks like:

    Contents of the SOUP:                                      *1
Protein .................... : 2                               *2
Drug, ligand or co-factor .. : 1
DNA or RNA ................. : 0
Single atom entity ......... : 7
(Groups of) water .......... : 1
Drug with known topology ... : 0
 Molecule      Range              Type              Set name   *3
     1    1 (    1)  316 (  316)E Protein           set        *4
     2  317 (  322)  318 (  323)D Protein           set        *4
     3  319 (  O2 )  319 (  O2 )E K O2 <-           set        *5
     4  320 (  317)  320 (  317)   CA               set        *6
     5  321 (  318)  321 (  318)   CA               set
     6  322 (  319)  322 (  319)   CA               set
     7  323 (  320)  323 (  320)   CA               set
     8  324 (  321)  324 (  321)   ZN               set
     9  325 (  324)  325 (  324)  DMS               set        *7
    10  326 (  O2 )  326 (  O2 )D L O2 <-           set        *8
    11  327 ( HOH )  327 ( HOH )  water   ( 157)    set        *9
   *10  *11   *12    *13    *14   *15               *16

  1. This is the header of the SHOSOU output
  2. First the contents of the soup are counted. This table is only produced when the debug flag is switched on; normally the *2 output is skipped.
  3. This is the header of the main table of the SHOSOU output.
  4. Molecule one is a protein with chain identifier E. This protein has 316 amino acids. The second protein is a two residue peptide with chain identifier D.
  5. The third molecule is the C-terminal oxygen of chain E. It is attached to a Lysine (that is indicated by the character K) and the arrow indicates that it is bound to something.
  6. Molecules 4 through 8 are single-atom entities (together with the two C-terminal oxygens they form the seven single-atom entities counted in the top half of the output).
  7. DMS probably stands for DMSO, and is a drug, ligand or co-factor. For WHAT IF drug, ligand, and co-factor are all the same thing.
  8. This is the C-terminal oxygen of the second molecule. You can see that because the O2 indicates that it is a C-terminal oxygen. The D indicates that it is part of the D chain and the arrow indicates that it is bound to something. The L indicates that it is bound to a Leucine.
  9. This is a group of 157 water molecules.
  10. The 'molecule' number.
  11. The WHAT IF number of the first residue in this molecule.
  12. The PDB number of the first residue in this molecule.
  13. The WHAT IF number of the last residue in this molecule.
  14. The PDB number of the last residue in this molecule.
  15. A short description of this molecule.
  16. The set-name is the name the user gave to the ensemble of molecules added to the soup with one single GETMOL or GETGRO, etc., command. This set-name is only relevant when WHAT IF is used interactively.
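
The column layout described in items 10 to 16 can be turned into a small parser. The following is only a sketch: the names SoupRow and parse_row are hypothetical, and the regular expression assumes that real SHOSOU output keeps the layout of the example above. The trailing *n markers are annotations added for this explanation, so the sketch strips them first.

```python
import re
from typing import NamedTuple, Optional


class SoupRow(NamedTuple):
    """One row of the molecule table (hypothetical container)."""
    molecule: int      # *10 the 'molecule' number
    first_whatif: int  # *11 WHAT IF number of the first residue
    first_pdb: str     # *12 PDB number of the first residue (may be 'O2' or 'HOH')
    last_whatif: int   # *13 WHAT IF number of the last residue
    last_pdb: str      # *14 PDB number of the last residue
    chain: str         # chain identifier, '' when the column is blank
    description: str   # *15 short description
    set_name: str      # *16 set name


# Assumed layout: number, number (pdb), number (pdb), chain character,
# free-text description, and the set name as the last token on the line.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s*\(\s*(\S+)\s*\)\s*(\d+)\s*\(\s*(\S+)\s*\)"
    r"(.)\s*(.*?)\s+(\S+)\s*$"
)


def parse_row(line: str) -> Optional[SoupRow]:
    """Parse one table row; return None for lines that are not table rows."""
    # Drop a trailing documentation marker such as '*4' if present.
    line = re.sub(r"\s*\*\d+\s*$", "", line)
    match = ROW.match(line)
    if match is None:
        return None
    mol, first_wi, first_pdb, last_wi, last_pdb, chain, desc, set_name = match.groups()
    return SoupRow(int(mol), int(first_wi), first_pdb, int(last_wi),
                   last_pdb, chain.strip(), desc, set_name)


if __name__ == "__main__":
    print(parse_row("     1    1 (    1)  316 (  316)E Protein           set"))
```

Lines that do not start with a molecule number, such as the counts in the top half of the output, simply yield None.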

Some notes regarding the PDB file content

After showing the content of the PDB file (which in WHAT IF / WHAT_CHECK terms is 'the SOUP') you get some counts, such as the number of residues, the number of waters, and the number of each that have unlikely or missing atoms. WHAT_CHECK also looks for residues with a negative (or zero) residue number, and it looks for consecutive residues with decreasing residue numbers.
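
The two residue-number checks mentioned above can be illustrated with a short sketch. It is not WHAT_CHECK's own code: the function name numbering_issues is hypothetical, the input is assumed to be the residue numbers of one chain in file order, and insertion codes are ignored.

```python
from typing import List, Tuple


def numbering_issues(residue_numbers: List[int]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (non-positive numbers, decreasing consecutive pairs).

    residue_numbers: residue numbers of a single chain, in file order
    (an assumption of this sketch).
    """
    # Residues with a negative or zero residue number.
    non_positive = [n for n in residue_numbers if n <= 0]
    # Consecutive residues where the second number is lower than the first.
    decreasing = [
        (a, b)
        for a, b in zip(residue_numbers, residue_numbers[1:])
        if b < a
    ]
    return non_positive, decreasing


if __name__ == "__main__":
    print(numbering_issues([-1, 0, 1, 2, 5, 4, 6]))
```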

In this section you also find some statistics about the use of chain identifiers. There is nothing wrong if a series of molecules have the chain identifiers A, B, C, E, F, G, respectively. But the missing chain D might indicate an administrative problem that the experimentalist will immediately recognize.
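
A gap in the chain identifiers is easy to detect. The sketch below uses a hypothetical helper, missing_chain_ids, and assumes single upper-case letters as identifiers; how WHAT_CHECK itself gathers these statistics is not described here.

```python
from typing import Iterable, List


def missing_chain_ids(chain_ids: Iterable[str]) -> List[str]:
    """List the letters skipped between the lowest and highest chain identifier."""
    # Keep only single upper-case ASCII letters; other identifiers are ignored.
    present = {c for c in chain_ids if len(c) == 1 and "A" <= c <= "Z"}
    if not present:
        return []
    lowest, highest = min(present), max(present)
    return [
        chr(code)
        for code in range(ord(lowest), ord(highest) + 1)
        if chr(code) not in present
    ]


if __name__ == "__main__":
    # For the identifiers A, B, C, E, F, G the skipped letter is D.
    print(missing_chain_ids(["A", "B", "C", "E", "F", "G"]))
```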

This list is, just like the SHOSOU table, meant mainly for the experimentalist, who might see something in his/her PDB file that isn't supposed to be there.

If ions are found that have the wrong chain identifier, they are listed in a table. An ion is said to have the wrong chain identifier if its chain identifier is the same as that of a protein, nucleic acid, or sugar chain, while it makes more contacts with a protein, nucleic acid, or sugar chain that has another chain identifier. Obviously, this isn't wrong, but it surely doesn't help the end-users. An example is found in 1ET1:

JRNL        AUTH   L.JIN,S.L.BRIGGS,S.CHANDRASEKHAR,N.Y.CHIRGADZE,
JRNL        AUTH 2 D.K.CLAWSON,R.W.SCHEVITZ,D.L.SMILEY,A.H.TASHJIAN,
JRNL        AUTH 3 F.ZHANG
JRNL        TITL   CRYSTAL STRUCTURE OF HUMAN PARATHYROID HORMONE
JRNL        TITL 2 1-34 AT 0.9-A RESOLUTION.
JRNL        REF    J.BIOL.CHEM.                  V. 275 27238 2000

WHAT_CHECK reports for this file:

# 22 # Warning: Ions bound to the wrong chain
=============================================
The ions listed in the table have a chain identifier that
is the same as one of the protein, nucleic acid, or sugar chains.
However, the ion seems bound to protein, nucleic acid, or sugar,
with another chain identifier.
 
Obviously, this is not wrong, but it is confusing for users of this
PDB file.
  
  71  NA   ( 101-)  A  -
  72  NA   ( 102-)  B  -

Figure 9. 1ET1. Everything with chain identifier A is shown in yellow, and everything with chain identifier B in purple. The small balls are water molecules. The two big balls are sodium ions.
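
The wrong-chain criterion for ions described above can be sketched as a contact count per chain. Everything specific here is an assumption made for illustration: the function names, the plain distance test, and the 3.5 A cutoff are not taken from WHAT_CHECK, whose own contact definition is not given in this text.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Atom = Tuple[str, Sequence[float]]  # (chain identifier, (x, y, z))


def contacts_per_chain(ion_xyz: Sequence[float], atoms: List[Atom],
                       cutoff: float = 3.5) -> Counter:
    """Count atoms of each chain within `cutoff` of the ion (assumed cutoff)."""
    counts: Counter = Counter()
    for chain_id, xyz in atoms:
        if math.dist(ion_xyz, xyz) <= cutoff:
            counts[chain_id] += 1
    return counts


def has_wrong_chain(ion_chain: str, ion_xyz: Sequence[float],
                    atoms: List[Atom], cutoff: float = 3.5) -> bool:
    """True if the ion shares its identifier with a chain but contacts another chain more.

    atoms: atoms of the protein, nucleic acid, or sugar chains only.
    """
    # The ion must carry the identifier of one of the chains.
    if ion_chain not in {chain_id for chain_id, _ in atoms}:
        return False
    counts = contacts_per_chain(ion_xyz, atoms, cutoff)
    own = counts.get(ion_chain, 0)
    # Flag the ion if any other chain makes more contacts than its own chain.
    return any(n > own for chain_id, n in counts.items() if chain_id != ion_chain)


if __name__ == "__main__":
    atoms = [("A", (10.0, 0.0, 0.0)), ("A", (0.0, 0.0, 3.0)),
             ("B", (1.0, 0.0, 0.0)), ("B", (0.0, 2.0, 0.0))]
    print(has_wrong_chain("A", (0.0, 0.0, 0.0), atoms))
```

In the toy data the ion labelled A has one contact with chain A and two with chain B, so it would be reported in the same way as the two sodium ions of 1ET1.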