Engh and Huber

Bond lengths and bond angles are a whole story in themselves. Engh and Huber (see the section on τ-angles for details and references) determined the ideal bond lengths and bond angles from small molecules in the CSD. Lately, these values have come under fire, also from us...

But for the time being they are good enough, certainly for validation purposes: if we report a 4σ deviation and it actually is a 3.5σ or 4.5σ deviation, then who cares; it is an exception anyway.

The Engh and Huber parameters seem OK for most cases, though, and if one day somebody works out all the problems we mentioned in the τ-article, then we will implement that in WHAT_CHECK, and redo the entire PDBREPORT database.

The idea is that we know the exact ideal values of all bond lengths and bond angles, and we know the exact standard deviations in these values. If we now assume that the variation around these values is random (which it isn't; but the deviation isn't dramatic), then we can determine how many standard deviations each bond length or angle deviates from the ideal value, and report the deviations of more than 4σ. The number of standard deviations that any observed value deviates from the ideal average is commonly called the Z-score of that observed value. Under 'Validation' you find some notes on this topic.

A useful characteristic of normally distributed Z-scores is that their RMS (the RMS-Z score) is expected to be 1.0. So, if we determine the RMS-Z score for all bond lengths and we observe a value of 1.3, we know that the restraints on the bond lengths were too weak during refinement, and when the RMS-Z score is 0.3 it can be that the restraints were too tight and should have been relaxed. So, if it is 1.3, it is an error, but if it is below 1.0 (0.7, say), we only issue a warning. The latter is because we don't know how much data was available to generate the density map. Ian Tickle has written some readable stuff on this topic.
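The Z-score and RMS-Z definitions above can be sketched in a few lines of Python. This is only an illustration, not the WHAT_CHECK code: the function names `z_score` and `rms_z` are made up here, and the target value and sigma are hypothetical round numbers rather than Engh and Huber parameters.

```python
import math
import random

def z_score(observed, ideal, sigma):
    """Number of standard deviations by which observed differs from ideal."""
    return (observed - ideal) / sigma

def rms_z(z_scores):
    """Root-mean-square of a collection of Z-scores."""
    z_scores = list(z_scores)
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))

# Hypothetical target values, for illustration only (not Engh and Huber numbers).
IDEAL, SIGMA = 1.500, 0.020
observed = [1.48, 1.52, 1.50, 1.53]
zs = [z_score(d, IDEAL, SIGMA) for d in observed]
small_rms = rms_z(zs)

# Z-scores drawn from a standard normal distribution give an RMS-Z close to 1.
random.seed(0)
normal_rms = rms_z(random.gauss(0.0, 1.0) for _ in range(100_000))

# Restraints that are too tight shrink the spread, and the RMS-Z with it.
tight_rms = rms_z(random.gauss(0.0, 0.3) for _ in range(100_000))

print(f"{small_rms:.2f} {normal_rms:.2f} {tight_rms:.2f}")
```

The two simulated samples show the behaviour described above: a spread that matches the assumed sigma gives an RMS-Z near 1, and a spread that is three times narrower gives an RMS-Z near 0.3.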

In earlier WHAT_CHECK versions there was an inconvenience (not really a bug, but almost).

JRNL        AUTH   J.J.JEFFERSON,C.CIATTO,L.SHAPIRO,R.K.LIEM
JRNL        TITL   STRUCTURAL ANALYSIS OF THE PLAKIN DOMAIN OF
JRNL        TITL 2 BULLOUS PEMPHIGOID ANTIGEN1 (BPAG1) SUGGESTS THAT
JRNL        TITL 3 PLAKINS ARE MEMBERS OF THE SPECTRIN SUPERFAMILY.
JRNL        REF    J.MOL.BIOL.                   V. 366   244 2007

Figure 34. In 2IAK we find one terribly long bond. The O of the C-terminal glutamine is sitting at the other end of the molecule. It is not connected to anything, and it isn't a symmetry-related copy either. So I have no idea how they achieved this.

The coordinates also don't hint at anything (such as a missing minus sign):

ATOM   1543  N   GLN A 216      53.689 -70.504  93.778  1.00 41.77           N
ATOM   1544  CA  GLN A 216      54.956 -70.699  94.513  1.00 41.77           C
ATOM   1545  C   GLN A 216      56.136 -70.471  93.538  1.00 41.77           C
ATOM   1546  O   GLN A 216      33.819   2.841  59.535  1.00 41.77           O
ATOM   1547  CB  GLN A 216      55.051 -72.138  95.200  1.00 41.77           C
ATOM   1548  CG  GLN A 216      54.327 -72.259  96.632  1.00 41.77           C
ATOM   1549  CD  GLN A 216      52.877 -72.823  96.526  1.00 41.77           C
ATOM   1550  OE1 GLN A 216      52.479 -73.561  95.389  1.00 41.77           O
ATOM   1551  NE2 GLN A 216      52.072 -72.173  97.721  1.00 41.77           N

WHAT IF reports faithfully:

The bond lengths listed in the table below were found to deviate
more than 4 sigma from standard bond lengths (both standard values
and sigmas for amino acid residues have been taken from Engh and
Huber [REF], for DNA they were taken from Parkinson et al [REF]). In
the table below for each unusual bond the bond length and the
number of standard deviations it differs from the normal value is
given.
 198 GLN   ( 216-)  A  -   C    O    83.84 4130.4
 198 GLN   ( 216-)  A  -   CD   OE1   1.41    9.1
 198 GLN   ( 216-)  A  -   CD   NE2   1.58   12.0
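The first row of this table can be checked by hand from the C and O coordinates of GLN A 216 listed earlier. The sketch below does that; the ideal C-O length (1.231 Å) and sigma (0.020 Å) are assumptions, chosen because they reproduce the reported Z-score and believed to be the Engh and Huber carbonyl values, not something stated in this text.

```python
import math

# Coordinates copied from the ATOM records of GLN A 216 (atoms 1545 and 1546).
c_atom = (56.136, -70.471, 93.538)   # backbone C
o_atom = (33.819, 2.841, 59.535)     # backbone O

# Assumed target values for the C-O bond (see the note above).
IDEAL_C_O = 1.231   # Angstrom
SIGMA_C_O = 0.020   # Angstrom

distance = math.dist(c_atom, o_atom)          # Euclidean distance, Python 3.8+
z = (distance - IDEAL_C_O) / SIGMA_C_O        # deviation in standard deviations

print(f"C-O distance: {distance:.2f} A, Z-score: {z:.1f}")
# prints: C-O distance: 83.84 A, Z-score: 4130.4
```

The result agrees with the 83.84 and 4130.4 in the table, so the report really is faithful: the problem is in the deposited coordinates, not in the arithmetic.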

And now, because the square of 4130.4 is a big number, the RMS Z-score became a bit big too:
RMS Z-score for bond lengths: 104.078

In the new WHAT_CHECK we therefore built in that crazy values (deviations of 10σ or worse) do not participate in the RMS-Z calculation, and then we get a much more reasonable message for 2IAK:
RMS Z-score for bond lengths: 0.674
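How a single crazy distance dominates the RMS-Z, and what the 10σ cutoff does about it, can be sketched as follows. This is an illustration under stated assumptions and not the WHAT_CHECK implementation: the three large Z-scores are the ones from the table above, but the number of ordinary bonds (1574) and their Z-scores (±0.67) are invented, and the name `rms_z` and its `cutoff` parameter are hypothetical.

```python
import math

def rms_z(z_scores, cutoff=None):
    """RMS of Z-scores; values with |Z| >= cutoff are left out when a cutoff is given."""
    if cutoff is not None:
        z_scores = [z for z in z_scores if abs(z) < cutoff]
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))

# Invented "ordinary" bonds: 1574 Z-scores of +0.67 or -0.67 (hypothetical).
ordinary = [0.67, -0.67] * 787

# The three deviating bonds reported for GLN A 216 in the table above.
reported = [4130.4, 9.1, 12.0]

all_z = ordinary + reported
rms_all = rms_z(all_z)                    # dominated by the single 4130.4
rms_filtered = rms_z(all_z, cutoff=10.0)  # drops 4130.4 and 12.0, keeps 9.1

print(f"unfiltered: {rms_all:.1f}, with 10-sigma cutoff: {rms_filtered:.2f}")
```

With these made-up numbers the unfiltered value lands above 100 and the filtered value below 1, the same pattern as the two WHAT_CHECK messages. The excluded bonds are still listed individually in the report; they just no longer swamp the summary statistic.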

Browsing through a list of hundreds of files that all have a very high RMS-Z score for the bond lengths, I find many nucleic acids, many TNT-refined files, and many examples like the one listed above, where a few crazy distances throw the calculation off completely.

For bond angles I can tell a similar story, but I won't...