Bond lengths and bond angles are a whole story in themselves. Engh and Huber (see the section on τ-angles for details and references) determined from small molecules in the CSD what the ideal bond lengths and bond angles should be. Lately, these values have come under fire. Also from us...
But for the time being, they are good enough, certainly for validation purposes because if we report a 4σ deviation and it actually is a 3.5σ or 4.5σ deviation, then who cares, it is an exception anyway.
The Engh and Huber parameters seem OK for most cases, though, and if one day somebody works out all the problems we mentioned in the τ-article, then we will implement that in WHAT_CHECK, and redo the entire PDBREPORT database.
The idea is that we know the exact ideal values of all bond lengths and bond angles, and we know the exact standard deviations of these values. If we now assume that the variation around these values is random (which it isn't, but the departure from randomness isn't dramatic), then we can determine how many standard deviations each bond length or angle deviates from the ideal value, and report the deviations of more than 4σ. The number of standard deviations that an observed value deviates from the ideal average is commonly called the Z-score of that observed value. Under 'Validation' you find some notes on this topic.
A useful characteristic of normally distributed Z-scores is that their root-mean-square, the RMS-Z score, should be 1.0. So, if we determine the RMS-Z score for all bond lengths and we observe an RMS-Z score of 1.3, we know that the restraints on the bond lengths were too weak during refinement, and when the RMS-Z score is 0.3 it can be that the restraints were too tight and should have been relaxed. So, if it is 1.3, it is an error, but if it is 0.7, we only issue a warning. The latter is because we don't know how much data was available to generate the density map. Ian Tickle has written some readable stuff on this topic.
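The Z-score and RMS-Z bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the WHAT_CHECK code: the function names are made up, the three (ideal, σ) pairs are assumed values in the spirit of the Engh and Huber parameters, and the observed bond lengths are invented for the example.

```python
import math

# Assumed (ideal length, sigma) pairs in Angstrom, for illustration only.
# They are NOT read from the actual WHAT_CHECK parameter tables.
IDEAL = {
    ("C", "O"): (1.231, 0.020),
    ("CA", "C"): (1.525, 0.021),
    ("N", "CA"): (1.458, 0.019),
}


def z_score(observed, ideal, sigma):
    """Number of standard deviations that 'observed' lies from 'ideal'."""
    return (observed - ideal) / sigma


def rms_z(z_scores):
    """Root-mean-square of a non-empty list of Z-scores."""
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))


# Invented observed bond lengths for one residue.
observed = [(("C", "O"), 1.235), (("CA", "C"), 1.510), (("N", "CA"), 1.560)]

z_values = [z_score(length, *IDEAL[bond]) for bond, length in observed]

# Bonds deviating more than 4 sigma would end up in the report.
outliers = [
    (bond, z) for (bond, _), z in zip(observed, z_values) if abs(z) > 4.0
]

for bond, z in outliers:
    print(f"{bond[0]:>3s} {bond[1]:<3s} Z = {z:.1f}")
print(f"RMS-Z over {len(z_values)} bonds: {rms_z(z_values):.2f}")
```

With only three bonds the RMS-Z is of course meaningless; the value of roughly 1.0 is only expected when many Z-scores from a well-refined structure are pooled.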
In earlier WHAT_CHECK versions there was an inconvenience (not really a bug, but almost). Take the PDB file 2IAK. Its header doesn't hint at anything unusual:
JRNL        AUTH   J.J.JEFFERSON,C.CIATTO,L.SHAPIRO,R.K.LIEM
JRNL        TITL   STRUCTURAL ANALYSIS OF THE PLAKIN DOMAIN OF
JRNL        TITL 2 BULLOUS PEMPHIGOID ANTIGEN1 (BPAG1) SUGGESTS THAT
JRNL        TITL 3 PLAKINS ARE MEMBERS OF THE SPECTRIN SUPERFAMILY.
JRNL        REF    J.MOL.BIOL.                   V. 366   244 2007
The coordinates also don't hint at anything (like a minus sign missing or so):
ATOM   1543  N   GLN A 216      53.689 -70.504  93.778  1.00 41.77           N
ATOM   1544  CA  GLN A 216      54.956 -70.699  94.513  1.00 41.77           C
ATOM   1545  C   GLN A 216      56.136 -70.471  93.538  1.00 41.77           C
ATOM   1546  O   GLN A 216      33.819   2.841  59.535  1.00 41.77           O
ATOM   1547  CB  GLN A 216      55.051 -72.138  95.200  1.00 41.77           C
ATOM   1548  CG  GLN A 216      54.327 -72.259  96.632  1.00 41.77           C
ATOM   1549  CD  GLN A 216      52.877 -72.823  96.526  1.00 41.77           C
ATOM   1550  OE1 GLN A 216      52.479 -73.561  95.389  1.00 41.77           O
ATOM   1551  NE2 GLN A 216      52.072 -72.173  97.721  1.00 41.77           N
WHAT IF reports faithfully:
The bond lengths listed in the table below were found to deviate more than 4 sigma from standard bond lengths (both standard values and sigmas for amino acid residues have been taken from Engh and Huber [REF], for DNA they were taken from Parkinson et al [REF]). In the table below for each unusual bond the bond length and the number of standard deviations it differs from the normal value is given.
 198 GLN  ( 216-)  A  -   C    O      83.84   4130.4
 198 GLN  ( 216-)  A  -   CD   OE1     1.41      9.1
 198 GLN  ( 216-)  A  -   CD   NE2     1.58     12.0
And now, because the square of 4130.4 is a big number, the RMS Z-score became a bit big too:
RMS Z-score for bond lengths: 104.078
In the new WHAT_CHECK we therefore built in that crazy things (10σ or worse deviations) do not participate in the RMS-Z calculation, and then we get for 2IAK a much more reasonable message:
RMS Z-score for bond lengths: 0.674
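The effect of leaving the crazy values out can be sketched as follows. This is an illustration under stated assumptions, not the actual WHAT_CHECK implementation: the `cutoff` parameter and the function name are made up, the three large Z-scores are the ones from the report above, and the remaining "well-behaved" Z-scores are invented, so the numbers printed will not reproduce the 2IAK values.

```python
import math


def rms_z(z_scores, cutoff=None):
    """RMS of Z-scores; if a cutoff is given, values with |Z| >= cutoff
    are left out of the calculation (the '10 sigma or worse' rule)."""
    kept = [z for z in z_scores if cutoff is None or abs(z) < cutoff]
    if not kept:
        raise ValueError("no Z-scores left after filtering")
    return math.sqrt(sum(z * z for z in kept) / len(kept))


# The three Z-scores from the report, plus invented Z-scores that stand
# in for all the other, unremarkable bonds (hypothetical data).
reported = [4130.4, 9.1, 12.0]
typical = [0.5, -0.7, 0.3, -0.4, 0.8, -0.6] * 250
all_z = reported + typical

raw = rms_z(all_z)                     # dominated by the single 4130.4
filtered = rms_z(all_z, cutoff=10.0)   # 4130.4 and 12.0 are skipped

print(f"RMS-Z with all values:     {raw:.3f}")
print(f"RMS-Z without |Z| >= 10:   {filtered:.3f}")
```

Note that the excluded bonds are only kept out of the RMS-Z; they still belong in the table of individual outliers, where they are the most interesting entries.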
Browsing through a list of hundreds of files that all have a very high RMS-Z score for the bond lengths, I find many nucleic acids, many TNT-refined files, and many examples like the one listed above, where a few crazy distances throw the calculation off completely.
For bond angles I can tell a similar story, but I won't...