1) A Z-score is the number of standard deviations that an observation is
away from the mean. So, if we have observed that the C=O distance is
1.232 +/- 0.023 Ångström (mean +/- standard deviation), then a C=O bond with a bond length of
1.255 Ångström has a Z-score of (1.255 - 1.232) / 0.023 = 1.0. Obviously, one should be certain
about the data 1.232 +/- 0.023 Ångström because if that is wrong everything
that follows will be wrong. That is why E&H used the CSD (remember what
that is?): data in the CSD are so much more precise than data in the PDB
that, for PDB structure validation, the CSD-derived data can for all practical
purposes be called correct.
2) So, the data in the E&H FF consist of CSD-derived mean bond lengths with their standard deviations.
3) The validation algorithm is simple:
a) Measure all bond lengths in the protein that you want to validate;
b) Calculate the Z-score for each bond length;
c) Report any bond length with |Z| > 4.0 (or > 3.0 if you want to be
picky);
d) Determine the RMS of all Z-scores (the RMS-Z) and report if the
RMS-Z deviates significantly from 1.0. (The latter is a minor detail that is
useful to know: if the Z-scores are normally distributed around 0 with a standard deviation of 1, then their RMS-Z is 1.0;
please talk with the assistants if you don't understand this).
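
The Z-score from 1) and steps b) to d) of the algorithm can be sketched in a few lines of Python. This is only an illustration, not the code of any real validation program: the names REFERENCE, z_score and validate_bonds are made up here, the reference table holds only the C=O values quoted above, and the three bond lengths are invented. Step a), measuring the bond lengths in a real structure, is assumed to have been done already.

```python
import math

# Hypothetical reference table: bond type -> (mean, standard deviation) in Angstrom.
# Only the C=O entry comes from the text; a real table would hold the
# CSD-derived values for every bond type.
REFERENCE = {"C=O": (1.232, 0.023)}


def z_score(observed, mean, sd):
    """Number of standard deviations that `observed` lies from `mean`."""
    return (observed - mean) / sd


def validate_bonds(bonds, reference=REFERENCE, cutoff=4.0):
    """Steps b) to d): Z-score per bond, list of outliers, and the RMS-Z.

    `bonds` is a list of (label, bond_type, length) tuples, i.e. the
    result of step a), which is not shown here.
    """
    scored = []
    for label, bond_type, length in bonds:
        mean, sd = reference[bond_type]
        scored.append((label, z_score(length, mean, sd)))
    # Step c): keep every bond whose |Z| exceeds the cutoff.
    outliers = [(label, z) for label, z in scored if abs(z) > cutoff]
    # Step d): root mean square of all Z-scores.
    rms_z = math.sqrt(sum(z * z for _, z in scored) / len(scored))
    return scored, outliers, rms_z


# Invented bond lengths, for illustration only.
bonds = [
    ("Ala 10", "C=O", 1.255),
    ("Gly 11", "C=O", 1.232),
    ("Ser 12", "C=O", 1.347),
]
scored, outliers, rms_z = validate_bonds(bonds)
for label, z in outliers:
    print(f"{label}: Z = {z:.1f}")
print(f"RMS-Z = {rms_z:.2f}")
```

With only three bonds the RMS-Z says very little; it becomes meaningful when it is calculated over all the bonds in a protein. Passing cutoff=3.0 gives the pickier variant of step c).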