WHAT_CHECK

B-factors

The B-factor plot was what gave away the fraud by Murthy, so these plots seem important.

B-factor plots

To illustrate this, I first list a series of B-factor plots for real structures. These structures were not chosen entirely at random; they were selected because A) I have worked with each of them in one project or another, and B) I personally know the main author of the PDB entry and trust him or her.

Crambin. PDB-id=1CRN.

Octamer motif. PDB-id=1OCT. B-chain

Sugar binding protein. PDB-id=2PZM. A-chain.

Sugar binding protein. PDB-id=2PZM. B-chain.

Thermitase-eglin complex. PDB-id=3TEC. Enzyme.

Thermitase-eglin complex. PDB-id=3TEC. Inhibitor.

Rhino-14 capsid protein VP2. PDB-id=4RHV. (I solved this one myself!)

Thermolysin. PDB-id=5TLN.

And now the B-factor plots for the structures produced by Murthy.

PDB-id=2qid B-chain

The plots below were made with a different program, one that plots the B-factors for the whole PDB file rather than for one molecule at a time, and that starts the vertical axis at zero rather than a bit below the lowest value. This shows the odd behaviour of Murthy's B-factors even more clearly.

PDB-id=1bef

PDB-id=1g40

PDB-id=1rid

PDB-id=2hr0

PDB-id=1cmw

PDB-id=1g44

PDB-id=1y8e

PDB-id=2ou1

PDB-id=1df9

PDB-id=1l6l

PDB-id=2a01

PDB-id=2qid
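A B-factor plot is essentially a residue-by-residue profile of the B-factors in a file. The minimal Python sketch below is not the program used for the plots above; it assumes that each point is the average B-factor over all atoms of one residue, reads every ATOM/HETATM record of a PDB file (so chains simply follow one another), and picks the lower limit of the vertical axis either at zero or a bit below the lowest value. The function names and the 2 Å² margin are hypothetical choices for illustration.

```python
from statistics import mean


def residue_b_factors(pdb_lines):
    """Average B-factor per residue over all ATOM/HETATM records (assumed definition).

    Uses the fixed PDB columns: chain ID (col 22), residue number (cols 23-26),
    insertion code (col 27) and B-factor (cols 61-66). Dicts keep insertion
    order, so residues come out in file order, chain after chain.
    """
    per_residue = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (line[21], int(line[22:26]), line[26])
        per_residue.setdefault(key, []).append(float(line[60:66]))
    return [(key, mean(values)) for key, values in per_residue.items()]


def y_axis_minimum(values, start_at_zero=True, margin=2.0):
    """Lower limit of the vertical axis: zero, or `margin` below the lowest value."""
    return 0.0 if start_at_zero else min(values) - margin
```

The (residue, average B) pairs can be handed to any plotting library. Starting the axis at zero makes a compressed, unnaturally flat B-factor profile stand out, whereas an axis starting just below the minimum stretches small variations to fill the plot.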

The M-factor

WHAT_CHECK calculates an M-factor that quantifies this odd B-factor distribution. The algorithm is explained in the Supplemental material.

M-factors tend to be around 0.2-0.4 for 'normal' PDB files and close to zero for Murthy's files. WHAT_CHECK reports the M-factor when it drops below 0.1. In July 2010 we looked at the M-factors of all PDB files and found about 30 files below this threshold. In about half of these cases nothing seemed wrong, while in several others there was an obvious reason for this behaviour.
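The M-factor calculation itself is described in the Supplemental material and is not reproduced here. Given M-factors that have already been computed, the reporting rule and a survey like the July 2010 one reduce to a simple filter. In this minimal Python sketch the function names and the input dictionary of IDs and values are hypothetical; only the 0.1 threshold and the rough 0.2-0.4 range come from the text.

```python
REPORT_THRESHOLD = 0.1        # WHAT_CHECK reports M-factors below this value
TYPICAL_RANGE = (0.2, 0.4)    # rough range seen for 'normal' PDB files


def describe_m_factor(m):
    """Classify a precomputed M-factor using the numbers quoted in the text."""
    if m < REPORT_THRESHOLD:
        return "reported"
    if TYPICAL_RANGE[0] <= m <= TYPICAL_RANGE[1]:
        return "typical"
    return "unremarkable"


def survey(m_factors):
    """Return the IDs (from a hypothetical {id: M-factor} dict) that would be reported."""
    return sorted(pdb_id for pdb_id, m in m_factors.items() if m < REPORT_THRESHOLD)
```

A reported value only marks a file for a closer look; as the survey showed, in about half of the flagged files nothing seemed wrong.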

Be aware that TLS can do things to B-factors that I don't understand; I do not know whether this can influence the M-factor.

Other B-factor checks

It seems obvious that two atoms that are covalently bound to each other cannot have widely different B-factors. Still, this occasionally happens. We have found cases (especially in structures refined with one particular refinement program...) where the Cγ of a phenylalanine has a higher B-factor than the Cβ and the two Cδs it is covalently bound to.

If the B-factor distribution seems too improbable in terms of differences between covalently bound atoms you get the message:

B factor Z-score
Error: The B-factors of bonded atoms show signs of over-refinement
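WHAT_CHECK's own Z-score statistic is not described here, so the following is only a minimal Python sketch of the underlying idea: walk over covalently bonded atom pairs and list those whose B-factors differ by more than a cut-off. The 20 Å² cut-off, the function name and the explicitly listed leucine side-chain bonds are assumptions for illustration; the example values describe a leucine with a Cβ around 14, a Cγ around 21 and both Cδs around 70.

```python
# Leucine side-chain bonds beyond CA (assumed topology list for this example).
LEU_SIDE_CHAIN_BONDS = [("CB", "CG"), ("CG", "CD1"), ("CG", "CD2")]


def suspicious_bonds(b_factors, bonds, max_delta=20.0):
    """Return (atom1, atom2, |delta B|) for bonds whose B-factors differ by more
    than max_delta (Å²). max_delta is an illustrative cut-off, not WHAT_CHECK's."""
    flagged = []
    for a, b in bonds:
        if a in b_factors and b in b_factors:  # skip bonds to atoms not given
            delta = abs(b_factors[a] - b_factors[b])
            if delta > max_delta:
                flagged.append((a, b, delta))
    return flagged


leucine = {"CB": 14.0, "CG": 21.0, "CD1": 70.0, "CD2": 70.0}
print(suspicious_bonds(leucine, LEU_SIDE_CHAIN_BONDS))
# [('CG', 'CD1', 49.0), ('CG', 'CD2', 49.0)]
```

A fixed cut-off is the simplest possible rule; a Z-score, as WHAT_CHECK reports, would instead compare the observed differences with those expected from well-behaved structures.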

If you are a crystallographer solving a crystal structure, you can consider limiting the freedom of your B-factors in refinement by tightening the B-factor restraints, or perhaps giving all atoms in each side chain the same B-factor. You might even consider not refining individual B-factors at all. It also seems wise to talk with an experienced crystallographer about using TLS instead of individual B-factors.

In 'normal' X-ray structures, buried atoms are expected to have lower B-factors, on average, than atoms located at the surface. But individual exceptions are common. First, not all surface atoms are actually at the surface in the crystal (crystal packing reduces the freedom of the surface residues involved). Second, most proteins have a few buried residues with higher B-factors than 'the rest'. If we do not observe such exceptions, we issue a warning:

B factor Z-score
Error: The B-factors of bonded atoms show signs of over-refinement

Things can also go the other way. In the report for 151L we find this B-factor plot:

Figure 22. The B-factor plot for 151L.

and the warning text:

Warning: Average B-factor problem
The average B-factor for all buried protein atoms normally lies
between 10--20. Values around 3--5 are expected for X-ray studies
performed at liquid nitrogen temperature.
Because of the extreme value for the average B-factor, no further
analysis of the B-factors is performed.
Average B-factor for buried atoms : 38.587
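The test behind this warning can be sketched as follows, assuming each atom already carries a buried/exposed flag from a separate accessibility calculation (not shown). The ranges are the ones quoted in the message; how WHAT_CHECK decides that a value is 'extreme' is not stated, so treating anything outside the expected range as extreme is an assumption of this minimal Python sketch, and the function name is hypothetical.

```python
from statistics import mean

# Ranges quoted in the WHAT_CHECK message (B-factors in Å²).
ROOM_TEMPERATURE_RANGE = (10.0, 20.0)
LIQUID_NITROGEN_RANGE = (3.0, 5.0)


def buried_b_check(atoms, liquid_nitrogen=False):
    """atoms: iterable of (b_factor, is_buried) pairs; is_buried is assumed input.

    Returns the average B-factor of the buried atoms and whether it lies inside
    the expected range. If it does not, further B-factor analysis is skipped,
    as in the message above.
    """
    buried = [b for b, is_buried in atoms if is_buried]
    average = mean(buried)
    low, high = LIQUID_NITROGEN_RANGE if liquid_nitrogen else ROOM_TEMPERATURE_RANGE
    return average, low <= average <= high
```

For 151L the buried-atom average of 38.587 lies far above both ranges, which is why the warning is issued.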

The structure 151L is probably correct in most respects, and it is definitely a very good structure, especially given that it was deposited in 1994. But the odd B-factor plot tells us that something unusual has happened. Re-refining this structure in 2009 did not improve things much, so perhaps the crystals weren't very good? We don't know, and that is why WHAT_CHECK issues a warning.

It seems obvious that the B-factor of the Cγ of a phenylalanine cannot be higher than the B-factors of all its direct neighbours. And it seems unlikely that the Cδs of a leucine will have B-factors around 70 when the B-factor of the Cβ is around 14 and that of the Cγ is 21 or so. Still, we often observe such problems, especially in structures refined with TNT.

In 141L, for example, the Cγ of Phe A4 has a B-factor of 18.40, higher than those of its Cβ (11.91), Cδ1 (7.82) and Cδ2 (17.51):

JRNL        AUTH   E.P.BALDWIN,O.HAJISEYEDJAVADI,W.A.BAASE,
JRNL        AUTH 2 B.W.MATTHEWS
JRNL        TITL   THE ROLE OF BACKBONE FLEXIBILITY IN THE
JRNL        TITL 2 ACCOMMODATION OF VARIANTS THAT REPACK THE CORE OF
JRNL        TITL 3 T4 LYSOZYME.
JRNL        REF    SCIENCE                       V. 262  1715 1993

ATOM     25  N   PHE A   4      39.649   2.772  13.343  1.00 10.60           N
ATOM     26  CA  PHE A   4      40.460   3.622  14.191  1.00  6.54           C
ATOM     27  C   PHE A   4      41.743   4.004  13.535  1.00 15.30           C
ATOM     28  O   PHE A   4      42.115   5.146  13.569  1.00  9.36           O
ATOM     29  CB  PHE A   4      40.730   2.949  15.557  1.00 11.91           C
ATOM     30  CG  PHE A   4      39.448   2.856  16.342  1.00 18.40           C
ATOM     31  CD1 PHE A   4      38.580   1.770  16.196  1.00  7.82           C
ATOM     32  CD2 PHE A   4      39.061   3.897  17.184  1.00 17.51           C
ATOM     33  CE1 PHE A   4      37.402   1.658  16.933  1.00 20.71           C
ATOM     34  CE2 PHE A   4      37.877   3.807  17.920  1.00 17.71           C
ATOM     35  CZ  PHE A   4      37.047   2.691  17.804  1.00 13.35           C
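A phenylalanine like this one can be spotted automatically. The minimal Python sketch below is an illustration, not WHAT_CHECK's implementation: it reads the records above, infers covalent bonds from interatomic distances (the 1.9 Å cut-off is an assumption), and reports atoms with at least three bonded neighbours whose B-factor is higher than that of every neighbour. The function names are hypothetical.

```python
import math

# PDB records for Phe A4 of 141L, copied from the entry above.
RECORDS = """\
ATOM     25  N   PHE A   4      39.649   2.772  13.343  1.00 10.60           N
ATOM     26  CA  PHE A   4      40.460   3.622  14.191  1.00  6.54           C
ATOM     27  C   PHE A   4      41.743   4.004  13.535  1.00 15.30           C
ATOM     28  O   PHE A   4      42.115   5.146  13.569  1.00  9.36           O
ATOM     29  CB  PHE A   4      40.730   2.949  15.557  1.00 11.91           C
ATOM     30  CG  PHE A   4      39.448   2.856  16.342  1.00 18.40           C
ATOM     31  CD1 PHE A   4      38.580   1.770  16.196  1.00  7.82           C
ATOM     32  CD2 PHE A   4      39.061   3.897  17.184  1.00 17.51           C
ATOM     33  CE1 PHE A   4      37.402   1.658  16.933  1.00 20.71           C
ATOM     34  CE2 PHE A   4      37.877   3.807  17.920  1.00 17.71           C
ATOM     35  CZ  PHE A   4      37.047   2.691  17.804  1.00 13.35           C
"""

BOND_CUTOFF = 1.9  # Å; assumed maximum length of a covalent bond between heavy atoms


def parse_atoms(text):
    """Return {atom name: (x, y, z, B)} using the fixed PDB columns."""
    atoms = {}
    for line in text.splitlines():
        if line.startswith("ATOM  "):
            name = line[12:16].strip()
            x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
            atoms[name] = (x, y, z, float(line[60:66]))
    return atoms


def bonded_neighbours(atoms, cutoff=BOND_CUTOFF):
    """Treat every atom pair closer than `cutoff` as covalently bonded."""
    names = list(atoms)
    neighbours = {n: [] for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(atoms[a][:3], atoms[b][:3]) < cutoff:
                neighbours[a].append(b)
                neighbours[b].append(a)
    return neighbours


def branch_atoms_above_neighbours(atoms, min_neighbours=3):
    """Atoms with >= min_neighbours bonds whose B exceeds every bonded neighbour's B."""
    neighbours = bonded_neighbours(atoms)
    return [
        name for name, bonded in neighbours.items()
        if len(bonded) >= min_neighbours
        and all(atoms[name][3] > atoms[n][3] for n in bonded)
    ]


print(branch_atoms_above_neighbours(parse_atoms(RECORDS)))  # ['CG']
```

Restricting the test to branch atoms such as Cα and Cγ is a design choice: an atom with a single neighbour in this excerpt, such as the backbone N whose peptide partner is not listed, would otherwise be flagged trivially.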

Fortunately, the reflection data have been deposited for most of the TNT structures with this problem, so future PDB_REDO runs are likely to solve it.

The other messages that WHAT_CHECK can produce should all be clear to crystallographers, and are of little meaning or importance to others.