Material linked from bioinformatics course
The following text is borrowed from the website about the Murthy fraud.
Visual observation suggests that the Murthy B-factors were randomly selected
from a series of numbers that is not normally distributed between some
reasonable upper and lower limit. These limits are typically about 20 and 40.
Visual inspection clearly shows that nine Murthy files have fabricated B-factors,
one or two are likely to have borrowed B-factors, and another two or three
files may actually have been solved experimentally (though not very skillfully
from a crystallographic point of view).
So I need an algorithm that detects the B-factor selection method used by Murthy.
I have tried a few things (a Fourier transform of the B-factor distribution, etc.),
but they failed on the non-Gaussian distributions.
The algorithm, which is fully objective (i.e. not written to explicitly detect
Murthy files), goes as follows:
- For every canonical amino acid (i.e. one of the normal 20) that seems to have all
its atoms intact, determine the average B-factor over all its atoms.
- Call the number of B-factors so obtained N.
- If N is too small (I arbitrarily call N < 75 too small), this algorithm should not be used.
- For each residue I (with I < N-19), fit straight lines through the B-factors of the
residues from I to I+J, where J runs from 10 to 20. This gives 11 straight lines, each
described by Bc = B0 + β·i, in which i runs over the 11 to 21 B-factors
in the window and β is the slope of the line. For each residue I,
determine the smallest β. This minimal β we call γ.
- Determine the average and standard deviation of all γs per PDB file; call this standard
deviation about the average the M-factor.
- Compare the M-factor of the Murthy files with that of similar PDB files (similar meaning:
solved at a similar resolution, data collected at a similar temperature, refined with the
same software, and in one case also using that refinement software in the same way).
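The steps above can be sketched in Python. This is my own illustrative reading of the recipe, not the author's actual implementation: `residue_b` is assumed to already hold the per-residue average B-factors (canonical residues with all atoms intact, in chain order), and the phrase "standard deviation on the average" is interpreted here as the standard deviation of the γs about their mean.

```python
import numpy as np

def m_factor(residue_b, min_residues=75, j_min=10, j_max=20):
    """Sketch of the M-factor statistic described in the text.

    residue_b: per-residue average B-factors (one value per canonical
    residue that has all atoms intact, in chain order).
    Returns None when there are too few residues (N < 75).
    """
    b = np.asarray(residue_b, dtype=float)
    n = len(b)
    if n < min_residues:
        return None  # N too small; the algorithm should not be used

    gammas = []
    # For each window start I, fit the 11 least-squares lines
    # (J = 10..20) and keep the smallest slope beta; that minimum
    # is gamma for residue I.
    for i in range(n - j_max):
        betas = []
        for j in range(j_min, j_max + 1):
            y = b[i:i + j + 1]          # B-factors of residues I..I+J
            x = np.arange(len(y))       # residue index within the window
            beta = np.polyfit(x, y, 1)[0]  # slope of the fitted line
            betas.append(beta)
        gammas.append(min(betas))

    gammas = np.asarray(gammas)
    # M-factor: spread of the gammas about their average
    # (assumption: "standard deviation on the average" = sigma of gammas).
    return gammas.std()
```

For experimentally refined structures the B-factors vary smoothly along the chain, so the minimal window slopes (and hence the M-factor) should behave differently than for B-factors drawn at random from a fixed interval; the comparison against similar PDB files is what makes the statistic meaningful.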