Material linked from bioinformatics course
The following text is borrowed from the website about the Murthy fraud.
Visual observation suggests that the Murthy B-factors were randomly selected
from a series of numbers that is not normally distributed between some
reasonable upper and lower limit. These limits are typically about 20 and 40.
Visual inspection clearly shows that nine Murthy files have fabricated B-factors,
one or two are likely to have borrowed B-factors, and another two or three
files may actually have been solved experimentally (though not very skillfully
from a crystallographic point of view).
So I need an algorithm that detects the B-factor selection method used by Murthy.
I have tried a few things (a Fourier transform of the B-factor distribution, etc.),
but they failed on the non-Gaussian distributions.
The algorithm, which is fully objective (i.e. not written to explicitly detect
Murthy files), goes as follows:
- For every canonical amino acid (i.e. one of the normal 20) that seems to have all
its atoms intact, determine the average B-factor over all its atoms.
- Call the number of B-factors so obtained N.
- If N is too small (I arbitrarily call N < 75 too small), this algorithm should not be used.
- For each residue I (with I < N-19), fit straight lines through the B-factors of the
residues from I to I+J, where J runs from 10 to 20. This gives 11 straight lines, each
described by Bc = B0 + β·i, in which i runs over the 11 to 21 B-factors
in the window and β is the slope of the line. For each residue I,
determine the smallest β. This minimal β we call γ.
- Determine the average and standard deviation of all γs per PDB file; call this standard
deviation about the average the M-factor.
- Compare the M-factor of the Murthy files with that of similar PDB files (similar meaning:
solved at a similar resolution, data collected at a similar temperature, refined with the
same software, and in one case also using that refinement software in the same way).
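The steps above can be sketched in Python. This is my own illustrative reading of the recipe, not the author's actual implementation: `residue_b` is assumed to already hold the per-residue average B-factors (canonical residues with all atoms intact, in chain order), and the phrase "standard deviation on the average" is interpreted here as the standard deviation of the γs about their mean.

```python
import numpy as np

def m_factor(residue_b, min_residues=75, j_min=10, j_max=20):
    """Sketch of the M-factor statistic described in the text.

    residue_b: per-residue average B-factors (one value per canonical
    residue that has all atoms intact, in chain order).
    Returns None when there are too few residues (N < 75).
    """
    b = np.asarray(residue_b, dtype=float)
    n = len(b)
    if n < min_residues:
        return None  # N too small; the algorithm should not be used

    gammas = []
    # For each window start I, fit the 11 least-squares lines
    # (J = 10..20) and keep the smallest slope beta; that minimum
    # is gamma for residue I.
    for i in range(n - j_max):
        betas = []
        for j in range(j_min, j_max + 1):
            y = b[i:i + j + 1]          # B-factors of residues I..I+J
            x = np.arange(len(y))       # residue index within the window
            beta = np.polyfit(x, y, 1)[0]  # slope of the fitted line
            betas.append(beta)
        gammas.append(min(betas))

    gammas = np.asarray(gammas)
    # M-factor: spread of the gammas about their average
    # (assumption: "standard deviation on the average" = sigma of gammas).
    return gammas.std()
```

For experimentally refined structures the B-factors vary smoothly along the chain, so the minimal window slopes (and hence the M-factor) should behave differently than for B-factors drawn at random from a fixed interval; the comparison against similar PDB files is what makes the statistic meaningful.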