This write-up is meant to add some reflection on the structures 'solved' by H.M.K. Murthy, which were subsequently published and made publicly accessible by deposition in the PDB. The University of Alabama at Birmingham (UAB), where Murthy last worked as a crystallographer, has analyzed all structures H.M.K. Murthy 'solved' while working at the UAB and decided to retract all these structures from the PDB and also to retract the associated articles.
We asked whether all this could have been prevented if more rigorous use of structure validation software upon deposition of protein structures in the wwPDB (as is now being discussed by the VTF committee) had already been in place in past years. This web site provides a capita selecta tour through the WHAT_CHECK validation reports for the 'structures' deposited by H.M.K. Murthy. For this purpose we reinstalled the autumn 1999 version of WHAT_CHECK, to show that referees of the Murthy articles would have had a fair chance of spotting the problems if the WHAT_CHECK report had been available to them at that time.
The UAB has provided a very detailed statement explaining why they retract H.M.K. Murthy's 'structures' and articles. We have little to add to this thorough study, and we will use this study for a guided tour through the 'structures'. The lists of additional errors detected by WHAT_CHECK are certainly not exhaustive; in the article that we submitted about Murthy's 'structures' we discuss a few types of errors rigorously. If you want the highest level of detail, you will have to read the WHAT_CHECK reports...
1BEF is most likely a unique structure that is globally superimposed on structure 1JXP. 1BEF and 1JXP have the same general structure as well as the same crystallographic orientation and translation relative to the origin. However, the crystal forms of 1BEF and 1JXP are distinctly different, with unrelated space groups and unit cells: 1BEF crystallized in P21 (a = 48.4 Ångström, b = 62.4 Ångström, c = 39.6 Ångström, β = 96.7°) while 1JXP crystallized in P6322 (a = b = 96.96 Ångström, c = 167.1 Ångström). No other example of such superposition can be found in the PDB. Furthermore, 1BEF appears to be a physically improbable structure, with 1) statistically anomalous geometry (the Ramachandran score deviates almost 6 standard deviations from the normal value, and more than 4 standard deviations from what is common at the resolution of 1BEF), 2) unrealistic electron density and thermal factors (the B-factor plot of 1BEF is flat; see the general comment about Murthy's refinement tactics), 3) anomalous and unreasonable packing of the central core (the WHAT IF packing score of -4.9 is very worrisome but, worse, WHAT IF's evaluation of the 1BEF surface labels this protein as a membrane protein), and 4) an unacceptable level of inter-atomic clashing (the WHAT_CHECK report shows several classes of problems. There is, for example, a long list of bumps. Many PDB files have bumps, but normally many of those bumps involve water molecules; in 1BEF all bumps are between protein atoms). No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1BEF, or demonstrate that it was an experimentally determined structure, were available for examination.
The B factor distribution, sigma values, geometry and crystal packing for 1CMW are essentially normal. (I tend to disagree; the geometry of this file is awful. Murthy has problems modelling prolines (see the general notes), and the prolines in this file are highly suspicious. This file has a reasonable number of not overly bad bumps but, again, no bumps involving water molecules, which is suspicious.) However, the exact numerical relationship of the B factors to those of the 1TAQ structure, which was used as a starting model, shows that B factors were not refined as described in the publication. Specifically, the B factors are identical to an accuracy of 0.01 by an exact numerical difference of 16.00. This could have occurred only if the 1TAQ B factors were copied into 1CMW after subtracting 16.00 and left without refinement. The Fourier maps computed with the structure factors reveal a striking absence of densities corresponding to water molecules, in spite of almost perfect agreement of the density with the submitted coordinates. Taken together, these abnormalities strongly suggest that 1CMW, in large part, corresponds directly to the 1TAQ starting model and the structure factors may have been calculated directly from this model. Therefore, ICMW (this seems to be a typo in the UAB statement; they undoubtedly mean 1CMW) does not correctly represent original data. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1CMW, or demonstrate that it was an experimentally determined structure, were available for examination.
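The B-factor argument above boils down to a simple test: if every atom's B factor in the deposited file differs from the starting model's value by the same constant, the values were shifted and not refined. The following is a minimal sketch of that test, assuming the two B-factor lists have already been extracted and matched atom by atom. The function name, the tolerance default, and the sample numbers are illustrative and are not taken from 1TAQ or 1CMW.

```python
def constant_offset(b_reference, b_model, tol=0.01):
    """Return the common offset (reference minus model) if all paired
    B factors differ by the same amount within tol, otherwise None."""
    if len(b_reference) != len(b_model) or not b_reference:
        return None
    diffs = [ref - mod for ref, mod in zip(b_reference, b_model)]
    # A refined model would show atom-to-atom variation in these differences.
    if max(diffs) - min(diffs) > tol:
        return None
    return round(sum(diffs) / len(diffs), 2)


# Made-up values for illustration only.
b_start = [35.12, 41.80, 52.37, 28.95]
b_deposited = [b - 16.00 for b in b_start]
print(constant_offset(b_start, b_deposited))
```

In practice the atom matching (same chain, residue number and atom name in both files) is the harder part; the comparison itself is trivial, which is why such a pattern is easy to detect once someone looks for it.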
Per Dr. Murthy, 1DF9 was replaced with a corrected file, 2QID, because of bad
contacts that had been identified by Dr. Piet Gros et
al. (see Janssen et al., 2007). The xyz coordinates
and thermal factors of the proteins and inhibitor molecules of 1DF9 and 2QID are
exactly the same. The R-factors (0.199), free R-factors (0.243), deviations of
bonds from ideality (0.018
1G40 was originally deposited in October 2000 with space group symmetry P212121.
The reported unit cell had the following parameters: a = 65.3 Ångström, b = 115.4 Ångström and
c = 121.9 Ångström, which were the values reported in the publication. This corresponds
to a theoretical value of 45,170 possible reflections for this unit cell at a resolution
of 2.2 Ångström, and it agrees with the reported number of reflections in the data
set (39,322 reflections; 87% complete). In February 2007, the unit cell was inexplicably
changed to: a = 65.3
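The reflection count quoted for the originally deposited cell can be reproduced with a back-of-the-envelope estimate: the number of reciprocal lattice points inside the resolution sphere is roughly (4/3)·π·V/d³, divided by the number of symmetry-equivalent reflections. The sketch below assumes an orthorhombic cell (so V = a·b·c) and 8 equivalents for P212121 (point group 222 plus Friedel symmetry). The function name is ours, and the estimate ignores systematic absences.

```python
import math


def unique_reflections(a, b, c, d_min, n_equiv):
    """Approximate number of unique reflections to resolution d_min
    for a cell with 90-degree angles (lengths in Angstrom)."""
    volume = a * b * c
    lattice_points = (4.0 / 3.0) * math.pi * volume / d_min ** 3
    return lattice_points / n_equiv


expected = unique_reflections(65.3, 115.4, 121.9, 2.2, n_equiv=8)
completeness = 39322 / expected
print(round(expected), f"{completeness:.0%}")  # prints: 45170 87%
```

With the original cell parameters this gives about 45,170 possible reflections and a completeness of 87% for 39,322 observed reflections, in line with the numbers in the statement.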
WHAT_CHECK additionally detected
1G40
1G44 has a distribution of B-factor values that bears no relationship to solvent accessibility or crystal contacts. (WHAT_CHECK warns about this, prominently.) Also, this structure has very low R-factors in spite of unrealistic intermolecular and intramolecular contacts and crystal packing; for example, there are 36 chemically impossible close contacts. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1G44, or demonstrate that this was an experimentally determined structure, were available for examination.
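To make the notion of counting close contacts concrete, here is a rough sketch of a distance-based count. It is not the procedure used by the UAB committee or by WHAT_CHECK: the atom representation, the function name, the 2.2 Ångström default cutoff and the rule of skipping same or adjacent residues are all assumptions made for illustration. Real validation software uses element-specific radii and proper bonding information.

```python
import math
from itertools import combinations


def close_contacts(atoms, cutoff=2.2, min_residue_gap=2):
    """atoms: list of (chain_id, residue_number, (x, y, z)) tuples.
    Returns pairs closer than cutoff, skipping atoms in the same or
    adjacent residues of one chain, since those may be covalently bonded."""
    hits = []
    for (c1, r1, p1), (c2, r2, p2) in combinations(atoms, 2):
        if c1 == c2 and abs(r1 - r2) < min_residue_gap:
            continue
        distance = math.dist(p1, p2)
        if distance < cutoff:
            hits.append((c1, r1, c2, r2, distance))
    return hits


# Toy coordinates, not taken from any PDB entry.
toy_atoms = [
    ("A", 10, (0.0, 0.0, 0.0)),
    ("A", 11, (1.3, 0.0, 0.0)),     # adjacent residue: ignored against residue 10
    ("A", 50, (1.0, 1.0, 1.0)),     # far in sequence but far too close in space
    ("B", 5, (10.0, 10.0, 10.0)),
]
print(len(close_contacts(toy_atoms)))
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a demonstration but would need a spatial grid or neighbour list for a full entry. Legitimate short contacts such as disulfide bridges would also have to be excluded.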
WHAT_CHECK additionally detected
1L6L contains 2036 residues, 1011 waters and 64 BOG molecules. This is not the structure reported in Table 3, which supposedly contained 2366 residues, 1522 waters, and 67 BOG molecules. One can explain the discrepancy in waters if one assumes that the number of the "last water", which is 1522, was carelessly reported rather than the total number of waters in the PDB file. However, the remaining discrepancies cannot be reconciled. Overall the 1L6L entry contains 21,138 atoms, compared to 21,619 reported in the paper. Furthermore, the lattice packing of 1L6L has considerable gaps that are also hard to reconcile with the 2.3 Ångström resolution diffraction limit for this structure, and the 1L6L crystals exhibit a solvent content of 78% (Vm=5.6). According to the Matthews Probability Calculator Server, the probability that this arrangement exists and diffracts to 2.3 Ångström resolution is 0.28% (see Kantardjieff and Rupp, Protein Science 2003 12:1865-1871). (We calculate a different Vm...). Finally, even if one assumes that 2OU1 and 1L6L represent extremely poor refinements, one cannot reconcile the fact that the PDB entries do not match the publication record. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1L6L, or demonstrate that this was an experimentally determined structure, were available for examination.
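The link between the Matthews coefficient Vm and the solvent content quoted above follows from the usual approximation that the protein fraction of the crystal volume is about 1.23/Vm (this assumes a typical protein partial specific volume of roughly 0.74 cm³/g). A minimal sketch, with a function name of our own choosing:

```python
def solvent_fraction(vm, protein_term=1.23):
    """Approximate solvent fraction of a protein crystal from the
    Matthews coefficient vm (cubic Angstrom per Dalton)."""
    return 1.0 - protein_term / vm


print(f"{solvent_fraction(5.6):.0%}")  # prints: 78%
```

For Vm=5.6 this gives a solvent fraction of about 0.78, matching the 78% quoted for 1L6L. The probability figures come from the Matthews probability server and are not reproduced by this formula.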
WHAT_CHECK additionally detected
2OU1 is missing 269 atoms that should be in the file based on Table 3 in the published paper. In addition, 2OU1 has 863 residues and 558 waters, not 869 residues and 761 water molecules as reported in Table 3. (Almost 50 of the waters make no H-bonds at all and thus are highly improbable; in a real structure we would believe these waters to have been modelled in a Fourier ripple or some other type of density noise.) Thus, 2OU1 does not directly correspond to the coordinates reported in the Biochemistry paper. In addition, the packing of 2OU1 is highly asymmetric, which is not reflected in the B-factors of the 12 different chains. (WHAT_CHECK warns about unrealistically low φ,ψ differences between NCS-related chains.) In addition, the structure registers a 0% MolProbity clash score. (WHAT_CHECK warns about hundreds of clashes, which it calls bumps; the worst ones are either clashes between symmetry-related residues or clashes with water molecules.) These odd features of 2OU1 are not consistent with reported R/Rfree values of 18.7/21.9 at 2 Ångström resolution with data in the highest shell that exhibit an average I/σ(I) of 10.4. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 2OU1, or demonstrate that this was an experimentally determined structure, were available for examination.
WHAT_CHECK additionally detected
This structure exhibits poor geometry, improbable B factors, large solvent gaps in the crystal lattice, and
an extremely high solvent content (~75%), features which are not consistent with a 2.1
WHAT_CHECK additionally detected
1Y8E exhibits a number of unlikely or improbable features, including poor geometry, improbable B factors, large solvent gaps in the crystal lattice (at least 8 Ångström), and an extremely high solvent content (~77%; Vm 5.07). (WHAT_CHECK even finds Vm=5.47.) These features are not consistent with a 2.2 Ångström structure. To justify the structure of 1Y8E, PDB entries 1OCY and 1H6W were cited as proving that crystals with lattice gaps can diffract to high resolution. However, these examples differ from 1Y8E in several ways. 1OCY and 1H6W are viral fiber proteins with large disordered segments, which are part of one protein. In the case of 1Y8E, it is hypothesized that disordered suramin molecules (molecular weight (MW) ~1500) 'connect' the lattice. Since there is nothing to hold these molecules in place, such as a covalent peptide chain, this explanation cannot be accepted. In addition, solvent content and Vm values for 1OCY, 1H6W and 1EZX, which all contain disordered segments, were compared to 1Y8E using Bernhard Rupp's Matthews probability server. This analysis revealed the following solvent contents, Vm values, and probabilities (P): 1OCY (64%, Vm=3.44, P=0.38), 1H6W (57%, Vm=2.87, P=0.98), 1EZX (50%, Vm=2.5, P=1.0), and 1Y8E (76%, Vm=5.07, P=0.008, assuming MW=30,000). This one parameter calls into question the validity of 1Y8E. In addition, electron density, calculated using the deposited structure factors, is in excellent agreement with implausible or impossible structural features. Additional analysis of the structure factors reveals unreasonable sigma values, unreasonable inclusion of all low resolution terms out to 150 Ångström resolution, and an unknown origin of different Rfree flags in original and updated files. The revised data submitted for 1Y8E are different and no longer contain the extreme low resolution terms. However, the conclusion remains that neither the original nor the revised data are the unmodified structure factors for 1Y8E as expected from the publication.
No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1Y8E, or demonstrate that this was an experimentally determined structure, were available for examination.
WHAT_CHECK additionally detected
2A01 exhibits a number of abnormalities that suggest the structure was not generated from actual diffraction data. These abnormalities include: almost perfect correspondence of the electron density to physically impossible features, viz. close contacts and poor geometry; abnormal B factor distribution that does not vary along the chain to reflect solvent exposure and packing, even in regions that are extremely exposed to the solvent; crystal packing that is characterized by very few intermolecular contacts and very high solvent content that is inconsistent with the resolution and B factor distribution of the published data; anomalies in the structure factor file, in particular the unreasonable sigma values that are indicative of data that have been computationally generated or manipulated. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 2A01, or demonstrate that this was an experimentally determined structure, were available for examination by the committee.
WHAT_CHECK additionally detected
The coordinates for 2HR0 do not form a connected network of molecules in the crystal lattice. The diffraction data do not show the features that should arise from the presence of bulk solvent, whereas the molecular arrangement indicates that large regions are not occupied by protein molecules. The values for ksol and Bsol bulk solvent parameters in 2HR0 are far outside the normally accepted ranges for these parameters. It is also noteworthy that nowhere in the Methods section of his Nature paper is there any mention of non-standard bulk solvent corrections to the Fobs values.
The B-factors of the model do not vary significantly throughout the molecule, even though long segments of the chain are almost completely exposed to solvent (Janssen et al., Nature 448:E1-E2, 2007; note Figure 2 of their communication). The Rfree and R distributions are exceptionally low at low resolution, and the difference between Rfree and R is unusually small for a structure refined at 2.3 Ångström resolution with an amplitude-based target function (Janssen et al., Nature, 9 Aug 2007; note Figure 1b of their communication). Dr. Murthy provided two responses to this allegation: (1) the Rfree and R distribution would be expected if X-ray terms in a restrained refinement were weighted more heavily than usual, and (2) overweighting the X-ray terms would reduce the R-value at the cost of some geometric distortion. Overweighting the X-ray terms can indeed reduce the R-value at the cost of geometric distortion; however, the reported errors in the bond lengths, bond angles and torsion angles all suggest that the geometry was sufficiently restrained during refinement. Furthermore, many of the unrealistic contacts in this structure are far worse than simple geometric distortion.
There are 30 chemically impossible, close contacts shorter than 2.2
WHAT_CHECK additionally detected