Introduction

This text is meant to add some reflection on the structures
'solved' by H.M.K. Murthy that were subsequently published and made publicly accessible
by deposition in the PDB. The University of Alabama at Birmingham (UAB), where Murthy
last worked as a crystallographer, has analyzed all structures H.M.K. Murthy
'solved' while working at the UAB and has decided to retract all these structures from the PDB and
to retract the associated articles.

We asked whether all this could have been prevented if structure validation software had been used more rigorously upon protein structure deposition in the wwPDB in past years (as is now being discussed by the VTF committee). This web site offers a tour through selected WHAT_CHECK validation reports for the 'structures' deposited by H.M.K. Murthy. For this purpose we reinstalled the autumn 1999 version of WHAT_CHECK to illustrate that referees of the Murthy articles would have had a fair chance of spotting the problems if the WHAT_CHECK reports had been available to them at that time.

The UAB has provided a very detailed statement explaining why it retracts H.M.K. Murthy's 'structures' and articles. We have little to add to this thorough study, and we use it as the basis for a guided tour through the 'structures'. The lists of additional errors detected by WHAT_CHECK are certainly not exhaustive; in the article that we submitted about Murthy's 'structures' we discuss a few types of errors rigorously. For the highest level of detail, you will have to read the WHAT_CHECK reports themselves.

1BEF

1BEF is most likely a unique structure that is globally superimposed on structure 1JXP. 1BEF and 1JXP have the same general structure as well as the same crystallographic orientation and translation relative to the origin. However, the crystal forms of 1BEF and 1JXP are distinctly different, with unrelated space groups and unit cells: 1BEF crystallized in P21 (a = 48.4 Ångström, b = 62.4 Ångström, c = 39.6 Ångström, β = 96.7°) while 1JXP crystallized in P6322 (a = b = 96.96 Ångström, c = 167.1 Ångström). No other example of such superposition can be found in the PDB. Furthermore, 1BEF appears to be a physically improbable structure, with (1) statistically anomalous geometry (the Ramachandran score deviates almost 6 standard deviations from the normal value, and more than 4 SDs from what is common at the resolution of 1BEF); (2) unrealistic electron density and thermal factors (the B-factor plot of 1BEF is flat; see the general comment about Murthy's refinement tactics); (3) anomalous and unreasonable packing of the central core (the WHAT IF packing score of -4.9 is very worrisome, but worse, WHAT IF's evaluation of the 1BEF surface labels this protein as a membrane protein); and (4) an unacceptable level of inter-atomic clashing (the WHAT_CHECK report shows several classes of problems; there is, for example, a long list of bumps. Many PDB files have bumps, but normally many of those bumps involve water molecules, whereas in 1BEF all bumps are between protein atoms). No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1BEF, or demonstrate that it was an experimentally determined structure, were available for examination.
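For entries from two different crystal forms, matching coordinates would normally agree only after a least-squares superposition, because each model is placed relative to its own crystal axes and origin. One quick way to expose a coincidence like 1BEF/1JXP is therefore to compare matched Cα coordinates directly in their deposited frames, without any fitting. A minimal Python sketch, assuming the matched Cα coordinates have already been extracted from both files; the function name and the coordinates are hypothetical and not taken from 1BEF or 1JXP:

```python
import math


def rmsd_without_fitting(coords_a, coords_b):
    """Root-mean-square deviation (Å) between paired atoms, computed
    directly in the deposited frames: no rotation or translation is applied."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical matched C-alpha coordinates (Å) from two entries.
ca_entry_1 = [(10.0, 5.0, 3.0), (13.8, 5.2, 3.1), (17.5, 5.9, 2.8)]
ca_entry_2 = [(10.1, 5.0, 3.0), (13.8, 5.3, 3.1), (17.4, 5.9, 2.9)]

print(round(rmsd_without_fitting(ca_entry_1, ca_entry_2), 3))  # 0.115
```

A small unfitted RMSD between independently crystallized forms is the red flag; after a proper superposition, a small value would be unremarkable.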

1CMW

The B factor distribution, sigma values, geometry and crystal packing for 1CMW are essentially normal. (I tend to disagree: the geometry of this file is awful. Murthy has problems modelling prolines (see the general notes), and the prolines in this file are highly suspicious. This file has a reasonable number of not overly bad bumps, but again, no bumps involving water molecules, which is suspicious.) However, the exact numerical relationship of the B factors to those of the 1TAQ structure, which was used as a starting model, shows that B factors were not refined as described in the publication. Specifically, the 1CMW B factors equal the 1TAQ B factors minus exactly 16.00, to an accuracy of 0.01. This could have occurred only if the 1TAQ B factors were copied into 1CMW after subtracting 16.00 and left without refinement. The Fourier maps computed with the structure factors reveal a striking absence of densities corresponding to water molecules, in spite of almost perfect agreement of the density with the submitted coordinates. Taken together, these abnormalities strongly suggest that 1CMW, in large part, corresponds directly to the 1TAQ starting model and the structure factors may have been calculated directly from this model. Therefore, ICMW (this seems to be a typo in the UAB statement; they undoubtedly mean 1CMW) does not correctly represent original data. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1CMW, or demonstrate that it was an experimentally determined structure, were available for examination.
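The 1TAQ/1CMW argument rests on a simple test: if every atom's B factor in one model equals the B factor of the same atom in the other model minus one constant, the spread of the pairwise differences is essentially zero. A minimal Python sketch, assuming the atoms have already been matched between the two files; the function name and B values are hypothetical, and the 0.01 tolerance mirrors the accuracy quoted above:

```python
def constant_offset(b_deposited, b_start, tolerance=0.01):
    """Return the common offset (b_start - b_deposited) if all paired B factors
    differ by the same amount within `tolerance`; otherwise return None."""
    if len(b_deposited) != len(b_start) or not b_deposited:
        raise ValueError("need two non-empty, equally long B-factor lists")
    diffs = [start - dep for dep, start in zip(b_deposited, b_start)]
    if max(diffs) - min(diffs) <= tolerance:
        return round(sum(diffs) / len(diffs), 2)
    return None  # the differences vary, as expected after genuine refinement


# Hypothetical B factors (Å^2) for four matched atoms.
b_start = [32.45, 28.10, 41.77, 55.02]      # starting model
b_deposited = [16.45, 12.10, 25.77, 39.02]  # starting values minus 16.00

print(constant_offset(b_deposited, b_start))          # 16.0
print(constant_offset([16.0, 13.5], [32.45, 28.10]))  # None
```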

1DF9 and 2QID

Per Dr. Murthy, 1DF9 was replaced with a corrected file, 2QID, because of bad contacts that had been identified by Dr. Piet Gros et al. (see Janssen et al., 2007). The xyz coordinates and thermal factors of the proteins and inhibitor molecules of 1DF9 and 2QID are exactly the same. The R-factors (0.199), free R-factors (0.243), deviations of bonds from ideality (0.018) and numbers of reflections (41212) are exactly the same for 1DF9 and 2QID. However, 1DF9 contains 331 waters while 2QID contains 176 water molecules. The xyz coordinates and thermal factors of the common water molecules are identical. 1DF9 has 188 bad water-protein contacts (< 2.0 Ångström) and 19 extremely bad water-protein contacts (< 1.0 Ångström). (WHAT IF finds 536 bad water contacts in 1DF9. In several cases the waters essentially fall on top of protein atoms. In 2QID the drama is reduced, but 134 bumps remain, of which this time only one involves a water, and that is suspicious in the opposite direction.) 2QID has no bad water-protein contacts (< 2.0 Ångström). It appears that 2QID was produced by the removal of 155 obviously incorrect water molecules from 1DF9. The differences in the water molecules of 1DF9 compared to 2QID have no apparent experimental explanation. Furthermore, it is not possible that two different models, which differ by over 100 water molecules, show exactly the same fit of model to data, as indicated by the R-factors and free R-factors of 1DF9 and 2QID. (Both 1DF9 and 2QID are labeled as membrane proteins by WHAT IF's surface polarity check. Both also have very bad Ramachandran plots and very poor packing.) No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structures of 1DF9/2QID, or demonstrate that these were experimentally determined structures, were available for examination.
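Counts such as "188 bad water-protein contacts (< 2.0 Ångström)" are, in essence, counts of water-protein atom pairs closer than a distance cutoff. A minimal brute-force Python sketch, assuming plain Cartesian coordinates and ignoring crystal symmetry (a real validation program would also have to consider symmetry-related atoms); the function name and coordinates are hypothetical:

```python
import math


def count_close_pairs(waters, protein_atoms, cutoff):
    """Count water-protein atom pairs closer than `cutoff` Å (brute force,
    no crystal symmetry applied)."""
    return sum(
        1
        for w in waters
        for p in protein_atoms
        if math.dist(w, p) < cutoff
    )


# Hypothetical coordinates (Å).
protein_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
waters = [(0.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 10.0, 10.0)]

print(count_close_pairs(waters, protein_atoms, 2.0))  # 3 "bad" contacts
print(count_close_pairs(waters, protein_atoms, 1.0))  # 1 "extremely bad" contact
```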

1G40

1G40 was originally deposited in October 2000 with space group symmetry P212121. The reported unit cell had the following parameters: a = 65.3 Ångström, b = 115.4 Ångström and c = 121.9 Ångström, which were the values reported in the publication. This corresponds to a theoretical value of 45,170 possible reflections for this unit cell at a resolution of 2.2 Ångström, and it agrees with the reported number of reflections in the data set (39,322 reflections; 87% complete). In February 2007, the unit cell was inexplicably changed to a = 65.3 Ångström, b = 104.40 Ångström, c = 141.90 Ångström. This implies that the diffraction pattern, even if the symmetry were the same, would be very different. Furthermore, the number of reflections in the data set would have to increase. It is simply not believable that such a discrepancy would be the consequence of an honest mistake, i.e., typographical errors in the PDB submission, as claimed by the PI. Also, the distribution of B-factor values bears no relationship to solvent accessibility or crystal contacts, and 1G40 does not contain any water molecules in spite of the good data resolution limit (2.2 Ångström). Further examination of the structure reveals absurd crystal packing in one particular area (residues Leu36, Pro37, Gly38 and Tyr39), which accounts for 16 of the 19 bad contacts in the structure. (WHAT_CHECK counts hundreds of bumps.) This region cannot possibly be correct. It is noteworthy that the protein conformation in this region of the structure is the same in two other structures published by Murthy et al. (1Y8E and 1RID). The 1Y8E and 1RID structures were refined at 2.2 and 2.1 Ångström, respectively. However, due to different crystallographic symmetry, these regions of the molecule have no bad contacts in 1Y8E or 1RID. The B-factors for Pro37 and Tyr39 are very close to the average overall B-factor for the structure.
It is extremely unlikely that a model with such incorrect packing could be refined to an R-factor of 19.8% and an Rfree of 23.4% at 2.2 Ångström resolution. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure 1G40 or demonstrate that this was an experimentally determined structure were available for examination.
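The figure of 45,170 possible reflections can be reproduced with the usual estimate for the number of unique reflections: reciprocal-lattice points fill reciprocal space with a density equal to the unit-cell volume V, so a sphere of radius 1/d_min contains about (4/3)πV/d_min³ points, and dividing by the order of the Laue group (8 for the orthorhombic group mmm) gives the unique ones. A minimal Python sketch that ignores systematic absences; the function name is hypothetical:

```python
import math


def unique_reflections_orthorhombic(a, b, c, d_min, laue_order=8):
    """Approximate count of unique reflections to d_min (Å) for an orthorhombic
    cell (a, b, c in Å), ignoring systematic absences."""
    volume = a * b * c  # Å^3; valid because all cell angles are 90 degrees
    in_sphere = (4.0 / 3.0) * math.pi * volume / d_min ** 3
    return in_sphere / laue_order


n_original = unique_reflections_orthorhombic(65.3, 115.4, 121.9, 2.2)
n_revised = unique_reflections_orthorhombic(65.3, 104.40, 141.90, 2.2)

print(round(n_original))                     # 45170
print(f"{39322 / n_original:.0%} complete")  # 87% complete
print(n_revised > n_original)                # True: the revised cell is larger
```

The revised cell has a larger volume, so the same resolution limit would imply more possible reflections, in line with the argument above.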

WHAT_CHECK additionally detected

1G44

1G44 has a distribution of B-factor values that bears no relationship to solvent accessibility or crystal contacts. (WHAT_CHECK warns about this prominently.) Also, this structure has very low R-factors in spite of unrealistic intermolecular and intramolecular contacts and crystal packing; for example, there are 36 chemically impossible close contacts. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1G44, or demonstrate that this was an experimentally determined structure, were available for examination.

WHAT_CHECK additionally detected

1L6L

1L6L contains 2036 residues, 1011 waters and 64 BOG molecules. This is not the structure reported in Table 3, which supposedly contained 2366 residues, 1522 waters, and 67 BOG molecules. The discrepancy in waters can be explained if one assumes that the number of the 'last water', 1522, was carelessly reported instead of the total number of waters in the PDB file. However, the remaining discrepancies cannot be reconciled. Overall, the 1L6L entry contains 21,138 atoms, compared to 21,619 reported in the paper. Furthermore, the lattice packing of 1L6L has considerable gaps that are also hard to reconcile with the 2.3 Ångström resolution diffraction limit for this structure, and the 1L6L crystals exhibit a solvent content of 78% (Vm = 5.6). According to the Matthews Probability Calculator Server, the probability that this arrangement exists and diffracts to 2.3 Ångström resolution is 0.28% (see Kantardjieff and Rupp, Protein Science 2003 12:1865-1871). (We calculate a different Vm...) Finally, even if one assumes that 2OU1 and 1L6L represent extremely poor refinements, one cannot reconcile the fact that the PDB entries do not match the publication record. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1L6L or demonstrate that this was an experimentally determined structure were available for examination.
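The solvent contents and Vm values quoted in this and later sections follow the Matthews relation: Vm is the unit-cell volume divided by the total molecular weight of protein in the cell (Å³/Da), and the solvent fraction is approximately 1 − 1.23/Vm. A minimal Python sketch of those two formulas only (the probability estimate of the Matthews Probability Calculator is not reproduced); the function names and the example cell are hypothetical:

```python
def matthews_coefficient(cell_volume, molecules_per_cell, molecular_weight):
    """Vm in Å^3/Da: cell volume (Å^3) per dalton of protein in the cell."""
    return cell_volume / (molecules_per_cell * molecular_weight)


def solvent_fraction(vm):
    """Approximate solvent fraction from Vm (Matthews relation; the constant
    1.23 corresponds to a protein density of about 1.35 g/cm^3)."""
    return 1.0 - 1.23 / vm


# Hypothetical example: 1,000,000 Å^3 cell, 4 molecules of 50 kDa each.
vm = matthews_coefficient(1_000_000, 4, 50_000)
print(vm, f"{solvent_fraction(vm):.0%}")  # 5.0 75%

# Value quoted in the text: Vm = 5.6 corresponds to about 78% solvent.
print(f"{solvent_fraction(5.6):.0%}")     # 78%
```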

WHAT_CHECK additionally detected

2OU1

2OU1 is missing 269 atoms that should be in the file based on Table 3 in the published paper. In addition, 2OU1 has 863 residues and 558 waters, not the 869 residues and 761 water molecules reported in Table 3. (Almost 50 of the waters make no H-bonds at all and are thus highly improbable; in a real structure we would suspect that these waters had been modelled into Fourier ripples or some other type of density noise.) Thus, 2OU1 does not directly correspond to the coordinates reported in the Biochemistry paper. In addition, the packing of 2OU1 is highly asymmetric, which is not reflected in the B-factors of the 12 different chains. (WHAT_CHECK warns about unrealistically low φ,ψ differences between NCS-related chains.) In addition, the structure registers a 0% MolProbity clash score. (WHAT_CHECK warns about hundreds of clashes, which it calls bumps; the worst ones are either clashes between symmetry-related residues or clashes with water molecules.) These odd features of 2OU1 are not consistent with reported R/Rfree values of 18.7/21.9 at 2 Ångström resolution with data in the highest shell that exhibit an average I/σ(I) of 10.4. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 2OU1 or demonstrate that this was an experimentally determined structure were available for examination.
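A water without any hydrogen-bond partner can be found by checking whether any polar atom (protein N or O, or another water oxygen) lies within hydrogen-bonding distance. A minimal Python sketch, assuming a donor-acceptor window of 2.5 to 3.5 Ångström (a common rule of thumb, not necessarily WHAT_CHECK's exact criterion), no crystal symmetry, and hypothetical names and coordinates:

```python
import math


def waters_without_partner(waters, polar_protein_atoms, d_lo=2.5, d_hi=3.5):
    """Return indices of waters with no polar atom (protein N/O or another
    water) at a hydrogen-bonding distance between d_lo and d_hi Å."""
    lonely = []
    for i, w in enumerate(waters):
        partners = list(polar_protein_atoms) + [
            other for j, other in enumerate(waters) if j != i
        ]
        if not any(d_lo <= math.dist(w, p) <= d_hi for p in partners):
            lonely.append(i)
    return lonely


# Hypothetical coordinates (Å): one protein oxygen and three waters.
polar_protein_atoms = [(0.0, 0.0, 0.0)]
waters = [(2.8, 0.0, 0.0), (5.6, 0.0, 0.0), (20.0, 0.0, 0.0)]

print(waters_without_partner(waters, polar_protein_atoms))  # [2]
```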

WHAT_CHECK additionally detected

1RID

This structure exhibits poor geometry, improbable B factors, large solvent gaps in the crystal lattice, and an extremely high solvent content (~75%), features which are not consistent with a 2.1 Ångström structure. 1RID exhibits extremely poor stereochemistry but excellent refinement statistics. Analysis of 1RID's updated coordinates using the MolProbity website places the overall quality of the structure in the 0th percentile. This is because the r.m.s. deviations between ideal (Engh-Huber, from CNS) and observed parameters are extremely large and not consistent with the values reported in the published paper. Electron density, calculated using the deposited structure factors, is in excellent agreement with implausible or impossible structural features. Also, 1RID exhibits poor to impossible crystal packing, and the lattice shows planes of molecules with no reasonable crystal packing interactions in the 'a' direction of the 1RID lattice. For example, one of the closest contacts is A175 Glu to A54 Thr, but these are 4.6 Ångström apart. Furthermore, analysis of the structure factors reveals unreasonable sigma values, unreasonable inclusion of all low resolution terms out to 77 Ångström resolution, and an unknown origin of different Rfree flags in original and updated files. The revised data submitted for 1RID are different and no longer contain the low resolution terms. However, neither the original nor the revised data are the unmodified structure factors for 1RID as expected from the publication. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1RID, or demonstrate that this was an experimentally determined structure, were available for examination.
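The geometric r.m.s. deviations mentioned here are root-mean-square differences between observed values (for example bond lengths) and the corresponding target values from a restraint library such as Engh-Huber. A minimal Python sketch with hypothetical bond lengths and targets (not actual Engh-Huber values):

```python
import math


def rms_deviation(observed, ideal):
    """Root-mean-square deviation between observed and ideal values."""
    if len(observed) != len(ideal) or not observed:
        raise ValueError("need two non-empty, equally long lists")
    return math.sqrt(
        sum((o - i) ** 2 for o, i in zip(observed, ideal)) / len(observed)
    )


# Hypothetical bond lengths (Å) and their hypothetical target values.
observed_bonds = [1.50, 1.56, 1.23]
ideal_bonds = [1.52, 1.52, 1.23]

print(round(rms_deviation(observed_bonds, ideal_bonds), 3))  # 0.026
```

Recomputing such a value from the deposited model and comparing it with the value reported in the paper is the kind of consistency check described above.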

WHAT_CHECK additionally detected

1Y8E

1Y8E exhibits a number of unlikely or improbable features, including poor geometry, improbable B factors, large solvent gaps in the crystal lattice (at least 8 Ångström), and an extremely high solvent content (~77%; Vm = 5.07). (WHAT_CHECK even finds Vm = 5.47.) These features are not consistent with a 2.2 Ångström structure. To justify the structure of 1Y8E, PDB entries 1OCY and 1H6W were cited as proving that crystals with lattice gaps can diffract to high resolution. However, these examples differ from 1Y8E in several ways. 1OCY and 1H6W are viral fiber proteins with large disordered segments, which are part of one protein. In the case of 1Y8E, it is hypothesized that disordered suramin molecules (molecular weight (MW) ~1500) 'connect' the lattice. Since there is nothing to hold these molecules in place, such as a covalent peptide chain, this explanation cannot be accepted. In addition, solvent content and Vm values for 1OCY, 1H6W and 1EZX, which all contain disordered segments, were compared to 1Y8E using Bernhard Rupp's Matthews probability server. This analysis revealed the following solvent contents, Vm values and probabilities (P): 1OCY (64%, Vm = 3.44, P = 0.38), 1H6W (57%, Vm = 2.87, P = 0.98), 1EZX (50%, Vm = 2.5, P = 1.0), and 1Y8E (76%, Vm = 5.07, P = 0.008, assuming MW = 30,000). This one parameter calls into question the validity of 1Y8E. In addition, electron density, calculated using the deposited structure factors, is in excellent agreement with implausible or impossible structural features. Additional analysis of the structure factors reveals unreasonable sigma values, unreasonable inclusion of all low resolution terms out to 150 Ångström resolution, and an unknown origin of different Rfree flags in original and updated files. The revised data submitted for 1Y8E are different and no longer contain the extreme low resolution terms. However, the conclusion remains that neither the original nor the revised data are the unmodified structure factors for 1Y8E as expected from the publication.
No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 1Y8E or demonstrate that this was an experimentally determined structure were available for examination.
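Whether a structure-factor file contains very low resolution terms can be checked from the Miller indices and the cell: for an orthorhombic cell, 1/d² = h²/a² + k²/b² + l²/c². A minimal Python sketch, assuming orthorhombic symmetry, a hypothetical cell and reflection list, and an arbitrary 50 Ångström cutoff for "very low resolution"; a real check would also ask how many of the theoretically possible low-resolution reflections are present, since some of them are often hidden by the beamstop:

```python
import math


def d_spacing_orthorhombic(h, k, l, a, b, c):
    """Resolution d (Å) of reflection (h, k, l) in an orthorhombic cell."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def low_resolution_terms(hkls, cell, d_cutoff=50.0):
    """Return (hkl, d) for every reflection with d larger than d_cutoff."""
    a, b, c = cell
    flagged = []
    for hkl in hkls:
        d = d_spacing_orthorhombic(*hkl, a, b, c)
        if d > d_cutoff:
            flagged.append((hkl, round(d, 1)))
    return flagged


# Hypothetical cell (Å) and reflection list.
cell = (80.0, 100.0, 150.0)
hkls = [(0, 1, 1), (1, 1, 0), (1, 1, 1), (10, 10, 10)]

print(low_resolution_terms(hkls, cell))
# [((0, 1, 1), 83.2), ((1, 1, 0), 62.5), ((1, 1, 1), 57.7)]
```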

WHAT_CHECK additionally detected

2A01

2A01 exhibits a number of abnormalities that suggest the structure was not generated from actual diffraction data. These abnormalities include: almost perfect correspondence of the electron density to physically impossible features, viz. close contacts and poor geometry; abnormal B factor distribution that does not vary along the chain to reflect solvent exposure and packing, even in regions that are extremely exposed to the solvent; crystal packing that is characterized by very few intermolecular contacts and very high solvent content that is inconsistent with the resolution and B factor distribution of the published data; anomalies in the structure factor file, in particular the unreasonable sigma values that are indicative of data that have been computationally generated or manipulated. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 2A01, or demonstrate that this was an experimentally determined structure, were available for examination by the committee.
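A flat B-factor profile that ignores solvent exposure, as described for 2A01 (and for 1G40 and 1G44), can be quantified with two numbers: the spread of per-residue B factors and their correlation with per-residue solvent accessibility. A minimal Python sketch, assuming per-residue average B factors and relative accessibilities have already been computed elsewhere; all names and values are hypothetical:

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either list is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-residue data: relative accessibility (%) and mean B (Å^2).
accessibility = [5, 20, 60, 90, 10]
b_typical = [18.0, 25.0, 45.0, 60.0, 20.0]  # rises with exposure
b_flat = [30.1, 30.0, 30.2, 29.9, 30.0]     # barely varies at all

for label, b in (("typical", b_typical), ("flat", b_flat)):
    print(label, round(statistics.pstdev(b), 2), round(pearson(accessibility, b), 2))
```

For the flat profile the correlation itself says little, because tiny noise can correlate either way; the near-zero spread is the telling number.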

WHAT_CHECK additionally detected

2HR0

The coordinates for 2HR0 do not form a connected network of molecules in the crystal lattice. The diffraction data do not show the features that should arise from the presence of bulk solvent, whereas the molecular arrangement indicates that large regions are not occupied by protein molecules. The values for ksol and Bsol bulk solvent parameters in 2HR0 are far outside the normally accepted ranges for these parameters. It is also noteworthy that nowhere in the Methods section of his Nature paper is there any mention of non-standard bulk solvent corrections to the Fobs values.
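In the flat bulk-solvent model used by common refinement programs (for example CNS), the model amplitudes are computed from Fcalc + k_sol·exp(−B_sol·s²/4)·Fmask, where Fmask is calculated from a mask of the solvent region and s = 1/d. A minimal Python sketch that only evaluates the scale factor on Fmask, to show why the bulk solvent mainly affects low-resolution data; the k_sol and B_sol defaults are illustrative assumptions, not the 2HR0 values and not the accepted ranges referred to above:

```python
import math


def bulk_solvent_scale(d, k_sol=0.35, b_sol=46.0):
    """Flat bulk-solvent scale k_sol * exp(-B_sol * s^2 / 4), with s = 1/d (d in Å).
    The default k_sol (e/Å^3) and B_sol (Å^2) are illustrative assumptions."""
    s_squared = 1.0 / d ** 2
    return k_sol * math.exp(-b_sol * s_squared / 4.0)


for d in (20.0, 6.0, 3.0, 2.3):
    print(f"d = {d:4.1f} Å   scale = {bulk_solvent_scale(d):.3f}")
```

Because the factor falls off quickly with resolution, the signature of bulk solvent in real data is concentrated in the lowest-resolution shells.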

The B-factors of the model do not vary significantly throughout the molecule, even though long segments of the chain are almost completely exposed to solvent (Janssen et al., Nature 448:E1-E2, 2007; note Figure 2 of their communication). The Rfree and R distributions are exceptionally low at low resolution, and the difference between Rfree and R is unusually small for a structure refined at 2.3 Ångström resolution with an amplitude-based target function (Janssen et al., Nature, 9 Aug 2007; note Figure 1b of their communication). Dr. Murthy provided two responses to this allegation: (1) the Rfree and R distribution would be expected if X-ray terms in a restrained refinement were weighted more heavily than usual, and (2) overweighting the X-ray terms would reduce the R-value at the cost of some geometric distortion. Overweighting the X-ray terms can indeed reduce the R-value at the cost of geometric distortion; however, the reported errors in the bond lengths, bond angles and torsion angles all suggest that the geometry was sufficiently restrained during refinement. Furthermore, many of the unrealistic contacts in this structure are far worse than simple geometric distortion.
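R and Rfree are the usual crystallographic residual, R = Σ|Fobs − Fcalc| / ΣFobs, evaluated over the working reflections and over the test reflections excluded from refinement, respectively; the argument above concerns how these values and their difference behave. A minimal Python sketch, assuming Fcalc is already on the scale of Fobs (refinement programs also fit an overall scale and a bulk-solvent model) and using hypothetical amplitudes and free-R flags; binning by resolution, as in the Janssen et al. figure, is left out:

```python
def r_factor(pairs):
    """R = sum|Fobs - Fcalc| / sum Fobs over (Fobs, Fcalc) pairs."""
    return sum(abs(fo - fc) for fo, fc in pairs) / sum(fo for fo, _ in pairs)


def r_work_and_r_free(f_obs, f_calc, in_test_set):
    """Split reflections by their free-R flag and compute R and Rfree."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, in_test_set) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, in_test_set) if t]
    return r_factor(work), r_factor(test)


# Hypothetical amplitudes; Fcalc assumed already scaled to Fobs.
f_obs = [100.0, 200.0, 150.0, 80.0, 120.0]
f_calc = [90.0, 210.0, 140.0, 85.0, 100.0]
in_test_set = [False, False, False, True, True]

r, r_free = r_work_and_r_free(f_obs, f_calc, in_test_set)
print(f"R = {r:.3f}, Rfree = {r_free:.3f}, gap = {r_free - r:.3f}")
# R = 0.067, Rfree = 0.125, gap = 0.058
```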

There are 30 chemically impossible close contacts shorter than 2.2 Ångström. Despite the large number of physically impossible clashes, the deposited structure factors show remarkably good correspondence in these regions. Inspection of both the σA-weighted 2Fo-Fc and the Fo-Fc electron density maps revealed very well-defined electron densities in every region of bad contacts, with no negative peaks present in the Fo-Fc difference electron density map and with B-factors no higher than elsewhere in the structure. This strongly suggests that the deposited structure factors have been calculated from the structure and do not reflect experimental data. Finally, the range of values for σF is orders of magnitude too large, nearly as large as the range of the structure factor amplitudes themselves. Experimental ('real') σF values are derived from estimates of measurement uncertainties. For this reason, their values are limited and their range is a small fraction of the range of Fo. However, the range of Fo for 2HR0 is 0 < Fo < 14,215, while the range of σF is 0 < σF < 9948. This range for σF is completely unrealistic. No raw crystallographic data, data reduction output, or any other experimental records that would support the correctness of the structure of 2HR0, or demonstrate that this was an experimentally determined structure, were available for examination.
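The σF argument compares the spread of the σF values with the spread of the amplitudes themselves. A minimal Python sketch that uses only the two range limits quoted above for 2HR0; a real check would read every Fo and σF from the deposited structure-factor file, and the function name is hypothetical:

```python
def sigma_to_amplitude_range_ratio(f_obs, sigmas):
    """Ratio of the range of sigma(F) to the range of F(obs)."""
    return (max(sigmas) - min(sigmas)) / (max(f_obs) - min(f_obs))


# Range limits quoted for 2HR0: 0 < Fo < 14,215 and 0 < sigma(F) < 9948.
ratio = sigma_to_amplitude_range_ratio([0.0, 14215.0], [0.0, 9948.0])
print(f"{ratio:.2f}")  # 0.70
```

For measured data, where σF reflects measurement uncertainty, this ratio would be expected to be a small fraction.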

WHAT_CHECK additionally detected