Validation: Intro

Where experiments are done, experimental errors are being made.

This rather trivial statement of the obvious may look silly, but, unfortunately, our experience is that many people believe it does not apply to the protein structures that have been deposited in the PDB. Crystallographers in particular often 'simply assume things'. As a result, findings that surprise many people, such as the fact that at least 10% of all metal ions in the PDB are a different ion than the one deposited, are actually a water, or are simply not there at all, or the existence of tryptophans whose five- and six-membered rings are at a 90° angle to each other instead of coplanar, come as no surprise to the few people who are involved in validating and improving PDB files.
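The tryptophan example is easy to test for, because the indole ring system of tryptophan is planar: the angle between the plane of its five-membered ring and that of its six-membered ring should be close to 0°. The sketch below is a minimal illustration under stated assumptions, not the check used by any particular validation program. It assumes each ring plane can be defined by three of its atoms (for example CG, CD1 and NE1 for the five-membered ring, and CZ2, CH2 and CZ3 for the six-membered ring); a real check would rather fit a least-squares plane through all ring atoms. The function names are hypothetical and the coordinates are synthetic, not taken from any PDB entry.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_normal(p1: Vec, p2: Vec, p3: Vec) -> Vec:
    """Unit normal of the plane through three atoms (they must not be collinear)."""
    n = cross(sub(p2, p1), sub(p3, p1))
    length = math.sqrt(dot(n, n))
    return (n[0] / length, n[1] / length, n[2] / length)


def ring_angle(ring_a: list[Vec], ring_b: list[Vec]) -> float:
    """Angle in degrees between two ring planes, each defined by its first three atoms.

    Simplification (assumption): three atoms per ring instead of a least-squares fit.
    0 means coplanar, 90 means perpendicular.
    """
    na = plane_normal(*ring_a[:3])
    nb = plane_normal(*ring_b[:3])
    # The sign of a normal is arbitrary, so take abs() to fold the angle into 0..90;
    # min() guards acos against rounding slightly above 1.0.
    cos_angle = min(1.0, abs(dot(na, nb)))
    return math.degrees(math.acos(cos_angle))


# Synthetic coordinates (Ångström), made up for this example.
five_ring = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (0.7, 1.2, 0.0)]
six_flat = [(2.0, 0.0, 0.0), (3.4, 0.0, 0.0), (2.7, 1.2, 0.0)]
six_twisted = [(2.0, 0.0, 0.0), (3.4, 0.0, 0.0), (2.7, 0.0, 1.2)]
print(round(ring_angle(five_ring, six_flat), 1))     # 0.0, coplanar as expected
print(round(ring_angle(five_ring, six_twisted), 1))  # 90.0, the kind of error described above
```

For a real tryptophan, an angle anywhere near 90° would point to exactly the kind of geometry error mentioned in the paragraph above.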

Under Pointers, you find a pointer to PDBREPORT. This is a database that lists about thirty million errors and deviations of 4σ or more observed in nearly one hundred thousand PDB files. That sounds rather dramatic. However, most of these errors and anomalies are relatively minor, such as 5σ bond-length deviations, or pairs of atoms that are 0.3 Ångström closer to each other than quantum chemistry allows. Nevertheless, the PDBREPORT database probably lists a few hundred thousand errors that are worth knowing about when you work with protein structures in fields such as human genetics, protein engineering, or drug design.
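A "4σ deviation" is a Z-score: the difference between an observed value and its expected value, divided by the standard deviation (σ) of that expected value. The minimal sketch below shows that arithmetic for a bond length, together with the simple distance test behind a pair of atoms that sit "too close". The target length, its σ, the minimum allowed distance, and all names (`Restraint`, `z_score`, `flag_bond`, `bump_overlap`) are hypothetical placeholders chosen for easy arithmetic; they are not values or functions taken from PDBREPORT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restraint:
    """Expected value for a bond length (hypothetical placeholder numbers)."""
    ideal: float  # expected bond length in Ångström
    sigma: float  # standard deviation of the expected bond length, in Ångström


def z_score(observed: float, restraint: Restraint) -> float:
    """Signed deviation of an observed value from its target, in units of sigma."""
    return (observed - restraint.ideal) / restraint.sigma


def flag_bond(observed: float, restraint: Restraint, threshold: float = 4.0) -> bool:
    """True when the deviation is at least `threshold` sigma, too long or too short."""
    return abs(z_score(observed, restraint)) >= threshold


def bump_overlap(distance: float, min_allowed: float) -> float:
    """How many Ångström two non-bonded atoms are closer than allowed (0.0 if not)."""
    return max(0.0, min_allowed - distance)


# Placeholder target: 1.33 Å with sigma 0.02 Å (illustrative only).
bond = Restraint(ideal=1.33, sigma=0.02)
print(round(z_score(1.43, bond), 2))     # 5.0, i.e. a "5 sigma" deviation
print(flag_bond(1.43, bond))             # True
print(flag_bond(1.36, bond))             # False (about 1.5 sigma)
print(round(bump_overlap(2.7, 3.0), 2))  # 0.3 Å too close, for an assumed 3.0 Å minimum
```

The sign of the Z-score tells whether a bond is too long or too short; the 4σ cutoff in this sketch is applied to its absolute value.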

Unfortunately, the PDB contains a few hundred entries that are absolutely wrong, and a couple of dozen entries that are meaningless. I have collected a series of crazy examples, more or less at random, in the PDBAD collection. The problems listed there have all been reported to the PDB staff, and they do what they can, but they are not allowed to remove crap from their database without the depositor's permission, nor are they allowed to make significant modifications to PDB entries. We are not bound by such restrictions, and therefore PDB_REDO now holds about 25% fewer errors than the original PDB.