Rejection criteria

Lists know two types of PDB-file rejection criteria: 1) very general criteria related to PDB file suitability for bioinformatics applications and 2) Lists-specific criteria like cysteine bridges in a file without cysteines, ion contacts in a file without ions, etc. Rejection for any criterion leads to the production of a so-called WHY_NOT file.

Lists and WHY_NOT

The main purpose of the Lists section of the PDB facilities is to provide rapid access to computationally derived protein structure data to protein structure bioinformaticians. A large number of Lists-entries (like entries that do not hold any protein...) would make no sense. Further, many files cannot be produced at all. The WHY_NOT system explains in all these cases why Lists-entries don't exist. Some very general WHY_NOT information is available in the WHY_NOT system. Additionally, if files cannot be made, like ligand contacts for a file without ligands, secondary structure for a file that holds DNA only, cis prolines for a file without prolines, etc., there will be a Lists-entry called xxxx.yyy.whynot, in which xxxx is the PDB identifier and yyy is the three-letter code of the List. So, the absence of ions in 1crn leads to a WHY_NOT file with the filename 1crn.ion.whynot:

COMMENT: No metal ions found
WHATIF_PDB_ion,1crn
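
The naming convention and the example file above can be sketched in a few lines of Python. This is only an illustration, not part of the Lists software: the names WhyNot, whynot_filename and parse_whynot are hypothetical, and the sketch assumes that the reason is always given on a line starting with "COMMENT:", as in the 1crn example.

```python
from typing import NamedTuple


class WhyNot(NamedTuple):
    """Hypothetical container for the information in one WHY_NOT file."""
    pdb_id: str     # the xxxx part of xxxx.yyy.whynot
    list_code: str  # the yyy part, the three-letter code of the List
    comment: str    # text after "COMMENT:" (assumed layout)


def whynot_filename(pdb_id: str, list_code: str) -> str:
    """Build the WHY_NOT filename for a PDB identifier and a List code."""
    return f"{pdb_id}.{list_code}.whynot"


def parse_whynot(filename: str, text: str) -> WhyNot:
    """Split the filename into its parts and pick up the first COMMENT line."""
    parts = filename.split(".")
    if len(parts) != 3 or parts[2] != "whynot":
        raise ValueError(f"not a xxxx.yyy.whynot filename: {filename!r}")
    pdb_id, list_code, _ = parts
    comment = ""
    for line in text.splitlines():
        if line.startswith("COMMENT:"):
            comment = line[len("COMMENT:"):].strip()
            break
    return WhyNot(pdb_id, list_code, comment)


example = "COMMENT: No metal ions found\nWHATIF_PDB_ion,1crn\n"
entry = parse_whynot(whynot_filename("1crn", "ion"), example)
print(entry.pdb_id, entry.list_code, entry.comment)
```

The second line of the example file (WHATIF_PDB_ion,1crn) is deliberately left unparsed, because its exact meaning is not described here.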

PDB file acceptance/rejection criteria

Files can be rejected for a large number of reasons, some of which are trivial, like the PDB file for Vancomycin that holds just a funny molecule, but no protein or nucleic acid. A few hundred files have been manually rejected over the years for a variety of reasons: the structure seems fully incorrect, the structure was determined under extreme conditions, the file contains just a small peptide, the structure contains so many ligands or ions that the protein fold is most likely non-physiological, the structure has been solved using some experimental refinement technique, or the structure was solved at a very low resolution. A few files are rejected because they contain problems WHAT IF cannot cope with. The list of manually rejected PDB entries is available.

Even when a file passes these filters, it is not guaranteed that it makes it through. Files can be rejected on the fly for many reasons:

Completeness

Each rejection that is not a manual rejection (as listed in this file) is indicated by a *.whynot file. So for each PDB file that enters the Lists software, either a results file or a *.whynot file comes out.

We maintain an internal system that checks the whole Lists directory tree for completeness. Honesty dictates that we should tell you that if somewhere a handful of files is missing, we will not address that problem till the next quarterly complete overhaul.
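
The idea behind such a completeness check can be sketched as follows. This is not the internal system itself, only an illustration under stated assumptions: the function name check_completeness is hypothetical, the directory is represented as a flat list of filenames, and any file named xxxx.yyy.something other than xxxx.yyy.whynot is assumed to be a results file (the real results filenames are not described here).

```python
from typing import Iterable, List


def check_completeness(pdb_ids: Iterable[str], list_code: str,
                       filenames: Iterable[str]) -> List[str]:
    """Return the PDB identifiers that have neither a results file
    nor a *.whynot file for the given List code."""
    names = set(filenames)
    missing = []
    for pdb_id in pdb_ids:
        prefix = f"{pdb_id}.{list_code}."
        has_whynot = (prefix + "whynot") in names
        # Assumption: every other file with this prefix is a results file.
        has_result = any(
            name.startswith(prefix) and not name.endswith(".whynot")
            for name in names
        )
        if not (has_whynot or has_result):
            missing.append(pdb_id)
    return missing


# Hypothetical directory listing; the ".txt" extension is made up.
listing = ["1crn.ion.whynot", "1abc.ion.txt"]
print(check_completeness(["1crn", "1abc", "2xyz"], "ion", listing))
```

The identifiers passed in should be those of files that actually entered the Lists software; manually rejected entries do not get a *.whynot file and would otherwise be reported as missing.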

In March 2018 the following texts were possible for *.WHYNOT files:

If at some time we decide to add a few more of these WHYNOT reasons, it is likely that we will forget to update this part of the documentation... But you get the drift...