Material linked from bioinformatics course


The crystallographic residual, the so-called R-factor, measures the agreement between the structure model and the experimental data and therefore seems a more direct indicator of model quality than the X-ray resolution alone. Unfortunately, acceptable and even seemingly good R-factors can be obtained by adding more parameters to the structure model, which effectively over-fits (over-refines) the model (Branden and Jones 1990). This problem was largely addressed by the introduction of the free R-factor (Brünger 1992). The free R-factor is much more robust against over-fitting because it is calculated only from a small test set of reflections (typically 5 to 10% of the X-ray data) that is excluded from refinement. R-free can therefore be seen as a measure of how well the structure model predicts independent measurements.
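The difference between the two residuals can be made concrete with a minimal Python sketch. It evaluates the standard R-factor, R = Σ||F_obs| − |F_calc|| / Σ|F_obs|, separately on the working set (often called R-work) and on the test set flagged for R-free. The sketch assumes that the calculated amplitudes are already on the same scale as the observed ones. Real refinement programs first fit a scale factor and usually a bulk-solvent model. The function names and the boolean free-flag list are illustrative and are not taken from any refinement package.

```python
from typing import Sequence, Tuple

def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) over the given reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def r_work_and_r_free(
    f_obs: Sequence[float], f_calc: Sequence[float], free_flags: Sequence[bool]
) -> Tuple[float, float]:
    """Split reflections by their free flag and compute R on each subset.
    Reflections flagged True form the test set that refinement never uses."""
    work_obs, work_calc, test_obs, test_calc = [], [], [], []
    for fo, fc, is_free in zip(f_obs, f_calc, free_flags):
        if is_free:
            test_obs.append(fo)
            test_calc.append(fc)
        else:
            work_obs.append(fo)
            work_calc.append(fc)
    return r_factor(work_obs, work_calc), r_factor(test_obs, test_calc)

if __name__ == "__main__":
    # Toy amplitudes, assumed to be on a common scale already.
    f_obs = [100.0, 80.0, 60.0, 40.0, 20.0]
    f_calc = [90.0, 85.0, 60.0, 30.0, 25.0]
    free = [False, False, False, True, True]
    r_work, r_free = r_work_and_r_free(f_obs, f_calc, free)
    print(f"R-work = {r_work:.4f}, R-free = {r_free:.4f}")  # R-work = 0.0625, R-free = 0.2500
```

In this toy example, the model fits the working reflections closely but predicts the held-out reflections much less well. A large gap between R and R-free like this is a typical warning sign of over-fitting.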

So, with the X-ray resolution, the (free) R-factor, and the real-space R-factor, one can select a suitable template from the (PSI-)BLAST results, that is, a template that agrees well with the X-ray experiment. A more in-depth analysis of the template structure is needed to see whether it also agrees with our current knowledge of protein structures. Structure validation scores such as the Ramachandran Z-score (Hooft et al. 1997), the fraction of Ramachandran plot outliers (Laskowski et al. 1993), side-chain rotamer normality scores (Hooft et al. 1996a), residue packing scores (Vriend and Sander 1993), hydrogen-bond network quality (Hooft et al. 1996b), and many others give insight into the geometric quality of the template structure. Most of these validation scores, both global and local (per-residue), can be obtained via the PDB and its linked databanks.
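The selection step can be sketched as a simple filter-then-rank procedure. The Python below is a minimal illustration. It assumes that the per-entry scores have already been collected, for example from the PDB and its linked validation databanks, into a hypothetical `Candidate` record. All cut-off values and the ranking order are assumptions chosen for illustration, not recommendations from this text. The ranking sorts by sequence identity first, then by resolution, then by R-free. The entry identifiers are placeholders, not real PDB codes.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    """Hypothetical record of one (PSI-)BLAST hit and its quality scores."""
    entry_id: str
    seq_identity: float        # fraction of identical aligned residues, 0..1
    resolution: float          # X-ray resolution in Angstrom (lower is better)
    r_free: Optional[float]    # None if no free R-factor was reported
    rama_z: float              # Ramachandran Z-score (higher is better)
    rama_outliers: float       # fraction of Ramachandran plot outliers

# Illustrative cut-offs only (assumptions, not values from the text).
MAX_RESOLUTION = 2.5
MAX_R_FREE = 0.30
MIN_RAMA_Z = -4.0
MAX_RAMA_OUTLIERS = 0.01

def passes_filters(c: Candidate) -> bool:
    """Keep only entries that agree with their X-ray data and have sane geometry."""
    if c.resolution > MAX_RESOLUTION:
        return False
    if c.r_free is None or c.r_free > MAX_R_FREE:
        return False
    return c.rama_z >= MIN_RAMA_Z and c.rama_outliers <= MAX_RAMA_OUTLIERS

def rank_templates(candidates: List[Candidate]) -> List[Candidate]:
    """Filter, then sort by identity (desc), resolution (asc), R-free (asc)."""
    kept = [c for c in candidates if passes_filters(c)]
    return sorted(kept, key=lambda c: (-c.seq_identity, c.resolution, c.r_free))

if __name__ == "__main__":
    hits = [
        Candidate("templA", 0.45, 2.0, 0.24, -1.0, 0.002),
        Candidate("templB", 0.45, 1.6, 0.21, -0.5, 0.0),
        Candidate("templC", 0.60, 3.1, 0.28, -2.0, 0.005),  # fails resolution cut-off
        Candidate("templD", 0.30, 1.8, None, 0.5, 0.0),     # no R-free reported
    ]
    print([c.entry_id for c in rank_templates(hits)])  # ['templB', 'templA']
```

Hard filters followed by a lexicographic sort keep the logic transparent. A weighted combined score would be an alternative design, but its weights would be yet another set of assumptions.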

Another possible step in template selection is to optimize the template before the actual modelling. We have recently shown that validation scores such as the Ramachandran Z-score and the number of atomic clashes (bumps) can be improved by a fully automated re-refinement of the PDB entry against its original experimental data (PDB_REDO; Joosten et al. 2009). In addition, the re-refinement improves the crystallographic R-factor, or more precisely the free R-factor. This optimization is particularly useful for templates that will be used in drug-docking studies, because the success of such studies often depends critically on the quality of the atomic model. The benefit of re-refinement is closely tied to the sequence identity between the template and the model sequence: any improvement in the atomic coordinates of a residue is lost when that residue (or just its side chain) has to be rebuilt. Fortunately, even at low sequence identity, there may be regions of the template that are not changed in the modelling process and can therefore still benefit from re-refinement.
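The link between sequence identity and the benefit of re-refinement can be illustrated by walking through a pairwise alignment. This Python sketch uses a simplified rule, which is an assumption and not a fixed rule of any modelling program:

- Identical residues keep all their re-refined coordinates.
- Substituted residues keep only their backbone, because their side chain is rebuilt.
- Template residues aligned to a gap in the target are not used at all.

Real modelling programs may also rebuild loops around insertions and deletions, so the counts are a rough estimate rather than a prediction. The function name and category labels are hypothetical.

```python
from collections import Counter

def retained_refinement_benefit(template_aln: str, target_aln: str) -> Counter:
    """Classify aligned template residues under a simplified assumption:
      'full'          - identical residue: backbone and side chain kept
      'backbone_only' - substituted residue: side chain rebuilt, backbone kept
      'unused'        - template residue opposite a gap in the target
    Columns with a gap in the template have no template coordinates and are skipped."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for t, q in zip(template_aln, target_aln):
        if t == "-":
            continue  # insertion in the target: nothing to inherit
        if q == "-":
            counts["unused"] += 1
        elif t == q:
            counts["full"] += 1
        else:
            counts["backbone_only"] += 1
    return counts

if __name__ == "__main__":
    counts = retained_refinement_benefit("ACDEFGHIK", "ACDQF--IK")
    print(counts["full"], counts["backbone_only"], counts["unused"])  # 6 1 2
```

As the sequence identity drops, residues move from the "full" category to "backbone_only", and more of the side-chain improvements from re-refinement are lost.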

Of course, when sufficient CPU time is available to the modeller, it may be beneficial to use a number of (re-refined) PDB entries as templates, instead of a single one.