The CMBI provides a series of facilities for protein structure bioinformatics. This page provides a brief summary of these facilities and gives pointers to the pages that hold the information (and often also the data, the software, etc.) related to these facilities.
Most of these facilities have been described in:
A series of PDB related databases for everyday needs.
Wouter G Touw, Coos Baakman, Jon Black, Tim AH te Beek, E Krieger, Robbie P Joosten, Gert Vriend.
Nucleic Acids Research 2015 January; 43(Database issue): D364-D368.
The WHAT IF software package is a powerful, albeit hard-to-use, free tool for a wide variety of protein structure calculations and visualization. The more than 1,000,000 lines of code that make up WHAT IF allow you to predict mutations, find errors in structures, superpose structures, determine cavities, calculate all normal things such as H-bonds, accessibilities, salt-bridges, surfaces, interactions, etc. WHAT IF includes a series of proprietary databases that help you answer even complicated questions. The 500 page WHAT IF manual lists almost 2000 options that are systematically organized in topic-related chapters. WHAT IF provides interfaces to popular other (sometimes hard-to-use) packages such as GROMACS, DSSP, GRID, PLIM, Jmol, CONCOORD.
The WHAT_CHECK software is a subset of the WHAT IF software package, and it is also free. WHAT_CHECK can be used to determine the 'quality' of macromolecular structures. The input is a file in PDB format, and the output is a comprehensive report with hundreds of notes, warnings, and (unfortunately) also errors.
Protein structures in the PDB were determined experimentally. They thus contain experimental errors. The PDBREPORT databank lists millions and millions of errors and anomalies, collected in a WHAT_CHECK report for each PDB entry.
B-factors in crystallographic structure models are important for many applications, especially in protein engineering, explaining the results of diagnostic genetic testing, and some fundamental aspects of protein structure bioinformatics (e.g. Molecular Dynamics).
Unfortunately, B-factors can be given in several different formats in PDB entries, and it will not be easy for non-crystallographers to figure out which are the 'real' (in crystallographic terms 'full isotropic') B-factors. The BDB databank contains PDB entries with their B-factors consistently presented in full isotropic format.
BDB access, download options, and help
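One conversion that often comes up when B-factors are stored in different forms is the textbook relation between a mean-square displacement U and a B-factor, B = 8π²U, together with the equivalent isotropic value taken from the diagonal of an anisotropic tensor. The sketch below only illustrates that relation. It is not the procedure used to build the BDB, the function names are hypothetical, and the ANISOU values in the usage line are made up for illustration.

```python
import math


def b_from_u(u_iso):
    """Convert a mean-square displacement U (in square angstroms) to a B-factor."""
    return 8.0 * math.pi ** 2 * u_iso


def b_eq_from_anisou(u11, u22, u33):
    """Equivalent isotropic B-factor from the diagonal terms of an ANISOU record.

    ANISOU records store U values multiplied by 10^4, so they are scaled
    back to square angstroms before the conversion.
    """
    u_eq = (u11 + u22 + u33) / 3.0 / 1.0e4
    return b_from_u(u_eq)


# Illustrative (made-up) ANISOU diagonal terms for one atom.
b_eq = b_eq_from_anisou(2406, 1892, 1614)
print(f"Equivalent isotropic B-factor: {b_eq:.2f}")
```

Only the diagonal terms of the tensor enter the equivalent isotropic value; the off-diagonal terms describe the orientation of the displacement ellipsoid and do not change its trace.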
Many PDB entries are either old, or created with old software. So, in principle, it should be possible to optimise them if the crystallographic reflection data are available. We have done this in the PDB_REDO database. More than two-thirds of all crystallographic PDB entries have been improved by re-refinement and rebuilding with more modern software. You can use the WHY_NOT server to see if the entry you are interested in has also been redone. One warning though: the PDB_REDO database was produced 100% automatically, so there is a fair chance that here and there something has gone terribly wrong and remained unnoticed.
DSSP is probably the oldest software available in the entire field of protein structure bioinformatics. It determines the secondary structure given the three-dimensional coordinates of a protein. So, it does not predict secondary structure. The DSSP software is freely available, and the CMBI makes sure that a corresponding DSSP file exists for every PDB entry and every mmCIF entry soon after the release of that entry.
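As an illustration of what can be done with a DSSP file, the following minimal sketch pulls the per-residue secondary structure out of the residue table. It assumes the classic fixed-width DSSP layout (residue number in columns 6-10, chain in column 12, amino acid in column 14, secondary structure in column 17, counting from 1). The function name `parse_dssp` is hypothetical, and the sample lines are synthetic, built with format strings only to match those column positions.

```python
def parse_dssp(lines):
    """Return (chain, residue_number, amino_acid, secondary_structure) tuples.

    Assumes the classic fixed-width DSSP layout; the residue table starts
    after the header line that begins with '  #  RESIDUE'.
    """
    records = []
    in_residues = False
    for line in lines:
        if line.startswith("  #  RESIDUE"):
            in_residues = True
            continue
        if not in_residues or len(line) < 17:
            continue
        if line[13] == "!":  # chain-break marker, carries no residue
            continue
        resnum = int(line[5:10])
        chain = line[11]
        amino_acid = line[13]
        # A blank secondary-structure column means "no assigned element".
        sec_struct = line[16] if line[16] != " " else "-"
        records.append((chain, resnum, amino_acid, sec_struct))
    return records


# Synthetic lines that follow the assumed column layout.
sample = [
    "==== header text that precedes the residue table ====",
    "  #  RESIDUE AA STRUCTURE BP1 BP2  ACC",
    "{:5d}{:5d} {} {}  {}".format(1, 1, "A", "M", " "),
    "{:5d}{:5d} {} {}  {}".format(2, 2, "A", "K", "E"),
    "{:5d}".format(3) + " " * 8 + "!" + " " * 3,
    "{:5d}{:5d} {} {}  {}".format(4, 10, "A", "L", "H"),
]
records = parse_dssp(sample)
for record in records:
    print(record)
```

For real work, an established parser is likely a safer choice than hand-written column slicing, because it also deals with insertion codes and other details that this sketch ignores.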
There is much information to be gained from a structure, but even more from the combination of structure and a multiple sequence alignment (MSA). HSSP files provide MSAs for all PDB entries. They are a bit cumbersome to read, but good software exists to do that for you.
Reading PDB file headers is very cumbersome. That is mainly because the PDB was designed without ontology, without schema, and without a view for the future. Additionally, most information in the PDB file headers is entered by hand without validation by software. The PDBFINDER is our favorite way to deal with such problems. The PDBFINDER software has a large series of modules that each take care of one typical PDB file annotation problem. Additionally, the PDBFINDER contains data such as missing R-factors, etc., that we looked up by hand.
The PDB can hold hundreds of copies of nearly the same molecule. That is not bad, because the hundreds of lysozyme mutant structures, for example, teach us a lot about protein structures and stability. Bioinformaticians, however, often want to work with a representative dataset. It wouldn't be wise to train a computational method on a dataset in which more than ten percent of the structures are lysozymes, because with that dataset the method doesn't tell anything about proteins in general; it mostly shows what lysozyme looks like. Here PDB_SELECT comes in. PDB_SELECT holds representative datasets of sequence-unique PDB files that meet a chosen R-factor and resolution threshold.
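The idea of a representative dataset can be sketched as follows: drop entries that fail the quality cutoffs, then keep one entry per group of near-identical sequences. This is only a toy illustration and not the PDB_SELECT algorithm. The function name, the `cluster` labels, the identifiers, and all numbers are hypothetical, and the grouping of sequences into clusters is assumed to have been done beforehand.

```python
def select_representatives(entries, max_resolution, max_r_factor):
    """Keep the best-resolved entry per sequence cluster that passes both cutoffs."""
    best = {}
    for entry in entries:
        # Discard entries that fail either quality cutoff.
        if entry["resolution"] > max_resolution or entry["r_factor"] > max_r_factor:
            continue
        current = best.get(entry["cluster"])
        # Lower resolution values mean better-resolved structures.
        if current is None or entry["resolution"] < current["resolution"]:
            best[entry["cluster"]] = entry
    return sorted(entry["id"] for entry in best.values())


# Made-up entries: three lysozyme-like structures and two other proteins.
entries = [
    {"id": "lysA", "cluster": "lysozyme", "resolution": 2.5, "r_factor": 0.22},
    {"id": "lysB", "cluster": "lysozyme", "resolution": 1.6, "r_factor": 0.18},
    {"id": "lysC", "cluster": "lysozyme", "resolution": 1.2, "r_factor": 0.35},
    {"id": "protC", "cluster": "other", "resolution": 1.9, "r_factor": 0.20},
    {"id": "protD", "cluster": "third", "resolution": 3.5, "r_factor": 0.25},
]
selected = select_representatives(entries, max_resolution=3.0, max_r_factor=0.25)
print(selected)
```

In this toy input the three lysozyme-like entries collapse to a single representative, which is the effect the paragraph above describes.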
Sometimes a PDB entry exists but the corresponding entry in the facilities described above does not. There can be several reasons for this. For instance, DSSP and HSSP files can only be made for structures that contain protein, and PDB_REDO only works for crystallographic PDB entries with deposited experimental data. Severe problems in PDB entries, either technical or scientific, may also stop us from providing the data for the facilities. Whatever the reason, you can use the WHY_NOT server to ask WHY a particular file does NOT exist.
Protein structure bioinformaticians often need some simple information like symmetry contacts, solvent accessibility, torsion angles, etc, for all files in the PDB. The Lists section of these facilities contains a series of directories that each contains one entry per PDB entry, and these entries contain those simple data types in easy-to-parse formats. These are not the results of highly sophisticated calculations but just simple lists like the accessible surface areas of residues, torsion angle tables, lists of contacts with ions, etc.
Information about the lists
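To give an impression of how simple such list data are, the sketch below computes one torsion angle from four atomic positions with the standard atan2 formulation. It is not the code that produces the lists; the function name and the coordinates are made up for illustration.

```python
import math


def dihedral(p1, p2, p3, p4):
    """Torsion angle in degrees defined by four points, in the range -180 to 180."""

    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        )

    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b3 = sub(p4, p3)
    n1 = cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = cross(b2, b3)  # normal of the plane through p2, p3, p4
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the fourth point is rotated a quarter turn around the
# central bond relative to the first point.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(f"Torsion angle: {angle:.1f} degrees")
```

Using atan2 rather than an arccosine of the angle between the two plane normals keeps the sign of the angle, which is what distinguishes, for example, left-handed from right-handed backbone conformations.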
YASARA scenes are available for some lists to provide a quick-and-easy visualization of the information in lists entries.
YASARA is the best macromolecular viewer, so we use it to look at structures, and we have created YASARA scenes that contain highlighted structural features in the best visualisation style and viewpoint. These scenes can be inspected by everybody with access to YASARA. In addition to the commercial version of YASARA, which comes with molecular modeling and simulation functionality, there is a freely available viewer called YASARA_View, and this free viewer can deal with all scenes that we prepared for you.
Strictly speaking, MRS is not specifically a protein structure bioinformatics tool. But as it is very useful for finding things in the protein structure world, we list it here anyway. MRS is a fast, and smart, data retrieval tool.