Homology modelling starts with an experimental structure, and that is immediately the first problem. Experimental structures are based on experimental data, and are thus prone to experimental errors. We work on structure validation, which means that we look for errors in structures. This has resulted in a database of experimental errors in PDB files, but more importantly, in a series of publicly accessible servers that can help you to remove some important errors from protein structures. For example, optimisation of the hydrogen bonding network is important for homology modelling and for drug docking.
Most of the structure work is done with the WHAT IF software. This software can literally do thousands of things. A few of the WHAT IF options are available as servers.
The YASARA project is a spin-off (by Elmar Krieger) from our homology modelling project.
NMR and X-ray are two very different techniques, and consequently, dealing with the results (and errors) of these techniques requires very different expertise. Our boss has worked for four years in NMR and for five years in X-ray, and therefore we now need people to work on NMR structure validation and on X-ray structure validation.
This validation work has resulted in the database called PDBREPORT that lists all errors and anomalies in the PDB files.
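To give a feeling for what looking for errors and anomalies in a PDB file can mean in practice, here is a minimal Python sketch. It is not how WHAT IF or PDBREPORT work; it is a toy illustration that assumes the standard fixed-column layout of PDB ATOM records. The names `Atom`, `atom_line`, `parse_atoms` and `find_anomalies` are hypothetical and invented for this example, and the two checks (occupancy outside the range 0 to 1, and missing backbone atoms) are only simple stand-ins for the many checks a real validation program performs.

```python
from dataclasses import dataclass

# Backbone atoms that every complete amino acid residue should contain.
BACKBONE = {"N", "CA", "C", "O"}


@dataclass(frozen=True)
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    occupancy: float
    b_factor: float


def atom_line(serial, name, res_name, chain, res_seq, x, y, z, occ, b):
    """Build one ATOM record in fixed-column PDB layout (helper for examples)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


def parse_atoms(pdb_text):
    """Read ATOM records by column position, as the PDB format prescribes."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        line = line.ljust(80)  # tolerate records with trailing blanks removed
        atoms.append(Atom(
            name=line[12:16].strip(),       # columns 13-16
            res_name=line[17:20].strip(),   # columns 18-20
            chain=line[21],                 # column 22
            res_seq=int(line[22:26]),       # columns 23-26
            occupancy=float(line[54:60]),   # columns 55-60
            b_factor=float(line[60:66]),    # columns 61-66
        ))
    return atoms


def find_anomalies(atoms):
    """Return human-readable descriptions of two simple kinds of anomaly."""
    problems = []
    residues = {}
    for a in atoms:
        key = (a.chain, a.res_seq, a.res_name)
        residues.setdefault(key, set()).add(a.name)
        if not 0.0 <= a.occupancy <= 1.0:
            problems.append(
                f"{a.res_name}{a.res_seq}/{a.name}: "
                f"occupancy {a.occupancy:.2f} outside [0, 1]")
    for (chain, seq, res), names in residues.items():
        missing = BACKBONE - names
        if missing:
            problems.append(
                f"{res}{seq} chain {chain}: "
                f"missing backbone atoms {sorted(missing)}")
    return problems


if __name__ == "__main__":
    # Made-up coordinates: an alanine without its O atom and with one
    # impossible occupancy value.
    records = [
        atom_line(1, "N", "ALA", "A", 1, 11.104, 6.134, -6.504, 1.00, 12.5),
        atom_line(2, "CA", "ALA", "A", 1, 11.639, 6.071, -5.147, 1.00, 12.5),
        atom_line(3, "C", "ALA", "A", 1, 13.100, 6.500, -5.000, 1.50, 12.5),
    ]
    for problem in find_anomalies(parse_atoms("\n".join(records))):
        print(problem)
```

Reading the fields by column position rather than by splitting on whitespace is deliberate: in PDB files neighbouring fields can touch each other, so whitespace splitting can silently misread a record.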
We are not only figuring out what is wrong, we are also trying to do something about it. Chris Spronk started recalculating NMR structures that were deposited in the PDB. This generally caused a major improvement, because over time we learned a lot about what proteins should look like and we got considerably better software. Robbie Joosten later started doing the same with all PDB files that were solved with X-ray. The recalculated X-ray files are stored in a database that we wanted to call Protein Data Better. That name abbreviates rather clumsily, so in the end we called it PDB_REDO.
To understand a protein, we need three things:
Unfortunately, the third category, all known facts, is hard to get at. Data is either dispersed over thousands of databases, or carefully hidden in the literature. As there is so much data in so many databases, we need software to automatically collect, store, and disseminate it. The MCSIS projects aim at sorting and presenting data, but we also work on much more dedicated search engines.
It is one thing to collect data, and a second thing to store it. But it is a real challenge to store data in such a way that you can find it again and do things with it. For this purpose we work on Molecular Class Specific Information Systems (MCSIS). A good information system must provide four important basic functions:
Query and retrieval sometimes involves searching in very many, very big databases. We have designed MRS for that purpose.
Browsing can include massive numbers of WWW pages, as in the GPCRDB or the NucleaRDB, and of course molecular visualisation with WHAT IF and YASARA. The WHAT IF software can write JMOL pages to further help with the dissemination of molecular data.