The software efforts will consist of five main tasks and a series of small activities. The main tasks will be:
- Task 1: Make WHAT IF options interactively available;
- Task 2: Put the HOPE molecule-specific data collection software on the SSP;
- Task 3: Make a connection to the PMP homology modelling portal (with SAC member T Schwede);
- Task 4: Make YASARA View available and usable for SSP users (with WP5);
- Task 5: Integrate MRS in the SSP (with WP2).
In principle, the responsible researchers can work on these five tasks in parallel, although the
actual integration in the portal can start experimentally around month 6 and in production mode in
month 13. All tasks include ensuring that the software can communicate through the SOAP protocol
using XML that complies with the commonly agreed-on ontologies and standards. Partner CMBI is
involved in the SeqAhead COST action, which brings together a large consortium of European
bioinformaticians who will coordinate these activities for sequence-related software and
databases. The SeqAhead recommendations will be followed in the NewProt project. The EDAM
ontology will be adopted for protein structure data and for computational methods.
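To illustrate the kind of message exchange meant here, the following is a minimal Python sketch of wrapping a service call in a SOAP 1.1 envelope. The operation name, the parameter name, and the service namespace are hypothetical placeholders; they are not taken from WHAT IF, HOPE, or MRS, and real messages would use element names that follow the agreed ontologies.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/ssp/services"  # hypothetical service namespace


def build_soap_request(operation: str, params: dict[str, str]) -> bytes:
    """Wrap an operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The operation element and its children live in the service namespace.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Hypothetical operation and parameter names, for illustration only.
message = build_soap_request("ValidateStructure", {"pdb_id": "1CRN"})
print(message.decode("utf-8"))
```

In practice the envelope would be generated from the service's WSDL by a SOAP client library rather than built by hand; the sketch only shows the structure of the message.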
Task 1: Make WHAT IF options interactively available
The WHAT IF software has for many years been the de facto standard in rational protein engineering
research, either directly or indirectly, as an integral part of, or supporting tool for, other
software such as FoldX (http://foldx.crg.es/). WHAT IF has been kept up to date
for the past decades, and recently it has been fully incorporated in the YASARA modelling
and visualisation software. WHAT IF was designed with a user interface that was state of the art
in 1987 but that today's scientists find hard to learn. The WHAT IF options relevant
for protein engineering will be made more accessible, for example, by collecting them in a
YASARA protein engineering menu, by building Web servers around them, or by making them available
as Web services. Many WHAT IF options will be made callable from the HOPE software (see below).
A large series of WHAT IF options will be made available through the main interactive workbench
of the SSP. These will for example include a series of WHAT_CHECK structure validation options,
crystal packing options, and mutability prediction options. Many of these scientifically
complicated options are the result of previous large, collaborative projects; WHAT_CHECK,
for example, was the result of a fifth Framework EU project. The past investments in these
scientific options add up to tens of person-years of work. A pilot project performed in the
framework of the sixth Framework EMBRACE NoE showed that making WHAT IF options fully
interoperable with other software is feasible within a reasonable time.
Task 2: Put the HOPE molecule-specific data collection software on the SSP
The HOPE software (www.cmbi.ru.nl/hope/) is a system designed to be used by medical
researchers to get a molecular explanation for the observed phenotypic effects of a
mutation in the human genome. The HOPE software was designed to explain a single
mutation at a time, and is too simple for protein engineering purposes. HOPE's
underlying software, which collects all kinds of data for each residue in a protein,
can however be reused to produce, for each protein of interest, a simple spreadsheet
with large amounts of elementary data for each residue, e.g. HSSP variability,
accessibility, rotameric freedom, crystal contacts, DNA/RNA contacts, ion contacts,
active site location, secondary structure, known variants, known variants in homologs,
underlying codons, codon conservation, location relative to splice sites, etcetera.
All this data is obtained using Web service calls to WHAT IF and SwissProt/UniProt,
and using DAS servers that were produced in the sixth Framework BioSapiens project.
The HOPE database will be fully integrated in the interactive workbench of the SSP.
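A per-residue spreadsheet of this kind could be sketched in Python as follows. The column names cover only a few of the properties listed above, and both the layout and the example values are assumptions made for illustration; they do not reproduce actual HOPE output or real measurements.

```python
import csv
import io

# Assumed column layout; the real spreadsheet would hold many more properties.
COLUMNS = [
    "residue_number",
    "residue_type",
    "variability",
    "accessibility",
    "secondary_structure",
]


def residue_table_to_csv(rows):
    """Serialise per-residue records (dicts) to CSV text, one row per residue."""
    buffer = io.StringIO()
    # restval="" leaves a cell empty when a property is unknown for a residue.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["residue_number"]):
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder records with made-up values, for illustration only.
example_rows = [
    {"residue_number": 2, "residue_type": "THR", "accessibility": 12.5},
    {"residue_number": 1, "residue_type": "MET", "variability": 3,
     "accessibility": 40.1, "secondary_structure": "H"},
]
print(residue_table_to_csv(example_rows))
```

Keeping one row per residue and one column per property makes it straightforward to add further properties as new data sources are connected.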
The HOPE software contains a decision tree module that employs a simple form of
artificial intelligence to analyse the possible phenotypic effect of point mutations
that have been found to be related to genetic disorders. An attempt will be made to adapt
this decision tree module in HOPE so that it can function as a supervisor system that
analyses the mutations that the SSP user, after using all other SSP facilities,
finally decides to make. This approach will obviously be limited in the kinds of
experiments to which it is applicable, but it will certainly be useful
for predicted point mutations. Further applicability needs to be studied.
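The idea of a decision tree acting as a supervisor for a proposed point mutation can be sketched as below. The features and rules are invented for illustration and are not HOPE's actual decision tree; they only show how per-residue data could be turned into a textual warning.

```python
from dataclasses import dataclass


@dataclass
class MutationContext:
    """Features of the mutated position; the field names are illustrative."""
    in_active_site: bool
    buried: bool
    size_change: int  # side-chain size, mutant minus wild type (arbitrary units)
    conserved: bool


def assess_point_mutation(ctx: MutationContext) -> str:
    """Walk a small hand-written decision tree and return a textual verdict."""
    if ctx.in_active_site:
        return "likely to affect function: residue is in the active site"
    if ctx.buried:
        if ctx.size_change > 0:
            return "risk of destabilisation: larger side chain in the protein core"
        if ctx.size_change < 0:
            return "risk of destabilisation: cavity created in the protein core"
    if ctx.conserved:
        return "caution: position is conserved among homologs"
    return "no obvious problem detected by these rules"


# Hypothetical proposed mutation: a buried residue replaced by a larger one.
print(assess_point_mutation(MutationContext(False, True, 2, False)))
```

A supervisor of this kind would report its verdict to the user before the mutation is made, leaving the final decision to the user.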
Task 3: Make a connection to the PMP homology modelling portal
Homology modelling is a key process in most protein engineering projects. We
will collaborate with the Protein Model Portal (PMP) group at the Biozentrum
in Basel that is headed by T Schwede. T Schwede will also be a member of the
NewProt advisory board to optimize this portal-portal collaboration (and to make
NewProt benefit from his extensive experience in protein modelling, portals, and
user interactions). The PMP will be used to obtain homology models for the NewProt
users. Obviously, NewProt users can go directly to the PMP, but by taking the route
through the NewProt portal a) the modeller remains anonymous, b) all administrative
matters such as storage of the model at the NewProt portal will be dealt with
automatically, and c) the user does not need to worry about these details.
The homology modelling procedure will include the possibility to run energy
minimisations and (very short) molecular dynamics simulations with the GROMACS
software. All scripts and files necessary to continue simulations for longer CPU
times on in-house computers will be made available to the users. The GROMACS interface
will be based on the WHAG software that is the result of a long-standing collaboration
between the CMBI and the Biozentrum (joint article in preparation).
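As an example of the kind of file that could be handed to users for continuing a run in-house, the sketch below generates a GROMACS parameter (.mdp) file for a steepest-descent energy minimisation. The option names (integrator, nsteps, emtol, emstep) are standard GROMACS .mdp options; the default values and the function itself are assumptions, and this is not the WHAG interface.

```python
def minimisation_mdp(nsteps: int = 5000, emtol: float = 1000.0, emstep: float = 0.01) -> str:
    """Return the text of a GROMACS .mdp file for steepest-descent minimisation."""
    options = {
        "integrator": "steep",  # steepest-descent energy minimisation
        "nsteps": nsteps,       # maximum number of minimisation steps
        "emtol": emtol,         # stop when the maximum force is below this (kJ mol^-1 nm^-1)
        "emstep": emstep,       # initial step size (nm)
    }
    return "".join(f"{key:<12}= {value}\n" for key, value in options.items())


# Print the file contents; a user would save this text and pass it to GROMACS.
print(minimisation_mdp(nsteps=2000))
```

Users who want longer runs on their own machines would only need to raise the step limit in the generated file.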
Task 4: Make YASARA View available and usable for SSP users (with WP5)
YASARA View scenes will be produced that allow users to map most data on the
structure for visual inspection. YASARA View can be obtained freely from www.yasara.com,
and the SSP will hold a link to this download site. YASARA View will be used as
the visualisation engine for nearly all SSP results that are, or map on, 3D structures.
The WHAT IF options that produce output that is suitable for 3D visualisation and that
produce results that cannot be obtained with YASARA View will need to produce output
that YASARA View can read, understand, and convert into visual effects. The operations
needed to achieve this interaction must remain totally hidden from the users.
Task 5: Integrate MRS in the SSP (with WP2)
The MRS data collection and database search engine will be used for all
keyword-driven database searches. This database query system is simple to use, easy to
integrate in other software, very fast and flexible. The CMBI MRS search engine
serves thousands of queries per day across almost 30 databases. This load can easily be
handled by a simple PC. MRS was designed with applications such as integration in
systems like an SSP in mind, and all required interoperability facilities are already
in place in the MRS software. MRS includes its own internal (re-engineered)
version of the well-known BLAST database query software. A facility will be added
to MRS' BLAST to flexibly limit the search to proteins from, for example, thermophilic
Validation and documentation
All tasks will be followed by validation (both software-wise in WP1, to validate the
technical aspects, and by mutation experiments in WP6, to validate the scientific aspects
of the products) and by extensive documentation (explanation, help facility, course material).
Additional software activities
The SSP will also hold a series of other software packages. These are packages that
occasionally might be of use to protein engineers, but will not be needed routinely.
These will not be fully integrated in the SSP (unless there is popular demand for
such integration), but will be made interactively executable and downloadable. The
CMBI will be responsible for these installations. Examples are:
- The WHAT_CHECK (very extensive) structure validation suite;
- The BioMeta database and search engine that, in due time, will provide the most
  likely metabolite docked 'by homology' in PDB files that hold the structure of a
  substrate or product analog. BioMeta also provides sub-structure search facilities
  for the ligands found in the PDB; sub-structures can be sketched using the JME software.