NewProt: Workpackage 3

NewProt: A European protein engineering project

WP 3: Software collection and integration

Objectives

Software relevant for protein engineering will be installed and made interoperable. This will mainly involve CMBI products, several open source packages, and YASARA View.

Introduction

The software efforts will consist of five main tasks and a series of small activities. The main tasks will be:

Task 1: Make WHAT IF options interactively available;
Task 2: Put the HOPE molecule-specific data collection software on the SSP;
Task 3: Make a connection to the PMP homology modelling portal (with SAC member T Schwede);
Task 4: Make YASARA View available and usable for SSP users (with WP5);
Task 5: Integrate MRS in the SSP (with WP2).

In principle, the responsible researchers can work on these five tasks in parallel, although the actual integration in the portal can start experimentally around month 6 and in production mode in month 13. All tasks include ensuring that the software can communicate through the SOAP protocol using XML that complies with the commonly agreed ontologies and standards. Partner CMBI is involved in the SeqAhead COST action, which brings together a large consortium of European bioinformaticians who will coordinate these activities for sequence-related software and databases. The SeqAhead recommendations will be followed in the NewProt project. The EDAM ontology will be adopted for protein structure data and for computational methods.
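As an illustration of the SOAP/XML communication mentioned above, a request envelope can be built and read back with the Python standard library alone. This is a minimal sketch and not the project's interface: the service namespace, the operation name `ValidateStructure`, and the parameter `pdb_id` are hypothetical, and no EDAM annotation is attempted.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/newprot/services"  # hypothetical service namespace


def build_soap_request(operation, params):
    """Return a SOAP 1.1 request envelope as UTF-8 encoded XML bytes."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    # The operation element and its parameters live in the service namespace.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def parse_soap_body(xml_bytes):
    """Return (operation name, parameter dict) from a SOAP envelope."""
    root = ET.fromstring(xml_bytes)
    body = root.find(f"{{{SOAP_ENV_NS}}}Body")
    call = list(body)[0]

    def local_name(tag):
        # Drop the "{namespace}" prefix that ElementTree puts on tags.
        return tag.split("}", 1)[-1]

    return local_name(call.tag), {local_name(c.tag): c.text for c in call}


# Hypothetical operation and parameter, for illustration only.
request = build_soap_request("ValidateStructure", {"pdb_id": "1crn"})
print(request.decode("utf-8"))
```

A real service would define its operations and message types in a WSDL file; the sketch only shows the envelope structure that every tool in the collection would have to produce and consume.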

Task 1: Make WHAT IF options interactively available

The WHAT IF software has for many years been the de facto standard in rational protein engineering research, either directly or indirectly, as an integral part of, or supporting tool for, other software such as FoldX (http://foldx.crg.es/). WHAT IF has been kept up to date for the past decades, and recently it has been fully incorporated in the YASARA modelling and visualisation software. WHAT IF was designed with a user interface that was state of the art in 1987 and that today's scientists consider hard to learn. The WHAT IF options relevant for protein engineering will be made more accessible, for example by collecting them in a YASARA protein engineering menu, by building Web servers around them, or by making them available as Web services. Many WHAT IF options will be made callable from the HOPE software (see below). A large series of WHAT IF options will be made available through the main interactive workbench of the SSP. These will include, for example, a series of WHAT_CHECK structure validation options, crystal packing options, and mutability prediction options. Many of these scientifically complicated options are the result of previous large, collaborative projects; WHAT_CHECK, for example, was the result of a Fifth Framework EU project. The past investments in these scientific options add up to tens of person-years of work. A pilot project performed in the framework of the Sixth Framework EMBRACE NoE showed that making WHAT IF options fully interoperable with other software is feasible in a reasonable time.

Task 2: Put the HOPE molecule-specific data collection software on the SSP

The HOPE software (www.cmbi.ru.nl/hope/) is a system designed to be used by medical researchers to get a molecular explanation for the observed phenotypic effects of a mutation in the human genome. The HOPE software was designed to explain one single mutation at a time, and is too simple for protein engineering purposes. HOPE's underlying software, which collects all kinds of data for each residue in a protein, can however be reused to produce, for each protein of interest, a simple spreadsheet with massive amounts of elementary data for each residue, e.g. HSSP variability, accessibility, rotameric freedom, crystal contacts, DNA/RNA contacts, ion contacts, active site location, secondary structure, known variants, known variants in homologs, underlying codons, codon conservation, location relative to splice sites, etcetera. All these data are obtained using Web service calls to WHAT IF and SwissProt/UniProt, and using DAS servers that were produced in the Sixth Framework BioSapiens project. The HOPE database will be fully integrated in the interactive workbench of the SSP.
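The per-residue spreadsheet described above could, for instance, be written as a CSV table with one row per residue. The sketch below only shows the table layout; the column names are hypothetical, the values are placeholders rather than real measurements, and the actual HOPE data collection (Web service and DAS calls) is not reproduced.

```python
import csv
import io

# Hypothetical column names, loosely following the data types listed above.
COLUMNS = ["residue_number", "residue_type", "hssp_variability",
           "accessibility", "secondary_structure", "in_active_site"]


def residues_to_csv(records):
    """Render per-residue dictionaries as CSV text, one row per residue.

    Missing values are written as empty cells, so partially annotated
    residues still fit in the table. Keys not in COLUMNS are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in sorted(records, key=lambda r: r["residue_number"]):
        writer.writerow(record)
    return buffer.getvalue()


# Placeholder values for illustration only; not real measurements.
example = [
    {"residue_number": 2, "residue_type": "THR", "accessibility": 12.5},
    {"residue_number": 1, "residue_type": "MET", "hssp_variability": 3,
     "accessibility": 40.0, "secondary_structure": "H",
     "in_active_site": False},
]
print(residues_to_csv(example))
```

Writing empty cells for missing values keeps the table rectangular, which matters when some data sources return nothing for a given residue.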

The HOPE software contains a decision tree module that employs a simple form of artificial intelligence to analyse the possible phenotypic effect of point mutations that have been found to be related to genetic disorders. An attempt will be made to adapt this decision tree module so that it can function as a supervisor system that analyses the mutations that the SSP user, after using all other SSP facilities, finally decides to make. This approach will obviously be limited in the range of experiments to which it is applicable, but it will certainly be useful for predicted point mutations. Further applicability needs to be studied.
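To make the idea of a decision tree acting as a supervisor concrete, a toy version is sketched below. The rules, the field names, and the burial cutoff are all illustrative assumptions; they are not HOPE's rules, which are not described here.

```python
from dataclasses import dataclass


@dataclass
class ResidueContext:
    """Per-residue facts a supervisor could consult (hypothetical fields)."""
    in_active_site: bool
    has_ion_contact: bool
    relative_accessibility: float  # 0.0 (buried) to 1.0 (fully exposed)
    conserved: bool


def review_point_mutation(ctx, buried_cutoff=0.2):
    """Walk a small hand-written decision tree and return a verdict string.

    The rules and the cutoff are illustrative assumptions, not HOPE's rules.
    """
    if ctx.in_active_site:
        return "warning: residue is in the active site"
    if ctx.has_ion_contact:
        return "warning: residue contacts an ion"
    if ctx.relative_accessibility < buried_cutoff:
        if ctx.conserved:
            return "warning: buried and conserved residue"
        return "caution: buried residue"
    return "no objection from these rules"


# A surface residue with no special role passes all checks.
surface = ResidueContext(in_active_site=False, has_ion_contact=False,
                         relative_accessibility=0.8, conserved=False)
print(review_point_mutation(surface))
```

Each branch tests one per-residue property, so the tree can only comment on single point mutations, which matches the limitation noted above.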

Task 3: Make a connection to the PMP homology modelling portal

Homology modelling is a key process in most protein engineering projects. We will collaborate with the Protein Model Portal (PMP) group at the Biozentrum in Basel, which is headed by T Schwede. T Schwede will also be a member of the NewProt advisory board to optimize this portal-portal collaboration (and to make NewProt benefit from his extensive experience in protein modelling, portals, and user interactions). The PMP will be used to obtain homology models for the NewProt users. NewProt users can of course go directly to the PMP, but by taking the route through the NewProt portal a) the modeller is anonymized, b) all administrative matters, such as storage of the model at the NewProt portal, are dealt with automatically, and c) the user does not need to worry about any of this.

The homology modelling procedure will include the possibility to run energy minimisations and (very short) molecular dynamics simulations with the GROMACS software. All scripts and files necessary to continue simulations for longer CPU times on in-house computers will be made available to the users. The GROMACS interface will be based on the WHAG software that is the result of a long-standing collaboration between the CMBI and the Biozentrum (joint article in preparation).
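As an example of the kind of file that could be handed to users for continuing a calculation in-house, the sketch below renders a minimal GROMACS parameter (.mdp) file for a steepest-descent energy minimisation. It is not the WHAG interface; the function name is hypothetical and the default values are placeholders that would need tuning per system.

```python
def render_em_mdp(nsteps=50000, emtol=1000.0, emstep=0.01):
    """Return the text of a minimal GROMACS .mdp file for steepest descent.

    emtol is the force tolerance in kJ mol^-1 nm^-1 and emstep the initial
    step size in nm; the default values are placeholders.
    """
    settings = {
        "integrator": "steep",  # steepest-descent minimiser
        "nsteps": nsteps,       # maximum number of minimisation steps
        "emtol": emtol,
        "emstep": emstep,
    }
    # Align the "=" signs so the file is easy to read and edit by hand.
    width = max(len(key) for key in settings)
    lines = [f"{key:<{width}} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


print(render_em_mdp(nsteps=100))
```

A user would save this text as, say, `em.mdp` and pass it to the GROMACS preprocessor together with a structure and topology; raising `nsteps` is then enough to continue for longer on a local machine.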

Task 4: Make YASARA View available and usable for SSP users (with WP5)

YASARA View scenes will be produced that allow users to map most data on the structure for visual inspection. YASARA View can be obtained freely from www.yasara.com, and the SSP will hold a mapping to this download site. YASARA View will be used as the visualisation engine for nearly all SSP results that are, or map on, 3D structures. The WHAT IF options that produce results that are suitable for 3D visualisation and that cannot be obtained with YASARA View itself will need to produce output that YASARA View can read, understand, and convert into visual effects. The operations needed to achieve this interaction must remain totally hidden from the users.

Task 5: Integrate MRS in the SSP (with WP2)

The MRS data collection and database search engine will be used for all keyword-driven database searches. This database query system is simple to use, easy to integrate in other software, very fast, and flexible. The CMBI MRS search engine handles thousands of queries per day across almost 30 databases, a load that a simple PC can easily cope with. MRS was designed with applications such as integration in an SSP in mind, and all required interoperability facilities are already in place in the MRS software. MRS includes its own internal (re-engineered) version of the well-known BLAST database query software. A facility will be added to MRS' BLAST to flexibly limit the search to proteins from, for example, thermophilic species.
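The effect of limiting a search to selected species can be illustrated by filtering a list of hits after the search. This is only a sketch of the idea and not the MRS facility, which is meant to restrict the search itself; the `Hit` record, the accession codes, and the E-values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One sequence-search hit (hypothetical record layout)."""
    accession: str
    organism: str
    e_value: float


def limit_hits_to_species(hits, allowed_organisms, max_e_value=1e-5):
    """Keep hits from the allowed organisms, best (lowest) E-value first."""
    allowed = {name.lower() for name in allowed_organisms}
    kept = [h for h in hits
            if h.organism.lower() in allowed and h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Placeholder accessions and E-values, for illustration only.
hits = [
    Hit("HYP0001", "Escherichia coli", 1e-50),
    Hit("HYP0002", "Thermus thermophilus", 1e-30),
    Hit("HYP0003", "Pyrococcus furiosus", 1e-40),
    Hit("HYP0004", "Thermus thermophilus", 1e-2),
]
thermophiles = ["Thermus thermophilus", "Pyrococcus furiosus"]
for hit in limit_hits_to_species(hits, thermophiles):
    print(hit.accession, hit.organism, hit.e_value)
```

Filtering inside the search engine, as planned for MRS, avoids scoring sequences that will be discarded anyway; the post-filter above gives the same result list at a higher cost.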

Validation and documentation

All tasks will be followed by validation (software-wise in WP1, to validate the technical aspects, and by mutation experiments in WP6, to validate the scientific aspects of the products) and by extensive documentation (explanation, help facility, course material).

Additional software activities

The SSP will also hold a series of other software packages. These are packages that might occasionally be of use to protein engineers, but will not be needed routinely. They will not be fully integrated in the SSP (unless there is popular demand for such integration), but will be made interactively executable and downloadable. The CMBI will be responsible for these installations. Examples are:

The WHAT_CHECK (very extensive) structure validation suite;

The BioMeta database and search engine that, in due time, will provide the most likely metabolite docked 'by homology' in PDB files that hold the structure of a substrate or product analog. BioMeta also provides sub-structure search facilities for the ligands found in the PDB; sub-structures can be sketched using the JME software.

The NewProt project is funded by the European Commission within its FP7 Programme, under the thematic area KBBE-2011-5 with contract number 289350.