Bioinformatics

EU name: H1N101

(Date: Aug 28 2017 H1N101 )

After completing this section you will:
Be able to describe what bioinformatics is,
Know where, why, and by whom bioinformatics techniques and tools are being used,
Understand the concept of "transfer of information", which you will use during the entire course to predict properties of newly discovered sequences.

Bioinformatics is the study of biological data using informatics tools. At least that is what the name suggests. However, in practice it is a bit different. Some topics in bioinformatics have nothing to do with biological material (for example, some aspects of drug design). There are parts of bioinformatics that require little knowledge of computers or informatics (for example, the analysis of a multiple sequence alignment). And there are fields that use computers and informatics to work on biological samples, but nevertheless are not considered to be bioinformatics (for example, electron microscopy, NMR, or X-ray crystallography).

What is bioinformatics?

Well, that is a good question. Actually, the question is so good that nobody can answer it. There are even committees that think about this problem. However, we can easily mention some aspects of bioinformatics, and will do that in the next paragraphs.

The human genome

We know that all information that determines who, why, and what we are is located in our chromosomes. In these chromosomes the information is coded by DNA. The body first transcribes this DNA into RNA and then translates the RNA into proteins, and the proteins do all the work in the body. The DNA in your body can code for about 25000 proteins. Soon, all parts of the DNA that code for these proteins, the genes, will be known, and that will spur a flurry of bioinformatics studies. At present about 98% of the genome is available, but much bioinformatics work is still needed before we can fully harvest this information.
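The route from DNA via RNA to protein described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how a real tool does it: the input is assumed to be the coding strand, reading starts at the first letter rather than at a start codon, and the function names `transcribe` and `translate` are made up for this example. The table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Turn a DNA coding strand into mRNA (assumption: coding strand given)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, starting at position 0.

    Stops at the first stop codon; a trailing incomplete codon is ignored.
    Letters other than A, C, G, U raise a KeyError in this sketch.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCCTGA")))  # prints MA
```

The example sequence ATGGCCTGA contains three codons: ATG (methionine, M), GCC (alanine, A), and the stop codon TGA.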

Figure 1. Here a large part of the human genome was sequenced.

Figure 2. And in this building they do much bioinformatics on the human genome.

With the human genome roughly known, we can calculate how processes take place in our body. We can use computers to design experiments that can be done in a lab to obtain even more information. (The next-generation bioinformatician will have a background in molecular biology, protein chemistry or medicinal chemistry, and might even perform these experiments themselves!) We can start comparing entire databases, searching for the causes of many diseases, and if we find a protein that causes a disease, we can try to design a medicine. In summary, there are more challenges than students!

Figure 3. What does a chromosome look like?

Protein engineering

There also exist less spectacular forms of bioinformatics. For example, bioinformaticians are involved in improving the enzymes in soap powder (yes, there really are enzymes in laundry powder, it is not just a marketing invention). This seems rather uninspiring work, but by improving the activity of the enzymes, soap powder becomes less harmful to the environment, and saving the environment (or at least a tiny part of it) makes the bioinformatician feel good.

Figure 4. A re-engineered soap powder enzyme

Drug design

Many bioinformaticians work on medicine development. We will not elaborate on this topic right here, but it must be clear that a bit of software is probably necessary to create a cure for migraine. Some aspects of drug design will be discussed in the course Computational Drug Discovery.

Figure 5. One of the many possible rational 'drug design' schemes.

The design of medicines requires a lot of high-throughput screening, which involves a lot of robotics.

Figure 6. A programmable robot

The number of diseases that are being studied using bioinformatic tools is enormous. I think that bioinformatic tools are involved in the design of each and every medicine. The picture below lists only a few.

Figure 7. Diseases amenable to bioinformatics

The flu

Recently we have had media-pandemics about H5N1 and H1N1. One of these days we might get a flu-pandemic that, like the Spanish flu, will kill several percent of the world's population. I don't think many words are needed to explain the importance of bioinformatics and related in silico tools.

Plant breeding

Bioinformatics is rapidly becoming the most important tool available to the plant breeding field. For example, plant breeders are trying to find out which potatoes are more resistant to certain parasites, or which strawberry tastes 'better'. A very small plant, Arabidopsis thaliana, is the plant breeder's paradigm. Almost the entire genome sequence of this small plant is known, and very many studies are performed on it. Scientists hope that what they discover about Arabidopsis thaliana will also be true of other plants. They study the size of the flowers and the embryo (seed) of this plant, and hope to use this information for the breeding of other plants. Remember, when you are eating green peas, you are eating plant embryos!

Figure 8. The model for most plant research: Arabidopsis.

The determination of the DNA sequence of this plant is an integral part of all the studies. Bioinformaticians are not only involved in the strategy of sequencing, but also in data management, and, most importantly, in harvesting this wealth of information. This last part, harvesting the genome data, will require a lot of similarity searches and (multiple) sequence alignments, and those are the main topics of this course.
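To give a first impression of what a sequence alignment computes, here is a minimal sketch that scores the best global alignment of two sequences with the classic Needleman-Wunsch dynamic programming scheme. The scoring values (match +1, mismatch -1, gap -2) and the function name `global_alignment_score` are assumptions chosen for illustration; real alignment programs use substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Needleman-Wunsch dynamic programming, keeping only one previous row.
    The scoring values are illustrative assumptions, not a standard.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of a against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            diagonal = prev[j - 1] + (match if same else mismatch)
            gap_in_b = prev[j] + gap      # letter of a aligned to a gap
            gap_in_a = curr[j - 1] + gap  # letter of b aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # prints 7
print(global_alignment_score("GATTACA", "GATACA"))   # prints 4
```

In the second call one T is missing, so the best alignment has six matches and one gap: 6 - 2 = 4. A similarity search amounts to computing scores of this kind between a new sequence and the sequences in a database, and looking at the highest-scoring ones.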

Figure 9. There are many ways to annotate a genome sequence.

What about this course?

Bioinformatics is everywhere. Computers are invading the labs, and are being used for a rapidly increasing number of tasks. Wherever computers enter the lab, bioinformaticians are in high demand. You see this in many fields: the food industry, medicinal chemistry, the design of diagnostic tests ("please, sir, one drop of blood on this filter paper, and in two minutes this computer tells you what medicine you will need"), plant breeding, evolutionary theory, making artificial sweeteners, etc. You ask, and the bioinformatician calculates. So, there is no way to define bioinformatics properly. The only way to get to know what bioinformatics is, is to actually do it. One thing, though, is clear: sequence alignments are an essential aspect of very many bioinformatics projects.

Some examples where sequence alignments are used