After completing the "Aligning" section you will:
The most powerful weapon in the bioinformaticist's armoury is sequence alignment. Why?
Let's think about an alignment. It is a representation of a whole series of evolutionary events, which left traces in the sequences. Things that are more likely to happen during evolution (mutation of an asparagine or a serine, conservation of a tryptophan or a cysteine bridge) should be most prominently observed in your alignment.
What kind of things are important? Let's give a few examples:
We shall now start working on sequence alignments. We shall steadily add one rule after another, and learn a few new physicochemical properties of amino acids at the same time.
We already discussed that there are two kinds of sequence alignments. The one kind tries to align those residues that have a common ancestor. The other kind tries to align those residues that fall on top of each other when the corresponding structures get superposed three-dimensionally.
In this course we are mainly interested in this latter type of alignment. Obviously, if two residues sit at similar positions in similar structures, they are likely to have similar physico-chemical properties. So, let's start using everything we learned about amino acids in some "real" alignments.
As we are bioinformaticians, we are not just going to run an alignment program and look at the result. No, we are going to think about it and use all kinds of additional information. We can recognize several levels of sophistication in the information we can use:
For each of the following examples, work out which is the better alignment: the right or the left. No additional knowledge is available. The secondary structure of CPISRT or FRCW cannot be predicted reliably.
Question 93: Which is each time the better alignment, right or left (and why)? The first four are not so difficult, but after that....
Left               Right

CPISRTWASIFRCW     CPISRTWASIFRCW
CPISRT---LFRCW     CPISRTL---FRCW

CPISRTRASEFRCW     CPISRTRASEFRCW
CPISRTK---FRCW     CPISRT---KFRCW

CPISRTIASNFRCW     CPISRTIASNFRCW
CPISRTH---FRCW     CPISRT---HFRCW

CPISRTEASDFRCW     CPISRTEASDFRCW
CPISRT---NFRCW     CPISRTN---FRCW

CPISRTSASIFRCW     CPISRTSASIFRCW
CPISRT---TFRCW     CPISRTT---FRCW

CPISRTGASIFRCW     CPISRTGASIFRCW
CPISRTA---FRCW     CPISRT---AFRCW

CPISRTEASNFRCW     CPISRTEASNFRCW
CPISRTQ---FRCW     CPISRT---QFRCW

CPISRTFASTFRCW     CPISRTFASTFRCW
CPISRT---YFRCW     CPISRTY---FRCW
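One way to make the "similar physico-chemical properties" argument explicit is to score every column of an alignment. The sketch below is only an illustration: the residue groups in `GROUPS` and the 2/1/0 scoring are assumptions made for this example, not a table from this course, and real substitution matrices are far more refined.

```python
# Coarse physico-chemical groups. This grouping is an assumption made for
# illustration; C and P are deliberately left without a group.
GROUPS = ["ILVM", "FWY", "KRH", "DE", "STNQ", "AG"]


def column_score(a: str, b: str) -> int:
    """Score one column: 2 = identical, 1 = same group, 0 = otherwise or gap."""
    if a == "-" or b == "-":
        return 0
    if a == b:
        return 2
    return 1 if any(a in g and b in g for g in GROUPS) else 0


def alignment_score(top: str, bottom: str) -> int:
    """Sum the column scores of two equally long, gapped sequences."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return sum(column_score(a, b) for a, b in zip(top, bottom))


# First pair of Question 93: the two alignments differ only in where L ends up.
left = alignment_score("CPISRTWASIFRCW", "CPISRT---LFRCW")   # L opposite I
right = alignment_score("CPISRTWASIFRCW", "CPISRTL---FRCW")  # L opposite W
print(left, right)
```

Such a score only captures residue similarity. It knows nothing about where a gap is structurally acceptable, so it cannot replace the reasoning asked for in the question.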
Sometimes the secondary structure of one or more of the sequences is known. This can either be the secondary structure as derived from a PDB file (which holds 3D coordinates) or it can be a predicted secondary structure. In this section we will look at a coarse way to predict secondary structure. In the sections thereafter we will use predicted secondary structure characteristics to make better alignments.
Before we use this information let's look at some aspects of secondary structure. You know that secondary structure elements fall into four categories: helix, strand, turn, and the rest. If you look at the Chou and Fasman parameters (and other useful data) you will see that there is a relation between residue type and secondary structure. As always in bioinformatics, the rules suggested by these parameters aren't exactly hard and fast, and exceptions abound. Nonetheless, they do make some sense, so we shall study them.
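To get a feeling for how such parameters turn into a prediction, here is a minimal sketch. It is not the full Chou-Fasman procedure (which nucleates and then extends segments); it simply averages the propensities over a short window. The numbers in `CF` are the commonly quoted Chou-Fasman values and are an assumption here: the table linked from this course may list different ones. The window length of 5 is also just a choice made for this example.

```python
# Commonly quoted Chou-Fasman propensities: residue -> (P_helix, P_strand).
# Assumption: the course's own parameter table may differ from these values.
CF = {
    "E": (1.51, 0.37), "M": (1.45, 1.05), "A": (1.42, 0.83), "L": (1.21, 1.30),
    "K": (1.16, 0.74), "F": (1.13, 1.38), "Q": (1.11, 1.10), "W": (1.08, 1.37),
    "I": (1.08, 1.60), "V": (1.06, 1.70), "D": (1.01, 0.54), "H": (1.00, 0.87),
    "R": (0.98, 0.93), "T": (0.83, 1.19), "S": (0.77, 0.75), "C": (0.70, 1.19),
    "Y": (0.69, 1.47), "N": (0.67, 0.89), "P": (0.57, 0.55), "G": (0.57, 0.75),
}


def predict(seq: str, window: int = 5) -> str:
    """Return one of H (helix), E (strand) or - (rest) per residue.

    Simplified window-average scheme, not the original algorithm.
    """
    half = window // 2
    states = []
    for i in range(len(seq)):
        chunk = seq[max(0, i - half): i + half + 1]  # window is truncated at the ends
        p_helix = sum(CF[r][0] for r in chunk) / len(chunk)
        p_strand = sum(CF[r][1] for r in chunk) / len(chunk)
        if p_helix >= 1.03 and p_helix > p_strand:
            states.append("H")
        elif p_strand >= 1.05 and p_strand > p_helix:
            states.append("E")
        else:
            states.append("-")
    return "".join(states)


print(predict("ELMKIAQLAKRGP"))
```

Treat the output as a first guess only. The exceptions mentioned above are exactly where a window average goes wrong.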
Supplemental material NOTE
Question 94: Using the Chou-Fasman parameters, predict the secondary structure of the following sequences:
ELMKIAQLAKRGP
VVICETTWYVEVT
VTITVEGPKITVE
SRGGEPTRHEAKE
ELLALKLLTVTVT (a loop/turn of at least one residue is needed between helix and strand)
Question 95: And now, using everything you have learned so far, select from each of these pairs the better helix:
ALQLNMQAKALL ANQLLMQAAKLL
ARAAEALLQAAE AEAAEALLQAAK
ALLLAALLLAL AAEALAKALLR
Question 96: And now, using everything you have learned so far, select from each of these pairs the better strand:
VVKISVTIKSG LLKISLTIILI
VVTTVVTTVVTT VTVTVTVTVTVT
VVICFFWIIFVI VKICFKSIYVRI
VKITFEITVEIR IRVTWRGTINIE
EU name: HELILU
(From: ../EUDIR )
(Date: Jan 27 17:59 ../EUDI)
We have discussed already that the alpha helix normally has a hydrophobic side and a hydrophilic side. Obviously, that is only the case for helices that pack against something. Look at the following two packed, short helices:
Question 97:
Figure 68. The gray circles represent the central cores of two packed helices and the coloured balls represent side chains.
Which residues in figure 68 are almost guaranteed hydrophobic?
There are two ways to draw pictures like figure 68, and the next figure illustrates both.
Question 98:
Figure 69. a) Helical wheel for a helix of length 18. b) The helix cut open and folded flat.
Explain how figure 69 b was made. Explain this, in 3D, to the assistants before continuing.
Colour in figure 69 a those residues that are encircled in figure 69 b.
Do you see anything striking?
Question 99: Let's use these helical wheel plots. The assistants have already printed them for you. Take the first of these two sequences:
1: AELAKAMQAAQLMEAIKGGS
Predict the secondary structure. Draw the helical part on the helical wheels, both on the circular wheel (a) and on the flat plot (b).
And let your partner do this one:
2: SPAELAKQALEAAKQLAEAGGSP
After drawing them, describe the differences. Which is the better helix, and why?
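If you want to check your drawing afterwards, the wheel positions are easy to compute: an ideal α-helix has 3.6 residues per turn, so each residue is rotated 100 degrees relative to the previous one. The sketch below also computes a hydrophobic moment, a single number that grows when the hydrophobic residues cluster on one side of the wheel. Using the Kyte-Doolittle scale for this is an assumption of this example; other hydrophobicity scales would work as well.

```python
import math

DEGREES_PER_RESIDUE = 100  # 360 degrees / 3.6 residues per turn

# Kyte-Doolittle hydropathy values; choosing this scale is an assumption.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def wheel_angles(n: int) -> list[int]:
    """Angle in degrees on the helical wheel for residues 0 .. n-1."""
    return [(i * DEGREES_PER_RESIDUE) % 360 for i in range(n)]


def hydrophobic_moment(helix: str) -> float:
    """Length of the vector sum of per-residue hydropathies on the wheel."""
    x = sum(KD[r] * math.cos(math.radians(i * DEGREES_PER_RESIDUE))
            for i, r in enumerate(helix))
    y = sum(KD[r] * math.sin(math.radians(i * DEGREES_PER_RESIDUE))
            for i, r in enumerate(helix))
    return math.hypot(x, y)


# Print residue and wheel angle for a stretch you believe to be helical
# (the stretch used here is only an example, not the answer to the question).
stretch = "AELAKAMQAAQLMEAIKG"
for residue, angle in zip(stretch, wheel_angles(len(stretch))):
    print(residue, angle)
print(round(hydrophobic_moment(stretch), 2))
```

Note that 18 residues make exactly five turns, so in a wheel of length 18 every residue gets its own spoke, 20 degrees apart, and residue 19 would land on top of residue 1.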
EU name: STRALI
Look at the sequences:
S G V S P D Q L A A L K L I L E L A L K
G T S L E T A L L M Q I A Q K L I A G
In both cases it is clear that the left part of the sequence does not have a regular structure, whereas the right part is helical. That is the additional information available here: an easy-to-predict secondary structure. By shifting the sequences back and forth we can find several alignments, some reasonable and some poor. But we know that both sequences contain the N-terminus of a helix, so let's find the ends of the helices. For that we go back to the table:
-4 -3 -2 -1 1 2 3 4 5 total
- - - - H H H H H
ALA 143 148 99 58 189 205 187 241 268 1538
CYS 24 31 29 22 14 17 18 33 17 205
ASP 98 110 121 260 98 197 167 49 86 1186
GLU 91 100 71 71 152 287 269 70 147 1258
PHE 53 70 90 29 68 46 49 107 65 577
GLY 207 246 166 192 96 127 99 65 60 1258
HIS 48 50 39 46 28 36 38 24 30 339
ILE 94 81 133 19 79 45 68 161 99 779
LYS 99 98 80 46 98 105 69 80 154 829
LEU 105 111 188 50 140 84 113 281 209 1281
MET 37 20 51 13 26 22 54 61 67 351
ASN 103 83 89 206 46 62 55 37 77 758
PRO 143 136 121 99 240 78 40 0 0 857
GLN 48 58 40 38 83 93 124 76 101 661
ARG 82 63 59 51 71 75 61 114 109 685
SER 112 128 98 292 105 126 99 48 76 1084
THR 106 99 119 253 91 80 115 72 67 1002
VAL 141 107 132 37 117 74 120 208 120 1056
TRP 29 25 29 14 30 26 28 30 29 240
TYR 66 65 75 33 58 44 56 72 48 517
We now use this table and indicate preferred positions of residues relative to the first position of the helix.
S G V S P D Q L A A L K L I L E L A L K
-1-4-4-1-4-1 3-2 1 1-2 2
-3-2 -3 2 5 1 2 2 1 5
4 -2 3 4 3 3 4
1 5 4 4 5
5 5
G T S L E T A L L M Q I A Q K L I A G
-4-1-1-2 2-1 1-2
-3 3 1 3 3 2 1
4 3 4
5 4 5
5
So, the optimal path that puts each residue as close as possible to its preferred position is:
S G V S P D Q L A A L K L I L E L A L K
-1-4-4-1-4-1 3-2 1 1-2 2
-3-2 -3 2 5 1 2 2 1 5
4 -2 3 4 3 3 4
1 5 4 4 5
5 5
G T S L E T A L L M Q I A Q K L I A G
-4-1-1-2 2-1 1-2
-3 3 1 3 3 2 1
4 3 4
5 4 5
5
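The same search can be done numerically. The sketch below copies the counts from the table above and, for every candidate first helix residue, adds up how strongly each residue in the surrounding window prefers the position it would get. The scoring (log of the count relative to that residue's average count, with a pseudocount of 1 so that the zeros for PRO do not break the logarithm) is an assumption of this example, not the method behind the hand-made diagram.

```python
import math

POSITIONS = [-4, -3, -2, -1, 1, 2, 3, 4, 5]  # columns of the table
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F", "GLY": "G",
    "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L", "MET": "M", "ASN": "N",
    "PRO": "P", "GLN": "Q", "ARG": "R", "SER": "S", "THR": "T", "VAL": "V",
    "TRP": "W", "TYR": "Y",
}
TABLE = """
ALA 143 148 99 58 189 205 187 241 268
CYS 24 31 29 22 14 17 18 33 17
ASP 98 110 121 260 98 197 167 49 86
GLU 91 100 71 71 152 287 269 70 147
PHE 53 70 90 29 68 46 49 107 65
GLY 207 246 166 192 96 127 99 65 60
HIS 48 50 39 46 28 36 38 24 30
ILE 94 81 133 19 79 45 68 161 99
LYS 99 98 80 46 98 105 69 80 154
LEU 105 111 188 50 140 84 113 281 209
MET 37 20 51 13 26 22 54 61 67
ASN 103 83 89 206 46 62 55 37 77
PRO 143 136 121 99 240 78 40 0 0
GLN 48 58 40 38 83 93 124 76 101
ARG 82 63 59 51 71 75 61 114 109
SER 112 128 98 292 105 126 99 48 76
THR 106 99 119 253 91 80 115 72 67
VAL 141 107 132 37 117 74 120 208 120
TRP 29 25 29 14 30 26 28 30 29
TYR 66 65 75 33 58 44 56 72 48
"""
COUNTS = {}
for line in TABLE.strip().splitlines():
    name, *numbers = line.split()
    COUNTS[THREE_TO_ONE[name]] = dict(zip(POSITIONS, map(int, numbers)))


def preference(residue: str, position: int) -> float:
    """Log of (count + 1) relative to this residue's mean count per position."""
    row = COUNTS[residue]
    mean = (sum(row.values()) + len(row)) / len(row)
    return math.log((row[position] + 1) / mean)


def start_score(seq: str, start: int) -> float:
    """Score the hypothesis that seq[start] is helix position 1 (0-based index)."""
    total = 0.0
    for position in POSITIONS:
        # There is no position 0: position 1 has offset 0, position -1 has offset -1.
        offset = position - 1 if position > 0 else position
        index = start + offset
        if 0 <= index < len(seq):  # window positions outside the sequence are skipped
            total += preference(seq[index], position)
    return total


def best_helix_start(seq: str) -> int:
    """0-based index of the residue that scores best as first helix residue."""
    return max(range(len(seq)), key=lambda start: start_score(seq, start))


for seq in ("SGVSPDQLAALKLILELALK", "GTSLETALLMQIAQKLIAG"):
    start = best_helix_start(seq)
    print(seq, start, seq[start:start + 3])
```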
So, in the top sequence the helix starts with PDQ and in the bottom sequence with LET. In total only two residues are not optimally happy with this arrangement (which two, and why isn't this so bad after all?). The optimal alignment thus must be:
S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G
And that would be difficult to find using an alignment program. Clustal will do it right, but with only three identities it would be unhappy with the result. Checking these helix cap propensities gives you considerable confidence in this alignment.
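Once the helix starts are known, building the alignment is purely mechanical: pad with gaps until the two first helix residues share a column. A minimal sketch follows; the function names are made up for this example.

```python
def align_on_anchor(top: str, bottom: str, top_anchor: int, bottom_anchor: int):
    """Gap-pad two sequences so the given 0-based anchor residues share a column."""
    shift = top_anchor - bottom_anchor
    if shift >= 0:
        bottom = "-" * shift + bottom
    else:
        top = "-" * (-shift) + top
    width = max(len(top), len(bottom))
    return top.ljust(width, "-"), bottom.ljust(width, "-")


def identities(a: str, b: str) -> int:
    """Number of columns in which both sequences carry the same residue."""
    return sum(x == y and x != "-" for x, y in zip(a, b))


top = "SGVSPDQLAALKLILELALK"
bottom = "GTSLETALLMQIAQKLIAG"
# Anchor on the first helix residues named in the text: PDQ and LET.
a, b = align_on_anchor(top, bottom, top.index("PDQ"), bottom.index("LET"))
print(a)
print(b)
print(identities(a, b))
```

The count confirms how little plain sequence identity there is to go on: an alignment program that only counts matches has no reason to prefer this placement.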
Question 100: Using the same ideas as the example given on the web-site just above this question, align:
NHSGPPSTSGPAQLLAKALEIALK
PGISAEMVALKALLEALQALELLLR
P.S. No need to try Clustal on this one; it will get it wrong!
EU name: STRILU
We will analyze a short hairpin in detail.
Question 101:
Figure 73. β-hairpin.
This was a very nice hairpin, with the ideal residue at every position. Unfortunately, I lost all 14 side chains, so that it now looks like a poly-alanine β-hairpin. Can you invent a sequence of 14 amino acids that would make this the ideal β-hairpin?
Hint: use everything you know about β-strands and β-turns. First decide which side faces the water and which side faces the rest of the protein, then think about which residues should be β-turn lovers, etc.
Let the assistant check your ideal sequence before continuing with the next question.
EU name: STRAL2
In the previous example we used the predicted N-terminal start of a helix as additional information, and we had to find out the relations between residue types and secondary structure to predict this helix end. In the next example we will use the predicted secondary structure of two β-hairpins to aid with their alignment.
First, try to align the sequences TCTVTSNSITCT (A) and TCTVSTCT (B).
When using only the sequence information this may seem straightforward.
But how would you align A and B if not only the sequence but also the structure of A was known? Or, in other words: "Let's pretend that you want to predict the structure of B, with all the information you have on A."
Two possible alignments, B1 and B2 are shown below:
A   TCTVTSNSITCT     A   TCTVTSNSITCT
B1  TCTVS----TCT     B2  TCTV----STCT
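A quick check shows why sequence information alone cannot decide between B1 and B2. The sketch below counts the identical columns in both alignments and lists which residues of A end up opposite the gap; the helper names are made up for this example.

```python
def identities(a: str, b: str) -> int:
    """Number of columns in which both sequences carry the same residue."""
    return sum(x == y and x != "-" for x, y in zip(a, b))


def deleted_from(reference: str, gapped: str) -> str:
    """Residues of `reference` that sit opposite a gap in `gapped`."""
    return "".join(r for r, g in zip(reference, gapped) if g == "-")


A = "TCTVTSNSITCT"
B1 = "TCTVS----TCT"
B2 = "TCTV----STCT"

for name, b in (("B1", B1), ("B2", B2)):
    print(name, identities(A, b), deleted_from(A, b))
```

Both alignments have the same number of identities. They differ only in which four residues of A are removed, and that is a question about the structure of the hairpin rather than about the sequence.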
Question 102:
One last example:
Question 103: Predict the secondary structure of the following two sequences:
CWEALALLAELALAAMKGSTPNGS
CWEALALLLEALMRGTTPNGG
Align these two sequences by taking into account the secondary structure prediction. Then make an alignment as you would expect a computer program to do it. Which of these two alignments is better and why?
So, if it is not clear by now that structural knowledge can help to fine-tune the alignment, you are in trouble...