We have already discussed MD a bit. You can find a summary of the MD seminar-like movie in the seminars list.
Today we are going to think about making a force field ourselves.
All atoms attract each other. Period. That looks a very wrong remark, but it isn't. There is an attractive force between all pairs of atoms. However, there are additional, repulsive, forces that can be stronger in certain distance ranges. Repulsion is seen between equally charged atoms, or when two atoms come too close to each other. Although it is not correct, it is generally assumed that the attractive force behaves as 1/R6 with the inter atomic distance R and the repulsive force at short distances goes with 1/R12 (actually these are potential energies, and not forces, but people in the Molecular dynamics field like to call a set of energies a force field; don't ask me why). These two effects have an opposite sign, which results in the plot shown at the below.
Knowledge of the distance of optimal happiness is important for our understanding of protein structures. The distances of optimal happiness is generally referred to as R0.
Question 51:
Go back to the seminar on molecular dynamics. Find all formulas that deal with
the relation between geometric relations between atoms and their contribution
to the overall energy of the protein. Write down the five sub-parts of the
formula that every molecular dynamics program seems to have included in the
source code.
Which (part of) the force field equations describes figure 51.
Look at the formula for figure 51 again. What is the Energy value when I fill in 1000.0
Ångström for rij?
In this experiment, we will all together determine one parameter of a force field, namely the distance between the donor and the acceptor in a hydrogen bond.
There are many ways of doing this. Please think and come up with at least three really different methods.
A crude method that is doable within this course is to estimate the most occurring short distance between a pair of atoms. I.e. if you measure for an inter-atomic distance between two carbons: 3.93 3.95 4.40 3.98 4.02 3.81 3.94 4.00 3.95 and 3.94 it is reasonable to assume that the corresponding R0 is around 3.94.
The fact that you also find longer distances is simply the result of the fact that there are more atoms in a protein than just those two you are looking at. Not all atoms and groups can have optimal pairwise packing.
Each team should take one protein structure that was solved by X-ray with a resolution of 1.2 Ångström or better.
Question 52:
How do you find structures solved at better than 1.2 Ångström resolution (this
is not simple, but MRS can do it, fight with the MRS help...)?
Why can we not use NMR structures?
Why should we use structures solved at 1.2 Ångström or better?
Tabulate at least 10 close donor-acceptor distances. Close is defined as clearly less than 4.5 Ångström. Try not to find the closest or the least close distances. Just take any you can find. Your choice of atom pairs will influence the final outcome. So, make sure you think about this problem before blindly starting. Neither take any β-strand - β-strand helices nor take any intra-helical hydrogen bonds (why not, actually?).
When done, ask some computer guru to make a histogram with bins of 0.1 Ångström and add up in each bin the numbers of H-bonds found in total by all groups in the course. Think about the curve that you expect. Think about a way to determine R0 from the curve.
Question 53:
Answer
Supplemental material
Supplemental material
Force fields are everywhere. In the movie The Elevator, an intelligent elevator goes beserk. So obviously the software designers for this elevator went one step to far. The idea to make an elevator smart so that it spends less electricity and gets people faster where they want to be is of course based on a force field.
Question 55:
How would you design the software for the elevator in the Radboud parking house?
Why is this a much more difficult task than makeing a force field to predict
the secondary structure of proteins?
Lets make a force field to predict transmembrane helices in G protein-coupled receptors (GPCRs).
To do so we need data to set-up the force field. Normally, we would take a large set of membrane proteins with know structure, but as this practical is only meant to illustrate the concepts, we will use just one protein to generate the data. This protein is an opsin, so for the purists, we are going to design a computer program to predict transmembrane helices in opsins. But we will nevertheless try it out on 'something else' for validation purposes.
So, today we will not only design and implement a force field to predict transmembrane helices in GPCRs, but we will also test our force field. To achieve this double goal, we will use data only from one GPCR to set-up the force field, and we use a totally different GPCR to see how well the force field works.
But first a short intro in GPCRs:
In the box below, you find the secondary structure determination of the opsin file shown in figure 52. In this box the conserved residues are numbered for easy reference.
130 | 1 - 60 MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLY 1 - 60 TSS TT SSTT TTT TTTT 333T HHHHHHHHHHHHHHHHHHHHHHHHHHH 220 224 305 | | | 61 - 120 VTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLG 61 - 120 HHHHTTT THHHHHHHHHHHHHHHHHHHTHHHHHHHHHHTT TTHHHHHHHHHHHHHHH 340 420 | | 121 - 180 GEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIP 121 - 180 HHHHHHHHHHHHHHHHHHHT TTT HHHHHHHHHHHHHHHHHHHTHHHHTTT SSS 520 528 | | 181 - 235 EGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAA 181 - 235 STTTTSSSS T TTTTHHHHHHHHHHHTTHHHHHHHHHHHHHTTTT T 620 | 236 - 295 SATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSA 236 - 295 TTTTHHHHHHHHHTHHHHHHHHHHHHHHHHHHHHHHTTT HHHHHHHHHHHHHHH 730 | 296 - 323 VYNPVIYIMMNKQFRNCMVTTLCCGKNPSTTVSKTETSQVAPA 296 - 323 HHHHHHHHHH HHHHHHHHHHHTTT TTTTT |
Question 56: Do the helices that you see with YASARA, and the helices indicated in the box agree? And if they don't, why?
Answer
And now the important part. Work in groups of three or four students, and
come up with a plan for the force field design. The elements will be: |
All the counting has already been done for you, albeit that the counting results have a somewhat clumsy format: the HSSP format. HSSP files consist of a header:
HSSP HOMOLOGY DERIVED SECONDARY STRUCTURE OF PROTEINS , VERSION 1.1 2001 PDBID 1f88 DATE file generated on 17-May-10 SEQBASE UniProtKB/Swiss-Prot(20-Apr-2010) UniProtKB/TrEMBL(20-Apr-2010) PARAMETER SMIN: -0.5 SMAX: 1.0 PARAMETER gap-open: 3.0 gap-elongation: 0.1 PARAMETER conservation weights: YES PARAMETER InDels in secondary structure allowed: YES PARAMETER alignments sorted according to :DISTANCE THRESHOLD according to: t(L)=(290.15 * L ** -0.562) + 5 REFERENCE Sander C., Schneider R. : Database of homology-derived protein structures. Proteins, 9:56-68 (1991). CONTACT Maintained at www.CMBI.ru.nl by Elmar.Krieger.cmbi.umcn.nl AVAILABLE Free academic use. Commercial users must apply for license. Etcetera. |
The section that lists all aligned sequences:
## PROTEINS : EMBL/SWISSPROT identifier and alignment statistics NR. ID STRID %IDE %WSIM IFIR ILAS JFIR JLAS LALI NGAP LGAP LSEQ2 ACCNUM PROTEIN 1 : OPSD_BOVIN 3DQB 1.00 1.00 565 648 243 326 84 0 0 348 P02699 RecName: Full=Rhodopsin; 2 : OPSD_BOVIN 3DQB 1.00 1.00 342 483 1 142 142 0 0 348 P02699 RecName: Full=Rhodopsin; 3 : OPSD_BOVIN 3DQB 1.00 1.00 1 235 1 235 235 0 0 348 P02699 RecName: Full=Rhodopsin; 4 : OPSD_BOVIN 3DQB 1.00 1.00 237 324 240 327 88 0 0 348 P02699 RecName: Full=Rhodopsin; 5 : OPSD_BOVIN 3DQB 1.00 1.00 485 563 148 226 79 0 0 348 P02699 RecName: Full=Rhodopsin; 6 : OPSD_SHEEP 0.99 1.00 342 483 1 142 142 0 0 348 P02700 RecName: Full=Rhodopsin; 7 : OPSD_FELCA 0.99 0.99 342 483 1 142 142 0 0 348 Q95KU1 RecName: Full=Rhodopsin; 8 : OPSD_PHOGR 0.99 0.99 342 483 1 142 142 0 0 348 O62795 RecName: Full=Rhodopsin; 9 : OPSD_PHOVI 0.99 0.99 342 483 1 142 142 0 0 348 O62794 RecName: Full=Rhodopsin; 10 : Q2KND7_URSMA 0.98 0.99 358 483 1 126 126 0 0 322 Q2KND7 SubName: Full=Rhodopsin; 11 : Q2KND6_PHOHI 0.98 0.99 358 483 1 126 126 0 0 324 Q2KND6 SubName: Full=Rhodopsin; 12 : OPSD_CANFA 0.98 0.99 342 483 1 142 142 0 0 348 P32308 RecName: Full=Rhodopsin; Etcetera |
The alignment part is important to find the residue numbers back for the helices (I coloured the secondary structure assignments red; a H in this column, obviously, indicates a helix).
## ALIGNMENTS 1 - 70 SeqNo PDBNo AA STRUCTURE BP1 BP2 ACC NOCC VAR ....:....1....:....2....:....3....:....4....:....5....:....6....:....7 1 1 A M 0 0 192 292 3 M M M MMMMM M MM MMM M MMMM 2 2 A N + 0 0 72 430 27 N N N NNNNN N NN NNN N NNNN 3 3 A G S S- 0 0 5 431 33 G G G GGGGG G GG GGG G GGGG 4 4 A T E -A 11 0A 28 453 23 T T T TTTTT T TT TTT T TTTT 5 5 A E E -A 10 0A 112 465 21 E E E EEEEE E EE EEE E EEEE 6 6 A G - 0 0 10 474 21 G G G GGGGG G GG GGG G GGGG 7 7 A P S S+ 0 0 108 466 51 P P P PPPPP P PP PPP P PPPP 8 8 A N S S+ 0 0 72 493 51 N N N NNNNN N NN NNN N DNNN 9 9 A F - 0 0 18 503 17 F F F FFFFF F FF FFF F FFFF 10 10 A Y E -A 5 0A 10 518 39 Y Y Y YYYYY Y YY YYY Y YYYY 11 11 A V E -A 4 0A 0 547 32 V V V VVVVV V VVV VVV V IVVV 12 12 A P H S+ 0 0 6 525 33 P P P PPPPP P PPP PPP P PPPP 13 13 A F H S- 0 0 61 605 34 F F F FFFFF F FFF FFF F MFFF 14 14 A S H - 0 0 45 605 42 S S S SSSSS S SSS SSS S SSSS 15 15 A N H > + 0 0 68 609 20 N N N NNNNN N NNNHHNNN N NNNN |
And the last section holds the profile (see the sequence alignment seminar and practical) from which you can get an idea about the counting statistics.
## SEQUENCE PROFILE AND ENTROPY SeqNo PDBNo V L I M F W Y G A P S T C H R K Q E N D NOCC NDEL NINS ENTROPY RELENT WEIGHT 1 1 A 3 2 0 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 292 0 0 0.234 8 0.94 2 2 A 0 0 0 0 0 0 0 0 0 0 5 0 0 0 3 0 1 11 65 15 430 0 0 1.153 38 0.47 3 3 A 0 10 0 1 0 0 0 69 0 0 14 0 0 0 0 3 0 0 0 0 431 0 0 1.057 35 0.36 4 4 A 0 0 0 3 0 0 0 0 0 9 1 78 0 0 0 0 2 4 0 2 453 0 0 0.912 30 0.56 5 5 A 0 1 0 0 0 0 0 0 0 1 3 0 0 0 5 0 10 78 0 1 465 0 0 0.890 30 0.62 6 6 A 0 0 0 0 0 0 0 70 3 2 7 0 0 0 0 0 0 7 0 11 474 0 0 1.062 35 0.56 7 7 A 0 1 2 0 13 0 0 0 0 34 17 1 0 0 0 14 2 9 4 3 466 17 1 1.943 65 0.10 8 8 A 0 0 10 0 13 7 10 0 5 0 0 0 0 0 0 5 0 5 33 11 493 17 0 2.036 68 0.10 9 9 A 0 1 10 1 80 1 1 0 0 0 0 0 0 4 0 0 0 1 0 1 503 17 0 0.817 27 0.65 Etcetera. |
Obviously, we would like to have the counts for each residue type inside the helices and inbetween the helices. Unfortunately, the profile gives you alignment occupancies and percentages per residue type. The Web server called teller can be used to convert the HSSP profile values into raw amino acid counts. Just cut-n-paste the values listed in the box into the 'teller' software, and check that you get (and understand):
Now count what you want to count and get the force field parameters. (I am sorry, the 20 divisions and natural logarithms you will have to do by pocket calculator).
First check your force field in terms of plausibility. Do the hydrophobic residues have a high potential for TM helices, and do Arg, Lys, Glu, and ASP indeed score very low as you would expect? If not dicuss with the assistant what went wrong and fix it.
We did not make any software to score things with your force field, but you can see what happens if you write the potentials next to the amino acids in this short sequence:
N G T E G P N F Y V P F S N K T G V V R S P F E A P Q Y Y L A E P W Q F S M L A A Y M F L L I M L G F P I N F L T L Y V T V Q H K K L R T P L N Y |
And if all goes well, you will find one TM helix...
The final one.
Question 57:
How would you write software to predict if a residue is an active site residue or not?
Don't forget that active sites are always at the surface of the protein.
Think of the data-set you need.
Think of the null-model.
Think of the real model, and how to convert counting into scoring.
Think about testing the method.