Structure Function Bioinformatics

Exams

First some examples of the (in)famous amino acid test.

Here is the 2008 version with answers. And here the 2007 version with answers. And here the 2009 version with answers. And here the 2010 version, and the 2010 version with answers. And here the 2014-2015 version, and the 2014-2015 version with answers.

The exam in 2005 (Vriend's part), with answers typed

Give three examples of stabilising mutations that mainly influence entropy.
To influence entropy and be stabilizing, the mutation should reduce the freedom of the unfolded chain (the folded chain is mainly immobile already, anyway, so you cannot do much with the folded chain). Examples are then Lys -> Arg because Arg contains the highly rigid guanidinium group. Obviously, Lys -> Arg will only work when the Lys/Arg is involved in some salt bridge. If the Lys/Arg side chain is fully exposed at the surface and highly mobile, then folding/unfolding will not be very different for the two situations. Other examples are Gly -> Xxx, or Xxx -> Pro. These mutations tend to work very well in practice. The concept is that upon folding the main energy gain is the entropy (and of course also the enthalpy) of water that obtains its freedom because it is no longer facing hydrophobic residues. But... this comes at the cost of losing the freedom/mobility of the protein chain. If you reduce the freedom of the protein chain in the unfolded form (in the folded form the protein has hardly any freedom/mobility, no matter the sequence) then you lose less freedom upon folding. Obviously, there are more examples, e.g., if you mutate a hydrophobic residue in the core so that it nicely fills a cavity (Val -> Ile or also Gly -> Xxx are often used, and the latter even works 'double') then you gain a lot of entropy of water. You can also insert an Asn or Asp at a place where it can form a hydrogen bond with its own backbone (see also the Asp versus Glu question three questions further down). See also the cysteine bridge question a few questions further down.
Give three examples of stabilising mutations that mainly influence enthalpy.
To influence the enthalpy, you need more interactions. Examples are the introduction of an Asp, Glu, Lys, or Arg to form a salt bridge, or a Val -> Thr mutation to form a hydrogen bond. More examples can be dreamt up of course, like compensating the dipole moment of a helix, compensating unbalanced charges at/near ions, etc. In summary: make an extra polar interaction.
What is a HSSP file, and why is it useful for stability engineering?
This is a multiple sequence alignment of all sequences in UniProt (and that includes all sequences from the better known SwissProt) that BLAST picks up and that are above the famous Sander and Schneider curve that tells you when an alignment is significant, against the sequence of the PDB file. This is useful for stability engineering because a) You can get ideas where to make mutations (don't mutate a conserved residue because if it is conserved it is important even if you don't know what it is important for); b) It can make suggestions about what to mutate like introducing a cysteine bridge; c) It can tell you what not to do. If you want to introduce a proline somewhere, but there is no proline found in the alignment at that position, then that proline might not be a good idea.
Given two sequences: DCAGWLPYTXGE and DCAGWLGHAARTSPAFGLREWPYTXGE. Both are equally thermostable. In both cases residue X can be mutated to become a cysteine that bridges with the other Cys at position 2. Which peptide is most stable after the mutation (neglect side effects like atomic clashes, strain, etc) and why?
The second one because the introduction of a cysteine bridge improves stability by reducing the freedom in the unfolded form. The further the cysteines are away from each other in the sequence, the more entropy is lost in the unfolded form that is then not lost upon folding.
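The loop-length argument can be made semi-quantitative. The sketch below assumes a simple ideal-chain estimate in which the entropy that the unfolded chain loses when a cross-link closes a loop of n residues grows as (3/2)·R·ln(n). That scaling is a textbook polymer approximation and not part of the course material, and the function names are made up for illustration.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol*K)

def loop_length(seq, cys_pos=2, marker="X"):
    """Sequence separation between the fixed Cys (1-based) and the mutated X."""
    x_pos = seq.index(marker) + 1  # 1-based position of X
    return abs(x_pos - cys_pos)

def relative_loop_entropy(n):
    """Ideal-chain estimate (assumption) of the unfolded-state entropy lost
    by closing a loop of n residues, in cal/(mol*K): -(3/2) * R * ln(n)."""
    return -1.5 * R_CAL * math.log(n)

for seq in ("DCAGWLPYTXGE", "DCAGWLGHAARTSPAFGLREWPYTXGE"):
    n = loop_length(seq)
    print(seq, n, round(relative_loop_entropy(n), 2))
```

The longer loop gives the more negative number, i.e. more entropy is already removed from the unfolded state, which is the reasoning behind picking the second peptide.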
Why is a hydrogen bond with its own backbone better for Asp than for Glu?
In both cases you gain enthalpy of the interaction at the cost of losing the entropic freedom of the side chain. Glu has more freedom than Asp, so it has more to lose when getting fixed in a H-bond.
What can we learn from structure comparisons?
The correct answer is "a lot". But at the exam, I would appreciate it if you would elaborate a bit. Some things you can learn are a) how conserved the position of a loop is; b) whether a water bound in the active site is always there; c) whether the odd rotamer observed at a certain position is conserved and thus important, or the accidental result of crystal packing; d) whether all ligands always make a contact with a certain (conserved) residue.
Why do all good secondary structure prediction methods use multiple sequence alignments as input?
Best explained by an example. If you see something like DALWAMPKLLELMLQ then you think "Wow, what a beautiful helix, except for that shitty Pro in the middle". If you now see that most homologs have an equally beautiful helix pattern without that Pro, then you know that that Pro is an exceptional Pro in an otherwise nice helix.
What is the major problem when using force fields?
Several answers are correct here. One is determining what the null-model is, i.e., how do you define the situation in which everything is random? This often turns out to be an intellectual challenge. Another problem is that using a force field, like for example in molecular dynamics, takes very, very much CPU time to do it right. Sometimes there isn't enough data to properly design the force field from. (Don't worry, in 2011 I would formulate such a question a bit less 'open-ended'.)
What types of motion are important for enzyme activity?
A whole lot. Almost in order of appearance in the plot: a) The protein happily wobbles around with a bit of freedom here and there (most in the side chains at the surface); b) Protein and substrate swim around due to Brownian motion (sometimes supported by gradients or even active transport processes); c) The protein meets the substrate and this normally leads to a reduction of freedom in the ligand, to water from the active site pocket gaining entropy when it is replaced by the ligand, to protein active site residues 'locking in' on the ligand, and sometimes to an induced fit of residues and even loops in the vicinity of the active site; d) In some enzymes whole domains move with respect to each other, often in a hinge-like motion that opens and closes the active site a bit. In the presence of the ligand the closed form is frozen in; e) The motions at or near the quantum level (vibrations, electrons not being at the most likely position, etcetera) that always take place now become important as they make sure that the enzymatic activity actually takes place; f) In most enzyme actions some water is split, and some hydrogen and/or some electron moves from one place to another (and another, and another...); g) After the action took place the whole process goes in reverse, so the product(s) leave the active site, water gets back in, induced fits are uninduced, domain motions start again, etc.
Give two significantly different classification schemes for membrane proteins.
One scheme would be by function (transporters, receptors, defense proteins, etc), but you can also use their structure: Classify proteins by their transmembrane structure that either is a big circular sheet of strands, or a bunch of helices. Both groups can then be sub-classified by the number of strands in the sheet, and the number of helices in the bundle.
Mention the two major classes of molecules that transfer information through a membrane.
G protein-coupled receptors (or GPCRs), and receptor tyrosine kinases. Especially the GPCRs are a very important target in the drug design industry.
Mention the five most important computational tools for a bioinformatician (and what can you do with them)?
The clear winner is Google, especially in combination with Wikipedia. PubMed (where you get access to the literature) is important too. Number 3 is BLAST. And after these first three, one can start defending different options. Linux has once been answered (and defended) with success. MRS, or, more generally, database lookup software belongs high on the list, as does molecular visualisation for which we used YASARA during the course. Multiple sequence alignment software must be among your top 5 too.
BRIEFLY describe all steps in homology modeling and mention the most serious problems encountered at each step.
See the homology modelling seminar, sorry, I am not going to type the whole story again.
You get the freshly determined coordinates of an endo-glycanase. Mention several possible ways in which a bioinformatician can determine the active site (residues). (And which active site residues do you expect?)
I would first run BLAST against SwissProt and see if the sequence or any of its close homologs has the active site annotated by the SwissProt experts. But if that doesn't work, make a multiple sequence alignment and look for the conserved residues. If there are too many conserved residues, see if the structure is known or build a model. The conserved residues at the bottom of the biggest surface dent are your best bet (because ligand binding goes best if water is freed up to gain entropy of water).
What can you tell about this peptide: MNNSAKALTRRGGALTLLAIVLLTLWAIVFMLLLIAFFGGSADA A proteomics experiment indicates that this peptide is 79.9 daltons too heavy (i.e. phosphorylated). Which residue holds this PO₄ group (and why)?
The stretch ALTLLAIVLLTLWAIVFMLLLIA is a transmembrane helix. There are three positive residues in KALTRR, so KALTRR is at the inside (cytosolic side of the membrane) because of the positive-in rule. PO₄ normally is bound to a serine (albeit that it can also bind to Thr or Tyr). The only two candidates are the serines in NNSAK and GGSAD. As phosphorylation is a cytosolic process, the serine you are looking for is the first one.
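The reasoning in this answer can be mimicked with a few lines of code. The sketch below is only an illustration: it uses the Kyte-Doolittle hydropathy scale and a 19-residue sliding window (both my assumptions, the answer above was reached by eye), then applies the positive-in rule by counting Lys/Arg on either side of the most hydrophobic window. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (assumed here; any hydropathy scale would do).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def best_window(seq, window=19):
    """Return (start index, mean hydropathy) of the most hydrophobic window."""
    best_start, best_score = 0, float("-inf")
    for start in range(len(seq) - window + 1):
        score = sum(KD[aa] for aa in seq[start:start + window]) / window
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

def positive_count(fragment):
    """Number of Lys and Arg residues in a flanking fragment."""
    return sum(fragment.count(aa) for aa in "KR")

def cytosolic_serines(seq, window=19):
    """1-based positions of serines on the flank with the most K/R."""
    start, _ = best_window(seq, window)
    n_flank, c_flank = seq[:start], seq[start + window:]
    # Positive-in rule: the flank with more positive charges is cytosolic.
    if positive_count(n_flank) >= positive_count(c_flank):
        return [i + 1 for i, aa in enumerate(n_flank) if aa == "S"]
    offset = start + window
    return [offset + i + 1 for i, aa in enumerate(c_flank) if aa == "S"]

peptide = "MNNSAKALTRRGGALTLLAIVLLTLWAIVFMLLLIAFFGGSADA"
print(best_window(peptide))
print(cytosolic_serines(peptide))
```

Real transmembrane predictors are considerably more careful than this; the point is only that the hydrophobic stretch and the charge asymmetry are both easy to detect automatically.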
A weird bacterium lives a normal life at pH 4.8 in a lake with pH 7.2. This bacterium uses the antibiotic peptide: NNGLLLAILMLSLLLAAIVVLLGDGDGNPPP to kill other bacteria that compete for its food. It stores these peptides in a peptosome that, if need arises, quickly presents these peptides to a transfer system that (one by one) brings the peptides across the membrane.
a) Guess how much energy is (minimally) needed per peptide transfer.
b) Describe in some detail how you would calculate this delta-G with a molecular dynamics program.
In 2011 this is no longer part of the course, but you should still know that the rule of 10 exists (see the video on this topic in the video section). If you pump a charge into a gradient, you pay 1 kcal/mol per pH unit difference per charged residue (i.e. per charge that goes into the gradient; charges that go with the gradient can pay back that energy). The MD story was dropped from the curriculum (although the scheikos should remember it from their version of bioinformatics 1...).
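The arithmetic behind this rule is short enough to write down. The sketch below is an assumption-laden illustration rather than the official answer: it counts only Asp, Glu, Lys and Arg side chains as charges, ignores the termini and histidine, and treats every charge as going into the gradient (the "pay back" case is not modelled).

```python
# Side chains counted as charged; His and the chain termini are ignored (assumption).
CHARGED = set("DEKR")
KCAL_PER_CHARGE_PER_PH_UNIT = 1.0  # the rule of thumb quoted above

def transfer_cost(peptide, ph_from, ph_to):
    """Rough minimal cost in kcal/mol of moving one peptide across the pH step."""
    n_charges = sum(aa in CHARGED for aa in peptide)
    return n_charges * abs(ph_to - ph_from) * KCAL_PER_CHARGE_PER_PH_UNIT

peptide = "NNGLLLAILMLSLLLAAIVVLLGDGDGNPPP"
print(transfer_cost(peptide, 4.8, 7.2))
```

With two aspartates and a difference of 2.4 pH units this comes out just under 5 kcal/mol per peptide.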

This was Vriend's theoretical part of the exam in 2006.

Feel free to answer in Dutch, English or German. I can only give points for an answer if I can read it. I can only overlook errors when I can read it very easily. When in doubt, the shorter answer is virtually guaranteed better. The amount of white space is indicative for the amount of words I think you need for the answer. When an explanation of the answer is not explicitly given, it seems wise to only give an explanation when you are not sure about your answer.

A threonine is buried deeply inside the protein protalionase. Its Oγ doesn't make a hydrogen bond. Which is the best mutation for improving the stability of protalionase? Briefly describe why.
How do you make an antibody against the toxin of the Texan desert snake? Describe which bioinformatics tools are needed in the process.
Why do all good secondary structure prediction methods use multiple sequence alignments as input?
Why does a salt bridge care (much) less about the inter-atomic distance than a Van der Waals interaction?
What types of motion are important for enzyme activity?
Which is the driving force to keep membranes intact?
The bacterial extracellular paravilon receptor has a sequence that starts with GKNRSKTLLLAILWYLSLLALIMLFFACWLLAINGDSDNG.... This is the major fragment that is always found in proteomics experiments. Sometimes, in those proteomics experiments, this same fragment is found, but about 80 Dalton too heavy. That must mean phosphorylation. But which serine is phosphorylated? Briefly explain why it is not any of the other two.
The following sequence fragment was found to contain the active site of Cyclomaltodextrin glucanase. Which are the (two most important) active site residues? Explain briefly. ....LVGGNTSGDVTIKVESGNSPDLALRAALELAGGSNSEVTVEVTGDSGNRTK....
Why are active sites always located at the bottom of a dent or cave?
Why do many transmembrane helix prediction programs often predict too short helices?
Which are the preferred residues (side chains) to bind Zinc in a protein? And which for Calcium?
Why do small proteins often have more cysteine bridges than big proteins? Think of ALL aspects of this problem (this problem might have more angles and viewpoints than you might initially think).
Your boss wants you to write a secondary structure prediction program. He suggests you use the Chou and Fasman method (you know, the one that relies on one parameter per amino acid per secondary structure type). How would you proceed? Don't skip the details!
Why are recognition sites in DNA often more AT-rich than CG-rich?
Why is Asp a 'better' active site residue than Asn?
Mention two ways to computationally detect cysteine bridges.

Gert Vriend's theoretical part of the exam in 2008

What does the active site of the average Zn-protease look like, and how does a Zn-protease work?
The Zn is normally bound by more than one histidine and occasionally also 'something with an oxygen' (Glu, Asp, Tyr, etc.). The Zn tends to be involved in activating the water molecule (splitting it and stabilizing the 'half' waters).
Mention several ways to find out which cysteines are bridged in a protein.
1) Look it up in SwissProt; 2) Load its structure in YASARA and look for bridges; 3) Build a homology model and proceed as in step two; 4) Do a multiple sequence alignment and look for pairs of cysteines that show correlated behaviour (i.e. that are present together and absent together); 5) Search the internet for cys-cys bridge predictors; 6) I know it is dirty, labour intensive, smelly, and painful, but you could of course try to measure it in the lab.
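Option 4 is easy to try out. The sketch below assumes the alignment is given as a list of equal-length strings and simply scores, for every pair of columns that contain a Cys somewhere, how often the two columns agree on having or not having a Cys. The toy alignment and the function names are invented for illustration.

```python
from itertools import combinations

def cys_columns(alignment):
    """Columns in which at least one aligned sequence has a Cys."""
    length = len(alignment[0])
    return [c for c in range(length) if any(s[c] == "C" for s in alignment)]

def co_occurrence(alignment, col_a, col_b):
    """Fraction of sequences in which both columns agree on Cys presence."""
    same = sum((s[col_a] == "C") == (s[col_b] == "C") for s in alignment)
    return same / len(alignment)

def candidate_bridges(alignment, threshold=1.0):
    """Column pairs (0-based) whose cysteines appear and disappear together."""
    pairs = []
    for a, b in combinations(cys_columns(alignment), 2):
        if co_occurrence(alignment, a, b) >= threshold:
            pairs.append((a, b))
    return pairs

# Hypothetical toy alignment, not real sequences.
toy = [
    "ACDEFCGHCK",
    "ACDEFCGHSK",
    "ASDEFSGHCK",
    "ASDEFSGHSK",
]
print(candidate_bridges(toy))
```

Note that fully conserved cysteines agree with each other trivially, so this kind of score only separates bridges when some sequences have lost a pair.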
A cysteine in the following peptide is myristoylated, and an asparagine that is very far away from it in the sequence is glycosylated. Underline the myristoylated cysteine and the glycosylated asparagine. TLSNATCSGLWILAMVLLAMILSLAMVVLAMTRKAACGNATAQAG
The bit LWILAMVLLAMILSLAMVVLAM is a transmembrane helix. The RK after this helix are positive charges and thus cytosolic. Glycosylation is an extracellular process, and thus happens at the other side of the helix from RK. So, the N in SNA is glycosylated and the C in ACG is myristoylated.
If you want to stabilize a protein by the introduction of a hydrogen bond you must make a much more precise prediction than when you want to stabilize a protein by the introduction of a salt bridge. Why?
Hydrogen bonds require proper orbital overlap, so the geometry has to be very precise, while a salt bridge works with q₁ * q₂ / r in which r is the distance between the charged groups q₁ and q₂. A 1/r relation is much more fault tolerant than orbital overlap that deals with fractions of Ångströms.
Mention a few terms (formulas not really needed) that are used in the force field of a molecular dynamics software package. And mention a few terms that generally are not yet in use in such a force field.
In: Bond lengths, bond angles, torsion angles, VdW and charge-charge interactions; Sometimes also in: H-bonds, planarities. Out: pi-pi stacking, induced polarities, quantum related effects.
A metabotropic glutamate receptor has seven transmembrane helices and a big extra-cellular N-terminal domain of a few hundred amino acids. Which is the dominant driving force that keeps that extra-cellular domain together? And which is the dominant force to keep the transmembrane helices together?
The entropy of water and the entropy of lipids, respectively.
The four cysteines in this protein are bridged 1-3 and 2-4. One of the four cysteines must be mutated so that a free cysteine is left to which a label can be attached. From a protein stability point of view, which cysteine bridge will you destroy and which one will you leave intact? And Why?
```
    LVGGNCSGDVTEVTVEVTGDSIKVESGNSPDLACRCALECAGGSNSGNRTK
    .....1...........................2.3...4...........
  
```
I added the bar and the numbers as part of the answer. The stabilizing effect of a cysteine bridge is bigger when the cysteines are further away from each other in the sequence. So, don't touch the bridge 1-3.
Mention at least five (different) roles for water in living cells.
The entropy of water is the driving 'force' for many biological processes including keeping proteins folded; Water is the solvent in which everything happens; Water is involved in nearly all enzymatic reactions, if not as a substrate or product, then as a part of the catalysis; Water can be part of recognition, like bridging hydrogen bonds between protein and DNA; Water often also sits inside proteins to keep hydrogen-bonding groups as satisfied as is possible.
The protein blablase has the sequence:
```
     L E A L M L G P V T I T V T I
     1 2 3 4 5 6 7 8 9 0 a b c d e
     H H H H H H L L S S S S S S S
 
```
Draw a fancy Ramachandran plot, and place the digits that I listed underneath the sequence at plausible locations in the plot.
First predict its secondary structure (I added that as part of the answer...). The Hs get crosses at around -50,-50 in the helix area, the strands at around -150,150, and the turn/loop GP gets two crosses in any of the three areas (helix, strand, left-handed helix). However, the G cannot be helical, and the P cannot be strand.
Why do we find more A and T in promoter binding sites than C and G?
In promoter regions DNA has to wind/unwind/open up. AT has only two H-bonds, and thus can more easily undergo structural changes than GC.
The 20 normal amino acids together have in total 163 atoms. If we want to make a very fancy force field to determine the all-atom contact energy, we can do that by making for each of the 163*163 possible inter-atomic interactions (actually we need to do roughly half of those, of course) a histogram in which we represent how often two atoms were found at a distance between 2.5 and 2.6 Angstrom, how often they were found at a distance between 2.6 and 2.7 Angstrom, etc.
If we really were to do this and count all the distances in 25000 proteins in the PDB, what would the plots look like for the contacts between backbone carbonyl oxygens and the O-gamma of serine? And what would the plot look like for the contacts between the alanine C-beta and the methionine C-epsilon? I want you to draw both distributions as curves (not as histograms, that is too much work and becomes too unclear). Draw them both for the distance range 2-10 Angstrom in the same plot so that the lines (essentially) overlap at higher distances. Draw the carbonyl-serine line solid and the alanine-methionine line finely dashed.
This is no longer part of the course in 2011.

Gert Vriend's practical part of the exam in 2008

(There will be no practical part in 2011)

Below two questions are listed. Answer only one of the two.
1) Uracil-DNA Glycosylase
a) What is the function of this molecule?
b) Describe its structure?
c) Are you surprised by the location of the active site? And why, or why not?
d) Many Uracil-DNA Glycosylases contain an iron-sulfur cluster. What is the role of this cluster?
e) Describe and schematically draw the iron-sulfur cluster.
2) ZIF268 ZINC FINGER-DNA COMPLEX
a) What is the function of this molecule?
b) Why is the DNA recognition of this protein more specific than that of many other DNA binding proteins?
c) What is the role of the Zinc atoms in this molecule?
d) Mention at least ten protein-DNA contacts that ZIF268 'uses' to achieve specificity.
e) Describe and schematically draw the direct environment of the middle of the 3 Zn ions.

And here is the 2009 exam, with video answers.
Here one question from 2009 that is likely to be used again one day...

2011 version with answers.

2013 version with answers.

Some hints for the infamous amino acid test

This section is mainly meant to help the Scheikos who had the misfortune of never doing the amino acid test in their life yet, but perhaps others might like this page too.

The main concept is that:
If you understand the amino acids,
you understand everything.

First the infamous amino acid exam (that is guaranteed to be part of the amino acid test for the scheikos):

If you want to know more about the amino acids, you might want to take a look at the following twenty (short) videos that each tell you a few things about one amino acid. These are the special characteristics (that is the right-hand side column in the amino acid test):

Click on the picture to make it rotate. Click on the camera for a small video on that amino acid.

Alanine
Cysteine
Aspartic acid
Glutamic acid
Phenylalanine
Glycine
Histidine
Isoleucine
Lysine
Leucine
Methionine
Asparagine
Proline
Glutamine
Arginine
Serine
Threonine
Valine
Tryptophan
Tyrosine

The other columns

Amino acid names

You will need to learn the names of the amino acids by heart. I am sorry, I know how bad this feels, but trust me, it will help you a lot during the rest of the course.

Amino acid size

The amino acids can be sorted by size as:

GASCTVPLINDMQEKHRFYW

In this list we simply counted non-hydrogen atoms. Obviously, the S atoms are much heavier than the C, O, and N atoms, and we don't count protons, but I think this is precise enough to understand what you are looking at when you see a protein, and intelligently looking at and computing on protein structures are the main topics of the course. You can also look at the masses of the residues, which will give a slightly different picture. I personally prefer to remember the word LIND (coloured red in the amino acid series GASCTVPLINDMQEKHRFYW) for the intermediately large ones. As it is easy for a scheiko to at least remember the approximate structures of the amino acids, it should be easy to remember which amino acids are small and which are big. Obviously, it would also be OK to use: GASTVPCLINDMQEKHRFYW, GASTVPCLINDMQEKHRFYW, or other variants. But the solution GASTVPCLINDMQEKHRFYW, although correct, will not give you many points.
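The counting itself can be checked with a small sketch. The table below holds the number of non-hydrogen atoms per residue, backbone N, Cα, C and O included; whether the backbone is counted does not change the order. The helper name is made up for illustration.

```python
from itertools import groupby

# Non-hydrogen atoms per residue, including the four backbone atoms.
HEAVY_ATOMS = {
    "G": 4, "A": 5, "S": 6, "C": 6, "T": 7, "V": 7, "P": 7,
    "L": 8, "I": 8, "N": 8, "D": 8, "M": 8, "Q": 9, "E": 9,
    "K": 9, "H": 10, "R": 11, "F": 11, "Y": 12, "W": 14,
}

def size_classes(order="GASCTVPLINDMQEKHRFYW"):
    """Group consecutive residues in the series by their heavy-atom count."""
    return [(n, "".join(group)) for n, group in groupby(order, key=HEAVY_ATOMS.get)]

for n_atoms, residues in size_classes():
    print(n_atoms, residues)
```

Residues that share a count are interchangeable within their group as far as pure atom counting goes, and LINDM form one such group of eight atoms each.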

Amino acid hydrophobicity

Hydrophobicity is a difficult concept, related to the entropy of water. But for the amino acid test you need to remember that FCVIPWALM are hydrophobic, YST have intermediate hydrophobicity, and the rest (DENQHRK) are hydrophilic. The only exception is G that has no side chain, and thus has a hard to define hydrophobicity (undetermined).
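For those who prefer to see the four classes as a lookup table, here is a minimal sketch. The class labels follow the text above; the helper names and the example fragment (the transmembrane stretch from the 2005 exam) are chosen only for illustration.

```python
HYDROPHOBICITY = {}
for residues, label in (
    ("FCVIPWALM", "hydrophobic"),
    ("YST", "intermediate"),
    ("DENQHRK", "hydrophilic"),
    ("G", "undetermined"),
):
    for aa in residues:
        HYDROPHOBICITY[aa] = label

def classify(seq):
    """Hydrophobicity class per residue, in sequence order."""
    return [HYDROPHOBICITY[aa] for aa in seq]

def hydrophobic_fraction(seq):
    """Fraction of residues in the 'hydrophobic' class."""
    return classify(seq).count("hydrophobic") / len(seq)

print(hydrophobic_fraction("ALTLLAIVLLTLWAIVFMLLLIA"))
```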

Amino acid charge

The charges of the negative residues are known once you know their names (aspartic and glutamic ACID). The other three you will have to learn by heart: R and K are always positive, and H can be positive, neutral, or negative. At physiological pH histidine is normally neutral (90%), often positively charged (9%), and occasionally negatively charged (1%) in your body.

Amino acid secondary structure preference

There exists an audio seminar on this topic. But you can also try to remember

Helix : AMELK
Strand: VITWYF
Bturn : PSDNG

And those are pronounceable words...
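These three words are enough for a very naive per-residue guess. The sketch below just looks up each residue in the three words; it uses no window and no multiple sequence alignment, so it is nothing like a real prediction method. The function name and the one-letter output codes (H, S, T, and - for residues in none of the words) are my own choices.

```python
PREFERENCE = {}
for residues, code in (("AMELK", "H"), ("VITWYF", "S"), ("PSDNG", "T")):
    for aa in residues:
        PREFERENCE[aa] = code

def naive_preference(seq):
    """Per-residue preference: H helix, S strand, T turn, - no stated preference."""
    return "".join(PREFERENCE.get(aa, "-") for aa in seq)

# The blablase fragment from the 2008 exam.
print(naive_preference("LEALMLGPVTITVTI"))
```

For this fragment the lookup gives six helix residues, two turn residues and seven strand residues, in line with the H/L/S line given with that exam question.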

Closing remarks

Please be aware that your biochemistry books were written by non-bioinformaticians. I know there are many books going around at the science faculty that list, for example, cysteine as hydrophilic. And that is wrong. And if you are interested, feel free to poke around at:

Amino acid background material

The table below lists 5 items. You should now briefly look at this material, and study it either at your leisure at home, or when you need it while going through the questions later today or in the coming weeks. The five items are:

Once more the amino acids. This is just one picture with the covalent structures of the amino acids shown. You can use this picture if you forgot to take the orange NBIC amino acid sheet.
Physico-chemical properties. This is a pointer to a long, long list of tables with amino acid properties. In this physico-chemical properties section you can for now skip most parts, but you should look at the parts about solvent accessibility (1.5), chemical classification (3.1), hydrophobicity scales (3.2), and the genetic code (5.1). The other sections will be discussed later.
The bedtime story about the amino acids explains the functional characteristics of the amino acids. This part overlaps with the seminar, but here you get it explained by somebody else, and sometimes that helps clarify things.
Information for the real fanatics. This forms nice reading, but does not contain material required to do well in the examination.
The infamous amino acid test is listed so that you can print it when needed/desired.

Some old exams

Putative exam questions

This is a list of questions that I can imagine to end up in future exams...

Mutations and stability

Give me five very different reasons why somebody might want to mutate a protein.
When we want to make a protein more stable, we can use the concepts entropic stabilisation and enthalpic stabilisation. a) Explain what is meant with these two terms. b) What are the differences, and what do they have in common?
I want to make my protein more stable, so I decide to mutate a very exposed isoleucine into an aspartic acid, to make the surface of my protein more hydrophilic. Explain why this is a stupid plan.
What is helix capping? And how can it be used to make a protein more stable by mutagenesis?
When working on increasing the stability of my protein by mutagenesis, I often consult a table that holds all the (backbone) torsion angles. Why?
What is a rotamer? And what makes that certain residues in certain secondary structures have only a limited number of rotamers available to them?
The stability of a protein normally is defined as the ΔG of the U<-->F process in which U stands for fully unfolded and F stands for folded protein. For industrial protein applications this often is not a good definition. Why not?
If I want to make a protein more stable I can try to make mutations that add extra hydrogen bonds. I can also try to make mutations that add more salt bridges, and that is easier than doing it by making hydrogen bonds. Why?

DNA and RNA

DNA normally has a major groove and a minor groove. What is the difference between those grooves?
Describe some protein-DNA interactions (in detail, including the names of the atoms involved) that can contribute to protein - DNA binding specificity.
When transcription factors bind to DNA, they need to do that with some specificity. Explain, for example for the TATA-box binding protein, how this specificity is achieved.
Which atoms are involved in H-bonding in Watson Crick base pairing?
A protein specifically recognizes the human DNA sequence ACCAC (and its complementary strand TGGTG). a) At how many places can this protein bind to the human DNA? b) This protein nevertheless binds the DNA rather specifically. How is that possible?
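For the counting part of the last question a back-of-the-envelope estimate is all that is needed. The sketch below assumes a haploid genome of roughly 3.1 billion base pairs and equal, independent base frequencies; both are simplifications of mine, and the function name is hypothetical.

```python
def expected_sites(k, genome_length=3.1e9, both_strands=True):
    """Expected number of occurrences of one specific k-mer in random DNA."""
    positions = genome_length - k + 1      # possible start positions per strand
    per_strand = positions / 4 ** k        # each position matches with p = 1/4^k
    return per_strand * (2 if both_strands else 1)

print(f"{expected_sites(5):.2e}")
```

Whatever the exact genome size, a five-base site is expected millions of times, which is why the second half of the question is the interesting one.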

Amino acids structure and function

Aquaporins will let water through in two directions. a) What determines in which direction the water will go through? b) How come aquaporin lets water through at a high speed, and potassium and sodium, for example, almost not at all? c) Part of the selectivity of aquaporin is caused by interaction with the water dipole; which residue (type) has the crucial role in this dipole interaction? d) PS, what is a dipole? e) Draw the dipole of water and draw the dipole of this one special residue.
a) Why do we normally see two aspartates involved in binding Ca²⁺, and ten times less often two glutamates? b) How often do you expect to see a Ca²⁺ bound by two glutamates? c) Why? (PS, obviously there are more atoms involved in the binding, but I only look at the negatively charged ones for this question).
Nature often wants to temporarily store an electron on a metal ion. That doesn't just happen like that. Proteins that hold the metal ions on which the electron sits for a while need to do something to compensate. Which tricks do the proteins have up their sleeve to accommodate electron storage?
Why does nature often use copper or iron ions to 'store' electrons in electron transport processes, and why not sodium or calcium?

Protein details

What is the difference between a bond angle and a torsion angle?
Why do we have the term n in the energy term for torsion angles V = K*(1+cos(n*φ-φ₀)), and which values can it have when we do MD on a normal protein?
In the Ramachandran plot for all aspartic acids in the PDB I see more crosses outside the contour lines than in the Ramachandran plot for all glutamic acids. Why?
Draw the backbone of gly-gly-gly. Indicate φ, ψ, and ω, and the atoms involved in each of them.
Why is a salt bridge between residues 17 and 43 more important for stability than a salt bridge between residues 23 and 34?
What formula is used in molecular dynamics and energy minimisation calculations to calculate the energy contribution of a salt bridge to the total energy of the molecule?
Why does the contribution of a salt bridge to the stability of the protein get less when we dissolve the protein in 1 M NaCl?
The formula for the electrostatic term in molecular dynamics and energy calculation software contains some ε (epsilon) terms. What do these terms stand for? If I add salt to the protein (in the simulation), will that make salt bridges stronger or weaker? And what should happen to the ε term(s) to account for the salt? And does this all make sense in terms of what you know from the physical aspects of things?
If an enzyme cleaves a dipeptide, many things are mobile and many things move. Can you describe the whole cleavage process mentioning every type of motion along the path?
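Two of the energy terms that appear in the questions above are simple enough to play with numerically. The sketch below implements the torsion term V = K*(1+cos(n*φ-φ₀)) and a plain Coulomb term with a single relative dielectric constant ε. The conversion factor of about 332 kcal·Å/(mol·e²) is the usual one for charges in electron units and distances in Ångström; the parameter values in the example calls are arbitrary illustrations, not force-field parameters.

```python
import math

COULOMB_CONST = 332.06  # kcal*Angstrom/(mol*e^2)

def torsion_energy(phi_deg, k, n, phi0_deg=0.0):
    """V = K * (1 + cos(n*phi - phi0)), with angles given in degrees."""
    return k * (1.0 + math.cos(math.radians(n * phi_deg - phi0_deg)))

def coulomb_energy(q1, q2, r, epsilon=1.0):
    """Charge-charge energy in kcal/mol for charges in e and r in Angstrom."""
    return COULOMB_CONST * q1 * q2 / (epsilon * r)

# n sets how many minima the torsion term has per full rotation.
for phi in (0, 60, 120, 180):
    print(phi, round(torsion_energy(phi, k=1.0, n=3), 3))

# The same ion pair at 4 Angstrom with a low and a high dielectric constant.
print(coulomb_energy(+1, -1, 4.0, epsilon=4.0))
print(coulomb_energy(+1, -1, 4.0, epsilon=80.0))
```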

Interactions

Give me a series of very different roles for water in a cell.
What are the roles of sugars in a cell?
Why do we know very much about the structure(s) of proteins, much less about the structures of RNA molecules, and barely anything about the structures of sugars and membranes?
When I see a sodium ion in a PDB file, I get worried that it might not be a sodium at all. Why do I get worried?
When I see an Rh, Pd, Ta, or Os ion in a PDB file I get worried. Why?
Which trick do ion channels use to make sure they only let through their own ion, and not just anything?
Why are many potassium channels actually tetrameric molecules?
Why do we know sodium pumps and potassium pumps, but not water pumps? We need to transfer not only sodium and potassium over membranes, but also water, don't we?
I see electron density for an ion that is bound by Asp 17 Oδ1, Ala 123 C=O, Gly 64 C=O, and three waters. I see density for another ion that is bound by His 34 Nδ1, His 38 Nε2, Ser 111 Oγ, Glu 117 Oε2, and one water. And I see density for a third ion that is bound by Asp 8 Oδ1, three waters, Thr 14 Oγ, and Glu 22 Oε1.
I know from proteomics measurements that the ions must be sodium, magnesium, and zinc. Which of the three sites holds which of the three ions?
If I see an ion in an active site, it normally is a Zn, Fe, or Cu, and essentially never a Na, K, or Ca. Why?

Protein structure and validation

How do X-ray crystallography and NMR work? What kind of data is collected? How is that data converted into structure coordinates?
What are the advantages and disadvantages of NMR and X-ray when determining protein structures?
When we use either NMR or X-ray to determine the structure of a protein, we make different types of experimental errors (both different types of systematic errors and different types of random errors). Why? And which are the more significant errors that are radically different for both methods?
What is an R-factor? And what is a B-factor?
The WHAT_CHECK software checks if ions are correct or actually should be another ion. Why? I.e. how come something as important as an ion can be measured incorrectly by X-ray or NMR?
Several structure validation programs provide lists of residues that should be flipped. What is meant with a flip, and how come such errors are being made upon solving structure coordinates experimentally?
What is a helix dipole? Is a helix dipole good for the stability of the folded protein or bad? What does nature use the helix dipole for?
What is helix capping?

Bioinformatics techniques

(In this section numbers do not need to be correct, but they must be plausible).

Mention five different computational techniques that people have used over the years to predict the secondary structure of proteins.
What is the positive-in rule? And when does a bioinformatician use it?
Which forces are commonly applied in molecular dynamics software?
What is the 'time step' in molecular dynamics? And what is a good value for the time step?
What is the average speed of a Leucine Cδ atom in a protein (in m/sec)? So, how far does this atom move in one MD time step?
How does molecular dynamics work? Explain what the computer program does per time step, and explain what happens from time-step to time step.
What is the definition of a force field?
How would you make a force field to predict transmembrane helices in protein sequences? (If the answer does not hold the words 'null-model' and 'validation or calibration of the method' it will be wrong...).
If I want to predict the secondary structure of water soluble proteins, I normally use a machine learning technology called Artificial Neural Network. If I want to predict whether a helix is transmembrane or not, I prefer using a Support Vector Machine. Why do I use such different machine learning techniques for two seemingly similar problems?
Try to think of three biological questions that require a machine learning based approach to get them answered.
What is the difference between a Decision Tree and a Random Forest?
How would you predict the secondary structure of proteins with an Artificial Neural Network? I.e. what would the input look like? What would the neural network look like? How would you train the neural network? How would you test the neural network?
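For the molecular dynamics questions in this section it can help to see the bare mechanics once. The sketch below is a minimal illustration under my own assumptions: it estimates a thermal speed from equipartition for a bare carbon atom at 300 K (ignoring the attached hydrogens), and it integrates a single particle in one dimension with the velocity Verlet scheme, which is one common integrator and not necessarily the one used in any particular MD package. The 1 fs time step used in the print statement is likewise just a commonly quoted value.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
AMU = 1.66053907e-27  # atomic mass unit in kg

def rms_speed(mass_amu, temperature=300.0):
    """Root-mean-square thermal speed in m/s from (1/2) m v^2 = (3/2) k T."""
    return math.sqrt(3.0 * K_B * temperature / (mass_amu * AMU))

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; force(x) returns the force at position x."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
    return x, v

speed = rms_speed(12.0)
print(speed, "m/s;", speed * 1e-15, "m travelled in 1 fs")

# Toy harmonic 'bond' in reduced units: the total energy should stay near 0.5.
x, v = velocity_verlet(1.0, 0.0, lambda pos: -pos, mass=1.0, dt=0.01, n_steps=1000)
print(0.5 * v * v + 0.5 * x * x)
```

Per time step the program therefore does the same three things over and over: compute forces from the current positions, update velocities, and move the atoms a tiny bit.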

Non-exam questions that you should be able to answer anyway...

How many virus particles flow under the Waal-bridge per day? And how big is the total volume of those particles? Despite all these viruses, we can swim in the Waal in the summer without (normally) getting sick. Why?