Answer:


The main problem with this exercise is knowing what we are actually doing.

Typically, students run BLAST eight times in ten minutes, see that sequence 7 has an annotation that matches the disease, and stop there.
However, we don't know whether the disease is monogenic or whether two or more genes are involved. So they should look carefully at each of the eight proteins!

It also always needs explaining that we use BLAST to check whether each sequence is 100% identical, over its full length, to a human SwissProt file. SwissProt holds the protein sequences of proteins extracted from healthy individuals, so any difference between the Query sequence and its best SwissProt BLAST hit points to a potential candidate.
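The residue-by-residue check described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the Query and its best SwissProt hit are already available as plain one-letter strings of equal length that align without gaps; it does not run BLAST, and the function names `substitutions` and `is_full_length_identical` are hypothetical.

```python
from __future__ import annotations


def substitutions(query: str, reference: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference residue, query residue) for each mismatch.

    Assumption: both sequences are aligned without gaps and have the same length,
    e.g. a full-length BLAST hit with no insertions or deletions.
    """
    if len(query) != len(reference):
        raise ValueError("this sketch only handles gapless, equal-length alignments")
    return [
        (pos, ref_res, q_res)
        for pos, (ref_res, q_res) in enumerate(zip(reference, query), start=1)
        if ref_res != q_res
    ]


def is_full_length_identical(query: str, reference: str) -> bool:
    """True only if the Query matches the reference over its entire length."""
    return query == reference
```

For example, with invented toy sequences, `substitutions("MKALA", "MKVLA")` returns `[(3, "V", "A")]`, which reads as V -> A at position 3, the same notation SwissProt uses in its variant features. Sequences with insertions or deletions need a proper alignment (such as the one BLAST reports) before a comparison like this makes sense.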

The low-complexity filter should not be used, because many disease-causing mutations (including deletions and insertions) tend to sit in low-complexity regions. An example is the variation in poly-Gln length in huntingtin that causes Huntington's disease.
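Because such regions are exactly where the interesting differences may hide, it can help to inspect them directly. The sketch below simply finds the longest uninterrupted run of one residue type (Gln by default) so that run lengths in the Query and in the SwissProt sequence can be compared. It is a plain run-length scan, assumed here for illustration, and not a replacement for BLAST's own low-complexity filter; the name `longest_run` is hypothetical.

```python
import re


def longest_run(sequence: str, residue: str = "Q") -> tuple[int, int]:
    """Return (1-based start, length) of the longest uninterrupted run of `residue`.

    Returns (0, 0) if the residue does not occur in the sequence.
    """
    best_start, best_len = 0, 0
    # Each match is one maximal stretch of the residue, e.g. "QQQQQ".
    for match in re.finditer(f"{re.escape(residue)}+", sequence):
        run_len = len(match.group())
        if run_len > best_len:
            best_start, best_len = match.start() + 1, run_len
    return best_start, best_len
```

For the toy sequence `"MAQQQLQQQQQP"`, `longest_run` returns `(7, 5)`: a run of five Gln residues starting at position 7. A different run length between Query and reference would show up immediately.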

Hiding place of the sequences

The sequences are stored in the file BIGONE in the directory swift.cmbi.umcn.nl/teach/geheim/.

Sequence 1

100% identical to hint1_human. But: the first Met is missing in our Query sequence. The feature table in hint1_human tells me:

init_met 	1 	1 	1 	 Removed.

which indicates that the absence of the Met is normal. So we can most likely forget about sequence 1. But if we don't find a good candidate further down the list, we should study this missing Met in more depth.

Sequence 2

At position 47 of hyalp_human, our Query has an A where the SwissProt file has a V. So, is this our candidate? The protein name in the SwissProt file tells us:

Protein name 	Hyaluronidase PH-20 (Hyal-PH20), EC:3.2.1.35
Hyaluronoglucosaminidase PH-20
Sperm surface protein PH-20
Sperm adhesion molecule 1

which suggests that this is a typically male protein. And even though most women claim that most men cannot listen, that still doesn't make a sperm protein a good cause of deafness. Furthermore, the features in the SwissProt file tell me:

variant 	47 	47 	1 	 V -> A (in dbSNP:rs34633019).
/FTId=VAR_049213.

which suggests that this is a normal natural variant, the result of a known SNP.

Sequence 3

There is one amino acid that differs from grm8_human: our Query sequence has an X at position 895. An X means that the residue is unknown. Probably the sequencing was done in only one direction, and then it is not unlikely that this far from the start (residue 895 corresponds to roughly 2,700 bases), the signal-to-noise ratio sometimes gets too low to determine the base type.
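An X should therefore not be counted as a real mutation. The sketch below finds unknown positions, checks whether a Query matches its reference everywhere except at X positions, and converts a residue number into the bases of its codon. It is a minimal sketch, assuming gapless equal-length sequences and a coding sequence that starts at base 1; all three function names are hypothetical.

```python
def unknown_positions(sequence: str) -> list[int]:
    """1-based positions where the residue is 'X' (unknown)."""
    return [pos for pos, res in enumerate(sequence, start=1) if res == "X"]


def identical_except_unknown(query: str, reference: str) -> bool:
    """True if both sequences have the same length and differ only where the Query has X."""
    return len(query) == len(reference) and all(
        q == r or q == "X" for q, r in zip(query, reference)
    )


def codon_span(residue_position: int) -> tuple[int, int]:
    """First and last base (1-based) of the codon for a 1-based residue position.

    Assumption: the coding sequence starts at base 1, three bases per residue.
    """
    start = 3 * (residue_position - 1) + 1
    return start, start + 2
```

For example, `codon_span(895)` returns `(2683, 2685)`, so the unknown residue in sequence 3 sits about 2,700 bases into the coding sequence, far enough for a single-direction read to lose quality.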

Sequence 4

This sequence is 100% identical to ndua5_bovin. And that means cow?! I guess that the person who gave us the sequences had been fiddling with them, and made one little mistake when cutting and pasting them into the mail to me... No other explanation comes to (my) mind.

Sequence 5

100% identical to 1v403_human. But: the first ten residues of our Query sequence are missing. That can be a PCR problem, alternative splicing, a missed exon, or a cut-and-paste error (many programs list protein sequences in blocks of 10...).

Sequence 6

100% identical to dss1_bovin AND to dss1_human. This is surprising in itself, but no cause of deafness. But: our Query sequence contains only the first 45 of the expected 70 amino acids. Alternative splicing, a missed exon, human error?!
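Sequences 1, 5 and 6 share a pattern: the Query is an exact, contiguous piece of the SwissProt sequence with residues missing at one end. The sketch below reports how many residues are missing at each terminus. It is a minimal sketch, assuming plain one-letter strings; it reports only the first occurrence if the Query happens to occur more than once, and the name `terminal_truncation` is hypothetical.

```python
from typing import Optional, Tuple


def terminal_truncation(query: str, reference: str) -> Optional[Tuple[int, int]]:
    """Return (residues missing at the N-terminus, residues missing at the C-terminus).

    Returns None if the Query does not occur in the reference as an exact substring.
    """
    start = reference.find(query)  # index of the first exact occurrence, or -1
    if start == -1:
        return None
    missing_n = start
    missing_c = len(reference) - (start + len(query))
    return missing_n, missing_c
```

With toy sequences, `terminal_truncation("KVLA", "MKVLA")` gives `(1, 0)`, the pattern of sequence 1 (missing initiator Met), while `terminal_truncation("MKV", "MKVLA")` gives `(0, 2)`, the pattern of sequence 6 (C-terminal truncation). A result like `(10, 0)`, as in sequence 5, is worth a second look for cut-and-paste errors, since many programs print sequences in blocks of 10.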

Sequence 7

"Ladies and gentlemen, we've got him." There is one mutation relative to tomt_human, and that SwissProt file holds, for example:

Keywords 	Alternative splicing; Catecholamine metabolism; Cytoplasm; Deafness; Disease mutation;
                Hearing; Membrane; Methyltransferase; Neurotransmitter degradation; Transferase; Transmembrane

Sequence 8

This sequence gives no hits. If you search with a higher E-value cutoff, or in a larger database, you might find something, but sequence 8 seems not to exist. That can be because it is part of a sequencing vector, because it is an intron, or perhaps even because the opposite strand was sequenced. There are always a thousand ways to do something wrong but only one way to do it right.

Summary

So sequence 7 is the prime candidate, but none of the others can be totally excluded. Honesty dictates that I tell you that, except for sequence 7, this whole experiment is utter fantasy, meant to show you how much can go wrong and what you should pay attention to...