Normally, students run BLAST 8 times in 10 minutes, see that sequence 7 has
an annotation that looks like the disease, and leave it at that.
Further, it always needs explaining that we are using BLAST to check whether
each sequence is 100% identical, over its full length, to a human SwissProt entry.
SwissProt holds reference protein sequences, which for this exercise we take to
come from healthy individuals. Any difference between the Query sequence and the best
SwissProt BLAST hit points to a potential candidate.
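
The comparison described here can be sketched in a few lines of Python. This is only an illustration, not part of the exercise: the function name `list_differences` and the toy sequences are made up, and the sketch assumes the Query and the hit align over their full length without gaps.

```python
def list_differences(query: str, reference: str) -> list:
    """Return (position, reference_residue, query_residue) for every mismatch.

    Positions are 1-based. Assumes an ungapped, full-length alignment,
    so both sequences must have the same length.
    """
    if len(query) != len(reference):
        raise ValueError("lengths differ; inspect the BLAST alignment instead")
    return [
        (pos, ref_aa, qry_aa)
        for pos, (ref_aa, qry_aa) in enumerate(zip(reference, query), start=1)
        if ref_aa != qry_aa
    ]


# Toy sequences (hypothetical, not taken from the exercise).
reference = "MKTAYIAKQR"
query = "MKTAYIVKQR"
print(list_differences(query, reference))  # [(7, 'A', 'V')]
```

An empty list means the two sequences are identical; every entry in a non-empty list is a position worth looking up in the SwissProt annotation.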
The low complexity filter should not be used because many disease-causing mutations (or deletions or insertions)
tend to sit in low complexity regions. An example is the poly-Gln length variation in huntingtin
that causes Huntington's disease.
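
To see why masking low complexity regions would hide this kind of difference, one can measure the longest poly-Gln stretch directly. The sketch below is a minimal illustration; the helper name `longest_run` and the toy sequences are assumptions, not data from the exercise.

```python
import re


def longest_run(sequence: str, residue: str = "Q"):
    """Return (start, length) of the longest run of `residue`, or None.

    `start` is 1-based. If several runs tie, the first one is returned.
    """
    runs = list(re.finditer(re.escape(residue) + "+", sequence))
    if not runs:
        return None
    best = max(runs, key=lambda m: m.end() - m.start())
    return best.start() + 1, best.end() - best.start()


# Toy sequences (hypothetical): the query carries a longer Gln stretch.
reference = "MATLEKQQQQPPL"
query = "MATLEKQQQQQQQPPL"
print(longest_run(reference))  # (7, 4)
print(longest_run(query))      # (7, 7)
```

A length difference like this sits entirely inside a region that a low complexity filter would mask, which is why the filter is best switched off here.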
The main problem with this exercise is knowing what we are actually doing.
We don't know whether the disease is monogenic or whether two or more
genes are involved, so students should look carefully at each of the 8 proteins!
Sequences hiding place
The sequences are stored as the file BIGONE in the directory swift.cmbi.umcn.nl/teach/geheim/
Sequence1
100% identical with hint1_human. But...: the first Met is missing in our query sequence. The Feature table
in hint1_human tells me:
init_met 1 1 1 Removed.
which indicates that the absence of the Met is normal. So, most likely we can forget sequence 1. But if we don't
find any good candidate further down the list, we should study this missing Met in greater depth.
Sequence2
At position 47 our Query sequence has an A where hyalp_human in SwissProt has a V. So, would this be our candidate? The Protein Name in the SwissProt file tells us:
Protein name Hyaluronidase PH-20 (Hyal-PH20), EC:3.2.1.35 Hyaluronoglucosaminidase PH-20 Sperm surface protein PH-20 Sperm adhesion molecule 1
which suggests that this is a typically male protein. And, even though most women claim that most men cannot listen, this still doesn't make a sperm protein a good cause for deafness. Further, the Features in the SwissProt file tell me:
variant 47 47 1 V -> A (in dbSNP:rs34633019). /FTId=VAR_049213.
which seems to suggest that this is a normal natural variant, or the result of a SNP.
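
Checking an observed substitution against the annotated variants can be automated. The sketch below assumes the feature lines look exactly like the ones quoted in this text (key, start, end, a length column, then the description); the official SwissProt flat-file layout differs, and the names `VARIANT_RE` and `known_variants` are made up for this illustration.

```python
import re

# Matches lines in the layout quoted in this text, e.g.
# "variant 47 47 1 V -> A (in dbSNP:rs34633019)."
VARIANT_RE = re.compile(
    r"^variant\s+(\d+)\s+(\d+)\s+\d+\s+([A-Z])\s*->\s*([A-Z])\b"
)


def known_variants(feature_lines):
    """Collect (position, reference_residue, variant_residue) tuples."""
    found = set()
    for line in feature_lines:
        match = VARIANT_RE.match(line.strip())
        # Keep single-residue substitutions only (start == end).
        if match and match.group(1) == match.group(2):
            found.add((int(match.group(1)), match.group(3), match.group(4)))
    return found


# Both lines are quoted from this text; the first comes from a different
# entry and only shows that non-variant lines are skipped.
features = [
    "init_met 1 1 1 Removed.",
    "variant 47 47 1 V -> A (in dbSNP:rs34633019). /FTId=VAR_049213.",
]
observed = (47, "V", "A")  # position, SwissProt residue, Query residue
print(observed in known_variants(features))  # True
```

A substitution that is already listed as a natural variant is a weak candidate; one that is not listed deserves a closer look.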
Sequence3
One amino acid differs from grm8_human. Our Query sequence has an X at position 895. An X means that the
residue is unknown. Probably the sequencing was only done in one direction, and then it is not unlikely
that, this far from the start of a read, the signal-to-noise ratio sometimes gets too low to determine the
base type.
Sequence4
This sequence is 100% identical to ndua5_bovin. And that means cow?! I guess that the person who gave us
the sequences has been fiddling with them him/herself and, upon cut-n-pasting the sequences into the mail to me, made one
little mistake... Another explanation doesn't come to (my) mind.
Sequence5
100% identical to 1v403_human. But...: our Query sequence lacks the first ten residues. That can be a PCR problem,
alternative splicing, a missed exon, or a cut-n-paste
error (many programs list protein sequences in blocks of 10...).
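
When a Query matches its hit perfectly but is shorter, it helps to state exactly how many residues are missing at each end. A minimal sketch, assuming the Query is one contiguous, gap-free piece of the reference; the function name `missing_ends` and the toy sequences are hypothetical.

```python
def missing_ends(query: str, reference: str):
    """Return (missing_at_start, missing_at_end) if `query` is a contiguous
    piece of `reference`, otherwise None.

    Uses the first occurrence of `query`, so the answer is ambiguous for
    references that contain the same stretch more than once.
    """
    start = reference.find(query)
    if not query or start == -1:
        return None
    return start, len(reference) - start - len(query)


# Toy reference (hypothetical, 20 residues).
reference = "MSDKPTREWQNLHGFAYCVI"
print(missing_ends(reference[10:], reference))  # (10, 0): first ten residues missing
print(missing_ends(reference[:12], reference))  # (0, 8): the end is missing
print(missing_ends("WWWW", reference))          # None: not a fragment at all
```

A result of (1, 0) with a Met at position 1 of the reference corresponds to the removed initiator Met case; a missing block of exactly ten residues hints at a cut-n-paste error.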
Sequence6
100% identical with dss1_bovin AND with dss1_human. This is surprising in itself, but no cause for
deafness. But...: our Query sequence holds only the first 45 of the expected 70 amino acids. Alternative splicing,
missed exon, human error?!
Sequence7
"Ladies and gentlemen, we've got him". There is one mutation relative to tomt_human, and that SwissProt file holds, for example:
Keywords Alternative splicing; Catecholamine metabolism; Cytoplasm; Deafness; Disease mutation; Hearing; Membrane; Methyltransferase; Neurotransmitter degradation; Transferase; Transmembrane
Sequence8
This sequence gives no hits. When you search with a higher E value, or in a larger database, you might find something, but
sequence 8 seems to be non-existent. That can be because it is part of a sequencing vector, because it is
an intron, or perhaps even because the complementary strand was sequenced.
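
If the nucleotide read behind such a sequence were available (it is not part of this exercise as described), the wrong-strand hypothesis could be tested by reverse-complementing the read before translating and searching again. A minimal sketch; the function name and the example read are hypothetical.

```python
# Translation table for the four standard bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string.

    Characters other than A, C, G and T (for example N) are left unchanged.
    """
    return dna.translate(_COMPLEMENT)[::-1]


read = "ATGGCC"  # hypothetical read
print(reverse_complement(read))  # GGCCAT
```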
There are always 1000 ways to do something wrong but just one way to do it right.
So, sequence 7 is the prime candidate, but none of the other ones can be totally excluded. Honesty dictates that I tell you that, except for sequence 7, this whole experiment is utter fantasy, meant only to show you everything that can go wrong and that you should pay attention to...