• BLAST extra

EU name: BLASTX

(From: ../EUDIR ) (Date: Jan 27 17:59 ../EUDI)

BLAST for other purposes

In this course we concentrate on standard BLAST (BLASTP), which searches a protein database with a protein query sequence. To make sure that everyone runs the same searches, so that the assistants can help you optimally, we search only the SwissProt database.

However, there are many other BLAST variants. For example, you can search a DNA database with a DNA query (BLASTN), or even search a DNA database with a protein query (TBLASTN).

Figure 37. Variants of BLAST that can do "other" searches. These other searches are explained in the supplemental material listed below this figure.
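As a quick reference for which variant fits which search, the sketch below maps the query and database molecule types to the standard NCBI BLAST program names. The function name and the "nucl"/"prot" labels are hypothetical and chosen only for illustration. They are not part of MRS BLAST, and the mapping is not necessarily identical to what Figure 37 shows.

```python
# Hypothetical helper: (query type, database type) -> standard BLAST program name.
# "nucl" = nucleotide (DNA), "prot" = protein. Labels are illustrative only.
BLAST_PROGRAMS = {
    ("prot", "prot"): "blastp",   # protein query vs protein database (used in this course)
    ("nucl", "nucl"): "blastn",   # DNA query vs DNA database
    ("nucl", "prot"): "blastx",   # translated DNA query vs protein database
    ("prot", "nucl"): "tblastn",  # protein query vs translated DNA database
}


def choose_blast_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Return the BLAST program name for the given query/database molecule types."""
    if translate_both:
        # tblastx translates both a DNA query and a DNA database into protein
        if (query_type, db_type) != ("nucl", "nucl"):
            raise ValueError("tblastx needs a DNA query and a DNA database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unknown combination: {query_type!r} vs {db_type!r}") from None


if __name__ == "__main__":
    print(choose_blast_program("prot", "prot"))                       # blastp
    print(choose_blast_program("prot", "nucl"))                       # tblastn
    print(choose_blast_program("nucl", "nucl", translate_both=True))  # tblastx
```

The translated variants (blastx, tblastn, tblastx) compare at the protein level, which is usually more sensitive for distant homologs than comparing DNA directly.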

Supplemental material

EU name: COMPLX


Low complexity regions

We already know that crambin is a seed storage protein from the thionin family. Recently we found a funny crambin variant:

ASCCPSIKVRSNYELCRLPGTPEDLCASFEGCIKIPGATCPNNNNNNNNNNNNNNNNNNNNNNNNNNNN

We want to know whether this funny tail of Ns is real or some kind of cloning artefact. So we run BLAST on it to see whether it has already been described, or whether perhaps a close homolog has been described from which we can transfer information. Run BLAST with the low-complexity filter switched off.

WARNING: When MRS BLAST reformats a sequence because it does not start with a ">something" header line, it also resets all other parameters, including the low-complexity filter!
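One way to avoid the reset described in the warning is to give the sequence a FASTA header before pasting it. The sketch below adds a ">name" line when one is missing. It assumes, as the warning implies, that input which already starts with a header line is not reformatted; the function name and the default header text are hypothetical.

```python
def ensure_fasta_header(sequence: str, name: str = "query") -> str:
    """Return the sequence in FASTA format, adding a '>name' header line if missing.

    Leading/trailing whitespace is stripped; input that already has a header is returned as-is.
    """
    text = sequence.strip()
    if text.startswith(">"):
        return text
    # Remove whitespace inside the raw sequence and prepend a header line
    residues = "".join(text.split())
    return f">{name}\n{residues}"


if __name__ == "__main__":
    raw = "ASCCPSIKVRSNYELCRLPGTPEDLCASFEGCIKIPGATCP"
    print(ensure_fasta_header(raw, "crambin_variant"))
```

After pasting a header-less sequence, it is still worth checking that the low-complexity filter is switched off before you press Search.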

Question 27: Look at the output of this run on ASCCPSIKVRSNYELCRLPGTPEDLCASFEGCIKIPGATCPNNNN... We know it is a crambin variant, but why doesn't BLAST find any thionin-like protein as the best hit?

Answer

BLAST parameters

Sometimes you will have to change some parameters before starting your BLAST search. Run BLAST on the sequence:

>many prolines in this sequence...
 PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP

but before you hit the famous Search button, switch off the low-complexity filter.
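To see why a stretch of prolines can still produce high alignment scores, the sketch below computes the best ungapped local alignment score between two sequences by trying every diagonal and running a maximum-subarray (Kadane) scan along it. This is an exhaustive illustration, not BLAST's seed-and-extend heuristic. The score table is a small excerpt of BLOSUM62 covering only proline pairs (P/P scores +7), and the "database" segment used below is made up for illustration.

```python
# Excerpt of BLOSUM62: proline (P) against a few residues only.
BLOSUM62_P = {"P": 7, "A": -1, "S": -1, "G": -2, "Q": -1, "T": -1, "E": -1, "K": -1}


def pair_score(a: str, b: str) -> int:
    """Score for aligning residue a with residue b; only pairs involving P are covered."""
    if a == "P":
        return BLOSUM62_P[b]
    if b == "P":
        return BLOSUM62_P[a]
    raise KeyError(f"pair {a}/{b} not in this BLOSUM62 excerpt")


def best_ungapped_score(query: str, subject: str) -> int:
    """Highest-scoring ungapped local alignment over all diagonals (Kadane per diagonal)."""
    best = 0
    for offset in range(-len(query) + 1, len(subject)):
        running = 0
        for i, q in enumerate(query):
            j = i + offset
            if 0 <= j < len(subject):
                # Restart the local alignment whenever the running score drops below zero
                running = max(0, running + pair_score(q, subject[j]))
                best = max(best, running)
    return best


if __name__ == "__main__":
    poly_p = "P" * 40
    segment = "PPAPPSPPPPGPPPPPPAPP"  # hypothetical proline-rich database segment
    print(best_ungapped_score(poly_p, segment))  # 107
```

Every matched proline adds +7, so any proline-rich stretch accumulates a high score whether or not it shares ancestry with the query.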

Question 28: In the BLAST output, look at the comparison between the sequence of 40 prolines and (parts of) the database sequences found. What is going on with this last BLAST run? Can you infer from the BLAST output a relation between your query sequence and the database hits? Can you say anything about the function of your query protein? And if not, why not?

Answer

Question 29: Explain in your own words what you think the function of this 'Filter low-complexity regions' button is.

Answer
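The idea of a "low-complexity" region can be made concrete with a simple composition measure. The sketch below slides a window along a sequence and flags windows whose Shannon entropy (in bits) falls below a threshold. This is a simplified stand-in, not the actual filter that BLAST applies; the function names, window width and threshold are hypothetical values chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a sequence window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, width: int = 12, threshold: float = 2.2) -> list[int]:
    """Start positions of windows whose entropy falls below the threshold.

    width and threshold are illustrative values, not parameters taken from BLAST.
    """
    return [
        start
        for start in range(len(seq) - width + 1)
        if shannon_entropy(seq[start:start + width]) < threshold
    ]


if __name__ == "__main__":
    variant = "ASCCPSIKVRSNYELCRLPGTPEDLCASFEGCIKIPGATCPNNNNNNNNNNNNNNNNNNNNNNNNNNNN"
    flagged = low_complexity_windows(variant)
    print(f"{len(flagged)} of {len(variant) - 11} windows flagged")
```

A window made of a single repeated residue, such as the N tail or a run of prolines, has entropy 0 and is always flagged, while a varied stretch of the crambin part scores around 3 bits and is not.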

EU name: BIGONE

(Date: 6 Aug 14 2018 BIGON)


Important: first think about your strategy. Then discuss this strategy with the assistant, and start the actual BLAST runs only after that discussion.

Some members of a family suffer from an unpleasant genetic disorder: they are born with poor hearing, and they become completely deaf in middle age. The family came to our hospital about three years ago, and using the family analysis techniques discussed in the seminar by Hannie, the problem was located in a small region of chromosome 7 that is about 2.7 megabases long. This region was sequenced by a bachelor's student who, unfortunately, had not followed any bioinformatics courses, so he asks you to help him with the data analysis. His question is clear: which of the eight proteins listed below is the best candidate for further research?

Question 30: We are going to find out which protein causes the problem.

Make a plan for finding this out. Do not run BLAST or any other tool before you have discussed your (very detailed) plan with an assistant. Once the assistant is happy with your plan, they will tell you where you can find the sequences.

The rules for access to the sequences are given as supplementary material below this question. Use them to answer the crucial last question: Which problem(s) in which molecule(s) caused the disease?

Answer

Supplemental material