After completing this section you will: |
EU name: BIODAT
(From: ../EUDIR )
(Date: Jan 27 17:59 ../EUDI)
In this course we will mainly use data from three databases. Be aware, though, that there are thousands of databases available to you! The three databases we will most often look at are:
Figure 10. SwissProt is a well-curated database of protein sequences.
Figure 11. PDB is more a databank than a database. The PDB was started by people at the Brookhaven National Laboratory. Nowadays the PDB is kept at Rutgers, with mirror systems at the EBI and in Japan. It holds macromolecular structures solved mainly by X-ray crystallography or NMR. These are mainly protein structures, but also DNA, RNA, and all kinds of complexes.
Figure 12. EMBL is not only the name of a research institute; it is also the name of the international depository for nucleic acid sequences, the EMBL database.
Although we will mainly use SwissProt, the PDB, and the EMBL database, we will also briefly use OMIM and Prosite, and you need to know that UniProt exists. These databases will be discussed later in the course and are mentioned here just for the sake of completeness.
The OMIM databank is the brainchild of Victor A. McKusick. This databank holds descriptions of phenotypes for a whole series of disease-causing SNPs / mutations in the human genome. OMIM stands for Online Mendelian Inheritance in Man.
The Prosite database holds information about sequence patterns that indicate potential post-translational modification sites, cleavage sites, active sites, etc.
Figure 13. UniProt is a much larger protein sequence database that one normally should use if SwissProt doesn't hold what you are looking for. During this course SwissProt alone will always be enough to answer the questions.
All good databases should, in principle, contain the following five data elements:
Figure 14. Unique identifier, or accession code
Figure 15. Name of depositor
Figure 16. Literature references
Figure 17. Deposition date
Figure 18. The real data
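The five elements above can be pictured as the fields of a single record. The following is only a minimal sketch in Python: the class name `DatabaseEntry`, its field names, and the example values are made up for illustration and do not correspond to the layout of SwissProt, the PDB, or the EMBL database.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class DatabaseEntry:
    """One record with the five elements listed above (names are illustrative)."""
    accession: str                    # unique identifier, or accession code
    depositor: str                    # name of depositor
    references: List[str]             # literature references
    deposition_date: Optional[date]   # deposition date
    data: str                         # the real data, e.g. a sequence


def missing_elements(entry: DatabaseEntry) -> List[str]:
    """Return the names of the essential elements that are empty or absent."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# A made-up entry, not taken from any real database.
example = DatabaseEntry(
    accession="DEMO0001",
    depositor="A. Student",
    references=[],
    deposition_date=date(2012, 8, 1),
    data="MKTAYIAKQR",
)
print(missing_elements(example))  # the reference list is empty in this example
```

A check like `missing_elements` is one way to think about part 2 of Question 1 below: for each database, ask which of the five fields you can actually find in an entry.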
EU name: BIODQ1
Question 1:
1) Which three databases are being used in this course and what kind of
data do they contain?
2) Find out for each of the three databases if the 5
'essential' data elements are really present.
3) Look at the GPCRDB and SwissProt. Both provide biological
data to the user.
What do these systems have in common and what are the major differences?
(Hint 1: Perhaps you can first answer the question which of the two systems
is called an information system and which is called a database?
Hint 2: Think about the data-types they provide, and think about the
completeness of the stored data per data-type. Hint 3: Think of the types
of questions that the systems might help answer.)
Question 2:
Later we will teach you how to use MRS to find the so-called flat-file
version of the EMBL entry for human lysozyme with accession code
X14008. For now, use this local copy, which was stored in August 2012
at X14008.docx.
At the left-hand side you see many two-letter codes. These are the so-called keys.
Try to find out what all the important keys mean (at a minimum, complete the table below):
| Two-letter key | Write here in a few words which information this key points at |
| --- | --- |
| ID | |
| AC | |
| DT | |
| DE | |
| KW | |
| R* | |
| DR | |
| FT | |
| SQ | |
The question of what the XX record is good for is both very simple and very complicated at the same time. If there is time left, give it a shot...
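If you want to explore the keys programmatically, the flat-file layout lends itself to a simple line-by-line reading. The sketch below is an illustration only: the record in `SAMPLE` is made up (it is not the real X14008 entry), the function name `group_by_key` is hypothetical, and it assumes that the key occupies the first two characters of a line, that the content starts at the sixth character, and that lines starting with blanks continue the sequence block.

```python
from collections import defaultdict
from typing import Dict, List

# A made-up record in the flat-file layout; it is NOT the real X14008 entry.
SAMPLE = """\
ID   DEMO0001; SV 1; linear; mRNA; STD; HUM; 12 BP.
XX
AC   DEMO0001;
XX
DE   Hypothetical demo sequence
XX
SQ   Sequence 12 BP;
     acgtacgtac gt                                                     12
//
"""


def group_by_key(flat_file: str) -> Dict[str, List[str]]:
    """Collect the content of each line under its two-letter key."""
    grouped = defaultdict(list)
    for line in flat_file.splitlines():
        if line.startswith("//"):   # end-of-entry marker
            break
        key = line[:2]
        content = line[5:].strip()
        if key.strip() == "":       # assumption: blank key means sequence data
            key = "SQ"
        grouped[key].append(content)
    return dict(grouped)


records = group_by_key(SAMPLE)
for key, lines in records.items():
    print(key, len(lines))
```

Running the same function on the text of the local copy would show which keys occur and how many lines each one has, which is a useful starting point for filling in the table by hand.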