After completing the PDB section and the corresponding MRS exercises you will:
The Protein Data Bank, or PDB for short, is the collection point for macromolecular structure data (three-dimensional coordinates). For many years the PDB was located at the Brookhaven National Laboratory on Long Island. Some years ago the PDB moved to Rutgers, and the EBI became the European partner; the Japanese partner is situated in Osaka. The CMBI Edu-Wiki entry for the PDB holds pointers to the several PDB websites around the globe.
Crystallographers and NMR spectroscopists are obliged to deposit coordinates in the PDB if they want to publish their results in a journal. Publishing many articles in high-quality peer-reviewed journals is the most successful survival strategy for research groups in academia. Therefore the PDB is growing rapidly.
However, the PDB is a databank and not a database. That means that the PDB accepts the data just the way they are submitted. As a result, administrative and scientific errors will often not be corrected before the coordinates are released to the users.
PDB files are so-called keyword-organised flat-files. That means that the file is human readable (flat file or ASCII file) and every line starts with a keyword (in the PDB that is a 3-6 letter code). These keywords explain what kind of data follows on the line. The name of each PDB file is a four-character code consisting of one digit followed by three characters, each of which can be a letter or a digit. This four-character code is both the name of the file and at the same time its unique identifier.
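The naming rule described above (one digit followed by three letters or digits) can be checked mechanically. The following is a minimal sketch in Python; the function name `is_pdb_id` is hypothetical, and the check only tests the shape of the identifier, not whether such an entry actually exists.

```python
import re

# One digit followed by three characters that are letters or digits,
# as described in the text. Case is ignored (assumption: 1crn == 1CRN).
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_pdb_id(code: str) -> bool:
    """Return True if `code` has the shape of a PDB identifier."""
    return PDB_ID_PATTERN.fullmatch(code) is not None


if __name__ == "__main__":
    for candidate in ("1CRN", "1crn", "CRN1", "1CR"):
        print(candidate, is_pdb_id(candidate))
```

A check like this is useful before building a file name or a query from user input, because a malformed identifier is caught early.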
Figure 20. The PDB format is old. Indeed, it is so old that computers still worked with punched cards when the PDB was designed. The whole PDB was transported originally in boxes like this.

Figure 22. A metal Cα-trace of a protein, made with Byron's bender.
The PDB is located at http://www.rcsb.org/pdb/, and the European partner is located at http://www.ebi.ac.uk/pdbe/node/1. The best way to obtain a PDB file is via MRS. If you do so, use the Download facility. Be aware that most computer programs prefer it when a PDB file has the extension .pdb, but MRS wants to save the file with the extension .txt. So either you change the extension upon downloading, or you change the file name afterwards. The download procedure is explained in this help file.
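If you prefer to change the file name afterwards, the extension can also be changed with a few lines of code. This is a minimal sketch, assuming the file was saved as something like `1crn.txt`; the function name `rename_to_pdb` and the example file name are hypothetical.

```python
from pathlib import Path


def rename_to_pdb(path):
    """Rename a downloaded file so that it carries the .pdb extension.

    Returns the new path. The content of the file is not touched.
    """
    source = Path(path)
    target = source.with_suffix(".pdb")  # 1crn.txt -> 1crn.pdb
    source.rename(target)
    return target
```

For example, `rename_to_pdb("1crn.txt")` would leave you with a file called `1crn.pdb` in the same directory.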
The supplemental material below is the PDB file for crambin:
Supplemental material

Some of the more important records will be described below.
    HEADER    PLANT SEED PROTEIN                      30-APR-81   1CRN
The header describes the molecule, gives the deposition date, and repeats the file name.
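Because PDB records are fixed-width, the three parts of the HEADER record can be cut out by column position. The sketch below assumes the column layout of the published PDB format description (classification in columns 11-50, deposition date in columns 51-59, identifier in columns 63-66); these positions are not given in this text, and the function name `parse_header` is hypothetical.

```python
def parse_header(line: str) -> dict:
    """Split a HEADER record into its three fields by column position.

    Assumed layout (1-based columns): 11-50 classification,
    51-59 deposition date, 63-66 identifier.
    """
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    return {
        "classification": line[10:50].strip(),
        "deposition_date": line[50:59].strip(),
        "id_code": line[62:66].strip(),
    }


# Build the crambin HEADER line with explicit padding so that the
# column positions are visible in the code.
sample = (
    "HEADER".ljust(10)
    + "PLANT SEED PROTEIN".ljust(40)
    + "30-APR-81"
    + "   "
    + "1CRN"
)

if __name__ == "__main__":
    print(parse_header(sample))
```

The date is deliberately kept as text here. As the REMARK 9 record of this very file shows, deposited dates are not guaranteed to be valid calendar dates.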
    COMPND    CRAMBIN
This is the name of the molecule. For proteins this should in principle be the same as the DE line in SwissProt files. I extracted the first five proteins (alphabetically) for which the PDB and SwissProt files exist, and for which cross pointers exist in both directions.
    DE   14-3-3 PROTEIN ZETA/DELTA (PROTEIN KINASE C INHIBITOR PROTEIN-1)
    COMPND   2 MOLECULE: 14-3-3 PROTEIN ZETA;

    DE   HLA CLASS I HISTOCOMPATIBILITY ANTIGEN, A-2 ALPHA CHAIN PRECURSOR.
    COMPND    HUMAN CLASS I HISTOCOMPATIBILITY ANTIGEN A2 (/HLA-A2$,
    COMPND   2 HUMAN LEUCOCYTE ANTIGEN)

    DE   PROTEIN PHOSPHATASE PP2A, 65 KD REGULATORY SUBUNIT, ALPHA ISOFORM
    DE   (PROTEIN PHOSPHATASE PP2A SUBUNIT A, ALPHA ISOFORM) (PR65-ALPHA)
    DE   (MEDIUM TUMOR ANTIGEN-ASSOCIATED 61-KD PROTEIN).
    COMPND   2 MOLECULE: PROTEIN PHOSPHATASE PP2A;
    COMPND   3 CHAIN: A, B;
    COMPND   4 FRAGMENT: 65 KD REGULATORY SUBUNIT;

    DE   DNA-3-METHYLADENINE GLYCOSYLASE (EC 3.2.2.21) (3-METHYLADENINE DNA
    DE   GLYCOSIDASE) (ADPG) (3-ALKYLADENINE DNA GLYCOSYLASE) (N-METHYLPURINE-
    DE   DNA GLYCOSIRASE).
    COMPND   2 MOLECULE: 3-METHYLADENINE DNA GLYCOSYLASE;

    DE   ALZHEIMER'S DISEASE AMYLOID A4 PROTEIN PRECURSOR (PROTEASE NEXIN-II)
    DE   (PN-II) (APPI) [CONTAINS: BETA-AMYLOID PROTEIN (BETA-APP)].
    COMPND    PROTEASE INHIBITOR DOMAIN OF ALZHEIMER'S AMYLOID
    COMPND   2 BETA-PROTEIN PRECURSOR (/APPI$)
As you can see, the DE and COMPND records indicate the same molecule, but the nomenclature (i.e. naming system) is sufficiently different to make automatic searches across databases using the descriptions on these records very difficult.
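To make the difficulty concrete, the sketch below compares the DE and COMPND descriptions of the first example (14-3-3 protein zeta) in two ways: as literal strings, and as sets of words. The word-overlap measure (Jaccard index) is only an illustration chosen here, not a method used by the PDB or SwissProt, and the function names are hypothetical.

```python
import re


def tokens(description: str) -> set:
    """Split a description into upper-case words and numbers."""
    return set(re.findall(r"[A-Z0-9]+", description.upper()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by all words used in either description."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Descriptions taken from the first DE/COMPND pair above.
de_text = "14-3-3 PROTEIN ZETA/DELTA (PROTEIN KINASE C INHIBITOR PROTEIN-1)"
compnd_text = "14-3-3 PROTEIN ZETA"

if __name__ == "__main__":
    print("identical strings:", de_text == compnd_text)
    print("word overlap:", round(jaccard(de_text, compnd_text), 2))
```

Even for this simple pair, an exact string comparison fails and less than half of the words are shared, so any automatic matching on these records would need a tolerance threshold that also lets unrelated proteins through.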
    REMARK   1                                                              1CRN   7
    REMARK   1 REFERENCE 1                                                  1CRNC  2
    REMARK   1 AUTH M.M.TEETER                                              1CRNC  3
    REMARK   1 TITL WATER STRUCTURE OF A HYDROPHOBIC PROTEIN AT ATOMIC      1CRNC  4
    REMARK   1 TITL 2 RESOLUTION. PENTAGON RINGS OF WATER MOLECULES IN      1CRNC  5
    REMARK   1 TITL 3 CRYSTALS OF CRAMBIN                                   1CRNC  6
    REMARK   1 REF PROC.NAT.ACAD.SCI.USA V. 81 6014 1984                    1CRNC  7
    REMARK   1 REFN ASTM PNASA6 US ISSN 0027-8424 040                       1CRNC  8
    Some lines removed
    REMARK   2                                                              1CRN  21
    REMARK   2 RESOLUTION. 1.5 ANGSTROMS.                                   1CRN  22
    Some lines removed
    REMARK   9 CORRECTION. CHANGE DEPOSITION DATE FROM 31-APR-81 TO         1CRND  4
    REMARK   9 30-APR-81. 29-FEB-87.                                        1CRND  5
REMARK records were originally meant for English text remarks about the structure, but later they came to be used for everything. For example, there exists a JRNL record to give the literature reference, but if two or more references are needed, the additional ones have to go on REMARK cards. Over the years more and more REMARK subrecords have been introduced. The only REMARK that is important for us is the resolution remark. The resolution describes the quality of the X-ray data that were used to solve the structure.
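Since the resolution is the one REMARK we care about, here is a minimal sketch of how it could be pulled out of a PDB file. It assumes the resolution is written as in the crambin example (`REMARK 2 RESOLUTION. 1.5 ANGSTROMS.`); files that phrase it differently, or NMR structures that have no resolution, will simply yield `None`. The function name `find_resolution` is hypothetical.

```python
import re

# Matches lines such as "REMARK   2 RESOLUTION. 1.5 ANGSTROMS."
RESOLUTION_RE = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS"
)


def find_resolution(lines):
    """Return the resolution in Angstroms, or None if no such remark is found."""
    for line in lines:
        match = RESOLUTION_RE.match(line)
        if match:
            return float(match.group(1))
    return None


example_lines = [
    "REMARK   2",
    "REMARK   2 RESOLUTION. 1.5 ANGSTROMS.",
]

if __name__ == "__main__":
    print(find_resolution(example_lines))
```

To use it on a real file, pass an open file object, for instance `find_resolution(open("1crn.pdb"))`, where the file name is whatever you saved the entry as.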
    ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79      1CRN  70
    ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80      1CRN  71
    ATOM      3  C   THR     1      15.685  12.755   5.133  1.00  9.19      1CRN  72
    |         |  |   |       |      |       |        |      |    |
    |         |  |   |       |      |       |        |      |    --> B-factor
    |         |  |   |       |      |       |        |      --> Occupancy
    |         |  |   |       |      |       |        --> Z coordinate
    |         |  |   |       |      |       --> Y coordinate
    |         |  |   |       |      --> X coordinate
    |         |  |   |       --> Residue number
    |         |  |   --> Residue type
    |         |  --> Atom type
    |         --> Punch card counter
    --> Record type keyword.
The ATOM record describes the atom, the residue it belongs to, its coordinates, and some additional information that we will discuss later.
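Because every field of an ATOM record sits at a fixed position on the line, the record is best read by column and not by splitting on spaces (neighbouring numbers can touch each other). The sketch below assumes the column layout of the published PDB format description, which is not spelled out in this text; the class `Atom` and the function `parse_atom_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int       # the counter in the second field
    name: str         # atom type, e.g. N, CA, C
    res_name: str     # residue type, e.g. THR
    res_seq: int      # residue number
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Read one ATOM record using fixed column positions (assumed layout)."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


# First ATOM record of crambin, with the original column spacing restored.
first_atom = (
    "ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79      1CRN  70"
)

if __name__ == "__main__":
    print(parse_atom_line(first_atom))
```

Everything after column 66 (here the old punched-card label `1CRN  70`) is ignored by this sketch.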