• PDB

EU name: PDBFOR

(From: ../EUDIR ) (Date: 3 Jan 27 17:59 ../EU)

After completing the PDB section and the corresponding MRS exercises you will:
Know that the PDB is a databank of three-dimensional structures of biomolecules (proteins, nucleic acids and the ligands present in their complexes).
Know that the structures in the PDB are derived experimentally, mainly via NMR or X-ray experiments.
Know several of the important fields in PDB files and be able to find information there when needed.

The Protein Data Bank, or PDB for short, is the collection point for macromolecular structure data (three-dimensional coordinates). For many years the PDB was located at the Brookhaven National Laboratory on Long Island. Some years ago the PDB moved to Rutgers, and the EBI became the European partner; the Japanese partner is situated in Osaka. The CMBI Edu-Wiki entry for the PDB holds pointers to the several PDB websites around the globe.

Crystallographers and NMR spectroscopists are obliged to deposit coordinates in the PDB if they want to publish their results in a journal. Publishing many articles in high quality peer-reviewed journals is the most successful survival strategy for research groups in academia. Therefore the PDB is growing rapidly.

However, the PDB is a databank and not a database. That means that the PDB accepts the data just the way they are submitted to it. As a consequence, administrative and scientific errors will often not be corrected before the coordinates are released to the users.

PDB files are so-called keyword-organised flat files. That means that the file is human readable (a flat file or ASCII file) and that every line starts with a keyword (in the PDB a code of three to six letters). These keywords explain what kind of data follows on the line. The name of each PDB file is a four-character code consisting of one digit followed by three characters that can each be a letter or a digit. This four-character code is both the name of the file and its unique identifier.
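The two rules in this paragraph (the shape of the identifier and the keyword at the start of every line) can be written down as a few lines of Python. This is only a minimal sketch: the function names are made up for this illustration, and the pattern encodes nothing more than the description given here.

```python
import re

# Shape of an identifier as described in the text: one digit followed
# by three characters that are each a letter or a digit.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_pdb_id(text):
    """Return True if text has the shape of a PDB identifier."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


def record_keyword(line):
    """Return the keyword at the start of a PDB line.

    The keyword sits in the first six columns; shorter keywords are
    padded with blanks, which are stripped here.
    """
    return line[:6].strip()


print(is_pdb_id("1CRN"))
print(is_pdb_id("CRN1"))
print(record_keyword("COMPND    CRAMBIN"))
```

Note that the check only tests the shape of the code; it does not tell you whether an entry with that identifier actually exists in the PDB.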

Figure 20. The PDB format is old. Indeed, it is so old that computers still worked with punched cards when the PDB was designed. The whole PDB was transported originally in boxes like this.

Figure 21. Together with the PDB one would get a dozen computer programs. One such program would calculate coordinates for Byron's bender, a small metal box that can put fixed well-defined torsion angles in 1 mm thick copper bars. With Byron's bender you could make real three dimensional models:

Figure 22. A metal Cα-trace of a protein, made with Byron's bender.

Figure 23. Every line in a PDB file still resembles a punched card, and the last eight columns (and now don't laugh) are reserved for a code that can help you sort the cards when you drop your PDB file on the floor.
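To see what that sorting code buys you, here is a small Python sketch that puts three dropped "cards" back in order. It assumes that the code in columns 73-80 ends in a right-justified counter, as in the crambin lines shown in this section; the function names are invented for the example, and the letters that sometimes follow the identifier in this field are simply ignored.

```python
import re


def split_card(line):
    """Split a line into its data part (columns 1-72) and its tag (columns 73-80)."""
    padded = line.ljust(80)
    return padded[:72].rstrip(), padded[72:80]


def card_number(line):
    """Return the counter at the end of the tag, or -1 if there is none.

    Assumption: the tag ends in a right-justified number, e.g. '1CRN  70'.
    """
    tag = split_card(line)[1]
    match = re.search(r"(\d+)\s*$", tag)
    return int(match.group(1)) if match else -1


# Three crambin ATOM cards, picked up from the floor in the wrong order.
# ljust(72) pads the data part so that the tag starts in column 73.
dropped = [
    "ATOM      3  C   THR     1      15.685  12.755   5.133  1.00  9.19".ljust(72) + "1CRN  72",
    "ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79".ljust(72) + "1CRN  70",
    "ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80".ljust(72) + "1CRN  71",
]

restored = sorted(dropped, key=card_number)
for card in restored:
    print(card)
```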

How to obtain a PDB file

The PDB is located at http://www.rcsb.org/pdb/, and the European partner is located at http://www.ebi.ac.uk/pdbe/node/1. The best way to obtain a PDB file is via MRS. If you do so, use the Download facility. Be aware that most computer programs prefer it when a PDB file has the extension .pdb, but MRS wants to save the file with the extension .txt. So either you change the extension upon downloading, or you change the file name afterwards. The download procedure is explained in this help file.
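If you would rather fix the extension afterwards, a few lines of Python will do it. This is just a sketch using the standard pathlib module; the function names and the file name 1crn.txt are examples, not something MRS prescribes.

```python
from pathlib import Path


def with_pdb_extension(path):
    """Return the same path with its extension changed to .pdb."""
    return Path(path).with_suffix(".pdb")


def rename_to_pdb(path):
    """Rename a downloaded file on disk so that it ends in .pdb."""
    source = Path(path)
    target = with_pdb_extension(source)
    source.rename(target)
    return target


# Only the new name is computed here; nothing on disk is touched.
print(with_pdb_extension("1crn.txt"))
```

Calling rename_to_pdb("1crn.txt") would perform the actual rename, provided the file exists in the current directory.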

What does a PDB file look like

The supplemental material below is the PDB file for crambin:

Supplemental material

Some of the more important records will be described below.

The HEADER record

HEADER    PLANT SEED PROTEIN                      30-APR-81   1CRN

The header describes the molecule, gives the deposition date, and repeats the file name.
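Because PDB lines are column-oriented, the three items in the HEADER record can be cut out by position. The sketch below assumes the usual column layout (description in columns 11-50, date in columns 51-59, identifier in columns 63-66); parse_header is a name chosen for this example only.

```python
def parse_header(line):
    """Cut a HEADER record into its three fields by column position.

    Assumed layout (1-based columns): 11-50 description of the molecule,
    51-59 deposition date, 63-66 identifier.
    """
    padded = line.ljust(66)
    return {
        "classification": padded[10:50].strip(),
        "deposition_date": padded[50:59].strip(),
        "id_code": padded[62:66].strip(),
    }


# The crambin HEADER line, assembled field by field so that every
# field lands in the column where the parser expects it.
header_line = (
    "HEADER".ljust(10)
    + "PLANT SEED PROTEIN".ljust(40)
    + "30-APR-81"
    + "   "
    + "1CRN"
)

print(parse_header(header_line))
```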

The COMPND record

COMPND    CRAMBIN

This is the name of the molecule. For proteins this should in principle be the same as the DE line in SwissProt files. I extracted the first five proteins (alphabetically) for which both a PDB and a SwissProt file exist, and for which cross pointers exist in both directions. Each pair below shows the SwissProt DE line(s) followed by the corresponding PDB COMPND record(s).

DE   14-3-3 PROTEIN ZETA/DELTA (PROTEIN KINASE C INHIBITOR PROTEIN-1)
COMPND   2 MOLECULE: 14-3-3 PROTEIN ZETA;

DE   HLA CLASS I HISTOCOMPATIBILITY ANTIGEN, A-2 ALPHA CHAIN PRECURSOR.
COMPND    HUMAN CLASS I HISTOCOMPATIBILITY ANTIGEN A2 (/HLA-A2$,
COMPND   2 HUMAN LEUCOCYTE ANTIGEN)

DE   PROTEIN PHOSPHATASE PP2A, 65 KD REGULATORY SUBUNIT, ALPHA ISOFORM
DE   (PROTEIN PHOSPHATASE PP2A SUBUNIT A, ALPHA ISOFORM) (PR65-ALPHA)
DE   (MEDIUM TUMOR ANTIGEN-ASSOCIATED 61-KD PROTEIN).
COMPND   2 MOLECULE: PROTEIN PHOSPHATASE PP2A;
COMPND   3 CHAIN: A, B;
COMPND   4 FRAGMENT: 65 KD REGULATORY SUBUNIT;

DE   DNA-3-METHYLADENINE GLYCOSYLASE (EC 3.2.2.21) (3-METHYLADENINE DNA
DE   GLYCOSIDASE) (ADPG) (3-ALKYLADENINE DNA GLYCOSYLASE) (N-METHYLPURINE-
DE   DNA GLYCOSIRASE).
COMPND   2 MOLECULE: 3-METHYLADENINE DNA GLYCOSYLASE;

DE   ALZHEIMER'S DISEASE AMYLOID A4 PROTEIN PRECURSOR (PROTEASE NEXIN-II)
DE   (PN-II) (APPI) [CONTAINS: BETA-AMYLOID PROTEIN (BETA-APP)].
COMPND    PROTEASE INHIBITOR DOMAIN OF ALZHEIMER'S AMYLOID
COMPND   2 BETA-PROTEIN PRECURSOR (/APPI$)

As you can see, the DE and COMPND records indicate the same molecule, but the nomenclature (i.e. naming system) is sufficiently different to make automatic searches across databases using the descriptions on these records very difficult.
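The problem is easy to demonstrate with the first pair above. The Python sketch below compares the two names once as plain strings and once as sets of words; the word-overlap score (a Jaccard index) is just one simple illustration chosen here, not a method used by the PDB or by SwissProt, and the "MOLECULE:" label and the trailing semicolon of the COMPND record have been removed by hand.

```python
import re


def name_tokens(description):
    """Return the set of words and numbers in a description, in upper case."""
    return set(re.findall(r"[A-Z0-9]+", description.upper()))


def token_overlap(first, second):
    """Jaccard index: shared tokens divided by all distinct tokens."""
    a, b = name_tokens(first), name_tokens(second)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


de_name = "14-3-3 PROTEIN ZETA/DELTA (PROTEIN KINASE C INHIBITOR PROTEIN-1)"
compnd_name = "14-3-3 PROTEIN ZETA"

# An exact comparison fails although both names describe the same protein.
print(de_name == compnd_name)

# All words of the COMPND name do occur in the DE line ...
print(name_tokens(compnd_name) <= name_tokens(de_name))

# ... but they are only a fraction of all the words used.
print(round(token_overlap(de_name, compnd_name), 2))
```

Even in this friendly case the overlap is far from complete, and for the other four pairs a word-based comparison would have an even harder time.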

The REMARK record

REMARK   1                                                              1CRN   7
REMARK   1 REFERENCE 1                                                  1CRNC  2
REMARK   1  AUTH   M.M.TEETER                                           1CRNC  3
REMARK   1  TITL   WATER STRUCTURE OF A HYDROPHOBIC PROTEIN AT ATOMIC   1CRNC  4
REMARK   1  TITL 2 RESOLUTION. PENTAGON RINGS OF WATER MOLECULES IN     1CRNC  5
REMARK   1  TITL 3 CRYSTALS OF CRAMBIN                                  1CRNC  6
REMARK   1  REF    PROC.NAT.ACAD.SCI.USA         V.  81  6014 1984      1CRNC  7
REMARK   1  REFN   ASTM PNASA6  US ISSN 0027-8424                  040  1CRNC  8
Some lines removed
REMARK   2                                                              1CRN  21
REMARK   2 RESOLUTION. 1.5 ANGSTROMS.                                   1CRN  22
Some lines removed
REMARK   9 CORRECTION. CHANGE DEPOSITION DATE FROM 31-APR-81 TO         1CRND  4
REMARK   9  30-APR-81.  29-FEB-87.                                      1CRND  5

REMARK records were originally meant for free-text English remarks about the structure, but over time they came to be used for almost everything. For example, a JRNL record exists to give the literature reference, but if two or more references are needed, the additional ones have to go into REMARK records. Over the years more and more REMARK subrecords have been introduced. The only REMARK that is important for us is the resolution remark (REMARK 2). The resolution, given in Ångström, describes the quality of the X-ray data that were used to solve the structure; a lower value means more detailed data.
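Since the resolution is the one remark we care about, here is a minimal Python sketch that fishes it out of a list of lines. It assumes the wording shown in the crambin example ("RESOLUTION. 1.5 ANGSTROMS."); other files may phrase this remark differently, and find_resolution is a name invented for this example.

```python
import re

# Matches lines such as 'REMARK   2 RESOLUTION. 1.5 ANGSTROMS.'
RESOLUTION_PATTERN = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS"
)


def find_resolution(lines):
    """Return the resolution as a float, or None if no such remark is found."""
    for line in lines:
        match = RESOLUTION_PATTERN.match(line)
        if match:
            return float(match.group(1))
    return None


remark_lines = [
    "REMARK   2".ljust(72) + "1CRN  21",
    "REMARK   2 RESOLUTION. 1.5 ANGSTROMS.".ljust(72) + "1CRN  22",
]

print(find_resolution(remark_lines))
```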

The ATOM record

ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79      1CRN  70
ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80      1CRN  71
ATOM      3  C   THR     1      15.685  12.755   5.133  1.00  9.19      1CRN  72
 |        |  |    |      |        |       |       |      |     |
 |        |  |    |      |        |       |       |      |     --> B-factor
 |        |  |    |      |        |       |       |      --> Occupancy
 |        |  |    |      |        |       |       --> Z coordinate
 |        |  |    |      |        |       --> Y coordinate
 |        |  |    |      |        --> X coordinate
 |        |  |    |      --> Residue number
 |        |  |    --> Residue type
 |        |  --> Atom type
 |        --> Atom serial number
 --> Record type keyword.

The ATOM record contains the atom name, the residue type and number, the coordinates, and some further information (occupancy and B-factor) that we will discuss later.
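The fields of an ATOM line sit in fixed columns, so a program reads them by position rather than by splitting on blanks. The Python sketch below does this for the second crambin atom. The column ranges are the ones commonly used for the PDB format and are stated here as an assumption; parse_atom and the dictionary keys are names chosen for this example.

```python
def parse_atom(line):
    """Cut one ATOM line into its fields by column position.

    The comments give the assumed 1-based column ranges.
    """
    return {
        "record": line[0:6].strip(),         # columns 1-6
        "serial": int(line[6:11]),           # columns 7-11
        "atom_name": line[12:16].strip(),    # columns 13-16
        "residue_name": line[17:20],         # columns 18-20
        "residue_number": int(line[22:26]),  # columns 23-26
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
        "occupancy": float(line[54:60]),     # columns 55-60
        "b_factor": float(line[60:66]),      # columns 61-66
    }


atom_line = "ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80"

atom = parse_atom(atom_line)
print(atom["atom_name"], atom["residue_name"], atom["residue_number"])
print(atom["x"], atom["y"], atom["z"])
```

Reading by column matters because neighbouring fields are not always separated by a blank, in which case splitting the line on whitespace would glue two values together.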