DSSP

Explanation

The DSSP program was designed by Wolfgang Kabsch and Chris Sander to standardize secondary structure assignment. DSSP is a database of secondary structure assignments (and much more) for all protein entries in the Protein Data Bank (PDB). DSSP is also the name of the program that calculates DSSP entries from PDB entries.

The above means there are actually two ways of looking at DSSP. First of all there are the precalculated DSSP files for each PDB entry. And then there's the application called DSSP that can create these files.

Theory

The DSSP program works by calculating the most likely secondary structure assignment given the 3D structure of a protein. It does this by reading the positions of the atoms in a protein and then calculating the H-bond energy between all atoms. The algorithm discards any hydrogens present in the input structure and calculates the optimal hydrogen positions by placing them at 1.000 Å from the backbone N in the opposite direction from the backbone C=O bond. The best two H-bonds for each atom are then used to determine the most likely class of secondary structure for each residue in the protein.
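
The hydrogen placement and H-bond energy described above can be sketched as follows. This is a minimal illustration, not the DSSP source code. The electrostatic formula and its constants (partial charges 0.42 and 0.20, factor 332, cutoff of -0.5 kcal/mol) come from the Kabsch and Sander definition and are not stated on this page. The function names, and the choice of the preceding residue's C=O for placing the hydrogen, are assumptions.

```python
import math

# Constants from the Kabsch and Sander definition (assumed, not stated on this page).
Q1_Q2 = 0.42 * 0.20      # product of the partial charges, in units of e**2
FACTOR = 332.0           # converts e**2 / Angstrom into kcal/mol
HBOND_CUTOFF = -0.5      # kcal/mol; lower energies count as an H-bond


def place_hydrogen(n, prev_c, prev_o, bond_length=1.0):
    """Put H at bond_length from N, opposite to the C=O bond direction.

    Assumption: the C=O used is that of the preceding residue.
    """
    direction = [c - o for c, o in zip(prev_c, prev_o)]   # points from O to C
    norm = math.sqrt(sum(d * d for d in direction))
    return tuple(a + bond_length * d / norm for a, d in zip(n, direction))


def hbond_energy(n, h, c, o):
    """Electrostatic energy (kcal/mol) between donor N-H and acceptor C=O."""
    return Q1_Q2 * FACTOR * (
        1.0 / math.dist(o, n)
        + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h)
        - 1.0 / math.dist(c, n)
    )


def is_hbond(n, h, c, o):
    return hbond_energy(n, h, c, o) < HBOND_CUTOFF
```

For a linear N-H...O=C arrangement with an N...O distance of 2.9 Å, this sketch gives an energy of roughly -2.9 kcal/mol, well below the cutoff.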

This means you do need to have a full and valid 3D structure for a protein to be able to calculate the secondary structure. There's no magic in DSSP, so e.g. it cannot guess the secondary structure for a mutated protein for which you don't have the 3D structure. And, again, DSSP does not predict secondary structures, it just extracts this information from the 3D coordinates.

Description

The DSSP program defines secondary structure, geometrical features and solvent exposure of proteins, given atomic coordinates in Protein Data Bank (PDB) format or macromolecular Crystallographic Information File (mmCIF) format.

In 1995 the format of the DSSP output files had to be changed. These changes are listed on this page, and are separately available.

In the early 2000s Elmar Krieger made a series of corrections and adaptations to PDB file format modifications.

In 2011 Maarten Hekkelman completely rewrote DSSP. The original DSSP is from now on referred to as DSSPold.

In 2017 the DSSP format was extended, to hold the 4-character long chain IDs in the mmCIF file format.

Usage and command line options for DSSP

The current version of DSSP is available as a source package. You can download the sources from https://github.com/cmbi/xssp/releases

Using the application is as simple as opening a terminal window (on Windows this is called the Command Prompt; you can find it under the Start menu, Desk Accessories). Then, in the terminal, type the command to execute dssp followed by the file to operate on, e.g.:

mkdssp -i my-pdb.ent -o my-ss.dssp

In this example the PDB file called my-pdb.ent will be used as input and the file my-ss.dssp will be created containing the resulting DSSP output. If you omit this last parameter, the output will be written to your terminal instead.
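
If you want to call the program from a script, the command above can be wrapped as in the following sketch. It assumes that an executable named mkdssp is on your PATH and uses only the -i and -o options shown above. The function names are made up for this illustration.

```python
import subprocess


def build_mkdssp_command(pdb_path, dssp_path=None):
    """Build the argument list for the mkdssp call shown above."""
    command = ["mkdssp", "-i", str(pdb_path)]
    if dssp_path is not None:
        # Without -o the output goes to standard output instead.
        command += ["-o", str(dssp_path)]
    return command


def run_mkdssp(pdb_path, dssp_path=None):
    """Run mkdssp and return whatever it wrote to standard output."""
    result = subprocess.run(
        build_mkdssp_command(pdb_path, dssp_path),
        check=True,            # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling run_mkdssp("my-pdb.ent", "my-ss.dssp") corresponds to the command line in the example. Leaving out the second argument returns the DSSP output as a string.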

DSSPold had a series of command-line options. Examples:

  dssp [-na] [-v] pdb_file [dssp_file]
  dssp [-na] [-v] -- [dssp_file]
  dssp [-h] [-?] [-V]

The possible DSSPold command-line options are:

-na     Disables the calculation of accessible surface.
-c      Classic (pre-July 1995) format.
-v      Verbose.
--      Read from standard input.
-h      Prints a help message.
-?      Same as -h
-l      Prints the license information.
-V      Prints version, as in first line of the output.

Output

The output from DSSP contains secondary structure assignments and other information, one line per residue. Extract from 1est.dssp (simplified):

HEADER    HYDROLASE   (SERINE PROTEINASE)         17-MAY-76   1EST
...
  240  1  4  4  0 TOTAL NUMBER OF RESIDUES, NUMBER OF CHAINS,
                  NUMBER OF SS-BRIDGES(TOTAL,INTRACHAIN,INTERCHAIN)                .
 10891.0   ACCESSIBLE SURFACE OF PROTEIN (ANGSTROM**2)
  162 67.5   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(J)  ; PER 100 RESIDUES
    0  0.0   TOTAL NUMBER OF HYDROGEN BONDS IN     PARALLEL BRIDGES; PER 100 RESIDUES
   84 35.0   TOTAL NUMBER OF HYDROGEN BONDS IN ANTIPARALLEL BRIDGES; PER 100 RESIDUES
...
   26 10.8   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+2)
   30 12.5   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+3)
   10  4.2   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+4)
...
  #  RESIDUE AA STRUCTURE BP1 BP2  ACC   N-H-->O  O-->H-N  N-H-->O  O-->H-N
    2   17   V  B 3   +A  182   0A   8  180,-2.5 180,-1.9   1,-0.2 134,-0.1
                                   ...Next two lines wrapped as a pair...
                                    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA
                                  -0.776 360.0   8.1 -84.5 125.5  -14.7   34.4   34.8
                                   ...Next two lines wrapped as a pair...
                                               CHAIN AUTHCHAIN
                                                   A         A
....;....1....;....2....;....3....;....4....;....5....;....6....;....7..
    .-- sequential resnumber, including chain breaks as extra residues
    |    .-- original PDB resnumber, not nec. sequential, may contain letters
    |    | .-- one-letter chain ID, if any
    |    | | .-- amino acid sequence in one letter code
    |    | | |  .-- secondary structure summary based on columns 19-38
    |    | | |  | xxxxxxxxxxxxxxxxxxxx recommended columns for secstruc details
    |    | | |  | .-- 3-turns/helix
    |    | | |  | |.-- 4-turns/helix
    |    | | |  | ||.-- 5-turns/helix
    |    | | |  | |||.-- geometrical bend
    |    | | |  | ||||.-- chirality
    |    | | |  | |||||.-- beta bridge label
    |    | | |  | ||||||.-- beta bridge label
    |    | | |  | |||||||   .-- beta bridge partner resnum
    |    | | |  | |||||||   |   .-- beta bridge partner resnum
    |    | | |  | |||||||   |   |.-- beta sheet label
    |    | | |  | |||||||   |   ||   .-- solvent accessibility
    |    | | |  | |||||||   |   ||   |
  #  RESIDUE AA STRUCTURE BP1 BP2  ACC
    |    | | |  | |||||||   |   ||   |
   35   47 A I  E     +     0   0    2
   36   48 A R  E >  S- K   0  39C  97
   37   49 A Q  T 3  S+     0   0   86
   38   50 A N  T 3  S+     0   0   34
   39   51 A W  E <   -KL  36  98C   6
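
Because the residue lines are fixed-width, they are best read by column position rather than by splitting on whitespace (several fields can be blank). The sketch below reads the first block of columns, up to the solvent accessibility. The slice positions were read off the annotated example above and should be checked against your own files. The function and key names are made up for this illustration.

```python
def parse_residue_line(line):
    """Parse the first 38 columns of a DSSP residue line into a dict."""
    line = line.ljust(38)   # guard against lines with trailing blanks removed
    return {
        "seq_number": int(line[0:5]),           # DSSP sequential residue number
        "pdb_resnum": line[5:10].strip(),       # kept as text; blank on chain breaks
        "insertion_code": line[10].strip(),
        "chain": line[11].strip(),
        "amino_acid": line[13],                 # '!' marks a chain break
        "summary": line[16].strip(),            # secondary structure summary
        "bridge_partner_1": int(line[25:29]),   # 0 means no partner
        "bridge_partner_2": int(line[29:33]),
        "sheet_label": line[33].strip(),
        "accessibility": int(line[34:38]),
    }


example = "   36   48 A R  E >  S- K   0  39C  97"
residue = parse_residue_line(example)
print(residue["amino_acid"], residue["summary"], residue["accessibility"])
```

The residue number is kept as text because, as noted in the diagram, it is not necessarily sequential and may contain letters.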

Histograms:

The number 2 under column '8' in the line 'residues per alpha helix' means that there are 2 α-helices of length 8 residues in this data set.
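
To make the meaning of such a histogram concrete, the sketch below counts helix lengths from a string of secondary structure summary codes, one character per residue. It assumes that α-helix residues carry the summary code 'H', as in the Kabsch and Sander article. It is a simplification: two helices that follow each other without a gap would be counted as one, which DSSP itself need not do.

```python
from collections import Counter
from itertools import groupby


def helix_length_histogram(summary):
    """Map helix length -> number of helices of that length."""
    return Counter(
        len(list(run))
        for code, run in groupby(summary)   # consecutive identical codes form a run
        if code == "H"                      # assumed code for alpha helix
    )


histogram = helix_length_histogram("  HHHHHHHH  EEE HHHHHHHH TT HHHH")
print(histogram[8])   # number of helices that are 8 residues long
```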

For definitions, see the original Kabsch and Sander article.

In addition, note that each line contains the following residue information:

RESIDUE

two columns of residue numbers. The first column is DSSP's sequential residue number, starting at the first residue actually in the data set and including chain breaks; this number is used to refer to residues throughout. The second column gives the crystallographers' 'residue sequence number', 'insertion code' and 'chain identifier' (see the Protein Data Bank file record format manual), given for reference only. This column may be '>' if the chain identifier is longer than one character, in which case the actual chain identifier can be found at the far right in columns 150-153 and 160-163.

AA

one letter amino acid code, lower case for SS-bridge CYS. When cysteines are bridged, the first bridged cysteine in the sequence and its partner, wherever else in the sequence it is, both become the lower case character a. The next bridged cysteine (not yet converted into lower case) and its partner both become the lower case character b, etcetera. Unbridged cysteines remain an upper case C.
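
The lower case convention can be undone in a few lines. The sketch below takes the AA column as a string, one character per residue, groups the positions that share a lower case letter, and restores the plain sequence. The function names and the example string are made up for this illustration.

```python
from collections import defaultdict


def ss_bridge_partners(aa_column):
    """Map each bridge letter to the (0-based) positions of its cysteines."""
    partners = defaultdict(list)
    for index, code in enumerate(aa_column):
        if code.islower():            # lower case marks an SS-bridged CYS
            partners[code].append(index)
    return dict(partners)


def to_plain_sequence(aa_column):
    """Replace every bridge letter by an upper case C."""
    return "".join("C" if code.islower() else code for code in aa_column)


column = "AaGCbKabL"
print(ss_bridge_partners(column))
print(to_plain_sequence(column))
```

In this made-up example the cysteines at positions 1 and 6 form bridge a, those at positions 4 and 7 form bridge b, and the C at position 3 is unbridged.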

S (first column in STRUCTURE block)

compromise summary of secondary structure, intended to approximate crystallographers' intuition, based on columns 19-38, which are the principal result of DSSP analysis of the atomic coordinates.

BP1 BP2

residue number of first and second bridge partner followed by one letter sheet label

ACC

number of water molecules in contact with this residue, multiplied by 10, or the water-exposed surface of the residue in Angstrom**2.

N-H-->O etc.

hydrogen bonds; e.g. -3,-1.4 means: if this residue is residue I, then N-H of I is H-bonded to C=O of I-3 with an electrostatic H-bond energy of -1.4 kcal/mol. There are two columns for each type of H-bond, to allow for bifurcated H-bonds.
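
A single H-bond field can be decoded as in the sketch below. It splits the field into the residue offset and the energy, and converts the offset into the DSSP sequential number of the partner residue. It assumes that an offset of 0 means that no H-bond is listed, as in the chain break lines of the examples. The function name is made up for this illustration.

```python
def parse_hbond(field, residue_number):
    """Decode a field such as '-3,-1.4' for the residue with the given
    DSSP sequential number.

    Returns (offset, energy in kcal/mol, partner number or None).
    """
    offset_text, energy_text = field.split(",")
    offset = int(offset_text)        # int() and float() tolerate padding blanks
    energy = float(energy_text)
    partner = residue_number + offset if offset != 0 else None
    return offset, energy, partner


print(parse_hbond("-3,-1.4", 10))
```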

TCO

cosine of angle between C=O of residue I and C=O of residue I-1. For α-helices, TCO is near +1, for β-sheets TCO is near -1. Not used for structure definition.
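
As a sketch of this definition, TCO is the normalized dot product of the two C=O bond vectors. The function name and argument order are made up for this illustration. Coordinates are (x, y, z) tuples in Å.

```python
import math


def tco(c, o, prev_c, prev_o):
    """Cosine of the angle between C=O of residue I and C=O of residue I-1."""
    u = [b - a for a, b in zip(c, o)]             # C -> O of residue I
    v = [b - a for a, b in zip(prev_c, prev_o)]   # C -> O of residue I-1
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))
```

Two parallel C=O bonds give a value of +1 and two antiparallel ones give -1, matching the helix and sheet limits mentioned above.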

KAPPA

virtual bond angle (bend angle) defined by the three Cα atoms of residues I-2,I,I+2. Used to define bend (structure code 'S').
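
The sketch below computes this angle under one assumption that is not spelled out here: KAPPA is taken as the angle between the direction from Cα(I-2) to Cα(I) and the direction from Cα(I) to Cα(I+2), so that a straight chain gives 0 degrees. The 70 degree threshold for a bend is the one given in the Kabsch and Sander article. The function names are made up for this illustration.

```python
import math


def kappa(ca_prev2, ca, ca_next2):
    """Bend angle in degrees at residue I from three C-alpha positions."""
    u = [b - a for a, b in zip(ca_prev2, ca)]    # CA(I-2) -> CA(I)
    v = [b - a for a, b in zip(ca, ca_next2)]    # CA(I)   -> CA(I+2)
    cosine = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    cosine = max(-1.0, min(1.0, cosine))         # guard against rounding errors
    return math.degrees(math.acos(cosine))


def is_bend(kappa_degrees, threshold=70.0):
    """Assumed criterion for structure code 'S'."""
    return kappa_degrees > threshold
```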

ALPHA

virtual torsion angle (dihedral angle) defined by the four Cα atoms of residues I-1,I,I+1,I+2. Used to define chirality (structure code '+' or '-').
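
A dihedral angle over four Cα positions can be computed as in the sketch below, which uses the usual sign convention for torsion angles. The mapping to the chirality code ('+' for angles between 0 and 180 degrees, '-' for angles between -180 and 0) follows the Kabsch and Sander article and is an assumption as far as this page is concerned. The function names are made up for this illustration.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def alpha(ca_prev, ca, ca_next, ca_next2):
    """Dihedral angle in degrees over CA(I-1), CA(I), CA(I+1), CA(I+2)."""
    b0 = _sub(ca_prev, ca)         # from CA(I) back to CA(I-1)
    b1 = _sub(ca_next, ca)         # central axis CA(I) -> CA(I+1)
    b2 = _sub(ca_next2, ca_next)   # from CA(I+1) on to CA(I+2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [x / length for x in b1]
    # Project the outer vectors onto the plane perpendicular to the axis.
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def chirality(alpha_degrees):
    """Assumed mapping from ALPHA to the structure code '+' or '-'."""
    if 0.0 < alpha_degrees < 180.0:
        return "+"
    if -180.0 < alpha_degrees < 0.0:
        return "-"
    return " "
```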

PHI PSI

IUPAC peptide backbone torsion angles

X-CA Y-CA Z-CA

echo of Cα atom coordinates

CHAIN AUTHCHAIN

The rcsb-given and author-given chain ids respectively. These will be the same for PDB files, but different for mmCIF files. Also, in mmCIF files these ids can be longer than one character.

Warnings

The values for solvent exposure may not mean what you think:

Effects leading to larger than expected values: the solvent exposure calculation ignores unusual residues, like ACE, and residues with incomplete backbone, like ALA 1 of data set 1CPA. It also ignores HETATOMS, like a heme or metal ligands. Also, side chains may be incomplete (an error message is written).
Effects leading to smaller than expected values: if you apply this program to Protein Data Bank data sets containing oligomers, solvent exposure is for the entire assembly, not for the monomer. Also, atom OXT of C-terminal residues is treated like a side chain atom if it is listed as part of the last residue. Peptide substrates, when listed as atoms rather than hetatoms, are also treated as part of the protein, e.g. residues 499 s and 500 s in 1CPA.
Unknown or unusual residues are named X on output and are not checked for standard number of sidechain atoms. All explicit water molecules, like other hetatoms, are ignored.

DSSP output format history

The new DSSP uses exactly the most recent format of DSSPold.

From the July 1995 version onwards, the output format got three new features (for details, see below).

The Hbond columns are two characters wider.
PDB chain break identifier is indicated by a star (*).
The DSSP file header changed to alert the user to the format changes.

To obtain output in the pre-July 1995 format use the -c option in DSSPold. Example:

dssp -c myprotein.pdb myprotein.dssp

Wider Hbond Columns

The Hbond columns (energy and residue offset) are wider by two characters, in order to accommodate residue number offsets up to +/-99999. The format of the first block of columns (up to the Solvent Accessibility) is not affected by this change.

PDB chain break identifier

In addition to the chain break residue (!) detected as a discontinuity of backbone coordinates, DSSP now also detects a discontinuity in the PDB-supplied chain identifier, recorded as (*). The (*) is in the column between the Amino Acid Letter and the Secondary Structure Summary columns.

Examples
    * Discontinuity in backbone coordinates
         35   39 A Y  H <5        0   0  110     -4,-1.4    -3,-0.2    -5,-0.3    -1,-0.2   0.906 360.0 360.0 -70.6 -52.6   74.0   40.9   29.8
         36   40 A G    <<        0   0  101     -5,-1.1    -3,-0.1    -4,-0.6    -1,-0.1   0.383 360.0 360.0 -73.5 360.0   76.8   41.2   27.3
         37        !              0   0    0      0, 0.0     0, 0.0     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0 360.0    0.0    0.0    0.0
         38   60 A I              0   0  161      0, 0.0     2,-0.2     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0 161.5  106.5   37.7   49.5
         39   61 A T        -     0   0   91      2,-0.0     2,-0.4     0, 0.0     0, 0.0  -0.814 360.0-165.8-128.3 135.3  105.4   35.7   46.6
    * Discontinuity in chain identifier and in backbone coordinates
        246  247 B K              0   0  161      1,-0.4   -25,-0.2   -29,-0.1   -26,-0.1  -0.879 360.0 360.0 178.1 167.1   53.2   14.3    7.1
        247  248 B H              0   0  200     -2,-0.2    -1,-0.4   -27,-0.1   -26,-0.2   0.741 360.0 360.0 -48.4 360.0   52.1   10.8    8.1
        248        !*             0   0    0      0, 0.0     0, 0.0     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0 360.0    0.0    0.0    0.0
        249    1 C A              0   0  133      0, 0.0     2,-1.4     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0-178.2   46.5   82.7    0.4
        250    2 C P        +     0   0   94      0, 0.0     2,-0.1     0, 0.0     0, 0.0  -0.822 360.0  58.6 -85.7  94.3   46.3   80.1    3.1

Recommended usage: to find a chain border in an xxxx.dssp file, locate any line containing the string !*
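
In a script, this recommendation amounts to a simple substring search, as in the following sketch. The function name is made up for this illustration. The example lines are shortened versions of the ones shown above.

```python
def chain_border_lines(lines):
    """Return the 1-based numbers of all lines that contain the string '!*'."""
    return [number for number, line in enumerate(lines, start=1) if "!*" in line]


example = [
    "  247  248 B H              0   0  200",
    "  248        !*             0   0    0",
    "  249    1 C A              0   0  133",
]
print(chain_border_lines(example))
```

To apply it to a file, pass the open file object, e.g. chain_border_lines(open("my-ss.dssp")). A break in the backbone coordinates alone is marked with ! but without the star, so it is not reported by this search.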

New header line

To reflect the change in format, a new header line is used (line starts with ====, text is in mixed case):

==== Secondary Structure Definition by the program DSSP, Version July 1995 ==== DATE=27-JUL-1995                      .

If the pre-July 1995 format is forced (using dssp -c ...), the first line reverts to the pre-July 1995 style (line starts with ****, text is in upper case):

 **** SECONDARY STRUCTURE DEFINITION BY THE PROGRAM DSSP, VERSION JULY 1995 **** DATE=27-JUL-1995

Running DSSP on a Windows machine

Although I generally discourage people in Bioinformatics from using Windows as their operating system, we have made available a Windows version of DSSP.

You must run Windows DSSP from a cmd window, using the same commands as listed above for Unix systems.