The DSSP program was designed by Wolfgang Kabsch and Chris Sander to standardize secondary structure assignment. DSSP is a database of secondary structure assignments (and much more) for all protein entries in the Protein Data Bank (PDB). DSSP is also the name of the program that calculates DSSP entries from PDB entries.
The above means there are actually two ways of looking at DSSP. First of all there are the precalculated DSSP files for each PDB entry. And then there's the application called DSSP that can create these files.
The DSSP program works by calculating the most likely secondary structure assignment given the 3D structure of a protein. It does this by reading the positions of the atoms in a protein and then calculating the H-bond energy between all atoms. The algorithm discards any hydrogens present in the input structure and calculates the optimal hydrogen positions by placing them at 1.000 Å from the backbone N, in the direction opposite to the backbone C=O bond. The best two H-bonds for each atom are then used to determine the most likely class of secondary structure for each residue in the protein.
This means you do need to have a full and valid 3D structure for a protein to be able to calculate the secondary structure. There's no magic in DSSP, so e.g. it cannot guess the secondary structure for a mutated protein for which you don't have the 3D structure. And, again, DSSP does not predict secondary structures, it just extracts this information from the 3D coordinates.
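The hydrogen placement and H-bond energy calculation described above can be sketched in Python. This is a minimal illustration, not the DSSP source code. The electrostatic formula and its constants (partial charges of 0.42e and 0.20e, the factor 332 and the -0.5 kcal/mol cutoff) come from the original Kabsch and Sander definition rather than from this page, the sketch assumes the C=O bond in question is that of the preceding residue, and all function names are hypothetical.

```python
import math

# Constants of the Kabsch & Sander electrostatic model (assumed; not stated on this page).
Q1 = 0.42            # partial charge on the C=O group, in units of e
Q2 = 0.20            # partial charge on the N-H group, in units of e
F = 332.0            # dimensional factor: energies in kcal/mol for distances in Angstrom
HBOND_CUTOFF = -0.5  # kcal/mol; lower energies are counted as an H-bond


def place_hydrogen(n, c_prev, o_prev, bond_length=1.0):
    """Place the amide H at bond_length from N, opposite to the C=O bond direction."""
    # The C=O bond points from C to O, so the opposite direction is O -> C.
    direction = [c - o for c, o in zip(c_prev, o_prev)]
    norm = math.sqrt(sum(d * d for d in direction))
    return tuple(ni + bond_length * d / norm for ni, d in zip(n, direction))


def hbond_energy(c, o, n, h):
    """Electrostatic interaction energy between a C=O group and an N-H group."""
    return Q1 * Q2 * F * (
        1.0 / math.dist(o, n) + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h) - 1.0 / math.dist(c, n)
    )


def is_hbond(c, o, n, h):
    """True if the interaction is favourable enough to count as an H-bond."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF
```

All coordinates are (x, y, z) tuples in Å. A full implementation would evaluate this energy for every backbone C=O and N-H pair and keep the two best H-bonds per group, as described above.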
The DSSP program defines secondary structure, geometrical features and solvent exposure of proteins, given atomic coordinates in Protein Data Bank (PDB) format or macromolecular Crystallographic Information File (mmCIF) format.
In 1995 the format of the DSSP output files had to be changed. These changes are listed on this page, and are separately available.
At the beginning of this century Elmar Krieger made a series of corrections and adapted DSSP to modifications of the PDB file format.
In 2011 Maarten Hekkelman completely rewrote DSSP. The original DSSP is from now on referred to as DSSPold.
In 2017 the DSSP format was extended to hold the 4-character-long chain IDs of the mmCIF file format.
The current version of DSSP is available as a source package. You can download the sources from https://github.com/cmbi/xssp/releases
Using the application is as simple as opening a terminal window (on Windows this is called the Command Prompt; you can find it under the Start menu, Desk Accessories). Then, in the terminal, type the command to execute dssp and the file to operate on, e.g.:
mkdssp -i my-pdb.ent -o my-ss.dssp
In this example the PDB file called my-pdb.ent will be used as input and the file my-ss.dssp will be created containing the resulting DSSP output. If you omit this last parameter, the output will be written to your terminal instead.
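When DSSP has to be run on many files, the command above can be driven from a script. The following Python sketch only uses the `-i` and `-o` options shown in the example. The function names are hypothetical, and it assumes that an executable called `mkdssp` is on your PATH.

```python
import shutil
import subprocess


def build_mkdssp_command(pdb_path, dssp_path=None, executable="mkdssp"):
    """Build the argument list for one mkdssp run."""
    cmd = [executable, "-i", str(pdb_path)]
    if dssp_path is not None:
        # Without -o the DSSP output goes to standard output instead of a file.
        cmd += ["-o", str(dssp_path)]
    return cmd


def run_mkdssp(pdb_path, dssp_path=None, executable="mkdssp"):
    """Run mkdssp and return whatever it wrote to standard output."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} was not found on PATH")
    cmd = build_mkdssp_command(pdb_path, dssp_path, executable)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `run_mkdssp("my-pdb.ent", "my-ss.dssp")` corresponds to the command line shown above; leaving out the second argument returns the DSSP output as a string.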
DSSPold had a series of command-line options. Examples:
dssp [-na] [-v] pdb_file [dssp_file]
dssp [-na] [-v] -- [dssp_file]
dssp [-h] [-?] [-V]
The possible DSSPold command-line options are:
-na   Disables the calculation of accessible surface.
-c    Classic (pre-July 1995) format.
-v    Verbose.
--    Read from standard input.
-h    Prints a help message.
-?    Same as -h.
-l    Prints the license information.
-V    Prints version, as in first line of the output.
The output from DSSP contains secondary structure assignments and other information, one line per residue. Extract from 1est.dssp (simplified):
HEADER    HYDROLASE (SERINE PROTEINASE)    17-MAY-76   1EST
...
  240  1  4  4  0 TOTAL NUMBER OF RESIDUES, NUMBER OF CHAINS, NUMBER OF SS-BRIDGES(TOTAL,INTRACHAIN,INTERCHAIN) .
 10891.0   ACCESSIBLE SURFACE OF PROTEIN (ANGSTROM**2)
  162 67.5   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(J) ; PER 100 RESIDUES
    0  0.0   TOTAL NUMBER OF HYDROGEN BONDS IN PARALLEL BRIDGES; PER 100 RESIDUES
   84 35.0   TOTAL NUMBER OF HYDROGEN BONDS IN ANTIPARALLEL BRIDGES; PER 100 RESIDUES
...
   26 10.8   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+2)
   30 12.5   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+3)
   10  4.2   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+4)
...
  #  RESIDUE AA STRUCTURE BP1 BP2  ACC  N-H-->O  O-->H-N  N-H-->O  O-->H-N
    2   17   V  B 3   +A  182   0A   8 180,-2.5 180,-1.9   1,-0.2 134,-0.1
...Next two lines wrapped as a pair...
    TCO  KAPPA  ALPHA    PHI    PSI   X-CA   Y-CA   Z-CA
 -0.776  360.0    8.1  -84.5  125.5  -14.7   34.4   34.8
...Next two lines wrapped as a pair...
  CHAIN AUTHCHAIN
      A         A

....;....1....;....2....;....3....;....4....;....5....;....6....;....7..
    .-- sequential resnumber, including chain breaks as extra residues
    |    .-- original PDB resname, not nec. sequential, may contain letters
    |    | .-- one-letter chain ID, if any
    |    | | .-- amino acid sequence in one letter code
    |    | | |  .-- secondary structure summary based on columns 19-38
    |    | | |  | xxxxxxxxxxxxxxxxxxxx recommend columns for secstruc details
    |    | | |  | .-- 3-turns/helix
    |    | | |  | |.-- 4-turns/helix
    |    | | |  | ||.-- 5-turns/helix
    |    | | |  | |||.-- geometrical bend
    |    | | |  | ||||.-- chirality
    |    | | |  | |||||.-- beta bridge label
    |    | | |  | ||||||.-- beta bridge label
    |    | | |  | |||||||   .-- beta bridge partner resnum
    |    | | |  | |||||||   |   .-- beta bridge partner resnum
    |    | | |  | |||||||   |   |.-- beta sheet label
    |    | | |  | |||||||   |   ||   .-- solvent accessibility
    |    | | |  | |||||||   |   ||   |
  #  RESIDUE AA STRUCTURE BP1 BP2  ACC
    |    | | |  | |||||||   |   ||   |
   35   47 A I  E     +     0   0    2
   36   48 A R  E >  S- K   0  39C  97
   37   49 A Q  T 3  S+     0   0   86
   38   50 A N  T 3  S+     0   0   34
   39   51 A W  E <   -KL  36  98C   6
The number 2 under column '8' in the line 'residues per alpha helix' means that there are 2 α-helices of length 8 residues in this data set.
For definitions, see the original Kabsch and Sander article.
In addition note:
Each line contains the following residue information:
RESIDUE: two columns of residue numbers. The first column is DSSP's sequential residue number, starting at the first residue actually in the data set and including chain breaks; this number is used to refer to residues throughout. The second column gives the crystallographers' 'residue sequence number', 'insertion code' and 'chain identifier' (see the Protein Data Bank file record format manual), given for reference only. This column may be '>' if the chain identifier is longer than one character, in which case the actual chain identifier can be found at the far right under columns 150-153 and 160-163.
AA: one-letter amino acid code, lower case for SS-bridge CYS. If cysteines are bridged, the first bridged cysteine in the sequence and its partner, wherever else in the sequence it is, both become a lower case character a. The next bridged cysteine (that is not yet converted into lower case) and its partner both become a lower case character b, etcetera. Unbridged cysteines remain an upper case C.
STRUCTURE (first column): compromise summary of secondary structure, intended to approximate crystallographers' intuition, based on columns 19-38, which are the principal result of DSSP analysis of the atomic coordinates.
BP1 BP2: residue number of first and second bridge partner, followed by a one-letter sheet label.
ACC: number of water molecules in contact with this residue *10, or residue water-exposed surface in Angstrom**2.
N-H-->O and O-->H-N: hydrogen bonds; e.g. -3,-1.4 means: if this residue is residue I, then N-H of I is H-bonded to C=O of I-3 with an electrostatic H-bond energy of -1.4 kcal/mol. There are two columns for each type of H-bond, to allow for bifurcated H-bonds.
TCO: cosine of the angle between C=O of residue I and C=O of residue I-1. For α-helices, TCO is near +1; for β-sheets, TCO is near -1. Not used for structure definition.
KAPPA: virtual bond angle (bend angle) defined by the three Cα atoms of residues I-2, I, I+2. Used to define bend (structure code 'S').
ALPHA: virtual torsion angle (dihedral angle) defined by the four Cα atoms of residues I-1, I, I+1, I+2. Used to define chirality (structure code '+' or '-').
PHI PSI: IUPAC peptide backbone torsion angles.
X-CA Y-CA Z-CA: echo of Cα atom coordinates.
CHAIN AUTHCHAIN: the rcsb-given and author-given chain ids, respectively. These will be the same for PDB files, but different for mmCIF files. Also, in mmCIF files these ids can be longer than one character.
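The per-residue fields described above can be read with simple fixed-width slicing. The Python sketch below is a hedged illustration rather than an official parser. It assumes the usual DSSP column layout (sequential number in columns 1-5, PDB number and insertion code in 6-11, chain in 12, amino acid in 14, structure summary in 17, accessibility in 35-38). The function and key names are hypothetical.

```python
def parse_hbond(field):
    """Split an H-bond field such as '-3,-1.4' into (residue offset, energy in kcal/mol)."""
    offset, energy = field.split(",")
    return int(offset), float(energy)


def parse_residue_line(line):
    """Extract the leading fixed-width fields of one DSSP residue line."""
    aa = line[13]
    # Lower-case letters mark SS-bridged cysteines; the letter itself is the bridge label.
    bridged = aa.islower()
    return {
        "dssp_number": int(line[0:5]),
        "pdb_number": line[5:11].strip(),            # residue number plus insertion code
        "chain": line[11],
        "amino_acid": "C" if bridged else aa,
        "ss_bridge_label": aa if bridged else None,
        "secondary_structure": line[16].strip() or None,  # blank means no assignment
        "accessibility": int(line[34:38]),
    }


# Residue 35 of the 1est.dssp extract, written out with fixed-width columns.
SAMPLE = "   35   47 A I  E     +     0   0    2"
```

For example, `parse_residue_line(SAMPLE)["secondary_structure"]` gives `"E"`, and `parse_hbond("-3,-1.4")` gives the offset -3 and the energy -1.4. Chain-break lines (marked `!`) have no PDB number, so their `pdb_number` comes back as an empty string.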
The values for solvent exposure may not mean what you think:
The new DSSP uses exactly the most recent format of DSSPold.
From the July 1995 version onwards, the output format got three new features (for details, see below).
To obtain output in the pre-July 1995 format use the -c option in DSSPold. Example:
dssp -c myprotein.pdb myprotein.dssp
The Hbond columns (energy and residue offset) are wider by two characters, in order to accommodate residue number offsets up to +/-99999. The format of the first block of columns (up to the Solvent Accessibility) is not affected by this change.
In addition to the chain break residue (!) detected as a discontinuity of backbone coordinates, DSSP now also detects a discontinuity in the PDB-supplied chain identifier, recorded as (*). The (*) is in the column between the Amino Acid Letter and the Secondary Structure Summary columns.
Examples

* Discontinuity in backbone coordinates

   35   39 A Y  H  <5       0   0  110     -4,-1.4    -3,-0.2    -5,-0.3    -1,-0.2   0.906 360.0 360.0 -70.6 -52.6   74.0   40.9   29.8
   36   40 A G     <<       0   0  101     -5,-1.1    -3,-0.1    -4,-0.6    -1,-0.1   0.383 360.0 360.0 -73.5 360.0   76.8   41.2   27.3
   37        !              0   0    0      0, 0.0     0, 0.0     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0 360.0    0.0    0.0    0.0
   38   60 A I              0   0  161      0, 0.0     2,-0.2     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0 161.5  106.5   37.7   49.5
   39   61 A T        -     0   0   91      2,-0.0     2,-0.4     0, 0.0     0, 0.0  -0.814 360.0-165.8-128.3 135.3  105.4   35.7   46.6

* Discontinuity in chain identifier and in backbone coordinates

  246  247 B K              0   0  161      1,-0.4   -25,-0.2   -29,-0.1   -26,-0.1  -0.879 360.0 360.0 178.1 167.1   53.2   14.3    7.1
  247  248 B H              0   0  200     -2,-0.2    -1,-0.4   -27,-0.1   -26,-0.2   0.741 360.0 360.0 -48.4 360.0   52.1   10.8    8.1
  248        !*             0   0    0      0, 0.0     0, 0.0     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0 360.0    0.0    0.0    0.0
  249    1 C A              0   0  133      0, 0.0     2,-1.4     0, 0.0     0, 0.0   0.000 360.0 360.0 360.0-178.2   46.5   82.7    0.4
  250    2 C P        +     0   0   94      0, 0.0     2,-0.1     0, 0.0     0, 0.0  -0.822 360.0  58.6 -85.7  94.3   46.3   80.1    3.1
Recommended usage: to find a chain border in a xxxx.dssp file, locate any line containing the string !*
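The recommended search for `!*` is easy to script. The Python sketch below returns the positions of all chain-border lines in a list of DSSP output lines. The function name is hypothetical, and the sketch applies a plain substring search exactly as recommended above, without any column checks.

```python
def find_chain_borders(lines):
    """Return the 0-based indices of lines that mark a chain border (contain '!*')."""
    return [index for index, line in enumerate(lines) if "!*" in line]


def split_at_chain_borders(lines):
    """Split residue lines into one list per chain, dropping the '!*' marker lines."""
    chains, current = [], []
    for line in lines:
        if "!*" in line:
            chains.append(current)
            current = []
        else:
            current.append(line)
    chains.append(current)
    return chains
```

Pass only the residue lines (everything after the `  #  RESIDUE` header line) to `split_at_chain_borders` if you want each returned list to hold the residues of one chain.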
To reflect the change in format, a new header line is used (line starts with ====, text is in mixed case):
==== Secondary Structure Definition by the program DSSP, Version July 1995 ==== DATE=27-JUL-1995 .
If the pre-July 1995 format is forced (using dssp -c ...) first line reverts to the pre-July 1995 style (line starts with ****, text is in upper case):
**** SECONDARY STRUCTURE DEFINITION BY THE PROGRAM DSSP, VERSION JULY 1995 **** DATE=27-JUL-1995
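Because the two formats differ in the width of the H-bond columns, a script that reads DSSP files may want to check the first line before parsing. The Python sketch below does this using only the two header prefixes described above. The function name and the returned labels are hypothetical.

```python
def detect_dssp_format(first_line):
    """Classify a DSSP file by the prefix of its first header line."""
    if first_line.startswith("===="):
        return "july-1995"   # current format, wider H-bond columns
    if first_line.startswith("****"):
        return "classic"     # pre-July 1995 format (dssp -c)
    return "unknown"
```

A reader could use the result to choose between two sets of column offsets for the H-bond fields.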
Although I generally discourage people in Bioinformatics from using Windows as their operating system, we have made available a Windows version of DSSP.
You must run the Windows version of DSSP from a cmd window, using the same commands as described above for the Unix system.