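Every HSSP file contains a series of blocks of information: a header, the PROTEINS block, the ALIGNMENTS block, the SEQUENCE PROFILE AND ENTROPY block and the INSERTION LIST.
The information contained in HSSP files will be explained using the HSSP file for 1crn (crambin) as an example.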
HSSP       HOMOLOGY DERIVED SECONDARY STRUCTURE OF PROTEINS , VERSION 3.0 2017
PDBID      1CRN
THRESHOLD  according to: t(L)=(290.15 * L ** -0.562) + 5
REFERENCE  Sander C., Schneider R. : Database of homology-derived protein structures. Proteins, 9:56-68 (1991).
CONTACT    Maintained at http://www.cmbi.umcn.nl/ by Coos Baakman
Most of these lines are self-explanatory.
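The THRESHOLD record is a formula (** denotes exponentiation). Below is a minimal Python sketch of it. It assumes that L is the alignment length (LALI in the PROTEINS block) and that t(L) is a minimum percentage of sequence identity. hssp_threshold and passes_threshold are hypothetical helper names, not part of any HSSP tool:

```python
def hssp_threshold(length: int) -> float:
    """Evaluate t(L) = (290.15 * L ** -0.562) + 5 from the THRESHOLD record.

    Assumption: L is the alignment length (LALI) and the result is a
    minimum percentage of sequence identity.
    """
    return 290.15 * length ** -0.562 + 5


def passes_threshold(identity_fraction: float, length: int) -> bool:
    """Check one PROTEINS row; %IDE is printed as a fraction (0.40 means 40%)."""
    return 100 * identity_fraction >= hssp_threshold(length)


if __name__ == "__main__":
    for length in (45, 46):
        # prints "45 39.2", then "46 38.7"
        print(length, round(hssp_threshold(length), 1))
```

Under these assumptions the cut-off is about 39.2% for an alignment of 45 residues. The weakest rows of the example PROTEINS list (%IDE 0.40 with LALI 45) therefore lie just above it.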
## PROTEINS : identifier and alignment statistics
  NR.   ID           STRID %IDE %WSIM IFIR ILAS JFIR JLAS LALI NGAP LGAP LSEQ2  ACCNUM     PROTEIN
    1 : CRAM_CRAAB   1YV8  0.98  1.00    1   46    1   46   46    0    0    46  P01542     Crambin OS=Crambe hispanica subsp. abyssinica GN=THI2 PE=1 SV=2
    2 : Q9S979_CRAAB       0.86  0.93    3   46    9   52   44    0    0   118  Q9S979     Crambin=THIONIN variant THI2CA5 (Precursor) OS=Crambe hispanica subsp. abyssinica PE=4 SV=1
    3 : Q9S976_CRAAB       0.57  0.82    2   45   26   69   44    0    0   134  Q9S976     Crambin=THIONIN variant THI2CA10 (Precursor) OS=Crambe hispanica subsp. abyssinica PE=4 SV=1
    4 : Q43227_TULGE       0.56  0.78    2   46   14   58   45    0    0   112  Q43227     Thionin class 1 (Precursor) OS=Tulipa gesneriana GN=Thi1-4 PE=2 SV=1
  .....
   62 : I1H3P5_BRADI       0.40  0.60    2   46   30   74   45    0    0   135  I1H3P5     Uncharacterized protein OS=Brachypodium distachyon GN=BRADI1G57296 PE=4 SV=1
   63 : Q9S9D7_HORVU       0.40  0.71    2   46   30   74   45    0    0   137  Q9S9D7     Thionin OS=Hordeum vulgare PE=4 SV=1
   64 : THN6_HORVU         0.40  0.71    2   46   30   74   45    0    0   137  P09618     Leaf-specific thionin BTH6 OS=Hordeum vulgare PE=2 SV=3
## ALIGNMENTS 1 - 64
This block holds the metadata for each aligned sequence, along with some vital alignment statistics that are explained in the NOTATION records of the first block. Note that %IDE and %WSIM are printed as fractions (0.98 means 98%). The ID column holds the name of each sequence, the STRID column holds the identifier of the corresponding PDB entry (if one exists), and so on.
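The STRID column is empty for sequences without a known structure, so splitting a PROTEINS row needs a little care. The sketch below assumes each row is available as one line and that splitting on whitespace is good enough. A stricter reader would use the fixed column positions of the file. ProteinEntry and parse_protein_line are hypothetical names. The field comments paraphrase the column meanings, and the NOTATION records remain the authoritative definitions:

```python
from dataclasses import dataclass


@dataclass
class ProteinEntry:
    nr: int          # NR.
    seq_id: str      # ID
    strid: str       # STRID, "" when no PDB entry is listed
    identity: float  # %IDE, printed as a fraction (0.98 means 98%)
    wsim: float      # %WSIM, also a fraction
    ifir: int        # first and last aligned residue in the PDB sequence
    ilas: int
    jfir: int        # first and last aligned residue in the other sequence
    jlas: int
    lali: int        # alignment length
    ngap: int        # number of gaps
    lgap: int        # total gap length
    lseq2: int       # full length of the other sequence
    accnum: str      # ACCNUM
    protein: str     # free-text PROTEIN description


def parse_protein_line(line: str) -> ProteinEntry:
    """Split one row of the '## PROTEINS' block on whitespace."""
    nr_text, rest = line.split(" : ", 1)
    tokens = rest.split()
    seq_id = tokens[0]
    # STRID is optional: a PDB identifier never contains '.', %IDE always does.
    if "." in tokens[1]:
        strid, start = "", 1
    else:
        strid, start = tokens[1], 2
    numbers = tokens[start:start + 10]
    identity, wsim = float(numbers[0]), float(numbers[1])
    ifir, ilas, jfir, jlas, lali, ngap, lgap, lseq2 = (int(n) for n in numbers[2:])
    return ProteinEntry(
        nr=int(nr_text),
        seq_id=seq_id,
        strid=strid,
        identity=identity,
        wsim=wsim,
        ifir=ifir, ilas=ilas, jfir=jfir, jlas=jlas,
        lali=lali, ngap=ngap, lgap=lgap, lseq2=lseq2,
        accnum=tokens[start + 10],
        protein=" ".join(tokens[start + 11:]),
    )


row = ("    2 : Q9S979_CRAAB       0.86  0.93    3   46    9   52   44    0    0   118  "
       "Q9S979     Crambin=THIONIN variant THI2CA5 (Precursor) "
       "OS=Crambe hispanica subsp. abyssinica PE=4 SV=1")
entry = parse_protein_line(row)
print(entry.seq_id, repr(entry.strid), entry.lali, entry.accnum)  # Q9S979_CRAAB '' 44 Q9S979
```

When NGAP is 0, as in every row shown, ILAS - IFIR + 1 and JLAS - JFIR + 1 both equal LALI. This makes a cheap sanity check on a parsed row.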
## ALIGNMENTS 1 - 64
SeqNo PDBNo AA STRUCTURE BP1 BP2 ACC NOCC VAR ....:....1....:....2....:....3....:....4....:....5....:....6....:....7 CHAIN AUTHCHAIN
1 1 A T 0 0 75 2 0 T A A
2 2 A T E -A 34 0A 21 61 10 T SSSSSTSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS SSSSS SSSSSSSSSSSSSSSS A A
3 3 A a E -A 33 0A 0 65 0 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC A A
4 4 A b - 0 0 0 65 4 CCCCCCCCCCFCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC A A
5 5 A P S S+ 0 0 54 65 51 PPPPKPPKKRPPPPPRRPPPPPPPPPKPKKPRKKPPPPPPPPKPPPPPKKRKPKPPPKKKKKKK A A
6 6 A S S > S- 0 0 49 65 60 SNSSDTSDDNSSNNSNNTTSSSSSSSNSNNSNNNSNSSSSRSTSSSSRTTTSNNTNSNNNNNDD A A
7 7 A I H > S+ 0 0 120 65 30 ITITITIDTTTTTTTTTMPTTTTTTTTTTTTTTTTTTTTTTETTTTTTTTTTTTKTTTTTTTTT A A
....
44 44 A Y G < S+ 0 0 68 48 18 YYYYYYLYYYYYYYWYYYYW YYYY YYYWY Y F YYLHWYYL YYYYYYYY A A
45 45 A A < 0 0 71 46 47 AAPPP PPPPDSPPTEE PB PPPP PPPNP T P TPRPPPPP VPPPPPPP A A
46 46 A N 0 0 76 41 54 NN KK KKK KKKNKK HH KKKK RKKHK H HKKKKKK HKSSSSKK A A
## SEQUENCE PROFILE AND ENTROPY
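The most confusing thing about the alignment is the vertical orientation of the individual sequences. Each row is one residue of the PDB sequence. Each aligned sequence occupies one character column of the alignment field, in the order of the PROTEINS block: column 1 is NR. 1, column 2 is NR. 2, and so on, and the ruler ....:....1....:....2 marks every fifth and tenth column. The columns up to and including ACC are copied from the corresponding DSSP file. For example, the lowercase a and b in the AA column at positions 3 and 4 are DSSP's labels for cysteines that form disulfide bridges. VAR and NOCC are explained in the header block.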
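Turning the vertical layout back into one string per sequence is a transposition. Below is a minimal sketch. It assumes that the alignment field (the part under the ruler) has already been cut out of each row, because its exact character offsets are not reproduced here. transpose_alignment is a hypothetical helper. The sample data are the first 15 characters of rows 19 to 22 of the 1crn alignment, which appear in full further on:

```python
from __future__ import annotations


def transpose_alignment(rows: dict[int, str], first_nr: int = 1) -> dict[int, str]:
    """Turn ALIGNMENTS rows into one string per aligned sequence.

    rows maps SeqNo to the alignment field of that row. Character k (0-based)
    of a field belongs to sequence NR. first_nr + k; first_nr is the first
    number in the '## ALIGNMENTS 1 - 64' block header.
    """
    width = max(len(field) for field in rows.values())
    seq_nos = sorted(rows)
    columns = {}
    for k in range(width):
        # ljust restores trailing blanks that may have been stripped from a field
        columns[first_nr + k] = "".join(rows[s].ljust(width)[k] for s in seq_nos)
    return columns


# First 15 characters of the alignment fields of rows 19-22 of 1crn
rows = {
    19: "PPPPPPPPPPPTTTR",
    20: "GGGGGGGGGggGGGL",
    21: "TTTTTTTTTirTAA.",
    22: "PAPPPPPPPSPSPPP",
}
columns = transpose_alignment(rows)
print(columns[10])  # PgiS
print(columns[15])  # RL.P
```

Column 10 reads PgiS and contains one of the lowercase pairs discussed further on. Column 15 shows a period where that sequence has an internal deletion at position 21.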
## SEQUENCE PROFILE AND ENTROPY
SeqNo PDBNo     V   L   I   M   F   W   Y   G   A   P   S   T   C   H   R   K   Q   E   N   D NOCC NDEL NINS ENTROPY RELENT WEIGHT CHAIN AUTHCHAIN
    1     1 A   0   0   0   0   0   0   0   0   0   0   0 100   0   0   0   0   0   0   0   0    2    0    0   0.000      0   1.00     A         A
    2     2 A   0   0   0   0   0   0   0   0   0   0  95   5   0   0   0   0   0   0   0   0   61    0    0   0.196      6   0.90     A         A
    3     3 A   0   0   0   0   0   0   0   0   0   0   0   0 100   0   0   0   0   0   0   0   65    0    0   0.000      0   1.00     A         A
    4     4 A   0   0   0   0   2   0   0   0   0   0   0   0  98   0   0   0   0   0   0   0   65    0    0   0.079      2   0.96     A         A
....
   45    45 A   2   0   0   0   0   0   0   0   7  70   2   7   0   0   2   0   0   4   2   2   46    0    0   1.161     38   0.53     A         A
   46    46 A   0   0   0   0   0   0   0   0   0   0  10   0   0  15   2  63   0   0  10   0   41    0    0   1.115     37   0.45     A         A
## INSERTION LIST
The profile gives, for each amino acid type, its percentage among the residues observed at that position. Be aware that these are frequencies scaled to 100, not counts. In the crambin example you see 100 for T (threonine) at position 1. The actual alignment, however, shows that only the first sequence has a residue at this position, so this 100 rests on a single aligned residue. NOCC is 2 rather than 1 at this position because the query sequence (from 1crn.pdb) is also counted as part of the alignment.
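The profile columns can be reproduced from the alignment with a short calculation. The following Python sketch is inferred from the example rather than taken from a specification. It counts the query residue plus every aligned residue, excluding blanks and periods, and scales the counts to 100 with rounding. It then computes the Shannon entropy with natural logarithms and takes RELENT as 100 × ENTROPY / ln 20, truncated to an integer. With these assumptions, positions 2 and 4 of 1crn give the ENTROPY/RELENT values 0.196/6 and 0.079/2 listed in the profile. The WEIGHT column is not modelled, and position_profile is a hypothetical name:

```python
import math
from collections import Counter

# Column order of the SEQUENCE PROFILE AND ENTROPY block
PROFILE_ORDER = "VLIMFWYGAPSTCHRKQEND"


def position_profile(query_residue: str, column: str):
    """Profile statistics for one alignment position (inferred recipe).

    query_residue: the PDB residue at this position; pass "C" for cysteines
                   that DSSP writes as lowercase bridge labels (a, b, ...).
    column:        the alignment field of the ALIGNMENTS row for this position.
    Returns (percentages, nocc, entropy, relent).
    """
    # Blanks (terminal gaps) and periods (internal deletions) are not residues;
    # lowercase insertion markers still count as residues of their type.
    residues = [query_residue.upper()] + [c.upper() for c in column if c not in " ."]
    counts = Counter(residues)
    nocc = len(residues)
    percentages = {aa: round(100 * counts[aa] / nocc) for aa in PROFILE_ORDER}
    # Shannon entropy with natural logarithms
    entropy = sum(n / nocc * math.log(nocc / n) for n in counts.values())
    # Relative to the maximum entropy of 20 amino acid types, truncated
    relent = int(100 * entropy / math.log(20))
    return percentages, nocc, entropy, relent


# Position 4 of 1crn: query Cys plus 64 aligned residues, one of them F
column4 = "C" * 10 + "F" + "C" * 53
pct, nocc, entropy, relent = position_profile("C", column4)
print(pct["C"], pct["F"], nocc, round(entropy, 3), relent)  # 98 2 65 0.079 2
```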
HSSP alignments rigorously follow the sequence of the PDB file. That makes deletions easy to represent: a deletion is shown as a period if it lies in the middle of the sequence and as a blank at the termini. Insertions are more complicated, because simply discarding them would throw away information, and that always hurts.
In the section of the alignment listed below, you find two pairs of residues in lowercase: g and i in column 10, and g and r in column 11 (rows 20 and 21). If you study the whole example HSSP file for 1crn, you will see that these are not the only lowercase pairs. In each case, a lowercase pair indicates that there is an insertion between the two residues.
19 19 A P T 3 5S- 0 0 109 65 65 PPPPPPPPPPPTTTRPPLTTTTTTTTATAAAPAATGAAAATYGAAAATATGAAATGAALAALAA
20 20 A G T < 5 + 0 0 52 65 4 GGGGGGGGGggGGGLGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
21 21 A T < - 0 0 39 64 53 TTTTTTTTTirTAA.TTTSTAATAATGSGGGTGGAGGGGGAAGGGGGAGATAGGTGGGGGGGGG
22 22 A P >> - 0 0 83 65 43 PAPPPPPPPSPSPPPPPPTSSSSSSSSSSSSPSSSSSSSSSSASSSSSSSPPSSSSSSTSSTSS
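The inserted residues themselves are listed in the last block of the HSSP file, the INSERTION LIST. Each entry gives the following fields:
AliNo: the alignment number, which is the NR. of the sequence in the PROTEINS block.
IPOS: the position of the insertion in the PDB sequence.
JPOS: the matching position in the aligned sequence.
Len: the number of inserted residues.
Sequence: the inserted residues in uppercase, flanked by the two lowercase residues seen in the alignment.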
## INSERTION LIST
 AliNo  IPOS  JPOS   Len Sequence
    10    20    21     1 gTi
    11    20    39     1 gCr
//
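Putting the pieces together, an aligned sequence can be rebuilt with its insertions restored. The sketch assumes that IPOS is the PDB position of the first lowercase residue (position 20 in the example, where both g residues sit). It also assumes that the first and last characters of Sequence are the two lowercase flanks already present in the alignment. Insertion, parse_insertion_line and expand_sequence are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Insertion:
    ali_no: int    # NR. of the aligned sequence in the PROTEINS block
    ipos: int      # PDB position of the lowercase residue before the insertion
    jpos: int      # matching position in the aligned sequence
    length: int    # number of inserted residues (Len)
    sequence: str  # inserted residues, flanked by the two lowercase residues


def parse_insertion_line(line: str) -> Insertion:
    ali_no, ipos, jpos, length, sequence = line.split()
    return Insertion(int(ali_no), int(ipos), int(jpos), int(length), sequence)


def expand_sequence(ali_no: int, residues: dict[int, str],
                    insertions: list[Insertion]) -> str:
    """Rebuild the aligned fragment of sequence ali_no, insertions included.

    residues maps PDB SeqNo -> the character of this sequence in that row.
    """
    by_pos = {ins.ipos: ins for ins in insertions if ins.ali_no == ali_no}
    out = []
    for pos in sorted(residues):
        char = residues[pos]
        if char in " .":  # terminal blank or internal deletion
            continue
        out.append(char.upper())
        ins = by_pos.get(pos)
        if ins is not None:
            inserted = ins.sequence[1:-1]  # strip the two lowercase flanks
            if len(inserted) != ins.length:
                raise ValueError(f"Len {ins.length} does not match {ins.sequence!r}")
            out.append(inserted.upper())
    return "".join(out)


insertions = [parse_insertion_line(line) for line in (
    "    10    20    21     1 gTi",
    "    11    20    39     1 gCr",
)]
# Column 10 of rows 19-22 in the alignment excerpt: P, g, i, S
print(expand_sequence(10, {19: "P", 20: "g", 21: "i", 22: "S"}, insertions))  # PGTIS
```

For column 11 (P, g, r, P) the same call returns PGCRP.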