|
For every column p in an MSA the entropy Sp and variability Vp are calculated. The location of the point (Vp,Sp) in an EV plot is related to the function of residue position p, as described on the previous page for the Boxes 11, 12, 22, 23, and 33. |
Nowadays, we would take the logarithm to base 20 rather than the natural
logarithm so that the values end up between 0 and 1, but the two differ
only by a constant factor: the base-20 value is the natural-log value divided by ln(20).
The frequency fp,i ranges
from 0.0 when residue type i is not present in column p of the MSA to 1.0
when residue type i is fully conserved. Sp can range from 0.0
for a fully conserved column to ln(20) when all 20 amino acid types i are
observed equally often in column p.
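
To make the range of Sp concrete, here is a minimal sketch of the entropy calculation for one column. It assumes the standard Shannon form, Sp = -sum over i of fp,i * ln(fp,i), which is consistent with the range of 0 to ln(20) stated above but is not spelled out in this section. The function name column_entropy, the AMINO_ACIDS string, and the choice to ignore gaps and non-standard letters are my own assumptions for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acid types (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_entropy(column, base=math.e):
    """Shannon entropy of one MSA column, given as a string of residue letters.

    Assumption: gaps and non-standard letters are ignored, so the
    frequencies f are taken over the standard residues only.
    """
    residues = [r for r in column.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no standard amino acid residues")

    counts = Counter(residues)
    total = len(residues)

    entropy = 0.0
    for count in counts.values():
        f = count / total  # frequency of this residue type in the column
        entropy -= f * math.log(f, base)
    return entropy


# A fully conserved column gives 0.0; all 20 types equally often gives
# ln(20) with the natural logarithm, or 1.0 with the logarithm to base 20.
conserved = column_entropy("AAAAAAAA")
maximal_ln = column_entropy(AMINO_ACIDS)
maximal_log20 = column_entropy(AMINO_ACIDS, base=20)
```

Passing base=20 gives the rescaled value between 0 and 1 mentioned above; it is the natural-log result divided by ln(20).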
The variability Vp of column p in an MSA is defined as the number of different amino acid types i that are observed in column p of the MSA. Laerte, 20 years ago, demanded that a residue type should be present for at least 0.5% in an MSA (so residue type i must be observed at position p in at least 1 of every 200 aligned sequences). He made this rule to avoid errors caused by the many sequencing errors that were still made in those days. I guess that nowadays sequencing errors are no longer a reason for this 0.5% rule, but the 0.5% is still useful to deal with the occasional totally wrong protein that accidentally ended up in the MSA.
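
The 0.5% rule can be sketched as a simple count with a frequency cut-off. This is only an illustration under my own assumptions: the function name column_variability and the parameter min_fraction are hypothetical, and I assume that the 0.5% is taken over the standard residues in the column, with gaps and non-standard letters left out of the total.

```python
from collections import Counter

# The 20 standard amino acid types (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_variability(column, min_fraction=0.005):
    """Number of residue types seen in at least min_fraction of the column.

    Assumption: gaps and non-standard letters are ignored, so the
    fraction is taken over the standard residues only.
    """
    residues = [r for r in column.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no standard amino acid residues")

    counts = Counter(residues)
    total = len(residues)

    # Count only residue types that reach the cut-off (0.5% by default).
    return sum(1 for count in counts.values() if count / total >= min_fraction)


# One W among 200 sequences is exactly 0.5% and is counted;
# one W among 400 sequences is 0.25% and is treated as noise.
v_200 = column_variability("A" * 199 + "W")
v_400 = column_variability("A" * 399 + "W")
```

With these inputs, v_200 is 2 and v_400 is 1, so a single stray sequence only changes Vp in alignments of 200 sequences or fewer.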