1) We need residue frequencies for the whole dataset, for residues inside TM helices, and for residues NOT in TM helices.
2) The null model is that the chance of finding any particular residue in a TM helix
is the same as the overall percentage of residues that lie in a TM helix.
3) Find proteins with TM helices in the PDB.
4) Read further down about the counter server.
5) Score for (aa,TM) = ln(Fobs/Fnull), where Fobs is the observed frequency of (aa,TM)
and Fnull is the frequency predicted by the null model (which is that the % of aa in
a TM is the % aa in the dataset multiplied by the % TM in the dataset).
6) Input a sequence, add up the TM scores over stretches of about 20 aa, and compare each summed score
with a threshold value determined by looking at real TM helices.
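
The scoring and scanning steps above can be sketched roughly as follows. This is only a minimal illustration, not a finished implementation. The names tm_scores and scan, the 'M' mask character that marks TM positions, the pseudocount (added so that a residue never seen in a TM helix does not produce ln(0)), and the default threshold of 0.0 are all assumptions made for the example. The real threshold still has to come from known TM helices.

```python
import math
from collections import Counter


def tm_scores(labelled, pseudocount=1.0):
    """Return {aa: ln(Fobs/Fnull)} from (sequence, mask) pairs.

    mask has the same length as sequence; 'M' marks a TM position
    (hypothetical labelling scheme, chosen for this sketch).
    """
    aa_counts = Counter()     # residue counts over the whole dataset
    tm_aa_counts = Counter()  # residue counts inside TM helices only
    for seq, mask in labelled:
        for aa, state in zip(seq, mask):
            aa_counts[aa] += 1
            if state == "M":
                tm_aa_counts[aa] += 1

    total = sum(aa_counts.values())
    tm_fraction = sum(tm_aa_counts.values()) / total  # % TM in the dataset

    scores = {}
    for aa, n in aa_counts.items():
        # Observed frequency of (aa, TM), smoothed with a pseudocount (assumption).
        f_obs = (tm_aa_counts[aa] + pseudocount) / (total + pseudocount * len(aa_counts))
        # Null model: % aa in the dataset times % TM in the dataset.
        f_null = (n / total) * tm_fraction
        scores[aa] = math.log(f_obs / f_null)
    return scores


def scan(sequence, scores, window=20, threshold=0.0):
    """Return (start, summed score) for every window scoring above threshold."""
    hits = []
    for start in range(len(sequence) - window + 1):
        summed = sum(scores.get(aa, 0.0) for aa in sequence[start:start + window])
        if summed > threshold:
            hits.append((start, summed))
    return hits
```

Residues missing from the score table contribute 0 to a window, and a sequence shorter than the window simply gives no hits. Overlapping hits would still need to be merged into single predicted helices.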