Answer:


1) We need data on residue frequencies overall, inside TM helices, and outside TM helices.
2) The null model is that the chance of finding any particular residue type in a TM helix equals the overall fraction of residues that are in TM helices, i.e. residue type and TM location are independent.
3) Find proteins with TM helices in the PDB.
4) Read further down about the counter server.
5) Score for (aa,TM) = ln(Fobs/Fnull), where Fobs is the observed frequency of (aa,TM) and Fnull is the frequency predicted by the null model (the % of aa in a TM is the % aa in the dataset multiplied by the % TM in the dataset).
6) Input a sequence, add up the TM scores over stretches of about 20 aa, and compare the summed scores with a threshold value determined by looking at real TM helices.
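
Steps 5 and 6 can be sketched in Python roughly as follows. This is only a minimal illustration, not the method of any particular server: the function names (tm_scores, window_sums, predict_tm), the input layout (one string of residues plus a parallel list of TM flags), and the handling of residues never seen in a TM helix (no score, counted as 0 when scanning) are all assumptions made for the example. The threshold still has to be chosen by looking at real TM helices.

```python
import math
from collections import Counter


def tm_scores(residues, in_tm):
    """Return {aa: ln(Fobs/Fnull)} for every residue type seen in a TM helix.

    residues: one-letter codes of all residues in the dataset (hypothetical layout).
    in_tm:    booleans of the same length, True where the residue is in a TM helix.
    """
    total = len(residues)
    aa_counts = Counter(residues)
    tm_counts = Counter(aa for aa, tm in zip(residues, in_tm) if tm)
    frac_tm = sum(tm_counts.values()) / total  # % TM in the dataset

    scores = {}
    for aa, n_aa in aa_counts.items():
        f_obs = tm_counts[aa] / total        # observed frequency of (aa, TM)
        f_null = (n_aa / total) * frac_tm    # null model: % aa times % TM
        if f_obs > 0:                        # assumption: skip aa never seen in TM, ln(0) is undefined
            scores[aa] = math.log(f_obs / f_null)
    return scores


def window_sums(sequence, scores, window=20):
    """Sum the per-residue scores over every stretch of `window` residues."""
    per_residue = [scores.get(aa, 0.0) for aa in sequence]  # assumption: unknown aa scores 0
    return [sum(per_residue[i:i + window])
            for i in range(len(sequence) - window + 1)]


def predict_tm(sequence, scores, threshold, window=20):
    """Return the start positions of windows whose summed score reaches the threshold."""
    sums = window_sums(sequence, scores, window)
    return [i for i, s in enumerate(sums) if s >= threshold]
```

In use, tm_scores would be run once on the residues collected from the PDB proteins in step 3, and predict_tm would then be called on each new sequence. With real data a small pseudocount would probably be a better choice than skipping residue types that never occur in a TM helix, since counting them as 0 understates how unfavourable they are.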