You need to know the chance of finding an alanine in a helix, in a
strand, and in any of the other structure classes (turn, loop, etc). Let's
call these P(Ala,H), P(Ala,S), P(Ala,R), in which H, S, and R stand for Helix,
Strand, and Rest. Obviously P(X,H)+P(X,S)+P(X,R)=P(X) for each of the twenty
amino acids X, and the values for all amino acids together add up to 1.0.
So, how big is P(Ala,H)? Well, in the null-model the amino acid type and the
structure class are independent, so P(Ala,H)=P(Ala)*P(H). Those two chances we
can obtain by counting all residues, all Ala, and all H in the whole dataset.
Typical numbers could be: data set size = 407128, number of Ala = 28777,
number of H = 122991. This gives us P(Ala) = 28777/407128 = 7.1% and P(H) =
122991/407128 = 30.2%, so that P(Ala,H) = 2.1%. The expected number of
Ala in a helix is then Fpred(Ala,H) = P(Ala,H) * 407128 = 8693, using the
unrounded P(Ala,H) (check that this is 30.2% of 28777, and check that you
understand why that should be).
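The arithmetic above can be checked with a few lines of Python. This is only a
sketch: the variable names are made up for illustration, and the counts are the
typical numbers quoted above.

```python
# Counts quoted in the text above.
n_total = 407128   # all residues in the data set
n_ala = 28777      # all alanines
n_helix = 122991   # all residues in a helix

p_ala = n_ala / n_total
p_helix = n_helix / n_total

# Null-model assumption: amino acid type and structure class are independent.
p_ala_helix = p_ala * p_helix

# Expected number of Ala in a helix, using the unrounded probability.
fpred_ala_helix = p_ala_helix * n_total

print(f"P(Ala)       = {p_ala:.1%}")
print(f"P(H)         = {p_helix:.1%}")
print(f"P(Ala,H)     = {p_ala_helix:.1%}")
print(f"Fpred(Ala,H) = {fpred_ala_helix:.0f}")
```

Because the data set size cancels, Fpred(Ala,H) is the same as P(H) times the
number of Ala, which is the check suggested above.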
This you now do for all three structure classes and all 20 amino acids. So you
want P(Ala,H), P(Ala,S), P(Ala,R), P(Cys,H), P(Cys,S), etcetera, 60 numbers in
total. These chances can be converted into Fpred values by multiplying them
with the data set size. And that is your null-model.
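One possible way to build the whole table at once is sketched below. It assumes
the data set is available as a list of (amino acid, structure class) pairs, one
per residue; that input format, the function name null_model, and the toy data
are all hypothetical and only serve as illustration.

```python
from collections import Counter

def null_model(residues):
    """Return {(amino_acid, structure): Fpred} under independence.

    residues: list of (amino_acid, structure) pairs, one per residue,
    with structure being 'H', 'S', or 'R' (assumed input format).
    """
    n_total = len(residues)
    aa_counts = Counter(aa for aa, _ in residues)
    ss_counts = Counter(ss for _, ss in residues)

    fpred = {}
    for aa, n_aa in aa_counts.items():
        for ss, n_ss in ss_counts.items():
            # P(aa,ss) = P(aa) * P(ss), then scale by the data set size.
            p = (n_aa / n_total) * (n_ss / n_total)
            fpred[(aa, ss)] = p * n_total
    return fpred

# Toy data, not real counts: 8 residues, 3 amino acid types.
toy = [("Ala", "H"), ("Ala", "H"), ("Ala", "S"), ("Ala", "R"),
       ("Cys", "S"), ("Cys", "R"), ("Gly", "R"), ("Gly", "R")]

model = null_model(toy)
for key in sorted(model):
    print(key, model[key])
```

With a real data set containing all 20 amino acids and all three structure
classes this gives the 60 Fpred values, and they add up to the data set size.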