The LR-score was originally implemented to catch NMR structures that consist largely of unstructured loops with perhaps a few turns in the middle. In the case of NMR studies, these structures do make a bit of sense, as they teach us about natively unfolded proteins, but from the point of view of a homology modeller looking for a template they are useless.
This explains the name of the factor: a lone ranger, that is, a residue without contacts with other residues. Lone Residue -> LR.
The LR-score calculation follows a few steps:
The resulting number falls between 0% and 100%. A score of 100% is found for fully extended proteins or proteins that consist of just one long helix. I ran this 'algorithm' over all AlphaFold models for more than 20K human proteins, and made a primitive histogram of these 20K LR-scores. The result was somewhat surprising. Well over a third of all models (about 37%) have an LR-score of 50% or worse:
  0.000 -  10.000 (  698) =====
 10.000 -  20.000 ( 3196) =====================
 20.000 -  30.000 ( 4511) ==============================
 30.000 -  40.000 ( 3464) =======================
 40.000 -  50.000 ( 2432) ================
 50.000 -  60.000 ( 1894) =============
 60.000 -  70.000 ( 1467) ==========
 70.000 -  80.000 ( 1459) ==========
 80.000 -  90.000 ( 1548) ==========
 90.000 - 100.000 ( 2081) ==============
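A primitive text histogram of this kind takes only a few lines to produce. The following is a minimal sketch, not the code that was actually used: the function names `bin_counts` and `render`, the ten bins of width 10, and the 30-character maximum bar width are all assumptions, chosen because the bar lengths shown here are consistent with scaling the fullest bin to 30 characters.

```python
def bin_counts(scores, n_bins=10, lo=0.0, hi=100.0):
    """Count scores per equal-width bin (hypothetical helper).

    A score exactly equal to `hi` is placed in the last bin.
    Scores are assumed to lie within [lo, hi].
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in scores:
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def render(counts, lo=0.0, hi=100.0, bar_width=30):
    """Return one text line per bin; the fullest bin gets `bar_width` '=' signs."""
    width = (hi - lo) / len(counts)
    peak = max(counts) or 1  # avoid division by zero when all bins are empty
    lines = []
    for i, c in enumerate(counts):
        bar = "=" * round(bar_width * c / peak)
        start = lo + i * width
        end = lo + (i + 1) * width
        lines.append(f"{start:7.3f} - {end:7.3f} ({c:6d}) {bar}")
    return lines


if __name__ == "__main__":
    # Bin counts copied from the AlphaFold histogram above.
    alphafold = [698, 3196, 4511, 3464, 2432, 1894, 1467, 1459, 1548, 2081]
    print("\n".join(render(alphafold)))
```

Calling `bin_counts` on a list of LR-scores and passing the result to `render` gives the same kind of layout as the histogram above.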
Worried that there was a bug, I ran the same algorithm over 10K randomly selected Xray PDB files and got:
  0.000 -  10.000 ( 9654) ==============================
 10.000 -  20.000 ( 1054) ===
 20.000 -  30.000 (   97)
 30.000 -  40.000 (   17)
 40.000 -  50.000 (   13)
 50.000 -  60.000 (    5)
 60.000 -  70.000 (    1) 3s4r
 70.000 -  80.000 (    3) 1fav 2ymk 4lh9
 80.000 -  90.000 (    0)
 90.000 - 100.000 (    1) 1nyh
I checked a few of the high (=bad) scoring PDB files:
So, there is nothing wrong with these files, but they are useless as modelling templates unless you want to model a close homolog.
The point is that 99% of the Xray PDB files score in the lower two bins, while for the AlphaFold models this isn't even 20%.
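These two percentages can be checked directly against the bin counts in the two histograms. The short sketch below does only that arithmetic; the function name `fraction_in_first_bins` is a hypothetical helper, and the counts are copied from the histograms above.

```python
# Bin counts copied from the two histograms (bins of width 10, from 0 to 100).
ALPHAFOLD = [698, 3196, 4511, 3464, 2432, 1894, 1467, 1459, 1548, 2081]
XRAY = [9654, 1054, 97, 17, 13, 5, 1, 3, 0, 1]


def fraction_in_first_bins(counts, n_bins=2):
    """Fraction of all entries that fall in the first `n_bins` bins."""
    return sum(counts[:n_bins]) / sum(counts)


if __name__ == "__main__":
    print(f"Xray PDB files:   {fraction_in_first_bins(XRAY):.1%}")       # 98.7%
    print(f"AlphaFold models: {fraction_in_first_bins(ALPHAFOLD):.1%}")  # 17.1%
```

That is 10708 of 10845 Xray files against 3894 of 22750 AlphaFold models with an LR-score below 20%.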