The LR-score

The LR-score was originally implemented to catch NMR structures that consist largely of unstructured loops with perhaps a few turns in the middle. In the context of NMR studies these structures make some sense, as they teach us about natively unfolded proteins, but from the point of view of a homology modeller looking for a template they are useless.

Why the name

This explains the name of the factor: a lone ranger is a residue without contacts with other residues. Lone Residue -> LR.

Implementation

The LR-score calculation follows a few steps:

  1. For each residue in the protein, all residue-residue contacts are counted. Symmetry-related molecules are not taken into account here. Contacts are only counted with residues that are at least five residues away in the sequence.
  2. In the resulting row of numbers, which typically range from zero to around ten, all stretches of five or fewer zeros are removed by setting them to one.
  3. The number of zeros left is divided by the total number of amino acids and multiplied by 100 to give a percentage.
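
The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the text does not say how a contact is defined, so `count_contacts` simply assumes one point per residue (for example the C-alpha atom) and a hypothetical 8.0 Å distance cut-off, and it reads "five residues away" as "at least five positions apart in the sequence". The function names and parameters are made up for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def count_contacts(coords: Sequence[Point],
                   cutoff: float = 8.0,        # ASSUMPTION: cut-off in Angstrom
                   min_separation: int = 5) -> List[int]:
    """Step 1: count contacts per residue.

    ASSUMPTION: one coordinate per residue and a plain distance cut-off;
    the real contact definition is not given in the text.
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        # only look at residues at least min_separation further along the chain
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


def lr_score(contact_counts: Sequence[int], max_gap: int = 5) -> float:
    """Steps 2 and 3: percentage of residues in long contact-free stretches."""
    n = len(contact_counts)
    if n == 0:
        raise ValueError("contact_counts must not be empty")

    lone = 0  # zeros that survive step 2
    run = 0   # length of the current stretch of zeros
    for count in contact_counts:
        if count == 0:
            run += 1
            continue
        # Stretches of max_gap or fewer zeros are dropped, which has the
        # same effect as setting them to one before counting.
        if run > max_gap:
            lone += run
        run = 0
    if run > max_gap:  # a stretch that runs to the end of the chain
        lone += run

    return 100.0 * lone / n


# A fully extended chain (3.8 Angstrom between consecutive points) has no
# contacts at sequence separation >= 5, so every residue is a lone residue.
extended = [(3.8 * i, 0.0, 0.0) for i in range(20)]
print(lr_score(count_contacts(extended)))
```

Splitting the contact counting from the scoring keeps the uncertain part (what counts as a contact) isolated: any other contact definition can be swapped in as long as it returns one count per residue.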

The resulting number falls between 0% and 100%. 100% is found for fully extended proteins or proteins that consist of just one long helix. I ran this 'algorithm' over all AlphaFold models for more than 20K human proteins, and made a primitive histogram of these LR-scores. The result was somewhat surprising: more than a third of all models (8449 of the 22750 counted below) have an LR-score of 50% or worse:

   0.000 -  10.000 (  698)    =====
  10.000 -  20.000 ( 3196)    =====================
  20.000 -  30.000 ( 4511)    ==============================
  30.000 -  40.000 ( 3464)    =======================
  40.000 -  50.000 ( 2432)    ================
  50.000 -  60.000 ( 1894)    =============
  60.000 -  70.000 ( 1467)    ==========
  70.000 -  80.000 ( 1459)    ==========
  80.000 -  90.000 ( 1548)    ==========
  90.000 - 100.000 ( 2081)    ==============
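
A primitive histogram like the one above can be produced with a few lines of Python. This is only a sketch: the function names are invented here, and the bar scaling (the fullest bin gets 30 characters, the others are scaled proportionally and rounded) is an assumption that merely appears consistent with the bars shown.

```python
from typing import List, Sequence


def bin_scores(scores: Sequence[float], n_bins: int = 10,
               lo: float = 0.0, hi: float = 100.0) -> List[int]:
    """Count how many scores fall into each of n_bins equal-width bins."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for score in scores:
        idx = int((score - lo) / width)
        # clamp so that a score of exactly hi lands in the last bin
        idx = max(0, min(idx, n_bins - 1))
        counts[idx] += 1
    return counts


def render_histogram(counts: Sequence[int], lo: float = 0.0,
                     hi: float = 100.0, max_bar: int = 30) -> List[str]:
    """Turn bin counts into text lines with '=' bars."""
    width = (hi - lo) / len(counts)
    peak = max(counts) or 1  # avoid dividing by zero for empty input
    lines = []
    for i, count in enumerate(counts):
        bar = "=" * round(max_bar * count / peak)  # ASSUMED scaling
        start, end = lo + i * width, lo + (i + 1) * width
        lines.append(f"{start:8.3f} - {end:7.3f} ({count:5d})    {bar}".rstrip())
    return lines


# Hypothetical scores, only to show the output format.
for line in render_histogram(bin_scores([3.0, 12.5, 14.0, 27.1, 99.0])):
    print(line)
```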

Worried that there was a bug, I ran the same algorithm over 10K randomly selected X-ray PDB files and got:

   0.000 -  10.000 ( 9654)    ==============================
  10.000 -  20.000 ( 1054)    ===
  20.000 -  30.000 (   97)
  30.000 -  40.000 (   17)
  40.000 -  50.000 (   13)
  50.000 -  60.000 (    5)
  60.000 -  70.000 (    1) 3s4r
  70.000 -  80.000 (    3) 1fav 2ymk 4lh9
  80.000 -  90.000 (    0)
  90.000 - 100.000 (    1) 1nyh

I checked a few of the high-scoring (= bad) PDB files.

So, there is nothing wrong with these files, but they are useless as modelling templates unless you want to model a close homolog.

The point is that 99% of the X-ray PDB files score in the lowest two bins, while for the AlphaFold models this isn't even 20%.
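
These two percentages can be checked directly against the bin counts of the two histograms above. The short Python sketch below does just that; the counts are copied from the histograms, and the helper name is made up for this example.

```python
# Bin counts copied from the two histograms above (bins of 10%, low to high).
alphafold_counts = [698, 3196, 4511, 3464, 2432, 1894, 1467, 1459, 1548, 2081]
xray_counts = [9654, 1054, 97, 17, 13, 5, 1, 3, 0, 1]


def percent_in_lowest_bins(counts, n_bins=2):
    """Percentage of all entries that fall into the first n_bins bins."""
    return 100.0 * sum(counts[:n_bins]) / sum(counts)


print(f"X-ray:     {percent_in_lowest_bins(xray_counts):.1f}%")       # 98.7%
print(f"AlphaFold: {percent_in_lowest_bins(alphafold_counts):.1f}%")  # 17.1%
```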