Examples 25%

This is a manual analysis of some of the AlphaFold2 models that got a WHAT IF LR-score of ~25.
Again, the LR-score says nothing about the analysed protein itself. The score only warns when a protein structure would be a useless template for a homology modelling experiment.
Perfectly correct models of natively unfolded proteins, very small structures, structures that consist mainly of one very long helix, etc., all get a high LR-score even though they are normally the best representation of the protein at hand.

AF-Q6ZUA9-F1-model_v1.pdb

COMPND   2 MOLECULE: MAESTRO HEAT-LIKE REPEAT FAMILY MEMBER 5;
MDRQCSERPYSCTPTGRVSSAVSQNSRISPPVSTSMKDSSCMKVHQDSARRDRWSHPTTI LLHKSQSSQATLMLQEHRMFMGEAYSAATGFKMLQDMNSADPFHLKYIIKKIKNMAHGSP KLVMETIHDYFIDNPEISSRHKFRLFQTLEMVIGASDVLEETWEKTFTRLALENMTKATE LEDIYQDAASNMLVAICRHSWRVVAQHLETELLTGVFPHRSLLYVMGVLSSSEELFSQED KACWEEQLIQMAIKSVPFLSTDVWSKELLWTLTTPSWTQQEQSPEKAFLFTYYGLILQAE KNGATVRRHLQALLETSHQWPKQREGMALTLGLAATRHLDDVWAVLDQFGRSRPIRWSLP SSSPKNSEDLRWKWASSTILLAYGQVAAKARAHILPWVDNIVSRMVFYFHYSSWDETLKQ SFLTATLMLMGAVSRSEGAHSYEFFQTSELLQCLMVLMEKEPQDTLCTRSRQQAMHIASS LCKLRPPIDLERKSQLLSTCFRSVFALPLLDALEKHTCLFLEPPNIQLWPVARERAGWTH QGWGPRAVLHCSEHLQSLYSRTMEALDFMLQSLIMQNPTADELHFLLSHLYIWLASEKAH ERQRAVHSCMILLKFLNHNGYLDPKEDFKRIGQLVGILGMLCQDPDRATQRCSLEGASHL YQLLMCHKTGEALQAESQAPKELSQAHSDGAPLWNSRDQKATPLGPQEMAKNHIFQLCSF QVIKDIMQQLTLAELSDLIWTAIDGLGSTSPFRVQAASEMLLTAVQEHGAKLEIVSSMAQ AIRLRLCSVHIPQAKEKTLHAITLLARSHTCELVATFLNISIPLDSHTFQLWRALGAGQP TSHLVLTTLLACLQERPLPTGASDSSPCPKEKTYLRLLAAMNMLHELQFAREFKQAVQEG YPKLFLALLTQMHYVLELNLPSEPQPKQQAQEAAVPSPQSCSTSLEALKSLLSTTGHWHD FAHLELQGSWELFTTIHTYPKGVGLLARAMVQNHCRQIPAVLRQLLPSLQSPQERERKVA ILILTKFLYSPVLLEVLPKQAALTVLAQGLHDPSPEVRVLSLQGLSNILFHPDKGSLLQG QLRPLLDGFFQSSDQVIVCIMGTVSDTLHRLGAQGTGSQSLGVAISTRSFFNDERDGIRA AAMALFGDLVAAMADRELSGLRTQVHQSMVPLLLHLKDQCPAVATQAKFTFYRCAVLLRW RLLHTLFCTLAWERGLSARHFLWTCLMTRSQEEFSIHLSQALSYLHSHSCHIKTWVTLFI GHTICYHPQAVFQMLNAVDTNLLFRTFEHLRSDPEPSIREFATSQLSFLQKVSARPKQ

What worries me a bit is that I find some structural similarity to 4GKB.PDB:

Q:     741 TAIDGLGSTSPFRVQAASEMLLT-----AVQEHGAKLE-IV--SSMAQAIRLRLCSVHIP     792
           TA+ G G+TS +     +++ LT     A++EHG ++  ++    M    R  + +   P
S:     142 TAVTGQGNTSGYCASKGAQLALTREWAVALREHGVRVNAVIPAEVMTPLYRNWIATFEDP     201
Q:     793 QAKEKTLHAITLLARSHTC--ELVATFLNISIPLDSHTFQLWRALGAG     838
           +AK   + A   L R  T   E+  T + +  P  SHT   W  +  G
S:     202 EAKLAEIAAKVPLGRRFTTPDEIADTAVFLLSPRASHTTGEWLFVDGG     249
However, these two structural elements are rather dissimilar. According to Swissprot, 6% of this protein is disordered.
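
As a rough cross-check on the alignment with 4GKB shown above, the identical positions can be counted directly from the two aligned strings. The following is only a minimal sketch in Python, not something WHAT IF or BLAST produces; the function name alignment_identity is made up for illustration, and the two strings are simply the Q and S rows of the alignment joined together.

```python
def alignment_identity(query: str, subject: str, gap: str = "-"):
    """Count identities in two pre-aligned, equal-length sequences.

    Returns (identical, columns, ungapped), where `columns` is the full
    alignment length and `ungapped` counts columns without a gap.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    ungapped = 0
    for q, s in zip(query, subject):
        if q == gap or s == gap:
            continue  # a gap column can never be an identity
        ungapped += 1
        if q == s:
            identical += 1
    return identical, len(query), ungapped


# Q rows (AF-Q6ZUA9, residues 741-838) and S rows (4GKB, residues 142-249),
# copied from the alignment and concatenated.
QUERY = (
    "TAIDGLGSTSPFRVQAASEMLLT-----AVQEHGAKLE-IV--SSMAQAIRLRLCSVHIP"
    "QAKEKTLHAITLLARSHTC--ELVATFLNISIPLDSHTFQLWRALGAG"
)
SUBJECT = (
    "TAVTGQGNTSGYCASKGAQLALTREWAVALREHGVRVNAVIPAEVMTPLYRNWIATFEDP"
    "EAKLAEIAAKVPLGRRFTTPDEIADTAVFLLSPRASHTTGEWLFVDGG"
)

identical, columns, ungapped = alignment_identity(QUERY, SUBJECT)
print(f"identical positions: {identical}")
print(f"identity over all columns:      {100 * identical / columns:.1f}%")
print(f"identity over ungapped columns: {100 * identical / ungapped:.1f}%")
```

Whether to divide by all alignment columns or only by the ungapped ones is a matter of convention, so the sketch prints both.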

AF-O95025-F1-model_v1.pdb

COMPND   2 MOLECULE: SEMAPHORIN-3D;
MNANKDERLKARSQDFHLFPALMMLSMTMLFLPVTGTLKQNIPRLKLTYKDLLLSNSCIP FLGSSEGLDFQTLLLDEERGRLLLGAKDHIFLLSLVDLNKNFKKIYWPAAKERVELCKLA GKDANTECANFIRVLQPYNKTHIYVCGTGAFHPICGYIDLGVYKEDIIFKLDTHNLESGR LKCPFDPQQPFASVMTDEYLYSGTASDFLGKDTAFTRSLGPTHDHHYIRTDISEHYWLNG AKFIGTFFIPDTYNPDDDKIYFFFRESSQEGSTSDKTILSRVGRVCKNDVGGQRSLINKW TTFLKARLICSIPGSDGADTYFDELQDIYLLPTRDERNPVVYGVFTTTSSIFKGSAVCVY SMADIRAVFNGPYAHKESADHRWVQYDGRIPYPRPGTCPSKTYDPLIKSTRDFPDDVISF IKRHSVMYKSVYPVAGGPTFKRINVDYRLTQIVVDHVIAEDGQYDVMFLGTDIGTVLKVV SISKEKWNMEEVVLEELQIFKHSSIILNMELSLKQQQLYIGSRDGLVQLSLHRCDTYGKA CADCCLARDPYCAWDGNACSRYAPTSKRRARRQDVKYGDPITQCWDIEDSISHETADEKV IFGIEFNSTFLECIPKSQQATIKWYIQRSGDEHREELKPDERIIKTEYGLLIRSLQKKDS GMYYCKAQEHTFIHTIVKLTLNVIENEQMENTQRAEHEEGKVKDLLAESRLRYKDYIQIL SSPNFSLDQYCEQMWHREKRRQRNKGGPKWKHMQEMKKKRNRRHHRDLDELPRAVAT

59% sequence identity to 4GZ8.

AF-Q92979-F1-model_v1.pdb

COMPND   2 MOLECULE: RIBOSOMAL RNA SMALL SUBUNIT METHYLTRANSFERASE NEP1;
MAAPSDGFKPRERSGGEQAQDWDALPPKRPRLGAGNKIGGRRLIVVLEGASLETVKVGKT YELLNCDKHKSILLKNGRDPGEARPDITHQSLLMLMDSPLNRAGLLQVYIHTQKNVLIEV NPQTRIPRTFDRFCGLMVQLLHKLSVRAADGPQKLLKVIKNPVSDHFPVGCMKVGTSFSI PVVSDVRELVPSSDPIVFVVGAFAHGKVSVEYTEKMVSISNYPLSAALTCAKLTTAFEEV WGVI

This is PDB file 5FAI.PDB with a funny loop at the left 'modelled' for the part that Swissprot calls disordered.

AF-Q9H9V9-F1-model_v1.pdb

COMPND   2 MOLECULE: 2-OXOGLUTARATE AND IRON-DEPENDENT OXYGENASE JMJD4;
MRAGPEPQALAGQKRGALRLLVPRLVLTVSAPAEVRRRVLRPVLSWMDRETRALADSHFR GLGVDVPGVGQAPGRVAFVSEPGAFSYADFVRGFLLPNLPCVFSSAFTQGWGSRRRWVTP AGRPDFDHLLRTYGDVVVPVANCGVQEYNSNPKEHMTLRDYITYWKEYIQAGYSSPRGCL YLKDWHLCRDFPVEDVFTLPVYFSSDWLNEFWDALDVDDYRFVYAGPAGSWSPFHADIFR SFSWSVNVCGRKKWLLFPPGQEEALRDRHGNLPYDVTSPALCDTHLHPRNQLAGPPLEIT QEAGEMVFVPSGWHHQVHNLDDTISINHNWVNGFNLANMWRFLQQELCAVQEEVSEWRDS MPDWHHHCQVIMRSCSGINFEEFYHFLKVIAEKRLLVLREAAAEDGAGLGFEQAAFDVGR ITEVLASLVAHPDFQRVDTSAFSPQPKELLQQLREAVDAAAAP

BLAST finds 30% sequence identity to 3LD8. However, AF finds a different solution for the first ~45 amino acids. I also think the way AF treated indels is at times open to improvement. But then, this is not CASP, so we can only go by my experience as a modeller since CASP1; we cannot refer to the truth.