Answer to a question in a bioinformatics course

Answer:

First, make sure that your dataset is representative. Don't take the whole PDB: the hundreds of lysozyme mutant structures and HIV proteases with inhibitors will pollute your dataset, so that in the end everything gets predicted as a mix of lysozyme and protease.
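To make the redundancy point concrete, here is a minimal sketch of greedy filtering: an entry is kept only if it is not too similar to anything already kept. The function names, the 0.9 threshold and the toy sequences are all made up for illustration, and `difflib.SequenceMatcher` is only a crude stand-in for a real alignment-based sequence identity. For an actual PDB-scale dataset you would more likely use a dedicated clustering tool such as CD-HIT or MMseqs2.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumption standing in for real sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_representatives(seqs: dict[str, str], threshold: float = 0.9) -> list[str]:
    """Greedily keep an entry only if it is below `threshold` to every entry already kept."""
    kept: list[str] = []
    # Visit longer sequences first (ties broken by name) so the order is deterministic.
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        if all(similarity(seqs[name], seqs[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy fragments, named after the examples in the text; not real database entries.
toy = {
    "lysozyme_wt": "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAK",
    "lysozyme_mut1": "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAR",
    "lysozyme_mut2": "KVFGRCELAAAMKRHGLANYRGYSLGNWVCAAK",
    "protease": "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVL",
}
print(pick_representatives(toy))
```

The three near-identical lysozyme variants collapse to one representative, while the unrelated protease survives, so no single family dominates the set.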

You might also want to think about adding proteins whose sequences resemble the one you want to predict, to improve the prediction for your protein of interest.
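One way to pick such relatives is to score every candidate against your target and keep the ones above a cutoff. The sketch below assumes a hypothetical helper `relatives_of` and an arbitrary cutoff of 0.3, and again uses `difflib.SequenceMatcher` as a rough placeholder for a proper alignment score; the sequences are toy strings.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumption standing in for a real alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def relatives_of(
    target: str, candidates: dict[str, str], min_sim: float = 0.3
) -> list[tuple[str, float]]:
    """Return (name, similarity) for candidates at or above `min_sim`, most similar first."""
    scored = [(name, similarity(target, seq)) for name, seq in candidates.items()]
    hits = [(name, score) for name, score in scored if score >= min_sim]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy strings for illustration only.
target = "MKTAYIAKQR"
candidates = {
    "close": "MKTAYIAKQK",
    "medium": "MKTAYLLLLL",
    "far": "GGGGGGGGGG",
}
for name, score in relatives_of(target, candidates):
    print(name, round(score, 2))
```

The entries returned here are the ones you would consider adding to the training set for that particular target.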

And if you buy me a beer one day, we can think about five more possible modifications.