Walkthrough of the Cradle-1 ML pipeline for protein lead optimisation, covering masked language models, evotuning via MSAs, and iterative wet-lab feedback loops.
Key Takeaways
Lead optimisation takes a partially functional protein and improves it through iterative mutation, lab testing, and model-guided candidate proposals.
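The mutate-test-learn loop described above can be sketched as a greedy search. Everything here is a toy stand-in: `toy_assay` replaces the wet-lab measurement and `propose_mutants` replaces model-guided candidate generation; neither is from Cradle's actual pipeline.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def propose_mutants(seq, n_mutants, rng):
    """Propose single-point mutants of a lead sequence (toy stand-in
    for model-guided candidate proposals)."""
    mutants = []
    for _ in range(n_mutants):
        pos = rng.randrange(len(seq))
        new_aa = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
        mutants.append(seq[:pos] + new_aa + seq[pos + 1:])
    return mutants

def lead_optimisation(seq, assay, rounds=3, n_mutants=20, seed=0):
    """Iterative loop: each round, propose candidates, 'assay' them,
    and keep the best performer as the new lead."""
    rng = random.Random(seed)
    best, best_score = seq, assay(seq)
    for _ in range(rounds):
        for cand in propose_mutants(best, n_mutants, rng):
            score = assay(cand)  # a wet-lab experiment stands in here
            if score > best_score:
                best, best_score = cand, score
    return best, best_score

# Toy "assay": reward similarity to a hidden target sequence.
TARGET = "MKTAYIAKQR"
toy_assay = lambda s: sum(a == b for a, b in zip(s, TARGET))

lead = "MATAYIAKQA"  # partially functional starting point
optimised, score = lead_optimisation(lead, toy_assay)
print(optimised, score)
```

In practice each round is expensive (a wet-lab campaign), which is why the model's job is to make every proposed batch count.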
Cradle-1 uses a transformer-based protein language model trained via masked language modelling on tens of millions of natural protein sequences.
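Masked language modelling builds training examples by hiding residues and asking the model to recover them. The sketch below shows only how such an example is constructed (the 15% mask rate is the common BERT-style convention, assumed here, not a confirmed Cradle-1 detail):

```python
import random

MASK = "<mask>"

def mask_sequence(seq, mask_frac=0.15, seed=0):
    """Build one masked-language-modelling example: hide a fraction of
    residues and record the originals as prediction targets."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(seq) * mask_frac))
    positions = rng.sample(range(len(seq)), n_mask)
    tokens = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # the model must recover this residue
        tokens[pos] = MASK
    return tokens, targets

tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(tokens, targets)
```

Trained this way over tens of millions of natural sequences, the model learns which residues are plausible in which contexts, without needing any functional labels.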
Fine-tuning (called evotuning) narrows the model to a relevant protein family: homologs of the target are gathered and assembled into a Multiple Sequence Alignment (MSA), then used as training data, biasing the model's suggestions toward evolutionarily plausible variants.
Cradle operates its own wet lab to tighten the model-to-experiment feedback loop; clients include Novo Nordisk, Bayer, and J&J.
A key limitation: proteins very different from any natural sequence are hard to model because evolution-derived training data simply does not cover that space.