An idiot's guide to lead optimisation for proteins


TLDR

  • Walkthrough of the Cradle-1 ML pipeline for protein lead optimisation, covering masked language models, evotuning via MSAs, and iterative wet-lab feedback loops.

Key Takeaways

  • Lead optimisation takes a partially functional protein and improves it through iterative mutation, lab testing, and model-guided candidate proposals.
  • Cradle-1 uses a transformer-based protein language model trained via masked language modelling on tens of millions of natural protein sequences.
  • Fine-tuning (which Cradle calls evotuning) narrows the model to the relevant protein family by training on homologs identified via a multiple sequence alignment (MSA), steering suggestions toward evolutionarily plausible variants.
  • Cradle operates its own wet lab to tighten the model-to-experiment feedback loop; clients include Novo Nordisk, Bayer, and J&J.
  • A key limitation: proteins very different from any natural sequence are hard to model because evolution-derived training data simply does not cover that space.
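The masked-language-modelling objective mentioned above can be illustrated with a small sketch. This is not Cradle's code; the 15% rate and the 80/10/10 mask/random/keep split are the standard BERT recipe, assumed here for illustration:

```python
import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues
MASK = "<mask>"

def mask_sequence(seq, mask_rate=0.15, seed=0):
    """BERT-style masking: hide ~15% of residues so the model must
    reconstruct them from the surrounding sequence context."""
    rng = random.Random(seed)
    tokens = list(seq)
    labels = [None] * len(tokens)  # None = position excluded from the loss
    for i, original in enumerate(tokens):
        if rng.random() < mask_rate:
            labels[i] = original          # target the model must predict
            roll = rng.random()
            if roll < 0.8:
                tokens[i] = MASK          # 80%: replace with the mask token
            elif roll < 0.9:
                tokens[i] = rng.choice(AMINO_ACIDS)  # 10%: random residue
            # remaining 10%: leave the residue unchanged
    return tokens, labels

tokens, labels = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```

Training the transformer to fill in these hidden residues is what teaches it which substitutions are plausible at each position, the same signal later used to rank candidate mutations.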
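The evotuning step relies on selecting homologs from an MSA. The source does not describe how Cradle filters that alignment, so the percent-identity criterion and the 30% cutoff below are illustrative assumptions, not the actual pipeline:

```python
def percent_identity(a, b):
    """Fraction of aligned positions where two gapped MSA rows agree
    (columns that are gaps in both rows are ignored)."""
    assert len(a) == len(b), "MSA rows must share one alignment length"
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(x == y for x, y in pairs)
    return matches / len(pairs)

def select_homologs(lead_row, msa_rows, min_id=0.3):
    """Keep homologs close enough to the lead to define its family;
    the survivors would form the evotuning (fine-tuning) corpus."""
    return [row for row in msa_rows
            if percent_identity(lead_row, row) >= min_id]
```

Filtering to a family this way is what narrows the general-purpose model: fine-tuning only sees sequences evolution has already validated near the lead, which is also why leads far from any natural sequence (the limitation above) are poorly served.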

Hacker News Comment Review

  • No substantive HN discussion yet.
