Sanders et al.'s diffusion model boosts non-germline residue prediction accuracy by 20 percentage points
Germline-absorbing diffusion raises non-germline residue prediction accuracy from 26% to 46%.
A team led by Justin Sanders has published a paper on arXiv detailing a new approach to computational antibody design. While protein language models have shown promise, existing methods often memorize germline sequences and struggle with flexible conditional generation. The researchers address these limitations with two contributions: discrete diffusion fine-tuning that allows generation conditioned on any off-the-shelf classifier, and a novel germline absorbing diffusion process. Instead of using a masked token as the absorbing state, the model treats the germline sequence as the absorbing state. This biologically motivated inductive bias restricts learning to the trajectory from germline to observed antibody sequence, effectively excluding genetic variation and V(D)J recombination statistics. The result is a dramatic reduction in germline bias, improving non-germline residue prediction accuracy from 26% to 46%, which is near the theoretical upper bound set by true biological variability.
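To make the difference between the two absorbing states concrete, here is a minimal Python sketch of the forward (noising) step only. It is an illustration under stated assumptions, not the authors' implementation: the function names `corrupt` and `mutated_positions`, the `#` mask symbol, the independent per-position noise level `t`, and the two short toy sequences are all hypothetical, and the sequences are assumed to be pre-aligned to equal length.

```python
import random

MASK = "#"  # hypothetical mask symbol for a standard masked absorbing process


def corrupt(observed, absorbing, t, rng):
    """Replace each position with its absorbing-state token with probability t.

    `absorbing` is either a string of mask symbols (masked diffusion) or the
    aligned germline sequence (germline absorbing diffusion).
    """
    if len(observed) != len(absorbing):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        a if rng.random() < t else o for o, a in zip(observed, absorbing)
    )


def mutated_positions(observed, germline):
    """Indices where the observed antibody differs from its germline."""
    return [i for i, (o, g) in enumerate(zip(observed, germline)) if o != g]


# Toy aligned sequences (hypothetical, for illustration only).
germline = "QVQLVQSGAEVKKPGASVKVSCKAS"
observed = "QVQLVESGAEVRKPGASVKISCKAS"

rng = random.Random(0)
mask_noised = corrupt(observed, MASK * len(observed), 0.5, rng)
germline_noised = corrupt(observed, germline, 0.5, rng)

print("mutated positions:", mutated_positions(observed, germline))
print("mask absorbing:   ", mask_noised)
print("germline absorbing:", germline_noised)
```

With the germline as the absorbing state, a corrupted sequence can differ from the observed antibody only at positions where the antibody has diverged from its germline. In this toy setup those are the only positions a denoiser would have to predict, which is one way to picture how the process restricts learning to the germline-to-antibody trajectory.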
The utility of this model is demonstrated on conditional generation tasks for therapeutic antibodies. The team used classifier guidance to sample antibodies with improved hydrophobicity and predicted binding affinity. On both tasks, the germline diffusion model achieved a better tradeoff between class adherence and sample quality compared to EvoProtGrad, a popular gradient-based discrete MCMC sampling strategy. By enabling precise control over key developability properties, this work offers a practical path to designing antibody candidates computationally rather than relying solely on experimental screening. The approach could significantly accelerate antibody drug discovery pipelines.
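The general idea behind classifier guidance can be sketched for a single sequence position: the generative model proposes a distribution over residues, and a classifier reweights it toward the desired property. The Python below is a toy illustration, not the paper's guidance algorithm. The names `classifier_logprob` and `guided_distribution`, the `strength` parameter, and the uniform `model_probs` are hypothetical. The stand-in "classifier" is a logistic function of Kyte-Doolittle hydropathy values rather than a trained model, and targeting lower hydrophobicity is an assumption made for the example.

```python
import math

# Kyte-Doolittle hydropathy values for a small subset of residues.
HYDROPATHY = {"A": 1.8, "I": 4.5, "L": 3.8, "S": -0.8, "D": -3.5, "K": -3.9}


def classifier_logprob(residue, target_hydrophobic):
    """Hypothetical stand-in for log p(property | residue).

    Uses log(sigmoid(z)), where z is the hydropathy value, sign-flipped when
    the target is a less hydrophobic residue.
    """
    score = HYDROPATHY[residue]
    z = score if target_hydrophobic else -score
    return -math.log1p(math.exp(-z))


def guided_distribution(model_probs, target_hydrophobic, strength=1.0):
    """Reweight model probabilities by the classifier and renormalize.

    guided(x) is proportional to model(x) * p(property | x) ** strength.
    """
    weights = {
        r: p * math.exp(strength * classifier_logprob(r, target_hydrophobic))
        for r, p in model_probs.items()
    }
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


# Hypothetical model proposal: uniform over the toy residue subset.
model_probs = {r: 1.0 / len(HYDROPATHY) for r in HYDROPATHY}
guided = guided_distribution(model_probs, target_hydrophobic=False, strength=2.0)

for residue, prob in sorted(guided.items(), key=lambda kv: -kv[1]):
    print(residue, round(prob, 3))
```

In this sketch, `strength=0` leaves the model's distribution unchanged, and larger values push samples further toward the classifier's preference and away from what the model considers likely. That is one way to picture the tradeoff between class adherence and sample quality described above.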
- Germline absorbing diffusion uses the germline sequence as an absorbing state, improving non-germline residue prediction from 26% to 46%.
- Discrete diffusion fine-tuning enables classifier-guided conditional generation with any off-the-shelf classifier, without retraining the generative model for each property.
- Achieves a better tradeoff between class adherence and sample quality than EvoProtGrad on hydrophobicity and predicted binding affinity generation tasks.
Why It Matters
This method could accelerate antibody drug discovery by enabling precise computational design of therapeutic antibodies.