Customized Amazon Nova models improve molecular-property prediction in drug discovery
A single fine-tuned LLM achieves GNN-level accuracy while enabling conversational reasoning for chemists.
Amazon's Generative AI Innovation Center and AGI organization have developed customized Nova large language models (LLMs) that transform molecular-property prediction for drug discovery. Working with biotech firm Nimbus Therapeutics, the team used supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) to adapt a general-purpose LLM to predict 11 critical drug properties across three categories: lipophilicity, permeability, and clearance. This single model matches the accuracy of the traditional approach, in which a separate specialized graph neural network (GNN) had to be built and maintained for each property.
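The article does not publish the training format, but SFT for property prediction typically means converting assay measurements into prompt/completion text pairs. The sketch below is illustrative only: the property names, SMILES string, and values are hypothetical, not Nimbus data.

```python
import json

# Hypothetical assay records: a SMILES string paired with measured
# property values (illustrative names and numbers, not real data).
measurements = [
    {"smiles": "CC(=O)Oc1ccccc1C(=O)O",  # aspirin, as an example molecule
     "logD": 1.2, "permeability_papp": 15.3, "clearance_clint": 8.7},
]

def to_sft_record(m):
    """Format one measurement as a prompt/completion pair for SFT."""
    prompt = (
        "Predict lipophilicity (logD), permeability (Papp), and "
        f"intrinsic clearance (CLint) for the molecule: {m['smiles']}"
    )
    # Emitting the labels as JSON keeps the completion easy to parse
    # back into numbers at inference time.
    completion = json.dumps({
        "logD": m["logD"],
        "permeability_papp": m["permeability_papp"],
        "clearance_clint": m["clearance_clint"],
    })
    return {"prompt": prompt, "completion": completion}

records = [to_sft_record(m) for m in measurements]
```

Adding a new property under this scheme means appending fields to the completion and fine-tuning incrementally, rather than building a new model from scratch.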
The breakthrough eliminates the fragmented workflow where chemists had to navigate different interfaces and data formats for each property. Now, a single query returns predictions for all properties simultaneously. More importantly, the language model enables conversational interaction—chemists can ask for reasoning behind predictions or request molecular modifications to achieve desired properties. This represents a shift from disconnected numerical outputs to an interactive reasoning partner that speaks the language of medicinal chemistry.
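The "single query, all properties" workflow can be sketched as parsing one structured model response into a flat prediction table. The response shape and property names below are assumptions for illustration; the article specifies only 11 properties grouped into lipophilicity, permeability, and clearance.

```python
import json

# Hypothetical JSON response from the fine-tuned model; the grouping
# mirrors the article's three categories, but the individual property
# names and values are invented for this sketch.
mock_response = json.dumps({
    "lipophilicity": {"logD_7.4": 2.1},
    "permeability": {"caco2_papp": 12.5, "mdck_papp": 9.8},
    "clearance": {"human_clint": 14.0, "rat_clint": 22.3},
})

def parse_predictions(raw):
    """Flatten grouped predictions into one 'category.property' -> value map."""
    grouped = json.loads(raw)
    return {f"{cat}.{name}": value
            for cat, props in grouped.items()
            for name, value in props.items()}

preds = parse_predictions(mock_response)
```

One parsed dictionary replaces separate calls to per-property GNN services, which is what collapses the fragmented workflow into a single interface.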
This approach could significantly accelerate the early stages of drug development, where designing molecules with druglike properties is critical. With traditional drug development taking 10-15 years and costing over $2 billion per successful drug, AI assistants that unify prediction and generation in one interface could increase productivity and candidate viability. The technology gives lean biotech teams practical AI collaboration tools that understand scientific context, potentially reducing the time from molecule design to clinical trials.
- Single fine-tuned LLM replaces multiple graph neural networks (GNNs) with comparable accuracy on 11 molecular properties
- Enables conversational reasoning: chemists can ask why the model made a prediction and request molecular modifications
- Reduces weeks-long model training for new properties to incremental fine-tuning of existing LLM
Why It Matters
Could accelerate early drug discovery where only 8% of candidates succeed, potentially reducing $2B+ development costs per drug.