Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety
A new study finds most AI models are biased against biological solutions, but fine-tuning can fix it.
Researchers Trent R. Northen and Mingxun Wang have published a study titled 'Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety.' The paper introduces a framework for quantifying whether large language models (LLMs) systematically favor synthetic, non-biological technological solutions over bio-inspired or biological ones. The authors frame this as an AI safety concern: unchecked bias could steer AI-assisted research and development away from sustainable, nature-compatible innovations.
The team tested ten models (five frontier models, including GPT-4, and five open-weight models, including Llama 3) using 50 curated 'Bioalignment' prompts across domains such as materials and energy. Their Kelly criterion-inspired evaluation found that most models were not 'bioaligned,' showing a clear preference for synthetic answers. To correct this, they fine-tuned two open models, Llama 3.2-3B-Instruct and Qwen2.5-3B-Instruct, with QLoRA on a curated corpus of roughly 22 million tokens drawn from 6,636 PubMed Central articles emphasizing biological problem-solving.
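For readers who want a concrete picture of what such a fine-tuning run looks like, the sketch below shows a generic QLoRA setup with Hugging Face transformers and peft. The model IDs match those named in the paper, but the corpus path, adapter rank, and all training hyperparameters are placeholder assumptions for illustration, not the authors' released configuration.

```python
# Illustrative QLoRA fine-tuning sketch; the corpus path ("bio_corpus.jsonl"),
# adapter rank, and training hyperparameters are placeholder assumptions.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # or "Qwen/Qwen2.5-3B-Instruct"

# 4-bit NF4 quantization is the "Q" in QLoRA: the frozen base weights are stored
# in 4 bits while small LoRA adapters are trained in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; rank and alpha are illustrative.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))

# Hypothetical JSONL file with one corpus chunk per line under a "text" key.
dataset = load_dataset("json", data_files="bio_corpus.jsonl", split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True, remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bioalignment-adapter",
                           per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("bioalignment-adapter")  # saves only the LoRA adapter weights
```

Because only the small adapter matrices are trained and saved, this kind of run fits on a single consumer GPU for 3B-parameter models, which is part of why the authors could release adapter weights rather than full model checkpoints.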
The results were clear: fine-tuning increased both models' preference scores for biological solutions (p<0.001 and p<0.01 for the two models) without degrading their general capabilities on standard benchmarks. This demonstrates that even limited, targeted fine-tuning can substantially shift a model's underlying 'disposition,' that is, how it weights candidate solutions. The researchers have released their benchmark, corpus, code, and adapter weights, giving the community the tools to measure and implement bioalignment in larger models.
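As an illustration of how such a shift could be checked, the snippet below runs a paired Wilcoxon signed-rank test over per-prompt preference scores before and after fine-tuning. The score arrays are synthetic stand-ins and the test choice is an assumption, since the article does not state which statistical procedure the authors used.

```python
# Illustrative paired significance test over per-prompt bio-preference scores.
# The arrays below are synthetic stand-ins, NOT the paper's data, and the
# Wilcoxon signed-rank test is an assumed choice of procedure.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Hypothetical scores in [0, 1] for the 50 benchmark prompts (higher = stronger
# preference for the biological solution), before and after fine-tuning.
before = rng.uniform(0.2, 0.6, size=50)
after = np.clip(before + rng.normal(0.15, 0.10, size=50), 0.0, 1.0)

# One-sided paired test: did per-prompt scores increase after fine-tuning?
stat, p_value = wilcoxon(after, before, alternative="greater")
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4g}")
```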
- Introduced 'Bioalignment,' a new metric showing 10 tested LLMs, including GPT-4 and Claude, are biased toward synthetic over biological solutions.
- Successfully corrected bias using QLoRA fine-tuning on a 22M-token biological corpus, boosting bio-preference in Llama 3.2-3B and Qwen2.5-3B models.
- Released a full toolkit—benchmark, corpus, code, weights—enabling the AI safety community to measure and improve bioalignment in other models.
Why It Matters
Helps ensure AI assists in developing sustainable, bio-compatible technologies rather than blindly favoring synthetic approaches, keeping AI-assisted R&D aligned with environmental and safety goals.