Agent Frameworks

Ablation Study of a Fairness Auditing Agentic System for Bias Mitigation in Early-Onset Colorectal Cancer Detection

A two-agent AI system using RAG outperformed raw LLMs in identifying demographic disparities in colorectal cancer models.

Deep Dive

A research team including Amalia Ionescu, Jason H. Moore, and Tiffani J. Bright has developed and tested an agentic AI system designed to audit biomedical machine learning models for fairness, specifically targeting early-onset colorectal cancer (EO-CRC) detection. The system employs a two-agent architecture: a Domain Expert Agent that synthesizes medical literature on EO-CRC disparities, and a Fairness Consultant Agent that recommends which sensitive attributes (such as race or gender) and fairness metrics should be used for model evaluation. This approach aims to address the critical problem of algorithmic bias in clinical AI, where limited oversight can allow safety risks and inequities to persist.
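The two-agent flow can be sketched in outline. This is a minimal, hypothetical illustration of the described architecture, not the authors' implementation: the agent functions are stubs standing in for LLM calls, and all names, attributes, and example data are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AuditReport:
    """Illustrative output of the fairness audit (not the paper's schema)."""
    disparities: list[str]          # disparities synthesized from literature
    sensitive_attributes: list[str] # attributes recommended for evaluation
    fairness_metrics: list[str]     # metrics recommended for evaluation

def domain_expert_agent(condition: str, retrieved_docs: list[str]) -> list[str]:
    # Stub: a real agent would prompt an LLM with retrieved literature (the
    # RAG step) and synthesize the disparities documented for the condition.
    return [f"Documented disparity for {condition}: {doc}" for doc in retrieved_docs]

def fairness_consultant_agent(disparities: list[str]) -> AuditReport:
    # Stub: a real agent would map the synthesized disparities to sensitive
    # attributes and group-fairness metrics via a second LLM call.
    attrs = ["race", "gender"]
    metrics = ["demographic parity difference", "equalized odds difference"]
    return AuditReport(disparities, attrs, metrics)

# Retrieval feeds the Domain Expert, whose synthesis feeds the Consultant.
docs = ["higher EO-CRC incidence among Black patients under 50"]
report = fairness_consultant_agent(domain_expert_agent("EO-CRC", docs))
print(report.sensitive_attributes)  # ['race', 'gender']
```

The point of the pipeline shape is that the Fairness Consultant never sees raw documents, only the Domain Expert's synthesis, so each agent's prompt stays focused on one task.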

The researchers conducted an ablation study comparing the system's performance across three configurations: a pretrained LLM-only baseline, an Agent system without Retrieval-Augmented Generation (RAG), and an Agent system with RAG. They tested these configurations on three open-weight large language models run via Ollama, with 8B, 20B, and 120B parameters. The key finding was that across all model sizes, the 'Agent with RAG' configuration consistently achieved the highest semantic similarity scores when its outputs were compared to reference statements derived from human experts. This was particularly true for the crucial task of identifying specific demographic disparities.
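The evaluation idea, scoring each configuration's output against an expert reference statement, can be illustrated as follows. The paper's actual similarity model is not specified here, so this sketch substitutes a simple bag-of-words cosine similarity to stay self-contained; the configuration names and example texts are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over word-count vectors (stand-in for an embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Hypothetical expert reference and per-configuration outputs.
reference = "EO-CRC incidence is disproportionately high among Black patients"
outputs = {
    "llm_only": "colorectal cancer rates vary somewhat between groups",
    "agent_no_rag": "EO-CRC incidence varies among patients by race",
    "agent_with_rag": "EO-CRC incidence is disproportionately high among Black patients",
}

# Score every configuration against the reference and rank them.
scores = {cfg: cosine_similarity(text, reference) for cfg, text in outputs.items()}
best = max(scores, key=scores.get)
print(best)  # prints "agent_with_rag" for this toy data
```

In the study itself this comparison was repeated per model size (8B, 20B, 120B) and per audit task, which is what makes the consistent win for 'Agent with RAG' an ablation result rather than a single data point.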

This result indicates that augmenting an agentic framework with external knowledge retrieval (RAG) significantly improves its ability to reason about complex, domain-specific fairness issues. The study suggests that such automated, multi-agent systems could be a scalable solution for auditing clinical AI models, providing much-needed oversight to mitigate bias before models are deployed in real-world healthcare settings where demographic disparities in outcomes are a documented concern.

Key Points
  • The system uses a two-agent architecture: a Domain Expert Agent and a Fairness Consultant Agent, designed for clinical AI auditing.
  • An ablation study tested three Ollama LLMs (8B, 20B, 120B parameters) and found the 'Agent with RAG' configuration performed best.
  • The 'Agent with RAG' achieved the highest semantic similarity to expert references, especially for identifying demographic disparities in cancer detection.

Why It Matters

Provides a scalable, automated method to audit healthcare AI for bias, potentially preventing discriminatory outcomes in critical medical diagnostics.