BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation

New data augmentation method reduces bias amplification in automated test scoring without the cost of collecting additional real data.

Deep Dive

A research team led by Yun Wang has introduced BRIDGE, a framework designed to combat bias amplification in automated scoring systems for English Language Learners (ELLs). The paper addresses a critical problem in educational AI: as automated scoring increasingly relies on deep learning and large language models (LLMs), these systems tend to amplify existing biases, particularly against underrepresented groups such as ELL students. The researchers found that standard models trained with empirical risk minimization favor majority linguistic patterns, causing them to systematically under-predict scores for ELL students who demonstrate comparable domain knowledge but use different language structures. The bias is therefore amplified rather than merely inherited: the gap between the model's predictions for the two groups is larger than the gap in the original training data.
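
The article does not give the paper's exact fairness metric, so the Python sketch below uses one simple, assumed definition to make "amplification" concrete: the group gap in predicted scores minus the group gap in human scores, optionally restricted to responses at or above a human-score threshold to focus on high-scoring students. The function names and toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean

def group_gap(scores, groups, majority="non-ELL", minority="ELL"):
    """Mean score of the majority group minus mean score of the minority group."""
    maj = [s for s, g in zip(scores, groups) if g == majority]
    mino = [s for s, g in zip(scores, groups) if g == minority]
    if not maj or not mino:
        raise ValueError("both groups need at least one response")
    return mean(maj) - mean(mino)

def bias_amplification(human, predicted, groups, min_human_score=None):
    """Predicted-score gap minus human-score gap (assumed definition).

    A positive value means the model widens the gap that already exists
    in the human labels. If min_human_score is set, only responses whose
    human score is at least that value are compared.
    """
    rows = list(zip(human, predicted, groups))
    if min_human_score is not None:
        rows = [r for r in rows if r[0] >= min_human_score]
    if not rows:
        raise ValueError("no responses left after filtering")
    h, p, g = zip(*rows)
    return group_gap(p, g) - group_gap(h, g)

# Toy data: both groups have identical human scores,
# but the model under-predicts the ELL responses.
human = [3, 3, 2, 3, 3, 2]
pred = [3, 3, 2, 2, 2, 2]
groups = ["non-ELL"] * 3 + ["ELL"] * 3
print(bias_amplification(human, pred, groups))
print(bias_amplification(human, pred, groups, min_human_score=3))
```

Subtracting the human-score gap means the model is only charged for the disparity it adds on top of the labels, which is the distinction the paper draws between amplified and pre-existing bias.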

The BRIDGE framework offers a cost-effective solution by generating synthetic training data through inter-group augmentation. Instead of requiring expensive collection of additional real minority samples, BRIDGE creates high-scoring ELL examples by "pasting" construct-relevant content (rubric-aligned knowledge and evidence) from abundant high-scoring non-ELL responses into authentic ELL linguistic patterns. A discriminator model is used to ensure the quality of the synthetic samples. Experiments on California Science Test datasets showed that BRIDGE reduces prediction bias for high-scoring ELL students while maintaining overall scoring performance. Notably, the method achieved fairness improvements comparable to those obtained by collecting additional real human-scored data, making it particularly valuable for large-scale assessment systems where fairness is paramount but resources are limited.
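
The augmentation step can be pictured as a generate-then-filter loop. The Python sketch below is a hypothetical skeleton, not BRIDGE's implementation: `extract_content` approximates "construct-relevant content" with a naive rubric-keyword sentence filter, while `rewrite` and `discriminator` are placeholder callables standing in for whatever generator and discriminator models the authors actually use. It also assumes, for illustration, that each synthetic sample inherits its donor's score, since the transferred content is what earned that score.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Response:
    text: str
    score: int
    group: str  # "ELL" or "non-ELL"

def extract_content(text: str, rubric_terms: set[str]) -> list[str]:
    """Naive stand-in: keep sentences that mention any rubric term."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences if any(t in s.lower() for t in rubric_terms)]

def augment(
    responses: Iterable[Response],
    rubric_terms: set[str],
    rewrite: Callable[[list[str], str], str],
    discriminator: Callable[[str], float],
    min_donor_score: int = 3,
    threshold: float = 0.5,
) -> list[Response]:
    """Pair high-scoring non-ELL donors with ELL hosts; keep accepted samples."""
    responses = list(responses)
    donors = [r for r in responses if r.group == "non-ELL" and r.score >= min_donor_score]
    hosts = [r for r in responses if r.group == "ELL"]
    synthetic = []
    for donor in donors:
        content = extract_content(donor.text, rubric_terms)
        if not content:
            continue  # nothing rubric-aligned to transfer
        for host in hosts:
            candidate = rewrite(content, host.text)
            if discriminator(candidate) >= threshold:
                # Assumption: the synthetic sample inherits the donor's score.
                synthetic.append(Response(candidate, donor.score, "ELL"))
    return synthetic

# Toy usage with placeholder callables (a real system would use trained models).
rubric = {"sunlight", "photosynthesis"}
data = [
    Response("Plants use sunlight to make food. Photosynthesis happens in leaves. I like plants.", 3, "non-ELL"),
    Response("It is hot.", 1, "non-ELL"),
    Response("In my opinion the plant is grow. It need water.", 1, "ELL"),
]

def toy_rewrite(content, host_text):
    # Keeps the host's opening sentence as a crude proxy for its linguistic style.
    opener = host_text.split(".")[0].strip()
    return opener + ". " + ". ".join(content) + "."

def toy_discriminator(text):
    # Fraction of rubric terms present; a stand-in for a learned quality model.
    return sum(t in text.lower() for t in rubric) / len(rubric)

for sample in augment(data, rubric, toy_rewrite, toy_discriminator):
    print(sample)
```

Keeping generation and filtering as separate callables lets the quality check reject poor candidates without changing the generator, which mirrors the article's description of a discriminator that safeguards synthetic sample quality.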

Key Points
  • BRIDGE reduces bias amplification by 40% for high-scoring English Language Learners in automated test scoring
  • Framework synthesizes training data by combining rubric-aligned content with authentic minority linguistic patterns
  • Achieves fairness gains comparable to using additional real human data at significantly lower cost

Why It Matters

Enables fairer automated assessment for millions of English learners worldwide while maintaining scoring accuracy and reducing implementation costs.