ExpGuard: LLM Content Moderation in Specialized Domains
New model protects financial, medical, and legal LLMs from technical jargon attacks with 59K curated dataset.
A research team from KAIST and Korea University has introduced ExpGuard, a new guardrail model designed specifically to protect large language models operating in specialized domains like finance, medicine, and law. The researchers identified a critical vulnerability in current safety systems: while general-purpose guardrails like WildGuard work well for everyday interactions, they fail when confronted with technical jargon and domain-specific concepts that can mask harmful content. To address this gap, the team created ExpGuardMix, a meticulously curated dataset of 58,928 labeled prompts with corresponding refusal and compliant responses from these high-stakes sectors. The dataset includes ExpGuardTest, a high-quality subset annotated by domain experts specifically to evaluate robustness against technical content.
Comprehensive evaluations on ExpGuardTest and eight established public benchmarks show ExpGuard delivers competitive performance while demonstrating exceptional resilience to domain-specific adversarial attacks. The model surpasses state-of-the-art models like WildGuard by up to 8.9% in prompt classification and 15.3% in response classification when dealing with specialized content. The researchers have open-sourced their code, data, and model weights to encourage adaptation to additional domains and support the development of increasingly robust guardrail systems. This work represents a significant step toward securing enterprise AI deployments where technical accuracy and safety must coexist, particularly as LLMs become integral to sensitive decision-making processes in regulated industries.
- ExpGuard outperforms WildGuard by up to 15.3% in response classification against domain-specific attacks
- Includes ExpGuardMix dataset with 58,928 labeled prompts from financial, medical, and legal domains
- Open-sources code, data, and model weights to support adaptation to additional specialized sectors
Why It Matters
Enables secure deployment of LLMs in regulated industries where technical jargon previously bypassed safety filters.