WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain
New framework unifies 13 fragmented work-domain AI tasks, enabling standardized evaluation across academia and industry.
A 19-researcher consortium led by Matthias De Lange has introduced WorkRB (Work Research Benchmark), the first open-source, community-driven framework designed specifically to evaluate AI systems in work-domain applications. Published on arXiv, the benchmark targets a fragmentation problem: research in hiring, talent management, and workforce analytics relies on divergent ontologies (such as ESCO and O*NET), heterogeneous task formulations, and diverse model families, which makes cross-study comparison nearly impossible. WorkRB organizes 13 tasks into 7 task groups and casts them as standardized recommendation and natural language processing (NLP) problems.
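To make the idea of a shared recommendation formulation concrete, here is a minimal sketch of how one work-domain task could be expressed as a ranking problem and scored with a standard metric. It does not use WorkRB's actual API: `RankingTask`, `mean_reciprocal_rank`, `token_overlap`, and the toy data are all hypothetical names chosen for illustration, and mean reciprocal rank is only one example of a ranking metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class RankingTask:
    """Hypothetical container: queries, a candidate pool, and the relevant candidates per query."""
    name: str
    queries: List[str]
    candidates: List[str]
    relevant: Dict[str, Set[str]]  # query -> set of relevant candidates


def mean_reciprocal_rank(task: RankingTask, score: Callable[[str, str], float]) -> float:
    """Rank every candidate for each query and average 1/rank of the first relevant hit."""
    total = 0.0
    for query in task.queries:
        # Higher score means a better match; ties keep the original candidate order.
        ranked = sorted(task.candidates, key=lambda c: score(query, c), reverse=True)
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in task.relevant[query]:
                total += 1.0 / rank
                break
    return total / len(task.queries)


def token_overlap(query: str, candidate: str) -> float:
    """Toy scorer (Jaccard overlap of lowercase tokens), standing in for a real model."""
    q, c = set(query.lower().split()), set(candidate.lower().split())
    union = q | c
    return len(q & c) / len(union) if union else 0.0


# Toy job-to-skill task; the labels are invented, not taken from ESCO or O*NET.
job_to_skill = RankingTask(
    name="job-to-skill (toy)",
    queries=["data engineer", "nurse"],
    candidates=["data pipeline design", "patient care", "welding"],
    relevant={"data engineer": {"data pipeline design"}, "nurse": {"patient care"}},
)

mrr = mean_reciprocal_rank(job_to_skill, token_overlap)
print(f"{job_to_skill.name}: MRR = {mrr:.2f}")
```

Because every task in this sketch exposes the same queries, candidates, and relevance labels, any scoring function can be evaluated on any task without task-specific glue code, which is the kind of comparability a unified formulation is meant to provide.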
WorkRB's architecture supports both monolingual and cross-lingual evaluation by loading multilingual ontologies dynamically, which is crucial for global workforce applications. Developed within a multi-stakeholder ecosystem spanning academia, industry, and public institutions, the framework has a modular design intended to make community contributions straightforward. That same design lets organizations integrate proprietary tasks and sensitive employment data without disclosing them publicly, addressing privacy concerns that have historically limited open evaluation in this domain. The benchmark is available under the permissive Apache 2.0 license, encouraging widespread adoption and collaboration.
- Unifies 13 fragmented work-domain AI tasks across 7 groups, including job/skill recommendation and skill extraction
- Enables cross-lingual evaluation through dynamic loading of multilingual ontologies (ESCO, O*NET, national taxonomies)
- Allows integration of proprietary tasks without disclosing sensitive data, addressing privacy barriers in employment AI
Why It Matters
WorkRB gives HR tech and workforce AI a standardized test bed, making it easier to compare the models that power hiring and talent management systems.