DRAGON AI Classifies 825k Code Repos Without READMEs, Beats SOTA by 6 Points

arXiv cs.SE February 11, 2026

⚡ This new model can organize the messy, undocumented codebases developers hate.

Deep Dive

Researchers have unveiled DRAGON, an AI model for classifying massive collections of software repositories. It works without relying on README files, using only lightweight signals such as file and directory names from version control. DRAGON raises classification performance (F1@5) from 54.8% to 60.8%, a gain of 6 percentage points over the previous state of the art. When READMEs are missing, its performance degrades by only 6%, which makes it robust for real-world use. The team also released the largest open dataset for this task: 825,000 repositories from Software Heritage.
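
For readers unfamiliar with the metric, here is a minimal sketch of how F1@5 could be computed for a single repository. It assumes the common definition (precision@k and recall@k over the top k predicted labels, combined as a harmonic mean); the summary does not spell out the paper's exact formula, and the function name and example labels are hypothetical. The last lines work through the reported numbers to show why the gain is 6 percentage points rather than a 6% relative improvement.

```python
from typing import Sequence, Set


def f1_at_k(ranked_labels: Sequence[str], true_labels: Set[str], k: int = 5) -> float:
    """F1@k for one repository (assumed definition, not taken from the paper).

    ranked_labels: predicted labels, best first, assumed to contain no duplicates.
    true_labels:   the ground-truth labels for the repository.
    """
    if k <= 0 or not true_labels:
        return 0.0
    top_k = list(ranked_labels)[:k]
    hits = sum(1 for label in top_k if label in true_labels)
    if hits == 0:
        return 0.0
    precision = hits / k                # share of the k slots that are correct
    recall = hits / len(true_labels)    # share of true labels that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical repository: two of the top five predictions are correct.
predicted = ["python", "machine-learning", "cli", "docker", "api", "web"]
actual = {"python", "machine-learning", "nlp"}
score = f1_at_k(predicted, actual, k=5)

# Arithmetic on the figures reported in the summary.
baseline, dragon = 54.8, 60.8
absolute_points = dragon - baseline               # gain in percentage points
relative_pct = absolute_points / baseline * 100   # gain relative to the baseline
```

In this example precision@5 is 2/5 and recall@5 is 2/3, which gives an F1@5 of 0.5. A corpus-level score would typically average this value over all repositories, though the averaging scheme is also an assumption here.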

Why It Matters

It enables large-scale organization and discovery of undocumented code, unlocking value from massive, messy software archives.
