PapersWithCode revival adds multi-metric leaderboards and paper lineage
Track SOTA across AI domains with leaderboards now supporting WER, FPS, and more.
One week after reviving paperswithcode.co, Hugging Face's Niels Rogge has rolled out a slate of updates that make the SOTA tracking platform more flexible. The biggest addition is support for multiple metrics per benchmark: the Open ASR Leaderboard now shows both Word Error Rate (WER) and Inverse Real-Time Factor (RTFx), while the Object Detection leaderboard reports frames per second (FPS) alongside mean average precision (mAP). This lets researchers weigh accuracy against speed instead of ranking models on a single number. The platform also now accepts paper submissions from sources beyond arXiv, including GitHub repos, blog posts, and bioRxiv. When a paper is submitted, AI automatically enriches it with task tags, method tags, and links to evaluations.
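For readers unfamiliar with the two ASR metrics, here is a minimal sketch of how they are commonly defined: WER as word-level edit distance divided by the number of reference words, and RTFx as audio duration divided by processing time. The function names and sample sentences are illustrative assumptions, and the leaderboard's own evaluation code (for example, its text normalization) may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one row at a time.
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def inverse_real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """RTFx = audio duration / processing time; higher means faster."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    # Hypothetical transcript pair: one substitution and one deletion.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical timing: 60 s of audio transcribed in 0.5 s.
    print(inverse_real_time_factor(60.0, 0.5))
```

Lower WER is better while higher RTFx is better, which is why showing both on one leaderboard exposes the accuracy-versus-speed trade-off.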
Another key feature is paper lineage, which displays a banner above the abstract showing predecessor or follow-up papers. It is already visible on entries like Mamba-3, DINOv2, and GLM-4.5. Popular new methods such as Gated DeltaNet, Kimi Delta Attention, and Mamba-2 have been added, each with a list of citing papers. For social sharing, each benchmark includes a "copy image" button for scatter plots and tables. Finally, over 3,000 evaluations have been loaded, starting with all models supported in the Transformers library; they appear at the bottom of each paper page (e.g., Qwen 3.6). Rogge plans to continue adding features and has opened a Discord channel for feedback.
- Leaderboards now support multiple metrics per benchmark, e.g., WER and RTFx for ASR, FPS and mAP for object detection.
- Papers can be submitted from non-arXiv sources like GitHub, blog posts, and bioRxiv, with AI auto-enrichment for tasks and methods.
- Over 3,000 evaluations have been added, starting with all models supported in Hugging Face's Transformers library.
Why It Matters
The updates make comparing AI models across diverse tasks more flexible and accessible, aiding research and deployment decisions.