Free tool I built to score dataset quality (LQS) — feedback welcome [D]
Upload any dataset to get a 0-100 quality score across 7 dimensions with specific improvement flags.
I've open-sourced the core scoring engine behind the Labelsets dataset marketplace as a free, standalone tool: the Label Quality Score (LQS). You upload a dataset in a common ML format (CSV, Parquet, JSONL, COCO JSON, or YOLO) and get back a 0-100 quality score, decomposed across seven dimensions such as label consistency, annotation completeness, and class balance. For each dimension, the tool emits specific, actionable flags that pinpoint what is dragging the score down, so you get diagnostics rather than a bare pass/fail.
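To make "dimension score" concrete, here's a toy class-balance metric on the same 0-100 scale, using normalized Shannon entropy. It's a simplified stand-in for illustration, not the exact formula the engine ships with:

```python
import math
from collections import Counter

def class_balance_score(labels):
    """Toy class-balance metric on a 0-100 scale via normalized Shannon
    entropy: 100 = perfectly uniform classes, 0 = a single class.
    Illustrative only, not the production LQS formula."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    if k <= 1:
        return 0.0  # empty or single-class label sets carry no balance signal
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 100.0 * entropy / math.log2(k)  # divide by max possible entropy

print(class_balance_score(["cat"] * 90 + ["dog"] * 10))  # skewed 90/10 -> ~46.9
print(class_balance_score(["cat", "dog"] * 50))          # uniform -> 100.0
```

A flag for this dimension would then name the under-represented classes rather than just report the number.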
It's aimed at anyone who works directly with training data: ML engineers, data scientists, and AI researchers. The goal is to replace the manual, time-consuming chore of vetting dataset integrity before training with a standardized, automated audit. I'm actively looking for feedback on the scoring methodology from this community so I can refine its accuracy and usefulness. Data preparation remains an opaque, critical stage of the AI pipeline, and open, evaluative tooling for it is still rare.
- Generates a 0-100 Label Quality Score (LQS) broken down across 7 specific dimensions like consistency and completeness.
- Supports uploads for standard ML formats including CSV, Parquet, JSONL, COCO JSON, and YOLO.
- Provides actionable flags identifying the specific issues degrading quality, not just a final score (see the toy roll-up sketch after this list).
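To show what "flags, not just a score" means, here's a toy roll-up of seven dimension sub-scores into an overall LQS. Only consistency, completeness, and class balance are dimensions named above; the other four names, the equal weighting, and the flag threshold are placeholders I picked for the sketch:

```python
# Toy roll-up: seven 0-100 dimension sub-scores -> one overall LQS plus flags.
dimension_scores = {
    "label_consistency": 82,
    "annotation_completeness": 64,
    "class_balance": 47,
    "duplicate_labels": 91,    # placeholder dimension name
    "schema_validity": 100,    # placeholder dimension name
    "outlier_rate": 78,        # placeholder dimension name
    "class_coverage": 70,      # placeholder dimension name
}

# Equal weights keep the sketch simple; a real scorer could weight dimensions.
overall = round(sum(dimension_scores.values()) / len(dimension_scores))

# Flags point at what to fix instead of only reporting a number.
flags = [f"{dim}: {score}/100, review before training"
         for dim, score in dimension_scores.items() if score < 60]

print(f"LQS: {overall}/100")  # -> "LQS: 76/100"
for flag in flags:
    print("FLAG:", flag)      # -> "FLAG: class_balance: 47/100, ..."
```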
Why It Matters
It automates the critical but tedious work of dataset vetting, helping teams build better models faster by starting from higher-quality data.