Code Review Dataset: 200k+ Cases of Human-Written Code Reviews from Top OSS Projects
A new dataset of 200k+ real code reviews from projects like React and TensorFlow helped fine-tune an AI reviewer to 4x better scores on standard text-generation metrics.
An independent researcher has compiled and released a massive new dataset containing over 200,000 real-world, human-written code reviews. Sourced from top-tier open-source projects including React, TensorFlow, VSCode, and others, the dataset offers a rich, practical corpus of how experienced developers critique and improve code. The researcher fine-tuned Alibaba's Qwen2.5-Coder-32B-Instruct model on this data, producing an AI specialized for code review, and in the process demonstrated how high-quality, domain-specific data can improve a large language model's performance on a complex, nuanced task.
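The announcement does not describe the dataset's schema or the exact fine-tuning recipe, but a typical pipeline would convert each review record into a prompt/completion pair for instruction tuning. The sketch below illustrates that shaping step; the field names (`diff`, `comment`, `fix`) and the prompt wording are hypothetical, not taken from the actual dataset.

```python
# Hypothetical shaping of one review record into an instruction-tuning
# example. Field names (diff, comment, fix) are assumptions -- the real
# dataset's schema is not described in the announcement.

def to_training_example(record: dict) -> dict:
    """Turn a review record into a prompt/completion pair."""
    prompt = (
        "Review the following code change and suggest improvements.\n\n"
        f"```diff\n{record['diff']}\n```"
    )
    completion = record["comment"]
    if record.get("fix"):
        completion += f"\n\nSuggested fix:\n```\n{record['fix']}\n```"
    return {"prompt": prompt, "completion": completion}


sample = {
    "diff": "- for i in range(len(xs)):\n+ for i, x in enumerate(xs):",
    "comment": "Prefer enumerate over range(len(...)) for clarity.",
    "fix": "for i, x in enumerate(xs): ...",
}
example = to_training_example(sample)
print(example["prompt"])
print(example["completion"])
```

Pairs like these can then be fed to any standard supervised fine-tuning loop (for example, a Hugging Face `Trainer` over the tokenized prompt/completion text).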
The results were striking. On generated code fixes and review comments, the fine-tuned model scored roughly 4x higher than the base model on BLEU-4 and ROUGE-L, which measure surface overlap with reference text, and on SBERT similarity, which measures semantic closeness. In other words, the model's outputs became significantly more aligned with human-written feedback in both wording and meaning. The dataset is now publicly available, inviting other developers and researchers to integrate it into their own LLM training pipelines. It offers a powerful resource for anyone looking to enhance AI coding assistants, build better automated review tools, or improve the code-generation capabilities of models like GPT-4, Claude, or Llama by training them on proven, high-quality human feedback.
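To make the metrics concrete: BLEU-style scores reward a model whose output shares n-grams with a human-written reference. The sketch below implements a simplified modified n-gram precision in pure Python; it omits BLEU-4's brevity penalty, multi-n averaging, and multi-reference support, and the example sentences are invented for illustration.

```python
# Simplified n-gram precision, the core idea behind BLEU-style scores.
# Real BLEU-4 also averages over n=1..4 and applies a brevity penalty.
from collections import Counter


def ngram_precision(candidate: str, reference: str, n: int = 4) -> float:
    """Fraction of candidate n-grams that also appear in the reference,
    with each reference n-gram usable at most as often as it occurs."""
    def ngrams(text: str, n: int) -> Counter:
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not cand:
        return 0.0
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    return overlap / sum(cand.values())


# Invented example: model output vs. a human reference review comment.
human = "use a context manager to ensure the file is closed"
model = "use a context manager so the file is always closed"
print(ngram_precision(model, human, n=2))
```

Higher overlap with human reference comments is what drove the reported 4x gains; SBERT similarity complements this by comparing sentence embeddings, so paraphrases score well even with little n-gram overlap.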
- Dataset contains 200,000+ real code reviews from major OSS projects like React and TensorFlow.
- Fine-tuned Qwen2.5-Coder-32B model achieved 4x better BLEU-4, ROUGE-L, and SBERT scores.
- Public release enables developers to train better AI coding assistants and review tools.
Why It Matters
Provides a high-quality training foundation to make AI code review and generation more practical and aligned with expert human feedback.