CrossTrace: A Cross-Domain Dataset of Grounded Scientific Reasoning Traces for Hypothesis Generation
New dataset of 1,389 reasoning traces helps AI models generate novel scientific hypotheses across domains.
CrossTrace is a new dataset designed to train AI models in scientific hypothesis generation. Created by Andrew Bouras, the collection contains 1,389 reasoning traces that document the step-by-step progression from established knowledge to novel scientific hypotheses across biomedical research (518 traces), AI/ML (605 traces), and cross-domain work (266 traces). Each trace follows a structured Input/Trace/Output schema that extends the Bit-Flip-Spark framework, and every reasoning step is explicitly grounded in source paper text to prevent fabrication.
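To make the Input/Trace/Output idea concrete, here is a minimal Python sketch of what one record and a simple grounding check might look like. The field names (`input_context`, `evidence_quote`, `output_hypothesis`), the domain labels, and the verbatim-substring check are illustrative assumptions, not the dataset's actual schema or validation procedure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReasoningStep:
    claim: str           # one step in the reasoning chain
    evidence_quote: str  # hypothetical field: text quoted from the source paper


@dataclass
class TraceRecord:
    domain: str             # assumed labels, e.g. "biomedical", "ai_ml", "cross_domain"
    input_context: str      # Input: the established knowledge the trace starts from
    trace: list[ReasoningStep] = field(default_factory=list)  # Trace: ordered steps
    output_hypothesis: str = ""  # Output: the resulting hypothesis


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not break matching."""
    return " ".join(text.lower().split())


def ungrounded_steps(record: TraceRecord, source_text: str) -> list[int]:
    """Return indices of steps whose quote is empty or absent from the source text.

    Assumption: a step counts as grounded when its quote appears verbatim
    (after normalization) in the source paper. A real pipeline may be looser.
    """
    source = normalize(source_text)
    missing = []
    for i, step in enumerate(record.trace):
        quote = normalize(step.evidence_quote)
        if not quote or quote not in source:
            missing.append(i)
    return missing
```

A record whose steps all carry quotes found in the source text yields an empty list; any index that comes back marks a step that would need review.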
Fine-tuning the Qwen2.5-7B-Instruct model on CrossTrace with QLoRA produced large gains. The model's IAScore (a measure of hypothesis quality) rose from 0.828 to 0.968 when evaluated by GPT-4o, and structural compliance improved from 0% to 100%. Human validation of 150 records found 99.7% step-level grounding accuracy and a 0.0% fabrication rate. Balanced cross-domain training also outperformed single-domain training, which suggests that scientific reasoning patterns transfer between disciplines such as biomedicine and computer science.
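The step-level grounding and fabrication figures are simple ratios over human annotations. The sketch below shows one plausible way to compute them. The annotation format (one boolean per reasoning step, one fabrication flag per record) and the exact metric definitions are assumptions for illustration, and the toy labels are not drawn from the dataset.

```python
def step_grounding_rate(step_labels):
    """Fraction of annotated steps marked as grounded.

    step_labels: one list per record, holding one boolean per reasoning step
    (True means the annotator found the step supported by the source text).
    """
    total = sum(len(record) for record in step_labels)
    grounded = sum(sum(record) for record in step_labels)
    return grounded / total if total else 0.0


def fabrication_rate(fabricated_flags):
    """Fraction of records flagged as containing fabricated content."""
    if not fabricated_flags:
        return 0.0
    return sum(fabricated_flags) / len(fabricated_flags)


# Toy annotations for three records (illustrative only).
toy_steps = [[True, True, True], [True, False], [True]]
toy_flags = [False, False, False]

print(f"step-level grounding: {step_grounding_rate(toy_steps):.1%}")
print(f"fabrication rate: {fabrication_rate(toy_flags):.1%}")
```

Counting at the step level rather than the record level means a single unsupported step lowers the score only slightly, which is why the two metrics are reported separately.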
CrossTrace improves on previous resources, which were limited to single domains and lacked explicit reasoning traces. By capturing eight distinct discovery patterns and providing verifiable connections between prior knowledge and novel contributions, it addresses a bottleneck in accelerating scientific research. The dataset and code are publicly available, so other researchers can build on this work, potentially leading to AI systems that help scientists generate testable hypotheses across multiple fields.
- Contains 1,389 reasoning traces across biomedical (518), AI/ML (605), and cross-domain (266) research
- Fine-tuning Qwen2.5-7B-Instruct improved IAScore from 0.828 to 0.968 and achieved 100% structural compliance
- Human validation of 150 records shows 99.7% step-level grounding accuracy and a 0.0% fabrication rate
Why It Matters
Enables AI systems to assist scientists in generating novel, grounded hypotheses, potentially accelerating research breakthroughs across disciplines.