Research & Papers

DDAP lets non-AI scientists build production pipelines via LLM guidance

A 4-stage framework turns research intent into implementable AI code without expert knowledge.

Deep Dive

Building AI pipelines typically requires deep expertise in data preprocessing, model selection, hyperparameter tuning, and deployment. That creates a barrier for domain scientists, such as biologists, doctors, or agronomists, who need AI but lack a software engineering background. DDAP (Domain-Driven Adaptable AI Pipelines) targets this gap. The framework guides users through four structured stages: (1) problem definition, where the user describes their goal in natural language; (2) compute environment specification, where the user sets resource constraints; (3) pipeline generation, where the LLM proposes a complete ML workflow; and (4) code generation, where the framework produces executable scripts. Crucially, the user stays in the loop at every decision point, so the final pipeline reflects their domain intent.
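The staged, user-in-the-loop control flow described above can be sketched as a short Python loop. This is not DDAP's implementation: the stage identifiers, `Session`, `run_workflow`, `propose`, and `review` are hypothetical names chosen for illustration, and a stub function stands in for the LLM. The point it shows is that each stage's proposal can build on earlier user-approved decisions, and nothing advances without the user's review.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical identifiers for the four DDAP stages; not taken from the paper's code.
STAGES = [
    "problem_definition",
    "environment_spec",
    "pipeline_generation",
    "code_generation",
]


@dataclass
class Session:
    """Holds the user-approved output of each completed stage, in order."""
    approved: Dict[str, str] = field(default_factory=dict)


def run_workflow(
    propose: Callable[[str, Dict[str, str]], str],
    review: Callable[[str, str], str],
) -> Session:
    """Run the stages in order; the user's review gates every transition."""
    session = Session()
    for stage in STAGES:
        # The (stubbed) LLM sees a copy of every decision the user has approved so far.
        proposal = propose(stage, dict(session.approved))
        # Human-in-the-loop: the user may accept, edit, or replace the proposal.
        session.approved[stage] = review(stage, proposal)
    return session


# --- Usage with stand-ins for the LLM and the user (both hypothetical) ---
def fake_llm(stage: str, context: Dict[str, str]) -> str:
    return f"draft for {stage} (given {len(context)} prior decisions)"


def reviewer(stage: str, proposal: str) -> str:
    # The user overrides the environment proposal with their real constraints.
    if stage == "environment_spec":
        return "CPU only, 16 GB RAM"
    return proposal


result = run_workflow(fake_llm, reviewer)
print(result.approved["environment_spec"])  # CPU only, 16 GB RAM
print(result.approved["code_generation"])   # draft for code_generation (given 3 prior decisions)
```

Passing a copy of the approved decisions to each proposal keeps the stub from mutating earlier choices, which mirrors the idea that only the user changes what has been accepted.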

The researchers evaluated DDAP on multiple real-world datasets from business, biology, and health science, comparing its pipelines with those built by AI experts. On regression and classification tasks, DDAP's models were competitive, often coming within 5% of the expert baselines, while requiring no coding from the user. The notable weakness was text-based clustering, where domain-specific linguistic nuances still challenge the LLM-driven approach. Even so, the work shows that a controlled agentic framework can substantially lower the barrier for non-expert scientists to adopt high-quality AI, opening the way to broader use of AI in science.
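The summary does not say how "within 5% of expert baselines" is measured. The sketch below assumes a relative gap against the expert score and uses made-up numbers; `relative_gap` and `within_tolerance` are hypothetical helpers, not part of DDAP. It shows why the metric's direction matters: accuracy improves as it rises, while error metrics such as RMSE improve as they fall.

```python
def relative_gap(model: float, expert: float, higher_is_better: bool = True) -> float:
    """Fraction by which `model` trails `expert`; zero or negative means it matches or beats it."""
    if expert == 0:
        raise ValueError("expert score must be non-zero for a relative comparison")
    # Orient the difference so that a positive value always means "worse than expert".
    shortfall = expert - model if higher_is_better else model - expert
    return shortfall / abs(expert)


def within_tolerance(model: float, expert: float, higher_is_better: bool = True,
                     tol: float = 0.05) -> bool:
    """True if the model is no more than `tol` (default 5%) behind the expert baseline."""
    return relative_gap(model, expert, higher_is_better) <= tol


# Made-up scores for illustration only (not results from the paper).
print(within_tolerance(0.86, 0.89))                        # accuracy: gap ~3.4% -> True
print(within_tolerance(4.4, 4.0, higher_is_better=False))  # RMSE: gap 10% -> False
```

If the paper instead reports an absolute difference in percentage points, the comparison would subtract scores directly rather than dividing by the expert score.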

Key Points
  • DDAP uses a 4-stage workflow: problem definition, environment specification, pipeline generation, and code generation.
  • Tested on business, biology, and health science datasets; pipelines matched expert performance within 5% on most tasks.
  • Text clustering remains a weakness: performance lagged significantly behind expert baselines on those tasks.

Why It Matters

Opens AI pipeline creation to domain experts who do not write code, which could accelerate research in medicine, agriculture, and the social sciences.