Developer Tools

Streamlining generative AI development with MLflow v3.10 on Amazon SageMaker AI

New MLflow 3.10 on SageMaker AI brings built-in gen AI evaluation and tracing

Deep Dive

Amazon SageMaker AI has announced that its MLflow Apps now support MLflow version 3.10, bringing significant improvements for generative AI development and experiment tracking. The update introduces better tracing for complex multi-turn agentic workflows and tighter integration with popular LLM frameworks, allowing teams to log interactions and invocations more easily. A standout addition is the mlflow.genai.evaluate() API, which provides a programmatic interface for measuring generative AI quality across the development-to-production lifecycle. Built-in metrics cover relevance, faithfulness, correctness, and safety, and integrate seamlessly with SageMaker AI workflows. Observability is also upgraded with more granular trace filtering, richer metadata capture for debugging, and pre-built performance dashboards that surface latency distributions, request counts, quality scores, and token usage at a glance, eliminating manual chart configuration. These features give teams running production workloads clear visibility into operational costs, while MLflow workspaces help organize artifacts across teams and projects.

Getting started is straightforward: users can create a SageMaker AI MLflow App through the SageMaker Studio console, AWS CLI, or API, with the default configuration automatically provisioning MLflow 3.10. Prerequisites include an AWS account with billing enabled and a SageMaker Studio domain. After creating the app, users receive an MLflow ARN for the connection, then install the required Python packages (mlflow==3.10.1 and sagemaker-mlflow==0.3.0) to begin tracking experiments. The managed environment supports SageMaker Studio JupyterLab, Code Editor, and local IDEs, making it easy for data scientists and ML engineers to accelerate generative AI initiatives from experimentation to production while maintaining governance at scale.
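Assuming the MLflow App has been created and its ARN is at hand, connecting from a notebook or local IDE looks roughly like this (the ARN, experiment name, and logged parameter below are placeholders; AWS credentials with SageMaker permissions must be available in the environment):

```python
# pip install mlflow==3.10.1 sagemaker-mlflow==0.3.0
import mlflow

# Placeholder ARN: substitute the one returned when the MLflow App is created.
tracking_server_arn = (
    "arn:aws:sagemaker:us-east-1:111122223333:mlflow-tracking-server/my-app"
)

# The sagemaker-mlflow plugin lets the ARN serve directly as the tracking URI.
mlflow.set_tracking_uri(tracking_server_arn)
mlflow.set_experiment("genai-experiments")

with mlflow.start_run():
    mlflow.log_param("model", "example-llm")
```

From this point, standard MLflow logging, tracing, and evaluation calls are recorded against the managed server rather than a local mlruns directory.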

Key Points
  • MLflow 3.10 on SageMaker AI introduces the mlflow.genai.evaluate() API with built-in metrics for relevance, faithfulness, correctness, and safety
  • New pre-built performance dashboards automatically surface latency distributions, request counts, quality scores, and token usage without manual configuration
  • Teams can get started via console, CLI, or API with automatic provisioning of MLflow 3.10 and require only an AWS account and SageMaker Studio domain

Why It Matters

Makes generative AI experiment tracking and production monitoring enterprise-ready with built-in evaluation and observability on AWS.