Developer Tools

Building a custom model provider for Strands Agents with LLMs hosted on SageMaker AI endpoints

New custom parser solves format mismatch, letting Llama 3.1 and other models work with Strands' agent SDK.

Deep Dive

A new technical guide from AWS Labs and Strands addresses a growing pain point for enterprises deploying custom AI agents. Organizations are increasingly hosting their own large language models (LLMs) like Meta's Llama 3.1 on Amazon SageMaker AI endpoints using frameworks like SGLang or vLLM for cost and control benefits. However, these frameworks typically output responses in OpenAI-compatible formats, creating a mismatch with the Strands Agents SDK, which is built to consume the Amazon Bedrock Messages API format. This incompatibility, often resulting in `TypeError` failures, has blocked the use of custom-hosted models within Strands' agentic workflows.
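The mismatch can be illustrated with plain dictionaries. The field names below follow the public OpenAI chat completions and Bedrock Messages shapes; the payloads themselves are illustrative, not taken from the guide:

```python
# OpenAI-compatible response, as emitted by an SGLang or vLLM server
openai_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!"},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3},
}

# Bedrock Messages API shape that Strands expects:
# "content" is a LIST of typed blocks, not a plain string
bedrock_message = {
    "role": "assistant",
    "content": [{"text": "Hello!"}],
}

# Reading the OpenAI payload as if it were Bedrock-shaped fails:
# "content" is a string, so content[0] is the character "H", and
# indexing a string with the key "text" raises TypeError.
try:
    openai_response["choices"][0]["message"]["content"][0]["text"]
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

This is the class of `TypeError` failure the article describes: both formats carry the same information, but the agent SDK indexes into a structure the serving framework never produced.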

The solution is a three-layer implementation built with the open-source `awslabs/ml-container-creator` tool. First, the tool automates deployment of a model (e.g., Llama 3.1) on SageMaker. The core innovation is a custom Parser Layer: a `LlamaModelProvider` class that extends Strands' `SageMakerAIModel`, intercepts the model's native response, and translates it into the Bedrock Messages API structure expected by the Strands Agent Layer. This approach decouples the serving framework from the agent SDK, letting developers use their preferred, optimized inference stacks without sacrificing compatibility with advanced agent toolkits, effectively future-proofing their AI architecture.
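The translation step at the heart of the Parser Layer can be sketched as a standalone function. This is a hypothetical helper, not the guide's code: the actual `LlamaModelProvider` performs an equivalent conversion while extending Strands' `SageMakerAIModel`, whose exact interface is not shown in the article:

```python
def to_bedrock_message(openai_response: dict) -> dict:
    """Translate one OpenAI-style chat completion into the Bedrock
    Messages API shape (hypothetical sketch of the parser's job)."""
    msg = openai_response["choices"][0]["message"]
    return {
        "role": msg["role"],
        # Bedrock represents content as a list of typed blocks
        "content": [{"text": msg.get("content") or ""}],
    }

raw = {"choices": [{"message": {"role": "assistant",
                                "content": "42 is the answer."}}]}
translated = to_bedrock_message(raw)
# translated == {"role": "assistant",
#                "content": [{"text": "42 is the answer."}]}
```

Because the conversion lives in one place, swapping the serving framework (say, SGLang for vLLM) or the model only changes what arrives at the parser, not the agent code above it.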

Key Points
  • Solves format mismatch between SageMaker-hosted LLMs (OpenAI format) and Strands Agents (Bedrock Messages API).
  • Uses awslabs/ml-container-creator, an open-source Yeoman generator, to automate SageMaker BYOC container deployment.
  • Core fix is a custom parser class that extends SageMakerAIModel to translate response formats for seamless agent integration.

Why It Matters

Unlocks enterprise AI flexibility: use any optimized LLM serving framework with leading agent SDKs, avoiding vendor lock-in.