Developer Tools

Reinforcement fine-tuning on Amazon Bedrock: Best practices

New reinforcement fine-tuning technique delivers major accuracy gains for code, math, and moderation tasks.

Deep Dive

Amazon has introduced Reinforcement Fine-Tuning (RFT) as a new customization technique on its Bedrock platform, enabling developers to significantly improve foundation model performance without traditional labeled datasets. Unlike supervised fine-tuning that requires curated input-output pairs, RFT uses reward signals—either rule-based systems or AI judges—to train models through iterative feedback loops. This approach delivers up to 66% accuracy gains over base models while reducing customization complexity and costs, making it particularly effective for code generation, mathematical reasoning, and content moderation tasks.

RFT excels in two primary scenarios: tasks with verifiable correctness (like code that must pass unit tests) and subjective tasks where AI judges evaluate quality. The technique uses either Reinforcement Learning with Verifiable Rewards (RLVR) for objective tasks or Reinforcement Learning with AI Feedback (RLAIF) for subjective evaluations. Amazon demonstrated RFT's effectiveness using the GSM8K mathematical reasoning dataset, showing how models can learn complex problem-solving strategies through reward-based training rather than static examples.
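For verifiable tasks like GSM8K, a rule-based reward in the RLVR style can be as simple as checking whether the model's final numeric answer matches the reference. The sketch below is illustrative only; the function name and answer-parsing rules are assumptions, not Bedrock's actual scoring contract.

```python
import re

def gsm8k_reward(model_output: str, ground_truth: str) -> float:
    """Rule-based reward for a GSM8K-style math answer (RLVR sketch).

    Returns 1.0 when the last number in the model's response matches the
    reference answer, else 0.0. Parsing heuristics here are illustrative.
    """
    # Strip thousands separators, then grab every signed number in the text.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output.replace(",", ""))
    if not numbers:
        return 0.0  # No numeric answer at all: zero reward.
    # Treat the final number in the response as the model's answer.
    return 1.0 if float(numbers[-1]) == float(ground_truth) else 0.0
```

A binary reward like this is enough for RLVR because training only needs a signal that separates correct completions from incorrect ones; partial-credit shaping is optional.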

Implementation on Amazon Bedrock involves creating custom AWS Lambda functions that serve as reward mechanisms during training cycles. These functions score model responses; the training process then updates model weights to increase the probability of high-reward outputs. The approach is especially valuable when desired behaviors are difficult to demonstrate through examples alone, allowing models to discover optimal strategies for tasks ranging from structured data extraction to agentic workflows.
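A reward-function Lambda along these lines might look like the following. The event fields (`completion`, `reference`) and the scoring rule are assumptions for illustration; the actual request/response contract comes from the Bedrock RFT documentation.

```python
import json

def lambda_handler(event, context):
    """Sketch of a reward-function Lambda for an RFT training loop.

    Assumed event shape: {"completion": str, "reference": str}.
    Weight updates happen in the training service, not here; this
    handler only returns a scalar reward for one completion.
    """
    completion = event.get("completion", "").strip()
    reference = event.get("reference", "").strip()

    # Toy scoring rule: exact match earns full reward, a substring
    # match earns partial credit, anything else scores zero.
    if reference and completion == reference:
        score = 1.0
    elif reference and reference in completion:
        score = 0.5
    else:
        score = 0.0

    return {"statusCode": 200, "body": json.dumps({"reward": score})}
```

In practice the scoring logic is where the domain knowledge lives: for code generation it might run unit tests against the completion, and for subjective tasks it might call a judge model and return its rating.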

Key Points
  • Achieves up to 66% accuracy improvements over base foundation models without requiring large labeled datasets
  • Uses reward signals through AWS Lambda functions instead of traditional supervised fine-tuning with input-output pairs
  • Effective for code generation (unit test pass rates), math reasoning (GSM8K dataset), and content moderation tasks

Why It Matters

Enables enterprises to customize AI models for specific use cases with higher accuracy and lower data preparation costs.