Open Source

claude_converter turns Claude Code sessions into fine-tuning datasets

Every Claude Code session is a .jsonl file — now use it to train local models.

Deep Dive

If you use Anthropic's Claude Code, every session you run is already saved as a .jsonl file under ~/.claude/projects/. These logs contain multi-turn editing conversations, tool calls, reasoning traces, and real code interactions — essentially free, high-quality training data. The catch is that the raw format doesn't match what any popular fine-tuning framework expects. That's the gap Fredy Rivera's open-source tool claude_converter fills: it transforms those raw session logs into the structured messages format that apply_chat_template() consumes.

Claude_converter outputs in sharegpt format, making it directly compatible with TRL's SFTTrainer, Axolotl, and LLaMA-Factory — all with zero dependencies beyond standard Python. It ships a clean_messages() helper that strips tool_use, tool_result, and thinking blocks before training, plus an inspect_session() CLI function that shows token counts and block breakdowns so you know exactly what you're working with. A caveat worth noting: raw sessions include failed attempts, retries, and dead ends. The author recommends filtering to only those sessions where the final assistant turn actually solved the problem. Install via `uv pip install claude-converter` or grab the repo at https://github.com/FredyRivera-dev/claude_converter.

Key Points
  • Converts Claude Code .jsonl sessions into messages format for TRL/SFTTrainer, Axolotl, and LLaMA-Factory (sharegpt format).
  • Includes clean_messages() helper to strip tool_use, tool_result, and thinking blocks before training.
  • Zero-dependency tool with inspect_session() for token counts and block breakdowns; available via uv pip install claude-converter.

Why It Matters

Turns existing coding logs into high-quality fine-tuning data for local models, reducing data collection overhead.

📬 Get the top 10 AI stories daily