claude_converter turns Claude Code sessions into fine-tuning datasets
Every Claude Code session is a .jsonl file — now use it to train local models.
If you use Anthropic's Claude Code, every session you run is already saved as a .jsonl file under ~/.claude/projects/. These logs contain multi-turn editing conversations, tool calls, reasoning traces, and real code interactions — essentially free, high-quality training data. The catch is that the raw format doesn't match what any popular fine-tuning framework expects. That's the gap Fredy Rivera's open-source tool claude_converter fills: it transforms those raw session logs into the structured messages format that apply_chat_template() consumes.
Claude_converter outputs in sharegpt format, making it directly compatible with TRL's SFTTrainer, Axolotl, and LLaMA-Factory — all with zero dependencies beyond standard Python. It ships a clean_messages() helper that strips tool_use, tool_result, and thinking blocks before training, plus an inspect_session() CLI function that shows token counts and block breakdowns so you know exactly what you're working with. A caveat worth noting: raw sessions include failed attempts, retries, and dead ends. The author recommends filtering to only those sessions where the final assistant turn actually solved the problem. Install via `uv pip install claude-converter` or grab the repo at https://github.com/FredyRivera-dev/claude_converter.
- Converts Claude Code .jsonl sessions into messages format for TRL/SFTTrainer, Axolotl, and LLaMA-Factory (sharegpt format).
- Includes clean_messages() helper to strip tool_use, tool_result, and thinking blocks before training.
- Zero-dependency tool with inspect_session() for token counts and block breakdowns; available via uv pip install claude-converter.
Why It Matters
Turns existing coding logs into high-quality fine-tuning data for local models, reducing data collection overhead.