Open Source

Post-training expert urges AI tinkerers to ditch benchmarks and fine-tune instead

Stop being an 'inference monkey' — learn the dark art of post-training for real profits.

Deep Dive

The post challenges a common habit among people who buy high-end hardware: downloading a model and sharing inference benchmarks. The author, who has spent 4 years running a 'post-training-as-a-service' business, argues this is intellectually lazy and misses the real opportunity, which is fine-tuning models for production use cases. They describe real projects (identifying malicious consumer chats, tagging mouse movements for corporate espionage, and profiling sales leads), all done on a single 4090 server with supervised fine-tuning (SFT).

The key takeaway is that post-training is a dark art: the author says no tutorials exist, AI tools like Claude can't automate it, and the data mix (synthesis, transformation) is critical. Model choice matters too. Qwen models are crammed with knowledge but hard to tune, while Llama models absorb fine-tuning easily but lack base knowledge. The author emphasizes iteration speed, using a low-power, massively parallel stack (hinted at in a picture in the original post) to find the best model quickly.
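To make the "data mix" idea concrete, here is a minimal sketch of sampling an SFT training batch from several data sources at fixed proportions. The post does not describe the author's actual pipeline, so the source names (`real`, `synthetic`, `transformed`), the weights, and the `mix_examples` helper are all hypothetical.

```python
import random


def mix_examples(sources, weights, n, seed=0):
    """Draw n training examples from named sources according to mix weights.

    sources: dict mapping a source name to a list of example texts.
    weights: dict mapping the same names to relative sampling weights.
    Both the names and the weights are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so a given mix is reproducible
    names = list(sources)
    picked = rng.choices(names, weights=[weights[k] for k in names], k=n)
    return [{"source": name, "text": rng.choice(sources[name])} for name in picked]


# Hypothetical toy data; a real mix would hold labeled chats, logs, etc.
sources = {
    "real": ["real example 1", "real example 2"],
    "synthetic": ["synthetic example 1"],
    "transformed": ["transformed example 1"],
}
weights = {"real": 0.5, "synthetic": 0.3, "transformed": 0.2}
batch = mix_examples(sources, weights, n=8, seed=42)
```

Changing `weights` and retraining is one way to iterate on the mix, which is where fast iteration would pay off.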

Beyond SFT, the next frontier is Reinforcement Fine-Tuning (RFT), described as the 'wild west.' RFT requires a model doing fast inference for rollouts, a reward system (potentially spawning Docker containers to build and test code), and weight updates using PPO, GRPO, or RLOO. This needs a specialized build-out that few people attempt solo, and the author is only getting started. Post-training shops like Prime RL run in datacenters. The post ends with a call to action: consider post-training as a more intellectually rewarding and profitable path than just running benchmarks.
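As a rough illustration of the reward-to-update step, the sketch below turns the rewards from one group of rollouts (several completions for the same prompt) into per-rollout advantages in the GRPO style (normalize within the group) and the RLOO style (subtract the mean reward of the other rollouts). It covers only this arithmetic, not inference, Docker-based rewards, or the policy-gradient update itself. The function names, the `eps` constant, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """RLOO-style advantages: each reward minus the mean of the other rewards."""
    n = len(rewards)
    if n < 2:
        raise ValueError("leave-one-out needs at least two rollouts")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Hypothetical rewards: 1.0 if the generated code built and passed its tests.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))
print(rloo_advantages(rewards))
```

In both variants, rollouts that beat their group get a positive advantage and the rest get a negative one, so no separate learned value model is needed for the baseline.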

Key Points
  • The author reports 4 years of profitable post-training work on a single 4090 server, on tasks like malicious-chat detection, mouse-movement tagging, and sales-lead profiling.
  • Post-training is a dark art: there are no tutorials, it is hard to automate with AI, and behavior is model-specific (Qwen models are knowledge-dense but hard to tune; Llama models absorb fine-tuning easily but lack base knowledge).
  • The next frontier is Reinforcement Fine-Tuning (RFT), which combines fast inference for rollouts, Docker-based rewards, and weight updates via PPO, GRPO, or RLOO on a specialized build-out.

Why It Matters

For AI professionals, mastering post-training can unlock high-value custom solutions, far beyond simple model benchmarking.
