Why do only big ML labs dominate widely-used models despite many open-source pretrained models smaller labs could do RL on? [D]
RLHF costs and data quality may be the real moats, not just pretraining.
A Reddit user, boringblobking, sparked a lively discussion by asking why big AI labs like OpenAI and Anthropic still dominate widely-used models when open-source pretrained models of comparable scale (e.g., Kimi, DeepSeek) are freely available. Their argument: the expensive pretraining compute has already been paid for in these open models, so the real differentiator should be reinforcement learning from human feedback (RLHF) and other fine-tuning techniques. They ask why smaller labs can't build on these open models to compete, given that RLHF is supposedly far cheaper than pretraining.
The responses highlight several barriers. First, RLHF is not cheap: it requires large human annotation teams, substantial compute for iterative training runs, and careful infrastructure to detect and prevent reward hacking. Second, the data quality and curation pipelines behind effective RLHF are largely proprietary and hard to replicate, giving big labs an edge. Finally, distribution and trust matter: even if a smaller lab produces a competitive model, it lacks the brand, API infrastructure, and ecosystem needed for widespread adoption. The discussion underscores that pretraining is only one piece of the puzzle; the full stack of data, RL, deployment, and brand forms a formidable moat for incumbents.
- Open-source models like Kimi and DeepSeek exist at similar scales to GPT/Claude, but RLHF costs remain high for smaller labs.
- Data quality and curation for RLHF are often proprietary, giving big labs a competitive advantage.
- Distribution, trust, and API infrastructure are major barriers to widespread adoption beyond model quality alone.
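The thread's core cost claim can be sanity-checked with a back-of-envelope estimate. The sketch below is purely illustrative: every figure (number of preference pairs, per-comparison annotation cost, GPU-hours, cloud rates, iteration count) is a hypothetical placeholder, not measured data, and real RLHF budgets vary widely.

```python
# Back-of-envelope estimate of RLHF fine-tuning costs on top of an open
# pretrained model. ALL numbers below are illustrative assumptions.

PREFERENCE_PAIRS = 100_000   # assumed human preference comparisons per round
COST_PER_PAIR_USD = 2.0      # assumed fully-loaded annotation cost per pair
GPU_HOURS = 20_000           # assumed GPU-hours for reward model + RL training per round
GPU_HOUR_USD = 2.0           # assumed cloud rate per GPU-hour
ROUNDS = 3                   # assumed collect-annotate-retrain iterations

def rlhf_cost_estimate(pairs: int, pair_cost: float,
                       gpu_hours: float, gpu_rate: float,
                       rounds: int) -> dict:
    """Return a rough cost breakdown; every input is an assumption."""
    annotation = pairs * pair_cost * rounds
    compute = gpu_hours * gpu_rate * rounds
    return {
        "annotation_usd": annotation,
        "compute_usd": compute,
        "total_usd": annotation + compute,
    }

if __name__ == "__main__":
    est = rlhf_cost_estimate(PREFERENCE_PAIRS, COST_PER_PAIR_USD,
                             GPU_HOURS, GPU_HOUR_USD, ROUNDS)
    for item, usd in est.items():
        print(f"{item}: ${usd:,.0f}")
```

Even with these modest placeholder inputs, the total lands in the high six figures per model iteration cycle, and annotation dominates compute, which is consistent with the thread's point that human data, not GPU rental, is the binding constraint for smaller labs.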
Why It Matters
Reveals that AI dominance hinges on RLHF, data, and ecosystem, not just pretraining compute.