Open Source

How I topped the Open LLM Leaderboard using 2x 4090 GPUs — no weights modified.

An anonymous researcher duplicated 7 middle layers in Qwen2-72B, achieving top benchmark scores without modifying any weights.

Deep Dive

An independent AI researcher has achieved a top score on the competitive Open LLM Leaderboard with an unconventional, weight-preserving technique. The method involves duplicating a specific block of approximately 7 middle layers within the Qwen2-72B model. Crucially, this 'layer duplication' hack does not modify any of the model's original trained weights. The result was a performance boost across all benchmark tasks, propelling the modified model to first place. Remarkably, this discovery was made using a relatively accessible hardware setup of just two NVIDIA RTX 4090 GPUs.

The finding reveals a fascinating structural insight into large language models. The researcher notes that duplicating single layers, or blocks that are much smaller or larger than this, yields no improvement; only 'circuit-sized' blocks of around 7 layers help. This suggests that during pre-training, the model carves out discrete, functional 'circuits' within its layer stack that must be preserved as whole units to function correctly. The researcher, who also built the notable GLaDOS system and now runs a powerful dual GH200 rig, plans to release the code and apply the technique to newer models such as Qwen3.5 and GLM-4.7, promising 'special RYS versions' soon.
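The article does not include the researcher's code, but the layer-duplication idea can be sketched in a few lines. The helper below builds the new layer ordering for a self-merge; the specific block position (layers 40–46 of Qwen2-72B's 80 decoder layers) is an illustrative assumption, not the researcher's actual configuration.

```python
# Sketch of the layer-duplication ("self-merge") idea described above.
# The block position is an assumption: the article only says that a
# ~7-layer middle block of Qwen2-72B (80 decoder layers) was repeated.

def duplicated_layer_order(num_layers: int, start: int, length: int) -> list[int]:
    """Return layer indices for a model whose block
    [start, start + length) is repeated once, in place."""
    order = list(range(num_layers))
    block = order[start:start + length]
    # Insert a second copy of the block immediately after the original.
    return order[:start + length] + block + order[start + length:]

# Duplicating a 7-layer middle block of an 80-layer model yields an
# 87-layer stack whose entries all point at the original weights.
order = duplicated_layer_order(80, start=40, length=7)
assert len(order) == 87
assert order[40:47] == order[47:54] == list(range(40, 47))

# Applied to a Hugging Face transformers model, the new stack would
# reuse the same modules (hence no weights are modified), e.g.:
#   model.model.layers = torch.nn.ModuleList(
#       [model.model.layers[i] for i in order])
```

Because the duplicated entries reference the same modules rather than copies, the edit costs no extra weight storage and leaves the original checkpoint untouched, consistent with the article's "no weights modified" claim.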

Key Points
  • A researcher duplicated a 7-layer block in Qwen2-72B, boosting performance to top the Open LLM Leaderboard without changing model weights.
  • The technique only works with 'circuit-sized' blocks of ~7 layers, suggesting models form discrete functional units during training.
  • The discovery was made on a consumer-grade 2x RTX 4090 setup, with code and new model versions (like Qwen3.5 RYS) promised for release.

Why It Matters

This demonstrates that significant performance gains can be found through novel architectural edits, potentially offering a low-cost alternative to full model retraining.