How I topped the Open LLM Leaderboard using 2x 4090 GPUs - Research notes in blog form
A single researcher, using just 2x RTX 4090 GPUs, discovered that duplicating 7-layer blocks boosts AI performance, claiming the #1 leaderboard spot.
An independent researcher has achieved a top spot on the competitive Open LLM Leaderboard using a surprisingly simple yet effective technique: duplicating specific 7-layer blocks within the Qwen2-72B model. The key finding was that copying a contiguous block of approximately seven middle layers, without modifying any of the model's learned weights, led to performance gains across all major benchmarks, including ARC, HellaSwag, and MMLU. The effect had a "Goldilocks zone": duplicating a single layer did nothing, while copying too many layers degraded performance. The researcher, who previously built the GLaDOS voice assistant, accomplished this on a modest setup of just two consumer-grade RTX 4090 GPUs, challenging the notion that major AI breakthroughs require massive, institutional-scale compute.
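Mechanically, the trick is just splicing verbatim copies of a few consecutive decoder layers back into the stack. Below is a minimal sketch of that idea using Hugging Face transformers; the checkpoint name, block boundaries (layers 40-46), and output path are illustrative assumptions rather than the author's exact recipe, and for a 72B model this kind of layer stacking is more commonly done at the checkpoint level (for example with mergekit's passthrough merge) than fully in memory.

```python
# Minimal sketch of block duplication for a Qwen2-style causal LM, where the
# decoder blocks live in model.model.layers. Checkpoint name, block
# boundaries, and output path are illustrative assumptions.
import copy

import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2-72B-Instruct"  # assumed checkpoint
BLOCK_START, BLOCK_END = 40, 47         # assumed ~7-layer block from the middle of the stack

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

layers = model.model.layers  # nn.ModuleList of decoder blocks

# Copy the chosen block verbatim; no weights are modified, only repeated.
duplicated = [copy.deepcopy(layers[i]) for i in range(BLOCK_START, BLOCK_END)]

# Splice the copies in right after the originals, deepening the stack.
model.model.layers = nn.ModuleList(
    list(layers[:BLOCK_END]) + duplicated + list(layers[BLOCK_END:])
)
model.config.num_hidden_layers = len(model.model.layers)

# Each attention module indexes the KV cache by layer_idx, so re-number the
# now-longer stack to keep generation-time caching consistent.
for idx, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = idx

model.save_pretrained("qwen2-72b-rys-sketch")    # hypothetical output path
tokenizer.save_pretrained("qwen2-72b-rys-sketch")
```

The saved checkpoint can then be evaluated like any other model; the leaderboard scores themselves come from the EleutherAI lm-evaluation-harness tasks listed above.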
The discovery has profound implications for our understanding of how large language models (LLMs) organize knowledge. The fact that gains appear only for blocks of roughly seven layers suggests that, during pre-training, the model carves out discrete, functional "circuits" responsible for specific capabilities, and that these circuits must be preserved as intact units to function properly. This provides a new lens for model interpretability and efficient scaling. The researcher, who now runs newer models like GLM-4.7 and Qwen3.5 on a more powerful dual GH200 system, plans to release code and specialized "RYS" versions of the Qwen3.5 27B and 35A3B models, making the technique accessible to the broader open-source community.
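If the circuits hypothesis is right, there should be a measurable sweet spot in the length of the duplicated block. The post does not include the researcher's sweep code, so the following is only a hypothetical, self-contained probe: it re-applies the same splice-a-copied-block idea at several block lengths and compares perplexity on a throwaway text, using a small stand-in checkpoint so the loop runs quickly. The real leaderboard comparison would instead re-run the ARC/HellaSwag/MMLU-style benchmarks on each variant.

```python
# Hypothetical probe of the "Goldilocks zone": duplicate middle blocks of
# varying length and compare perplexity on a small held-out text. This is NOT
# the author's evaluation; it is only a cheap local sanity check, written
# against a small stand-in checkpoint rather than Qwen2-72B.
import copy
import math

import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen2-0.5B"  # small stand-in; the blog post worked with Qwen2-72B
SAMPLE = "The quick brown fox jumps over the lazy dog. " * 50  # illustrative text only

tokenizer = AutoTokenizer.from_pretrained(BASE)


def build_variant(block_len: int):
    """Reload the base model and duplicate a block of `block_len` layers mid-stack."""
    model = AutoModelForCausalLM.from_pretrained(BASE)
    if block_len > 0:
        layers = model.model.layers
        start = len(layers) // 2 - block_len // 2
        end = start + block_len
        copies = [copy.deepcopy(layers[i]) for i in range(start, end)]
        model.model.layers = nn.ModuleList(list(layers[:end]) + copies + list(layers[end:]))
        model.config.num_hidden_layers = len(model.model.layers)
        for idx, layer in enumerate(model.model.layers):
            layer.self_attn.layer_idx = idx  # keep KV-cache indexing consistent
    return model.eval()


@torch.no_grad()
def perplexity(model) -> float:
    ids = tokenizer(SAMPLE, return_tensors="pt").input_ids
    return math.exp(model(ids, labels=ids).loss.item())


for block_len in (0, 1, 4, 7, 12, 20):
    ppl = perplexity(build_variant(block_len))
    print(f"duplicated block of {block_len:2d} layers -> perplexity {ppl:.2f}")
```

With a small stand-in model the absolute numbers mean little; the point is the shape of the experiment: duplicate one layer, a handful, and too many, and see where quality holds up.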
- Duplicating a specific 7-layer block in Qwen2-72B boosted performance across all Open LLM Leaderboard benchmarks, securing #1 rank.
- The breakthrough was achieved with only 2x consumer RTX 4090 GPUs, demonstrating accessible pathways to AI research.
- The finding suggests pre-training creates discrete functional "circuits" in the layer stack that only work when preserved whole.
Why It Matters
This democratizes high-impact AI research, showing that novel architectural insights, not just compute scale, can drive major performance gains.