RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language'
Repeating middle transformer layers boosts performance, and analysis of those same layers reveals cross-lingual similarities in model 'thought'.
Independent researcher dnhkng has published findings from the 'RYS-II' project, providing intriguing evidence that large language models like the 27-billion-parameter Qwen3.5 may develop a cross-lingual 'universal language' in their internal representations. The key discovery is that in the middle layers of the transformer architecture, the model's latent representations (its internal 'thoughts') of the same concept expressed in Chinese and in English are more similar to each other than they are to representations of different concepts expressed in the same language. This suggests the model builds an abstract, language-agnostic understanding before generating language-specific output.
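A minimal sketch of what such a probe might look like with Hugging Face transformers. The checkpoint, example sentences, mean-pooling, and layer choice below are all illustrative assumptions, not the project's actual methodology:

```python
# Hedged sketch: compare middle-layer hidden states across languages.
# Model id, sentences, pooling, and layer index are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # stand-in checkpoint, not the 27B model

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto")
model.eval()

def middle_state(text: str, layer: int) -> torch.Tensor:
    """Mean-pool the hidden states of one transformer layer for a sentence."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[0] is the token embeddings; index `layer` is that decoder block
    return out.hidden_states[layer].mean(dim=1).squeeze(0).float()

mid = model.config.num_hidden_layers // 2
dog_en = middle_state("The dog is running in the park.", mid)
dog_zh = middle_state("狗在公园里跑。", mid)               # same concept, Chinese
moon_en = middle_state("The moon orbits the Earth.", mid)  # different concept, English

cos = torch.nn.functional.cosine_similarity
print("same concept, cross-lingual: ", cos(dog_en, dog_zh, dim=0).item())
print("different concept, same lang:", cos(dog_en, moon_en, dim=0).item())
```

If the reported finding holds, the first similarity should exceed the second at middle layers, but not necessarily at the earliest or final layers, where representations are more surface-level and language-specific.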
Building on this insight, the researcher found that strategically repeating blocks of layers in the middle of the transformer stack yields significant performance gains, and that the technique works best when combined with fine-tuning. To demonstrate, dnhkng has released four new models on Hugging Face under the 'RYS-Qwen3.5-27B-FP8' name (S, M, L, XL variants), which use 8-bit floating-point precision for efficiency. The project is now collaborating with developer TurboDerp on a new model format in which duplicated layers are stored once and referenced multiple times, so they consume no extra VRAM during inference, potentially making the architecture practical for widespread use.
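As a rough illustration of the repetition idea (not the actual RYS recipe; the checkpoint and layer range below are assumed), one can splice extra references to existing decoder blocks into a Hugging Face model. Because the repeated entries point at the same module objects, the weights exist only once in memory, which is the spirit of the shared-weight format described above:

```python
# Hedged sketch: repeat a middle block of decoder layers by reference.
# Checkpoint and the (start, end) range are illustrative assumptions.
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # stand-in; the RYS work targets Qwen3.5 27B
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto")

layers = model.model.layers  # decoder blocks of a Qwen-style causal LM
start, end = 12, 20          # middle range to repeat (assumed, needs tuning)

# Splice the repeated range back in; the same nn.Module objects appear twice,
# so no parameter tensors are duplicated in memory.
model.model.layers = nn.ModuleList(
    list(layers[:end]) + list(layers[start:end]) + list(layers[end:])
)
model.config.num_hidden_layers = len(model.model.layers)

# Sanity check; use_cache=False because the shared blocks also share their
# internal layer indices, which would otherwise collide in the KV cache.
tok = AutoTokenizer.from_pretrained(MODEL)
ids = tok("The capital of France is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=8, use_cache=False)[0]))
```

Per the write-up, fine-tuning the expanded stack is what turns the splice into a real gain; the raw duplication alone may degrade output quality.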
- Evidence of a 'universal language': Middle-layer representations of the same concept in Chinese and English are more similar than representations of different concepts in the same language.
- Architectural innovation: Repeating transformer blocks in the middle of the stack, combined with fine-tuning, boosts performance for the 27B-parameter Qwen3.5 model.
- Public release: Four new FP8-precision models (RYS-Qwen3.5-27B-FP8-S/M/L/XL) are available on Hugging Face for experimentation (see the loading sketch after this list).
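For anyone wanting to try the released checkpoints, loading should follow the standard transformers path. Note that the repository id below is a guess based on the announced naming and may differ on Hugging Face, and the FP8 weights may require a recent transformers build with a compatible quantization backend:

```python
# Hypothetical loading sketch; the repo id is guessed from the announced
# naming scheme and is not confirmed by the release notes.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "dnhkng/RYS-Qwen3.5-27B-FP8-L"  # assumed id, one of the S/M/L/XL variants
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")

ids = tok("Hello", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**ids, max_new_tokens=16)[0]))
```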
Why It Matters
This work provides a novel architectural tweak that could raise model capability without increasing the number of stored parameters, and it offers a tangible clue about how LLMs internally abstract meaning across languages.