Attractor Models beat Transformers by up to 46.6% on perplexity and 19.7% on accuracy
A 770M-parameter model outperforms a 1.3B-parameter Transformer trained on twice as many tokens.
Deep Dive
Researchers Jacob Fein-Ashley and Paria Rashidinejad introduce Attractor Models, a new architecture that computes its output by solving for a fixed point and trains with implicit differentiation, which keeps training memory constant. In language modeling, Attractor Models improve perplexity by up to 46.6% and downstream accuracy by up to 19.7% over standard Transformers. A 770M Attractor Model outperforms a 1.3B Transformer trained on twice as many tokens. For reasoning, a tiny 27M model scores 91.4% on Sudoku-Extreme.
Key Points
- Attractor Models use fixed-point solving with implicit differentiation for constant training memory, enabling adaptive iteration depths.
- In language modeling, a 770M Attractor Model outperforms a 1.3B Transformer trained on 2x the tokens; perplexity improves by up to 46.6% and downstream accuracy by up to 19.7%.
- A 27M Attractor Model achieves 91.4% on Sudoku-Extreme and 93.1% on Maze-Hard, tasks on which GPT-4 and Claude fail completely.
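
The fixed-point and implicit-differentiation idea in the first point can be sketched on a toy scalar problem. This is a minimal illustration, not the authors' architecture or code: the layer `f`, the helpers `solve_fixed_point` and `implicit_grad_w`, and the values of `x` and `w` are all hypothetical, chosen only so the iteration is a contraction and converges.

```python
import math


def f(z, x, w):
    """One application of a toy layer (an assumption): z -> tanh(w*z + x)."""
    return math.tanh(w * z + x)


def solve_fixed_point(x, w, tol=1e-12, max_iters=1000):
    """Iterate z <- f(z, x, w) until the update is smaller than tol.

    Returns the approximate fixed point and the number of iterations used.
    The iteration count depends on the input and tolerance (adaptive depth).
    """
    z = 0.0
    for k in range(1, max_iters + 1):
        z_next = f(z, x, w)
        if abs(z_next - z) < tol:
            return z_next, k
        z = z_next
    return z, max_iters


def implicit_grad_w(z_star, x, w):
    """dz*/dw via the implicit function theorem.

    At the fixed point z* = f(z*, x, w), differentiating both sides gives
    dz*/dw = (df/dw) / (1 - df/dz), evaluated at z*.
    Only z* is needed; no intermediate iterates are stored.
    """
    s = 1.0 - math.tanh(w * z_star + x) ** 2  # tanh'(w*z* + x)
    df_dz = s * w
    df_dw = s * z_star
    return df_dw / (1.0 - df_dz)


# Hypothetical input and weight; |df/dz| <= |w| = 0.5 < 1, so iteration converges.
x, w = 0.3, 0.5
z_star, iters = solve_fixed_point(x, w)
grad = implicit_grad_w(z_star, x, w)

# Sanity check: central finite difference through the solver itself.
eps = 1e-6
z_plus, _ = solve_fixed_point(x, w + eps)
z_minus, _ = solve_fixed_point(x, w - eps)
fd_grad = (z_plus - z_minus) / (2 * eps)

print(f"z* = {z_star:.6f} after {iters} iterations")
print(f"implicit grad = {grad:.6f}, finite-difference grad = {fd_grad:.6f}")
```

Because the gradient is computed from `z_star` alone, none of the intermediate iterates need to be kept for the backward pass, so memory does not grow with the number of solver steps. The iteration count `iters` is set by the convergence tolerance rather than fixed in advance.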
Why It Matters
Attractor Models make iterative reasoning scalable and internalizable, potentially redefining AI efficiency and unlocking new frontier capabilities.