Research & Papers

Attractor Models beat Transformers by up to 46.6% in perplexity and 19.7% in downstream accuracy

A 770M-parameter model outperforms a 1.3B-parameter Transformer trained on twice as many tokens.

Deep Dive

Researchers Jacob Fein-Ashley and Paria Rashidinejad introduce Attractor Models, a new architecture that computes its output by solving for a fixed point and trains with implicit differentiation, which keeps training memory constant. In language modeling, they achieve up to 46.6% better perplexity and 19.7% higher downstream accuracy than standard Transformers. A 770M Attractor Model outperforms a 1.3B Transformer that was trained on twice as many tokens. For reasoning, a tiny 27M model scores 91.4% on Sudoku-Extreme and 93.1% on Maze-Hard.

Key Points
  • Attractor Models use fixed-point solving with implicit differentiation for constant training memory, enabling adaptive iteration depths.
  • In language modeling, a 770M Attractor Model outperforms a 1.3B Transformer trained on 2x the tokens; perplexity improves by up to 46.6% and downstream accuracy by 19.7%.
  • A 27M Attractor achieves 91.4% on Sudoku-Extreme and 93.1% on Maze-Hard, surpassing GPT-4 and Claude, which fail completely.
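
To make the first key point concrete, here is a minimal sketch of fixed-point solving with implicit differentiation on a toy one-dimensional problem. It is not the authors' implementation: the update rule `f`, the function names, and the parameter values are all assumptions chosen for illustration. The point it shows is that the gradient is computed from the converged state alone, so memory does not grow with the number of solver iterations, and the solver is free to run as many or as few steps as each input needs.

```python
import math


def f(z, theta, x):
    """Hypothetical toy update rule: z <- tanh(theta * z + x)."""
    return math.tanh(theta * z + x)


def solve_fixed_point(theta, x, tol=1e-12, max_iter=1000):
    """Iterate z <- f(z, theta, x) until successive iterates differ by less than tol.

    Returns (z_star, n_iter). Only the latest iterate is kept, never the
    whole trajectory. Convergence is assumed, which holds here for |theta| < 1.
    """
    z = 0.0
    for n in range(1, max_iter + 1):
        z_next = f(z, theta, x)
        if abs(z_next - z) < tol:
            return z_next, n
        z = z_next
    return z, max_iter


def implicit_grad_theta(z_star, theta):
    """dz*/dtheta via the implicit function theorem.

    Differentiating z* = f(z*, theta, x) with respect to theta gives
    dz*/dtheta = (df/dtheta) / (1 - df/dz), evaluated at z*.
    """
    # At the fixed point z* = tanh(u), so tanh'(u) = 1 - z*^2.
    slope = 1.0 - z_star ** 2
    df_dz = slope * theta
    df_dtheta = slope * z_star
    return df_dtheta / (1.0 - df_dz)


if __name__ == "__main__":
    theta = 0.5
    for x in (0.01, 0.3, 2.0):
        z_star, n_iter = solve_fixed_point(theta, x)
        grad = implicit_grad_theta(z_star, theta)
        print(f"x={x}: z*={z_star:.6f}, iterations={n_iter}, dz*/dtheta={grad:.6f}")
```

In a real model the scalar division becomes a linear solve against the Jacobian of the update at the fixed point, but the structure is the same: a forward solve to convergence, then one backward solve that never touches the intermediate iterates.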

Why It Matters

Attractor Models make iterative reasoning scalable and internalizable, potentially redefining AI efficiency and unlocking new frontier capabilities.