Attractor Models beat Transformers by 46% perplexity, 20% accuracy

arXiv cs.NE May 13, 2026

⚡ A 770M-parameter model outperforms a 1.3B Transformer trained on twice as many tokens.

Deep Dive

Researchers Jacob Fein-Ashley and Paria Rashidinejad introduce Attractor Models, a new architecture that uses fixed-point solving with implicit differentiation to keep training memory constant. In language modeling, the models achieve up to 46.6% better perplexity and 19.7% higher downstream accuracy than standard Transformers. A 770M Attractor Model outperforms a 1.3B Transformer trained on twice as many tokens. For reasoning, a tiny 27M model scores 91.4% on Sudoku-Extreme and 93.1% on Maze-Hard.

Key Points

Attractor Models use fixed-point solving with implicit differentiation for constant training memory, enabling adaptive iteration depths.
In language modeling, a 770M Attractor Model outperforms a 1.3B Transformer trained on 2x the tokens; improvements reach up to 46.6% in perplexity and 19.7% in downstream accuracy.
A 27M Attractor achieves 91.4% on Sudoku-Extreme and 93.1% on Maze-Hard, surpassing GPT-4 and Claude, which fail completely.
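
To make the first key point concrete, here is a minimal sketch of fixed-point solving with implicit differentiation. It is a one-dimensional toy in the style of deep-equilibrium models, not the authors' implementation. The layer f, the parameter w, the input x, and the tolerance settings are all hypothetical choices made for illustration. The solver keeps only the current iterate, so memory does not grow with the number of iterations, and it stops as soon as the update is small, so different inputs use different depths. The gradient comes from the implicit function theorem at the fixed point z* = f(z*, x, w): dz*/dw = (df/dw) / (1 - df/dz).

```python
import math


def f(z, x, w):
    """Toy scalar 'layer' (hypothetical stand-in for a network block)."""
    return math.tanh(w * z + x)


def solve_fixed_point(x, w, tol=1e-12, max_iter=500):
    """Iterate z <- f(z, x, w) until the update falls below tol.

    Only the current iterate is stored, so memory is constant in depth.
    Returns the approximate fixed point and the number of steps used.
    """
    z = 0.0
    for step in range(1, max_iter + 1):
        z_next = f(z, x, w)
        if abs(z_next - z) < tol:
            return z_next, step
        z = z_next
    return z, max_iter


def implicit_grad_w(z_star, x, w):
    """dz*/dw from the implicit function theorem, using only z*.

    No intermediate iterates are needed: the derivatives of f are
    evaluated once, at the fixed point.
    """
    sech2 = 1.0 - math.tanh(w * z_star + x) ** 2  # derivative of tanh
    df_dz = w * sech2
    df_dw = z_star * sech2
    return df_dw / (1.0 - df_dz)


x, w = 0.1, 0.5

# Adaptive depth: a weaker contraction (w = 0.9) needs more iterations.
z_easy, steps_easy = solve_fixed_point(x, w)
z_hard, steps_hard = solve_fixed_point(x, 0.9)
print("iterations:", steps_easy, "vs", steps_hard)

# Check the implicit gradient against a central finite difference.
grad = implicit_grad_w(z_easy, x, w)
eps = 1e-5
z_plus, _ = solve_fixed_point(x, w + eps)
z_minus, _ = solve_fixed_point(x, w - eps)
fd = (z_plus - z_minus) / (2 * eps)
print("implicit vs finite-difference gap:", abs(grad - fd))
```

In a full model the scalar division becomes a linear solve against (I - df/dz), but the idea is the same: the backward pass depends on the fixed point alone, not on the path the solver took to reach it.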

Why It Matters

Attractor Models make iterative reasoning scalable and internalizable, potentially redefining AI efficiency and unlocking new frontier capabilities.
