Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models
This new architecture could make AI models smarter without making them bigger.
Researchers propose RecursiveVLM, a new Transformer architecture for Large Multimodal Models (LMMs) that reuses parameters through recursive refinement to extract stronger representations without increasing model size. Key innovations include a Recursive Connector for feature alignment and a Monotonic Recursion Loss. Experiments show consistent gains of +3% over standard Transformers and +7% over vanilla recursive baselines, and the recursive design enables on-demand refinement for efficient, deployment-adaptive AI systems.
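To make the core idea concrete, here is a minimal sketch of parameter reuse through recursive refinement. It is not the authors' implementation: the class and function names (RecursiveBlock, RecursiveConnector, monotonic_recursion_loss) and the specific form of the connector and loss are assumptions for illustration, loosely following the components named above.

```python
# Hypothetical sketch of recursive refinement with shared weights (PyTorch).
# Names and design details are assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecursiveConnector(nn.Module):
    """Aligns the previous pass's output with the block's input space
    before feeding it back in (assumed form of the 'Recursive Connector')."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, h_prev, x_init):
        # Mix the refined features with the original input, then re-normalize.
        return self.norm(x_init + self.proj(h_prev))


class RecursiveBlock(nn.Module):
    """One shared Transformer encoder layer reused at every recursion step,
    so effective depth grows without adding parameters."""
    def __init__(self, dim, num_heads=8, num_recursions=3):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.connector = RecursiveConnector(dim)
        self.num_recursions = num_recursions

    def forward(self, x):
        outputs, h = [], x
        for _ in range(self.num_recursions):
            h = self.layer(self.connector(h, x))  # same weights every pass
            outputs.append(h)
        return outputs  # one representation per recursion step


def monotonic_recursion_loss(step_losses, margin=0.0):
    """Assumed monotonicity penalty: each recursion step's task loss
    should not exceed the previous step's."""
    penalty = 0.0
    for prev, curr in zip(step_losses[:-1], step_losses[1:]):
        penalty = penalty + F.relu(curr - prev + margin)
    return penalty


# Usage sketch: a per-step task loss plus the monotonicity penalty.
if __name__ == "__main__":
    dim, num_classes = 256, 10
    block, head = RecursiveBlock(dim), nn.Linear(dim, num_classes)
    x = torch.randn(4, 16, dim)                  # (batch, tokens, dim)
    labels = torch.randint(0, num_classes, (4,))
    step_losses = [F.cross_entropy(head(h.mean(dim=1)), labels)
                   for h in block(x)]
    total = step_losses[-1] + 0.1 * monotonic_recursion_loss(step_losses)
    total.backward()
```

In a sketch like this, on-demand refinement would amount to choosing how many recursion steps to run at inference time, trading compute for representation quality without changing the weights.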
Why It Matters
It enables more powerful AI on resource-constrained devices, potentially lowering compute costs and energy use.