MALLVi: a multi-agent framework for integrated generalized robotics manipulation
Researchers' new system coordinates specialized AI agents to give robots adaptive, feedback-driven control.
Researchers from multiple universities developed MALLVi, a Multi-Agent Large Language and Vision framework for robotics manipulation. It coordinates four specialized agents (Decomposer, Localizer, Thinker, Reflector) to handle perception, reasoning, and planning. Given a language instruction and an environment image, it generates executable actions, then uses a Vision Language Model (VLM) for closed-loop feedback and error recovery. Tests show it improves generalization and success rates on zero-shot manipulation tasks compared to open-loop methods.
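The decompose-localize-act-reflect loop described above can be sketched as a simple closed-loop pipeline. This is a minimal illustrative sketch, not the paper's actual implementation: every class, function, and agent interface here is an assumption, with the four agents stubbed out and the VLM reflector reduced to a success check with retries.

```python
# Hypothetical sketch of a closed-loop multi-agent manipulation pipeline
# in the spirit of MALLVi. All names and interfaces are illustrative
# assumptions, not the paper's API.
from dataclasses import dataclass

@dataclass
class Observation:
    image: str          # placeholder for an environment image
    instruction: str    # natural-language task description

def decomposer(instruction):
    """Split a high-level instruction into ordered subtasks (stubbed)."""
    return [s.strip() for s in instruction.split("then")]

def localizer(image, subtask):
    """Ground a subtask to a target region in the image (stubbed)."""
    return {"subtask": subtask, "target": f"region-for:{subtask}"}

def thinker(grounding):
    """Turn a grounded subtask into an executable action (stubbed)."""
    return f"execute({grounding['target']})"

def reflector(action, succeeded):
    """VLM-style feedback: accept the outcome or request a retry."""
    return "ok" if succeeded else "retry"

def run_pipeline(obs, executor, max_retries=2):
    """Closed loop: plan each subtask, act, verify, retry on failure."""
    log = []
    for subtask in decomposer(obs.instruction):
        for _ in range(max_retries + 1):
            action = thinker(localizer(obs.image, subtask))
            verdict = reflector(action, executor(action))
            log.append((subtask, action, verdict))
            if verdict == "ok":
                break
    return log

obs = Observation(image="frame0.png",
                  instruction="pick up the cube then place it in the bin")
trace = run_pipeline(obs, executor=lambda action: True)
```

The key contrast with open-loop planning is the inner retry loop: after each action the reflector inspects the outcome, so a failed grasp triggers replanning for that subtask instead of silently corrupting the rest of the plan.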
Why It Matters
Enables more reliable, adaptive robots that can handle dynamic environments without extensive retraining for each new task.