Geometry-Aligned LLM Fine-Tuning for Sequential Narrow-Opening Planning
New AI training method combines failure-driven SFT and geometric rewards to teach robots long-horizon reasoning.
Researchers Al Jaber Mahmud and Xuan Wang have introduced a novel framework for teaching large language models (LLMs) to solve complex robotic planning problems. Their paper, "Geometry-Aligned LLM Fine-Tuning for Sequential Narrow-Opening Planning," tackles the challenge of planning rigid-body motion through multiple sequential narrow openings. This requires long-horizon geometric reasoning, as the configuration used to pass through an initial opening directly constrains the possible poses for all subsequent ones. The core innovation is a training pipeline that forces the LLM to output structured, machine-readable waypoint sequences that are both executable and coordinated across the entire sequence of obstacles.
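The paper does not spell out its output schema here, so the JSON field names below (`waypoints`, `x`, `y`, `theta`) are illustrative assumptions; this is only a minimal sketch of what parsing and validating a structured, machine-readable waypoint sequence from an LLM might look like:

```python
import json

def parse_waypoints(llm_output: str, expected_len: int):
    """Parse an LLM reply into a fixed-length list of (x, y, theta) poses.

    Raises if the reply is not valid JSON or has the wrong number of
    waypoints, so malformed outputs can be rejected before execution.
    """
    data = json.loads(llm_output)
    waypoints = data["waypoints"]
    if len(waypoints) != expected_len:
        raise ValueError(f"expected {expected_len} waypoints, got {len(waypoints)}")
    return [(float(w["x"]), float(w["y"]), float(w["theta"])) for w in waypoints]

# Hypothetical model reply for a two-opening task.
reply = '{"waypoints": [{"x": 0.0, "y": 0.0, "theta": 0.0}, {"x": 1.0, "y": 0.5, "theta": 0.7}]}'
poses = parse_waypoints(reply, expected_len=2)
```

Enforcing a fixed-length, strictly typed format like this is what makes the downstream geometric verification deterministic: every candidate output either parses into poses or is discarded.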
Their method employs a bi-level training pipeline with two stages. First, they perform failure-driven supervised fine-tuning (SFT) using Low-Rank Adaptation (LoRA) on human demonstration data. This stage explicitly teaches the model common failure modes by incorporating structured feedback, ensuring it learns the correct output format. Second, they refine the same LoRA adapters using Group Relative Policy Optimization (GRPO), a reinforcement learning technique. Here, the model's proposed waypoint sequences are densified by a model-based planner and then scored with a deterministic, geometry-derived reward function that verifies continuous-motion feasibility.
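The paper's reward verifies continuous-motion feasibility over densified trajectories; as a toy stand-in, the sketch below checks only the pose at each opening for a 2D rod crossing a sequence of wall slits. The geometry (footprint `L * |sin(theta)|` against slit width) and all parameter names are illustrative assumptions, not the authors' actual reward:

```python
import math

def geometric_reward(thetas, widths, rod_length):
    """Deterministic, geometry-derived reward: fraction of openings cleared.

    A rigid rod of length `rod_length` crosses slit i tilted by `thetas[i]`
    radians from the slit normal; its footprint in the wall plane is
    rod_length * |sin(theta)|, which must fit within the slit width.
    """
    feasible = sum(
        rod_length * abs(math.sin(t)) <= w
        for t, w in zip(thetas, widths)
    )
    return feasible / len(widths)

# Nearly normal to both slits: both openings cleared, full reward.
r_good = geometric_reward(thetas=[0.1, 0.2], widths=[0.3, 0.3], rod_length=1.0)
# Heavily tilted at the first slit: only the second opening cleared.
r_bad = geometric_reward(thetas=[1.0, 0.2], widths=[0.3, 0.3], rod_length=1.0)
```

Because the reward is computed from geometry rather than a learned critic, it gives GRPO a verification signal that cannot be gamed by fluent but infeasible outputs.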
The results demonstrate the framework's effectiveness. In simulations, their geometry-aligned LLM achieved the highest success rates in both in-distribution and out-of-distribution test environments. Qualitatively, the model exhibits advanced reasoning by strategically selecting exit poses from one opening that naturally facilitate entry into the next, showcasing true long-horizon planning. This work represents a significant step toward bridging the gap between high-level language model reasoning and low-level, geometrically valid robotic control.
- Uses a bi-level pipeline: failure-driven LoRA SFT followed by GRPO refinement with geometric verification.
- Generates fixed-length, machine-readable waypoint sequences for coordinated motion through sequential openings.
- Achieved highest simulation success rates by teaching the LLM to select poses that facilitate subsequent maneuvers.
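The group-relative scoring at the heart of the GRPO refinement step can be sketched in a few lines: sample a group of candidate waypoint sequences per prompt, score each with the deterministic reward, and normalize within the group so the policy update favors above-average candidates. The reward values here are illustrative, not from the paper:

```python
def group_relative_advantages(rewards):
    """Normalize a group of rewards to zero mean and unit variance.

    GRPO uses these within-group advantages in place of a learned value
    baseline: candidates scoring above the group mean get positive
    advantage, those below get negative.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Four sampled waypoint sequences for one prompt, scored geometrically.
adv = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Normalizing within a group of samples from the same prompt removes the need for a separate critic network, which is one reason GRPO pairs well with a cheap, deterministic reward.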
Why It Matters
Enables more autonomous and reliable robots for logistics, search & rescue, and maintenance in cluttered, constrained environments.