Image & Video

FlatLands: Generative Floormap Completion From a Single Egocentric View

A new dataset and benchmark trains AI models to generate a complete, metric map of a room from just one egocentric image.

Deep Dive

A team of researchers has unveiled FlatLands, a significant new dataset and benchmark designed to train AI models to predict complete floorplans from a single first-person photo. The core challenge is that a typical egocentric image captures only a tiny fraction of a room's floor. FlatLands provides the data and framework to train models to perform generative completion, imagining the full, traversable layout beyond the immediate field of view. The dataset is substantial, aggregating 270,575 observations from 17,656 real-world indoor scenes sourced from six existing datasets, complete with aligned ground-truth maps for training.

The benchmark enables comparison across a range of approaches, from training-free baselines to deterministic models and stochastic generative models. The researchers also demonstrated an end-to-end pipeline that takes a monocular RGB image and outputs a predicted floormap. This work directly addresses a critical need in robotics and augmented reality, where a complete understanding of navigable space is essential. By providing a standardized, large-scale testbed, FlatLands aims to accelerate progress in uncertainty-aware indoor mapping, a foundational capability for future embodied AI agents and navigation systems.
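To make the pipeline concrete, here is a minimal sketch of the first, deterministic stage such a system might use: projecting a monocular depth estimate onto the floor plane and rasterising the visible floor into a bird's-eye-view grid. Everything here is an illustrative assumption, not the FlatLands implementation: the function name, camera intrinsics, grid resolution, and the synthetic flat-floor depth map are all hypothetical. A generative model would then complete the grid beyond this observed region.

```python
import numpy as np

def depth_to_bev_floor(depth, fx, fy, cx, cy,
                       grid_size=64, cell_m=0.1,
                       cam_height=1.5, band_m=0.1):
    """Rasterise floor points from a depth map into a BEV occupancy grid.

    Hypothetical sketch. Coordinates: +x right, +y down, +z forward;
    the floor plane lies cam_height metres below the optical centre.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx                        # back-project to camera frame
    y = (v - cy) * z / fy
    on_floor = np.abs(y - cam_height) < band_m   # keep points near the floor plane
    gx = np.round(x / cell_m).astype(int) + grid_size // 2
    gz = np.round(z / cell_m).astype(int)
    ok = on_floor & (gx >= 0) & (gx < grid_size) & (gz >= 0) & (gz < grid_size)
    grid = np.zeros((grid_size, grid_size), dtype=np.uint8)
    grid[gz[ok], gx[ok]] = 1                     # mark observed-free floor cells
    return grid

# Toy usage: a synthetic depth map of an empty flat floor (assumed intrinsics).
fx = fy = 50.0
cx, cy = 32.0, 24.0
h, w = 48, 64
v = np.arange(h, dtype=float)[:, None].repeat(w, axis=1)
depth = np.where(v > cy, 1.5 * fy / np.maximum(v - cy, 1e-6), 0.0)
partial_map = depth_to_bev_floor(depth, fx, fy, cx, cy)
```

The resulting `partial_map` covers only the camera's field of view; the generative completion the dataset targets would fill in the remaining, unobserved cells.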

Key Points
  • Generates complete Bird's-Eye View floor maps from a single first-person (egocentric) image.
  • Built on a massive dataset of 270,575 observations from 17,656 real indoor scenes.
  • Establishes a benchmark for comparing generative AI models focused on uncertainty-aware mapping for robotics.

Why It Matters

This technology is a key step towards enabling robots and AR systems to reliably navigate and understand complex, unseen indoor environments.