
Characterizing AlphaEarth Embedding Geometry for Agentic Environmental Reasoning

A new agentic system uses Google's AlphaEarth embeddings and nine specialized tools to answer complex environmental questions.

Deep Dive

A team of researchers has published a detailed analysis of the geometric structure of Google's AlphaEarth foundation model, which produces dense vector embeddings from Earth observation data. Their analysis of 12.1 million samples from the continental United States (2017-2023) shows that AlphaEarth's 64-dimensional embeddings lie on a strongly non-Euclidean manifold. The effective dimensionality is just 13.3, meaning the embeddings behave as if they had roughly 13 degrees of freedom rather than 64. The researchers also found that local tangent spaces rotate substantially across the manifold, with 84% of locations exceeding 60 degrees of rotation. Because the local linear structure changes so much from place to place, global vector arithmetic (adding or subtracting embeddings as if the space were flat) is unreliable for environmental reasoning.
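The article does not say exactly how effective dimensionality or tangent-space rotation were measured. The sketch below shows one common way to compute each, under that assumption: effective dimensionality as the participation ratio of the covariance eigenvalues, and tangent rotation as the largest principal angle between local-PCA tangent bases at two points. The function names, the neighborhood size `k`, the tangent dimension `m`, and the toy circle data are illustrative choices, not the authors' method or settings.

```python
import numpy as np


def participation_ratio(X: np.ndarray) -> float:
    """Effective dimensionality as (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues)."""
    Xc = X - X.mean(axis=0)                              # center each dimension
    cov = Xc.T @ Xc / (len(X) - 1)                       # d x d sample covariance
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)    # clip tiny negative round-off
    return float(eig.sum() ** 2 / (eig ** 2).sum())


def local_tangent_basis(X: np.ndarray, center: int, k: int, m: int) -> np.ndarray:
    """Orthonormal d x m basis from local PCA on the k nearest neighbors of X[center]."""
    d2 = ((X - X[center]) ** 2).sum(axis=1)             # squared Euclidean distances
    nbrs = X[np.argsort(d2)[:k]]                         # k nearest points (includes center)
    nbrs = nbrs - nbrs.mean(axis=0)
    _, _, vt = np.linalg.svd(nbrs, full_matrices=False)  # rows of vt are principal directions
    return vt[:m].T


def max_principal_angle_deg(A: np.ndarray, B: np.ndarray) -> float:
    """Largest principal angle (degrees) between the subspaces spanned by orthonormal A and B."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)         # singular values = cosines of the angles
    return float(np.degrees(np.arccos(np.clip(s.min(), -1.0, 1.0))))


# Toy check: a circle embedded isometrically in 64 dimensions (hypothetical data, not AlphaEarth).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(64, 2)))            # 64 x 2 orthonormal columns
theta = np.linspace(0.0, 2 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)]) @ Q.T

pr = participation_ratio(X)                              # about 2 for a uniform circle
T0 = local_tangent_basis(X, center=0, k=5, m=1)          # tangent at angle 0
T1 = local_tangent_basis(X, center=100, k=5, m=1)        # tangent at angle pi/2
rotation = max_principal_angle_deg(T0, T1)               # about 90 degrees
```

On the toy circle the participation ratio is about 2 and the tangents at angles 0 and π/2 differ by about 90 degrees. On real embeddings, one could sample many location pairs and report the fraction whose rotation exceeds a threshold such as 60 degrees.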

Building on this characterization, the team developed an agentic system with nine specialized tools that decomposes complex environmental queries into reasoning chains. The system operates over a FAISS-indexed database of embeddings, grounding its answers in retrieved embeddings rather than in the language model's parametric knowledge alone. In a five-condition ablation study with 120 queries across three complexity tiers, embedding retrieval was the dominant factor in response quality: it scored 3.79±0.90 versus 3.03±0.77 for parametric-only answers on a 1-5 scale. The system performed best on multi-step comparisons, scoring 4.28±0.43.
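The retrieval step can be pictured as nearest-neighbor search over normalized embeddings. The sketch below is a dependency-free stand-in for a flat FAISS inner-product index (in FAISS itself, `faiss.normalize_L2` followed by `IndexFlatIP` gives exact cosine search), plus a minimal loop that runs a decomposed query as a sequence of tool calls. The `EmbeddingIndex` class, the `find_similar` tool, the plan format, and the toy vectors are all hypothetical; the article does not describe the nine tools' interfaces or which FAISS index type the authors used.

```python
import numpy as np


class EmbeddingIndex:
    """Exact cosine-similarity search: inner product over L2-normalized vectors."""

    def __init__(self, vectors, ids):
        v = np.asarray(vectors, dtype=np.float64)
        self.vectors = v / np.linalg.norm(v, axis=1, keepdims=True)
        self.ids = list(ids)

    def search(self, query, k=5):
        q = np.asarray(query, dtype=np.float64)
        q = q / np.linalg.norm(q)
        sims = self.vectors @ q                  # cosine similarity to every stored vector
        top = np.argsort(-sims)[:k]              # indices of the k most similar
        return [(self.ids[i], float(sims[i])) for i in top]


def run_plan(plan, tools):
    """Execute a decomposed query: each step is (tool_name, kwargs); results are kept in order."""
    results = []
    for tool_name, kwargs in plan:
        results.append(tools[tool_name](**kwargs))
    return results


# Hypothetical toy database: three "locations" with 2-D vectors instead of 64-D embeddings.
index = EmbeddingIndex([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], ids=["site_a", "site_b", "site_c"])
tools = {"find_similar": lambda query, k: index.search(query, k)}  # hypothetical tool
plan = [("find_similar", {"query": [1.0, 0.0], "k": 2}),
        ("find_similar", {"query": [0.0, 1.0], "k": 1})]
results = run_plan(plan, tools)
```

The brute-force matrix product keeps the example exact and small; at the scale of millions of embeddings, a FAISS index performs the same search far more efficiently.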

A cross-model benchmark showed that the geometric tools had different effects depending on the underlying LLM. Anthropic's Sonnet 4.5 scored 0.12 points lower with the tools, while Opus 4.6 improved by 0.07 points and achieved markedly higher geometric grounding (3.38 vs. 2.64). This suggests that the value of geometric characterization grows with the reasoning capability of the consuming model: more capable models such as Opus 4.6 appear better able to exploit the structured geometric information.

Key Points
  • AlphaEarth's 64D embeddings have effective dimensionality of just 13.3, with 84% of locations showing >60° tangent space rotation
  • Retrieval-based agent system scored 3.79/5 vs. 3.03/5 for parametric-only methods, excelling at multi-step comparisons (4.28±0.43)
  • Geometric tools improved Opus 4.6's performance by 0.07 points but reduced Sonnet 4.5's by 0.12, showing model-dependent benefits

Why It Matters

This research provides a blueprint for building more reliable environmental AI agents that can answer complex questions about climate, agriculture, and land use.