Media & Culture

Visualizing Emotion Vectors in Gemma 2 2B

Open-source tool probes AI's internal state, revealing how models process user sentiment before responding.

Deep Dive

An independent researcher, MapleLeafKing, has developed an open-source interpretability tool called 'Emotion Scope' that probes and visualizes the internal workings of smaller open-weight AI models. Inspired by Anthropic's research methodology, the project specifically tested Google's recently released Gemma 2 2B model. The tool performs a 'layer sweep', probing the model layer by layer during inference to capture how its internal state, represented as emotion vectors, evolves from processing user input to formulating a response. In one visualized example, the model's state appeared to shift from interpreting a user's 'desperate' tone to a more 'hopeful' stance in its own reply, though the researcher notes that these interpretations remain vague.
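The article does not include the tool's source code, but the layer-sweep step it describes is a standard interpretability technique. A minimal sketch, assuming the Hugging Face transformers API and the public google/gemma-2-2b checkpoint (the prompt and variable names are illustrative, not taken from Emotion Scope):

```python
# Sketch of a layer sweep: run a prompt through Gemma 2 2B and collect the
# hidden-state vector at every layer for the final token. This illustrates
# the general technique, not the Emotion Scope implementation itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b"  # small open-weight model tested in the project
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Illustrative 'desperate'-toned user prompt.
prompt = "I've tried everything and nothing works. Please, can you help me?"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the embedding output plus one tensor per
# transformer layer, each shaped (batch, seq_len, hidden_dim).
# Keep the last-token vector at every layer for downstream probing.
per_layer = [h[0, -1, :].float() for h in out.hidden_states]
print(f"captured {len(per_layer)} vectors of dim {per_layer[0].shape[0]}")
```

Each captured vector can then be scored against learned emotion directions; plotting those scores across layers is what produces the kind of shift-from-'desperate'-to-'hopeful' visualization described above.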

The research was conducted in a novel, semi-agentic way using Claude Code, with the AI assistant running experiments and testing hypotheses under human supervision to ensure rigor. The resulting harness automates tasks such as data corpus generation, layer sweeps, and attaching emotion probes, and is designed to be replicable on larger open-weight models. A key finding was that more sophisticated internal structures, such as the 'dual speaker representation' identified in larger models, did not reliably emerge in the 2B-parameter Gemma 2. The project's ultimate aspiration is to democratize advanced interpretability research, helping the community understand when and how specific reasoning structures arise in AI models of varying scales.
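The probing step the harness automates is typically a linear classifier fit per layer on labeled activations; the layer where held-out accuracy peaks marks where a concept like user emotion becomes decodable. A hedged sketch of that sweep, using scikit-learn and toy stand-in data rather than the project's generated corpus:

```python
# Sketch of per-layer linear probing: fit a logistic-regression probe at
# each layer and record held-out accuracy. Data here is a synthetic
# placeholder; the real harness generates its own labeled corpus.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_sweep(acts_by_layer, labels):
    """acts_by_layer: (n_layers, n_examples, hidden_dim) activations;
    labels: (n_examples,) emotion classes, e.g. desperate vs. hopeful."""
    scores = []
    for layer_acts in acts_by_layer:
        X_tr, X_te, y_tr, y_te = train_test_split(
            layer_acts, labels, test_size=0.25, random_state=0, stratify=labels
        )
        clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))  # held-out probe accuracy
    return scores

# Toy stand-in: 4 layers, 200 examples, 64-dim activations, binary labels.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 200, 64))
labels = rng.integers(0, 2, size=200)
print(probe_sweep(acts, labels))  # one accuracy per layer; peaks locate the signal
```

On a 2B-parameter model, probes of this kind can find a usable emotion signal even when richer structures like a dual speaker representation fail to appear, which is consistent with the project's finding.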

Key Points
  • Tool visualizes 'emotion vectors' in Google's Gemma 2 2B via layer sweeps and probing.
  • Research was conducted semi-autonomously using Claude Code for agentic experimentation.
  • Aims to provide an accessible, replicable harness for interpretability work on open-weight models.

Why It Matters

Democratizes advanced AI interpretability research, allowing developers to understand and debug model reasoning processes.