Ambitious Mech Interp w/ Tensor-transformers on toy languages [Project Proposal]
Tensor-transformers make compositionality clear as day from weights alone, no data required.
A new research proposal from Pivotal (application deadline May 3rd) aims to crack open the black box of large language models by training tensor-transformers on toy languages built from known computational primitives. The key insight: while current mechanistic interpretability relies on post-hoc analysis of real LLMs to find patterns like induction heads and skip-trigrams, this project flips the script by constructing a controlled data-generating process in which those primitives are explicitly embedded. Because tensor-transformers expose direct relationships between model components through their weight structure (unlike standard neural networks, which can only be probed by running data through them), researchers can study fundamental problems such as suppression, error correction, and compositional reuse with ground-truth verification.
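To make "a data-generating process with primitives explicitly embedded" concrete, here is a minimal sketch of what such a generator could look like. The proposal doesn't spell out its generator, so the `BIGRAMS` table, probabilities, and function names below are all hypothetical: one primitive plants fixed bigram statistics, and another plants induction-style repetition so the ground-truth circuit a model must learn is known in advance.

```python
import random

# Hypothetical toy-language generator; the primitives and probabilities
# below are illustrative assumptions, not taken from the proposal.

# Bigram primitive: each subject has fixed next-token statistics.
BIGRAMS = {
    "alice": [("sees", 0.7), ("helps", 0.2), ("finds", 0.1)],
    "bob":   [("runs", 0.6), ("sees", 0.4)],
}

def sample_bigram(subject: str) -> str:
    """Sample a verb according to the subject's planted bigram statistics."""
    verbs, probs = zip(*BIGRAMS[subject])
    return random.choices(verbs, weights=probs, k=1)[0]

def sample_sentence() -> list[str]:
    subject = random.choice(list(BIGRAMS))
    return [subject, sample_bigram(subject)]

def sample_induction_sequence(n_repeats: int = 2) -> list[str]:
    """Induction primitive: repeating a pattern guarantees that [A][B] ... [A]
    is always followed by [B], so the required circuit is known up front."""
    pattern = sample_sentence()
    return pattern * n_repeats

if __name__ == "__main__":
    random.seed(0)
    print(sample_induction_sequence())  # e.g. ['alice', 'sees', 'alice', 'sees']
```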
Early results are promising: a 2-layer attention-only tensor-transformer trained on a simple toy language shows clean bigram statistics in its embed→unembed mapping; for example, 'alice' predicts 'sees' 70% of the time, 'helps' 20%, and 'finds' 10%. The proposal outlines specific research directions: enriching the data-generating process with nested structures and long-range dependencies, studying learning-order dependencies during training (must primitive X be learned before primitive Y?), building new interpretability tools, and exploiting the unique properties of tensor networks. If successful, this closed-loop approach could eventually let LLMs automate ambitious interpretability by verifying that simple mechanistic descriptions replicate model behavior. The project is seeking mentees; applications are due May 3rd.
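To show what reading bigram statistics straight from weights might look like, here is a hedged numpy sketch. `W_E` and `W_U` are assumed names for the embedding and unembedding matrices (the random stand-ins below are not the project's trained weights); the point is that the direct-path logit table `W_E @ W_U` can be computed and softmaxed from weights alone, with no forward passes over data.

```python
import numpy as np

vocab = ["alice", "sees", "helps", "finds"]
d_model = 8
rng = np.random.default_rng(0)

# Stand-in weights for illustration; in practice these would be read
# out of the trained tensor-transformer.
W_E = rng.normal(size=(len(vocab), d_model))   # embedding:   token -> d_model
W_U = rng.normal(size=(d_model, len(vocab)))   # unembedding: d_model -> token

# Direct embed->unembed path: a (d_vocab, d_vocab) next-token logit table.
direct_path = W_E @ W_U

# Softmax each row to turn logits into next-token probabilities.
probs = np.exp(direct_path - direct_path.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# With trained weights, this row should recover the planted 70/20/10 stats.
for token, p in zip(vocab, probs[vocab.index("alice")]):
    print(f"alice -> {token}: {p:.2f}")
```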
- Toy languages built from known primitives (induction heads, skip-trigrams) allow ground-truth verification of model internals.
- Tensor-transformers expose compositionality directly in their weights, unlike standard neural networks, which must be probed by running data through them.
- Early 2-layer model already shows clear bigram statistics; project aims to scale to nested structures and long-range dependencies.
Why It Matters
Could create a self-improving cycle where LLMs automate the discovery of their own internal mechanisms.