Research & Papers

Toy experiment: frozen Pythia-70M can use a forward-derived fast memory for contextual one-shot symbolic recall [D]

A tiny frozen model uses output-embedding geometry for context-dependent binding without weight updates.

Deep Dive

This experiment tests whether a frozen open-weight transformer (Pythia-70M) can support temporary in-context memory using only forward-pass geometry. Instead of backpropagation, the method computes a memory value as an output-embedding correction: E[target] minus the expected token embedding under the model's next-token distribution, i.e. E[target] - Σ_t p(t)·E[t]. This correction vector is stored and later retrieved via cosine similarity during generation, letting the model bind invented words to specific meanings without altering any weights.
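To make the write step concrete, here is a minimal Python sketch assuming PyTorch and Hugging Face transformers. The prompt strings, the helper name write_memory, and the single-token target assumption are illustrative, not taken from the experiment's code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m").eval()

E = model.get_output_embeddings().weight  # output (unembedding) matrix, [vocab, dim]

@torch.no_grad()
def write_memory(prompt: str, target: str) -> torch.Tensor:
    """Forward-derived memory value: E[target] - sum_t p(t) * E[t]."""
    ids = tok(prompt, return_tensors="pt").input_ids
    p = torch.softmax(model(ids).logits[0, -1], dim=-1)  # next-token distribution
    expected = p @ E                                      # expected token embedding
    t_id = tok(target, add_special_tokens=False).input_ids[0]  # assumes a 1-token target
    return E[t_id] - expected  # correction vector; no gradients, no weight updates

# Bind an invented word to conflicting meanings in two contexts (illustrative):
mem_game_a = write_memory("In game A, the blicket is", " red")
mem_game_b = write_memory("In game B, the blicket is", " blue")
```

The whole write is a single forward pass: the only "learning" is storing the difference between the target's output embedding and what the frozen model already expects.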

Results show that with a single shared memory and top-1 retrieval (the both_top1 mode), the frozen model achieved 80.5% exact match on same-context recall, on par with an explicit context gate (80.1%). The system also partially generalized to new context labels (60.2% for game C/D) but became fragile with stylistically different contexts (34% for lab north/south). The key finding is that the learned retrieval geometry can keep two conflicting meanings separated by context, hinting at a scalable approach to in-context learning without gradient updates.
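A minimal sketch of the read side, under stated assumptions: how the keys are formed and how the retrieved value is applied to the logits are illustrative choices here, and the both_top1 gating is not reproduced.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_top1(query: torch.Tensor, keys: torch.Tensor, values: torch.Tensor):
    """Top-1 lookup: return the stored value whose key is most cosine-similar
    to the current query (e.g., the hidden state at the recall position)."""
    sims = F.cosine_similarity(query.unsqueeze(0), keys, dim=-1)  # [n_memories]
    best = int(sims.argmax())
    return values[best], float(sims[best])

# Illustrative use, with keys stacked from write-time hidden states and values
# stacked from the correction vectors computed above:
#   keys = torch.stack([hid_game_a, hid_game_b])
#   values = torch.stack([mem_game_a, mem_game_b])
#   value, score = retrieve_top1(hid_now, keys, values)
#   logits = logits + E @ value  # assumed application: bias logits toward the bound target
```

Because each memory is keyed by the context in which it was written, two conflicting bindings for the same word can coexist, and cosine similarity selects between them at generation time.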

Key Points
  • Uses frozen Pythia-70M with no weight updates; memory vectors computed from output embedding geometry (E[target] - expected embedding).
  • Achieves 80.5% exact match on one-shot symbolic recall with conflicting meanings (e.g., 'blicket' = red in game A, blue in game B).
  • Context generalization drops to 60.2% for new game labels and 34% for dissimilar contexts, showing fragility.

Why It Matters

Demonstrates a lightweight, backprop-free memory mechanism that could enable scalable in-context learning for large frozen models.