AI Safety

Substrate: Formalism

A 4-tuple framework reveals hidden risks in AI deployment layers.

Deep Dive

In a new LessWrong post, researchers Vardhan and mfatt from the AI Safety Camp project MoSSAIC (part of Groundless) present a formal framework for substrates: the computational layers that sit between model architecture and hardware and that influence AI behavior in ways standard safety toolkits miss. The third post in the sequence, "Substrate: Formalism," builds on earlier work showing how choices such as LayerNorm placement, quantization format, and memory layout affect refusal behavior, robustness, and jailbreak susceptibility.

To address the lack of clean reasoning tools, the authors propose defining a substrate as a 4-tuple: a language (the syntactic expressions the substrate accepts), a semantics map (assigning abstract behaviors to those expressions), a resource profile (time, memory, energy, and precision budgets), and an observable interface (what external monitors can see). Using a banking analogy, in which transferring €500 via web, phone, or cheque involves different languages, processes, and observables, they illustrate how evaluators see only interface-dependent slices of behavior, leaving gaps that adversaries can exploit. The formalism aims to help safety researchers name and design around these gaps, enabling better comparisons of model deployments across different hardware and software stacks.
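
To make the four components concrete, here is a minimal Python sketch of the substrate tuple built around the post's banking analogy. All names and values (Substrate, web, cheque, the log fields) are illustrative assumptions for this newsletter, not notation or code from the post itself.

    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass(frozen=True)
    class Substrate:
        """Illustrative 4-tuple: language, semantics map, resource profile, observables."""
        language: Callable[[str], bool]      # L: which expressions the substrate accepts
        semantics: Callable[[str], str]      # [[.]]: maps an expression to an abstract behavior
        resources: Dict[str, float]          # R: time/memory/energy/precision budgets
        observables: Callable[[str], dict]   # O: what an external monitor can see

    # Two substrates realizing the same abstract behavior ("transfer 500 EUR"),
    # analogous to the post's web-vs-cheque example.
    web = Substrate(
        language=lambda e: e.startswith("POST /transfer"),
        semantics=lambda e: "transfer:500EUR",
        resources={"latency_s": 0.2, "precision_bits": 64},
        observables=lambda e: {"http_log": e, "timestamp": True},
    )
    cheque = Substrate(
        language=lambda e: e.startswith("cheque#"),
        semantics=lambda e: "transfer:500EUR",
        resources={"latency_s": 3 * 86400, "precision_bits": 64},
        observables=lambda e: {"paper_trail": True},  # no HTTP log: a different slice
    )

    # Same semantics, different observable interfaces: a monitor bound to HTTP
    # logs sees the web transfer but is blind to the cheque.
    for s, expr in [(web, "POST /transfer 500"), (cheque, "cheque#0042 500")]:
        assert s.language(expr) and s.semantics(expr) == "transfer:500EUR"
        print(s.observables(expr))

The toy evaluator at the end is the point of the analogy: a monitor tied to one observable interface cannot see behavior realized through another, which is exactly the kind of gap the formalism is meant to make nameable.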

Key Points
  • Substrate formalism defines a 4-tuple: language, semantics map, resource profile, and observable interface
  • Choices like LayerNorm placement, quantization format, and DRAM topology affect refusal behavior, robustness, and jailbreaks
  • Framework enables comparing the 'same' model across different deployments by naming gaps in observability

Why It Matters

Gives AI safety researchers a precise language to identify and mitigate risks hidden below the architecture level.