Proposal For Cryptographic Method to Rigorously Verify LLM Prompt Experiments
A new protocol uses blockchain-inspired signature chaining to prevent tampering with AI prompts and "gaslighting" attacks.
Developer weberr13 has proposed a novel cryptographic protocol designed to bring mathematical rigor to LLM prompt engineering experiments. The method, detailed in a March 2026 LessWrong post, addresses a critical vulnerability demonstrated by YouTuber Michael Reeves, who showed how editing past conversation turns could "gaslight" LLMs into logical breakdowns. The system uses EdDSA (Edwards-curve Digital Signature Algorithm) asymmetric key signing to create what weberr13 calls a "cryptographic braid"—a structure that prevents tampering with AI conversation history while maintaining verification efficiency.
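As a rough illustration of the signing step, here is a minimal sketch using Ed25519 (one of the EdDSA curves) via Python's `cryptography` package. The block layout, field names, and serialization choices below are assumptions made for the example, not details taken from the original post.

```python
# Minimal sketch: signing one interaction block with Ed25519 (an EdDSA scheme).
# Field names and block layout are illustrative assumptions, not the author's spec.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

block = {
    "prompt": "What is 2 + 2?",
    "response": "4",
    "chain_of_thought": None,   # included when the model exposes it
    "prev_signatures": [],      # back-references to earlier blocks (empty for a genesis block)
}

# Canonical serialization so signer and verifier hash identical bytes.
payload = json.dumps(block, sort_keys=True, separators=(",", ":")).encode()
signature = private_key.sign(payload)

# Anyone holding the public key can check the block; tampering with any field
# raises cryptography.exceptions.InvalidSignature.
public_key.verify(signature, payload)
```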
The protocol works by independently signing each interaction block (prompt, response, and chain-of-thought if available) and linking the blocks into a directed acyclic graph (DAG) in which each block references previous signatures. This creates an immutable audit trail that can be verified without per-user storage requirements. The design borrows concepts from blockchain and JWT (JSON Web Token) signing but adds cross-object back-referencing to prevent manipulation. Verification involves walking the entire braid structure, checking signature uniqueness, ensuring only one "genesis" node exists, and confirming the acyclic property of the graph.
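The structural half of that verification walk can be sketched as follows. The data model (a map from each block's signature to the signatures it back-references) and the function name `verify_braid` are assumptions for illustration; a full verifier would also re-check each block's EdDSA signature as in the earlier sketch.

```python
# Sketch of the structure-only checks on a braid: every back-reference resolves,
# exactly one genesis block exists, and the reference graph is acyclic.
from collections import defaultdict

def verify_braid(blocks: dict[str, list[str]]) -> bool:
    """blocks maps each block's signature to the signatures it back-references."""
    # 1. Every referenced signature must resolve to a known block.
    #    (Dict keys already guarantee signature uniqueness in this model.)
    for prev in blocks.values():
        if any(p not in blocks for p in prev):
            return False

    # 2. Exactly one genesis node: a block with no back-references.
    genesis = [sig for sig, prev in blocks.items() if not prev]
    if len(genesis) != 1:
        return False

    # 3. Acyclicity: depth-first walk looking for a back edge.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def dfs(sig: str) -> bool:
        color[sig] = GRAY
        for p in blocks[sig]:
            if color[p] == GRAY:                 # back edge: cycle found
                return False
            if color[p] == WHITE and not dfs(p):
                return False
        color[sig] = BLACK
        return True

    return all(dfs(sig) for sig in blocks if color[sig] == WHITE)
```

Because the checks depend only on the blocks themselves and a public key, a verifier needs no per-user state, which is the stateless property the proposal emphasizes.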
If peer-reviewed and adopted, this cryptographic braid method could transform how researchers and companies like OpenAI and Anthropic conduct prompt engineering studies. It provides a mathematically rigorous way to verify that LLM experiments haven't been tampered with, addressing concerns about weak evidence chains in AI research. The protocol also offers protection against "context hacking" where past conversation turns might be edited server-side, potentially enabling more trustworthy AI agent systems and more scientific discussions of model pathologies.
- Uses EdDSA asymmetric key signing to create immutable audit trails for LLM conversations
- Prevents "gaslighting" attacks where malicious actors edit past conversation turns to manipulate AI behavior
- Enables stateless verification, without per-user storage, through the cryptographic braid structure
Why It Matters
Provides mathematical rigor for AI prompt engineering research and protects against manipulation of LLM conversation history.