Harvard's Recoding-Decoding boosts LLM output diversity without fine-tuning

r/ArtificialInteligence May 11, 2026

⚡Injecting random priming phrases unlocks hidden long-tail knowledge from LLMs.

Deep Dive

A new paper from Harvard (Luo, King, Puett, Smith) proposes Recoding-Decoding (RD), a decoding method that increases the diversity of LLM outputs without any fine-tuning. The authors argue that current decoding strategies (top-k, nucleus, etc.) sample only from the peak of the conditional distribution, leaving the long tail of heterodox, contrarian, or non-Western knowledge unused. RD works in two steps: it first prepends a random 'priming phrase' (e.g., "Related to FOOD:"), then injects a random 3-letter 'diverting stem' (e.g., "Pas", "Tib") at the start of each new sentence. The model has to continue from the injected tokens, so it generates outputs like "[Pas]ta and the silk road" instead of the dominant answer "Age of Enlightenment."
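To make the two steps concrete, here is a minimal Python sketch of the idea as described above. It is not the authors' implementation. The topic list, the function names, the placement of the priming phrase after the prompt, and the choice to draw stems as uniformly random letters are all assumptions made for illustration, and `continue_sentence` is a hypothetical stand-in for whatever LLM call completes a sentence from a given prefix.

```python
import random
import string
from typing import Callable, List, Optional

# Hypothetical topic pool; the article only gives "FOOD" as an example.
PRIMING_TOPICS = ["FOOD", "MUSIC", "GEOGRAPHY", "SPORTS"]


def random_priming_phrase(rng: random.Random) -> str:
    """Build a priming phrase in the style of 'Related to FOOD:'."""
    return f"Related to {rng.choice(PRIMING_TOPICS)}:"


def random_stem(rng: random.Random) -> str:
    """Draw a capitalized 3-letter stem such as 'Pas' or 'Tib'.

    Assumption: letters are sampled uniformly. The paper may restrict
    stems to plausible word prefixes.
    """
    return "".join(rng.choices(string.ascii_lowercase, k=3)).capitalize()


def recoding_decoding(
    prompt: str,
    continue_sentence: Callable[[str], str],
    num_sentences: int = 2,
    seed: Optional[int] = None,
) -> List[str]:
    """Generate sentences that each begin with a forced random stem."""
    rng = random.Random(seed)
    # Step 1: add one random priming phrase (placed after the prompt here).
    text = f"{prompt}\n{random_priming_phrase(rng)} "
    sentences = []
    for _ in range(num_sentences):
        # Step 2: force a random stem at the start of each new sentence;
        # the model only sees the prefix and must continue from the stem.
        stem = random_stem(rng)
        sentence = stem + continue_sentence(text + stem)
        sentences.append(sentence)
        text += sentence + " "
    return sentences


def stub_model(prefix: str) -> str:
    """Placeholder for a real LLM call; returns a fixed continuation."""
    return "... (model continuation)."


if __name__ == "__main__":
    for line in recoding_decoding("Name a historical period.", stub_model, seed=0):
        print(line)
```

With a real model behind `continue_sentence`, each run would start its sentences from different stems, which is what pushes the output away from the single dominant answer.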

The team tested RD on 50 brainstorming topics and 500 prompts from 5 public datasets. They found that relevance remains around 0.99 while diversity increases almost linearly up to 1,000 repeated runs. The stronger the base LLM (e.g., Gemini-3 > GPT-5.1 > GPT-3.5 > DeepSeek-3), the larger RD's advantage, which the authors attribute to more capable models having more peaked distributions and thus more hidden tail knowledge. The authors frame this as solving the 'search quest' problem (e.g., picking a wedding dress, a research topic, a startup name), where the goal is not a single correct answer but exploring the space. They warn that current LLMs are anti-optimized for such tasks and drive collective homogenization, citing an incident where students using ChatGPT turned in nearly identical essay outlines without collusion.
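The article does not say how the paper measures diversity. As a rough illustration of what "diversity over repeated runs" can mean, the sketch below tracks how many distinct answers have been seen after each run. The function name and the normalization (lowercasing and collapsing whitespace) are assumptions, and the paper's actual metric may differ.

```python
from typing import Iterable, List


def cumulative_distinct(outputs: Iterable[str]) -> List[int]:
    """Return the number of distinct normalized outputs seen after each run."""
    seen = set()
    curve = []
    for text in outputs:
        # Normalize case and whitespace so trivial variants count once.
        seen.add(" ".join(text.lower().split()))
        curve.append(len(seen))
    return curve


runs = [
    "Age of Enlightenment",
    "age of  enlightenment",
    "Pasta and the silk road",
    "Age of Enlightenment",
]
print(cumulative_distinct(runs))  # [1, 1, 2, 2]
```

A sampler that keeps returning the dominant answer gives a curve that flattens quickly, while a curve that keeps climbing with the number of runs corresponds to the near-linear growth reported for RD.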

Key Points

RD injects random priming phrases and 3-letter diverting stems at decoding time, with no fine-tuning needed.
Across 500 prompts, relevance stays around 0.99 while diversity grows almost linearly for up to 1,000 repeated runs.
Stronger LLMs (e.g., Gemini-3) benefit more because their distributions are more peaked and hide more tail knowledge.

Why It Matters

RD opens up LLMs for open-ended exploration, reducing output homogenization in brainstorming, search, and creative tasks.
