Where the goblins came from
How personality-driven glitches spread in GPT-5, from origin to patch.
Deep Dive
How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.
Key Points
- GPT-5's 'goblin' outputs originated from fine-tuning data with overrepresented humorous examples.
- Reinforcement learning drift amplified the quirk, rewarding novelty over accuracy.
- OpenAI fixed it with data pruning, retraining, and adjusted reward functions.
Why It Matters
Shows AI personality drift risks; critical for enterprise trust and deployment reliability.