Getting Claude to rank the inkhaven bloggers
Anthropic's Claude spent $40 in API credits to rank 15+ posts from the rationalist writing residency.
In a follow-up to Alexander Wales' viral experiment, Sean Herrington deployed Anthropic's Claude Opus 4.6 to systematically rank blog posts from the Inkhaven writing residency. Instead of simple pairwise comparisons, Herrington implemented a Bradley-Terry model—a statistical method Claude itself suggested, analogous to the Elo rating system in chess. The AI judged 8 posts at a time across five iterative rounds, burning through $40 in API credits to refine its rankings based on a detailed prompt asking it to predict the taste of a rationalist audience.
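The Bradley-Terry model assigns each item a latent strength and models the probability that item i beats item j as a function of their strength gap, so a global ranking can be fit from batches of pairwise (or groupwise) judgments. As a rough illustration of the idea — not Herrington's actual code, and using made-up judgment counts — here is a minimal fit via the classic minorization-maximization update:

```python
import math

def bradley_terry(wins, n_items, iters=200):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[i][j] = number of times item i was judged better than item j.
    Returns log-strengths centered to mean zero (comparable to the
    standard-deviation-style scores in the leaderboard).
    """
    p = [1.0] * n_items  # initial strengths
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            # MM update: total wins of i, divided by a weighted count
            # of comparisons involving i.
            num = sum(wins[i][j] for j in range(n_items) if j != i)
            den = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n_items) if j != i
            )
            new_p.append(num / den if den else p[i])
        total = sum(new_p)
        p = [x * n_items / total for x in new_p]  # rescale for stability
    logs = [math.log(x) for x in p]
    mean = sum(logs) / n_items
    return [l - mean for l in logs]

# Hypothetical judgments: post 0 beat post 1 three times, post 2 twice, etc.
wins = [
    [0, 3, 2],
    [1, 0, 2],
    [0, 1, 0],
]
scores = bradley_terry(wins, 3)
print(scores)  # post 0, with the most wins, ranks highest
```

In a setup like Herrington's, each round of 8-post judgments would be decomposed into pairwise outcomes and fed into a fit like this; dividing the centered log-strengths by their standard deviation would then yield σ-style scores of the kind shown on the leaderboard.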
Claude evaluated posts on criteria including insight, craft, honest thinking, distinctive voice, and humor, with explicit instructions not to be 'generous or encouraging.' The final leaderboard of over 15 posts was topped by Natalie Cargill's 'How to invent a disease' with a score of +2.77 standard deviations. The list showcases the eclectic mix of Inkhaven, with runners-up ranging from 'More Legal Systems Very Different From Ours' to 'The phenomenology of being hungry while pregnant.'
The experiment demonstrates a move beyond simple chat interactions, using Claude Opus 4.6 as an analytical engine for a complex, multi-step evaluation task. It also inadvertently tested the model's consistency: when Herrington tried to 'push' his own post from 2nd to 1st place, it instead dropped to 10th. The project highlights how advanced LLMs can be prompted to perform sophisticated comparative judgment at scale, offering a novel lens on content quality within niche communities.
- Used Claude Opus 4.6 with a Bradley-Terry model to rank 15+ Inkhaven posts across 5 iterative rounds
- Top post was 'How to invent a disease' by Natalie Cargill scoring +2.77σ, beating entries on legal systems and AI myths
- Experiment cost $40 in Anthropic API credits and tested the model's consistency against prompt engineering attempts
Why It Matters
Shows how professionals can use LLMs like Claude for complex, multi-step analytical judgment beyond simple Q&A, at a predictable cost.