AI Safety

Claude 4.8 and GPT 5.5 power 'vibe research' in math, but risk intellectual debt

LessWrong AI June 29, 2026

⚡Lean-verified proofs via AI chats, but human discernment lags behind.

Deep Dive

Abram Demski details two months of 'vibe research': using AI models to do mathematics, specifically extending results on logical induction and trustworthiness. Working with Claude Opus 4.8 (via Claude Code) and GPT 5.5 Extra High (in Codex), he and collaborator Anson Berns translated prior work on deference into the logical induction framework, with Lean 4 providing formal verification. Demski emphasizes that although he did not read the Lean proofs himself, he used AI summaries and conversations to check that the English and LaTeX claims matched what was formally verified. This process caught several significant gaps.

Demski says the approach has crossed a personal tipping point: he feels he can 'just keep going and keep making progress' in a new way. However, he warns of 'intellectual debt': because AI makes impressive-looking mathematics easy to produce, it becomes harder to distinguish genuine insight from plausible-sounding results. Previously, a polished mathematical model signaled deep thought; now one can be generated quickly, potentially fooling researchers. The overall goal of the research is to model when humans can justifiably trust AI, with applications to recursive self-improvement and moral feedback under uncertainty.

Key Points

Demski used Claude Opus 4.8 via Claude Code and GPT 5.5 Extra High in Codex for two months of math research on logical induction and trust.
Lean 4 formal verification, checked through AI summaries, surfaced multiple gaps between English claims and actual proofs, though the researcher did not read the proofs directly.
He warns of 'intellectual debt': AI makes impressive math easy to produce, but human understanding struggles to keep pace, so mathematical rigor becomes a weaker signal of quality.

Why It Matters

AI-assisted math research accelerates progress but demands new vigilance against superficial rigor eroding trust in results.
