Nvidia's AI Chip Verification: Die-Level Too Slow for 2030 Timelines
Die-level chip verification takes more than two years; board-level and firmware changes may be the only options fast enough.
In a LessWrong post on compute verification for short AI timelines (pre-2030), skunnavakkam looks at three Nvidia chip generations (Blackwell, Rubin, and Feynman) and asks whether die-level hardware verification is feasible. Die-level changes take more than two years, so even if work started immediately, the earliest chip that could include them would be Feynman in 2028. Rubin GPUs are due for release later this year, with a typical gap of about one year before deployment, so die-level verification is too slow for urgent coordination.
Instead, the author proposes board-level and software/firmware modifications. Board-level changes, such as adding a microcontroller (MCU) that hashes the weights sent to the die, can be implemented in months and could catch unauthorized weight updates. Software/firmware changes are faster still, taking only weeks to a couple of months, and can be applied at any production stage. For Rubin chips to be verifiable, such changes must be incorporated now. Skunnavakkam emphasizes that a mix of board-level and firmware modifications is imperfect but offers a tractable path to compute verification on short timelines.
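The weight-hashing idea can be illustrated with a minimal sketch. This is not the author's design or any Nvidia interface: the function names, the choice of SHA-256, and the idea of comparing against a pre-approved digest are all assumptions made for illustration. A real MCU would run firmware on the board, not Python.

```python
import hashlib
import hmac

def digest_weights(chunks):
    """Hash a stream of weight chunks, as a hypothetical board-level MCU might.

    `chunks` is any iterable of bytes objects, standing in for the data
    transferred to the die. SHA-256 is an assumed choice of hash.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

def weights_authorized(chunks, approved_digest):
    """Return True only if the streamed weights match the approved digest.

    `approved_digest` is a hypothetical value agreed on in advance by
    whoever is doing the verification.
    """
    # compare_digest gives a constant-time comparison of the two hex strings.
    return hmac.compare_digest(digest_weights(chunks), approved_digest)

if __name__ == "__main__":
    approved = digest_weights([b"layer0-weights", b"layer1-weights"])
    # Unmodified weights match the approved digest.
    print(weights_authorized([b"layer0-weights", b"layer1-weights"], approved))
    # Any change to the weights produces a different digest.
    print(weights_authorized([b"layer0-weights", b"layer1-tampered"], approved))
```

In this sketch, a change to even one byte of the weights changes the digest, which is how a check of this kind could flag an unauthorized weight update.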
- Die-level verification takes >2 years; earliest verified chip would be Nvidia's Feynman in 2028.
- Board-level changes (e.g., MCU-based weight hashing) are feasible in months but must start now to reach Rubin chips.
- Software/firmware modifications take 1–2 months and can be applied at any production stage, including mid-production.
Why It Matters
If AI timelines are short (pre-2030), compute verification cannot wait for die-level changes; the work has to shift to board-level and software/firmware modifications now.