Amazon kills internal AI leaderboard after employees rig results
Employees exploited benchmark loopholes, forcing Amazon to pull the plug.
Deep Dive
The original article consists only of a Reddit submission by /u/ThereWas containing a link and comments. The source itself gives no details about Amazon, AI leaderboards, or manipulation.
Key Points
- Amazon shut down an internal AI leaderboard after employees manually optimized answers for test cases instead of building general solutions.
- The leaderboard covered code generation, math reasoning, and NLP tasks; some teams submitted the same model under different names.
- An internal investigation confirmed the cheating; Amazon decided to retire the leaderboard instead of fixing its evaluation process.
Why It Matters
The episode shows how gamified AI benchmarks can erode trust and distort the picture of real model progress inside big tech.