This is how far AI has come in two and a half years (costs up 81×).
A viral test shows Gemini 3.1 Pro builds complete websites while GPT-3.5 writes just a few lines of HTML.
A viral developer experiment has starkly quantified the rapid evolution of generative AI over the past two and a half years. The developer sent the identical prompt—"Please generate a comprehensive single-file HTML website demo with multiple sections and a polished, visually appealing design"—to both OpenAI's older GPT-3.5 (from September 2023) and Google's cutting-edge Gemini 3.1 Pro (from March 2026), and the results revealed a staggering performance gap. The cost and time differentials were extreme: Gemini 3.1 Pro was 81 times more expensive and took 20 times longer to complete the task.
However, the output justified the premium. Gemini 3.1 Pro generated a substantial, functional website complete with multiple sections, embedded icons, interactive forms, and images, demonstrating a sophisticated grasp of front-end design and implementation. In stark contrast, GPT-3.5 produced only a few lines of rudimentary HTML with plain white text boxes, failing to grasp the request's complexity. The side-by-side comparison moves beyond abstract benchmarks to show concretely what users can expect from different model generations.
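The setup is simple enough to sketch. Below is a minimal harness for this kind of same-prompt comparison; the commented-out call sites, model IDs, and SDK usage are illustrative assumptions, not confirmed details of the original test:

```python
import time

# The identical prompt reportedly sent to both models.
PROMPT = (
    "Please generate a comprehensive single-file HTML website demo with "
    "multiple sections and a polished, visually appealing design"
)

def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def cost_multiple(new_cost_usd: float, old_cost_usd: float) -> float:
    """How many times more expensive the newer model's run was."""
    return new_cost_usd / old_cost_usd

# Hypothetical call sites (requires API keys; SDK shapes are assumptions):
#
#   from openai import OpenAI
#   html_old, t_old = timed(lambda: OpenAI().chat.completions.create(
#       model="gpt-3.5-turbo",
#       messages=[{"role": "user", "content": PROMPT}],
#   ).choices[0].message.content)
#
#   from google import genai
#   html_new, t_new = timed(lambda: genai.Client().models.generate_content(
#       model="gemini-3.1-pro",  # hypothetical model ID
#       contents=PROMPT,
#   ).text)

if __name__ == "__main__":
    # Illustrative figures only; the viral test reported 81x cost, 20x time.
    print(f"cost multiple: {cost_multiple(81.0, 1.0):.0f}x")
```

The only measurements involved are wall-clock time per call and the billed cost of each run, so the harness itself stays trivial; the interesting variable is the model behind the call.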
The experiment has sparked widespread discussion about the trajectory of AI development, specifically the trade-off between cost, speed, and capability. It underscores that while foundational models like GPT-3.5 democratized access to AI, the latest frontier models like Gemini 3.1 Pro deliver qualitatively different outputs that can take on more complex work previously done by humans, albeit at significantly higher operational cost. This real-world test provides crucial context for businesses evaluating AI ROI, illustrating that the definition of a 'useful' AI output has shifted radically in a very short time.
- Gemini 3.1 Pro cost 81 times more and took 20 times longer than GPT-3.5 for the same task.
- Output quality was incomparable: Gemini built a complete, multi-section website with forms and icons; GPT-3.5 wrote only a few lines of basic HTML.
- The test provides a tangible, non-benchmark measure of AI progress in practical application over 2.5 years.
Why It Matters
For professionals, it quantifies the ROI of using cutting-edge AI: vastly better outputs come with significantly higher costs, requiring strategic budgeting.