Google launches Gemini: multimodal AI model beats GPT-4 in 30 of 32 benchmarks
Three sizes, one model: Google's Gemini takes on OpenAI with native video and audio understanding.
Google officially launched Gemini, its most advanced large language model, on December 6, 2023. CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis positioned Gemini as a transformative AI model that will eventually power most Google products, from Search and Ads to Chrome. Unlike Google's previous models, Gemini was built from the ground up as a multimodal system, meaning it can natively understand and generate text, images, audio, and video without separate specialist models. Google presents this as a strategic differentiator against OpenAI's GPT-4, which relies on separate models like DALL-E and Whisper for image and audio tasks.
Gemini comes in three tiers: Gemini Nano, a lightweight version for on-device Android tasks; Gemini Pro, which now powers Bard and will be available to developers via Google AI Studio and Vertex AI from December 13, 2023; and Gemini Ultra, the most powerful variant, aimed at data centers and enterprise use and expected in 2024. Google ran 32 well-established benchmarks comparing Gemini to GPT-4 and claims Gemini leads on 30 of them, with particular strengths in multimodal reasoning and Python code generation. The initial release supports English only, with broader language support and deeper product integration expected throughout 2024. This marks Google's most significant response to the ChatGPT-era AI race, just over a year after OpenAI launched ChatGPT.
- Gemini has three sizes: Nano (on-device), Pro (Bard backbone), Ultra (enterprise).
- Google claims Gemini beats GPT-4 on 30 of 32 benchmarks, including code and multimodal tasks.
- Bard is now powered by Gemini Pro; Pixel 8 Pro users get Gemini Nano features.
Why It Matters
Google finally fields a credible GPT-4 competitor, and its native multimodality could reshape how AI is integrated across Search, Ads, and devices.