Enterprise & Industry

DeepSeek adds AI vision in major move: ‘the whale can now see’

DeepSeek says its 'little whale can now see' as the chatbot launches a limited multimodal beta.

Deep Dive

Chinese AI startup DeepSeek has added multimodal capabilities to its flagship chatbot for the first time, enabling it to process images and video alongside text. The limited beta release, announced Wednesday by multimodal team leader Chen Xiaokang, adds an 'image recognition mode' to the chat interface on both the website and mobile app. This feature, available to select users for testing, brings DeepSeek in line with competitors like OpenAI and Google that already offer similar functionality. Senior researcher Chen Deli celebrated the launch on social media with the phrase 'the little whale can now see,' referencing DeepSeek's whale logo.

The move addresses a significant gap for DeepSeek, which became a household name in January 2025 for its powerful reasoning and cost efficiency but, until now, lacked multimodal support. The new image recognition mode joins two other recently introduced chat modes: 'expert' and 'flash.' Multimodal processing is crucial for moving beyond text-only conversation into more complex and economically valuable applications, such as visual analysis and video understanding. The launch comes just days after DeepSeek released its new flagship model V4 and implemented extensive price cuts, signaling an aggressive push to maintain its edge in a rapidly evolving AI market.

Key Points
  • DeepSeek adds first multimodal capabilities with image and video processing
  • Limited beta launch adds 'image recognition mode' alongside 'expert' and 'flash' modes
  • Follows recent V4 model release and extensive price cuts to maintain competitiveness

Why It Matters

DeepSeek closes a key competitive gap, enabling visual analysis for more complex enterprise use cases.