Hacker Gives Text-Only GPT-OSS-120B Vision Using Google Lens & OpenCV
A clever hack bypasses API limits to give any local LLM real vision capabilities.
A developer built a Model Context Protocol (MCP) server that gives local LLMs like GPT-OSS-120B real vision by combining OpenCV and Google Lens. The system detects objects in an image, crops each one, and queries Lens to identify it. In testing, it successfully recognized hardware such as an NVIDIA DGX Spark. The server also exposes 17 other Google services, such as Search and Maps, without requiring API keys, though commenters quickly raised concerns about Terms of Service (TOS) violations and fragility.
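The detect, crop, and identify loop can be sketched roughly like this. It is a minimal, dependency-free illustration, not the developer's code. The OpenCV step is swapped for a plain-Python threshold plus connected-components pass, which is roughly what `cv2.threshold` followed by `cv2.connectedComponentsWithStats` would do. `identify` is a hypothetical callback standing in for the Google Lens query, whose real interface is not described here. `fake_lens` and the tiny test image are likewise made up for illustration.

```python
from collections import deque
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, 0-255
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def find_object_boxes(img: Image, threshold: int = 128, min_area: int = 4) -> List[Box]:
    """Threshold the image and return bounding boxes of 4-connected bright regions."""
    if not img:
        return []
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or img[y][x] < threshold:
                continue
            # Breadth-first flood fill over one connected foreground region.
            queue = deque([(y, x)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and img[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Ignore specks smaller than min_area pixels.
            if area >= min_area:
                boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


def crop(img: Image, box: Box) -> Image:
    """Cut the box out of the image (the patch that would be sent to the identifier)."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]


def identify_objects(img: Image, identify: Callable[[Image], str]) -> List[Tuple[Box, str]]:
    """Detect regions, crop each one, and ask the identifier to label it."""
    return [(box, identify(crop(img, box))) for box in find_object_boxes(img)]


# Hypothetical stand-in for the Google Lens lookup; a real tool would send the crop out.
def fake_lens(patch: Image) -> str:
    return f"object {len(patch[0])}x{len(patch)}"


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 255, 255, 0, 0, 0],
    [0, 255, 255, 0, 0, 0],
    [0, 0, 0, 0, 200, 200],
    [0, 0, 0, 0, 200, 200],
]
print(identify_objects(image, fake_lens))
# → [((1, 1, 2, 2), 'object 2x2'), ((4, 3, 2, 2), 'object 2x2')]
```

Cropping first means each lookup sees a single object rather than a cluttered full frame. A real tool would presumably use OpenCV's contour or edge detection on actual photos instead of a fixed brightness threshold.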
Why It Matters
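This hack demonstrates a powerful, low-cost way to add multimodal capabilities to any text model. Its fragility is a major risk, both legal and technical: because it accesses Google services without official API keys, it may violate Google's Terms of Service, and it could break whenever Google changes those services.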