Qwen 30B A3B trying to recreate scenes from photos in 3D!
A 30-billion-parameter model creates explorable 3D environments from simple images using only llama.cpp.
A developer on the r/LocalLLaMA subreddit has demonstrated an unexpected capability of Alibaba's Qwen 30B A3B model: generating basic 3D scenes from 2D photographs. The experiment involved feeding images to the 30-billion-parameter model and prompting it to recreate them as interactive HTML scenes that could be virtually "walked" through. The developer ran a Q4-quantized version of the model on the open-source llama.cpp framework, and the results, while described as "far from perfect" and "pretty bad" for professional use, showcase an impressive emergent ability for a model not specifically trained for 3D reconstruction. The creator emphasized that the project was purely for fun, but it hints at the latent spatial reasoning and creative synthesis possible with current mid-sized open-weight models running on consumer hardware.
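The post does not include the exact commands, but a minimal reproduction of this kind of setup might look like the Python sketch below. It assumes a llama.cpp `llama-server` build with multimodal support, started with a vision projector alongside the Q4-quantized model; the file names, port, and prompt wording are illustrative placeholders rather than details from the original experiment.

```python
import base64
import json
import urllib.request

# Assumption: llama-server (from llama.cpp) is running locally with the
# Q4-quantized model and a matching vision projector, e.g.:
#   llama-server -m qwen-30b-a3b-q4.gguf --mmproj mmproj.gguf --port 8080
# File names, the port, and the prompt are illustrative placeholders.

with open("room_photo.jpg", "rb") as f:
    photo_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Recreate this photo as a single self-contained HTML "
                     "page with a simple 3D scene I can walk through using "
                     "the WASD keys. Reply with HTML only."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{photo_b64}"}},
        ],
    }],
}

# llama-server exposes an OpenAI-compatible chat endpoint.
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)["choices"][0]["message"]["content"]
```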
The technical setup is notably accessible, relying on local inference with llama.cpp and a heavily quantized (Q4) version of Qwen 30B A3B, making the experiment reproducible without massive computational resources. The output consists of HTML code defining a 3D environment, suggesting the model can interpret visual content and translate it into a structured spatial representation with basic geometry. This experiment matters because it pushes the boundaries of what is expected from general-purpose language models, moving beyond text into the domain of spatial and visual program synthesis. It points to a future where multimodal AI agents could rapidly prototype simple 3D worlds from conceptual sketches or reference images, lowering the barrier to content creation in gaming, simulation, and design.
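Turning the raw reply into something walkable usually takes one small post-processing step, since chat-tuned models often wrap their output in Markdown code fences. A sketch, continuing from the hypothetical `reply` variable above:

```python
import pathlib
import re
import webbrowser

# `reply` comes from the request sketch above. Strip Markdown code fences
# if the model wrapped its HTML in them before saving the scene.
match = re.search(r"```(?:html)?\s*(.*?)```", reply, re.DOTALL)
html = match.group(1) if match else reply

# Save the generated scene and open it in the default browser to "walk"
# around with whatever keyboard controls the model wired up.
out = pathlib.Path("scene.html")
out.write_text(html, encoding="utf-8")
webbrowser.open(out.resolve().as_uri())
```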
- Alibaba's Qwen 30B A3B model generated explorable 3D scenes as HTML code from 2D image inputs.
- The experiment used a Q4-quantized model running locally via the open-source llama.cpp framework.
- Results are imperfect but show emergent 3D spatial reasoning in a model not specifically trained for the task.
Why It Matters
Shows how accessible, open models can perform creative visual-to-code synthesis, hinting at future tools for rapid 3D prototyping.