Feynman: Knowledge-Infused Diagramming Agent for Scalable Visual Designs
Researchers built an AI agent that creates knowledge-rich diagrams at scale for pennies.
A team of researchers from Carnegie Mellon University and other institutions has developed Feynman, an AI agent designed to ease a critical data bottleneck in training advanced vision-language models. Image-text data is abundant at internet scale, but pairs that are high-quality, knowledge-rich, and well-aligned are scarce. Feynman addresses this by automating the creation of complex diagrams from domain-specific knowledge. Its pipeline first enumerates key "ideas," then performs code planning to translate those ideas into simple declarative programs. These programs are rendered by the Penrose diagramming system, whose optimization-based layout preserves visual semantics while injecting fresh randomness, which keeps the output consistent in meaning and diverse in appearance.
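The three stages described above (idea enumeration, code planning, randomized rendering) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: every function name here (`enumerate_ideas`, `plan_program`, `render`, `synthesize`) is hypothetical, the language-model steps are replaced by hard-coded lookups, the program text is made-up pseudo-syntax rather than verified Penrose code, and the "renderer" merely draws seeded random coordinates instead of running an optimization-based layout.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagram:
    program: str   # declarative description of WHAT to draw (fixed semantics)
    seed: int      # source of layout randomness
    layout: tuple  # stand-in for the positions a real layout engine would compute


def enumerate_ideas(domain: str) -> list[str]:
    """Stage 1 (hypothetical): list diagram-worthy ideas for a domain.

    A real agent would query a language model; a fixed catalog is assumed here.
    """
    catalog = {
        "geometry": ["triangle with an altitude", "two intersecting circles"],
    }
    return catalog.get(domain, [])


def plan_program(idea: str) -> str:
    """Stage 2 (hypothetical): turn an idea into a small declarative program.

    The syntax below is illustrative pseudo-code, not real Penrose input.
    """
    templates = {
        "triangle with an altitude": "Point A, B, C\nTriangle T(A, B, C)\nAltitude h(T, A)",
        "two intersecting circles": "Circle c1, c2\nIntersecting(c1, c2)",
    }
    return templates[idea]


def render(program: str, seed: int) -> Diagram:
    """Stage 3 (stand-in): same program, seed-dependent layout."""
    rng = random.Random(seed)
    # One random (x, y) position per program statement.
    layout = tuple(
        (round(rng.uniform(0, 100), 1), round(rng.uniform(0, 100), 1))
        for _ in program.splitlines()
    )
    return Diagram(program=program, seed=seed, layout=layout)


def synthesize(domain: str, variants_per_idea: int = 2) -> list[tuple[Diagram, str]]:
    """Produce (diagram, caption) pairs, several layout variants per idea."""
    pairs = []
    for idea in enumerate_ideas(domain):
        program = plan_program(idea)
        for seed in range(variants_per_idea):
            # The idea text doubles as the caption in this sketch.
            pairs.append((render(program, seed), idea))
    return pairs
```

Calling `synthesize("geometry", variants_per_idea=2)` yields four pairs. Variants of the same idea share one program and one caption but differ in layout, which mirrors the consistency-plus-diversity property the article attributes to the rendering step.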
Using this agent, the team synthesized a dataset of more than 100,000 high-quality diagram-caption pairs at minimal cost and time. This dataset directly tackles the scarcity of aligned visual-language data needed to train next-generation multimodal AI. The researchers also curated a new visual reasoning benchmark, Diagramma, from this freshly generated data; it will be used to rigorously evaluate the capabilities of vision-language models. The project represents a significant step toward scalable, high-fidelity visual data generation. The team plans to release the entire Feynman agent pipeline, the 100k+ dataset, and the Diagramma benchmark as an open-source project, giving the AI research community a powerful new tool.
- Feynman is an AI agent that automates diagram generation through a multi-step pipeline of idea enumeration, code planning, and declarative program rendering.
- The system used the Penrose diagramming engine to generate over 100,000 semantically consistent yet visually diverse diagram-caption pairs for use as training data.
- The project produced a new benchmark, Diagramma, and will open-source the full pipeline, dataset, and benchmark to advance visual reasoning AI research.
Why It Matters
It provides a scalable, low-cost method to create the high-quality visual data needed to train the next generation of multimodal AI systems.