A painter with 50 years of figurative work just open-sourced his archive. Fine-tune on it.
A painter with works in MoMA and the Met releases 3,000+ documented works as a free, licensed dataset.
New York-based figurative artist Michael Hafftka, whose work is held by major institutions including the Metropolitan Museum of Art and MoMA, has taken the unusual step of publishing his archive as an open dataset. Hosted on Hugging Face, the "Michael Hafftka Catalog Raisonné" contains between 3,000 and 4,000 documented works created over five decades, all licensed under Creative Commons (CC-BY-NC-4.0). The release covers roughly half of his total lifetime output, with plans to add more. Its value lies in a single focus, the human figure, sustained across a 50-year practice, which makes it a rare longitudinal record of how one artist's style evolved.
Unlike many scraped art collections online, this dataset is artist-controlled and published with full provenance and metadata, making it a legally and ethically clear resource for non-commercial AI training under the CC-BY-NC-4.0 terms. Hafftka, who is not a developer, explicitly released it to see what the AI community would create, inviting fine-tuning on models like Stable Diffusion. The consistent subject matter (the human form) across media such as oil, etching, and digital work makes for a focused training corpus. In its first week, the dataset was downloaded over 2,500 times, signaling strong interest from researchers and developers looking for high-quality, licensed artistic data.
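For readers who want to act on that invitation, a first step before any fine-tuning is usually to turn the metadata into text captions and to bucket works by period, so the stylistic evolution stays visible. The sketch below is a minimal illustration only: the `Work` fields (`title`, `year`, `medium`), the caption template, and the sample records are hypothetical placeholders, not the dataset's actual schema or real catalog entries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    """Hypothetical metadata record; the real dataset's fields may differ."""
    title: str
    year: int
    medium: str


def caption(work: Work) -> str:
    """Build a plain-text caption to pair with an image during fine-tuning."""
    return f"{work.title}, {work.medium}, {work.year}, figurative work by Michael Hafftka"


def group_by_decade(works):
    """Bucket works by decade (1978 -> 1970), returned in chronological order."""
    buckets = defaultdict(list)
    for work in works:
        buckets[(work.year // 10) * 10].append(work)
    return dict(sorted(buckets.items()))


# Made-up placeholder records, used only to exercise the functions above.
SAMPLE = [
    Work("Untitled figure A", 1978, "oil"),
    Work("Untitled figure B", 1983, "etching"),
    Work("Untitled figure C", 1989, "oil"),
    Work("Untitled figure D", 2015, "digital"),
]

if __name__ == "__main__":
    for decade, works in group_by_decade(SAMPLE).items():
        print(f"{decade}s:")
        for work in works:
            print("  " + caption(work))
```

Grouping by decade is one way to train or evaluate on separate periods of the 50-year practice. Any real pipeline would need to read the field names from the published dataset and respect the non-commercial license terms.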
- Artist Michael Hafftka releases 3,000-4,000 works from his 50-year career as an open dataset on Hugging Face, licensed CC-BY-NC-4.0.
- The dataset is a rare, coherent collection focused solely on the human figure, with full metadata and provenance from a single artist.
- It saw over 2,500 downloads in its first week and offers a legally clear resource for non-commercial fine-tuning of AI image models like Stable Diffusion.
Why It Matters
The release gives researchers a high-quality, artist-sanctioned dataset for training AI art models, connecting one painter's legacy with generative technology.