MPA: Multimodal Prototype Augmentation for Few-Shot Learning
This multimodal framework improves few-shot learning benchmarks by combining vision models with large language models.
Researchers have introduced MPA, a Multimodal Prototype Augmentation framework that substantially improves few-shot learning. It uses Large Language Models to generate diverse text descriptions and multi-view image augmentations to enrich training data from just a few examples. The method achieved state-of-the-art results in the challenging 5-way 1-shot setting, outperforming the second-best approach by 12.29% on single-domain benchmarks and by 24.56% on cross-domain benchmarks.
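The core idea of prototype augmentation can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the fusion weight `alpha`, and the use of random vectors as stand-ins for encoder outputs are all assumptions. In a real system, `view_embs` would come from an image encoder applied to augmented views of a support image, and `text_embs` from a text encoder applied to LLM-generated class descriptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def build_prototype(view_embs, text_embs, alpha=0.5):
    """Fuse visual and textual evidence into one class prototype.

    view_embs: embeddings of multi-view image augmentations (n_views, dim)
    text_embs: embeddings of generated text descriptions   (n_texts, dim)
    alpha: hypothetical fusion weight (an assumption, not from the paper)
    """
    visual = l2_normalize(view_embs).mean(axis=0)
    textual = l2_normalize(text_embs).mean(axis=0)
    return l2_normalize(alpha * visual + (1 - alpha) * textual)

def classify(query_emb, prototypes):
    """Nearest-prototype classification by cosine similarity."""
    sims = prototypes @ l2_normalize(query_emb)
    return int(np.argmax(sims))

# 5-way 1-shot: one support image per class, expanded into several
# augmented views plus several text descriptions per class.
dim, n_way, n_views, n_texts = 64, 5, 4, 3
protos = np.stack([
    build_prototype(rng.normal(size=(n_views, dim)),
                    rng.normal(size=(n_texts, dim)))
    for _ in range(n_way)
])
pred = classify(rng.normal(size=dim), protos)
print(pred)
```

The point of the augmentation is visible in `build_prototype`: instead of estimating a class prototype from a single support embedding, it averages over many synthetic views and descriptions, which reduces the variance of the prototype estimate in the 1-shot regime.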
Why It Matters
It enables AI models to learn complex visual tasks with far less data, accelerating development in medicine and science.