Research & Papers

Planktonzilla-17M: 17.4M-image dataset boosts plankton classification AI

Unifies 13 imaging systems; models trained on it outperform BioCLIP. Key for ocean health monitoring.

Deep Dive

Researchers assembled Planktonzilla-17M, a unified dataset of 17.4 million publicly available plankton images from 13 imaging systems, covering 602 taxonomic classes (201 at species level). A controlled comparison on a ViT backbone found that supervised training using taxonomic lineage as text matches or exceeds CLIP-style training, while BioCLIP and BioCLIP2 perform poorly on plankton in zero-shot and few-shot settings. Training on the dataset improves plankton classification performance, and the weak BioCLIP results highlight the limitations of current biological foundation models on marine imagery.
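To make "taxonomic lineage as text" concrete, here is a minimal sketch of how a class label might be turned into a lineage string that a text encoder could consume. The rank order, the `lineage_text` helper, and the example record are illustrative assumptions, not the dataset's actual schema or the authors' code.

```python
# Hypothetical rank order; the dataset's real label schema may differ.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def lineage_text(taxon: dict[str, str]) -> str:
    """Join the known ranks, broadest first, into a single text label."""
    return " ".join(taxon[rank] for rank in RANKS if taxon.get(rank))


# Illustrative record resolved only to genus level: missing ranks are skipped,
# so coarser labels simply produce shorter lineage strings.
copepod = {
    "kingdom": "Animalia",
    "phylum": "Arthropoda",
    "class": "Copepoda",
    "order": "Calanoida",
    "family": "Calanidae",
    "genus": "Calanus",
}

print(lineage_text(copepod))
```

Because the string carries every rank above the label, classes that share ancestors also share a text prefix, which is one plausible reason lineage text helps when only some classes are resolved to species level.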

Key Points
  • Dataset includes 17.4M images from 13 imaging systems across 602 taxonomic classes (201 at species level)
  • Supervised classification with taxonomic lineage text matches or exceeds CLIP-style training on plankton tasks
  • BioCLIP and BioCLIP2 perform poorly in zero-shot and few-shot plankton classification, exposing domain gaps

Why It Matters

Gives AI-based plankton monitoring a common, unified training dataset, which matters for tracking ocean health and CO2 sequestration.