Image & Video

Trained a ViT model from scratch for auto-tagging

Fixed 300k bad tags and filled 1M missing tags using SmilingWolf v3 for better classification

Deep Dive

A developer known as Grio43 has released OppaiOracle, a Vision Transformer (ViT) model trained from scratch for automatic tagging of anime images. To prepare the dataset, the creator used SmilingWolf v3, a pre-existing tagging tool, to correct roughly 300,000 inaccurate tags and generate approximately 1 million missing tags. Additionally, a baseline model was trained to identify and incorporate around 30,000 low-frequency tags that would otherwise be overlooked. The result is a V1 model operating at 320x320 resolution, with V1.1 currently training at 448x448 — a resolution bump already yielding noticeable gains in tagging precision.
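Auto-taggers of this kind are typically multi-label classifiers: the ViT emits one logit per tag, each logit is passed through an independent sigmoid, and tags above a confidence threshold are kept. A minimal sketch of that post-processing step, with hypothetical tag names and a made-up threshold (the actual OppaiOracle head and defaults are not documented here):

```python
import numpy as np

def logits_to_tags(logits, tag_names, threshold=0.5):
    """Map per-tag logits to (tag, confidence) pairs via independent sigmoids.

    Hypothetical sketch of multi-label tagger post-processing; the real
    model's threshold and tag vocabulary may differ.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    return [(name, float(p)) for name, p in zip(tag_names, probs) if p >= threshold]

# Example with three hypothetical tag logits: a strong positive, a strong
# negative, and a borderline case just above the threshold.
tags = logits_to_tags([3.0, -2.0, 0.1], ["1girl", "outdoors", "smile"])
```

Because each tag gets its own sigmoid rather than a shared softmax, any number of tags can fire on one image, which is what an image-tagging task requires.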

OppaiOracle is fully open-source and hosted on HuggingFace, including a demo space and a separate CPU-based tagger for users without GPU access. The developer also provides a self-hosted web interface for local deployment. Future work aims to compile a clean 2025 dataset, retrain from scratch with structured vocab formats (e.g., artist:name), and resolve standalone installation issues for general users. This project fills a niche for high-quality, community-driven image tagging in the anime space.

Key Points
  • Used SmilingWolf v3 to fix 300k bad tags and fill 1M missing tags in the training dataset
  • Current V1 model runs at 320x320; V1.1 at 448x448 shows improved accuracy
  • Future plans include a 2025 dataset with structured vocabularies like 'artist:name'
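The planned structured vocabulary prefixes tags with a namespace, so 'artist:name' separates the tag's category from its value. A small sketch of how such namespaced tags might be parsed, assuming a colon delimiter and a default namespace for plain tags (both assumptions; the project has not published its final format):

```python
def parse_tag(tag: str) -> tuple[str, str]:
    """Split a namespaced tag like 'artist:name' into (namespace, value).

    Hypothetical sketch: plain tags fall back to a 'general' namespace.
    """
    ns, sep, value = tag.partition(":")
    return (ns, value) if sep else ("general", tag)

structured = parse_tag("artist:some_artist")
plain = parse_tag("smile")
```

Namespacing like this lets a retrained model or a downstream curator filter tags by category (artist, character, general) without maintaining separate tag lists.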

Why It Matters

Open-source anime tagging model enables high-quality auto-labeling for fans, researchers, and curators.