Research & Papers

ACAT: New annotation tool cuts ABSA labeling to a 31-second median per example

Automated ETL pipeline and built-in reliability metrics streamline sentiment dataset creation.

Deep Dive

Aspect-Based Sentiment Analysis (ABSA) requires high-quality datasets labeled by multiple annotators, but existing tools treat their output as flat files, forcing researchers to consolidate annotations and compute reliability metrics by hand. To address this, Ana-Maria Luisa Mocanu, Ciprian-Octavian Truica, and Elena-Simona Apostol, a team from Romania, have developed ACAT (Aspect-based sentiment analysis Collaborative Annotation Tool). The web-based platform natively supports four core ABSA workflows: Aspect-Category Sentiment Analysis, Clause-Level Segmentation, Aspect-Term Sentiment Analysis with character-level positions, and Aspect Sentiment Triplet Extraction with dual span offsets (one for the aspect term, one for the opinion term). Its key innovation is an automated Extract, Transform, Load (ETL) pipeline that aligns collaborative annotations and computes Inter-Annotator Agreement (IAA) metrics directly at export, eliminating custom scripts and manual reconciliation.
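To make "character-level positions" and "dual span offsets" concrete, here is a minimal sketch of how such labels and the "align" step of an ETL pipeline might look. It does not reproduce ACAT's actual schema or code: the record classes `AspectTerm` and `Triplet`, the function `align`, and the example labels are all hypothetical, assuming half-open `[start, end)` character offsets and a simple pivot from per-annotator files to per-example records.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AspectTerm:
    """Hypothetical ATSA label: an aspect term located by character offsets [start, end)."""
    start: int
    end: int
    polarity: str  # e.g. "positive", "negative", "neutral"


@dataclass(frozen=True)
class Triplet:
    """Hypothetical ASTE label with dual span offsets: one for the aspect, one for the opinion."""
    aspect: tuple[int, int]
    opinion: tuple[int, int]
    polarity: str


def align(annotations: dict[str, dict[str, list]]) -> dict[str, dict[str, list]]:
    """Pivot {annotator: {example_id: labels}} into {example_id: {annotator: labels}}.

    Examples an annotator skipped get an empty label list, so every example
    ends up with one entry per annotator, ready for agreement computation.
    """
    annotators = sorted(annotations)
    example_ids = sorted({eid for per_ann in annotations.values() for eid in per_ann})
    return {
        eid: {ann: annotations[ann].get(eid, []) for ann in annotators}
        for eid in example_ids
    }


# Illustrative review; offsets below index into this string.
text = "The pasta was great but the service was slow."
raw = {
    "A": {
        "r1": [Triplet((4, 9), (14, 19), "positive"),
               Triplet((28, 35), (40, 44), "negative")],
        "r2": [Triplet((0, 4), (5, 9), "neutral")],  # another (unshown) review
    },
    "B": {"r1": [Triplet((4, 9), (14, 19), "positive")]},
}
aligned = align(raw)
```

Here `aligned["r1"]` holds both annotators' triplets side by side, and `aligned["r2"]["B"]` is an empty list, which makes B's missing annotation explicit rather than silently dropping the example.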

In preliminary validation on 1,002 restaurant reviews annotated by two experts with different skill levels, ACAT achieved a median annotation time of 31.58 seconds per example. Raw IAA ranged from 0.78 to 0.86 across all tasks, indicating high agreement, although raw scores are not corrected for agreement expected by chance. Accepted at DaWaK 2026, ACAT offers an end-to-end solution for building ABSA datasets, reducing friction for NLP researchers and practitioners. By automating data consolidation and quality checks, it enables faster, more reproducible dataset creation, a critical need for training robust sentiment models.
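The article does not say which metric "raw IAA" refers to. A common reading is observed percent agreement, which the sketch below computes alongside Cohen's kappa to show how chance correction changes the number, plus the median used for annotation time. The label lists and timings are invented for illustration and are not the study's data; the function names are hypothetical.

```python
from statistics import median


def raw_agreement(labels_a: list[str], labels_b: list[str]) -> float:
    """Observed (raw) agreement: fraction of items both annotators labeled identically."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: raw agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    p_o = raw_agreement(labels_a, labels_b)
    categories = set(labels_a) | set(labels_b)
    # Chance agreement: product of each annotator's marginal rate per category.
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical category
    return (p_o - p_e) / (1 - p_e)


# Hypothetical per-item sentiment labels from two annotators.
a = ["pos", "pos", "neg", "neu", "pos"]
b = ["pos", "neg", "neg", "neu", "pos"]
times_s = [22, 36, 30, 41]  # hypothetical seconds spent per example

print(raw_agreement(a, b))  # 0.8
print(cohen_kappa(a, b))    # about 0.6875
print(median(times_s))      # 33.0
```

On the same labels, kappa (about 0.69) falls below raw agreement (0.8), which is why chance-corrected figures are worth reporting next to raw ones. The median is a natural choice for timing because a few unusually slow examples do not skew it the way they would skew a mean.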

Key Points
  • Supports 4 ABSA workflows including Aspect Sentiment Triplet Extraction with dual span offsets.
  • Automated ETL pipeline aligns multi-annotator data and computes IAA at export, no manual scripting needed.
  • Median annotation time of 31.58 seconds and IAA of 0.78–0.86 on 1,002 restaurant reviews with two annotators.

Why It Matters

ACAT removes manual bottlenecks in ABSA dataset creation, accelerating development of reliable sentiment models.