Research & Papers

RoBERTa-based tool classifies manner/result verbs with 89.6% accuracy

LLM-generated annotations train a classifier to analyze verb semantics at scale.

Deep Dive

A team of linguists and computer scientists developed a computational method to classify manner verbs (which describe how an action is performed) and result verbs (which describe the outcome) at scale. Annotating this distinction by hand is labor-intensive, so annotated data are scarce. The researchers gave large language models linguistically informed prompts to generate sentence-level annotations for two corpora, MASC and InterCorp, extending coverage beyond the previously annotated portions of VerbNet to 436 semantic classes.
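
The annotation step can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the prompt wording, the two-label scheme, and the names `build_prompt`, `parse_label`, `annotate`, and the `ask_llm` callable are all hypothetical, and a real LLM client would have to be supplied as `ask_llm`.

```python
# Hypothetical sketch of LLM-based manner/result annotation.
# The prompt text, label set, and function names are illustrative assumptions,
# not the prompts used in the research described above.
from dataclasses import dataclass
from typing import Callable, Optional

LABELS = ("manner", "result")

PROMPT_TEMPLATE = (
    "Manner verbs describe how an action is performed (e.g. 'wipe', 'run').\n"
    "Result verbs describe the outcome of an action (e.g. 'clean', 'break').\n"
    "In the sentence below, does the verb '{verb}' express manner or result?\n"
    "Sentence: {sentence}\n"
    "Answer with one word: manner or result."
)


@dataclass
class Annotation:
    sentence: str
    verb: str
    label: Optional[str]  # None when the model's answer could not be parsed


def build_prompt(sentence: str, verb: str) -> str:
    """Fill the (hypothetical) template for one verb occurrence."""
    return PROMPT_TEMPLATE.format(verb=verb, sentence=sentence)


def parse_label(response: str) -> Optional[str]:
    """Map a free-text answer to a label; return None if it is not in LABELS."""
    words = response.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!\"'")
    return first if first in LABELS else None


def annotate(sentence: str, verb: str, ask_llm: Callable[[str], str]) -> Annotation:
    """Query the LLM (any callable taking a prompt string) and record its label."""
    response = ask_llm(build_prompt(sentence, verb))
    return Annotation(sentence, verb, parse_label(response))


if __name__ == "__main__":
    # Stand-in for a real LLM client: always answers "Manner."
    fake_llm = lambda prompt: "Manner."
    print(annotate("She wiped the table.", "wiped", fake_llm))
```

Returning `None` for unparseable answers, rather than guessing a label, keeps malformed LLM output out of the synthetic training set; the article does not say how the authors handled such responses.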

They then fine-tuned a RoBERTa-based classifier on these synthetic labels and evaluated it on three held-out gold-standard datasets, including a new expert-annotated set. The model reached up to 89.6% average accuracy, which makes it a promising scalable measurement tool. The authors note that borderline cases and verbs that mix manner and result still need further validation before downstream developmental research can rely on it.
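
The reported figure is an average over held-out evaluation sets. Assuming "average accuracy" means the unweighted mean of per-dataset accuracies (the article does not specify), it could be computed as in the sketch below; the dataset names and toy labels are invented for illustration, and `pooled_accuracy` is included only for comparison.

```python
# Hypothetical sketch: computing an "average accuracy" over several held-out
# gold-standard sets. Assumes the average is an unweighted (macro) mean of
# per-dataset accuracies; dataset names and labels below are made up.
from typing import Dict, List, Tuple

Split = Tuple[List[str], List[str]]  # (gold labels, predicted labels)


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of items whose predicted label matches the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_average_accuracy(results: Dict[str, Split]) -> Tuple[Dict[str, float], float]:
    """Per-dataset accuracies and their unweighted mean."""
    per_set = {name: accuracy(gold, pred) for name, (gold, pred) in results.items()}
    return per_set, sum(per_set.values()) / len(per_set)


def pooled_accuracy(results: Dict[str, Split]) -> float:
    """Accuracy over all items from all datasets combined."""
    correct = sum(sum(g == p for g, p in zip(gold, pred)) for gold, pred in results.values())
    total = sum(len(gold) for gold, _ in results.values())
    return correct / total


if __name__ == "__main__":
    toy = {
        "set_a": (["manner", "result", "result", "manner"],
                  ["manner", "result", "manner", "manner"]),  # 3/4 correct
        "set_b": (["result"] * 4, ["result"] * 4),              # 4/4 correct
        "set_c": (["result", "manner"], ["manner", "manner"]),  # 1/2 correct
    }
    per_set, avg = macro_average_accuracy(toy)
    print(per_set)                                # {'set_a': 0.75, 'set_b': 1.0, 'set_c': 0.5}
    print(f"macro: {avg:.2f}")                    # macro: 0.75
    print(f"pooled: {pooled_accuracy(toy):.2f}")  # pooled: 0.80
```

With these toy labels the unweighted mean (0.75) differs from accuracy pooled over all items (0.80), because the small `set_c` counts as much as the larger sets; knowing which averaging a paper uses matters when comparing reported numbers.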

Key Points
  • Linguistically informed LLM prompts produced sentence-level annotations covering 436 verb classes in the MASC and InterCorp corpora.
  • RoBERTa classifier achieved average accuracy up to 89.6% on three held-out gold-standard evaluations.
  • Tool supports automated verb semantic analysis for developmental language research, reducing manual annotation bottlenecks.

Why It Matters

Automates a previously manual linguistic classification, enabling large-scale studies of child verb learning and event semantics.