Research & Papers

DiscoExplorer offers open interface for 16-language discourse relations

Explore cause and concession across languages with new open source tool

Deep Dive

Studying how ideas connect across languages, such as causal relations (A because B) or concessions (A although B), is central to computational linguistics but difficult in practice because datasets are fragmented and querying them is complex. Amir Zeldes introduces DiscoExplorer, an open source web interface that runs on local machines, to address this. The tool integrates datasets from the DISRPT Shared Task on discourse relation classification, covering 16 languages. It provides a custom query language for searching relations, visualization dashboards for signaling devices (e.g., connectives), and support for comparative analysis across languages. Researchers can explore how different languages express similar discourse structures without writing custom scripts or wrestling with incompatible data formats.

DiscoExplorer's key contribution is lowering the barrier to multilingual discourse research. By packaging standardized data with a single query interface, it supports studies of cross-linguistic pragmatics, discourse parsing evaluation, and connective usage patterns. The tool is fully open source, which encourages community contributions and reproducibility. Example studies included in the paper demonstrate querying for specific relations and visualizing their distribution. For NLP professionals working on discourse-aware models or linguists examining rhetorical structures, DiscoExplorer replaces custom scripting with interactive queries, making it a practical asset for both research and education.
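
To make the idea of a cross-language relation query concrete, here is a minimal Python sketch of the kind of ad hoc script such an interface is meant to replace. It does not use DiscoExplorer's query language or the DISRPT file formats. The rows, the relation labels, and the function name relation_counts are all invented for illustration.

```python
from collections import Counter, defaultdict

# Toy rows invented for illustration; NOT taken from any DISRPT dataset.
# Each row: (language code, relation label, connective or None).
ROWS = [
    ("eng", "causal", "because"),
    ("eng", "concession", "although"),
    ("eng", "causal", "since"),
    ("deu", "causal", "weil"),
    ("deu", "concession", "obwohl"),
    ("spa", "causal", "porque"),
]


def relation_counts(rows, relation):
    """Count, per language, the connectives that signal one relation type."""
    counts = defaultdict(Counter)
    for lang, label, connective in rows:
        # Keep only rows with the requested label and an explicit connective.
        if label == relation and connective is not None:
            counts[lang][connective] += 1
    return {lang: dict(per_lang) for lang, per_lang in counts.items()}


causal = relation_counts(ROWS, "causal")
for lang in sorted(causal):
    print(lang, causal[lang])
```

Even this toy version assumes every language's data already shares one schema and one label set, which is the standardization the article credits the tool with providing.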

Key Points
  • DiscoExplorer covers 16 languages from the DISRPT Shared Task discourse relation datasets
  • Features a custom query language and visualization for relations and signaling devices like connectives
  • Runs as an open source web interface on local machines, eliminating dependency on cloud services

Why It Matters

Democratizes cross-linguistic discourse analysis, accelerating research in multilingual NLP and pragmatics.