[P] citracer: a small CLI tool to trace where a concept comes from in a citation graph
Open-source tool parses PDFs with GROBID, traces keyword citations across papers, and creates interactive visualizations.
Developer Marc Pinet has released citracer, an open-source command-line tool that automates a tedious part of academic research: tracing where a specific concept originates in a citation network. The tool takes a research PDF and a keyword as input, uses GROBID (a machine learning-based PDF parser) to extract the bibliography and text, then identifies which references are cited near each occurrence of the keyword. It downloads those referenced papers from arXiv or OpenReview when available and repeats the process on each one, recursively walking the citation graph to map how the concept passes from paper to paper.
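The recursive walk described above can be illustrated with a minimal sketch. This is not citracer's code: the `CORPUS` dictionary, the paper ids, and the functions `refs_near_keyword` and `trace` are all hypothetical, and hard-coded sentences stand in for the text and bibliography that GROBID would extract from real PDFs.

```python
from collections import deque

# Hypothetical in-memory corpus: paper id -> list of (sentence, cited paper ids).
# A real pipeline would fill this from parsed PDFs; here it is hard-coded.
CORPUS = {
    "survey": [
        ("We build on channel attention for time series.", ["attn", "tsmix"]),
        ("Training uses the Adam optimizer.", ["adam"]),
    ],
    "attn": [
        ("Channel attention was first proposed for image models.", ["senet"]),
    ],
    "tsmix": [("We use an MLP mixer backbone.", ["mixer"])],
    "senet": [("We introduce a squeeze step.", [])],
}


def refs_near_keyword(paper_id, keyword):
    """Return ids cited in sentences of `paper_id` that mention `keyword`."""
    found = []
    for sentence, cited in CORPUS.get(paper_id, []):
        if keyword.lower() in sentence.lower():
            for ref in cited:
                if ref not in found:
                    found.append(ref)
    return found


def trace(root, keyword, max_depth=3):
    """Breadth-first walk returning (citing, cited) edges tied to the keyword."""
    edges = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand papers at the depth limit
        for ref in refs_near_keyword(paper, keyword):
            edges.append((paper, ref))
            if ref not in seen:  # expand each paper only once
                seen.add(ref)
                queue.append((ref, depth + 1))
    return edges


edges = trace("survey", "channel attention")
```

In this toy corpus the walk follows "channel attention" from `survey` to `attn` and on to `senet`, while the Adam citation is ignored because its sentence never mentions the keyword. The depth limit and the `seen` set are design choices of the sketch that keep the recursion bounded on cyclic or very large graphs.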
The output is an interactive HTML visualization that researchers can explore to see how ideas propagate through the literature. A separate "reverse" mode uses Semantic Scholar's citation contexts API to find papers that cite a given work specifically in connection with a keyword, without requiring PDF downloads. The tool currently works best with machine learning and computer science papers, which reflects GROBID's domain strengths. Pinet also acknowledges limitations, including dependence on Semantic Scholar's coverage and API rate limits for free users.
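The filtering step behind a reverse lookup might look roughly like the sketch below. It makes no network calls. The response shape (a `data` list whose items carry `contexts` and a `citingPaper` object) is an assumption modeled on Semantic Scholar's paper-citations endpoint, and the function name `citing_papers_about` and the sample records are invented for illustration.

```python
def citing_papers_about(response, keyword):
    """Keep citing papers whose citation contexts mention `keyword`."""
    needle = keyword.lower()
    matches = []
    for item in response.get("data", []):
        # `contexts` may be missing or null when no snippet is available.
        contexts = item.get("contexts") or []
        hits = [c for c in contexts if needle in c.lower()]
        if hits:
            paper = item.get("citingPaper") or {}
            matches.append(
                {
                    "paperId": paper.get("paperId"),
                    "title": paper.get("title"),
                    "contexts": hits,
                }
            )
    return matches


# Invented sample in the assumed response shape (not real API output).
SAMPLE_RESPONSE = {
    "data": [
        {
            "contexts": ["We adopt channel attention as in [12]."],
            "citingPaper": {"paperId": "p1", "title": "Paper One"},
        },
        {
            "contexts": ["Optimizer settings follow [12]."],
            "citingPaper": {"paperId": "p2", "title": "Paper Two"},
        },
        {"contexts": None, "citingPaper": {"paperId": "p3", "title": "Paper Three"}},
    ]
}

about_attention = citing_papers_about(SAMPLE_RESPONSE, "channel attention")
```

Only the first sample record survives the filter, since it is the only one whose citation context mentions the keyword. Records with no context at all cannot be matched, which is one way coverage gaps in the upstream service would show up in results.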
Pinet built citracer as a personal project after growing frustrated with manually clicking through Google Scholar. Unlike broader tools such as Connected Papers, it focuses on a single question: "which papers introduced this concept mentioned in passing?" The project welcomes community contributions, including bug reports, parser improvements, and feature development.
- Parses PDF bibliographies using GROBID and traces citations near specific keywords automatically
- Generates interactive HTML visualizations of citation graphs and offers reverse lookup via Semantic Scholar API
- Currently works best on ML/CS papers and depends on external services that may have coverage gaps
Why It Matters
Automates time-consuming literature review tasks, helping researchers trace concept evolution across papers efficiently.