
Researchers compress massive 2B-citation OpenAIRE graph from TBs to just 32GB

A massive academic dataset just became accessible to anyone with a laptop.

Deep Dive

Researchers have compressed the OpenAIRE academic citation graph so that it fits on a standard computer. The original dataset contains over 200 million publications and 2 billion citations and typically requires terabytes of storage. The processed version shrinks it to 32GB while preserving the full network structure. The researchers also provide a simple data format and a Python pipeline so the community can use the graph and keep it updated.
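To give a feel for why a graph of this size can fit in tens of gigabytes, here is a rough sketch in Python. It assumes each citation is stored as a pair of fixed-width integer IDs and shows a compressed sparse row (CSR) layout on a toy graph. The function names `edge_list_bytes` and `to_csr` are hypothetical, and nothing here describes the actual file format or pipeline the researchers released; only the 2 billion citation count comes from the article.

```python
import numpy as np


def edge_list_bytes(num_edges: int, id_bytes: int) -> int:
    """Bytes needed if every edge is a (source, target) pair of fixed-width integer IDs."""
    return num_edges * 2 * id_bytes


def to_csr(num_nodes: int, sources: np.ndarray, targets: np.ndarray):
    """Convert an edge list into CSR arrays (indptr, indices).

    The outgoing citations of node i are indices[indptr[i]:indptr[i + 1]].
    """
    order = np.argsort(sources, kind="stable")   # group edges by source node
    indices = targets[order]
    counts = np.bincount(sources, minlength=num_nodes)
    indptr = np.zeros(num_nodes + 1, dtype=np.int64)
    np.cumsum(counts, out=indptr[1:])            # running total of out-degrees
    return indptr, indices


# Citation count reported in the article; the ID widths are assumptions.
NUM_CITATIONS = 2_000_000_000
print(edge_list_bytes(NUM_CITATIONS, 8) / 1e9)   # 32.0 (GB, 64-bit IDs)
print(edge_list_bytes(NUM_CITATIONS, 4) / 1e9)   # 16.0 (GB, 32-bit IDs)

# Toy graph: paper 0 cites papers 1 and 2, paper 2 cites paper 0.
src = np.array([0, 0, 2], dtype=np.int64)
dst = np.array([1, 2, 0], dtype=np.int64)
indptr, indices = to_csr(3, src, dst)
print(indices[indptr[0]:indptr[1]])              # citations made by paper 0
```

Under these assumptions, a plain edge list with 64-bit IDs lands at about 32GB, which happens to match the reported size. That match is only suggestive and should not be read as confirmation of how the dataset is actually stored.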

Why It Matters

This unlocks large-scale network analysis for researchers and developers without access to massive computing infrastructure.
