Doctoral Theses in France (1985-2025): A Linked Dataset of PhDs, Academic Networks, and Institutions

A new structured dataset maps 40 years of French doctoral theses, supervisors, and institutional links for AI analysis.

Deep Dive

Researchers William Aboucaya and Dastan Jasim have published a structured, linked dataset covering all doctoral theses defended in France over the 40 years from 1985 to 2025. The dataset is built by aggregating and reconciling records from the French national thesis platform with additional authority and bibliographic databases. The processing pipeline corrects inconsistent identifiers, enriches records for people and institutions, and constructs derived variables that describe academic careers, supervision relationships, jury participation, and institutional affiliations. The result is a unified graph of the French academic system that links theses, people, and institutions.
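To make the identifier-correction step more concrete, here is a minimal Python sketch of one way such reconciliation could work. It is not the authors' pipeline: the field names (`person_id`, `name`, `thesis_id`), the normalization rule, and the sample records are all hypothetical, chosen only to illustrate merging records that refer to the same person under differently formatted identifiers.

```python
import re
from collections import defaultdict


def normalize_identifier(raw):
    """Uppercase the identifier and drop every character that is not a
    letter or digit. Returns None when nothing usable remains.
    (Hypothetical rule; the real identifier scheme may differ.)"""
    if raw is None:
        return None
    cleaned = re.sub(r"[^0-9A-Za-z]", "", raw).upper()
    return cleaned or None


def merge_person_records(records):
    """Group records by normalized identifier.

    Returns (merged, unresolved): merged maps each normalized identifier
    to the name variants and thesis ids seen for it; unresolved lists
    records that carry no usable identifier."""
    merged = defaultdict(lambda: {"names": set(), "thesis_ids": set()})
    unresolved = []
    for rec in records:
        key = normalize_identifier(rec.get("person_id"))
        if key is None:
            unresolved.append(rec)
            continue
        merged[key]["names"].add(rec["name"])
        merged[key]["thesis_ids"].add(rec["thesis_id"])
    return dict(merged), unresolved


# Invented sample records, for illustration only.
RAW_RECORDS = [
    {"person_id": " 0123-4567x ", "name": "Dupont, Marie", "thesis_id": "t1"},
    {"person_id": "01234567X", "name": "Marie Dupont", "thesis_id": "t2"},
    {"person_id": "", "name": "Jean Martin", "thesis_id": "t3"},
]

merged, unresolved = merge_person_records(RAW_RECORDS)
```

In this sketch the first two records collapse into one person because their identifiers match after normalization, while the record without an identifier is set aside rather than guessed at. A real pipeline would likely need additional matching against authority files for such cases.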

The resulting dataset provides structured information at three levels: the thesis, the individual (PhD candidate, supervisor, jury member), and the institution. This multi-level structure is designed to support both descriptive statistics and relational network analyses. Researchers can use it to study patterns in doctoral education, map the formation and evolution of academic networks, analyze supervision practices, and track institutional collaboration over decades. The paper, shared on arXiv, documents the data sources, the processing pipeline, and known limitations to facilitate reuse and future extensions by the research community.
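The three-level structure lends itself to simple graph queries. The Python sketch below shows how supervision links could be derived from thesis-level records and then traversed. The `Thesis` record layout, its field names, and the sample data are assumptions made for illustration; the published dataset's actual schema is documented in the paper and may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Thesis:
    """Hypothetical thesis-level record that points to people and an
    institution by identifier."""
    thesis_id: str
    year: int
    candidate_id: str
    institution_id: str
    supervisor_ids: tuple = ()
    jury_ids: tuple = ()


def supervision_edges(theses):
    """Map each supervisor id to the sorted list of candidates supervised."""
    edges = defaultdict(set)
    for thesis in theses:
        for supervisor in thesis.supervisor_ids:
            edges[supervisor].add(thesis.candidate_id)
    return {sup: sorted(cands) for sup, cands in edges.items()}


def academic_descendants(edges, person_id):
    """Collect everyone reachable from person_id by following supervision
    links (students, students of students, and so on)."""
    seen = set()
    stack = [person_id]
    while stack:
        current = stack.pop()
        for candidate in edges.get(current, []):
            if candidate not in seen:
                seen.add(candidate)
                stack.append(candidate)
    return seen


# Invented sample records, for illustration only.
SAMPLE = [
    Thesis("t1", 1990, "p2", "inst_a", ("p1",), ("p9",)),
    Thesis("t2", 2002, "p3", "inst_b", ("p2",), ("p1",)),
    Thesis("t3", 2015, "p4", "inst_b", ("p3", "p2"), ()),
]

edges = supervision_edges(SAMPLE)
lineage = academic_descendants(edges, "p1")
```

Here `p1` supervised `p2`, who later supervised `p3` and co-supervised `p4`, so the traversal from `p1` reaches all three. The same pattern extends to jury co-participation or institution-level links by building edges from `jury_ids` or `institution_id` instead.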

Key Points
  • Covers 40 years (1985-2025) of French doctoral theses, aggregated from national metadata sources.
  • Creates a linked graph of theses, individuals (candidates/supervisors/juries), and institutions for network analysis.
  • Enables large-scale research on academic careers, supervision networks, and institutional collaboration over time.

Why It Matters

The dataset provides a foundation for AI-driven analysis of academic systems, career trajectories, and knowledge production at scale.