New chapter traces LLMs' role in scientific concept analysis
Researchers review how LLMs both advance and inherit challenges in studying scientific ideas...
A new chapter by Michael Zichert and Arno Simons, published in the book Understanding Science with Large Language Models?, provides a comprehensive review of how LLMs fit into the longer history of computational approaches to concept analysis in the history, philosophy, and sociology of science (HPSS). The chapter reconstructs three pre-LLM strands: early digital methods in HPSS, distributional approaches from digital history, and lexical semantic change detection. It then outlines the main challenges in those workflows: corpus construction, operationalization, modeling choices, and evaluation.
Turning to the LLM era, the authors give a short introduction to LLMs before reviewing LLM-based lexical semantic change detection and relevant HPSS case studies. They then revisit the earlier methodological questions, showing how issues of corpus construction, model choice and training data, operationalization trade-offs, and evaluation and interpretation manifest in LLM-based workflows. The chapter serves as both a historical survey and a practical guide for researchers using LLMs to study scientific concepts. It emphasizes that while LLMs offer powerful new tools, they also inherit and sometimes amplify longstanding methodological problems.
- Chapter synthesizes three pre-LLM methodological strands: early digital methods, distributional approaches, and lexical semantic change detection
- LLM-based workflows face persistent challenges in corpus construction, model choice, operationalization trade-offs, and evaluation
- Includes a review of LLM-based case studies in history, philosophy, and sociology of science (HPSS)
Why It Matters
Helps researchers understand both opportunities and pitfalls of using LLMs to analyze scientific concepts.