Bridging Behavioral Biometrics and Source Code Stylometry: A Survey of Programmer Attribution
A new survey maps how AI can fingerprint developers with 90%+ accuracy using their coding quirks.
A team of researchers has published a comprehensive survey that maps the emerging field of using artificial intelligence to identify software developers based solely on their source code. The paper, 'Bridging Behavioral Biometrics and Source Code Stylometry: A Survey of Programmer Attribution,' systematically reviews 47 studies published between 2012 and 2025. The analysis reveals that current techniques, often using machine learning models, can achieve high accuracy by analyzing stylistic 'fingerprints' in code, such as variable naming conventions, spacing habits, and structural patterns. This area, known as programmer attribution, has significant applications in software forensics, plagiarism detection, and insider threat identification.
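To make the idea of a stylistic 'fingerprint' concrete, here is a minimal Python sketch that turns a code snippet into a handful of numeric style features. It is an illustration only, not the feature set used by the survey or by any study it reviews. The function name `style_features`, the feature names, and the regular expressions are all assumptions chosen for brevity, and real systems typically use far richer lexical, layout, and syntactic features.

```python
import re


def style_features(source: str) -> dict:
    """Compute a few toy naming and layout features from source text.

    Hypothetical feature set for illustration; not taken from the survey.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    identifiers = re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", source)

    # Naming habits: snake_case versus camelCase identifiers.
    snake = sum(1 for name in identifiers if "_" in name.strip("_"))
    camel = sum(1 for name in identifiers if re.search(r"[a-z][A-Z]", name))

    # Spacing habits: tab indentation and spaces around '='.
    tab_indented = sum(1 for ln in lines if ln.startswith("\t"))
    spaced_ops = len(re.findall(r" = ", source))
    tight_ops = len(re.findall(r"(?<![ =!<>])=(?![ =])", source))

    n_ids = max(len(identifiers), 1)
    n_lines = max(len(lines), 1)
    return {
        "snake_case_ratio": snake / n_ids,
        "camel_case_ratio": camel / n_ids,
        "tab_indent_ratio": tab_indented / n_lines,
        "spaced_assignment_ratio": spaced_ops / max(spaced_ops + tight_ops, 1),
        "mean_line_length": sum(len(ln) for ln in lines) / n_lines,
    }


# Two snippets with the same behavior but different habits.
sample_a = "def add_numbers(first_value, second_value):\n    total_sum = first_value + second_value\n    return total_sum\n"
sample_b = "def addNumbers(firstValue, secondValue):\n\ttotalSum=firstValue+secondValue\n\treturn totalSum\n"

print(style_features(sample_a))
print(style_features(sample_b))
```

The two samples compute the same result, yet their feature values differ sharply on naming and spacing. That difference is the kind of signal a machine learning model could learn to associate with an individual author.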
The survey, currently a preprint on arXiv, consolidates research across software engineering, security, and digital forensics into a unified taxonomy. It finds that the field is dominated by 'closed-world' authorship attribution (identifying an author from a known set of candidates) using stylometric features. However, the authors note critical gaps: heavy reliance on a small number of benchmark datasets, limited exploration of behavioral signals (such as typing rhythms or IDE interactions), and underdeveloped areas such as authorship verification in open-world settings. The survey offers a framework to guide future research toward more reproducible, robust, and ethically grounded methods for analyzing digital identity through code.
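The difference between closed-world attribution and open-world verification can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a method described in the survey. The nearest-centroid classifier, the two-dimensional feature vectors, the author names, and the distance threshold of 0.2 are all hypothetical choices made for clarity.

```python
import math


def centroid(vectors):
    """Average a list of equal-length feature vectors into one profile."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attribute_closed_world(sample, profiles):
    """Return the nearest known author, even if the true author is not in the set."""
    return min(profiles, key=lambda author: distance(sample, profiles[author]))


def verify_open_world(sample, profile, threshold):
    """Accept a claimed author only if the sample is close enough to their profile."""
    return distance(sample, profile) <= threshold


# Hypothetical feature vectors: [snake_case_ratio, tab_indent_ratio]
training = {
    "alice": [[0.9, 0.0], [0.8, 0.1]],
    "bob": [[0.1, 0.9], [0.2, 1.0]],
}
profiles = {author: centroid(vecs) for author, vecs in training.items()}

known_style = [0.85, 0.05]  # resembles alice's past code
outsider = [0.6, 0.5]       # written by someone outside the candidate set

print(attribute_closed_world(known_style, profiles))
print(attribute_closed_world(outsider, profiles))
print(verify_open_world(outsider, profiles["alice"], threshold=0.2))
```

A closed-world attributor must pick one of the known candidates, so the outsider's code is still assigned to whichever profile is nearest. Verification instead asks a yes-or-no question against a single profile and can reject the sample, which is why it suits open-world settings where the true author may be unknown.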
- Analyzes 47 research papers published between 2012 and 2025 on identifying programmers with AI and machine learning.
- Finds a strong focus on stylometric features (e.g., code formatting, naming), which achieve high accuracy in controlled tests.
- Highlights major gaps in behavioral biometrics research, authorship verification, and the reproducibility of methods.
Why It Matters
The survey gives researchers and practitioners a shared framework for software security and forensic investigations, and for weighing the privacy implications of code authorship.