LLM framework lets anyone query crash data in plain English
A rule-based validation layer corrected errors in 29% of test queries before execution, keeping results reproducible.
A new research paper from Mahdi Azhdari and Eric J. Gonzales introduces a generative AI framework that lets community members, local agencies, and school committees query complex transportation safety data in plain English instead of SQL or GIS tools. The system was tested on a statewide Massachusetts database that integrates crash records, roadway attributes, and geospatial layers (schools, crosswalks, bus stops, municipal boundaries). A large language model interprets user intent and translates it into a structured semantic frame. A deterministic rule-based layer then validates that frame and compiles it into a typed directed acyclic graph of spatial operations, which is executed against a PostGIS database. This bounded design separates flexible language understanding from rigid, reviewable execution.
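The hand-off from language model to rule-based layer can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the paper: the `Frame` fields, the alias and unit tables, and the node names are all hypothetical. It shows only the general idea of correcting a loosely worded frame against a fixed vocabulary and then compiling it into a small graph of dependent operations. The type checking that a typed graph would carry is omitted.

```python
from dataclasses import dataclass, replace

# Hypothetical schema vocabulary; the paper's actual rules are not shown here.
KNOWN_LAYERS = {"crashes", "schools", "crosswalks", "bus_stops", "municipalities"}
LAYER_ALIASES = {"crash": "crashes", "school": "schools", "bus stops": "bus_stops"}
METERS_PER_UNIT = {"m": 1.0, "meters": 1.0, "ft": 0.3048, "feet": 0.3048}


@dataclass(frozen=True)
class Frame:
    """Hypothetical semantic frame an LLM might emit for a proximity query."""
    target: str      # layer to return, e.g. "crashes"
    near: str        # reference layer, e.g. "schools"
    distance: float  # buffer distance, expressed in `unit`
    unit: str
    years_back: int


def validate(frame):
    """Return (corrected_frame, corrections); raise ValueError if unfixable."""
    corrections = []
    layers = {}
    for field in ("target", "near"):
        raw = getattr(frame, field)
        name = raw.strip().lower()
        name = LAYER_ALIASES.get(name, name)
        if name not in KNOWN_LAYERS:
            raise ValueError(f"unknown layer {raw!r}")
        if name != raw:
            corrections.append(f"{field}: {raw!r} -> {name!r}")
        layers[field] = name
    unit = frame.unit.strip().lower()
    if unit not in METERS_PER_UNIT:
        raise ValueError(f"unknown unit {frame.unit!r}")
    if frame.distance <= 0 or frame.years_back <= 0:
        raise ValueError("distance and years_back must be positive")
    meters = frame.distance * METERS_PER_UNIT[unit]
    if frame.unit != "m":
        corrections.append(f"distance: {frame.distance} {frame.unit} -> {meters:.1f} m")
    # Normalize everything to canonical layer names and meters.
    return replace(frame, distance=meters, unit="m", **layers), corrections


def compile_dag(frame):
    """Compile a validated frame into named nodes: (operation, inputs, params).

    Each node's inputs refer only to nodes listed before it, so the
    structure is acyclic by construction.
    """
    return {
        "target": ("load_layer", (), {"layer": frame.target}),
        "recent": ("filter_recent", ("target",), {"years": frame.years_back}),
        "near": ("load_layer", (), {"layer": frame.near}),
        "result": ("within_distance", ("recent", "near"), {"meters": frame.distance}),
    }


# A frame with sloppy layer names and imperial units, as an LLM might produce.
frame, fixes = validate(Frame("Crash", "school", 500, "ft", 3))
dag = compile_dag(frame)
```

In this sketch the language model is never trusted to produce executable output: anything outside the fixed vocabulary is either mapped to a known value and logged, or rejected outright.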
In evaluations, all queries executed successfully, and the validation layer corrected errors in 29% of them, which highlights the gap between ambiguous natural language and the strict schema that reliable safety analysis requires. The framework removes the usual dependence on GIS expertise while preserving reproducibility and governance, both key concerns for public-sector AI. The authors argue this is a practical path to democratizing access to transportation safety data, enabling stakeholders such as town planning boards or parent-teacher groups to ask questions like “show me all crashes within 500 feet of schools in the last three years” without specialized training or software licenses.
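For a sense of what the quoted example question might become once compiled, here is a hedged Python sketch that builds a parameterized PostGIS query. The table and column names (`crashes`, `schools`, `geom`, `crash_id`, `crash_date`) are hypothetical, and the sketch assumes the geometries are stored in a longitude/latitude coordinate system so that casting to `geography` yields distances in meters. The SQL the framework actually generates is not shown in the source.

```python
FEET_TO_METERS = 0.3048  # exact, by definition of the international foot


def crashes_near_schools_query(distance_feet, years_back):
    """Build (sql, params) for crashes within a distance of any school.

    Placeholders use the pyformat style accepted by drivers such as psycopg;
    values are passed separately instead of being formatted into the SQL.
    """
    if distance_feet <= 0 or years_back <= 0:
        raise ValueError("distance_feet and years_back must be positive")
    sql = """
        SELECT DISTINCT c.crash_id, c.crash_date
        FROM crashes AS c
        JOIN schools AS s
          ON ST_DWithin(c.geom::geography, s.geom::geography, %(meters)s)
        WHERE c.crash_date >= CURRENT_DATE - make_interval(years => %(years)s)
    """
    params = {"meters": distance_feet * FEET_TO_METERS, "years": years_back}
    return sql, params


sql, params = crashes_near_schools_query(500, 3)
```

Keeping the distance and time window as bound parameters means the same reviewed query text can be rerun with different values, which is one simple way to keep results reproducible.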
- The LLM translates natural language queries into structured semantic frames; a rule-based validation layer then corrected errors in 29% of queries before execution.
- The system was evaluated on a Massachusetts database combining crash records, roadway attributes, schools, bus stops, crosswalks, and municipal boundaries.
- All evaluation queries executed successfully against a PostGIS database, with results fully reproducible and schema-grounded.
Why It Matters
Democratizes access to transportation safety data, letting non-technical stakeholders make data-driven decisions without GIS expertise.