AI Safety

Self-hosted Lecture-to-Quiz: Local LLM MCQ Generation with Deterministic Quality Control

A new local LLM pipeline creates quizzes from PDFs without sending data to external APIs, ensuring privacy and deterministic quality.

Deep Dive

Researcher Seine A. Shintani has published an arXiv paper describing an academic pipeline that enables educators to automatically generate multiple-choice quizzes from lecture materials while keeping all data local. The system, detailed in the paper 'Self-hosted Lecture-to-Quiz: Local LLM MCQ Generation with Deterministic Quality Control,' processes PDFs through a local large language model (LLM) without sending any content to external API services. This approach directly addresses growing privacy concerns in educational technology.

In testing, the pipeline processed three short 'dummy lectures' on topics like information theory and thermodynamics, generating 120 accepted MCQ candidates from 122 total attempts. Every accepted question passed rigorous, automated quality control checks including JSON schema validation, verification of a single correct answer, and numeric/constant equivalence testing. An additional warning layer flagged 8 out of the 120 items for residual risks like duplicated answer choices.
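The split between hard quality-control checks (which reject an item) and a softer warning layer (which only flags it) can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's actual code: the field names (`question`, `choices`, `answer`) and the four-option format are assumptions for the example.

```python
# Illustrative hard-check / warning layer for one MCQ candidate.
# Field names (question, choices, answer) are assumed, not the paper's schema.

def hard_checks(item: dict) -> list[str]:
    """Return hard-failure reasons; an empty list means the item is accepted."""
    errors = []
    # Schema check: a non-empty question string is required.
    if not isinstance(item.get("question"), str) or not item["question"].strip():
        errors.append("missing or empty question")
    choices = item.get("choices")
    if not isinstance(choices, list) or len(choices) != 4:
        errors.append("choices must be a list of 4 options")
        return errors
    # Single-correct-answer check: the answer must index exactly one choice.
    answer = item.get("answer")
    if not isinstance(answer, int) or not 0 <= answer < len(choices):
        errors.append("answer index out of range")
    return errors

def soft_warnings(item: dict) -> list[str]:
    """Soft checks that flag residual risks without rejecting the item."""
    warns = []
    choices = item.get("choices", [])
    # Duplicated answer choices (case-insensitive) are flagged, not rejected.
    if len({c.strip().lower() for c in choices}) < len(choices):
        warns.append("duplicated answer choices")
    return warns

candidate = {
    "question": "What is the basic unit of information in Shannon's theory?",
    "choices": ["bit", "joule", "mole", "candela"],
    "answer": 0,
}
print(hard_checks(candidate), soft_warnings(candidate))
```

Because every check is a deterministic predicate over plain data, the same candidate always yields the same accept/reject decision, which is what makes the quality control auditable.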

The final output is not a black-box AI model but a fully traceable, plain-text question bank in formats like JSONL and CSV, ready for import into platforms like Google Forms. This aligns with the paper's core philosophy of 'black-box minimization'—using AI for drafting, but delivering deterministic, inspectable artifacts. The work is positioned through an 'AI to Learn' (AI2L) rubric, arguing it supports privacy, accountability, and 'Green AI' by reducing cloud compute dependency.
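A plain-text question bank of this kind is trivial to transform with standard tooling, which is the practical payoff of delivering inspectable artifacts rather than a model. The sketch below converts a one-item JSONL bank to CSV; the field names and column layout are assumptions for illustration, not the paper's actual export format.

```python
# Sketch: convert a JSONL question bank to CSV rows suitable for
# spreadsheet or forms import. Field names here are illustrative.
import csv
import io
import json

# A one-line JSONL bank (one JSON object per line).
jsonl_bank = json.dumps({
    "question": "Which distribution maximizes entropy over a finite set?",
    "choices": ["uniform", "normal", "exponential", "delta"],
    "answer": 0,
})

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["question", "choice_a", "choice_b", "choice_c", "choice_d", "correct"])
for line in jsonl_bank.splitlines():
    item = json.loads(line)
    # Map the answer index to a letter label for the CSV column.
    writer.writerow([item["question"], *item["choices"], "ABCD"[item["answer"]]])

print(out.getvalue())
```

No LLM is needed at this stage: once the bank exists as text, every downstream step is ordinary, reproducible data wrangling.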

Key Points
  • Generates quizzes entirely locally using a local LLM, eliminating external API calls and the associated data-privacy risks.
  • Accepted 120 of 122 generated candidates, each passing every hard QC check (JSON schema, single-correct-answer validation).
  • Delivers final output as plain-text, traceable question banks (JSONL/CSV) for direct tool integration, removing runtime LLM dependency.

Why It Matters

Enables educational institutions to leverage AI for content creation while maintaining data privacy, reducing costs, and ensuring deterministic, auditable quality.