slidesqaqa: AI pipeline turns lecture slides into pedagogical Q&A

Four-stage LLM pipeline that analyzes both text and images from PDF slides

Deep Dive

A new open-source tool called slidesqaqa (Slide Deck Q&A Quality Assurance) tackles the challenge of generating meaningful questions from lecture slides. Built by Jim Salsman and described in a recent arXiv paper, the system is a Flask-based app that extracts both text and rendered images from PDF slide decks. It then processes that content through a four-stage large language model pipeline: window planning identifies slide groupings, deck synthesis establishes overarching goals, slide annotation creates per-slide questions and summaries, and reconciliation revises the entire deck-level output to eliminate redundancy and improve coverage.
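The flow of the four stages can be sketched roughly as follows. This is an illustrative outline only, not the project's code: every name here (Slide, plan_windows, synthesize_deck, annotate_slide, reconcile, run_pipeline) is hypothetical, the LLM is modeled as a plain prompt-to-text callable, window planning is replaced by fixed-size grouping, and reconciliation is replaced by simple duplicate removal rather than an LLM revision pass.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model client: prompt in, completion out.
LLM = Callable[[str], str]


@dataclass
class Slide:
    index: int
    text: str
    image_png: bytes = b""  # rendered page image; ignored by this text-only sketch


def plan_windows(slides, size=3):
    """Stage 1 (window planning): fixed-size grouping used here as a stand-in."""
    return [slides[i:i + size] for i in range(0, len(slides), size)]


def synthesize_deck(windows, llm):
    """Stage 2 (deck synthesis): ask the model for deck-level goals."""
    outline = "\n".join(" | ".join(s.text for s in w) for w in windows)
    return llm("List learning goals for this deck:\n" + outline)


def annotate_slide(slide, goals, llm):
    """Stage 3 (slide annotation): one summary and one question per slide."""
    return {
        "slide": slide.index,
        "summary": llm(f"Summarize: {slide.text}"),
        "questions": [llm(f"Goals: {goals}\nAsk one question about: {slide.text}")],
    }


def reconcile(annotations):
    """Stage 4 (reconciliation): drop questions already asked earlier in the deck."""
    seen = set()
    for ann in annotations:
        unique = []
        for q in ann["questions"]:
            key = q.strip().lower()
            if key not in seen:
                seen.add(key)
                unique.append(q)
        ann["questions"] = unique
    return annotations


def run_pipeline(slides, llm):
    windows = plan_windows(slides)
    goals = synthesize_deck(windows, llm)
    annotations = [annotate_slide(s, goals, llm) for w in windows for s in w]
    return {"goals": goals, "slides": reconcile(annotations)}
```

Passing the model in as a callable keeps the sketch testable with a stub; a real deployment would substitute a multimodal model call that also receives the slide images.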

The pipeline also allocates a bounded question budget to each section and filters out non-instructional slides (e.g., title or transition slides). The final output is a structured JSON annotation containing deck-level learning goals, section structure, slide-level summaries, question sets, and automatic evaluation scores. According to the paper, initial experiments on two technical lecture decks suggest that the system can produce high-fidelity, pedagogically coherent questions even for visually complex content. The working system and its software repository are publicly available, which makes slidesqaqa a practical option for educators and content creators who want to automate quiz generation from presentation materials.
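To make the budgeting and filtering ideas concrete, here is a minimal sketch under stated assumptions. The field names (kind, learning_goals, question_budget, evaluation), the slide labels, and the proportional-with-cap allocation rule are all hypothetical choices for illustration; the actual schema and budgeting policy used by slidesqaqa may differ.

```python
import json

# Assumed labels for slides that carry no instructional content.
NON_INSTRUCTIONAL = {"title", "transition"}


def filter_instructional(slides):
    """Keep only slides whose (hypothetical) 'kind' label is instructional."""
    return [s for s in slides if s["kind"] not in NON_INSTRUCTIONAL]


def allocate_budgets(sections, total, cap):
    """Split `total` questions across sections in proportion to slide count,
    giving each non-empty section at least 1 and at most `cap` questions."""
    n = sum(len(sec["slides"]) for sec in sections)
    budgets = {}
    for sec in sections:
        if not sec["slides"]:
            budgets[sec["title"]] = 0
            continue
        share = round(total * len(sec["slides"]) / n)
        budgets[sec["title"]] = min(cap, max(1, share))
    return budgets


def build_annotation(goals, sections, total=8, cap=4):
    """Assemble a deck-level annotation dict with placeholder slide entries."""
    kept = [
        {"title": s["title"], "slides": filter_instructional(s["slides"])}
        for s in sections
    ]
    budgets = allocate_budgets(kept, total, cap)
    return {
        "learning_goals": goals,
        "sections": [
            {
                "title": s["title"],
                "question_budget": budgets[s["title"]],
                "slides": [
                    {"index": sl["index"], "summary": "", "questions": []}
                    for sl in s["slides"]
                ],
            }
            for s in kept
        ],
        "evaluation": {},  # automatic evaluation scores would be filled in here
    }


example_sections = [
    {"title": "Intro", "slides": [
        {"index": 0, "kind": "title"},
        {"index": 1, "kind": "content"},
    ]},
    {"title": "Methods", "slides": [
        {"index": 2, "kind": "content"},
        {"index": 3, "kind": "content"},
        {"index": 4, "kind": "transition"},
        {"index": 5, "kind": "content"},
    ]},
]

annotation = build_annotation(["Understand the method"], example_sections)
annotation_json = json.dumps(annotation, indent=2)
```

Capping each section's budget is one simple way to keep long sections from crowding out short ones; the minimum of one question ensures every instructional section is covered.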

Key Points
  • slidesqaqa is a Flask-based system that extracts text and images from PDF slides for question generation.
  • The pipeline has four stages: window planning, deck synthesis, slide annotation, and reconciliation.
  • Initial tests on two technical decks produced high-fidelity, pedagogically coherent questions with reduced redundancy.

Why It Matters

slidesqaqa automates the creation of scaffolded quiz questions from lecture slides, which could save educators time and improve how thoroughly quizzes cover the material.