New method boosts quantized LLaMA-3.1 accuracy for qualitative analysis
Multi-pass prompt verification cuts hallucinations in 2-bit and 3-bit models...
Researchers from the University of Turku and collaborator Aisvarya Adeseye have introduced a quantization-aware multi-pass prompt verification method to improve how quantized Large Language Models (LLMs) perform in qualitative analysis. Their study, accepted for the 12th Intelligent Systems Conference 2026, evaluates LLaMA-3.1 (8B) at four quantization levels (8-bit, 4-bit, 3-bit, and 2-bit) and across different quantization types. The team used 82 interview transcripts containing both expert and non-expert responses to test how these low-bit models handle thematic extraction and frequency analysis. They found that the lower-bit models (3-bit and 2-bit) suffer from severe hallucinations and unstable outputs, especially when processing non-expert language with ambiguous terms. To address this, the method guides the model through controlled reasoning steps, filters out unreliable content, and verifies the results for each transcript before moving to the next, which significantly reduced hallucinations.
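The three stages described above (controlled steps, filtering, verification before the next transcript) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompts, the `theme | quote` output format, the verbatim-quote grounding check, and the `generate` callable (any function that maps a prompt string to model text) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt to model text.
Generate = Callable[[str], str]


def extract_candidates(generate: Generate, transcript: str) -> List[Dict[str, str]]:
    """Pass 1: ask for themes, one per line, as 'theme | supporting quote'."""
    prompt = (
        "List the themes in this interview transcript, one per line, "
        "formatted as: theme | exact supporting quote\n\n" + transcript
    )
    candidates = []
    for line in generate(prompt).splitlines():
        if "|" not in line:
            continue  # drop malformed lines instead of guessing their meaning
        theme, quote = (part.strip() for part in line.split("|", 1))
        if theme and quote:
            candidates.append({"theme": theme, "quote": quote})
    return candidates


def is_grounded(candidate: Dict[str, str], transcript: str) -> bool:
    """Pass 2 (assumed filter): keep a theme only if its quote occurs in the transcript."""
    return candidate["quote"].lower() in transcript.lower()


def is_confirmed(generate: Generate, candidate: Dict[str, str], transcript: str) -> bool:
    """Pass 3 (assumed check): ask the model a yes/no verification question."""
    prompt = (
        "Answer yes or no. Does the transcript support the theme "
        f"'{candidate['theme']}'?\n\n" + transcript
    )
    return generate(prompt).strip().lower().startswith("yes")


def analyze(generate: Generate, transcripts: List[str]) -> List[List[str]]:
    """Run all passes on one transcript before moving on to the next."""
    results = []
    for transcript in transcripts:
        verified = [
            c["theme"]
            for c in extract_candidates(generate, transcript)
            if is_grounded(c, transcript) and is_confirmed(generate, c, transcript)
        ]
        results.append(verified)
    return results
```

In this sketch a theme survives only if it passes both the grounding filter and the yes/no check, so a hallucinated theme with an invented quote is dropped before the loop reaches the next transcript.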
To validate performance, the team used two baselines: manual coding of the transcripts by human coders in NVivo software, and the output of a full-precision BF16 LLaMA-3.1 model. The BF16 model produced high-precision output but still exhibited semantic drift and hallucination, which were manually corrected. These corrected outputs were combined with the NVivo human coding to create a gold-standard ground truth (GSGT) for evaluation. Results show that 8-bit quantized models perform closest to the GSGT, while 4-bit models lose accuracy but become stable when the proposed verification method is applied. Even heavily compressed 3-bit and 2-bit models show measurable improvement with the new prompt design and verification steps. The study also highlights that performance varies significantly across different quantization types at the same bit level, suggesting that the method's effectiveness depends on the quantization type. Overall, this research provides a practical path for using low-resource LLMs in qualitative research with improved accuracy and lower computational cost.
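As a rough illustration of how a model's themes could be scored against a ground truth such as the GSGT, the Python sketch below computes set-based precision, recall, and F1 for thematic extraction and a mean absolute count error for frequency analysis. The summary does not state which metrics the study used, so these metrics, the function names, and the example themes are assumptions.

```python
from collections import Counter
from typing import Dict, List


def theme_scores(predicted: List[str], gold: List[str]) -> Dict[str, float]:
    """Set-based precision, recall, and F1 of predicted themes against gold themes."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def frequency_error(predicted: List[str], gold: List[str]) -> float:
    """Mean absolute difference in per-theme counts over every theme seen in either list."""
    pred_counts, gold_counts = Counter(predicted), Counter(gold)
    themes = set(pred_counts) | set(gold_counts)
    if not themes:
        return 0.0
    return sum(abs(pred_counts[t] - gold_counts[t]) for t in themes) / len(themes)


# Hypothetical theme mentions, one list entry per coded occurrence.
gold = ["cost", "cost", "trust", "usability"]
predicted = ["cost", "trust", "trust", "speed"]
scores = theme_scores(predicted, gold)
error = frequency_error(predicted, gold)
```

Here a theme that appears only in `predicted` (such as `speed`) lowers precision and raises the frequency error, which is one simple way to surface hallucinated themes.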
- LLaMA-3.1 (8B) tested at 8-bit, 4-bit, 3-bit, and 2-bit quantization on qualitative analysis of 82 interview transcripts
- Multi-pass prompt verification reduces hallucinations by guiding the model through controlled steps and removing unreliable content before proceeding
- 8-bit models perform closest to the gold-standard ground truth (GSGT); 4-bit models regain stability with the method; 2-bit and 3-bit models improve but still lag
Why It Matters
Offers a practical path to qualitative research with quantized LLMs at lower computational cost and improved accuracy, although 2-bit and 3-bit models still lag behind higher-bit ones.