Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa
Open-weight models like Gemma misclassify 18.3% of battles as civilian violence.
A new study from researchers Hoffmann Muki and Olukunle Owolabi rigorously tests whether large language models are ready for conflict monitoring, using a gold-standard ACLED dataset covering Nigeria and Cameroon. Four vanilla open-weight models (Gemma 3 4B, Llama 3.2 3B, Mistral 7B, and OLMo 2 7B) were evaluated alongside two domain-adapted models, AfroConfliBERT and AfroConfliLLAMA. The results reveal a troubling asymmetry: the open-weight models consistently exhibit a False Illegitimation bias. Gemma, for example, misclassified 18.29% of legitimate battles as civilian-targeted violence while making zero False Legitimation errors. In other words, these models systematically over-report civilian harm, which could misdirect humanitarian resources or inflame conflict narratives.
Even domain adaptation fails to eliminate deeper biases. While AfroConfliBERT and AfroConfliLLAMA achieve near-directional neutrality (their Legitimization Bias differences are statistically indistinguishable from zero), they still show significant actor-based selection bias: in Nigeria, state actors are legitimized 36.5% more often than non-state actors in identical tactical contexts. Open-weight outputs are also fragile to geography-specific lexical framing: delegitimizing phrases flip model outputs at rates of up to 66.7% in Cameroon and 34.2% in Nigeria, with confabulated rationales masking the underlying normative bias. The authors call for fairness-aware fine-tuning, adversarial robustness evaluation, and context-specific human-in-the-loop oversight before any deployment.
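The directional-neutrality claim can be made concrete with a minimal sketch, assuming the Legitimization Bias difference is simply the false-legitimation rate minus the false-illegitimation rate; the function name, labels, and toy data below are illustrative, not the study's code.

```python
# Hypothetical sketch of a directional bias metric (not the study's code).
def legitimization_bias(records):
    """records: (gold, predicted) label pairs, each label either
    "battle" or "violence_against_civilians".
    Returns false-legitimation rate minus false-illegitimation rate;
    zero means the two error directions cancel (directional neutrality)."""
    battles = [p for g, p in records if g == "battle"]
    targeting = [p for g, p in records if g == "violence_against_civilians"]
    # False Illegitimation: a legitimate battle relabeled as civilian targeting.
    false_illeg = battles.count("violence_against_civilians") / len(battles)
    # False Legitimation: civilian targeting relabeled as a battle.
    false_legit = targeting.count("battle") / len(targeting)
    return false_legit - false_illeg

# Toy data mirroring the reported pattern: all errors run one way.
records = (
    [("battle", "violence_against_civilians")] * 2   # false illegitimation
    + [("battle", "battle")] * 8
    + [("violence_against_civilians", "violence_against_civilians")] * 10
)
print(legitimization_bias(records))  # negative: civilian harm is over-reported
```

A model like Gemma, with a nonzero False Illegitimation rate and zero False Legitimation errors, would score strictly negative on such a metric, while the domain-adapted models land near zero.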
- Open-weight LLMs (e.g., Gemma 3 4B) show an 18.29% False Illegitimation rate, misclassifying battles as civilian-targeted violence
- Domain-adapted models still exhibit actor bias: state actors legitimized 36.5% more than non-state in Nigeria
- Geography-specific lexical framing causes output flip rates up to 66.7% in Cameroon, with unfaithful rationales
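The flip-rate probe behind the last takeaway can be illustrated with a minimal sketch. Everything here is hypothetical: `classify` is a toy keyword rule standing in for an LLM classifier, and `delegitimize` is an invented perturbation; the point is only the metric itself, the share of events whose label changes after a rewording.

```python
# Hypothetical sketch of a lexical-framing flip-rate probe.
def classify(text: str) -> str:
    """Toy stand-in classifier: flags civilian targeting on loaded wording.
    A real probe would call an LLM here."""
    return "violence_against_civilians" if "massacred" in text else "battle"

def flip_rate(events, perturb):
    """Share of events whose predicted label changes after perturbation."""
    flips = sum(classify(e) != classify(perturb(e)) for e in events)
    return flips / len(events)

# Delegitimizing rewording: swap a neutral verb for a loaded one.
def delegitimize(event: str) -> str:
    return event.replace("clashed with", "massacred")

events = [
    "Government troops clashed with insurgents near Maiduguri.",
    "Separatist fighters clashed with the army in the Northwest region.",
    "A patrol was ambushed on the highway.",
]
print(f"flip rate: {flip_rate(events, delegitimize):.1%}")
```

A robust classifier should keep the same label under such surface rewordings; the study's finding is that open-weight models instead flip, and then rationalize the new label after the fact.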
Why It Matters
Humanitarian organizations cannot yet trust LLMs for conflict monitoring without rigorous bias testing and human oversight.