Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa
Open-weight models like Gemma misclassify 18.3% of battles as civilian violence.
A new study from researchers Hoffmann Muki and Olukunle Owolabi rigorously tests whether large language models are ready for conflict monitoring, using a gold-standard ACLED dataset covering Nigeria and Cameroon. Four vanilla open-weight models (Gemma 3 4B, Llama 3.2 3B, Mistral 7B, and OLMo 2 7B) were evaluated alongside two domain-adapted models, AfroConfliBERT and AfroConfliLLAMA. The results reveal a troubling asymmetry: the open-weight models consistently exhibit a False Illegitimation bias. Gemma, for example, misclassified 18.29% of legitimate battles as civilian-targeted violence while making zero False Legitimation errors. In other words, these models systematically over-report civilian harm, which could misdirect humanitarian resources or inflame conflict narratives.
Even domain adaptation fails to eliminate deeper biases. While AfroConfliBERT and AfroConfliLLAMA achieve near-directional neutrality (their Legitimization Bias differences are statistically indistinguishable from zero), they still show significant actor-based selection bias: in Nigeria, state actors are legitimized 36.5% more often than non-state actors in identical tactical contexts. Open-weight outputs are also fragile to geography-specific lexical framing: delegitimizing phrases flip model outputs at rates of up to 66.7% in Cameroon and 34.2% in Nigeria, with confabulated rationales masking the underlying normative bias. The authors call for fairness-aware fine-tuning, adversarial robustness evaluation, and context-specific human-in-the-loop oversight before any deployment.
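The directional-neutrality claim can be made concrete with a minimal sketch, assuming the Legitimization Bias difference is simply the false-legitimation rate minus the false-illegitimation rate; the function name, labels, and toy data below are illustrative, not the study's code.

```python
# Hypothetical sketch of a directional bias metric (not the study's code).
def legitimization_bias(records):
    """records: (gold, predicted) label pairs, each label either
    "battle" or "violence_against_civilians".
    Returns false-legitimation rate minus false-illegitimation rate;
    zero means the two error directions cancel (directional neutrality)."""
    battles = [p for g, p in records if g == "battle"]
    targeting = [p for g, p in records if g == "violence_against_civilians"]
    # False Illegitimation: a legitimate battle relabeled as civilian targeting.
    false_illeg = battles.count("violence_against_civilians") / len(battles)
    # False Legitimation: civilian targeting relabeled as a battle.
    false_legit = targeting.count("battle") / len(targeting)
    return false_legit - false_illeg

# Toy data mirroring the reported pattern: all errors run one way.
records = (
    [("battle", "violence_against_civilians")] * 2   # false illegitimation
    + [("battle", "battle")] * 8
    + [("violence_against_civilians", "violence_against_civilians")] * 10
)
print(legitimization_bias(records))  # negative: civilian harm is over-reported
```

A model like Gemma, with a nonzero False Illegitimation rate and zero False Legitimation errors, would score strictly negative on such a metric, while the domain-adapted models land near zero.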
- Open-weight LLMs (e.g., Gemma 3 4B) show an 18.29% False Illegitimation rate, misclassifying battles as civilian-targeted violence
- Domain-adapted models still exhibit actor bias: state actors legitimized 36.5% more than non-state in Nigeria
- Geography-specific lexical framing causes output flip rates up to 66.7% in Cameroon, with unfaithful rationales
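The flip-rate probe behind the last takeaway can be illustrated with a minimal sketch. Everything here is hypothetical: `classify` is a toy keyword rule standing in for an LLM classifier, and `delegitimize` is an invented perturbation; the point is only the metric itself, the share of events whose label changes after a rewording.

```python
# Hypothetical sketch of a lexical-framing flip-rate probe.
def classify(text: str) -> str:
    """Toy stand-in classifier: flags civilian targeting on loaded wording.
    A real probe would call an LLM here."""
    return "violence_against_civilians" if "massacred" in text else "battle"

def flip_rate(events, perturb):
    """Share of events whose predicted label changes after perturbation."""
    flips = sum(classify(e) != classify(perturb(e)) for e in events)
    return flips / len(events)

# Delegitimizing rewording: swap a neutral verb for a loaded one.
def delegitimize(event: str) -> str:
    return event.replace("clashed with", "massacred")

events = [
    "Government troops clashed with insurgents near Maiduguri.",
    "Separatist fighters clashed with the army in the Northwest region.",
    "A patrol was ambushed on the highway.",
]
print(f"flip rate: {flip_rate(events, delegitimize):.1%}")
```

A robust classifier should keep the same label under such surface rewordings; the study's finding is that open-weight models instead flip, and then rationalize the new label after the fact.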
Why It Matters
Humanitarian organizations cannot yet trust LLMs for conflict monitoring without rigorous bias testing and human oversight.