Evaluating different AI models on African livestock knowledge
A niche benchmark reveals AI blind spots in ethnoveterinary knowledge for Nigeria.
Researcher Fatika Umar Ibrahim has released the first evaluation of large language models on African livestock knowledge, using a custom 420-question benchmark covering ethnoveterinary practices, indigenous breed characteristics, disease recognition, and production systems specific to Nigeria. The baseline test of Meta's open-source Llama 3.1 8B model via Groq yielded a 43% accuracy score under a 0/1/2 scoring rubric. Questions were drawn from the Nigerian veterinary curriculum, published ethnoveterinary literature, and field practice knowledge, so that they reflect real-world conditions. The result is significant because most existing AI evals focus on well-documented Western datasets, leaving critical safety gaps for low-resource regions where AI advisory tools are already being deployed.
Ibrahim notes that AI advisory tools are already being deployed at scale in African agricultural contexts, which makes the absence of domain-specific evals a real safety gap. Models can pass standard tests yet fail on knowledge that matters to specific populations. The next phase will compare proprietary models including Claude Sonnet, GPT-4o, and Gemini 1.5 Pro, with a full paper to follow. The project is funded via Manifund. This single data point underscores a broader problem: AI safety efforts must account for niche domains with sparse digital documentation. As these models become integrated into agricultural decision-making, their knowledge blind spots could have direct consequences for livestock health and farmer livelihoods.
- Benchmark consists of 420 questions across 6 categories (including ethnoveterinary practices, breeds, disease recognition, and production systems), graded with a 0/1/2 scoring rubric.
- Meta's Llama 3.1 8B achieved only 43% accuracy, highlighting poor performance on non-Western agricultural knowledge.
- Next phase will compare Claude Sonnet, GPT-4o, and Gemini 1.5 Pro; full paper to follow, funded by Manifund.
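The article does not say how the 0/1/2 rubric grades are turned into a single percentage. One plausible reading, sketched below as an assumption rather than the author's actual method, is to divide the points earned by the maximum available (2 per question). The function names and the sample grades are hypothetical and are not data from the benchmark.

```python
from collections import defaultdict

MAX_POINTS = 2  # top mark in the 0/1/2 rubric


def rubric_accuracy(scores):
    """Return the share of available points earned, as a percentage.

    Assumption: accuracy = points earned / (2 * number of questions).
    The source does not confirm this aggregation.
    """
    if not scores:
        raise ValueError("no scores to aggregate")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("each score must be 0, 1, or 2")
    return 100.0 * sum(scores) / (MAX_POINTS * len(scores))


def accuracy_by_category(graded):
    """Aggregate (category, score) pairs into a per-category percentage."""
    buckets = defaultdict(list)
    for category, score in graded:
        buckets[category].append(score)
    return {cat: rubric_accuracy(vals) for cat, vals in buckets.items()}


# Made-up grades for illustration only; not results from the benchmark.
example = [
    ("breeds", 2),
    ("breeds", 0),
    ("disease recognition", 1),
    ("disease recognition", 1),
]

print(rubric_accuracy([score for _, score in example]))
print(accuracy_by_category(example))
```

Reporting a per-category breakdown alongside the overall figure would show whether a low score is concentrated in one area, such as ethnoveterinary practices, or spread evenly across the benchmark.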
Why It Matters
AI advisory tools for African agriculture lack domain-specific evals, risking failures that affect real communities.