An End-to-End Framework for Building Large Language Models for Software Operations
New domain-specific LLM outperforms GPT-4 on root cause analysis tasks
A team of researchers led by Jingkai He has introduced OpsLLM, a framework designed to build large language models specifically for software operations. The approach tackles two key pain points: low-quality operational data and fragmented domain knowledge. OpsLLM supports both knowledge-based question answering (QA) and root cause analysis (RCA) — the latter being crucial for incident response in cloud and DevOps environments. The framework's core innovation is a domain process reward model (DPRM) that scores the model's intermediate reasoning steps during reinforcement learning, steering training toward more reliable RCA recommendations.
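The article does not spell out how the DPRM scores outputs or aggregates those scores, but the general idea of a process reward model can be sketched generically: score each intermediate reasoning step of an RCA trace, then collapse the per-step scores into a single trajectory reward for reinforcement learning. The `score_step` heuristic below is a toy stand-in for the learned DPRM, and the min-aggregation is one common (assumed, not confirmed) choice:

```python
def score_step(step: str) -> float:
    """Toy stand-in for the learned DPRM: returns a score in [0, 1].

    Here we crudely reward steps that cite concrete evidence
    (log lines, metrics); the real model would be a trained scorer.
    """
    text = step.lower()
    return 0.9 if "log" in text or "metric" in text else 0.4


def process_reward(steps: list[str]) -> float:
    """Aggregate per-step scores into one trajectory reward.

    Taking the minimum penalizes any single weak reasoning step,
    a common aggregation choice for process reward models.
    """
    scores = [score_step(s) for s in steps]
    return min(scores) if scores else 0.0


# A hypothetical step-by-step RCA trace for an incident.
rca_trace = [
    "Step 1: Error-rate metric spiked at 14:02 on the checkout service.",
    "Step 2: Logs show connection-pool exhaustion to the payments DB.",
    "Step 3: Root cause: a config change halved the pool size.",
]
print(process_reward(rca_trace))  # → 0.4 (step 3 cites no direct evidence)
```

The step-level signal is what distinguishes a process reward model from an outcome-only reward: a trace with a correct final answer but an unsupported intermediate step still gets penalized.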
Experimental results show OpsLLM models (7B, 14B, and 32B parameters) consistently outperform existing open-source and closed-source LLMs. On QA tasks, accuracy improved by 0.2% to 5.7%, while on RCA tasks improvements ranged from 2.7% to an impressive 70.3%. The models also demonstrate strong transferability across different operational scenarios. To accelerate community progress, the team will open-source all three model versions along with a curated 15K-sample fine-tuning dataset, providing a ready-to-use baseline for production operations teams.
- OpsLLM uses a human-in-the-loop pipeline to build high-quality fine-tuning data from raw operational logs
- Domain process reward model (DPRM) boosts root cause analysis accuracy by up to 70.3% over existing LLMs
- Three model sizes (7B, 14B, 32B) and a 15K dataset will be open-sourced
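The article does not detail the human-in-the-loop pipeline, but the overall shape of such a curation loop can be sketched: draft candidate QA pairs from raw log lines, then keep only what a human reviewer approves (possibly after editing). Everything here — `draft_qa_pair`, the `review` callback, and the stub reviewer — is a hypothetical illustration, not the paper's implementation:

```python
def draft_qa_pair(log_line: str) -> dict:
    """Hypothetical drafting step: turn a raw log line into a candidate QA pair."""
    return {
        "question": f"What does this log line indicate: {log_line!r}?",
        "answer": "DRAFT - to be corrected by a reviewer",
        "source": log_line,
    }


def curate(logs, review):
    """Human-in-the-loop filter: keep only reviewer-approved candidates.

    `review` returns the (possibly edited) candidate, or None to reject it.
    """
    dataset = []
    for line in logs:
        candidate = draft_qa_pair(line)
        approved = review(candidate)
        if approved is not None:
            dataset.append(approved)
    return dataset


# Demo with a stub reviewer that keeps only error-related candidates.
logs = ["ERROR: connection pool exhausted", "INFO: health check passed"]
dataset = curate(logs, review=lambda c: c if "ERROR" in c["source"] else None)
print(len(dataset))  # → 1
```

The key design point is that the drafting step can be cheap and noisy because the reviewer gate, not the generator, decides what enters the fine-tuning set.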
Why It Matters
Gives SRE and DevOps teams a purpose-built, open-source LLM for faster incident diagnosis and resolution