Microsoft at NSDI 2026: Advances in large-scale networked systems
KV cache sharing boosts LLM throughput 4x; LLM-driven testing finds 33 bugs; and more.
Microsoft had 11 papers accepted at NSDI 2026. Highlights include DroidSpeak, which enables LLMs to share KV caches for up to 4x higher throughput and faster responses; Eywa, which uses LLMs to automate model-based testing and uncovered 33 bugs (16 previously unknown); and Octopus, which introduces a switch-free CXL memory design whose RPCs are 3.2x faster than in-rack RDMA and 2.4x faster than CXL switch-based designs. Other papers include AVA, which combines event knowledge graphs with agentic retrieval over vision-language models for open-ended video analytics, and Pyrocumulus, which enables fast, low-overhead live migration of storage-optimized VMs by exploiting the hardware customizability and efficient network accessibility of FPGA SmartNICs, with co-designed live-migration protocols, architecture, and algorithms.
- DroidSpeak: KV cache sharing across fine-tuned LLMs yields 4x higher throughput with minimal quality loss.
- Eywa: Automated model-based testing using LLMs found 33 bugs (16 new) in widely used network protocols.
- Octopus: Switch-free CXL memory pods achieve 3.2x faster RPCs than in-rack RDMA, reducing cost and scaling to multi-rack deployments.
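To make the DroidSpeak idea concrete, here is a minimal toy sketch of cross-model KV-cache reuse: a shared prompt prefix is prefilled once, and a fine-tuned variant reuses that cache instead of recomputing it. All class and function names below are illustrative assumptions, not DroidSpeak's actual API; the real system reuses per-layer transformer key/value tensors, which this dict-based stand-in only gestures at.

```python
# Toy sketch of cross-model KV-cache reuse (illustrative only).
# A real implementation caches per-layer K/V tensors; here a dict
# keyed by token position stands in for that cache.

class ToyModel:
    """Stands in for an LLM variant fine-tuned from a shared base."""

    def __init__(self, name):
        self.name = name
        self.compute_calls = 0  # counts expensive "prefill" steps

    def prefill(self, tokens, kv_cache=None):
        """Build a KV cache for `tokens`, skipping positions already cached."""
        cache = dict(kv_cache) if kv_cache else {}
        for i, tok in enumerate(tokens):
            if i in cache:          # prefix already cached: reuse, no recompute
                continue
            self.compute_calls += 1
            cache[i] = ("kv", tok)  # placeholder for per-token K/V tensors
        return cache


prompt = ["sys", "you", "are", "helpful", "user", "hi"]

base = ToyModel("base")
shared_cache = base.prefill(prompt[:4])       # prefill the shared prefix once

variant = ToyModel("fine-tuned")
cache = variant.prefill(prompt, kv_cache=shared_cache)

print(variant.compute_calls)  # → 2 (only the non-prefix tokens are computed)
```

The throughput gain in the paper comes from exactly this kind of skipped recomputation, applied across fine-tuned models serving requests with overlapping contexts.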
Why It Matters
Microsoft drives scalable, efficient infrastructure critical for next-gen AI and cloud services.