Developer Tools

RandSet: Randomized Corpus Reduction for Fuzzing Seed Scheduling

New technique reduces massive seed corpora to roughly 4-6% of their original size while triggering up to 7 more bugs than prior methods.

Deep Dive

A research team from multiple institutions has introduced RandSet, a technique addressing the persistent 'seed explosion' problem in fuzzing, where a fuzzer accumulates so large a seed corpus that it struggles to select promising test cases. Earlier seed-scheduling work concentrated on seed prioritization, and existing corpus reduction tools such as AFL-Cmin and MinSet suffer from poor seed diversity or prohibitive overhead. RandSet tackles scheduling from the angle of corpus reduction. The team's key insight is to introduce randomness into the reduction: they formulate it as a set cover problem and compute a small, randomized subset of seeds that still covers every feature of the full corpus. The fuzzer then schedules seeds from this much smaller subset rather than from the unwieldy full corpus, mitigating the explosion.
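The set cover formulation can be illustrated with a minimal Python sketch. This is not RandSet's actual algorithm, which the summary above does not spell out. It assumes a corpus represented as a mapping from seed id to the set of features (for example, covered edges) that seed exercises, and it uses a simple shuffle-then-scan strategy to get a randomized cover. The names randomized_cover and pick_seed are hypothetical.

```python
import random


def randomized_cover(corpus, rng):
    """Return a randomized list of seed ids whose features cover the corpus.

    corpus: dict mapping seed id -> set of feature ids (assumed layout).
    rng:    a random.Random instance, so runs can be made reproducible.
    """
    # Every feature exercised by at least one seed must stay covered.
    uncovered = set().union(*corpus.values())

    # Visiting seeds in a random order yields a different cover on each
    # call, which is the source of diversity in this sketch.
    order = list(corpus)
    rng.shuffle(order)

    subset = []
    for seed in order:
        gained = corpus[seed] & uncovered
        if gained:  # keep a seed only if it contributes a new feature
            subset.append(seed)
            uncovered -= gained
        if not uncovered:
            break
    return subset


def pick_seed(corpus, rng):
    """Schedule one seed from a fresh randomized cover (corpus must be non-empty)."""
    return rng.choice(randomized_cover(corpus, rng))


# Toy corpus: seed "c" alone covers features 1-3, and "e" is the only
# seed that reaches feature 4, so "e" appears in every cover.
toy_corpus = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3}, "d": {3}, "e": {4}}
rng = random.Random(0)
subset = randomized_cover(toy_corpus, rng)
print(subset, f"subset ratio = {len(subset) / len(toy_corpus):.0%}")
print("next seed:", pick_seed(toy_corpus, rng))
```

The single shuffled pass always yields a valid cover, but not necessarily a minimum one. A real implementation would also need to update the cover incrementally as the fuzzer adds new seeds, which this sketch ignores.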

RandSet is implemented on three popular fuzzing frameworks: AFL++, LibAFL, and Centipede. Evaluated on standalone programs, FuzzBench, and the Magma benchmark, it achieved average subset ratios of 4.03% and 5.99%, meaning it reduced corpora to roughly 1/25 and 1/17 of their original size. Despite this drastic reduction, it delivered significantly more diverse seed selection than other reduction techniques, resulting in a 16.58% coverage gain on standalone programs and up to 3.57% on FuzzBench when using AFL++. Most importantly for security applications, RandSet triggered up to 7 more ground-truth bugs than state-of-the-art methods on Magma, while introducing a runtime overhead of only 1.17%-3.93%. The work has been accepted to OOPSLA 2026 and points to a shift in how fuzzers manage their growing seed collections.

Key Points
  • Reduces fuzzing corpora to an average of 4.03% and 5.99% of their original size while still covering every feature of the full corpus
  • Achieves a 16.58% coverage gain on standalone programs and triggers up to 7 more ground-truth bugs than state-of-the-art methods on Magma
  • Adds only 1.17%-3.93% runtime overhead when implemented on AFL++, LibAFL, and Centipede

Why It Matters

Enables security researchers to fuzz more effectively by managing massive test corpora without sacrificing bug-finding capability.