ReBOL: Retrieval via Bayesian Optimization with Batched LLM Relevance Observations and Query Reformulation
New research combines Bayesian optimization with LLMs to find documents traditional vector search misses.
Researchers Anton Korikov and Scott Sanner have introduced ReBOL, a new framework designed to fix a fundamental flaw in modern AI-powered search. Current retrieval-augmented generation (RAG) systems rely on vector similarity to fetch a top-k set of documents, which are then reranked by a large language model (LLM). If the initial vector search misses a relevant document, no reranker can recover it. ReBOL tackles this by replacing the single vector fetch with a more intelligent, iterative process: it first uses an LLM to generate multiple query reformulations, then employs Bayesian Optimization, a technique for efficiently searching large spaces with few evaluations, to model the probability that each document is relevant.
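To make the setup concrete, here is a minimal sketch of that first phase in Python. The function names, the prompt, and the cosine-similarity prior are illustrative assumptions, not the paper's published implementation.

```python
"""Sketch of ReBOL-style setup: LLM query reformulation plus a simple
relevance prior over document embeddings. Names are illustrative."""
from typing import Callable
import numpy as np

def reformulate(llm: Callable[[str], str], query: str, n: int = 4) -> list[str]:
    # `llm` is any text-in/text-out function (e.g., a Gemini or GPT wrapper).
    prompt = f"Rewrite the search query below in {n} different ways, one per line:\n{query}"
    lines = [l.strip() for l in llm(prompt).splitlines() if l.strip()]
    return ([query] + lines)[: n + 1]  # keep the original query too

def relevance_prior(doc_embs: np.ndarray, query_embs: np.ndarray) -> np.ndarray:
    """Prior mean relevance per document: best cosine similarity to any
    reformulation. BO later refines this with observed LLM scores."""
    sims = doc_embs @ query_embs.T     # (n_docs, n_reforms); rows L2-normalized
    return sims.max(axis=1)            # optimistic starting estimate per doc
```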
In practice, ReBOL works in cycles: it selects a diverse batch of documents, uses an LLM (such as Gemini-2.5-Flash-Lite or GPT-5.2) to score their relevance to the original query, and updates its internal model to guide the next batch selection. This feedback loop actively seeks out the most useful information. On five standard BEIR benchmark datasets, ReBOL consistently outperformed state-of-the-art LLM rerankers. On Robust04, for example, it reached 46.5% recall@100, an 11.5-point gain (33% relative) over the best baseline's 35.0%. It also maintained competitive ranking quality (NDCG@10) and comparable latency, making it a practical upgrade.
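The loop below is a rough sketch of that cycle, assuming an upper-confidence-bound (UCB) acquisition rule, a similarity-based diversity penalty, and a simple kernel-style update; the paper's actual surrogate model and batch-selection strategy may differ.

```python
import numpy as np

def bo_retrieval_loop(docs, doc_embs, llm_score, prior,
                      rounds=5, batch=10, kappa=1.0):
    """Iteratively pick diverse, uncertain documents, score them with an LLM,
    and fold the observations back into the relevance estimates."""
    n = len(docs)
    mean = prior.astype(float).copy()      # current relevance estimate
    var = np.ones(n)                       # uncertainty (drops once observed)
    seen = np.zeros(n, dtype=bool)
    for _ in range(rounds):
        ucb = mean + kappa * np.sqrt(var)  # optimism drives exploration
        ucb[seen] = -np.inf                # never re-score a document
        chosen, scores = [], ucb.copy()
        for _ in range(batch):             # greedy-diverse batch selection
            i = int(np.argmax(scores))
            chosen.append(i)
            scores -= 0.5 * (doc_embs @ doc_embs[i])  # penalize near-duplicates
            scores[i] = -np.inf
        for i in chosen:
            mean[i] = llm_score(docs[i])   # LLM relevance judgment in [0, 1]
            var[i], seen[i] = 0.0, True
            # Share the observation with similar, still-unseen documents.
            sim = np.clip(doc_embs @ doc_embs[i], 0.0, None)
            mean = np.where(seen, mean,
                            (1 - 0.2 * sim) * mean + 0.2 * sim * mean[i])
    return np.argsort(-mean)               # documents ranked by final estimate
```

Each round trades exploration (documents the model is unsure about) against exploitation (documents it already predicts are relevant), which is what lets the system surface documents the original embedding ranked low.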
The key innovation is moving beyond single-pass retrieval. By treating document retrieval as an optimization problem solved through sequential, LLM-guided decisions, ReBOL can uncover documents that simple cosine similarity in a vector database would bury. This makes it particularly valuable for enterprise and research applications where finding every relevant document is critical, such as legal discovery or comprehensive literature reviews. The method marks a shift from using the LLM as a mere reranker to making it the core intelligence driving the entire retrieval exploration.
- Achieved 46.5% recall@100 on Robust04, beating the top LLM reranker baseline by 11.5 percentage points (a 33% relative improvement; see the quick check after this list).
- Uses Bayesian Optimization guided by LLM relevance scoring (GPT-5.2, Gemini-2.5-Flash-Lite) to iteratively find documents that vector search misses.
- Maintains competitive latency compared to standard LLM reranking pipelines, making it a viable performance upgrade for RAG systems.
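For clarity, recall@100 measures the fraction of all relevant documents that appear among the first 100 retrieved. The snippet below uses the standard metric definition and the figures quoted above to reproduce the headline arithmetic.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 100) -> float:
    """recall@k: the fraction of all relevant documents found in the top k."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

# Reproducing the arithmetic from the first bullet:
rebol, baseline = 0.465, 0.350
print(f"absolute gain: {rebol - baseline:.3f}")              # 0.115 -> 11.5 points
print(f"relative gain: {(rebol - baseline) / baseline:.1%}") # 32.9% ~ 33%
```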
Why It Matters
Dramatically improves the recall of AI search systems, ensuring RAG applications have access to more complete and accurate information.