CSM scores 342/400 vs Hindsight's 326/400 on BEAM 100K, a 4.9% relative accuracy gain?

CSM scores 342/400 vs Hindsight's 326/400 on BEAM 100K, a 4.9% relative accuracy gain.

CSM uses 38.2% fewer answer-visible context tokens, improving efficiency despite slower retrieval (29.23s vs 6.23s)?

CSM uses 38.2% fewer answer-visible context tokens, improving efficiency despite slower retrieval (29.23s vs 6.23s).

The developer invites methodological critique before pursuing official leaderboard submission?

The developer invites methodological critique before pursuing official leaderboard submission.

Research & Papers

Context Swarm Memory beats Hindsight on BEAM 100K, uses 38% fewer tokens

r/MachineLearning May 28, 2026

⚡Open-source CSM scores 342/400, outperforming Hindsight by 4% accuracy.

Deep Dive

Context Swarm Memory (CSM) is a new open-source agent-memory system that uses bounded read-only memory shards, query routing, probe/recall/synthesis, cited packets, and explicit Committer-gated writes. In a head-to-head comparison against the accepted Hindsight artifact on the BEAM 100K benchmark, CSM achieved an AMB score of 0.757573 (342 out of 400 correct), surpassing Hindsight's 0.733658 (326 out of 400). Additionally, CSM uses 38.2% fewer answer-visible context tokens, which could reduce inference cost and latency in downstream agent calls.

However, the trade-off is speed: CSM's average retrieval time is 29.23 seconds versus 6.23 seconds for Hindsight. The developer acknowledges this is not an official leaderboard result—only a local accepted-artifact comparison at the 100K scale. They are seeking community feedback on how to strengthen the evaluation methodology before attempting independent replication or official submission to BEAM. The repository and reproducibility notes are publicly available on GitHub and a dedicated site.

Key Points

CSM scores 342/400 vs Hindsight's 326/400 on BEAM 100K, a 4.9% relative accuracy gain.
CSM uses 38.2% fewer answer-visible context tokens, improving efficiency despite slower retrieval (29.23s vs 6.23s).
The developer invites methodological critique before pursuing official leaderboard submission.

Why It Matters

Better agent memory means more accurate, cost-efficient AI workflows—but speed trade-offs need real-world validation.

Read Original Article

Context Swarm Memory beats Hindsight on BEAM 100K, uses 38% fewer tokens

Why It Matters

Related Articles

🚀 Stay Ahead in AI