Caspar GPU solver accelerates bundle adjustment up to 20x for robotics
A new CUDA library boosts nonlinear optimization speed by 5-20x with less memory.
Researchers have introduced Caspar, a CUDA library that automatically generates high-performance GPU kernels from symbolic Python expressions. Building on the SymForce library, it lets users define residual functions symbolically in Python, including Lie group operations, and then produces optimized CUDA kernels through symbolic differentiation and adaptive reordering. This bridges the gap between the expressiveness of symbolic programming and the raw speed needed for real-time robotics applications.
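To make the idea of "symbolic expression in, differentiated kernel out" concrete, here is a deliberately tiny sketch in plain Python. It is not Caspar's or SymForce's API: the expression classes, `diff`, `simplify`, `emit`, and `generate_kernel` are all hypothetical names invented for illustration, and the emitted text is only CUDA-flavored source, not something the real tool would necessarily produce.

```python
from dataclasses import dataclass

# Toy expression tree. Hypothetical stand-in, NOT the SymForce or Caspar API.
@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

ZERO, ONE = Const(0.0), Const(1.0)

def diff(expr, wrt):
    """Symbolically differentiate expr with respect to the variable named wrt."""
    if isinstance(expr, Const):
        return ZERO
    if isinstance(expr, Var):
        return ONE if expr.name == wrt else ZERO
    if isinstance(expr, Add):
        return Add(diff(expr.left, wrt), diff(expr.right, wrt))
    if isinstance(expr, Mul):
        # Product rule: (l * r)' = l' * r + l * r'
        return Add(Mul(diff(expr.left, wrt), expr.right),
                   Mul(expr.left, diff(expr.right, wrt)))
    raise TypeError(f"unsupported node: {expr!r}")

def simplify(expr):
    """Fold constants and remove additions of 0 and multiplications by 0 or 1."""
    if isinstance(expr, (Const, Var)):
        return expr
    l, r = simplify(expr.left), simplify(expr.right)
    if isinstance(expr, Add):
        if isinstance(l, Const) and isinstance(r, Const):
            return Const(l.value + r.value)
        if l == ZERO:
            return r
        if r == ZERO:
            return l
        return Add(l, r)
    # Remaining case: Mul
    if isinstance(l, Const) and isinstance(r, Const):
        return Const(l.value * r.value)
    if l == ZERO or r == ZERO:
        return ZERO
    if l == ONE:
        return r
    if r == ONE:
        return l
    return Mul(l, r)

def emit(expr):
    """Render an expression as a C-style float expression string."""
    if isinstance(expr, Const):
        return repr(expr.value) + "f"
    if isinstance(expr, Var):
        return expr.name
    op = "+" if isinstance(expr, Add) else "*"
    return f"({emit(expr.left)} {op} {emit(expr.right)})"

def generate_kernel(name, residual, inputs, params):
    """Emit CUDA-flavored source for the residual and its Jacobian row."""
    args = ", ".join(f"float {v}" for v in inputs)
    lines = [f"__device__ void {name}({args}, float* r, float* J) {{",
             f"    r[0] = {emit(residual)};"]
    for i, p in enumerate(params):
        lines.append(f"    J[{i}] = {emit(simplify(diff(residual, p)))};")
    lines.append("}")
    return "\n".join(lines)

# Residual of a quadratic fit: r = a*x*x + b*x - y, optimized over a and b.
a, b, x, y = Var("a"), Var("b"), Var("x"), Var("y")
residual = Add(Add(Mul(Mul(a, x), x), Mul(b, x)), Mul(Const(-1.0), y))
kernel_src = generate_kernel("quad_residual", residual, ["a", "b", "x", "y"], ["a", "b"])
print(kernel_src)
```

The user writes only the residual. The derivatives with respect to `a` and `b` are derived and simplified mechanically before any code is emitted, which is the general workflow the article describes, here reduced to scalar arithmetic with no Lie groups or reordering.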
In benchmarks on the Bundle Adjustment in the Large (BAL) dataset, Caspar ran 5-20x faster than existing state-of-the-art solvers while using less memory and maintaining comparable accuracy. The adaptive reordering technique optimizes memory access patterns for GPU parallelism, which suits the large-scale nonlinear optimization problems common in SLAM, structure from motion, and robot perception. Accepted at ICRA 2026, Caspar is released as an open-source component of the SymForce ecosystem, lowering the barrier for robotics engineers who want GPU acceleration without writing low-level CUDA code.
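For readers unfamiliar with what a bundle adjustment solver actually minimizes, the following sketch evaluates a single reprojection residual under the camera model documented for the BAL dataset: an angle-axis rotation, a translation, a focal length, and two radial distortion coefficients. It is plain CPU Python for illustration only; the function names are hypothetical and this is not how Caspar represents or evaluates residuals.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def rotate_angle_axis(w, X):
    """Rotate point X by the angle-axis vector w (Rodrigues' formula)."""
    theta = math.sqrt(sum(wi * wi for wi in w))
    if theta < 1e-12:
        # Small-angle fallback: R*X is approximately X + w x X.
        c = cross(w, X)
        return [X[i] + c[i] for i in range(3)]
    k = [wi / theta for wi in w]                 # unit rotation axis
    c = cross(k, X)
    d = sum(k[i] * X[i] for i in range(3))       # k . X
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [X[i] * cos_t + c[i] * sin_t + k[i] * d * (1.0 - cos_t)
            for i in range(3)]

def bal_residual(camera, point, observed):
    """Reprojection residual for one observation.

    camera: 9 values [rx, ry, rz, tx, ty, tz, f, k1, k2] (BAL parameter order)
    point: 3D point in world coordinates
    observed: measured 2D image coordinates
    """
    w, t = camera[0:3], camera[3:6]
    f, k1, k2 = camera[6], camera[7], camera[8]
    R_X = rotate_angle_axis(w, point)
    P = [R_X[i] + t[i] for i in range(3)]        # world to camera frame
    # BAL convention: the camera looks down the negative z axis.
    px, py = -P[0] / P[2], -P[1] / P[2]
    n2 = px * px + py * py
    distortion = 1.0 + k1 * n2 + k2 * n2 * n2    # radial distortion factor
    return [f * distortion * px - observed[0],
            f * distortion * py - observed[1]]

# Identity pose, focal length 2, no distortion, point in front of the camera.
camera = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]
res = bal_residual(camera, [1.0, 2.0, -4.0], [0.5, 1.0])
print(res)
```

A full problem sums the squares of these two-component residuals over every camera-point observation, so a BAL scene with many thousands of observations exposes many independent evaluations of the same small function, which is the kind of workload that maps naturally onto GPU threads.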
- Caspar auto-generates CUDA kernels from symbolic Python expressions using SymForce
- Achieves 5-20x speedup on bundle adjustment (BAL dataset) over existing state-of-the-art solvers, with lower memory use
- Uses adaptive reordering and symbolic differentiation for efficient GPU-based nonlinear optimization
Why It Matters
Speeds robot perception and mapping by making GPU-accelerated symbolic optimization accessible from Python.