SYCL-based heterogeneous solvers beat GPU-only by up to 32%
New research shows that running linear solvers on CPU and GPU together is up to 32% faster than using the GPU alone.
Solving large symmetric positive-definite linear systems is critical in applications like Gaussian Process regression. Traditional GPU-only approaches leave CPU resources idle. This work from the University of Stuttgart introduces heterogeneous solvers for both the iterative conjugate gradient (CG) method and the direct Cholesky decomposition, built with SYCL for cross-vendor portability. By splitting the work between CPU and GPU, the solvers use all of the available compute resources.
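To make the idea of splitting a solver across two devices concrete, here is a minimal sketch of conjugate gradient in which the matrix-vector product is partitioned by rows. It is an illustration under stated assumptions, not the authors' implementation: the row-wise split, the fixed `cpu_fraction` parameter, and all function names are hypothetical, and plain Python stands in for SYCL, so both row blocks run one after the other on the host.

```python
def matvec_rows(A, x, start, stop):
    """Multiply rows [start, stop) of the dense matrix A with the vector x."""
    return [sum(a * b for a, b in zip(A[i], x)) for i in range(start, stop)]


def split_matvec(A, x, cpu_fraction=0.25):
    """Compute A @ x as two row blocks, one per (hypothetical) device."""
    n = len(A)
    split = int(n * cpu_fraction)
    # Assumption: in a heterogeneous setup these two calls would be
    # submitted to different devices and run concurrently.
    y_cpu = matvec_rows(A, x, 0, split)   # rows [0, split): CPU share
    y_gpu = matvec_rows(A, x, split, n)   # rows [split, n): GPU share
    return y_cpu + y_gpu


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, cpu_fraction=0.25, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = split_matvec(A, p, cpu_fraction)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b, cpu_fraction=0.5))
```

In this sketch the matrix-vector product is the part that gets divided, since it dominates the cost of each CG iteration for dense matrices. How large the CPU share should be depends on the relative speed of the two devices; the value used here is arbitrary.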
Benchmarked on systems with NVIDIA, AMD, and Intel GPUs, the heterogeneous implementations run up to 32% faster for CG and up to 29% faster for Cholesky on large matrices. On every GPU vendor tested, the heterogeneous Cholesky runs at least 12% faster than the GPU-only baseline. The results demonstrate a practical path to better HPC utilization that relies on the open SYCL standard rather than proprietary code.
- Heterogeneous CG solver runs up to 32% faster than the GPU-only version on large matrices.
- Heterogeneous Cholesky decomposition runs up to 29% faster, and at least 12% faster on each of the NVIDIA, AMD, and Intel GPUs.
- Uses SYCL for single-source, multi-vendor heterogeneous programming without vendor lock-in.
Why It Matters
Enables faster, vendor-agnostic linear algebra for HPC workloads like machine learning and simulations.