PRISM uses Processing-In-Memory to accelerate sparse tensor decomposition by up to 2.64x

arXiv cs.DC May 29, 2026

⚡ New PIM method speeds up the core kernel of sparse tensor decomposition by up to 2.64x on UPMEM hardware.

Deep Dive

Sparse tensor decomposition is fundamental to many machine learning pipelines, but its core operation, spMTTKRP (Sparse Matricized Tensor Times Khatri-Rao Product), is memory-bound, which limits performance on conventional processors. PRISM, presented by researchers from Universidade de Lisboa, is the first work to tackle this bottleneck using Processing-In-Memory (PIM) technology, specifically UPMEM's distributed memory system. The approach explores partitioning strategies, number formats (e.g., mixed precision), and kernel optimizations, and adds a heterogeneous collaboration mode that splits work between PIM and CPU cores.
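
To make the operation concrete, here is a minimal NumPy sketch of mode-0 spMTTKRP for a 3-way sparse tensor stored in coordinate (COO) format. It is an illustration of the standard definition, not PRISM's kernel: the function names, the COO layout, and the simple chunked partitioning are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def spmttkrp_mode0(coords, vals, B, C, num_rows):
    """Mode-0 spMTTKRP for a 3-way sparse tensor in COO format.

    coords: (nnz, 3) integer array of (i, j, k) indices
    vals:   (nnz,) array of nonzero values
    B, C:   factor matrices of shape (J, R) and (K, R)
    Returns M of shape (num_rows, R), where
    M[i, :] = sum over nonzeros (i, j, k) of X[i, j, k] * B[j, :] * C[k, :].
    """
    i, j, k = coords[:, 0], coords[:, 1], coords[:, 2]
    # Each nonzero gathers one row of B and one row of C (irregular reads).
    contrib = vals[:, None] * B[j] * C[k]
    M = np.zeros((num_rows, B.shape[1]), dtype=contrib.dtype)
    # Scatter-add into output rows; np.add.at accumulates repeated indices.
    np.add.at(M, i, contrib)
    return M

def spmttkrp_partitioned(coords, vals, B, C, num_rows, num_parts):
    """Hypothetical partitioning: split the nonzeros into chunks, compute a
    partial output per chunk, and sum the partial outputs."""
    M = np.zeros((num_rows, B.shape[1]))
    chunks = zip(np.array_split(coords, num_parts),
                 np.array_split(vals, num_parts))
    for c_chunk, v_chunk in chunks:
        M += spmttkrp_mode0(c_chunk, v_chunk, B, C, num_rows)
    return M
```

Each nonzero performs only a few multiplications per rank column but reads two factor-matrix rows at data-dependent addresses, which is why the operation tends to be limited by memory traffic rather than arithmetic. Because the output is a sum over nonzeros, any split of the nonzeros yields partial results that add up to the full answer; this is the property that allows work to be divided among independent units, for instance PIM cores and CPU cores.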

On the UPMEM platform, PRISM delivers up to 2.37x speedup over the best CPU implementations when running purely in memory, and up to 2.64x with heterogeneous CPU+PIM execution. The team also reports that resource efficiency, measured as the fraction of peak performance actually used, is significantly higher than for both CPU and GPU alternatives. However, the UPMEM distributed memory system can degrade performance on certain workloads due to data movement overheads. Accepted at IISWC '25, PRISM opens a promising direction for accelerating tensor algebra in memory-bound AI workloads.

Key Points

First PIM-based approach for the memory-bound spMTTKRP operation in tensor decomposition
Up to 2.37x speedup with PIM-only and 2.64x with heterogeneous CPU+PIM vs. state-of-the-art CPU
Achieves higher resource efficiency (peak performance fraction) than both CPU and GPU implementations

Why It Matters

Tensor decomposition is critical for large-scale ML. PRISM shows that PIM can accelerate its memory-bound spMTTKRP kernel by up to 2.64x over state-of-the-art CPU implementations, which could help speed up AI pipelines that depend on it.
