Research & Papers

Virtual Processor Auto-Parallelizes Array Code Without Developer Annotations

A decentralized runtime automatically parallelizes numerical array programs, aiming for speedups without developer annotations or a central scheduler.

Deep Dive

Haymo Kutschbach's virtual processor (VP) automatically parallelizes numerical array programs using a decentralized network of cooperative execution segments, with no central scheduler and no developer annotations required. Each segment independently decides task placement, data movement, and kernel preparation. The VP targets low-latency strong scaling (finishing a fixed-size problem faster as more compute units are added) on local heterogeneous hardware, and it is currently implemented for the array instruction set of a domain-specific language. It exploits parallelism across large program regions rather than only within individual loop bodies or explicitly marked sections.
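To make "each segment independently decides task placement and data movement" concrete, here is a minimal sketch of one segment's local decision. The source does not describe the VP's actual heuristics, so this assumes a simple cost model (kernel launch overhead + compute time + time to move inputs not already resident on a device). The names `Device`, `ArrayInstr`, `place`, and `execute_segment`, and all throughput and bandwidth numbers, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cost model; the VP's real heuristics are not described in the source.
BYTES_PER_ELEM = 8              # assumed float64 elements
LINK_BYTES_PER_US = 10_000.0    # assumed host<->device transfer bandwidth


@dataclass
class Device:
    name: str
    elems_per_us: float   # assumed compute throughput
    launch_us: float      # assumed fixed per-kernel overhead


@dataclass
class ArrayInstr:
    out: str
    inputs: tuple
    elements: int         # number of output elements (work size)


def place(instr, devices, residency, sizes):
    """A segment's local decision: choose the device with the lowest estimated
    time = launch overhead + compute + transfer of inputs not resident there."""
    def cost(dev):
        move_bytes = sum(sizes[a] * BYTES_PER_ELEM
                         for a in instr.inputs if residency.get(a) != dev.name)
        return (dev.launch_us + instr.elements / dev.elems_per_us
                + move_bytes / LINK_BYTES_PER_US)
    return min(devices, key=cost)


def execute_segment(instr, devices, residency, sizes):
    """Place the instruction, then record where its inputs and output now live."""
    dev = place(instr, devices, residency, sizes)
    for a in instr.inputs:
        residency[a] = dev.name      # model: inputs migrate to the chosen device
    residency[instr.out] = dev.name
    sizes[instr.out] = instr.elements
    return dev.name


devices = [Device("cpu", 1_000.0, 1.0), Device("gpu", 100_000.0, 20.0)]
residency = {"a": "cpu", "b": "cpu"}
sizes = {"a": 100, "b": 10_000_000}
print(execute_segment(ArrayInstr("c", ("a",), 100), devices, residency, sizes))         # cpu
print(execute_segment(ArrayInstr("d", ("b",), 10_000_000), devices, residency, sizes))  # gpu
```

Under these assumed numbers, a small instruction stays on the CPU because the GPU's launch overhead dominates, while a large one moves to the GPU despite the transfer cost. Once data lives on the GPU, later instructions that read it tend to stay there, since no further movement is charged.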

Key Points
  • A decentralized runtime of cooperative execution segments makes task-placement and data-movement decisions locally.
  • No central scheduler: execution proceeds asynchronously, driven by data dependencies between instructions.
  • Currently implemented for a domain-specific language's array instructions, targeting low-latency strong scaling on local heterogeneous hardware.
  • Paper: 10 pages, 7 figures.
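The scheduler-free, dependency-driven execution described above can be sketched as follows. This is a hypothetical illustration, not the VP's implementation: `Segment` and `run_program` are invented names, the program is a list of instructions written in sequential order with each output name assigned once, and a thread pool stands in for the hardware. No loop decides what runs next; each finishing segment decrements its consumers' pending-input counters and submits any that become ready. (In CPython the GIL limits true parallelism for pure-Python work; the point here is the control structure.)

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Segment:
    """Hypothetical execution segment wrapping one array instruction."""

    def __init__(self, name, fn, inputs):
        self.name = name
        self.fn = fn            # computes this instruction's output array
        self.inputs = inputs    # names of source arrays or producing segments
        self.consumers = []     # segments that read this segment's output
        self.pending = 0        # producers that have not finished yet
        self.lock = threading.Lock()


def run_program(program, sources, workers=4):
    """Run (name, fn, inputs) instructions given in sequential program order.
    Each finishing segment hands ready successors to the pool itself."""
    values = dict(sources)
    values_lock = threading.Lock()
    segs = {name: Segment(name, fn, inputs) for name, fn, inputs in program}
    for seg in segs.values():
        for dep in seg.inputs:
            if dep in segs:                  # read-after-write dependency
                segs[dep].consumers.append(seg)
                seg.pending += 1
    if not segs:
        return values, []

    order, errors = [], []
    remaining = [len(segs)]
    state_lock = threading.Lock()
    finished = threading.Event()
    pool = ThreadPoolExecutor(max_workers=workers)

    def fire(seg):
        try:
            with values_lock:
                args = [values[d] for d in seg.inputs]
            result = seg.fn(*args)
            with values_lock:
                values[seg.name] = result
            with state_lock:
                order.append(seg.name)       # record completion before waking consumers
            for consumer in seg.consumers:   # notify successors directly
                with consumer.lock:
                    consumer.pending -= 1
                    ready = consumer.pending == 0
                if ready:
                    pool.submit(fire, consumer)
            with state_lock:
                remaining[0] -= 1
                if remaining[0] == 0:
                    finished.set()
        except Exception as exc:             # surface errors instead of hanging
            with state_lock:
                errors.append(exc)
            finished.set()

    # Snapshot ready segments first so none is submitted twice.
    initially_ready = [s for s in segs.values() if s.pending == 0]
    for seg in initially_ready:
        pool.submit(fire, seg)
    finished.wait()
    pool.shutdown(wait=True)
    if errors:
        raise errors[0]
    return values, order


program = [
    ("t1", lambda a: [x * 2 for x in a], ("a",)),
    ("t2", lambda b: [x + 1 for x in b], ("b",)),
    ("t3", lambda u, v: [x + y for x, y in zip(u, v)], ("t1", "t2")),
]
values, order = run_program(program, {"a": [1, 2, 3], "b": [10, 20, 30]})
print(values["t3"])  # [13, 25, 37]
```

Here `t1` and `t2` have no dependency on each other and may run concurrently, while `t3` is released only when the second of its producers finishes. This mirrors, in miniature, exploiting parallelism across a whole program region rather than inside one loop body.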

Why It Matters

Automatic parallelization of this kind could let developers write ordinary sequential array code and still benefit from multicore and heterogeneous hardware, without hand-tuning parallelism.