Developer Tools

trunk/7a197d9f9a48f6c924bf87768fb05cb5ee1879ba: [mem viz] use segment size for envelope to show fragmentation (#180515)

New visualization now accurately shows how fragmentation wastes GPU memory, preventing misleading diagnostics.

Deep Dive

The PyTorch team has updated its memory visualization tool to fix a critical issue in how GPU memory fragmentation is displayed. Previously, the "Allocated Memory (incl. Private Pools)" tab drew private memory pools as gray envelopes whose height was set by peak active allocations. This broke down under fragmentation: it failed to show when the allocator had to map new physical pages (segment_map events) even though the pool technically had enough total free space. As a result, the visualization suggested efficient memory use even when the actual GPU footprint (reserved memory) was much larger than the active allocations.

This commit (7a197d9) changes the envelope height calculation to reflect reserved memory from segment events, so each pool's envelope shows its true GPU memory consumption. The update modifies process_alloc_data.js to capture segment_map/segment_unmap events during Phase 1 processing and interleaves those events during replay, growing each pool envelope to max(pool.active, pool.reserved) rather than pool.active alone. MemoryViz.js also gains visual enhancements, with 3px black borders on pool envelopes (5px when hovered), and fixes to the stream-elision logic that previously caused display issues. New tests cover edge cases such as initial reserved memory, guard against double-counting, and confirm that default pools remain unaffected.
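The replay logic described above can be sketched in a few lines. This is a hedged, simplified Python illustration, not the actual process_alloc_data.js code: the event names mirror the PyTorch trace vocabulary (alloc/free/segment_map/segment_unmap), but the `envelope_heights` function and its event tuples are hypothetical.

```python
def envelope_heights(events):
    """Replay one pool's event stream, returning the envelope height after
    each event as max(active, reserved) rather than active alone."""
    active = 0    # bytes held by live allocations
    reserved = 0  # bytes of physical pages mapped into the pool
    heights = []
    for action, size in events:
        if action == "alloc":
            active += size
        elif action == "free":
            active -= size
        elif action == "segment_map":
            reserved += size
        elif action == "segment_unmap":
            reserved -= size
        # Key change: under fragmentation, reserved can exceed active,
        # and the envelope must grow to cover it.
        heights.append(max(active, reserved))
    return heights

# Fragmentation scenario: the pool has free space in total, but the
# allocator must still map a new segment for a large request.
events = [
    ("segment_map", 4),  # map 4 pages
    ("alloc", 3),        # use 3 of them
    ("free", 2),         # free 2 -> space is free, but fragmented
    ("segment_map", 2),  # the next request forces a new mapping
    ("alloc", 2),
]
print(envelope_heights(events))  # -> [4, 4, 4, 6, 6]
```

An active-only replay of the same stream would end at height 3, hiding the 6 pages the pool actually holds on the GPU.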

The practical impact is substantial: AI researchers and engineers training large models can now see when memory fragmentation is causing inefficient GPU utilization. Before this fix, a developer might see a visualization suggesting their 16GB GPU had 4GB free when fragmentation meant only 2GB was usable for a single large tensor. The corrected visualization helps tune memory allocation strategies, potentially reducing out-of-memory errors and improving training efficiency for PyTorch workloads, from Llama training to Stable Diffusion fine-tuning.
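The 4GB-free-but-2GB-usable gap comes down to a simple distinction: a single tensor needs one contiguous free block, so the sum of free fragments overstates what is usable. A minimal illustration, with hypothetical block sizes in GB:

```python
def usable_for_one_tensor(free_blocks):
    """Largest single allocation the pool can satisfy without mapping
    new pages: the biggest contiguous free block, not the sum of all."""
    return max(free_blocks, default=0)

# 4 GB free in total, but split into fragments...
free_blocks = [2, 1, 0.5, 0.5]
print(sum(free_blocks))               # -> 4.0  ("free" by the old view)
print(usable_for_one_tensor(free_blocks))  # -> 2  (actually usable)
```

A request larger than 2 GB here triggers a segment_map event even though 4 GB is nominally free, which is exactly the condition the reserved-based envelope now makes visible.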

Key Points
  • Fixes envelope height calculation to use segment size instead of peak allocations, showing true GPU memory consumption
  • Captures segment_map/segment_unmap events to accurately reflect when fragmentation causes new physical page mapping
  • Adds visual enhancements with 3px borders and fixes stream-elision logic that caused display issues

Why It Matters

AI developers can now accurately diagnose memory waste, preventing misleading optimizations and reducing out-of-memory errors during training.