Developer Tools

trunk/69418fa06845ce0f364d990bf16060a3419ee013: Fix OOB read in MemoryReadAdapter::read (#181193)

Crafted zip archives could silently corrupt tensor data via memory over-read.

Deep Dive

PyTorch has patched a critical out-of-bounds (OOB) read vulnerability in MemoryReadAdapter::read(), which could be triggered by maliciously crafted zip-archive metadata. The bug, fixed in commit 69418fa, let miniz/PyTorchStreamReader read past the end of the backing buffer when processing archive entries with manipulated sizes (e.g., comp_size = 0xFFFFFFFF) or manipulated local-header filename/extra_len offsets. The function performed no bounds checking and always returned the requested byte count, so miniz never saw a short read and its short-read detection did not fire. The over-read bytes flowed silently into tensor storage.

The fix, authored with Claude and merged in PR #181193, clamps both the position (`pos`) and the read length (`n`) against the buffer's size, matching the contract already implemented in FileAdapter::read. The same correction was applied to a duplicate MemoryReadAdapter in the Android JNI shim, and a regression test was added to prevent recurrence. The vulnerability was reported internally via T259151654, and the patch was approved by atalman. It follows a related fix in PR #181192.
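
The clamping described above can be sketched roughly as follows. This is a minimal illustration, not the code from PR #181193: the class name BoundedMemoryReader, its members, and the exact read signature are assumptions made for the example. It shows only the general technique of clamping `pos` and `n` against the buffer size and returning the number of bytes actually copied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an in-memory read adapter (not the PyTorch source).
class BoundedMemoryReader {
 public:
  BoundedMemoryReader(const void* data, size_t size)
      : data_(static_cast<const uint8_t*>(data)), size_(size) {}

  size_t size() const { return size_; }

  // Copies at most n bytes starting at pos into buf and returns the number
  // of bytes actually copied. A return value smaller than n is a short read
  // that the caller can detect.
  size_t read(uint64_t pos, void* buf, size_t n) const {
    assert(buf != nullptr || n == 0);
    // A position at or past the end yields nothing to read.
    if (pos >= size_) {
      return 0;
    }
    // Clamp the length to the bytes remaining after pos.
    const uint64_t available = size_ - pos;
    const size_t to_copy =
        static_cast<size_t>(std::min<uint64_t>(n, available));
    if (to_copy == 0) {
      return 0;
    }
    std::memcpy(buf, data_ + pos, to_copy);
    return to_copy;
  }

 private:
  const uint8_t* data_;
  size_t size_;
};
```

With a reader like this, a request for 0xFFFFFFFF bytes against a 4-byte buffer returns at most 4, so a caller that compares the returned count with the requested count sees the short read instead of consuming memory past the buffer.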

Key Points
  • MemoryReadAdapter::read() had no bounds check, allowing OOB reads via crafted zip metadata
  • An attacker could set comp_size = 0xFFFFFFFF to trigger an over-read past the end of the buffer
  • The fix clamps pos and n against size_, matching FileAdapter's contract, and adds a regression test

Why It Matters

This fix closes a path to silent tensor-data corruption in PyTorch, which matters most to users who load model files from untrusted sources.