Research & Papers

A Systematic Exploration of Text Decomposition and Budget Distribution in Differentially Private Text Obfuscation

Researchers find that how a text is decomposed can dramatically alter the privacy-utility trade-off, even at the same total ε.

Deep Dive

Researchers from the Technical University of Munich (Meisenbacher, Kleinert, and Matthes) tackle a core challenge in differentially private (DP) text obfuscation: how to decompose a full document into chunks, and how to distribute the privacy budget (ε) across those chunks. While prior work focused on word-level perturbation, meaningful privatization requires handling complete texts. The team systematically compares decomposition methods such as sentence-level versus paragraph-level chunking, combined with uniform versus adaptive ε allocation. Their experiments reveal that the choice of decomposition and budget distribution dramatically affects the resulting utility, measured on tasks such as sentiment analysis and language-model perplexity, even when the total ε is identical. For example, paragraph-level chunking with adaptive budget allocation retained 15% more text utility than sentence-level chunking with uniform distribution at the same total budget of ε = 8. The paper, accepted to PrivateNLP 2026, provides actionable guidance for practitioners building privacy-preserving NLP systems: optimizing the obfuscation pipeline can yield better trade-offs without increasing the privacy loss.
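The two design axes the paper studies can be sketched in a few lines. This is an illustrative example, not the authors' implementation: the chunking here is a naive sentence split, and the two allocation strategies (uniform and length-proportional) are simple stand-ins for the strategies the paper evaluates. Under sequential composition, the per-chunk budgets must sum to the total ε.

```python
# Illustrative sketch (NOT the paper's code): decompose a document into
# chunks, then distribute a total privacy budget across those chunks.

def uniform_budget(chunks, total_eps):
    """Give every chunk an equal share of the total budget."""
    return [total_eps / len(chunks) for _ in chunks]

def length_proportional_budget(chunks, total_eps):
    """Give longer chunks a larger share, proportional to word count."""
    lengths = [len(c.split()) for c in chunks]
    total_len = sum(lengths)
    return [total_eps * n / total_len for n in lengths]

doc = ("Differential privacy adds calibrated noise. "
       "Longer passages may tolerate noise differently. "
       "Budget allocation changes utility.")
chunks = doc.split(". ")  # naive sentence-level decomposition

print(uniform_budget(chunks, 8.0))
print(length_proportional_budget(chunks, 8.0))
```

Both allocations spend the same total ε = 8; they differ only in how the budget, and therefore the noise, is spread over the document.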

Key Points
  • Systematic evaluation of 5 chunking methods (including sentence-level, paragraph-level, and sliding-window) × 3 budget-distribution strategies (uniform, length-proportional, importance-based) on 4 datasets.
  • Design choices cause up to 20% variance in downstream task accuracy at an identical budget of ε = 8.
  • Paragraph-level chunking with adaptive ε allocation yields best trade-off, improving utility by 15% over baseline uniform sentence-level splits.
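Why does utility vary when the total ε is fixed? A minimal sketch, using the standard Laplace mechanism as an assumed per-chunk mechanism (the paper itself may use a different word- or sentence-level mechanism): under sequential composition, any allocation whose per-chunk budgets sum to the total gives the same overall guarantee, but the noise scale applied to each chunk (sensitivity / ε_i) differs, which is what drives the utility differences above.

```python
# Illustrative sketch: same total epsilon, different per-chunk noise.
SENSITIVITY = 1.0  # assumed per-chunk sensitivity, for illustration only

def noise_scales(per_chunk_eps, sensitivity=SENSITIVITY):
    """Laplace noise scale b = sensitivity / eps_i for each chunk."""
    return [sensitivity / e for e in per_chunk_eps]

uniform  = [8.0 / 4] * 4          # four chunks, uniform split of eps = 8
adaptive = [3.0, 2.0, 2.0, 1.0]   # same total eps = 8, adaptive split

assert abs(sum(uniform) - sum(adaptive)) < 1e-9  # identical privacy loss
print(noise_scales(uniform))   # [0.5, 0.5, 0.5, 0.5]
print(noise_scales(adaptive))  # lighter noise on high-budget chunks
```

An adaptive split trades heavier noise on some chunks for lighter noise on others, which can preserve more of the signal that downstream tasks depend on.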

Why It Matters

Practical guide for engineers deploying DP NLP: proper chunking can boost utility without sacrificing privacy guarantees.