Open Source

How to build scalable web apps with OpenAI's Privacy Filter

No chunking needed: a single forward pass labels 8 PII categories across entire documents.

Deep Dive

OpenAI released Privacy Filter on the Hub this week: an open-source detector for personally identifiable information (PII) that labels text across eight categories in a single forward pass over a 128k-token context. The model has 1.5B parameters, of which 50M are active, is licensed under Apache 2.0, and is reported to achieve state-of-the-art performance on the PII-Masking-300k benchmark. The categories are private_person, private_address, private_email, private_phone, private_url, private_date, account_number, and secret. Because an entire document fits in one pass, there is no chunking and no stitching of partial results. Token labels are decoded with the BIOES scheme (Begin, Inside, Outside, End, Single), so the resulting span offsets line up directly with the rendered text.
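To make the BIOES step concrete, here is a minimal sketch of how per-token tags can be merged into character-level spans. It is not the model's own decoder: the Span type, the decode_bioes function, and the whitespace tokenization are assumptions for illustration, and a real tokenizer would supply subword offsets instead.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Span(NamedTuple):
    category: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


def decode_bioes(labels: List[str], offsets: List[Tuple[int, int]]) -> List[Span]:
    """Merge per-token BIOES tags into character-level spans (hypothetical helper)."""
    spans: List[Span] = []
    open_cat: Optional[str] = None  # category of the span being built
    open_start = 0                  # character offset where that span began
    for label, (tok_start, tok_end) in zip(labels, offsets):
        prefix, _, cat = label.partition("-")
        if prefix == "S":
            # Single-token entity: emit immediately.
            spans.append(Span(cat, tok_start, tok_end))
            open_cat = None
        elif prefix == "B":
            open_cat, open_start = cat, tok_start
        elif prefix == "I" and cat == open_cat:
            continue  # still inside the open span
        elif prefix == "E" and cat == open_cat:
            spans.append(Span(cat, open_start, tok_end))
            open_cat = None
        else:
            # "O" or a malformed sequence: drop any unfinished span.
            open_cat = None
    return spans


# Toy input: whitespace tokens stand in for real tokenizer offsets (assumption).
text = "Email Ada Lovelace at ada@example.com"
offsets = [m.span() for m in re.finditer(r"\S+", text)]
labels = ["O", "B-private_person", "E-private_person", "O", "S-private_email"]

spans = decode_bioes(labels, offsets)
print([text[s.start:s.end] for s in spans])
# prints ['Ada Lovelace', 'ada@example.com']
```

Because each span carries offsets into the original string, a frontend can highlight or mask the text without re-aligning anything.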

Developers built three demo apps on gradio.Server to showcase the model's capabilities. The Document Privacy Explorer lets users drop in a PDF or DOCX file and see every PII span highlighted by category, with a sidebar filter and a summary dashboard. The Image Anonymizer takes an uploaded screenshot and returns it with black bars drawn over names, emails, and account numbers, along with an editable canvas for manual annotations. SmartRedact Paste gives users a public URL that serves the redacted version and a separate private link that reveals the original. All three apps rely on gradio.Server's queueing, ZeroGPU allocation, and the gradio_client SDK for consistent backend behavior.
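The redaction and per-category summary that these apps present can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the demos' actual code: the redact function, the placeholder format, and the reveal mapping are all hypothetical, and the spans are assumed to arrive as (category, start, end) tuples with an exclusive end offset.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (category, start, end) with end exclusive; hypothetical detector output.
Span = Tuple[str, int, int]


def redact(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str], Counter]:
    """Return redacted text, a placeholder-to-original map, and per-category counts."""
    counts: Counter = Counter()
    reveal: Dict[str, str] = {}
    pieces: List[str] = []
    cursor = 0
    for cat, start, end in sorted(spans, key=lambda s: s[1]):
        if start < cursor:
            continue  # skip spans that overlap one already redacted
        counts[cat] += 1
        tag = f"[{cat.upper()}_{counts[cat]}]"
        reveal[tag] = text[start:end]      # kept private, e.g. behind a reveal link
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(tag)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), reveal, counts


text = "Email Ada Lovelace at ada@example.com"
spans = [("private_person", 6, 18), ("private_email", 22, 37)]

redacted, reveal, counts = redact(text, spans)
print(redacted)
# prints Email [PRIVATE_PERSON_1] at [PRIVATE_EMAIL_1]
```

Under this design, the redacted string is what a public view would serve, the reveal map stays with the owner, and the counts could feed a summary of how many spans were found per category.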

Key Points
  • Single forward pass over 128k tokens eliminates chunking and span misalignment issues
  • 1.5B-parameter model with 50M active parameters achieves SOTA on the PII-Masking-300k benchmark
  • Three demo apps (Document Explorer, Image Anonymizer, SmartRedact Paste) built on gradio.Server with custom frontends

Why It Matters

Simplifies PII redaction for legal, HR, and compliance teams handling sensitive documents and images.