How to build scalable web apps with OpenAI's Privacy Filter
No chunking needed: a single forward pass labels 8 PII categories across entire documents.
OpenAI released Privacy Filter on the Hugging Face Hub this week: an open-source detector for personally identifiable information (PII) that labels text across eight categories in a single forward pass over a 128k-token context. The 1.5B-parameter model, which uses 50M active parameters and ships under the Apache 2.0 license, achieves state-of-the-art performance on the PII-Masking-300k benchmark. The categories are private_person, private_address, private_email, private_phone, private_url, private_date, account_number, and secret. Because a document of up to 128k tokens fits in one pass, there is no chunking or stitching, and BIOES decoding yields span offsets that align directly with the rendered text.
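To make the BIOES step concrete, here is a minimal sketch of how per-token tags could be collapsed into character spans. It assumes the model's output has already been turned into one tag string per token (for example `B-private_person`) plus a `(start, end)` character offset per token. The function name `decode_bioes` and the tag spelling are illustrative assumptions, not the model's documented interface.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (char_start, char_end, category)


def decode_bioes(tags: List[str], offsets: List[Tuple[int, int]]) -> List[Span]:
    """Collapse BIOES tags into character-level spans.

    B = begin, I = inside, E = end, S = single-token span, O = outside.
    Malformed sequences (e.g. an I or E with no matching B) are dropped.
    """
    spans: List[Span] = []
    start, label = None, None  # state of the span currently being built
    for tag, (tok_start, tok_end) in zip(tags, offsets):
        prefix, _, category = tag.partition("-")
        if prefix == "S":
            spans.append((tok_start, tok_end, category))
            start, label = None, None
        elif prefix == "B":
            start, label = tok_start, category
        elif prefix == "I" and label == category:
            continue  # still inside the open span
        elif prefix == "E" and label == category:
            spans.append((start, tok_end, category))
            start, label = None, None
        else:
            # "O", or a tag that does not continue the open span
            start, label = None, None
    return spans


# Hypothetical example: whitespace tokens with hand-written tags.
text = "Email Jane Doe at jane@example.com"
offsets = [(0, 5), (6, 10), (11, 14), (15, 17), (18, 34)]
tags = ["O", "B-private_person", "E-private_person", "O", "S-private_email"]

for start, end, category in decode_bioes(tags, offsets):
    print(category, repr(text[start:end]))
```

Because the offsets index into the original string, the resulting spans can be used to highlight or mask text without any re-alignment step.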
Developers built three demo apps on gradio.Server to showcase the model's capabilities. The Document Privacy Explorer lets users drop in a PDF or DOCX and see every PII span highlighted by category, with a sidebar filter and a summary dashboard. The Image Anonymizer takes an uploaded screenshot and returns it with black bars drawn over names, emails, and account numbers, plus an editable canvas for manual annotations. SmartRedact Paste lets users share a public URL that serves the redacted version while keeping a private reveal link. All three apps rely on gradio.Server's queueing and ZeroGPU allocation, together with the gradio_client SDK, for consistent backend behavior.
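The redaction and dashboard behavior described above reduces to a few operations on character spans. The sketch below is not taken from the demo apps. It assumes spans arrive as non-overlapping `(start, end, category)` tuples, and the helper names `redact` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (char_start, char_end, category)


def redact(text: str, spans: Iterable[Span],
           categories: Optional[Set[str]] = None) -> str:
    """Replace each span with a [CATEGORY] placeholder.

    `categories` limits redaction to the given labels (None means all),
    mirroring a per-category filter. Spans are assumed not to overlap.
    """
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, category in sorted(spans, reverse=True):
        if categories is None or category in categories:
            out = out[:start] + f"[{category.upper()}]" + out[end:]
    return out


def summarize(spans: Iterable[Span]) -> Counter:
    """Count spans per category, as a summary dashboard might."""
    return Counter(category for _, _, category in spans)


# Hypothetical spans for illustration.
text = "Email Jane Doe at jane@example.com"
spans = [(6, 14, "private_person"), (18, 34, "private_email")]

print(redact(text, spans))
print(redact(text, spans, categories={"private_email"}))
print(summarize(spans))
```

Replacing from the end of the string backward avoids recomputing offsets after each substitution, which keeps the logic simple when placeholders differ in length from the text they replace.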
- Single forward pass over 128k tokens eliminates chunking and span misalignment issues
- 1.5B-parameter model with 50M active parameters achieves SOTA on the PII-Masking-300k benchmark
- Three demo apps (Document Explorer, Image Anonymizer, SmartRedact Paste) built on gradio.Server with custom frontends
Why It Matters
Simplifies PII redaction for legal, HR, and compliance teams handling sensitive documents and images.