Open Source

OpenAI's open-source Privacy Filter detects PII in 128k tokens at once

Hugging Face Blog April 27, 2026

⚡No chunking needed: a single forward pass labels 8 PII categories across entire documents.

Deep Dive

OpenAI released Privacy Filter on the Hub this week. It is an open-source personally identifiable information (PII) detector that labels text across eight categories in a single forward pass over a 128k-token context. The 1.5B-parameter model, with 50M active parameters and an Apache 2.0 license, achieves state-of-the-art performance on the PII-Masking-300k benchmark. Its categories are private_person, private_address, private_email, private_phone, private_url, private_date, account_number, and secret. Because an entire document fits in one pass, there is no chunking or stitching, and BIOES decoding yields span offsets that align directly with the rendered text.
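To illustrate what BIOES decoding does, here is a generic sketch that turns per-token tags into character-level spans. BIOES stands for Begin, Inside, End, Single, and Outside. The sketch assumes the model emits one tag per token, written as "O" or "<B|I|E|S>-<category>" (for example "B-private_person"), and that each token comes with its (start, end) character offsets. The tag string format, the `decode_bioes` name, and the sample tokens are hypothetical illustrations, not Privacy Filter's documented output schema.

```python
from typing import List, Optional, Tuple

# (char_start, char_end, category); a hypothetical span format for illustration.
Span = Tuple[int, int, str]


def decode_bioes(tags: List[str], offsets: List[Tuple[int, int]]) -> List[Span]:
    """Convert per-token BIOES tags into character-level spans.

    tags[i] is "O" or "<prefix>-<category>" with prefix in B, I, E, S.
    offsets[i] is the (start, end) character range of token i.
    Malformed runs (e.g. an I- or E- tag with no matching B-) are dropped.
    """
    spans: List[Span] = []
    open_start: Optional[int] = None
    open_label: Optional[str] = None
    for tag, (start, end) in zip(tags, offsets):
        if tag == "O":
            open_start, open_label = None, None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":
            # A single-token entity is complete on its own.
            spans.append((start, end, label))
            open_start, open_label = None, None
        elif prefix == "B":
            # Open a new multi-token entity.
            open_start, open_label = start, label
        elif prefix == "I":
            # I- must continue an open span of the same category.
            if open_label != label:
                open_start, open_label = None, None
        elif prefix == "E":
            # Close the open span only if the category matches.
            if open_label == label and open_start is not None:
                spans.append((open_start, end, label))
            open_start, open_label = None, None
    return spans


# Example with hand-written (hypothetical) tokens, tags, and offsets.
text = "Email Ada Lovelace at ada@example.com"
tags = ["O", "B-private_person", "E-private_person", "O", "S-private_email"]
offsets = [(0, 5), (6, 9), (10, 18), (19, 21), (22, 37)]
spans = decode_bioes(tags, offsets)
# spans == [(6, 18, 'private_person'), (22, 37, 'private_email')]
```

Because the offsets index the original string, `text[6:18]` recovers "Ada Lovelace" directly. No chunk-boundary shifting or stitching is needed when the whole document is labeled in one pass.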

Developers built three demo apps on gradio.Server, each with a custom frontend, to showcase the model. The Document Privacy Explorer lets users drop in a PDF or DOCX file and see every PII span highlighted by category, with a sidebar filter and a summary dashboard. The Image Anonymizer takes uploaded screenshots and returns them with black bars over names, emails, and account numbers, plus an editable canvas for manual annotations. SmartRedact Paste lets users share a public URL that serves the redacted version while they keep a private link that reveals the original. All three apps rely on gradio.Server's queueing and ZeroGPU allocation, along with the gradio_client SDK, for consistent backend behavior.
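An app like SmartRedact Paste has to turn detected spans into a redacted view. The sketch below shows one way to do that. It assumes spans arrive as (char_start, char_end, category) tuples and replaces each one with a category placeholder. The `redact` function, the span format, and the placeholder style are hypothetical and not taken from the demo apps' code.

```python
from typing import List, Tuple

# (char_start, char_end, category); a hypothetical span format for illustration.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with a [CATEGORY] placeholder.

    Spans are processed left to right by start offset. A span that
    overlaps one already redacted is skipped, so offsets stay valid.
    """
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps a previously redacted span
        pieces.append(text[cursor:start])  # keep the non-PII text before the span
        pieces.append(f"[{label.upper()}]")
        cursor = end
    pieces.append(text[cursor:])  # keep the remaining tail
    return "".join(pieces)


public_view = redact(
    "Email Ada Lovelace at ada@example.com",
    [(6, 18, "private_person"), (22, 37, "private_email")],
)
# public_view == "Email [PRIVATE_PERSON] at [PRIVATE_EMAIL]"
```

The design builds the output from slices of the untouched original string. That keeps every span offset valid throughout, instead of mutating the string and recomputing positions after each replacement.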

Key Points

Single forward pass over 128k tokens eliminates chunking and span misalignment issues
1.5B-parameter model with 50M active parameters achieves SOTA on PII-Masking-300k benchmark
Three demo apps (Document Explorer, Image Anonymizer, SmartRedact Paste) built on gradio.Server with custom frontends

Why It Matters

Simplifies PII redaction for legal, HR, and compliance teams handling sensitive documents and images.
