arXiv's HTML Papers project hits 75% error-free HTML, adds MathML 4 Intent
6,000 user reports later, arXiv makes math accessible via HTML and speech.
arXiv's HTML Papers project, which converts every new TeX/LaTeX submission into accessible HTML, has released an update covering developments from 2025 and early 2026. The team (Deyan Ginev, Brian Caruso, Bruce Miller, Jeff Sank, and Jacob Weiskoff) reports community-driven improvements that resolved roughly half of 6,000 user reports, improving both conversion fidelity and service health. The conversion pipeline now achieves 75% error-free HTML, moving toward a 90% target. A significant new feature is initial support for MathML 4 Intent annotations, which attach semantic metadata to mathematical expressions so that screen readers can produce more accurate speech output for visually impaired readers.
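To make the idea of an Intent annotation concrete, here is a minimal sketch in Python that builds a MathML fragment for x squared using the `intent` and `arg` attributes from the MathML 4 draft. It is an illustration only, not arXiv's or LaTeXML's actual output: the helper name `annotated_power` is hypothetical, and the concept name `power` is assumed to be one a screen reader recognizes.

```python
import xml.etree.ElementTree as ET


def annotated_power(base: str, exponent: str) -> str:
    """Return an <msup> element carrying a MathML 4 style intent annotation.

    Hypothetical helper for illustration; not part of LaTeXML or arXiv's pipeline.
    """
    # The intent attribute states what the notation means ("base to the power exp").
    # $base and $exp refer to child elements labeled with matching arg attributes.
    msup = ET.Element("msup", {"intent": "power($base,$exp)"})

    base_el = ET.SubElement(msup, "mi", {"arg": "base"})
    base_el.text = base

    exp_el = ET.SubElement(msup, "mn", {"arg": "exp"})
    exp_el.text = exponent

    return ET.tostring(msup, encoding="unicode")


markup = annotated_power("x", "2")
print(markup)
```

Without the annotation, a speech engine has to guess whether a superscript is an exponent, an index, or a footnote marker. With it, the engine can read the expression as a power rather than describing the layout.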
Under the hood, arXiv is porting the LaTeXML processor from Perl to Rust. The rewrite reduces compute costs and enables faster previews upon submission, which matters for a repository serving over 2 million papers. The HTML Papers offering remains experimental but is maturing as the team balances technical opportunities (new standards, AI-assisted conversion) with community needs. The project directly addresses accessibility gaps in STEM, where complex equations have historically been inaccessible to non-visual readers.
- Resolved ~3,000 of 6,000 user reports, improving HTML fidelity and service reliability.
- Reached 75% error-free HTML conversion, targeting 90% for mathematical papers.
- Introduced MathML 4 Intent annotations for accessible speech output, plus a Rust port of LaTeXML for cheaper, faster previews.
Why It Matters
Makes STEM papers accessible to researchers using screen readers, reducing barriers to scientific literature.