SoftJAX & SoftTorch: Empowering Automatic Differentiation Libraries with Informative Gradients
New libraries replace 'hard' functions like sorting with differentiable versions, enabling gradient flow through previously blocked operations.
Researchers from institutions including the Max Planck Institute have introduced SoftJAX and SoftTorch, two new open-source libraries designed to solve a fundamental problem in automatic differentiation (AD). Frameworks like JAX and PyTorch hit a wall with 'hard' primitives—operations like thresholding, Boolean logic, discrete indexing, and sorting. These operations produce zero or undefined gradients, creating 'dead zones' in the optimization landscape that prevent gradient-based learning from working effectively. While individual research papers have proposed various 'soft' relaxations for specific functions, these solutions have remained scattered and difficult to implement together.
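To see why these dead zones arise, consider a hard threshold: it is piecewise constant, so its derivative is zero wherever it is defined, whereas a sigmoid relaxation of the same operation yields a gradient an optimizer can actually follow. The snippet below is a minimal illustrative sketch in plain JAX, not SoftJAX itself; the function names and the temperature parameter are chosen here purely for illustration.

```python
# Illustrative sketch in plain JAX (not SoftJAX): a hard threshold has a zero
# gradient almost everywhere, while a sigmoid relaxation does not.
import jax
import jax.numpy as jnp

def hard_step(x):
    # Piecewise constant: the derivative is 0 wherever it is defined.
    return jnp.where(x > 0, 1.0, 0.0)

def soft_step(x, temperature=0.1):
    # Common relaxation: approaches hard_step as temperature -> 0, but keeps
    # a nonzero gradient that gradient descent can use.
    return jax.nn.sigmoid(x / temperature)

x = 0.5
print(jax.grad(hard_step)(x))  # 0.0 -- a gradient 'dead zone'
print(jax.grad(soft_step)(x))  # nonzero, so optimization can make progress
```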
SoftJAX and SoftTorch consolidate this research into feature-complete, practical toolkits. They offer four main categories of drop-in replacements: elementwise operators (e.g., clip, abs), utility methods for fuzzy logic and index manipulation, axiswise operators (e.g., sort, rank) based on techniques like optimal transport, and full support for straight-through gradient estimation. By providing a unified API, the libraries allow developers to seamlessly swap out problematic functions in their existing JAX or PyTorch code, enabling gradients to flow through computational graphs that were previously broken. A practical case study and benchmarking in the paper demonstrate the libraries' effectiveness in real-world differentiable programming scenarios.
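As a flavor of one of those categories, straight-through gradient estimation keeps the exact hard operation in the forward pass while substituting an identity gradient in the backward pass. The sketch below shows the general trick in plain JAX under that assumption; it is not the SoftJAX/SoftTorch API, and the helper name is hypothetical.

```python
# Illustrative sketch in plain JAX (the SoftJAX/SoftTorch API may differ):
# a straight-through estimator for rounding.
import jax
import jax.numpy as jnp

def straight_through_round(x):
    # Forward value: jnp.round(x). Backward: stop_gradient hides the
    # non-differentiable part, so d(output)/dx is treated as 1.
    return x + jax.lax.stop_gradient(jnp.round(x) - x)

loss = lambda x: straight_through_round(x) ** 2
print(jax.grad(loss)(1.3))                           # 2 * round(1.3) = 2.0
print(jax.grad(lambda x: jnp.round(x) ** 2)(1.3))    # 0.0: the hard round blocks the gradient
```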
- Provides drop-in 'soft' replacements for 'hard' JAX/PyTorch functions like sort, clip, and Boolean logic that yield usable gradients (an illustrative relaxation sketch follows this list).
- Unifies fragmented research on differentiable relaxations into two accessible, open-source libraries (SoftJAX and SoftTorch).
- Enables gradient-based optimization through previously non-differentiable code blocks, expanding what can be learned end-to-end.
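To illustrate the idea behind the axiswise relaxations such as soft ranking, here is a minimal sketch in plain JAX built from pairwise sigmoid comparisons. The libraries' actual operators, including their optimal-transport-based variants, may work differently; soft_rank and temperature are illustrative names only.

```python
# Illustrative sketch in plain JAX (not the SoftJAX implementation):
# a 'soft rank' built from pairwise sigmoid comparisons.
import jax
import jax.numpy as jnp

def soft_rank(x, temperature=0.1):
    # rank_i counts how many entries are smaller than x_i; replacing that hard
    # comparison with a sigmoid makes the count differentiable in x.
    pairwise = jax.nn.sigmoid((x[:, None] - x[None, :]) / temperature)
    return pairwise.sum(axis=1) - 0.5  # remove the self-comparison, sigmoid(0) = 0.5

x = jnp.array([0.3, -1.2, 2.0])
print(soft_rank(x))                # approximately [1., 0., 2.]
print(jax.jacobian(soft_rank)(x))  # dense, nonzero Jacobian that AD can use
```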
Why It Matters
Enables more complex, end-to-end differentiable programs in ML research and engineering, moving beyond gradient-blocking operations.