Image & Video

[New Optimizer] 🌹 Rose: low VRAM, easy to use, great results, Apache 2.0

⚡New stateless PyTorch optimizer uses less memory than AdamW8bit while resisting overfitting.

Deep Dive

Independent developer MatthewK78 has publicly released Rose, a novel PyTorch optimizer named in memory of his mother, after two years of research and development. Its key architectural claim is that it is stateless: it keeps no per-parameter optimizer state, so it requires less memory than even the memory-efficient AdamW8bit, and according to the developer its footprint approaches that of plain Stochastic Gradient Descent (SGD) once working memory is excluded. Rose is designed for fast convergence and includes features such as gradient centralization and gradient stabilization to resist overfitting and improve generalization, though the developer notes that benchmarks can be misleading and encourages community testing instead.
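Gradient centralization is a previously published technique that subtracts the mean from each weight gradient before the update step. The sketch below illustrates the general idea only; it is not Rose's source code, and the function name is a placeholder of ours.

```python
# Minimal illustration of gradient centralization, one of the techniques the
# release notes say Rose builds in. This is NOT Rose's implementation; it just
# shows the core operation: remove the mean of each weight gradient before the
# optimizer update, which tends to stabilize training and aid generalization.
import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # Applies only to multi-dimensional weight tensors (linear/conv kernels);
    # 1-D tensors such as biases and norm scales are left unchanged.
    if grad.dim() > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad

# Example: centralize gradients after backward(), before optimizer.step().
w = torch.randn(64, 128, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
w.grad.copy_(centralize_gradient(w.grad))
```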

Released under the permissive Apache 2.0 license, Rose is available on GitHub for immediate integration. The package includes support for modern training techniques like BF16 stochastic rounding and decoupled weight decay (AdamW-style). The developer provides a straightforward pip installation and an example configuration for use with toolkits like ostris/ai-toolkit, suggesting that users start by sampling outputs every 128 steps to assess behavior. By forgoing published benchmarks, MatthewK78 is positioning Rose as a practical tool where the quality of the final trained model's output is the ultimate metric, inviting the open-source community to validate its performance across diverse tasks.
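As a rough sketch of what drop-in use might look like in a plain PyTorch training loop: the import path, class name, and constructor arguments below are assumptions for illustration only; consult the GitHub README for the actual package name, install command, and supported options (including BF16 stochastic rounding).

```python
# Hypothetical usage sketch, not copied from the Rose repository. The import
# path `rose.Rose` and its keyword arguments are placeholders; the real names
# are documented in the project's README on GitHub.
import torch
import torch.nn as nn
from rose import Rose  # hypothetical import path

model = nn.Linear(512, 512)
# `weight_decay` assumed to be decoupled (AdamW-style), as the release notes describe.
optimizer = Rose(model.parameters(), lr=1e-4, weight_decay=1e-2)

for step in range(1, 513):
    optimizer.zero_grad(set_to_none=True)
    x = torch.randn(8, 512)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    # The developer suggests sampling outputs every 128 steps to gauge behavior.
    if step % 128 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```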

Key Points
  • Stateless design uses less VRAM than AdamW8bit, approaching plain SGD memory use.
  • Ships with gradient centralization and overfitting-resistance features out of the box.
  • Open-sourced under Apache 2.0 license with easy pip install for immediate community testing.

Why It Matters

Offers a potential drop-in replacement that reduces memory costs and can improve generalization for researchers and engineers training large models, from image and video generators to LLMs.