Research & Papers

[New Optimizer] 🌹 Rose: low VRAM, easy to use, great results, Apache 2.0 [P]

⚡ A new stateless optimizer matches AdamW accuracy with zero optimizer-state memory overhead.

Deep Dive

Matthew K. has released Rose, a new PyTorch optimizer developed over two years and named in memory of his mother. Rose is stateless: it keeps no per-parameter optimizer state, so its state memory overhead is zero, undercutting 8-bit AdamW and matching plain SGD without momentum. On MNIST benchmarks, Rose reaches 99.34% accuracy in 11 epochs (lr=3e-3), versus AdamW's 99.30% in 14 epochs (lr=2.5e-3). Notably, Rose shows higher training loss but lower validation loss, suggesting better generalization.
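The post doesn't disclose Rose's update rule, but "stateless" has a concrete meaning in PyTorch: the optimizer allocates no per-parameter buffers in self.state, whereas AdamW keeps two extra full-size tensors (exp_avg and exp_avg_sq) for every parameter. Below is a minimal sketch of a stateless optimizer using a placeholder sign-of-gradient update, purely to illustrate the structure; it is not Rose's actual rule.

```python
import torch
from torch.optim import Optimizer

class StatelessSketch(Optimizer):
    """Illustrative stateless optimizer: self.state stays empty.

    The sign-of-gradient update is a placeholder, not Rose's
    (undisclosed) rule; it only shows where a stateful optimizer
    like AdamW would read and write its per-parameter buffers.
    """

    def __init__(self, params, lr=3e-3):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # The update is computed from the current gradient
                # alone -- AdamW would need exp_avg and exp_avg_sq
                # here, i.e. two extra full-size tensors per parameter.
                p.add_(p.grad.sign(), alpha=-group["lr"])
        return loss
```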

Rose's low VRAM footprint makes it ideal for fine-tuning large models on consumer GPUs. It's released under Apache 2.0, and the creator invites community testing to validate results. Early benchmarks suggest Rose converges faster and generalizes better than AdamW, though users should test on their own tasks. The optimizer is available now on GitHub.
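To verify the zero-state claim on your own model, you can count the bytes an optimizer actually holds: AdamW carries roughly two full-precision copies of every parameter, while a truly stateless optimizer should report zero. The helper below uses only standard PyTorch and assumes nothing about Rose's API.

```python
import torch

def optimizer_state_bytes(opt: torch.optim.Optimizer) -> int:
    """Total bytes of every tensor in the optimizer's state dict."""
    return sum(v.numel() * v.element_size()
               for state in opt.state.values()
               for v in state.values()
               if torch.is_tensor(v))

model = torch.nn.Linear(1024, 1024)
opt = torch.optim.AdamW(model.parameters(), lr=2.5e-3)
model(torch.randn(8, 1024)).sum().backward()
opt.step()  # AdamW lazily allocates exp_avg / exp_avg_sq on first step

print(optimizer_state_bytes(opt))  # ~8.4 MB: two fp32 copies of ~1.05M params
# A genuinely stateless optimizer should print 0 here.
```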

Key Points
  • Stateless design: Rose keeps no optimizer state, so its state memory overhead is zero, undercutting even 8-bit AdamW.
  • On MNIST, Rose hits 99.34% accuracy in 11 epochs vs AdamW's 99.30% in 14 epochs.
  • Apache 2.0 license; available on GitHub for community testing and contributions (a drop-in usage sketch follows this list).
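The post doesn't document Rose's module path or constructor, so the import and class name in the sketch below are hypothetical placeholders; the loop itself is a standard PyTorch training step showing how a drop-in swap would look, with the learning rates taken from the MNIST runs above.

```python
import torch
import torch.nn.functional as F

# Hypothetical import -- the post doesn't document Rose's module path
# or constructor; check the GitHub README for the real API.
# from rose import Rose

model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
# opt = Rose(model.parameters(), lr=3e-3)               # lr from the Rose MNIST run
opt = torch.optim.AdamW(model.parameters(), lr=2.5e-3)  # AdamW baseline lr

# Stand-in batches shaped like MNIST; use a real DataLoader in practice.
loader = [(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
          for _ in range(10)]

for x, y in loader:
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
```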

Why It Matters

Rose could democratize large model training by drastically reducing VRAM requirements for fine-tuning.