Introducing AutoMuon, a one-line drop-in for AdamW
Replace AdamW with the Muon optimizer in one line for 2D weight matrices.
AutoMuon, a new Python package by SkyeGunasekaran, simplifies adopting the Muon optimizer as a drop-in replacement for AdamW in PyTorch training pipelines. The Muon optimizer, originally designed for 2D weight matrices such as linear projections and convolutional layers, offers potential efficiency gains but requires manual handling of other parameter types. AutoMuon automates this by scanning the model at initialization, assigning Muon to 2D matrices and AdamW to embeddings, norms, and biases, enabling integration with only a single-line change.
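The shape-based routing described above can be sketched in plain Python. This is a hypothetical illustration of the assignment rule, not AutoMuon's actual API: the function name, the toy parameter shapes, and the `"embed"` name check are all assumptions made for the example.

```python
# Hypothetical sketch of AutoMuon-style parameter routing (not the
# package's real API): 2D weight matrices go to Muon, while embeddings,
# norms, and biases stay on AdamW, as the summary above describes.

def split_params_by_shape(named_shapes):
    """Partition parameter names into a Muon group (2D matrices)
    and an AdamW group (everything else)."""
    muon_group, adamw_group = [], []
    for name, shape in named_shapes.items():
        # Route 2D matrices to Muon, but keep embedding tables on
        # AdamW even though they are also 2D (an assumed exclusion).
        if len(shape) == 2 and "embed" not in name:
            muon_group.append(name)
        else:
            adamw_group.append(name)
    return muon_group, adamw_group

# Toy transformer-block parameter shapes for illustration.
shapes = {
    "attn.q_proj.weight": (512, 512),   # 2D projection -> Muon
    "mlp.fc1.weight": (2048, 512),      # 2D projection -> Muon
    "mlp.fc1.bias": (2048,),            # 1D bias -> AdamW
    "ln1.weight": (512,),               # 1D norm -> AdamW
    "tok_embed.weight": (32000, 512),   # embedding table -> AdamW
}
muon, adamw = split_params_by_shape(shapes)
```

In a real pipeline the two name lists would seed two optimizer instances (Muon and AdamW) over the corresponding parameter tensors; the point here is only the scan-and-partition step that AutoMuon performs at initialization.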
Currently optimized for transformers and CNNs, AutoMuon may struggle with custom architectures like flash-linear-attention, but the developer welcomes PRs to expand module-type exclusions. Future plans include testing on time series, genomics, and language modeling to gauge Muon's generalizability. The package is installable via pip from GitHub, offering a straightforward path for researchers to experiment with Muon's performance benefits in diverse deep learning tasks.
- AutoMuon automates optimizer assignment: Muon for 2D weight matrices, AdamW for others.
- Designed as a drop-in replacement for AdamW in PyTorch, requiring only a one-line change.
- Open-source project seeking PRs for broader architecture support and testing.
Why It Matters
AutoMuon lowers the barrier to using Muon, potentially accelerating training for PyTorch models.