MuxTune: Efficient Multi-Task LLM Fine-Tuning in Multi-Tenant Datacenters via Spatial-Temporal Backbone Multiplexing
New research shows how to run multiple PEFT tasks concurrently, cutting memory use by 5.29x.
A research team from Shanghai Jiao Tong University and Nanyang Technological University has introduced MuxTune, a system designed to dramatically improve the efficiency of running multiple Parameter-Efficient Fine-Tuning (PEFT) tasks concurrently in cloud datacenters. Current approaches deploy a separate model instance for each fine-tuning job, which leads to GPU underutilization from small-scale PEFT operators and to device stalls from communication delays. MuxTune addresses these inefficiencies through a core innovation: spatial-temporal multiplexing of the LLM backbone across independent tasks, so the same model parameters are shared and reused rather than duplicated per task.
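The core idea of sharing one frozen backbone across tasks can be sketched in plain Python. This is an illustrative toy, not the authors' implementation: the matrix shapes, task names, and rank-1 LoRA-style adapters are all assumptions made for the example.

```python
# Toy sketch of backbone multiplexing: one frozen backbone weight matrix
# is stored once and reused by every task; only the small per-task
# LoRA-style adapters (delta_W = B @ A, rank r = 1 here) differ.

def matvec(W, x):
    """Dense matrix-vector product over plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

d = 4
# Shared frozen backbone weight (d x d): 0.1 * identity for the demo.
backbone = [[0.1 if i == j else 0.0 for j in range(d)] for i in range(d)]

# Per-task (A, B) adapter pairs: A is r x d, B is d x r, rank r = 1.
adapters = {
    "taskA": ([[1.0, 0.0, 0.0, 0.0]], [[0.5], [0.0], [0.0], [0.0]]),
    "taskB": ([[0.0, 1.0, 0.0, 0.0]], [[0.0], [0.5], [0.0], [0.0]]),
}

def fused_forward(batches):
    """Apply the shared backbone (stored once, no per-task copy) to every
    task's samples, then add each task's cheap low-rank correction."""
    outputs = {}
    for task, xs in batches.items():
        A, B = adapters[task]
        outs = []
        for x in xs:
            base = matvec(backbone, x)   # shared backbone weights
            h = matvec(A, x)             # r-dimensional projection
            delta = [sum(b * hi for b, hi in zip(row, h)) for row in B]
            outs.append([bi + di for bi, di in zip(base, delta)])
        outputs[task] = outs
    return outputs

out = fused_forward({"taskA": [[1.0, 1.0, 0.0, 0.0]],
                     "taskB": [[1.0, 1.0, 0.0, 0.0]]})
```

The memory saving comes from storing `backbone` once: per-task state is only the rank-r adapter, which is tiny compared to the backbone it modifies.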
MuxTune's technical architecture employs a hierarchical co-scheduling scheme with optimizations at the task, operator, and data levels. It fuses tasks using a hybrid of spatial (parallel) and temporal (sequential) multiplexing and orchestrates execution with two-tiered hybrid parallelism. A key technique is chunk-based data alignment, which mitigates the performance impact of inter-task 'ineffective tokens' that arise when data from different tasks is batched together. In experiments, the system demonstrated a 2.33x throughput increase and a 5.29x reduction in memory usage compared to state-of-the-art baselines. These gains could significantly lower the cost and increase the capacity of fine-tuning-as-a-service offerings from cloud providers such as AWS, Azure, and Google Cloud, making customized LLMs more accessible.
- Achieves 2.33x higher throughput for concurrent PEFT tasks by multiplexing the model backbone
- Reduces memory usage by 5.29x compared to deploying separate instances per task
- Uses chunk-based data alignment and hierarchical scheduling to minimize GPU stalls and inefficiency
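The chunk-based alignment idea above can be illustrated with a small sketch. This is one plausible reading, not the paper's exact algorithm: the chunk size of 8, the pad token 0, and the simplification that alignment means packing per-task sequences into fixed-size chunks are all assumptions.

```python
# Sketch of chunk-based data alignment: instead of padding every sequence
# in a fused multi-task batch to the longest sequence, split sequences
# into fixed-size chunks so padding ("ineffective tokens") is confined
# to each sequence's final chunk.

CHUNK = 8  # assumed chunk size
PAD = 0    # assumed pad token

def to_chunks(seq, chunk=CHUNK, pad=PAD):
    """Split one token sequence into fixed-size chunks, padding only the tail."""
    chunks = []
    for i in range(0, len(seq), chunk):
        piece = seq[i:i + chunk]
        chunks.append(piece + [pad] * (chunk - len(piece)))
    return chunks

def align(batches):
    """Fuse sequences from several tasks into one stream of aligned chunks."""
    stream = []
    for task, seqs in batches.items():
        for seq in seqs:
            for c in to_chunks(seq):
                stream.append((task, c))
    return stream

# Two tasks with very different sequence lengths.
batches = {"task0": [list(range(1, 20))],   # 19 tokens -> 3 chunks, 5 pads
           "task1": [list(range(1, 4))]}    # 3 tokens  -> 1 chunk,  5 pads

stream = align(batches)
pads = sum(c.count(PAD) for _, c in stream)
# Naive padding to the longest sequence would use 2 * 19 = 38 slots with
# 16 pad tokens; chunking here uses 4 * 8 = 32 slots with 10 pad tokens.
```

Equal-sized chunks also give the scheduler uniform work units, which is what makes fusing batches from independent tasks practical without stalling on the longest sequence.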
Why It Matters
Lowers cloud fine-tuning costs and increases capacity, making customized LLMs more scalable for businesses.