Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem
A new hybrid AI method combines reinforcement learning with exact optimization to balance delivery workloads.
Researchers from Universitat Politècnica de Catalunya and the Spanish National Research Council have published RL-CMSA (Construct, Merge, Solve & Adapt with Reinforcement Learning), a hybrid algorithm for the min-max variant of the Multiple Traveling Salesman Problem (mTSP). In this problem, multiple salesmen start from a common depot and must together visit every customer, and the goal is to minimize the length of the longest single tour rather than the total distance traveled. That objective is a direct measure of workload balance in logistics and delivery services. The method pairs machine-learning guidance for building routes with a traditional mathematical programming step that optimizes over them.
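To make the min-max objective concrete, here is a minimal Python sketch. It is not taken from the paper: the function names, the toy coordinates, and the assumptions of Euclidean distances and closed tours (each salesman returns to the depot) are illustrative only.

```python
import math

def tour_length(tour, coords, depot=0):
    """Length of one closed tour: depot -> customers in order -> depot."""
    stops = [depot, *tour, depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def min_max_objective(tours, coords, depot=0):
    """Min-max mTSP objective: the length of the longest single tour."""
    return max(tour_length(t, coords, depot) for t in tours)

# Toy instance (assumed data): depot at index 0, four customers on the axes.
coords = [(0, 0), (3, 0), (0, 4), (-3, 0), (0, -4)]
balanced = [[1, 2], [3, 4]]      # two customers per salesman
unbalanced = [[1, 2, 3, 4], []]  # one salesman does all the work

print(min_max_objective(balanced, coords))    # 12.0
print(min_max_objective(unbalanced, coords))  # 22.0
```

In this toy case the unbalanced plan has the shorter total distance (22 versus 24), but the balanced plan wins under the min-max objective because its longest tour is 12 rather than 22.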
RL-CMSA iterates over four phases. In the construct phase, it builds diverse solutions using probabilistic clustering guided by learned q-values, the reinforcement learning signal. In the merge phase, it adds the routes of those solutions to a pool. In the solve phase, it solves a restricted set-covering Mixed-Integer Linear Program (MILP) over the pool exactly. In the adapt phase, it updates the q-values through reinforcement and prunes the pool. In computational experiments on random instances and standard TSPLIB benchmarks, RL-CMSA consistently finds near-optimal solutions and outperforms a state-of-the-art hybrid genetic algorithm within comparable time limits. The gap widens as the number of cities and the number of salesmen increase, which suggests the method scales well to large routing problems.
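One plausible way to write the restricted set-covering MILP is sketched below. This is a generic min-max formulation under stated assumptions, not necessarily the exact model in the paper. The notation is assumed: $P$ is the current route pool, $V$ the set of customers, $m$ the number of salesmen, $c_r$ the length of route $r$, $a_{ir} = 1$ if route $r$ visits customer $i$ (0 otherwise), $x_r$ a binary variable that selects route $r$, and $z$ the length of the longest selected route.

```latex
\begin{aligned}
\min\ & z \\
\text{s.t.}\ & \sum_{r \in P} a_{ir}\, x_r \ge 1 && \forall i \in V \\
& \sum_{r \in P} x_r \le m \\
& z \ge c_r\, x_r && \forall r \in P \\
& x_r \in \{0, 1\} && \forall r \in P
\end{aligned}
```

The model is "restricted" because it only chooses among routes already in the pool, which keeps it small enough to solve exactly in each iteration.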
- Hybrid RL-CMSA algorithm combines reinforcement learning-guided construction with exact MILP optimization for the min-max mTSP.
- Outperforms a state-of-the-art hybrid genetic algorithm, with advantages growing on larger instances and with more salesmen.
- Iteratively refines solutions using learned q-values and a self-adapting solution pool through ageing and pruning mechanisms.
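The ageing and pruning idea can be sketched as follows. This follows the general pattern of CMSA-style algorithms and is not the paper's exact rule: the function `adapt_pool`, the `age_max` parameter, and the route representation are hypothetical.

```python
def adapt_pool(pool_ages, selected, age_max):
    """One ageing-and-pruning step over a pool of routes.

    pool_ages: dict mapping a route (tuple of customer ids) to its age.
    selected:  routes used by the solution the solve phase just returned.
    age_max:   hypothetical parameter; routes older than this are dropped.
    """
    updated = {}
    for route, age in pool_ages.items():
        # Routes the solver just used are reset to age 0; all others grow older.
        new_age = 0 if route in selected else age + 1
        if new_age <= age_max:
            updated[route] = new_age
    return updated

# Assumed toy pool: four routes with their current ages.
pool = {(1, 2): 0, (3, 4): 2, (1, 3): 2, (2, 4): 1}
pool = adapt_pool(pool, selected={(1, 2), (3, 4)}, age_max=2)
print(pool)  # {(1, 2): 0, (3, 4): 0, (2, 4): 2}
```

Routes that keep appearing in solved solutions stay in the pool, while routes the solver never picks eventually age out, which bounds the size of the MILP.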
Why It Matters
Better min-max routing could help delivery fleets plan more balanced workloads, shortening the longest route and spreading work more fairly across drivers.