Combining Trained Models in Reinforcement Learning
589 papers screened; only 15 met the criteria for pretrained knowledge reuse in DRL.
A rigorous systematic review published on arXiv (arXiv:2605.02159) by researchers Ujjwal Patil and Javad Ghofrani from Hochschule Bonn-Rhein-Sieg tackles the fragmented literature on knowledge reuse in deep reinforcement learning (DRL). Following PRISMA guidelines, they started with 589 records from IEEE Xplore, ACM Digital Library, and citation tracing, ultimately narrowing to 15 empirical studies that met all eligibility criteria. The review categorizes approaches into transfer learning, distillation, ensemble methods, and federated training, analyzing them across three key factors: source-target similarity, diversity among reused models, and fairness of comparisons.
The review's qualitative synthesis reveals three recurring patterns. First, positive results from reusing pretrained models are largely confined to scenarios where source and target tasks share substantial structure, or where explicit gating or alignment mechanisms are used. Second, evidence for ensembles and federated aggregation is encouraging but sparse, limited to narrow settings and lacking broad validation. Third, compute-matched comparisons against from-scratch single-agent baselines are extremely rare, making claims of efficiency gains difficult to substantiate. The paper's contributions are a deliberately narrow but internally consistent review scope and a provisional "independence spectrum" proposed as a hypothesis for future benchmarking, underscoring the urgent need for standardized evaluation protocols in this area.
- Only 15 of 589 initial studies survived strict screening criteria in this PRISMA-guided review.
- Positive transfer results are concentrated in tasks with high structural similarity or explicit gating/alignment mechanisms.
- Compute-matched comparisons against single-agent baselines are rare, undermining claims of efficiency improvements.
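The compute-matched comparison the review finds lacking can be sketched in miniature: charge a reuse method for its pretraining cost before granting it a target-task budget, so both pipelines consume the same total environment interaction. This is a minimal illustrative sketch, not a protocol from the paper; the function name and all step counts are assumed for the example.

```python
def compute_matched_budget(pretrain_steps: int, total_budget: int) -> int:
    """Environment steps left for target-task training after charging
    the reuse pipeline for its pretraining cost (illustrative only)."""
    if pretrain_steps > total_budget:
        raise ValueError("pretraining alone exceeds the total budget")
    return total_budget - pretrain_steps

# Under a hypothetical 1M-step budget, a method that pretrained for
# 400k steps may only fine-tune for 600k, while the from-scratch
# baseline trains on the target task for the full 1M.
reuse_steps = compute_matched_budget(pretrain_steps=400_000, total_budget=1_000_000)
scratch_steps = compute_matched_budget(pretrain_steps=0, total_budget=1_000_000)
```

Without this kind of accounting, a pretrained method's "faster" target-task learning can simply hide compute already spent on the source task.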
Why It Matters
Highlights critical gaps in evaluating pretrained knowledge reuse, urging standardized benchmarks for fairer RL comparisons.