Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning
A new co-training framework uses reinforcement learning to let LLMs and classical models such as Random Forests improve each other.
A team of researchers including Yunshuo Tian and Yijun Zhao has published a paper on Reciprocal Co-Training (RCT), a novel framework designed to solve a core integration problem in AI. Large language models (LLMs) like GPT-4 and classical machine learning models like Random Forests (RF) have complementary strengths but fundamentally different, incompatible training methods. RCT bridges this gap by using reinforcement learning to create an iterative feedback loop between the two model types, allowing them to leverage each other's predictive power.
In practice, RCT reformats tabular data into text for the LLM. The LLM then generates embeddings that are added to the feature space of the Random Forest. In turn, the RF produces calibrated probability estimates that serve as reward signals to guide the reinforcement learning updates for the LLM. This bidirectional adaptation was tested on three medical datasets, where it led to consistent performance gains for both models, with the LLM showing particularly strong improvements. The success hinges on three key components: iterative refinement of the feedback loop, a carefully designed hybrid reward system, and control over the dimensionality of the shared feature space.
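The data flow described above can be sketched in a few lines of plain Python. This is a hedged illustration, not the paper's implementation: `row_to_text`, `llm_embed`, `augment_features`, and `reward` are hypothetical names, and the embedding is a deterministic hash stand-in for a real LLM encoder.

```python
import math

def row_to_text(row, columns):
    # Serialize one tabular record as natural language for the LLM.
    return "; ".join(f"{c} is {v}" for c, v in zip(columns, row))

def llm_embed(text, dim=4):
    # Stand-in for a real LLM embedding: cheap, deterministic hash features,
    # L2-normalized so they live on a fixed scale.
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def augment_features(row, columns, dim=4):
    # The RF trains on original columns plus the LLM-derived embedding;
    # `dim` is the dimensionality knob the paper says must be controlled.
    return list(row) + llm_embed(row_to_text(row, columns), dim)

def reward(rf_proba, label):
    # The RF's calibrated probability of the true class serves as the
    # scalar reward for the reinforcement-learning update of the LLM.
    return rf_proba[label]
```

For example, `augment_features([45, "high"], ["age", "bp"])` yields the two raw values followed by four embedding components, and `reward([0.2, 0.8], 1)` returns `0.8`.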
The proposed RCT framework is significant because it provides a generalizable mechanism for combining disparate AI architectures. This opens the door to creating more powerful, hybrid AI systems that can tackle complex, real-world problems—like medical diagnosis or financial forecasting—by synthesizing the pattern recognition of LLMs with the robust, structured data analysis of classical models. It represents a step toward a more unified and collaborative AI ecosystem.
- Creates a reinforcement learning feedback loop between an LLM and a Random Forest classifier.
- Converts tabular data to text for the LLM and uses RF probability estimates as RL rewards.
- Demonstrated performance gains on three medical datasets, with the LLM seeing the strongest benefit.
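The iterative loop the bullets describe could be scheduled roughly as below. This is a minimal sketch under assumed interfaces: `embed`, `fit_rf`, and `update_policy` are placeholder callables standing in for the LLM encoder, the Random Forest fit, and the RL policy update, none of which are specified at this level of detail in the summary.

```python
def rct_loop(X, y, embed, fit_rf, update_policy, rounds=3):
    """Hypothetical RCT schedule: alternate RF refits with LLM RL updates."""
    history = []
    for _ in range(rounds):
        # 1. LLM side: augment each record with its embedding.
        Z = [list(x) + embed(x) for x in X]
        # 2. Classical side: refit the RF on the augmented feature space.
        rf = fit_rf(Z, y)
        # 3. Reward: RF probability of the true class drives the RL step.
        rewards = [rf(z)[label] for z, label in zip(Z, y)]
        update_policy(rewards)
        history.append(sum(rewards) / len(rewards))
    return history

# Toy usage with stub components (not real models):
X = [[1.0, 2.0], [0.5, 1.5]]
y = [1, 1]
embed = lambda x: [sum(x)]                    # stub LLM embedding
fit_rf = lambda Z, labels: (lambda z: [0.3, 0.7])  # stub classifier
update_policy = lambda rewards: None          # stub RL update
hist = rct_loop(X, y, embed, fit_rf, update_policy)
```

With the stubs above, `hist` is simply `[0.7, 0.7, 0.7]`; with real models, the per-round mean reward would track whether the feedback loop is actually improving both sides.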
Why It Matters
Enables hybrid AI systems that combine LLM reasoning with classical ML robustness for better predictions in fields like healthcare.