Cooperative Coevolution versus Monolithic Evolutionary Search for Semi-Supervised Tabular Classification
A new evolutionary AI method uses cooperative coevolution for semi-supervised learning and matches the performance of monolithic evolutionary search.
A new research paper by Jamal Toutouh introduces CC-SSL (Cooperative Coevolution for Semi-Supervised Learning), an evolutionary AI method designed for tabular data classification when labeled examples are extremely scarce. The approach uses cooperative coevolution to evolve two feature-subset views and a pseudo-labeling policy simultaneously, creating a more modular alternative to traditional monolithic evolutionary algorithms. In experiments across 25 diverse OpenML datasets with labeled fractions of only 1%, 5%, and 10%, CC-SSL proved effective in the challenging low-label regime.
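The paper's own algorithm is not reproduced here, but the core cooperative-coevolution idea, two feature-subset views that are only ever scored in combination, can be sketched on toy data. Everything below (`make_data`, `coevolve`, the nearest-centroid scorer) is an invented, simplified illustration; the real CC-SSL additionally coevolves a pseudo-labeling policy alongside the two views.

```python
import random

def make_data(n=240, d=6, seed=0):
    """Toy binary dataset: only features 0 and 3 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1 if x[0] + x[3] > 0 else 0 for x in X]
    return X, y

def centroid_predict(Xtr, ytr, Xte, mask):
    """Nearest-centroid classifier restricted to the masked features."""
    idx = [i for i, m in enumerate(mask) if m]
    if not idx:
        return [0] * len(Xte)
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(Xtr, ytr) if t == c] or [[0.0] * len(Xtr[0])]
        cents[c] = [sum(r[i] for r in rows) / len(rows) for i in idx]
    preds = []
    for x in Xte:
        d0 = sum((x[i] - v) ** 2 for i, v in zip(idx, cents[0]))
        d1 = sum((x[i] - v) ** 2 for i, v in zip(idx, cents[1]))
        preds.append(0 if d0 <= d1 else 1)
    return preds

def coevolve(X, y, gens=15, pop=8, seed=1):
    """Cooperatively coevolve two feature-subset views (bitmask halves).

    Each half-individual is evaluated together with the current best
    collaborator from the other subpopulation; elitism (re-seeding the
    incumbent best) keeps the joint fitness non-decreasing.
    """
    rng = random.Random(seed)
    d, split = len(X[0]), len(X) // 2
    Xtr, ytr, Xva, yva = X[:split], y[:split], X[split:], y[split:]
    half = d // 2

    def fitness(a, b):
        preds = centroid_predict(Xtr, ytr, Xva, a + b)
        return sum(p == t for p, t in zip(preds, yva)) / len(yva)

    def mutate(m):
        m = m[:]
        m[rng.randrange(len(m))] ^= 1  # flip one feature bit
        return m

    pop_a = [[rng.randint(0, 1) for _ in range(half)] for _ in range(pop)]
    pop_b = [[rng.randint(0, 1) for _ in range(d - half)] for _ in range(pop)]
    best_a, best_b = pop_a[0], pop_b[0]

    for _ in range(gens):
        # Evolve view A against the frozen best of view B, then vice versa.
        pop_a = [best_a] + [mutate(rng.choice(pop_a)) for _ in range(pop - 1)]
        best_a = max(pop_a, key=lambda a: fitness(a, best_b))
        pop_b = [best_b] + [mutate(rng.choice(pop_b)) for _ in range(pop - 1)]
        best_b = max(pop_b, key=lambda b: fitness(best_a, b))

    return best_a, best_b, fitness(best_a, best_b)
```

The key design point this sketch captures is credit assignment: a feature mask is never fit "alone" but always as half of a combined view, which is what makes the search modular rather than monolithic.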
Both CC-SSL and its monolithic counterpart, EA-SSL, significantly outperformed three lightweight semi-supervised learning baselines, with the largest performance separation occurring at just 1% labeled data. While the two evolutionary approaches reached statistically comparable final test performance, EA-SSL showed advantages in search diversity and generations-to-target in multiclass settings. The paper also provides detailed diagnostics on pseudo-label volume and validation metrics, offering insight into how evolutionary methods handle the semi-supervised learning challenge.
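Pseudo-label volume is typically governed by a confidence threshold on the model's class probabilities. The helper below is a hypothetical illustration of that trade-off (the paper's evolved policy may differ): raising the threshold admits fewer, but on average cleaner, pseudo-labels.

```python
def pseudo_label(probs, threshold):
    """Assign a pseudo-label to each unlabeled example whose top class
    probability clears the confidence threshold.

    probs: list of per-example class-probability lists.
    Returns (example_index, predicted_class) pairs; the length of this
    list is the "pseudo-label volume" a diagnostic would track.
    """
    selected = []
    for i, p in enumerate(probs):
        c = max(range(len(p)), key=lambda k: p[k])  # argmax class
        if p[c] >= threshold:
            selected.append((i, c))
    return selected

# Example: three unlabeled points, two classes.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
strict = pseudo_label(probs, 0.7)   # only confident points survive
lenient = pseudo_label(probs, 0.5)  # everything clears the bar
```

A tighter threshold trades coverage for precision, which is why volume alone is not a quality signal and is reported alongside validation metrics.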
The study, accepted for presentation at the Genetic and Evolutionary Computation Conference 2026, represents an important step toward more efficient machine learning on tabular data, the most common data type in business applications. By achieving strong performance with minimal labeled examples, these evolutionary methods could reduce the data annotation burden that currently limits AI adoption in domains like healthcare, finance, and manufacturing where labeled data is expensive or difficult to obtain.
- CC-SSL method achieves higher median test MacroF1 than lightweight baselines using only 1-10% labeled data on 25 datasets
- Largest performance advantage over baselines occurs at just 1% labeled data, showing effectiveness in extreme low-label regimes
- Evolutionary methods (CC-SSL and EA-SSL) show comparable final performance despite different architectural approaches
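MacroF1, the headline metric in these findings, is the unweighted mean of per-class F1 scores, so minority classes count as much as majority ones, a sensible choice for imbalanced tabular benchmarks. A minimal stdlib computation (the helper name is ours; scikit-learn's `f1_score(..., average="macro")` is the usual production route):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

For example, `macro_f1([0, 0, 1, 1], [0, 1, 1, 1])` averages a class-0 F1 of 2/3 with a class-1 F1 of 4/5, giving 11/15 ≈ 0.733.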
Why It Matters
Reduces data labeling costs for tabular AI applications in business, healthcare, and finance where labeled examples are scarce.