Multi-objective Genetic Programming with Multi-view Multi-level Feature for Enhanced Protein Secondary Structure Prediction
New framework outperforms state-of-the-art methods on seven benchmark datasets, improving Q8 accuracy with potential benefits for drug discovery.
A research team has introduced MOGP-MMF, a framework that reformulates protein secondary structure prediction (PSSP) as an automated optimization task. The system employs multi-objective genetic programming (GP) to tackle feature selection and feature fusion, two steps that are critical for modeling the intricate relationship between a protein's amino acid sequence and the local structure each residue adopts. Unlike traditional deep learning approaches, MOGP-MMF uses a multi-view, multi-level representation strategy that integrates three distinct data perspectives: evolutionary information (from related proteins), semantic context (from the amino acid sequence), and newly introduced structural views. The aim is to capture the sequence-to-structure relationship more fully than any single view can.
To build the predictive model, the framework uses an enriched set of genetic programming operators to evolve both linear and nonlinear functions that fuse these multi-view features, capturing high-order interactions while keeping complexity in check. A key innovation is an improved multi-objective GP algorithm that incorporates a knowledge transfer mechanism. This mechanism uses prior evolutionary experience to guide the population toward global optima, helping the search navigate the trade-off between prediction accuracy and model complexity. Extensive validation on seven benchmark datasets shows that MOGP-MMF surpasses current state-of-the-art methods on metrics such as Q8 accuracy (the fraction of residues correctly assigned to one of eight secondary structure classes) and structural integrity.
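To make the idea of evolved fusion functions concrete, here is a minimal sketch of how a GP individual can be represented as an expression tree over per-view features, with node count serving as a complexity measure. Everything in it is an assumption made for illustration: the operator set (`add`, `mul`, `tanh`), the feature names, and the use of node count as the complexity objective are hypothetical and are not taken from the MOGP-MMF source code.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Hypothetical operator set: name -> (arity, function).
# Illustrative only; not the operator set used by MOGP-MMF.
OPERATORS = {
    "add": (2, lambda a, b: a + b),   # linear fusion
    "mul": (2, lambda a, b: a * b),   # nonlinear interaction between views
    "tanh": (1, math.tanh),           # nonlinear squashing
}


@dataclass(frozen=True)
class Node:
    """Internal node of an expression tree: an operator applied to subtrees."""
    op: str
    children: Tuple["Tree", ...]


# A tree is an operator node, a feature name (str), or a numeric constant.
Tree = Union[Node, str, float]


def evaluate(tree: Tree, features: Dict[str, float]) -> float:
    """Recursively evaluate a tree on one residue's feature values."""
    if isinstance(tree, Node):
        arity, fn = OPERATORS[tree.op]
        if len(tree.children) != arity:
            raise ValueError(f"{tree.op} expects {arity} argument(s)")
        return fn(*(evaluate(child, features) for child in tree.children))
    if isinstance(tree, str):
        return features[tree]  # leaf: look up a named feature
    return float(tree)         # leaf: numeric constant


def complexity(tree: Tree) -> int:
    """Count nodes (operators plus leaves) as a simple complexity proxy."""
    if isinstance(tree, Node):
        return 1 + sum(complexity(child) for child in tree.children)
    return 1


# Example individual: (evolutionary * semantic) + tanh(structural)
fused_tree = Node("add", (
    Node("mul", ("evolutionary", "semantic")),
    Node("tanh", ("structural",)),
))

# Toy feature values for a single residue (hypothetical).
views = {"evolutionary": 0.5, "semantic": 2.0, "structural": 0.0}

print(evaluate(fused_tree, views))
print(complexity(fused_tree))
```

In a multi-objective setting, a search procedure would score each such tree on prediction error and on a complexity measure like the node count above, rather than collapsing the two into a single number.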
The practical output of MOGP-MMF is not a single model but a diverse set of non-dominated solutions from its evolutionary process, meaning models for which no alternative is better on every objective at once. This gives researchers and developers a flexible toolkit, allowing them to select models suited to different real-world scenarios, whether the priority is speed, accuracy, or interpretability. The availability of the source code on GitHub opens the door for the bioinformatics and AI communities to build upon this work, potentially accelerating applications in protein engineering and therapeutic design.
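Selecting from a set of non-dominated solutions can be sketched as follows. This is a generic Pareto filter over two objectives to minimize (error and complexity), not the selection procedure used by MOGP-MMF. The `Candidate` type, the helper names, and all numbers are made up for illustration and are not results from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    error: float      # 1 - accuracy; lower is better
    complexity: int   # e.g. expression-tree node count; lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.error <= b.error and a.complexity <= b.complexity
    strictly_better = a.error < b.error or a.complexity < b.complexity
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


def most_accurate_within(front: List[Candidate], max_complexity: int) -> Candidate:
    """Pick the lowest-error model that fits a complexity budget."""
    eligible = [c for c in front if c.complexity <= max_complexity]
    if not eligible:
        raise ValueError("no candidate fits the complexity budget")
    return min(eligible, key=lambda c: c.error)


# Toy population with invented scores (not results from the paper).
population = [
    Candidate("A", error=0.30, complexity=5),
    Candidate("B", error=0.25, complexity=12),
    Candidate("C", error=0.24, complexity=30),
    Candidate("D", error=0.28, complexity=15),  # dominated by B
    Candidate("E", error=0.30, complexity=8),   # dominated by A
]

front = pareto_front(population)
choice = most_accurate_within(front, max_complexity=20)
print([c.name for c in front])
print(choice.name)
```

A user who cares most about interpretability might set a tight complexity budget and accept a higher error, while one who cares most about accuracy would relax the budget. The front itself stays the same in both cases.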
- Integrates evolutionary, semantic, and structural data views for comprehensive protein feature representation.
- Uses an improved multi-objective genetic programming algorithm with knowledge transfer to optimize the accuracy-complexity trade-off.
- Outperforms existing methods on seven benchmarks and provides a flexible set of solutions for practical application.
Why It Matters
More accurate protein structure prediction accelerates drug discovery and protein design, directly impacting biotechnology and medicine.