Improving Generative Adversarial Network Generalization for Facial Expression Synthesis
New 'RegGAN' model beats six state-of-the-art competitors in human evaluations for realism and identity preservation.
A research team led by Arbish Akram has introduced Regression GAN (RegGAN), a new architecture for facial expression synthesis. It targets the poor generalization of existing conditional Generative Adversarial Networks (GANs), which often fail on images outside their training distribution, such as celebrity portraits or digital avatars. RegGAN splits the task into two stages. First, a regression layer with local receptive fields learns fine expression details by minimizing reconstruction error. Second, a refinement network, trained adversarially, improves the realism of the final image. Pairing an explicit reconstruction objective with adversarial refinement is meant to help the model capture nuanced facial movements and transfer them to new faces.
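The regression stage can be illustrated with a small NumPy sketch. The article does not say how the regression layer is parameterized, so this assumes one ridge-regression model per output pixel, each seeing only the k x k neighbourhood of the input image around that pixel (its local receptive field), fit in closed form to minimize squared reconstruction error. The function names, the patch size `k`, and the ridge penalty `lam` are hypothetical, and the adversarial refinement network is not shown.

```python
import numpy as np

# Hypothetical sketch: per-pixel ridge regression over local patches stands in
# for RegGAN's regression layer; the actual parameterization may differ.


def extract_patches(images: np.ndarray, k: int) -> np.ndarray:
    """Stack the k x k neighbourhood of every pixel (zero-padded at borders).

    images: (N, H, W) -> returns (N, H, W, k*k). Assumes k is odd.
    """
    r = k // 2
    n, h, w = images.shape
    padded = np.pad(images, ((0, 0), (r, r), (r, r)))
    patches = np.empty((n, h, w, k * k))
    for idx in range(k * k):
        di, dj = divmod(idx, k)
        patches[..., idx] = padded[:, di:di + h, dj:dj + w]
    return patches


def with_bias(patches: np.ndarray) -> np.ndarray:
    """Append a constant 1 feature so each pixel model has an intercept."""
    return np.concatenate([patches, np.ones(patches.shape[:-1] + (1,))], axis=-1)


def fit_local_regression(src, tgt, k=3, lam=1e-3):
    """Closed-form ridge fit of one linear model per output pixel.

    src, tgt: (N, H, W) input and target images. Returns weights (H, W, k*k + 1)
    minimizing sum_n (tgt - X w)^2 + lam * ||w||^2 at every pixel.
    """
    X = with_bias(extract_patches(src, k)).transpose(1, 2, 0, 3)  # (H, W, N, D)
    y = tgt.transpose(1, 2, 0)[..., None]                         # (H, W, N, 1)
    Xt = X.swapaxes(-1, -2)                                       # (H, W, D, N)
    d = X.shape[-1]
    # Batched normal equations: (X^T X + lam I) w = X^T y for every pixel.
    weights = np.linalg.solve(Xt @ X + lam * np.eye(d), Xt @ y)
    return weights[..., 0]


def predict_local_regression(src, weights, k=3):
    """Apply the per-pixel models: output[n, i, j] = patch(n, i, j) . w[i, j]."""
    X = with_bias(extract_patches(src, k))                        # (N, H, W, D)
    return np.einsum("nhwd,hwd->nhw", X, weights)
```

Because every pixel's weights are solved independently in closed form, this stage needs no gradient descent; in a RegGAN-style pipeline its output would then be handed to the adversarially trained refinement network.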
RegGAN was trained on the CFEE (Compound Facial Expressions of Emotion) dataset and compared against six state-of-the-art models on four metrics: Expression Classification Score (ECS), Face Similarity Score (FSS), QualiCLIP, and Fréchet Inception Distance (FID). It ranked first on ECS, FID, and QualiCLIP and second on FSS. In human evaluations, raters judged RegGAN to surpass the best competing model by 25% in expression quality, 26% in identity preservation, and 30% in overall realism. Taken together, these results suggest that RegGAN produces facial expressions that are both convincing and faithful to the original person's identity, even on challenging, unseen data.
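The article does not define the automatic metrics. FID has a standard closed form, the Fréchet distance between Gaussians fitted to two sets of image features, FID = ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}), where lower is better. The NumPy sketch below computes it from feature arrays that, in practice, would come from an Inception network. ECS and FSS are sketched under labeled assumptions: ECS as the share of generated faces that a hypothetical expression classifier assigns to the target expression, and FSS as the mean cosine similarity between hypothetical identity embeddings of source and generated faces. QualiCLIP is omitted.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """FID-style distance between two feature sets (rows are samples; lower is better)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((S_a S_b)^{1/2}) equals the sum of square roots of the eigenvalues of the
    # symmetric PSD matrix S_a^{1/2} S_b S_a^{1/2}, so eigh suffices (no general sqrtm).
    vals, vecs = np.linalg.eigh(cov_a)
    sqrt_a = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    eig_mid = np.linalg.eigvalsh(sqrt_a @ cov_b @ sqrt_a)
    tr_sqrt = np.sqrt(np.clip(eig_mid, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def expression_classification_score(pred_labels, target_labels) -> float:
    """Assumed ECS: fraction of generated faces classified as the target expression."""
    return float(np.mean(np.asarray(pred_labels) == np.asarray(target_labels)))


def face_similarity_score(emb_src: np.ndarray, emb_gen: np.ndarray) -> float:
    """Assumed FSS: mean row-wise cosine similarity of identity embeddings."""
    a = emb_src / np.linalg.norm(emb_src, axis=1, keepdims=True)
    b = emb_gen / np.linalg.norm(emb_gen, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))
```

Any real comparison would depend on the exact feature extractor, expression classifier, and face-embedding model used, none of which the article names.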
- The 'RegGAN' model uses a two-component architecture: a regression layer for expression detail and an adversarially trained refinement network for realism.
- In human evaluations, it beat the best competitor by 25% in expression quality, 26% in identity preservation, and 30% in realism.
- It shows strong generalization, performing well on out-of-distribution images like statues and avatar renderings.
Why It Matters
This could make AI-generated facial expressions more reliable and realistic for film, gaming, and virtual communication, including for faces the model has never seen.