I Trained an AI to Beat Final Fight… Here’s What Happened
An RL agent trained on 5 minutes of human gameplay learns to navigate Final Fight...
A developer has trained a reinforcement learning agent on the beat-em-up classic Final Fight using Behavior Cloning (BC) from human demonstrations. Trained purely on five minutes of human gameplay traces, with no reward shaping, the agent can now clear most of the first stage. Consistent survival remains a challenge, though: the agent often dies early or gets stuck on specific enemy patterns. The project highlights several practical hurdles: remapping the agent's MultiBinary output to emulator inputs, trajectory alignment bugs where observation/action offsets caused mismatched pairs, and an LSTM policy that behaved differently during evaluation runs than during manual rollouts. Memory management was also tricky; the developer implemented efficient rollout handling to avoid loading entire trajectories into RAM.
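To illustrate the action-mapping hurdle: a MultiBinary policy emits one bit per button, but the emulator expects a specific button ordering and can mishandle contradictory inputs. Below is a minimal sketch of such a remapping, assuming a hypothetical 12-button layout; the post does not specify the project's actual ordering or masking rules.

```python
import numpy as np

# Hypothetical 12-button layout; the real ordering depends on the
# emulator core and is an assumption here, not taken from the project.
BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN",
           "LEFT", "RIGHT", "C", "Y", "X", "Z"]
LEFT, RIGHT = BUTTONS.index("LEFT"), BUTTONS.index("RIGHT")
UP, DOWN = BUTTONS.index("UP"), BUTTONS.index("DOWN")

def to_emulator_action(policy_out: np.ndarray) -> np.ndarray:
    """Map a MultiBinary policy output to the emulator's button array.

    Thresholds continuous outputs to {0, 1} and clears contradictory
    direction pairs, which some cores silently mishandle.
    """
    pressed = (policy_out > 0.5).astype(np.int8)
    if pressed[LEFT] and pressed[RIGHT]:
        pressed[LEFT] = pressed[RIGHT] = 0
    if pressed[UP] and pressed[DOWN]:
        pressed[UP] = pressed[DOWN] = 0
    return pressed
```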
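On the memory side, one standard way to avoid loading whole trajectories into RAM is to memory-map the recorded frames and page in only the sampled batch. The sketch below assumes frames were dumped to disk as a flat uint8 array; the class and field names are illustrative, not the project's.

```python
import numpy as np

class LazyTrajectory:
    """Memory-mapped rollout storage: frames stay on disk until indexed.

    Sketch assumptions (not from the project): frames were saved as a
    flat uint8 array of shape (T, H, W, C).
    """
    def __init__(self, path: str, shape: tuple):
        self.frames = np.memmap(path, dtype=np.uint8, mode="r", shape=shape)

    def sample_batch(self, rng: np.random.Generator, batch_size: int):
        # Only the sampled frames are paged into RAM, not the full trace.
        idx = rng.integers(0, len(self.frames), size=batch_size)
        return np.asarray(self.frames[idx])
```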
The developer plans to extend the approach by combining Generative Adversarial Imitation Learning (GAIL) with Proximal Policy Optimization (PPO) to overcome the limitations of pure imitation, aiming for a more robust policy that generalizes beyond the demonstrations. The code is available on GitHub, and the developer is seeking community feedback on improving BC with limited trajectories, best practices for transitioning from BC to PPO, and handling partial observability in arcade environments. For AI researchers and game AI enthusiasts, this project offers a tangible look at the challenges of applying modern imitation learning to a classic game with a complex action space and high frame-to-frame variability.
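For context on the planned GAIL + PPO extension: GAIL trains a discriminator to distinguish expert (observation, action) pairs from the policy's, and PPO then maximizes a surrogate reward derived from that discriminator instead of the game score, so expert-like behavior is reinforced even off the demonstration trajectories. Here is a minimal PyTorch sketch of the core pieces; the architecture and dimensions are illustrative assumptions, not the project's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAILDiscriminator(nn.Module):
    """Classifies (observation, action) pairs as expert vs. policy-generated."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.Tanh(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def discriminator_loss(disc, expert_obs, expert_act, policy_obs, policy_act):
    # Binary cross-entropy: expert pairs labeled 1, policy pairs labeled 0.
    expert_logits = disc(expert_obs, expert_act)
    policy_logits = disc(policy_obs, policy_act)
    return (F.binary_cross_entropy_with_logits(
                expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(
                policy_logits, torch.zeros_like(policy_logits)))

def gail_reward(disc, obs, act):
    # Standard GAIL surrogate reward r = -log(1 - D(s, a)), which equals
    # softplus(logits) when D = sigmoid(logits) is the expert probability.
    with torch.no_grad():
        return F.softplus(disc(obs, act))
```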
- Agent trained via Behavior Cloning from 5 minutes of human demonstrations on Final Fight clears most of the first stage but lacks consistency
- Challenges include MultiBinary-to-emulator action mapping, trajectory offset bugs, and LSTM policy divergence between evaluation and manual rollouts
- Planned extension to GAIL + PPO to improve generalization beyond imitation; code open-sourced on GitHub
Why It Matters
Shows practical limits of pure imitation on complex arcade games, guiding RL practitioners toward hybrid approaches.