I Trained an AI to Beat Final Fight… Here’s What Happened
An RL agent trained on 5 minutes of human gameplay learns to navigate Final Fight...
A developer has trained a reinforcement learning agent on the beat-em-up classic Final Fight using Behavior Cloning (BC) from human demonstrations. Trained purely on five minutes of human gameplay traces, with no reward shaping, the agent can now clear most of the first stage. Consistent survival remains a challenge, though: the agent often dies early or gets stuck on specific enemy patterns. The project highlights several practical hurdles: remapping the agent's MultiBinary output to emulator inputs, trajectory alignment bugs where observation/action offsets caused mismatched pairs, and an LSTM policy that behaved differently during evaluation runs than during manual rollouts. Memory management was also tricky; the developer implemented efficient rollout handling to avoid loading entire trajectories into RAM.
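To illustrate the action-mapping hurdle: a MultiBinary policy emits one bit per button, but the emulator expects a specific button ordering and can mishandle contradictory inputs. Below is a minimal sketch of such a remapping, assuming a hypothetical 12-button layout; the post does not specify the project's actual ordering or masking rules.

```python
import numpy as np

# Hypothetical 12-button layout; the real ordering depends on the
# emulator core and is an assumption here, not taken from the project.
BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN",
           "LEFT", "RIGHT", "C", "Y", "X", "Z"]
LEFT, RIGHT = BUTTONS.index("LEFT"), BUTTONS.index("RIGHT")
UP, DOWN = BUTTONS.index("UP"), BUTTONS.index("DOWN")

def to_emulator_action(policy_out: np.ndarray) -> np.ndarray:
    """Map a MultiBinary policy output to the emulator's button array.

    Thresholds continuous outputs to {0, 1} and clears contradictory
    direction pairs, which some cores silently mishandle.
    """
    pressed = (policy_out > 0.5).astype(np.int8)
    if pressed[LEFT] and pressed[RIGHT]:
        pressed[LEFT] = pressed[RIGHT] = 0
    if pressed[UP] and pressed[DOWN]:
        pressed[UP] = pressed[DOWN] = 0
    return pressed
```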
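On the memory side, one standard way to avoid loading whole trajectories into RAM is to memory-map the recorded frames and page in only the sampled batch. The sketch below assumes frames were dumped to disk as a flat uint8 array; the class and field names are illustrative, not the project's.

```python
import numpy as np

class LazyTrajectory:
    """Memory-mapped rollout storage: frames stay on disk until indexed.

    Sketch assumptions (not from the project): frames were saved as a
    flat uint8 array of shape (T, H, W, C).
    """
    def __init__(self, path: str, shape: tuple):
        self.frames = np.memmap(path, dtype=np.uint8, mode="r", shape=shape)

    def sample_batch(self, rng: np.random.Generator, batch_size: int):
        # Only the sampled frames are paged into RAM, not the full trace.
        idx = rng.integers(0, len(self.frames), size=batch_size)
        return np.asarray(self.frames[idx])
```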
The developer plans to extend the approach by combining Generative Adversarial Imitation Learning (GAIL) with Proximal Policy Optimization (PPO) to overcome the limitations of pure imitation, aiming for a more robust policy that generalizes beyond the demonstrations. The code is available on GitHub, and the developer is seeking community feedback on improving BC with limited trajectories, best practices for transitioning from BC to PPO, and handling partial observability in arcade environments. For AI researchers and game AI enthusiasts, this project offers a tangible look at the challenges of applying modern imitation learning to a classic game with a complex action space and high frame-to-frame variability.
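For context on the planned GAIL + PPO extension: GAIL trains a discriminator to distinguish expert (observation, action) pairs from the policy's, and PPO then maximizes a surrogate reward derived from that discriminator instead of the game score, so expert-like behavior is reinforced even off the demonstration trajectories. Here is a minimal PyTorch sketch of the core pieces; the architecture and dimensions are illustrative assumptions, not the project's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAILDiscriminator(nn.Module):
    """Classifies (observation, action) pairs as expert vs. policy-generated."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.Tanh(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def discriminator_loss(disc, expert_obs, expert_act, policy_obs, policy_act):
    # Binary cross-entropy: expert pairs labeled 1, policy pairs labeled 0.
    expert_logits = disc(expert_obs, expert_act)
    policy_logits = disc(policy_obs, policy_act)
    return (F.binary_cross_entropy_with_logits(
                expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(
                policy_logits, torch.zeros_like(policy_logits)))

def gail_reward(disc, obs, act):
    # Standard GAIL surrogate reward r = -log(1 - D(s, a)), which equals
    # softplus(logits) when D = sigmoid(logits) is the expert probability.
    with torch.no_grad():
        return F.softplus(disc(obs, act))
```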
- Agent trained via Behavior Cloning from 5 minutes of human demonstrations on Final Fight clears most of the first stage but lacks consistency
- Challenges include MultiBinary-to-emulator action mapping, trajectory offset bugs, and LSTM policy divergence between evaluation and manual rollouts
- Planned extension to GAIL + PPO to improve generalization beyond imitation; code open-sourced on GitHub
Why It Matters
Shows practical limits of pure imitation on complex arcade games, guiding RL practitioners toward hybrid approaches.