Research & Papers

[P] VeridisQuo - open-source deepfake detector that combines spatial + frequency analysis and shows you where the face was manipulated

University project combines pixel and frequency analysis with GradCAM heatmaps to show manipulated facial regions.

Deep Dive

A university research team has open-sourced VeridisQuo, a deepfake detection system that addresses a key limitation of current methods by analyzing artifacts in both the spatial and frequency domains. Most detectors focus solely on pixel-level inconsistencies, but deepfake generation and video compression also leave distinct traces in the frequency spectrum. VeridisQuo runs two parallel analysis streams: a standard EfficientNet-B4 convolutional neural network for spatial features and a custom frequency module that applies the Fast Fourier Transform (FFT) and Discrete Cosine Transform (DCT) to uncover compression artifacts and spectral inconsistencies. The model then concatenates the two streams into a single 2,816-dimensional feature vector for final classification.
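The dual-domain idea can be sketched as follows. This is an illustrative toy in numpy, not the project's actual code: the radial FFT profile, the 64/1024 dimensions, and the function names are assumptions, with EfficientNet-B4's standard 1,792-dim feature output standing in for the spatial stream.

```python
import numpy as np


def frequency_features(gray_face: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Summarize the FFT log-magnitude spectrum as a radial-average profile.

    Generative pipelines and re-compression tend to distort high-frequency
    energy, which shows up in this azimuthally averaged spectrum.
    """
    # 2D FFT, shift DC to the center, take log-magnitude
    spectrum = np.fft.fftshift(np.fft.fft2(gray_face))
    log_mag = np.log1p(np.abs(spectrum))

    # Radial average: mean log-magnitude at each distance from the center
    h, w = log_mag.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - cy, xx - cx)
    bins = np.linspace(0.0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    profile = np.bincount(idx, weights=log_mag.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return profile / np.maximum(counts, 1)


def fuse(spatial_vec: np.ndarray, freq_vec: np.ndarray) -> np.ndarray:
    """Late fusion by concatenation, as in the combined 2,816-dim vector."""
    return np.concatenate([spatial_vec, freq_vec])


face = np.random.default_rng(0).random((224, 224))  # stand-in face crop
freq_profile = frequency_features(face)
# Hypothetical split: 1,792 spatial (EfficientNet-B4) + 1,024 frequency dims
fused = fuse(np.zeros(1792), np.zeros(1024))
print(freq_profile.shape, fused.shape)  # (64,) (2816,)
```

A classifier head would then operate on the fused vector; concatenation is the simplest late-fusion choice and matches the reported combined dimensionality.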

The team trained the 25-million-parameter model on the FaceForensics++ dataset, covering four manipulation methods, using 716,000 face images extracted with YOLOv11n. A key feature is the integration of GradCAM, which produces visual heatmaps overlaid on video frames to show precisely which facial regions, often jawlines and blending boundaries, triggered the detection. This provides crucial interpretability, letting users see the "why" behind a prediction. While the frequency stream alone underperforms spatial analysis, fusing the two proves particularly effective against high-quality fakes where visual artifacts are minimal. Training took about four hours on an RTX 3090, and the project is now publicly available on GitHub for community testing and improvement, with plans for evaluation on tougher datasets such as Celeb-DF.
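The GradCAM step the heatmaps rely on is a standard, compact computation: average-pool the gradients of the class score to get per-channel weights, take a weighted sum of the feature maps, and apply ReLU. The sketch below runs it on synthetic activation/gradient tensors rather than a real network, so the shapes are illustrative assumptions.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heatmap from one conv layer.

    activations: (C, H, W) feature maps from the chosen layer
    gradients:   (C, H, W) gradients of the target-class score w.r.t. them
    """
    # Channel importance weights: global-average-pool the gradients
    weights = gradients.mean(axis=(1, 2))  # shape (C,)
    # Weighted sum over channels, then ReLU to keep only positive evidence
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] so it can be resized and overlaid on the frame
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam


rng = np.random.default_rng(1)
acts = rng.random((8, 7, 7))            # toy feature maps
grads = rng.standard_normal((8, 7, 7))  # toy gradients
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (7, 7)
```

In practice the low-resolution map is upsampled to the frame size and blended over the face crop, which is how manipulated regions such as jawlines become visible.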

Key Points
  • Dual-stream architecture fuses spatial (EfficientNet-B4) and frequency (FFT/DCT) analysis for 2,816 total features
  • Generates interpretable GradCAM heatmaps to visually highlight manipulated facial regions like jawlines
  • Trained on 716K faces from FaceForensics++ in 4 hours, showing improved detection on high-quality fakes

Why It Matters

Provides an open-source, interpretable tool to combat increasingly sophisticated deepfakes, crucial for media verification and trust.