On the role of memorization in learned priors for geophysical inverse problems

New research shows that generative AI priors for seismic imaging can memorize their training data instead of learning the underlying geology.

Deep Dive

A new study by researchers Ali Siahkoohi and Davide Sabeddu tackles a critical, often overlooked problem in applying AI to geophysics. The paper, 'On the role of memorization in learned priors for geophysical inverse problems,' investigates how deep generative models, specifically diffusion models, can fail when they are used as learned priors to regularize tasks such as seismic inversion. Because these models are trained via maximum likelihood on inherently scarce datasets of subsurface models, they risk simply memorizing the training examples. A memorizing model converges to the empirical distribution of its training set rather than to the broader geological distribution from which those examples were drawn.
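As an informal sketch of what "converging to the empirical distribution" means (the notation x_1, ..., x_N for the training subsurface models is assumed here for illustration and is not taken from the paper): an unconstrained maximum-likelihood fit to a finite dataset places all of its probability mass on the training examples themselves.

```latex
\hat{p} \;=\; \arg\max_{p}\; \frac{1}{N}\sum_{i=1}^{N} \log p(x_i)
\quad\Longrightarrow\quad
\hat{p}(x) \;=\; \frac{1}{N}\sum_{i=1}^{N} \delta(x - x_i)
```

A prior of this form assigns zero probability to any subsurface model that is not in the training set.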

The authors demonstrate that when this memorization occurs, the posterior distribution (the updated estimate of the subsurface once observed data are taken into account) collapses into a reweighted version of the training set. In other words, inference reduces to a 'likelihood-weighted lookup' among the memorized examples. For diffusion models, they show that memorization yields a Gaussian mixture prior in closed form, with the resulting posterior governed by the local Jacobian of the forward operator. The practical concern is that in a real-world technique such as full waveform inversion (FWI), a memorized prior can only return results assembled from its training examples. The team validated these theoretical predictions on a stylized inverse problem, showing the tangible consequences for the accuracy and generalizability of AI-powered subsurface imaging.
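The 'likelihood-weighted lookup' can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's setup: the function name `memorized_posterior_weights`, a toy linear forward operator `A`, and independent Gaussian noise with standard deviation `noise_std`. Under a fully memorized (empirical) prior, the posterior puts weight w_i proportional to p(y | x_i) on each training example x_i and nothing anywhere else.

```python
import numpy as np


def memorized_posterior_weights(y, training_models, forward, noise_std):
    """Posterior weights over training examples under an empirical prior.

    Assumes Gaussian noise, so log p(y | x_i) = -||y - F(x_i)||^2 / (2 sigma^2)
    up to a constant that cancels when the weights are normalized.
    """
    residuals = np.array([y - forward(x) for x in training_models])
    log_lik = -0.5 * np.sum(residuals**2, axis=1) / noise_std**2
    log_lik -= log_lik.max()  # subtract the max for numerical stability
    weights = np.exp(log_lik)
    return weights / weights.sum()


# Toy setup (hypothetical): 4 "training models" with 5 parameters each,
# observed through a random 3x5 linear operator.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))
training_models = rng.normal(size=(4, 5))
noise_std = 0.1


def forward(x):
    return A @ x


# Noisy data generated from the third training model.
y = forward(training_models[2]) + noise_std * rng.normal(size=3)

weights = memorized_posterior_weights(y, training_models, forward, noise_std)

# Any posterior summary is a weighted combination of training examples only.
posterior_mean = weights @ training_models
print(weights)
```

In this sketch the posterior mean can never leave the convex hull of the training set, which is the sense in which a memorized prior performs a lookup rather than an inversion.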

Key Points
  • Deep generative models for seismic inversion can memorize scarce training data instead of learning geology.
  • Memorization reduces the posterior to a likelihood-weighted lookup among training examples; for diffusion models, the memorized prior is a closed-form Gaussian mixture.
  • The predictions were validated on a stylized inverse problem, with consequences for full waveform inversion and the reliability of AI-driven subsurface imaging.

Why It Matters

The finding exposes a fundamental reliability issue for AI in critical fields like oil and gas exploration and carbon storage, where training data are scarce.