Research & Papers

Markov Chain Decoders Fix Heavy-Tail Blindness in VAEs

Standard Gaussian decoders fail on rare events – a new Phase-Type (PH) decoder cuts extreme-quantile error by up to 10x.

Deep Dive

A new paper from researchers at MICS and UNITO identifies a fundamental limitation of modern deep generative models: they struggle to produce heavy-tailed outputs, which are critical in domains like network traffic analysis, risk modeling, and performance evaluation. The team shows that standard Variational Autoencoders (VAEs) that combine a Gaussian decoder likelihood with Lipschitz-constrained neural networks are structurally incapable of generating heavy-tailed distributions. The Gaussian tail decays faster than any power law, and Lipschitz continuity prevents the decoder from stretching rare latent samples into extreme outputs. This is not just a practical weakness but a structural one: the authors establish it theoretically and test it on synthetic Pareto data across a grid of tail indices α ∈ {2, 3, 5, 30} and dimensions d ∈ {1, 5, 10}.
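To make the tail argument concrete, the sketch below compares a Pareto survival function with the standard Gaussian concentration bound for a Lipschitz function of a standard normal vector. It is an illustration only, not the paper's proof: the function names, the Lipschitz constant of 3, and the thresholds are assumptions chosen for the example, and the bound covers only the decoder mean. Note also that the bound measures deviation from the mean while the Pareto tail is measured from zero, so the comparison is about decay rates rather than exact probabilities.

```python
import math


def pareto_survival(t, alpha, x_min=1.0):
    """P(X > t) for a Pareto variable with tail index alpha and scale x_min."""
    return 1.0 if t <= x_min else (x_min / t) ** alpha


def lipschitz_gaussian_tail_bound(t, lipschitz_const):
    """Gaussian concentration bound for an L-Lipschitz function f of Z ~ N(0, I):
    P(f(Z) - E[f(Z)] >= t) <= exp(-t^2 / (2 L^2))."""
    return math.exp(-t * t / (2.0 * lipschitz_const ** 2))


def tail_table(alpha=2.0, lipschitz_const=3.0, thresholds=(5.0, 10.0, 20.0, 40.0)):
    """Return (threshold, Pareto tail, Lipschitz-Gaussian bound) rows.

    alpha, lipschitz_const and thresholds are illustrative values only.
    """
    return [
        (t, pareto_survival(t, alpha), lipschitz_gaussian_tail_bound(t, lipschitz_const))
        for t in thresholds
    ]


if __name__ == "__main__":
    for t, pareto_tail, gauss_bound in tail_table():
        print(f"t={t:5.1f}  Pareto tail={pareto_tail:.3e}  Lipschitz-Gaussian bound={gauss_bound:.3e}")
```

Whatever Lipschitz constant is chosen, the bound eventually falls below the power-law tail and keeps falling much faster, which is the mismatch the paragraph above describes.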

As a solution, the authors replace the Gaussian decoder with a Phase-Type (PH) distribution built on Markov chains. A PH distribution describes the time a finite-state Markov chain takes to reach an absorbing state, and the family can approximate any positive-valued distribution, including heavy-tailed families, to arbitrary precision. The encoder, latent space, and training procedure remain unchanged. In experiments, the PH-based model reduced the tail Kolmogorov-Smirnov distance by up to 6x and extreme quantile error by up to 10x compared to the Gaussian baseline. This principled approach offers a practical path for generative models to handle the rare-but-critical events that often dominate real-world risk and performance scenarios.
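As a rough illustration of what "built on Markov chains" means, the sketch below draws samples from a PH distribution by simulating a continuous-time Markov chain until it is absorbed. It is not the authors' decoder: the function name, the parameterization (initial probabilities, exit rates, jump probabilities), and the three-phase example are all assumptions made for this example, and in a real model a network would presumably produce these parameters.

```python
import random


def sample_phase_type(initial_probs, exit_rates, jump_probs, rng):
    """Draw one PH sample: the time until the Markov chain is absorbed.

    initial_probs[i]: probability of starting in transient phase i
    exit_rates[i]:    total rate of leaving phase i
    jump_probs[i][j]: probability of moving from phase i to phase j on leaving;
                      the remaining mass 1 - sum(jump_probs[i]) is absorption.
    """
    phases = range(len(initial_probs))
    state = rng.choices(phases, weights=initial_probs)[0]
    elapsed = 0.0
    while True:
        # Holding time in the current phase is exponential.
        elapsed += rng.expovariate(exit_rates[state])
        u = rng.random()
        cumulative = 0.0
        next_state = None
        for j, p in enumerate(jump_probs[state]):
            cumulative += p
            if u < cumulative:
                next_state = j
                break
        if next_state is None:
            return elapsed  # absorbed: the elapsed time is the sample
        state = next_state


# Hypothetical three-phase mixture of exponentials with widely spread rates.
# Each phase absorbs directly, so all jump probabilities are zero.
INITIAL = [0.9, 0.09, 0.01]
RATES = [1.0, 0.1, 0.01]
JUMPS = [[0.0, 0.0, 0.0] for _ in RATES]

if __name__ == "__main__":
    rng = random.Random(0)
    draws = sorted(sample_phase_type(INITIAL, RATES, JUMPS, rng) for _ in range(50_000))
    print("median:", draws[len(draws) // 2])
    print("99.9th percentile:", draws[int(0.999 * len(draws))])
```

Spreading the phase rates over several orders of magnitude is a common way to mimic a heavy tail over a wide range with a small number of phases; a chain with transitions between phases (non-zero jump probabilities) gives more flexible shapes.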

Key Points
  • Standard VAEs with Gaussian decoders and Lipschitz constraints cannot produce heavy-tailed outputs due to exponential tail decay and inability to amplify rare events.
  • The proposed Phase-Type decoder is based on Markov chains and can approximate any positive-valued distribution, including heavy-tailed families.
  • Experiments on synthetic Pareto data (tail indices α=2,3,5,30; dimensions d=1,5,10) show the PH-based model reduces tail KS distance by up to 6x and extreme quantile error by up to 10x.

Why It Matters

Enables generative models for risk modeling, network traffic, and finance where extreme events drive outcomes.