Kinetic Theory: Novel Statistical, Stochastic and Analytical Methods: Explicit construction of zero loss minimizers and the interpretability problem in Deep Learning

Presenter
October 21, 2025
Keywords:
  • Kinetic theory and stochastic particle systems
  • mean field plasma and radiation dynamics
  • Boltzmann and Landau type equations and systems
  • hydrodynamic limits
  • enhanced dissipation
  • quasi-neutral limits
  • swarming and flocking
  • mean-field games
MSC:
  • 35Bxx - Qualitative properties of solutions to partial differential equations
  • 35Lxx - Hyperbolic equations and hyperbolic systems {For global analysis, see 58J45}
  • 35Q20 - Boltzmann equations {For fluid mechanics, see 76P05; for statistical mechanics, see 82B40, 82C40, 82D05}
  • 35Q35 - PDEs in connection with fluid mechanics
  • 35Q40 - PDEs in connection with quantum mechanics
  • 35Q49 - Transport equations {For calculus of variations and optimal control, see 49Q22; for fluid mechanics, see 76F25; for statistical mechanics, see 82C70, 82D75; for operations research, see 90B06; for mathematical programming, see 90C08}
  • 35Q70 - PDEs in connection with mechanics of particles and systems of particles
  • 35Q82 - PDEs in connection with statistical mechanics
  • 35Q83 - Vlasov equations {For statistical mechanics, see 82D75}
  • 35Q84 - Fokker-Planck equations {For fluid mechanics, see 76X05, 76W05; for statistical mechanics, see 82C31}
  • 35Q89 - PDEs in connection with mean field game theory {For calculus of variations and optimal control, see 49N80; for game theory, see 91A16}
  • 35Q91 - PDEs in connection with game theory, economics, social and behavioral sciences
  • 35Q92 - PDEs in connection with biology, chemistry and other natural sciences
  • 60Gxx - Stochastic processes
  • 60Hxx - Stochastic analysis [See also 58J65]
  • 70Fxx - Dynamics of a system of particles, including celestial mechanics
  • 70Lxx - Random and stochastic aspects of the mechanics of particles and systems
  • 82D05 - Statistical mechanics of gases
  • 82D10 - Statistical mechanics of plasmas
Abstract
In this talk, we present some recent results aimed at a rigorous mathematical understanding of how and why supervised learning works. We point out genericity conditions related to the reachability of zero loss minimization in underparametrized versus overparametrized Deep Learning (DL) networks. For underparametrized DL networks, we explicitly construct global zero loss minimizers of the cost for sufficiently clustered data. In addition, we derive effective equations governing the cumulative biases and weights, and show that gradient descent corresponds to a dynamical process in the input layer, whereby clusters of data are progressively reduced in complexity ("truncated") at an exponential rate that increases with the number of data points that have already been truncated. For overparametrized DL networks, we prove that the gradient descent flow is homotopy equivalent to a geometrically adapted flow that induces a (constrained) Euclidean gradient flow in output space. If a certain rank condition holds, the latter is, upon reparametrization of the time variable, equivalent to simple linear interpolation. This in turn implies zero loss minimization and the phenomenon known as "Neural Collapse". Moreover, we derive zero loss guarantees and construct explicit global minimizers for overparametrized deep networks, given generic training data. This is applied to derive deterministic generalization bounds that depend on the geometry of the training and test data, but not on the network architecture. The work presented includes collaborations with Patricia Munoz Ewald, Andrew G. Moore, and C.-K. Kevin Chien.
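For orientation, the display below is a minimal sketch, in generic notation that is not taken from the talk, of the objects the abstract refers to: a quadratic training loss over the network parameters, its gradient flow, the induced flow of the network outputs, and the way a rank assumption together with a reparametrization of time reduces the output flow to straight-line interpolation toward the training labels (hence zero loss). The symbols \(f_\theta\), \(x_j\), \(y_j\), \(z_j\) are illustrative placeholders, not the constructions of the papers discussed.

\[
\mathcal{L}[\theta] \;=\; \tfrac12 \sum_{j=1}^{N} \bigl\| f_\theta(x_j) - y_j \bigr\|^2,
\qquad
\dot\theta \;=\; -\nabla_\theta \mathcal{L}[\theta],
\qquad
z_j(t) := f_{\theta(t)}(x_j),
\]
\[
\dot z_j \;=\; -\sum_{k=1}^{N} D_\theta f_\theta(x_j)\, D_\theta f_\theta(x_k)^{\mathsf T}\,(z_k - y_k).
\]
If the Jacobian blocks \(D_\theta f_\theta(x_j)\) have full rank along the flow, the output dynamics can, in this idealized sketch, be brought to the Euclidean gradient flow \(\tfrac{dz_j}{d\tau} = -(z_j - y_j)\) in output space; the further reparametrization \(s = 1 - e^{-\tau}\) then gives
\[
z_j(s) \;=\; (1-s)\, z_j(0) + s\, y_j, \qquad s \in [0,1),
\]
i.e. straight-line interpolation of the outputs toward the labels, so that \(\mathcal{L} \to 0\) along the flow.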