Minimax optimality and generalization mechanism of foundation models

Time: 2024-04-08 Published By: CMLR

Speaker(s): Taiji Suzuki (The University of Tokyo)

Time: 14:00-15:00 April 8, 2024

Venue: Zoom


In this presentation, I will discuss the learning ability of foundation models such as diffusion models and Transformers from a nonparametric estimation perspective. In the first half, I will present the estimation ability of diffusion models as distribution estimators. We show that the empirical score matching estimator obtained in the class of deep neural networks achieves nearly minimax optimal rates in both the total variation distance and the Wasserstein distance, assuming the true density function belongs to a Besov space. Furthermore, we consider the situation where the support of the density lies in a low-dimensional subspace, and show that the estimator adapts to the low dimensionality and achieves the minimax optimal rate corresponding to the intrinsic dimensionality.

In the latter half, I will present a nonparametric convergence analysis of Transformer networks in a sequence-to-sequence problem. Transformer networks are the fundamental model behind recent large language models. They can handle long input sequences and avoid the curse of dimensionality even when the input dimension varies. We show that they can adapt to the smoothness of the true function, even when the smoothness in each coordinate direction differs from input to input.



ID: 830 7673 7435

Passcode: 145696

Brief bio:

Taiji Suzuki is currently an Associate Professor in the Department of Mathematical Informatics at the University of Tokyo. He also serves as the team leader of the “Deep learning theory” team in AIP-RIKEN. He received his Ph.D. degree in information science and technology from the University of Tokyo in 2009. He worked as an assistant professor in the Department of Mathematical Informatics at the University of Tokyo between 2009 and 2013, and was then an associate professor in the Department of Mathematical and Computing Science at Tokyo Institute of Technology between 2013 and 2017. He received the Outstanding Paper Award at ICLR in 2021, the MEXT Young Scientists’ Prize, and the Outstanding Achievement Award in 2017 from the Japan Statistical Society. He is interested in deep learning theory, nonparametric statistics, high-dimensional statistics, and stochastic optimization. In particular, he works mainly on deep learning theory from several aspects, such as representation ability, generalization ability, and optimization ability.