Power law covariance, the relation to scaling laws, and how to profit
Speaker(s): Elliot Paquette (McGill University)
Time: 10:00–11:00 May 14, 2026
Venue: Zoom ID: 843 6538 7306 (Passcode: 794302)
Abstract:
One of the foundational ideas in modern machine learning is the scaling hypothesis: that machine learning models will improve in a predictable manner, with each doubling of resources leading to a commensurate improvement in abilities. This hypothesis was formalized for large language models in the Kaplan et al. scaling laws.
This is an almost entirely empirically observed law, which motivates the development of probabilistic models that can explain these laws and ultimately inform answers to fundamental questions, such as: what can improve these laws? And what causes them to break?
One of the ingredients in these scaling laws appears to be intrinsic power law statistical properties of the input data distributions, which are common in both language and vision settings (among others). We’ll discuss some simple, solvable mathematical models exhibiting power law covariance, and show some theoretical work establishing how this propagates through neural networks. Afterwards, we’ll look at how different optimization choices respond in these settings (including Nesterov acceleration and Muon), and how some scheduling choices, based on the underlying covariance, can be used to improve optimization performance.
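To make the setting concrete, here is a minimal sketch (our own illustration, not material from the talk) of a linear least-squares problem whose data covariance has a power-law eigenvalue spectrum; the exponent `alpha`, the dimensions, and the target vector are all assumed for illustration. Running plain gradient descent on such a problem is the simplest case where loss curves inherit power-law behavior from the data spectrum, since the slowly decaying small-eigenvalue modes dominate late training.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 200      # number of features (assumed for illustration)
alpha = 1.0  # spectral decay exponent: eigenvalue j decays like j^(-alpha)
n = 1000     # number of samples

# Data whose population covariance is diag(lambda_1, ..., lambda_d)
# with a power-law spectrum lambda_j = j^(-alpha).
eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)
X = rng.standard_normal((n, d)) * np.sqrt(eigs)

# Hypothetical ground-truth weights and noiseless labels.
w_star = rng.standard_normal(d)
y = X @ w_star

# Plain full-batch gradient descent on the least-squares loss.
w = np.zeros(d)
lr = 0.5  # stable step size for this spectrum (top eigenvalue is 1)
losses = []
for t in range(2000):
    grad = X.T @ (X @ w - y) / n
    w -= lr * grad
    losses.append(float(np.mean((X @ w - y) ** 2)))

# With a power-law spectrum, the loss decays roughly as a power of t
# rather than exponentially: early iterations fit the large-eigenvalue
# directions quickly, while the small-eigenvalue tail is fit slowly.
```

The design choice illustrated here is that the optimization dynamics decouple along the covariance eigendirections, which is what makes such models analytically solvable and what schedule or momentum choices (as in the talk's discussion of Nesterov acceleration and Muon) can exploit.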
Bio:
Elliot Paquette is a researcher in probability theory, especially the theory of high-dimensional optimization and machine learning. He received his PhD in 2013 from the University of Washington, in random matrix theory. He was an NSF postdoctoral researcher at the Weizmann Institute of Science, working at the interface of random matrix theory and branching processes. In 2016, he joined Ohio State University as an assistant professor. In 2020, he moved to the Department of Mathematics at McGill University, where he is now an associate professor. His work in machine learning theory is founded on rigorous mathematical analysis of simplified models, especially those combining random matrix theory and exact asymptotics of optimization problems.
Join Zoom Meeting
https://us02web.zoom.us/j/84365387306?pwd=ZmiXYiXyqfAsPuKbmLaU4XKzqbN1sl.1
