Can large language models solve compositional tasks? A study of out-of-distribution generalization
Speaker: Yiqiao Zhong (University of Wisconsin–Madison)
Time: 15:00-16:00, June 5, 2025
Venue: 智华楼四元厅225 (Siyuan Hall, Room 225, Zhihua Building)
Abstract:
Large language models (LLMs) such as GPT-4 sometimes appear to be creative, solving novel tasks from just a few demonstrations in the prompt. These tasks require pre-trained models to generalize to distributions different from the training data, a setting known as out-of-distribution (OOD) generalization. For example, in "symbolized language reasoning," names/labels are replaced by arbitrary symbols, yet the model can infer the names/labels without any fine-tuning.
In this talk, I will focus on a pervasive structure within LLMs known as induction heads. Through experiments on a variety of LLMs, I will empirically demonstrate that compositional structure is crucial for Transformers to learn the rules behind training instances and generalize to OOD data. Further, I propose the "common bridge representation hypothesis," in which a key intermediate subspace in the embedding space connects components of early layers with those of later layers, serving as a mechanism of composition.
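The copy behavior attributed to induction heads can be illustrated with a toy sketch: given a sequence ... A B ... A, predict B by matching the current token to its earlier occurrence and copying the token that followed it. The function below is purely illustrative (real induction heads realize this pattern through attention over learned key/query/value projections); the function name and example tokens are made up for this sketch.

```python
def induction_predict(tokens):
    """Predict the next token by prefix matching: find the most recent
    earlier occurrence of the last token and return its successor."""
    last = tokens[-1]
    # Scan backwards over earlier positions for a matching token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None  # no earlier occurrence: the pattern gives no prediction

# Symbolized-reasoning flavor: arbitrary symbols, no fine-tuning involved.
print(induction_predict(["#", "@", "x", "y", "#"]))  # -> "@"
```

Note that the rule operates on arbitrary symbols rather than memorized tokens, which is why it serves as a natural probe of OOD generalization.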
Bio:
Yiqiao Zhong is currently an assistant professor in the Department of Statistics at the University of Wisconsin–Madison. Prior to joining UW-Madison, Yiqiao was a postdoc at Stanford University, advised by Prof. Andrea Montanari and Prof. David Donoho. His research interests include the analysis of large language models, deep learning theory, and high-dimensional statistics. Yiqiao Zhong obtained his Ph.D. in 2019 from Princeton University, where he was advised by Prof. Jianqing Fan.
Join Tencent Meeting
https://meeting.tencent.com/dm/mM97rQbn9tW9
Meeting ID: 379-171-999