Can large language models solve compositional tasks? A study of out-of-distribution generalization

Published by: CMLR

Speaker: Yiqiao Zhong (University of Wisconsin-Madison)

Time: 15:00-16:00, June 5, 2025

Venue: Siyuan Hall (Room 225), Zhihua Building

Abstract:

Large language models (LLMs) such as GPT-4 sometimes appear to be creative, solving novel tasks from only a few demonstrations in the prompt. These tasks require the pre-trained models to generalize to distributions different from that of the training data, which is known as out-of-distribution (OOD) generalization. For example, in "symbolized language reasoning," names and labels are replaced by arbitrary symbols, yet the model can infer them without any finetuning.
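To make the setup concrete, here is a minimal sketch of a symbolized prompt; the review texts, the symbols "&" and "#", and the prompt format are my own hypothetical illustrations, not materials from the talk:

```python
# A hypothetical sketch of "symbolized language reasoning": class labels are
# replaced by arbitrary symbols, and the model must infer the symbol-label
# mapping from the in-context demonstrations alone.
demonstrations = [
    ("The movie was wonderful.", "&"),   # "&" stands in for "positive"
    ("The plot made no sense.", "#"),    # "#" stands in for "negative"
    ("I loved every minute of it.", "&"),
]
query = "The acting was dreadful."

prompt = "\n".join(f"Review: {text}\nLabel: {symbol}"
                   for text, symbol in demonstrations)
prompt += f"\nReview: {query}\nLabel:"
print(prompt)
# An LLM that generalizes out of distribution should complete with "#",
# even though this symbol-label pairing never appeared in its training data.
```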

In this talk, I will focus on a pervasive structure within LLMs known as induction heads. Through experiments on a variety of LLMs, I will empirically demonstrate that compositional structure is crucial for Transformers to learn the rules behind training instances and to generalize to OOD data. Further, I propose the "common bridge representation hypothesis," in which a key intermediate subspace in the embedding space connects components of early layers with those of later layers, serving as a mechanism of composition.
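As background, the copy pattern usually attributed to induction heads can be written down in a few lines. The toy function below is my own illustration of that pattern (not code from the talk): after seeing the pair (A, B) earlier in the sequence, the head predicts B the next time A occurs.

```python
# Toy sketch of the induction-head completion rule: find the most recent
# earlier occurrence of the final token and predict the token that followed it.
def induction_completion(tokens):
    query = tokens[-1]
    context = tokens[:-1]
    # Scan backwards so the most recent prior occurrence wins.
    for i in range(len(context) - 1, 0, -1):
        if context[i - 1] == query:
            return context[i]
    return None  # no earlier occurrence: nothing to copy

print(induction_completion(["A", "B", "C", "D", "A"]))  # -> "B"
```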


Bio:

Yiqiao Zhong is an assistant professor in the Department of Statistics at the University of Wisconsin-Madison. Prior to joining UW-Madison, he was a postdoc at Stanford University, advised by Prof. Andrea Montanari and Prof. David Donoho. His research interests include the analysis of large language models, deep learning theory, and high-dimensional statistics. He obtained his Ph.D. in 2019 from Princeton University, where he was advised by Prof. Jianqing Fan.





Join Tencent Meeting

https://meeting.tencent.com/dm/mM97rQbn9tW9

Meeting ID: 379-171-999