Optimization in Deep Learning/深度学习中的最优化方法
Opening semester: 2023 Fall Semester
Instructor: Kun Yuan/袁坤 (kunyuan@pku.edu.cn)
Course Code: 08403979
Course Credit: 3 credits
Overview
Deep learning is pivotal to Artificial Intelligence. The effectiveness of deep learning models relies significantly on the optimization algorithms utilized during their training. This course delves into optimization algorithms in deep learning, organized into four segments: fundamentals (including gradient descent, proximal gradient descent, zeroth-order methods, etc.), basic deep learning algorithms (such as stochastic gradient descent, momentum methods, adaptive SGD, etc.), advanced deep learning algorithms (covering robust deep learning, mixed-precision training, meta-learning, gradient clipping, etc.), and distributed algorithms tailored for large-scale deep learning tasks.
By the end of the course, students will understand the underlying mathematical principles, be able to implement the algorithms in PyTorch, and have the practical skills to optimize deep learning models for faster convergence and better performance.
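This course covers the optimization techniques and methods used in deep learning. Deep learning is a subfield of artificial intelligence that has attracted wide attention in recent years; it trains neural networks on large amounts of data to learn complex patterns and to make predictions or classifications. The performance of a deep learning model depends heavily on the optimization algorithm used to train it. The course discusses the optimization techniques used in deep learning in four parts. The first part covers the fundamentals of optimization, including gradient descent, projected gradient methods, proximal operator methods, and zeroth-order optimization methods. The second part covers optimization algorithms for deep learning, including stochastic gradient descent, momentum methods, adaptive gradient methods, and more advanced algorithms such as variance reduction. The third part introduces deep learning algorithms for important scenarios, including robust deep learning, mixed-precision training, and meta-learning. The fourth part introduces distributed training algorithms for large-scale deep learning tasks. Throughout the course, students will master the mathematical principles of optimization methods, learn to implement optimization algorithms with deep learning libraries such as PyTorch, and apply these methods to practical tasks to optimize deep learning models effectively for fast convergence and good performance.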
Large Language Models in Decision Intelligence/大语言模型与信息决策
Opening semester: 2024 Spring Semester
Instructor: Kun Yuan/袁坤 (kunyuan@pku.edu.cn)
Course Code: 00334600
Course Credit: 3 credits
Pre-requisites: Linear Algebra, Probability Theory/线性代数、概率论
Overview
In recent years, large language models have achieved tremendous success in tasks such as recognition, understanding, decision-making, and generation. Pre-training on vast amounts of unlabeled data significantly improves the versatility and generalization of these models. In practical scenarios, a large model needs only fine-tuning on a small amount of task-specific data to excel at downstream tasks. This eases the problems of complex downstream application scenarios and limited data resources, makes the development process considerably more standardized, and lowers the barrier to entry for artificial intelligence applications.
This course will provide a comprehensive exploration of large language models, primarily consisting of three key components. The first part will delve into the theoretical foundations, training methods, and inference techniques of large language models. We will discuss the architectures and principles of different large models and explore the algorithms and technologies employed during the training and inference phases. The second part will focus on the theory and methods of fine-tuning large language models in vertical domains, discussing how to leverage these models to meet personalized requirements in real-world applications. The third part will introduce prompt engineering, examining the impact of prompts on model performance and teaching how to design and optimize prompts to fully unleash the potential of large language models in intelligent decision-making.
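In recent years, large language models have achieved great success in many tasks, including recognition, understanding, decision-making, and generation. Pre-training on large amounts of unlabeled data significantly improves the versatility and generalization of large models. In practice, a large model needs only fine-tuning on a small amount of scenario-specific data to perform well on downstream tasks. This effectively eases problems such as complex downstream application scenarios and scarce data resources, significantly increases the standardization of the development process, and lowers the barrier to artificial intelligence applications. This course offers a systematic study of large language models in three main parts. The first part explains in depth the theoretical foundations, training methods, and inference techniques of large language models. We will discuss the architectures and principles of different large models and the algorithms and techniques used in training and inference. The second part focuses on the theory and methods of fine-tuning large language models in vertical domains and explores how to use large models to meet personalized needs in real applications. The third part introduces prompt engineering, examines how prompts affect model performance, and teaches how to design and optimize prompts to realize the full potential of large language models in intelligent decision-making.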