Overview
Deep learning has achieved great success in many applications, such as image processing, speech recognition, and the game of Go. However, the reason why deep learning is so powerful remains elusive. The goal of this course is to understand the success of deep learning by studying and building its theoretical foundations. Topics covered by this course include, but are not limited to: optimization for deep learning, generalization error analysis of deep learning, representation learning, and benign overfitting of overparameterized learning models. The instructor will give lectures on the selected topics. Students will complete a course project.
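As a concrete illustration of the last topic: "benign overfitting" refers to settings where an estimator that interpolates the training data (zero training error) can still generalize well. The canonical example from the linear-regression readings in the schedule below ([BLLT]) is the minimum-norm interpolator; a minimal sketch follows, in illustrative notation that need not match the lectures.

```latex
% Overparameterized linear regression: n samples, d > n features,
% X \in R^{n x d} with full row rank, labels y \in R^n.
% The minimum-norm interpolator (the object studied in [BLLT]) is
\[
  \hat{\theta}
  \;=\; \operatorname*{arg\,min}_{\theta \in \mathbb{R}^d}
        \bigl\{ \|\theta\|_2 \,:\, X\theta = y \bigr\}
  \;=\; X^{\top}\bigl(XX^{\top}\bigr)^{-1} y .
\]
% "Benign overfitting" asks when the excess risk of this interpolating
% estimator, E[(x^\top \hat{\theta} - x^\top \theta^*)^2], remains small
% even though the training error is zero.
```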
Prerequisites
CS 260A, STAT 200A and 200B, ECE 236B and 236C, or equivalent courses.
Logistics
- Time: Tuesday and Thursday 2:00PM - 3:50PM
- Location: [BOELTER 5422]
- Instructor: Quanquan Gu (Email: qgu at cs dot ucla dot edu)
- Office hours: Tuesday and Thursday 4:00-4:30PM in [EVI 382]
- Course Website: https://uclaml.github.io/CS269-Spring2021/
Recommended Textbook
There is no required textbook. The following are recommended textbooks:
- [T] Matus Telgarsky, Deep Learning Theory lecture notes, 2020
- [A] Sanjeev Arora et al., Theory of Deep Learning book draft, 2020 (Thanks to Prof. Sanjeev Arora for sharing the latest version of the book draft!)
- [SSBD] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- [MRT] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2012.
- [GBCB] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
- [ZLLS] Aston Zhang, Zachary C. Lipton, Mu Li, and Alex J. Smola. Dive into Deep Learning, 2018.
References
- [BLLT] Bartlett, P. L., Long, P. M., Lugosi, G., & Tsigler, A. (2020). Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48), 30063-30070.
- [TB] Tsigler, A., & Bartlett, P. L. (2020). Benign overfitting in ridge regression. arXiv preprint arXiv:2009.14286.
- [ZWBGK2021] Zou, D., Wu, J., Braverman, V., Gu, Q., & Kakade, S. M. (2021). Benign Overfitting of Constant-Stepsize SGD for Linear Regression. In COLT.
- [WZBGK] Wu, J., Zou, D., Braverman, V., Gu, Q., & Kakade, S. M. (2021). Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression. arXiv preprint arXiv:2110.06198.
- [ZWBGFK] Zou, D., Wu, J., Braverman, V., Gu, Q., & Kakade, S. M. (2021). The Benefits of Implicit Regularization from SGD in Least Squares Problems. In NeurIPS.
- [ZWBGK2022] Zou, D., Wu, J., Braverman, V., Gu, Q., & Kakade, S. M. (2022). Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime. arXiv preprint arXiv:2203.03159.
- [ZCLG] Zou, D., Cao, Y., Li, Y., & Gu, Q. (2021). Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization. arXiv preprint arXiv:2108.11371.
- [CCBG] Cao, Y., Chen, Z., Belkin, M., & Gu, Q. (2022). Benign Overfitting in Two-layer Convolutional Neural Networks. arXiv preprint arXiv:2202.06526.
Grading Policy
Grades will be computed based on the following factors:
- Attendance 30%
- Lecture Note Scribe 30%
- Project 40%
Schedule
| # | Date | Topic | Note | Scribed Note | Reading Materials | Homework |
|---|---|---|---|---|---|---|
| 1 | 3/29 | Introduction | note | scribe note | | |
| 2 | 3/31 | Benign Overfitting in Linear Regression I | note | scribe note | [BLLT] | |
| 3 | 4/5 | Benign Overfitting in Linear Regression II | note | scribe note | [BLLT] | |
| 4 | 4/7 | Benign Overfitting in Linear Regression III | note | scribe note | [BLLT] | |
| 5 | 4/12 | Benign Overfitting in Ridge Regression I | note | scribe note | [TB] | |
| 6 | 4/14 | Benign Overfitting in Ridge Regression II | note | scribe note | [TB] | |
| 7 | 4/19 | Benign Overfitting of SGD I | note | scribe note | [ZWBGK2021] | |
| 8 | 4/21 | Benign Overfitting of SGD II | note | | [ZWBGK2021] | |
| 9 | 4/26 | Benign Overfitting of SGD III | note | scribe note | [ZWBGK2021] | |
| 10 | 4/28 | Last Iterate Bound of SGD I | note | | [WZBGK] | |
| 11 | 5/3 | Last Iterate Bound of SGD II | note | | [WZBGK] | |
| 12 | 5/5 | Last Iterate Bound of SGD III | note | | [WZBGK] | |
| 13 | 5/10 | Ridge Regression vs SGD | note | | [ZWBGFK] | |
| 14 | 5/12 | Ridge Regression vs SGD | note | | [ZWBGFK] | |
| | 5/17 | Canceled due to NeurIPS | note | | | |
| 15 | 5/19 | Multi-pass SGD I | note | | [ZWBGK2022] | |
| 16 | 5/24 | Multi-pass SGD II | note | | [ZWBGK2022] | |
| 17 | 5/26 | Benign Overfitting of CNNs I | | | [CCBG] | |
| 18 | 5/31 | Benign Overfitting of CNNs II | | | [CCBG] | |
| 19 | 6/2 | Benign Overfitting of CNNs III | note | | [CCBG] | |
Academic Integrity Policy
Students are encouraged to read the UCLA Student Conduct Code for Academic Integrity.
Attendance
A signup sheet will be circulated in each lecture. Each student may miss at most 2 lectures; 3 points will be deducted for each absence.
Lecture Note Scribing
Students are required to scribe one lecture note. A LaTeX template for the lecture notes will be provided. The scribed lecture notes should be submitted on CCLE as a zip file that compiles without errors, and they are due 4 days after the lecture. This note will be graded. For example, if 2 students are assigned to scribe a given lecture, I expect to receive 2 separate notes. The individual notes are primarily for grading purposes (and also to make sure that each student scribes their own lecture notes), while the final version of the lecture notes will be posted on the course website after being proofread and edited by the instructor.
- The signup sheet for lecture note scribing can be found here.
- The LaTeX template for lecture note scribing can be downloaded here (an illustrative skeleton is sketched below).
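For illustration only, a minimal scribe-note skeleton in the spirit of such a template might look like the following; the official template linked above should be used, and all titles, names, and headings below are placeholders.

```latex
% Hypothetical minimal scribe-note skeleton -- the official course template
% should be used instead; titles, names, and headings are placeholders.
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm,fullpage}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}

\title{CS 269: Lecture X --- Topic}
\author{Scribe: Student Name \\ Instructor: Quanquan Gu}
\date{Lecture date}

\begin{document}
\maketitle

\section{Setup and Notation}
% Summarize the model, assumptions, and notation introduced in lecture.

\section{Main Results}
% State the theorems and lemmas covered, with proofs or proof sketches.
\begin{theorem}[Placeholder]
Statement of the main result presented in lecture.
\end{theorem}

\end{document}
```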
Project
Students are required to do a project in this class. The goal of the course project is to provide students with an opportunity to explore research directions in optimization or machine learning. Therefore, the project should be related to the course content. Expected projects include, but are not limited to:
- A novel and sound solution to an interesting problem
- Thorough theoretical analysis of existing deep learning approaches
The best outcome of the project is a manuscript that is publishable in a major machine learning conference (COLT, ICML, NeurIPS, ICLR, AISTATS, UAI, etc.) or journal (e.g., Journal of Machine Learning Research). The detailed course project guidelines can be found here. Students cannot use their own published work as the course project.
Relevant Courses
There are many other great deep learning theory and statistical learning theory courses. To mention a few:
- Matus Telgarsky’s deep learning theory course
- Sanjeev Arora’s theoretical deep learning course
- Peter Bartlett’s statistical learning theory course
- Sham Kakade’s statistical learning theory course
- Maxim Raginsky’s statistical learning theory course
- Quanquan Gu’s foundations of deep learning course in Spring 2021