Overview
Deep learning has achieved great success in many applications such as image processing, speech recognition, and the game of Go. However, the reason why deep learning is so powerful remains elusive. The goal of this course is to understand the successes of deep learning by studying and building its theoretical foundations. Topics covered by this course include, but are not limited to: the approximation power of neural networks, optimization for deep learning, generalization error analysis of deep learning, and benign overfitting of overparameterized learning models. The instructor will give lectures on the selected topics. Students will present and discuss papers on the reading list and complete a course project.
Prerequisites
CS 260A, STAT 200A and 200B, ECE 236B and 236C, or equivalent courses.
Logistics
- Time: Monday and Wednesday 2:00PM - 3:50PM
- Location: Zoom
- Instructor: Quanquan Gu (Email: qgu at cs dot ucla dot edu)
- Office hours: Tuesday 1:00-2:00PM on Zoom.
- Course Website: https://uclaml.github.io/CS269-Spring2021/
- Course Forum: https://piazza.com/ucla/spring2021/cs269/home (If you haven’t already, sign up here.)
- Gradescope: Entry Code 86D2JJ
Recommended Textbooks
There is no required textbook. The following are recommended textbooks:
- [T] Matus Telgarsky, Deep learning theory lecture notes, 2020
- [A] Sanjeev Arora et al., Theory of Deep Learning book draft, 2020 (Thanks to Prof. Sanjeev Arora for sharing the latest version of the book draft!)
- [SSBD] Shai Shalev-Shwartz, and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
- [MRT] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. MIT press, 2012.
- [GBCB] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. Cambridge: MIT Press, 2016.
- [ZLLS] Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning, 2018.
Other References
- [SHNGS] Soudry, D., Hoffer, E., Nacson, M. S., Gunasekar, S., & Srebro, N. (2018). The implicit bias of gradient descent on separable data. The Journal of Machine Learning Research, 19(1), 2822-2878.
- [GLSS] Gunasekar, S., Lee, J., Soudry, D., & Srebro, N. (2018, July). Characterizing implicit bias in terms of optimization geometry. In International Conference on Machine Learning (pp. 1832-1841). PMLR.
- [NLGSSS] Nacson, M. S., Lee, J., Gunasekar, S., Savarese, P. H. P., Srebro, N., & Soudry, D. (2019, April). Convergence of gradient descent on separable data. In 22nd International Conference on Artificial Intelligence and Statistics (pp. 3420-3428). PMLR.
- [DZPS] Du, S. S., Zhai, X., Poczos, B., & Singh, A. (2019). Gradient descent provably optimizes over-parameterized neural networks. ICLR.
- [MMN] Mei, S., Montanari, A., & Nguyen, P. M. (2018). A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences.
- [CB] Chizat, L., & Bach, F. (2018). On the global convergence of gradient descent for over-parameterized models using optimal transport. In NeurIPS.
- [FDZ] Fang, C., Dong, H., & Zhang, T. (2019). Over parameterized two-level neural networks can learn near optimal feature representations. arXiv preprint arXiv:1910.11508.
- [BLLT] Bartlett, P. L., Long, P. M., Lugosi, G., & Tsigler, A. (2020). Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48), 30063-30070.
- [ZWBGK] Zou, D., Wu, J., Braverman, V., Gu, Q., & Kakade, S. M. (2021). Benign Overfitting of Constant-Stepsize SGD for Linear Regression. In COLT.
Grading Policy
Grades will be computed based on the following factors:
- Lecture Note Scribe 10%
- Homework 40%
- Paper Presentation 10%
- Project 40%
Schedule
# | Date | Topic | Note | Scribed Note | Reading Materials | Homework |
---|---|---|---|---|---|---|
1 | 3/29 | Introduction | note | scribe note | CH0-1 of [T] | |
2 | 3/31 | Approximation I | note | scribe note | CH2-3 of [T] | |
3 | 4/5 | Approximation II | note | scribe note | CH3-4 of [T] | |
4 | 4/7 | Approximation III | note | scribe note | CH4-5 of [T] | HW1 out |
5 | 4/12 | Implicit Bias of Gradient Descent I | note | scribe note | CH9 of [A], [SHNGS] | |
6 | 4/14 | Implicit Bias of Gradient Descent II | note | scribe note | CH9 of [A], [GLSS, NLGSSS] | |
7 | 4/19 | Clarke Subdifferential and Positive Homogeneity | note | scribe note | CH14 of [T] | HW1 due |
8 | 4/21 | Implicit Bias of Gradient Descent III | note | | CH15 of [T] | HW2 out |
9 | 4/26 | NTK Analysis of NNs I | note | scribe note | CH10 of [A], [DZPS] | |
10 | 4/28 | NTK Analysis of NNs II | note | | CH10 of [A], [DZPS] | |
11 | 5/3 | Lazy Training | note | | CH13 of [T] | HW2 due |
12 | 5/5 | Mean Field Analysis of NNs I | note | | [FDZ], [MMN] | |
13 | 5/10 | Mean Field Analysis of NNs II | note | | [FDZ], [MMN] | HW3 out |
14 | 5/12 | Mean Field Analysis of NNs III | note | | [FDZ], [MMN] | |
15 | 5/17 | Generalization Bounds of DNNs I | note | | CH19 of [T] | |
16 | 5/19 | Generalization Bounds of DNNs II | note | | CH21 of [T] | HW3 due, HW4 out |
17 | 5/24 | Generalization Bounds of DNNs III | note | | CH21 of [T] | |
18 | 5/26 | Paper Presentation | | | | |
| | 5/31 | Memorial Day Holiday | | | | HW4 due, HW5 out |
19 | 6/2 | Generalization Bounds of DNNs IV | note | | CH21 of [T] | |
| | 6/11 | | | | | HW5 due |
Academic Integrity Policy
Students are encouraged to read the UCLA Student Conduct Code for Academic Integrity.
Lecture Note Scribing
Students are required to scribe one lecture note. A LaTeX template for the lecture notes will be provided. Scribed lecture notes should be submitted on CCLE as a zip file that compiles without errors, and are due 4 days after the lecture. Each note will be graded. For example, if 2 students are assigned to scribe a given lecture, I expect to receive 2 separate notes. The individual notes are primarily for grading purposes (and to make sure that each student scribes their own lecture notes), while the final version of the lecture note will be posted on the course website after being proofread and edited by the instructor.
- The signup sheet for lecture note scribing can be found here.
- The LaTeX template for lecture note scribing can be downloaded here.
Homework
There will be about 5 homework assignments. The lowest homework score will be dropped. Homework must be written in LaTeX; a LaTeX homework template will be provided. Unless otherwise indicated, you may talk to other students about the homework problems, but each student must hand in their own answers. You must also indicate on each homework with whom you collaborated and cite any other references and sources you use, including Internet sites. Homework is worth full credit if submitted before the due time and zero credit afterward.
- The LaTeX template for homework can be downloaded here.
Paper Presentation
After each lecture, there will be a few recommended readings. Each student is required to select one paper from the list and prepare a 20-minute presentation for the class. Each paper may be presented by only one student. Students are expected to prepare the slides by themselves, but the original authors’ slides may be used with proper citation.
Paper presentations will start in week 5.
Both the instructor and the other students will grade each presentation (no self-grading). Detailed grading criteria will be provided later.
- The list of papers for presentation can be found here.
Project
Students are required to complete a project in this class. The goal of the course project is to give students an opportunity to explore research directions in optimization or machine learning; therefore, the project should be related to the course content. Expected projects include, but are not limited to:
- A novel and sound solution to an interesting problem
- Thorough theoretical analysis of existing deep learning approaches
The best outcome of the project is a manuscript that is publishable in a major machine learning conference (COLT, ICML, NeurIPS, ICLR, AISTATS, UAI, etc.) or journal (e.g., the Journal of Machine Learning Research). The detailed course project guidelines can be found here. Students cannot use their own published work as the course project.
Relevant Courses
There are many other great deep learning theory and statistical learning theory courses. To mention a few:
- Matus Telgarsky’s deep learning theory course
- Sanjeev Arora’s theoretical deep learning course
- Peter Bartlett’s statistical learning theory course