Overview
Deep learning has achieved great success in many applications such as image processing, speech recognition, and the game of Go. However, the reason why deep learning is so powerful remains elusive. The goal of this course is to understand the successes of deep learning by studying and building its theoretical foundations. Topics covered by this course include, but are not limited to: the approximation power of neural networks, optimization for deep learning, generalization error analysis of deep learning, and benign overfitting of overparameterized learning models. The instructor will give lectures on the selected topics. Students will present and discuss papers on the reading list and complete a course project.
Prerequisites
CS 260A, STAT 200A and 200B, ECE 236B and 236C, or equivalent courses.
Logistics
- Time: Monday and Wednesday 2:00PM - 3:50PM
- Location: Zoom
- Instructor: Quanquan Gu (Email: qgu at cs dot ucla dot edu)
- Office hours: Tuesday 1:00-2:00PM on Zoom.
- Course Website: https://uclaml.github.io/CS269-Spring2021/
- Course Forum: https://piazza.com/ucla/spring2021/cs269/home (If you haven’t already, sign up here.)
- Gradescope: Entry Code 86D2JJ
Recommended Textbooks
There is no required textbook. The following are recommended textbooks:
- [T] Matus Telgarsky, Deep learning theory lecture notes, 2020
- [A] Sanjeev Arora et al., Theory of Deep Learning book draft, 2020 (Thanks to Prof. Sanjeev Arora for sharing the latest version of the book draft!)
- [SSBD] Shai Shalev-Shwartz, and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
- [MRT] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. MIT press, 2012.
- [GBCB] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. Cambridge: MIT Press, 2016.
- [ZLLS] Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning, 2018.
Other References
- [SHNGS] Soudry, D., Hoffer, E., Nacson, M. S., Gunasekar, S., & Srebro, N. (2018). The implicit bias of gradient descent on separable data. The Journal of Machine Learning Research, 19(1), 2822-2878.
- [GLSS] Gunasekar, S., Lee, J., Soudry, D., & Srebro, N. (2018, July). Characterizing implicit bias in terms of optimization geometry. In International Conference on Machine Learning (pp. 1832-1841). PMLR.
- [NLGSSS] Nacson, M. S., Lee, J., Gunasekar, S., Savarese, P. H. P., Srebro, N., & Soudry, D. (2019, April). Convergence of gradient descent on separable data. In 22nd International Conference on Artificial Intelligence and Statistics (pp. 3420-3428). PMLR.
- [DZPS] Du, S. S., Zhai, X., Poczos, B., & Singh, A. (2019). Gradient descent provably optimizes over-parameterized neural networks. ICLR.
- [MMN] Mei, S., Montanari, A., & Nguyen, P. M. (2018). A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences.
- [CB] Chizat, L., & Bach, F. (2018). On the global convergence of gradient descent for over-parameterized models using optimal transport. In NeurIPS.
- [FDZ] Fang, C., Dong, H., & Zhang, T. (2019). Over parameterized two-level neural networks can learn near optimal feature representations. arXiv preprint arXiv:1910.11508.
- [BLLT] Bartlett, P. L., Long, P. M., Lugosi, G., & Tsigler, A. (2020). Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48), 30063-30070.
- [ZWBGK] Zou, D., Wu, J., Braverman, V., Gu, Q., & Kakade, S. M. (2021). Benign Overfitting of Constant-Stepsize SGD for Linear Regression. In COLT.
Grading Policy
Grades will be computed based on the following factors:
- Lecture Note Scribe 10%
- Homework 40%
- Paper Presentation 10%
- Project 40%
Schedule
# | Date | Topic | Note | Scribed Note | Reading Materials | Homework |
---|---|---|---|---|---|---|
1 | 3/29 | Introduction | note | scribe note | CH0-1 of [T] | |
2 | 3/31 | Approximation I | note | scribe note | CH2-3 of [T] | |
3 | 4/5 | Approximation II | note | scribe note | CH3-4 of [T] | |
4 | 4/7 | Approximation III | note | scribe note | CH4-5 of [T] | HW1 out |
5 | 4/12 | Implicit Bias of Gradient Descent I | note | scribe note | CH9 of [A], [SHNGS] | |
6 | 4/14 | Implicit Bias of Gradient Descent II | note | scribe note | CH9 of [A], [GLSS, NLGSSS] | |
7 | 4/19 | Clarke Subdifferential and Positive Homogeneity | note | scribe note | CH14 of [T] | HW1 due |
8 | 4/21 | Implicit Bias of Gradient Descent III | note | | CH15 of [T] | HW2 out |
9 | 4/26 | NTK Analysis of NNs I | note | scribe note | CH10 of [A], [DZPS] | |
10 | 4/28 | NTK Analysis of NNs II | note | | CH10 of [A], [DZPS] | |
11 | 5/3 | Lazy Training | note | | CH13 of [T] | HW2 due |
12 | 5/5 | Mean Field Analysis of NNs I | note | | [FDZ], [MMN] | |
13 | 5/10 | Mean Field Analysis of NNs II | note | | [FDZ], [MMN] | HW3 out |
14 | 5/12 | Mean Field Analysis of NNs III | note | | [FDZ], [MMN] | |
15 | 5/17 | Generalization Bounds of DNNs I | note | | CH19 of [T] | |
16 | 5/19 | Generalization Bounds of DNNs II | note | | CH21 of [T] | HW3 due, HW4 out |
17 | 5/24 | Generalization Bounds of DNNs III | note | | CH21 of [T] | |
18 | 5/26 | Paper Presentation | | | | |
| | 5/31 | Memorial Day Holiday | | | | HW4 due, HW5 out |
19 | 6/2 | Generalization Bounds of DNNs IV | note | | CH21 of [T] | |
| | 6/11 | | | | | HW5 due |
Academic Integrity Policy
Students are encouraged to read the UCLA Student Conduct Code for Academic Integrity.
Lecture Note Scribing
Students are required to scribe one lecture note. A LaTeX template for the lecture notes will be provided. Scribed lecture notes should be submitted on CCLE as a zip file that compiles without errors, and are due 4 days after the lecture. Each note will be graded. For example, if 2 students are assigned to scribe a given lecture, I expect to receive 2 separate notes. The individual notes are primarily for grading purposes (and to make sure that each student scribes their own lecture notes), while the final version of the lecture note will be posted on the course website after being proofread and edited by the instructor.
- The signup sheet for lecture note scribing can be found here.
- The LaTeX template for lecture note scribing can be downloaded here.
Homework
There will be about 5 homework assignments. The lowest homework score will be dropped. Homework must be written in LaTeX; a LaTeX homework template will be provided. Unless otherwise indicated, you may talk to other students about the homework problems, but each student must hand in their own answers. You must also indicate on each homework with whom you collaborated and cite any other references and sources you use, including Internet sites. Homework is worth full credit if submitted before the due time and zero credit afterward.
- The LaTeX template for homework can be downloaded here.
Paper Presentation
After each lecture, there will be a few recommended readings. Each student is required to select one paper from the list and prepare a 20-minute presentation for the class. Each paper may be presented by only one student. Students are expected to prepare the slides by themselves, but the original authors’ slides may be used with proper citation.
Paper presentations will start in week 5.
Both the instructor and the other students will grade each presentation (no self-grading). Detailed grading criteria will be provided later.
- The list of papers for presentation can be found here.
Project
Students are required to complete a project in this class. The goal of the course project is to give students an opportunity to explore research directions in optimization or machine learning; therefore, the project should be related to the course content. Expected projects include, but are not limited to:
- A novel and sound solution to an interesting problem
- Thorough theoretical analysis of existing deep learning approaches
The best outcome of the project is a manuscript that is publishable in a major machine learning conference (COLT, ICML, NeurIPS, ICLR, AISTATS, UAI, etc.) or journal (e.g., the Journal of Machine Learning Research). The detailed course project guidelines can be found here. Students cannot use their own published work as the course project.
Relevant Courses
There are many other great deep learning theory and statistical learning theory courses. To mention a few:
- Matus Telgarsky’s deep learning theory course
- Sanjeev Arora’s theoretical deep learning course
- Peter Bartlett’s statistical learning theory course