
CS269 [Winter2019] Foundations of Deep Learning


Deep learning has achieved great success in many applications such as image processing, speech recognition, and the game of Go. However, the reason why deep learning is so powerful remains elusive. The goal of this course is to understand the successes of deep learning by studying and building its theoretical foundations. Topics covered include, but are not limited to: the expressive power of deep learning, optimization for deep learning, the generalization performance of deep learning, and the robustness of deep learning. The instructor will give lectures on advanced topics in statistical learning theory. Students will present and discuss papers on the selected topics and complete a course project.


Students should have two years of college mathematics, including calculus, linear algebra, probability and statistics, and the ability to write computer programs. They should also have taken CS 260 or an equivalent course.


There is no required textbook. The following are recommended textbooks:

  1. [SSBD] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
  2. [MRT] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. MIT Press, 2012.
  3. [GBC] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. Vol. 1. Cambridge: MIT Press, 2016.
  4. [ZLLS] Aston Zhang, Zack C. Lipton, Mu Li, and Alex J. Smola. Dive into Deep Learning. 2018.

There are many other great statistical learning theory courses. To mention a few:

Peter Bartlett’s statistical learning theory course

Sham Kakade’s statistical learning theory course

Maxim Raginsky’s statistical learning theory course

Grading Policy

Grades will be computed based on the following factors:


| # | Date | Topic | Scribed note | Reading materials |
|---|------|-------|--------------|-------------------|
| | | Part I: Statistical Learning Theory | | |
| 1 | 1/7 | Introduction | lecture1 | Chapter 2 in MRT |
| 2 | 1/9 | Concentration Inequalities | lecture2 | Appendix B in SSBD |
| 3 | 1/14 | Uniform Convergence | lecture3 | Chapter 26 in SSBD |
| 4 | 1/16 | Symmetrization and Rademacher Complexity | lecture4 | Chapter 26 in SSBD |
| 5 | 1/23 | Rademacher Complexity cont’d | lecture5 | Chapter 26 in SSBD |
| | 1/28 | Paper presentation | | |
| | 1/30 | Paper presentation | | |
| 6 | 2/4 | Growth Function and VC Dimension | lecture6 | Chapter 3 in MRT |
| 7 | 2/6 | Sauer’s Lemma and Covering Number | lecture7 | Chapter 3 in MRT |
| 8 | 2/11 | Chaining and Dudley’s Entropy Integral | lecture8 | |
| 9 | 2/13 | Generalization Bounds of DNNs I | lecture9 | |
| 10 | 2/20 | Generalization Bounds of DNNs II | lecture10 | |
| 11 | 2/27 | Paper presentation | | |
| 12 | 3/4 | Paper presentation | | |
| 13 | 3/6 | Paper presentation | | |
| 14 | 3/11 | Paper presentation | | |
| 15 | 3/13 | Paper presentation | | |

Academic Integrity Policy

Students are encouraged to read the UCLA Student Conduct Code for Academic Integrity.


There will be 5 in-class pop quizzes to review newly learned concepts. The quizzes are closed book and closed notes. No electronic aids or cheat sheets are allowed.

Lecture Note Scribing

Students are required to scribe one lecture note. A LaTeX template for lecture notes will be provided. The scribed lecture note should be submitted on CCLE as a zip file that compiles without errors, and it is due 4 days after the lecture. Each note will be graded individually: for example, if 2 students are assigned to scribe a given lecture, I expect to receive 2 separate notes. The individual notes are primarily for grading purposes (and also to make sure that each student scribes their own lecture notes), while the final version of the lecture note will be posted on the course website after being proofread and edited by the TA and/or the instructor.


There will be 3 homework assignments during the quarter, assigned as we cover the corresponding material. Homework consists of both mathematical derivation and algorithm analysis. Homework must be written in LaTeX; a LaTeX homework template will be provided.

Homework assignments will be submitted through Gradescope. (If you didn’t receive the invitation, email the TA with your name, UID, and the email address associated with your account.) Log in via the invite, and submit the homework assignments on time. Homework submitted before the due date is worth full credit; homework submitted after the due date is worth zero credit.

Paper Presentation

After each lecture, there will be a few recommended readings. Each student is required to select one paper from the list and prepare a 25-minute presentation for the class. Each paper can be presented by only one student. Students are expected to prepare the slides themselves, but the original authors’ slides may be used with proper citation.

Paper presentation sign-up is due at the end of the 3rd week of the quarter. The paper presentations will start in the 5th week.

Both the instructor and other students will grade the presentation (no self-grading). You can check the detailed grading criteria on the course syllabus.


Students are required to do a project in this class. The goal of the course project is to give students an opportunity to explore research directions in optimization or machine learning. Therefore, the project should be related to the course content. An expected project consists of

The best outcome of the project is a manuscript that is publishable in major machine learning conferences (COLT, ICML, NeurIPS, ICLR, AISTATS, UAI, etc.) or journals (Journal of Machine Learning Research). Students cannot use their own published work, or work under review by the end of winter quarter, as the course project. Detailed instructions are available here.