
CS269 [Spring 2023] Optimization for Machine Learning

Optimization for Machine Learning

Overview

Optimization plays a central role in machine learning and deep learning. The goal of this course is to introduce modern optimization algorithms and theory for machine learning. Topics include, but are not limited to, gradient descent, accelerated gradient descent, stochastic gradient descent, variance reduction, lower bounds, optimization on manifolds, optimization in probability space, and the implicit bias of optimization algorithms. The instructor will give lectures on the selected topics, and students will complete a course project.
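
As a small illustration of the kind of algorithms analyzed in this course, the sketch below runs plain gradient descent and stochastic gradient descent on a least-squares objective. The data, step sizes, and iteration counts are hypothetical choices made for illustration only; they are not part of the course material.

```python
# Minimal sketch: gradient descent (GD) vs. stochastic gradient descent (SGD)
# on least squares f(x) = (1/2n) * ||A x - b||^2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

def full_grad(x):
    # Exact gradient: average over all n samples.
    return A.T @ (A @ x - b) / n

def sample_grad(x, i):
    # Unbiased stochastic gradient from a single sample i.
    return A[i] * (A[i] @ x - b[i])

eta = 0.1  # constant step size (illustrative choice)
x_gd = np.zeros(d)
x_sgd = np.zeros(d)
for t in range(1000):
    x_gd -= eta * full_grad(x_gd)                            # GD step
    i = rng.integers(n)
    x_sgd -= eta / (1 + 0.01 * t) * sample_grad(x_sgd, i)    # SGD step with decaying step size

print("GD  distance to x_true:", np.linalg.norm(x_gd - x_true))
print("SGD distance to x_true:", np.linalg.norm(x_sgd - x_true))
```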

Prerequisites

CS 260A, STAT 200A and 200B, ECE 236B and 236C, or equivalent courses.

Logistics

There is no required textbook. The following are recommended textbooks:

  1. [N] Yurii Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, 2004.
  2. [B] Sébastien Bubeck, Convex Optimization: Algorithms and Complexity, 2014.
  3. [BV] Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
  4. [NW] Jorge Nocedal and Stephen J. Wright, Numerical Optimization, Springer, 2006.

References

  1. [JZ] Rie Johnson and Tong Zhang, Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, NIPS 2013.
  2. [FLLZ] Cong Fang, Chris Junchi Li, Zhouchen Lin, and Tong Zhang, SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator, NeurIPS 2018.
  3. [WJZLT] Zhe Wang, Kaiyi Ji, Yi Zhou, Yingbin Liang, and Vahid Tarokh, SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms, NeurIPS 2019.
  4. [CDHS] Yair Carmon, John C. Duchi, Oliver Hinder, and Aaron Sidford, Lower Bounds for Finding Stationary Points I, Mathematical Programming, 2020.
  5. [ZS] Hongyi Zhang and Suvrit Sra, First-order Methods for Geodesically Convex Optimization, COLT 2016.
  6. [ZRS] Hongyi Zhang, Sashank J. Reddi, and Suvrit Sra, Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds, NIPS 2016.
  7. [KS] Anna Korba and Adil Salim, Sampling as First-Order Optimization over a Space of Probability Measures, ICML 2022 tutorial.
  8. [VW] Santosh Vempala and Andre Wibisono, Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices, NeurIPS 2019.
  9. [GLSS] Suriya Gunasekar, Jason Lee, Daniel Soudry, and Nathan Srebro, Characterizing Implicit Bias in Terms of Optimization Geometry, ICML 2018.
  10. [MWGLSS] Edward Moroshko, Blake E. Woodworth, Suriya Gunasekar, Jason D. Lee, Nati Srebro, and Daniel Soudry, Implicit Bias in Deep Linear Classification: Initialization Scale vs. Training Accuracy, NeurIPS 2020.

Grading Policy

Grades will be computed based on lecture note scribing and the course project.

Tentative Schedule

| # | Date | Topic | Note | Scribed Note | Reading Materials | Homework |
|---|------|-------|------|--------------|-------------------|----------|
| 1 | 4/4 | Gradient Descent | note | scribed note | [B] | |
| 2 | 4/6 | Nesterov's AGD | note | scribed note | [N, B] | |
| 3 | 4/11 | SGD | note | scribed note | [B] | |
| 4 | 4/13 | Variance Reduction | note | scribed note | [JZ] | |
| 5 | 4/18 | Variance Reduction | note | scribed note | [FLLZ, WJZLT] | |
|   | 4/20 | Canceled due to travel | note | scribed note | | |
| 6 | 4/25 | Lower Bound | note | scribed note | [N] | |
| 7 | 4/27 | Lower Bound | note | | [CDHS] | |
| 8 | 5/2 | Lower Bound | note | scribed note | [CDHS, FLLZ] | |
| 9 | 5/4 | Optimization on Riemannian Manifold | note | | [ZS, ZRS] | |
| 10 | 5/9 | Optimization on Riemannian Manifold | note | | [ZS, ZRS] | |
| 11 | 5/11 | Optimization in Probability Space | note | | [KS] | |
|   | 5/16 | Canceled due to NeurIPS | note | | | |
| 12 | 5/18 | Optimization in Probability Space | note | | [KS] | |
| 13 | 5/23 | Optimization in Probability Space | note | | [VW] | |
| 14 | 5/25 | Optimization in Probability Space | note | | [VW] | |
| 15 | 5/30 | Implicit Bias of Logistic Regression | note | | [GLSS] | |
| 16 | 6/1 | Implicit Bias of Logistic Regression | | | [GLSS] | |
| 17 | 6/6 | Implicit Bias of Deep Linear Networks | | | [MWGLSS] | |
| 18 | 6/8 | Implicit Bias of Deep Linear Networks | note | | [MWGLSS] | |

Academic Integrity Policy

Students are encouraged to read the UCLA Student Conduct Code for Academic Integrity.

Lecture Note Scribing

Students are required to scribe one lecture note. A LaTeX template for the lecture notes will be provided. Each scribed lecture note should be submitted on CCLE as a zip file that compiles without errors, and it is due 4 days after the lecture. This note will be graded. For example, if 2 students are assigned to scribe a given lecture, I expect to receive 2 separate notes. The individual notes are primarily for grading purposes (and to make sure that each student scribes their own lecture notes), while the final version of the lecture note will be posted on the course website after being proofread and edited by the instructor.

Project

Students are required to do a project in this class. The goal of the course project is to give students an opportunity to explore research directions in optimization or machine learning; therefore, the project should be related to the course content. Expected forms of a project include, but are not limited to, the following:

The best outcome of the project is a manuscript that is publishable at a major machine learning conference (COLT, ICML, NeurIPS, ICLR, AISTATS, UAI, etc.) or journal (e.g., the Journal of Machine Learning Research). The detailed course project guideline can be found here. Students cannot use their own published work as the course project.

Relevant Courses

There are many other courses on optimization for machine learning; to mention a few:

Suvrit Sra’s Optimization for Machine Learning

Moritz Hardt’s Convex Optimization and Approximation

Martin Jaggi and Nicolas Flammarion’s Optimization for Machine Learning