Robust and Risk-Averse Accelerated Gradient Methods

CSML Lunch and Learn Seminar
Date
Nov 21, 2022, 12:30 pm2:30 pm
Location
26 Prospect Ave. Classroom 103
Sponsor
The Center for Statistics and Machine Learning
Event Description

Bio:
Mert Gurbuzbalaban is an associate professor at Rutgers University. Previously, he was an assistant professor at Rutgers University and a postdoctoral associate at the Laboratory for Information and Decision Systems (LIDS) at MIT. He is broadly interested in optimization and computational science, driven by applications in network science and data science. He received his B.Sc. degrees in Electrical Engineering and Mathematics as a valedictorian from Bogazici University, Istanbul, Turkey, the “Diplôme d’ingénieur” degree from École Polytechnique, France, and the M.S. and Ph.D. degrees in Mathematics from the Courant Institute of Mathematical Sciences, New York University.

Dr. Gurbuzbalaban is a recipient of the Dean's Research Award, Dean's Young Researcher Award, and Dean's Summer Fellowship Award at the Rutgers Business School in 2021 and a co-recipient of the Honorable Mention for the Best Paper Award at the International Conference in Machine Learning (ICML 2019). He also received the Kurt Friedrichs Prize (given by the Courant Institute of New York University for an outstanding thesis) in 2013, the Bronze Medal in the École Polytechnique Scientific Project Competition in 2006,  the Nadir Orhan Bengisu Award (given by the electrical-electronics engineering department of Bogazici University to the best graduating undergraduate student) in 2005.

Abstract:

In the context of first-order algorithms subject to random gradient noise, we study the trade-offs between the convergence rate (which quantifies how fast the initial conditions are forgotten) and the "risk" of suboptimality, i.e., deviations from the expected suboptimality. We focus on a general class of momentum methods (GMM) that recover popular methods such as gradient descent (GD), accelerated gradient descent (AGD), and the heavy-ball (HB) method as special cases depending on the choice of GMM parameters. We use well-known risk measures "entropic risk" and "entropic value at risk" to quantify the risk of suboptimality. For strongly convex smooth minimization, we first obtain new convergence rate results for GMM with a unified theory that applies to both AGD and HB, improving some of the existing results for HB. We then provide explicit bounds on the entropic risk and entropic value at risk of suboptimality at a given iterate which also provides explicit bounds on the probability that the suboptimality exceeds a given threshold based on Chernoff's inequality. Our results unveil fundamental trade-offs between the convergence rate and the risk of suboptimality and result in Heisenberg-style uncertainty principles. We then plug the entropic risk and convergence rate estimates we obtained in a computationally tractable optimization framework and propose entropic risk-averse GMM (RA-GMM) and entropic risk-averse AGD (RA-AGD) methods that can select the GMM parameters to systematically trade-off the entropic value at risk with the convergence rate. We show that RA-AGD and RA-GMM improve performance on quadratic optimization and logistic regression problems compared to the standard choice of parameters. To our knowledge, our work is the first to resort to coherent measures to design the parameters of momentum methods systematically. We will then discuss how these ideas can be leveraged to develop efficient algorithms for non-convex optimization and min-max optimization problems, including those arising in distributionally robust stochastic optimization.

Hosted by Peter Ramadge.

Lunch from 12:15 p.m., RSVP to [email protected]