Nonconvex first-order optimization: When can gradient descent escape saddle points in linear time?

Wed, Sep 15, 2021, 4:30 pm
Location: 
Online
Speaker(s): 
Sponsor(s): 
CSML
ECE Korhammer Lecture Series Funds

Please register for this event .

Host: Yuxin Chen and Jason Lee

Abstract:
Many data-driven problems in the modern world involve solving nonconvex optimization problems. The large-scale nature of many of these problems also necessitates the use of first-order optimization methods, i.e., methods that rely only on the gradient information, for computational purposes. But the first-order optimization methods, which include the prototypical gradient descent algorithm, face a major hurdle for nonconvex optimization: since the gradient of a function vanishes at a saddle point, first-order methods can potentially get stuck at the saddle points of the objective function. And while recent works have established that the gradient descent algorithm almost surely escapes the saddle points under some mild conditions, there remains a concern that first-order methods can spend an inordinate amount of time in the saddle neighborhoods. It is in this regard that we revisit the behavior of the gradient descent trajectories within the saddle neighborhoods and ask whether it is possible to provide necessary and sufficient conditions under which these trajectories escape the saddle neighborhoods in linear time. The ensuing analysis relies on precise approximations of the discrete gradient descent trajectories in terms of the spectrum of the Hessian at the saddle point, and it leads to a simple boundary condition that can be readily checked to ensure the gradient descent method escapes the saddle neighborhoods of a class of nonconvex functions in linear time. The boundary condition check also leads to the development of a simple variant of the vanilla gradient descent method, termed Curvature Conditioned Regularized Gradient Descent (CCRGD). The talk concludes with a convergence analysis of the CCRGD algorithm, which includes its rate of convergence to a local minimum of a class of nonconvex optimization problems.

This talk is based on the following two papers of my PhD student Rishabh Dixit:

  1. Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points
  2. Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm

 

Bio:
Waheed U. Bajwa, whose research interests include statistical signal processing, high-dimensional statistics, machine learning, inverse problems, and networked systems, is currently a visiting fellow in the Center for Statistics and Machine Learning at Princeton University, and an associate professor in the Department of Electrical and Computer Engineering and an associate member of the graduate faculty of the Department of Statistics at Rutgers University. He has received a number of awards in his career including the US Army Research Office Young Investigator Award (2014), the US National Science Foundation CAREER Award (2015), Rutgers University’s Presidential Merit Award (2016), Rutgers University’s Presidential Fellowship for Teaching Excellence (2017), and Rutgers Engineering Governing Council ECE Professor of the Year Award (2016, 2017, 2019). He is a co-investigator on a work that received the Cancer Institute of New Jersey’s Gallo Award for Scientific Excellence in 2017, a co-author on papers that received Best Student Paper Awards at IEEE IVMSP 2016 and IEEE CAMSAP 2017 workshops, and a Member of the Class of 2015 National Academy of Engineering Frontiers of Engineering Education Symposium. He has also been involved in a number of professional activities, including serving as the Lead Guest Editor for a special issue of IEEE Signal Processing Magazine on “Distributed, Streaming Machine Learning” (2020), a guest editor for a special issue of Proceedings of the IEEE on “Optimization for Data-driven Learning and Control” (2020), a Technical Area Chair of Asilomar Conference on Signals, Systems, and Computers (2018 and 2021), a Technical Co-Chair of IEEE SPAWC 2018 Workshop, the General Chair of 2017 DIMACS Workshop on Distributed Optimization, Information Processing, and Learning, and an Associate Editor of IEEE Signal Processing Letters (2014 - 2017). He is currently serving as a Senior Area Editor for IEEE Signal Processing Letters, an Associate Editor for IEEE Transactions on Signal and Information Processing over Networks, and an elected member of the Sensor Array and Multichannel (SAM) and Signal Processing for Communications and Networking (SPCOM) Technical Committees of the IEEE Signal Processing Society.
 
This seminar is supported by CSML and ECE Korhammer Lecture Series Funds