Noise-Adaptive Optimization Methods and their Robustness Properties
Third-party funded project
Project title: Noise-Adaptive Optimization Methods and their Robustness Properties
Principal Investigator(s): Lucchi, Aurelien
Organisation / Research unit: Departement Mathematik und Informatik / Optimization of Machine Learning Systems (Lucchi)
Department: Departement Mathematik und Informatik
Project start: 01.06.2022
Probable end: 31.05.2026
Status: Active
Abstract

Randomness is present throughout most machine learning (ML) pipelines, including in the initialization of a model's parameters, in the ground-truth labels, and in the optimizer. While these various sources of randomness have been studied in the literature, they are often considered under simplifying assumptions that fail to capture the empirical evidence observed in modern deep learning models. In this proposal, we focus on non-convex optimization, where a common practice is to assume that the noise is bounded and to derive convergence rates to first-order (and sometimes second-order) stationarity. While important, these guarantees have at least three notable limitations: i) they do not specifically describe the behavior of the algorithm in regions of particular interest (e.g. valleys or local minima), ii) they do not fully exploit the properties of the noise to improve convergence, and iii) they do not precisely characterize the type of minima an optimizer is more likely to select depending, for instance, on their width or height. Crucially, the latter properties are connected to the robustness and generalization of the chosen solution. Understanding them is therefore of practical relevance, as ML models are now frequently deployed in real-world applications such as self-driving cars.

In this proposal, we define four overarching goals in order to enhance our understanding of the role of stochastic noise in non-convex optimization and to develop new noise-adaptive optimization methods. Each goal corresponds to one work package (WP).

i) First, we aim to introduce new tools to characterize the behavior of stochastic methods for optimizing non-convex functions. We will base our analysis on continuous-time representations, which have become a popular tool in ML (an illustrative formulation is sketched after the abstract). Our main focus will be to derive analytical expressions for mean exit times from various regions of interest, including saddle points, valleys, and local minima. We will address the general noise regime, i.e. without assuming that the noise vanishes, as is done in many existing works. This is of practical relevance in ML, where one cannot necessarily assume that the level of noise is small; on the contrary, large levels of noise have been shown to be beneficial~\citep{zhou2019toward, wei2019noise, xie2021positive}. The tools we will develop are general enough to apply to a broad range of applications in ML as well as in other fields of science. We will also conduct an extensive experimental evaluation on modern neural network architectures to validate and guide our theoretical analysis.

ii) Second, we will study the effect of changing the properties of the noise, producing novel types of stochastic algorithms that are noise-adaptive. This will include developing new analytical tools for fractional Brownian motion, as well as for self-excited processes where the noise can be driven by a chosen function. Both areas are, to the best of our knowledge, unexplored in ML. We expect this direction to lead to new algorithms with faster rates of convergence and more control over the targeted minima.

iii) Third, we will use the results developed in i) and ii) to analyze the robustness and generalization properties of stochastic optimization methods.

iv) Fourth, we will use the results derived in the first three WPs to analyze the properties of gradient-adaptive methods, which are commonly used in ML but also differ from non-gradient-adaptive methods in terms of generalization~\citep{wilson2017marginal}. We will also propose a new variant of stochastic optimization methods that is both gradient- and noise-adaptive.

Finally, the proposal provides research training opportunities for a diverse group of students. It is interdisciplinary in nature, as it involves a collaboration between mathematicians and computer scientists. The tools developed in WP1 and WP2 are of general interest to the mathematical community and will be applied to problems of practical relevance in the field of ML. We also aim to organize a conference/workshop mid-proposal to gather mathematicians and computer scientists interested in applications of continuous-time stochastic processes to ML.
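To make the continuous-time viewpoint of WP1 concrete, here is a minimal sketch based on the standard SDE approximation of stochastic gradient descent (SGD) from the literature; the particular model, the step size $\eta$, and the noise covariance $\Sigma$ are our illustrative choices, not a formulation taken from the proposal:

% Illustrative sketch (our notation): SGD with step size \eta and stochastic
% gradient g(\theta_k) = \nabla f(\theta_k) + \xi_k, Cov(\xi_k) = \Sigma(\theta_k),
% together with its standard continuous-time (SDE) approximation.
\begin{align}
  \theta_{k+1} &= \theta_k - \eta\, g(\theta_k), \\
  \mathrm{d}\theta_t &= -\nabla f(\theta_t)\,\mathrm{d}t
                      + \sqrt{\eta}\,\Sigma(\theta_t)^{1/2}\,\mathrm{d}W_t.
\end{align}
% In the classical small-noise regime, the mean exit time \tau from a basin with
% barrier height \Delta f follows a Kramers-type law,
% \mathbb{E}[\tau] \asymp \exp(2\Delta f/\eta). WP1 targets the general
% (non-vanishing) noise regime, where this asymptotic no longer applies directly.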
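In the same spirit, one concrete way to alter the temporal structure of the noise, as studied in WP2 (again our illustration, not the proposal's formulation), is to replace the Brownian driver $W_t$ with fractional Brownian motion $B^H_t$ with Hurst index $H \in (0,1)$:

% Illustrative: gradient dynamics driven by fractional Brownian motion.
\begin{align}
  \mathrm{d}\theta_t = -\nabla f(\theta_t)\,\mathrm{d}t + \sigma\,\mathrm{d}B^H_t,
  \qquad
  \mathbb{E}\big[B^H_t B^H_s\big] = \tfrac{1}{2}\big(t^{2H} + s^{2H} - |t-s|^{2H}\big).
\end{align}
% H = 1/2 recovers standard Brownian motion (independent increments); H > 1/2
% yields persistent (positively correlated) increments and H < 1/2
% anti-persistent ones, giving a tunable handle on the memory of the noise.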
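As background for WP4, "gradient-adaptive" refers to methods such as Adam (Kingma & Ba, 2015) that rescale each coordinate of the update using statistics of past gradients; for reference, the standard Adam update reads:

% Standard Adam update with stochastic gradient g_k at step k.
\begin{align}
  m_k &= \beta_1 m_{k-1} + (1-\beta_1)\, g_k,
  &
  v_k &= \beta_2 v_{k-1} + (1-\beta_2)\, g_k \odot g_k, \\
  \hat{m}_k &= \frac{m_k}{1-\beta_1^k},
  &
  \hat{v}_k &= \frac{v_k}{1-\beta_2^k}, \\
  \theta_{k+1} &= \theta_k - \eta\, \frac{\hat{m}_k}{\sqrt{\hat{v}_k} + \delta}.
\end{align}
% All operations are elementwise, and \delta is a small constant for numerical
% stability. A method that is both gradient- and noise-adaptive, as proposed in
% WP4, would additionally adjust this update to the estimated gradient noise;
% the abstract leaves the precise mechanism open.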

Financed by: Swiss National Science Foundation (SNSF)