Generalized stochastic Frank-Wolfe algorithm with stochastic substitute gradient for structured convex optimization

Publication Type:
Article
Authors:
Lu, Haihao; Freund, Robert M.
Affiliations:
University of Chicago; Massachusetts Institute of Technology (MIT)
Journal:
MATHEMATICAL PROGRAMMING
ISSN:
0025-5610
DOI:
10.1007/s10107-020-01480-7
Publication Year:
2021
Pages:
317-349
Keywords:
subgradient methods; first-order methods; descent; regression; selection
Abstract:
The stochastic Frank-Wolfe method has recently attracted much interest in the context of optimization for statistical and machine learning due to its ability to work with a more general feasible region. However, there has been a complexity gap in the dependence on the optimality tolerance ε in the guaranteed convergence rate for stochastic Frank-Wolfe compared to its deterministic counterpart. In this work, we present a new generalized stochastic Frank-Wolfe method that closes this gap for the class of structured optimization problems encountered in statistical and machine learning, characterized by empirical loss minimization with a certain type of "linear prediction" property (formally defined in the paper) that is typically present in loss minimization problems in practice. Our method also introduces the notion of a "substitute gradient", a not-necessarily-unbiased estimate of the gradient. We show that our new method is equivalent to a particular randomized coordinate mirror descent algorithm applied to the dual problem, which in turn provides a new interpretation of randomized dual coordinate descent in the primal space. Also, in the special case of a strongly convex regularizer, our generalized stochastic Frank-Wolfe method (as well as the randomized dual coordinate descent method) exhibits linear convergence. Furthermore, we present computational experiments indicating that our method outperforms other stochastic Frank-Wolfe methods for a sufficiently small optimality tolerance, consistent with the theory developed herein.
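
Illustration (not the authors' exact algorithm): the sketch below combines a standard Frank-Wolfe step with a per-sample gradient table whose running average plays the role of a possibly biased "substitute gradient". The least-squares objective, the l1-ball feasible region, the open-loop step size, and all function names are assumptions made purely for illustration of the general idea.

import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle for the l1 ball: argmin_{||v||_1 <= radius} <g, v>."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

def substitute_gradient_fw(A, b, iters=1000, radius=1.0, seed=0):
    """Sketch of a stochastic Frank-Wolfe loop driven by a substitute gradient.

    A, b define an illustrative least-squares objective (1/n) * sum_i (a_i^T x - b_i)^2.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    # Per-sample gradient table; its average is the substitute gradient,
    # a possibly biased estimate of the true gradient at the current iterate.
    grad_table = np.zeros((n, d))
    g_sub = grad_table.mean(axis=0)
    for k in range(iters):
        i = rng.integers(n)                      # sample one data point
        new_gi = 2.0 * (A[i] @ x - b[i]) * A[i]  # fresh gradient of the i-th loss term
        g_sub += (new_gi - grad_table[i]) / n    # refresh the substitute gradient
        grad_table[i] = new_gi
        v = lmo_l1_ball(g_sub, radius)           # Frank-Wolfe direction from the LMO
        gamma = 2.0 / (k + 2.0)                  # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * v        # stays feasible (convex combination)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 50))
    x_true = np.zeros(50)
    x_true[:3] = [0.5, -0.3, 0.2]
    b = A @ x_true
    x_hat = substitute_gradient_fw(A, b, iters=5000)
    print("objective:", np.mean((A @ x_hat - b) ** 2))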