Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions
Publication type:
Article
Authors:
Wang, Mengdi; Fang, Ethan X.; Liu, Han
Affiliations:
Princeton University; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
Journal:
MATHEMATICAL PROGRAMMING
ISSN:
0025-5610
DOI:
10.1007/s10107-016-1017-3
Publication date:
2017
Pages:
419-449
Keywords:
approximation algorithms; selection; optimization
Abstract:
Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values, i.e., a composition of two expected-value functions: the problem min_x E_v[f_v(E_w[g_w(x)])]. In order to solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD updates the solution based on noisy sample gradients of f_v and g_w, and uses an auxiliary variable to track the unknown quantity E_w[g_w(x)]. We prove that SCGD converges almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of O(k^{-1/4}) in the general case and O(k^{-2/3}) in the strongly convex case, after taking k samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of O(k^{-2/7}) in the general case and O(k^{-4/5}) in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, for which we also provide a convergence rate analysis. The stochastic setting in which one wants to optimize compositions of expected-value functions is very common in practice, and the proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.
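
A minimal Python sketch of the basic SCGD recursion described in the abstract: an auxiliary variable y tracks the unknown inner expectation E_w[g_w(x)] by weighted averaging on a faster time scale, while x takes quasi-gradient steps on a slower one. The toy choices f(y) = ||y||^2 and g_w(x) = x + w (zero-mean noise w) are illustrative assumptions, not from the paper; the step-size exponents alpha_k = k^{-3/4} and beta_k = k^{-1/2} are chosen to match the nonsmooth convex setting that yields the O(k^{-1/4}) rate.

import numpy as np

rng = np.random.default_rng(0)
d = 5
x = rng.normal(size=d)   # decision variable
y = np.zeros(d)          # auxiliary variable tracking E_w[g_w(x)]

for k in range(1, 50001):
    alpha = k ** -0.75   # x-step size: decays faster, so x is the slow iterate
    beta = k ** -0.5     # y-averaging weight: the faster tracking time scale

    w = rng.normal(size=d)    # sample of the inner noise (toy assumption)
    g_sample = x + w          # noisy evaluation of g_w(x)
    jac_g = np.eye(d)         # Jacobian of g_w(x) = x + w is the identity

    # Track E_w[g_w(x)] (equal to x in this toy model) by weighted averaging.
    y = (1 - beta) * y + beta * g_sample

    # Quasi-gradient step: chain rule with y standing in for E_w[g_w(x)].
    grad_f = 2 * y            # gradient of f(y) = ||y||^2 at y
    x = x - alpha * jac_g.T @ grad_f

print(np.linalg.norm(x))      # should be near 0, the minimizer of ||x||^2

Roughly speaking, the accelerated variant for smooth problems queries the inner function at an extrapolated point rather than at x itself, which is how the faster O(k^{-2/7}) and O(k^{-4/5}) rates quoted above are obtained.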