A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

Publication Type:
Article
Authors:
Pu, Shi; Olshevsky, Alex; Paschalidis, Ioannis Ch.
Affiliations:
Boston University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN:
0018-9286
DOI:
10.1109/TAC.2021.3126253
Publication Date:
2022
Pages:
5900-5915
Keywords:
Convex optimization; distributed optimization; stochastic gradient descent (SGD); stochastic programming
Abstract:
This article is concerned with minimizing the average of n cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a nonasymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves, in expectation, the optimal network-independent convergence rate of centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach this asymptotic rate. Moreover, we construct a "hard" optimization problem that proves the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
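The DSGD method analyzed in the abstract interleaves a consensus (mixing) step over the network with a local noisy gradient step at each agent. Below is a minimal Python sketch of that update on a toy quadratic problem; the mixing matrix W, step size alpha, noise level, and the quadratic local costs are illustrative assumptions for demonstration, not parameters taken from the paper.

```python
import numpy as np

def dsgd(local_grad, W, X0, alpha, n_iters, noise_std, rng):
    """Run DSGD: each agent i repeats
        x_i <- sum_j W[i, j] * x_j - alpha * (local_grad(i, x_i) + noise).
    W is assumed doubly stochastic, as in standard DSGD analyses."""
    X = X0.copy()  # row i is agent i's current iterate
    n = X.shape[0]
    for _ in range(n_iters):
        G = np.stack([local_grad(i, X[i]) for i in range(n)])
        G += noise_std * rng.standard_normal(G.shape)  # noisy gradient oracle
        X = W @ X - alpha * G  # mixing step followed by local gradient step
    return X

# Toy problem: agent i's cost is 0.5 * ||x - t_i||^2, so the minimizer
# of the average cost is the mean of the targets t_i.
rng = np.random.default_rng(0)
n, d = 8, 3
targets = rng.standard_normal((n, d))
local_grad = lambda i, x: x - targets[i]
W = np.full((n, n), 1.0 / n)  # complete graph with uniform weights
X = dsgd(local_grad, W, rng.standard_normal((n, d)),
         alpha=0.05, n_iters=2000, noise_std=0.1, rng=rng)
print("distance to optimum:", np.linalg.norm(X.mean(axis=0) - targets.mean(axis=0)))
print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))
```

With a constant step size, the agents' average iterate settles in a noise-dominated neighborhood of the global minimizer. The paper's transient-time result quantifies how many iterations network effects, governed by the spectral gap of the mixing matrix W, dominate before DSGD's expected error matches that of centralized SGD.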