您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 概率 > The Annals of Applied Probability > 2025 > 2期

THE ODE METHOD FOR ASYMPTOTIC STATISTICS IN STOCHASTIC APPROXIMATION AND REINFORCEMENT LEARNING

成果类型：

Article

署名作者：

Borkar, Vivek; Chen, Shuhang; Devraj, Adithya; Kontoyiannis, Ioannis; Meyn, Sean

署名单位：

Indian Institute of Technology System (IIT System); Indian Institute of Technology (IIT) - Bombay; State University System of Florida; University of Florida; Stanford University; University of Cambridge; State University System of Florida; University of Florida

刊物名称：

ANNALS OF APPLIED PROBABILITY

ISSN/ISSBN：

1050-5164

DOI：

10.1214/24-AAP2132

发表日期：

2025

页码：

936-982

关键词：

multiclass queuing-networks markov process expectations borkar-meyn theorem SPECTRAL THEORY iterated logarithm CONVERGENCE STABILITY normality noise time

摘要：

The paper concerns the stochastic approximation recursion, theta n+1 = theta n + alpha n+1f (theta n, Phi n+1), n >= 0, where the estimates {theta n} evolve on Ilgd, and Phi := {Phi n} is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated mean flow ddt theta t =f(theta t) is globally asymptotically stable, with stationary point denoted theta*. The main results are established under additional conditions on the mean flow and an extension of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) A Lyapunov function is constructed for the joint process {theta n, Phi n} that implies convergence of the estimates in L4. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance E[znzTn ] to the asymptotic covariance Sigma Theta in the CLT, where zn := (theta n-theta*)/root alpha n. (iii) The CLT holds for the normalized averaged parameters zPR root n(theta PR n := n-theta*), with theta PR n := n-1 Sigma n k=1 theta k, subject to standard assumptions on the step-size. Moreover, the covariance of zPRn converges to Sigma PR Theta , the minimal covariance of Polyak and Ruppert. (iv) An example is given where f and f are linear in theta, and Phi is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of theta n is unbounded and in fact diverges.