THE ODE METHOD FOR ASYMPTOTIC STATISTICS IN STOCHASTIC APPROXIMATION AND REINFORCEMENT LEARNING

成果类型:
Article
署名作者:
Borkar, Vivek; Chen, Shuhang; Devraj, Adithya; Kontoyiannis, Ioannis; Meyn, Sean
署名单位:
Indian Institute of Technology System (IIT System); Indian Institute of Technology (IIT) - Bombay; State University System of Florida; University of Florida; Stanford University; University of Cambridge; State University System of Florida; University of Florida
刊物名称:
ANNALS OF APPLIED PROBABILITY
ISSN/ISSBN:
1050-5164
DOI:
10.1214/24-AAP2132
发表日期:
2025
页码:
936-982
关键词:
multiclass queuing-networks markov process expectations borkar-meyn theorem SPECTRAL THEORY iterated logarithm CONVERGENCE STABILITY normality noise time
摘要:
The paper concerns the stochastic approximation recursion, theta n+1 = theta n + alpha n+1f (theta n, Phi n+1), n >= 0, where the estimates {theta n} evolve on Ilgd, and Phi := {Phi n} is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated mean flow ddt theta t =f(theta t) is globally asymptotically stable, with stationary point denoted theta*. The main results are established under additional conditions on the mean flow and an extension of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) A Lyapunov function is constructed for the joint process {theta n, Phi n} that implies convergence of the estimates in L4. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance E[znzTn ] to the asymptotic covariance Sigma Theta in the CLT, where zn := (theta n-theta*)/root alpha n. (iii) The CLT holds for the normalized averaged parameters zPR root n(theta PR n := n-theta*), with theta PR n := n-1 Sigma n k=1 theta k, subject to standard assumptions on the step-size. Moreover, the covariance of zPRn converges to Sigma PR Theta , the minimal covariance of Polyak and Ruppert. (iv) An example is given where f and f are linear in theta, and Phi is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of theta n is unbounded and in fact diverges.