
ON FINDING OPTIMAL POLICIES FOR MARKOV DECISION CHAINS - A UNIFYING FRAMEWORK FOR MEAN-VARIANCE-TRADEOFFS

Document type:

Article

Authors:

HUANG, Y; KALLENBERG, LCM

Journal:

MATHEMATICS OF OPERATIONS RESEARCH

ISSN/ISBN:

0364-765X

DOI:

10.1287/moor.19.2.434

Publication date:

1994

Pages:

434-448

Keywords:

Abstract:

This paper proves constructively the existence of optimal policies for maximum one-period mean-to-standard-deviation-ratio, negative variance-with-bounded-mean and mean-penalized-by-variance Markov decision chains by reducing them to a related mathematical program. This program entails maximizing (xB/D(xb)) + C(xb) over x in a polytope and with given bounds on xb where C and D are convex and either D is constant or D is positive and nondecreasing, C is nondecreasing and xB is nonpositive. This program is in turn reduced to maximizing x(B + θb) over x in the polytope parametrically in θ. Along the way, under the nonnegative-initial-distribution assumption, we generalize the rule of constructing a stationary maximum-average-reward policy from an extreme optimal solution of the associated linear program. The paper unifies and extends formulations and existence results for problems discussed by White (1987), Filar and Lee (1985), Sobel (1985), Kawai (1987) and Filar, Kallenberg and Lee (1989), and gives an effective computational procedure to solve them that is related to a method used by Kawai (1987) in a special case.
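
For readers who prefer symbols, the two programs named in the abstract can be transcribed roughly as follows. This is only a sketch of the notation: the names X (the polytope), \ell and u (the bounds on xb), the reading of x as a row vector with B and b as column vectors, and the range of \theta are assumptions introduced here for illustration and are not stated in the abstract.

```latex
% Assumed notation: X is the polytope; \ell and u are the given bounds on xb.
% Nonlinear program described in the abstract:
\[
  \max_{x \in X,\ \ell \le xb \le u} \; \frac{xB}{D(xb)} + C(xb)
\]
% Parametric linear program to which the abstract says it is reduced
% (the range of \theta is an assumption):
\[
  \max_{x \in X} \; x\,(B + \theta b), \qquad \theta \in \mathbb{R}
\]
```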