Mutual information, metric entropy and cumulative relative entropy risk
Document type:
Article
Authors:
Haussler, D; Opper, M
Affiliations:
University of California System; University of California Santa Cruz; University of Wurzburg
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
Publication date:
1997
Pages:
2451-2492
Keywords:
stochastic complexity
density estimation
bounds
approximation
convergence
consistency
redundancy
theorem
rates
Abstract:
Assume $\{P_\theta : \theta \in \Theta\}$ is a set of probability distributions with a common dominating measure on a complete separable metric space $Y$. A state $\theta^* \in \Theta$ is chosen by Nature. A statistician obtains $n$ independent observations $Y_1, \ldots, Y_n$ from $Y$ distributed according to $P_{\theta^*}$. For each time $t$ between 1 and $n$, based on the observations $Y_1, \ldots, Y_{t-1}$, the statistician produces an estimated distribution $\hat{P}_t$ for $P_{\theta^*}$ and suffers a loss $L(P_{\theta^*}, \hat{P}_t)$. The cumulative risk for the statistician is the average total loss up to time $n$. Of special interest in information theory, data compression, mathematical finance, computational learning theory and statistical mechanics is the special case when the loss $L(P_{\theta^*}, \hat{P}_t)$ is the relative entropy between the true distribution $P_{\theta^*}$ and the estimated distribution $\hat{P}_t$. Here the cumulative Bayes risk from time 1 to $n$ is the mutual information between the random parameter $\Theta^*$ and the observations $Y_1, \ldots, Y_n$. New bounds on this mutual information are given in terms of the Laplace transform of the Hellinger distance between pairs of distributions indexed by parameters in $\Theta$. From these, bounds on the cumulative minimax risk are given in terms of the metric entropy of $\Theta$ with respect to the Hellinger distance. The assumptions required for these bounds are very general and do not depend on the choice of the dominating measure. They apply to both finite- and infinite-dimensional $\Theta$. They apply in some cases where $Y$ is infinite dimensional, in some cases where $Y$ is not compact, in some cases where the distributions are not smooth and in some parametric cases where asymptotic normality of the posterior distribution fails.
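The identity between cumulative Bayes risk and mutual information stated in the abstract can be checked numerically in a toy setting. The sketch below is illustrative only and is not taken from the paper: it assumes a finite parameter set of Bernoulli distributions, a known prior, and the Bayes predictive distribution as the estimator at each step. The function names (`cumulative_bayes_risk`, `mutual_information`) and the example parameters are hypothetical. All quantities are in nats.

```python
import math
from itertools import product


def kl_bernoulli(p, q):
    """Relative entropy KL(Bernoulli(p) || Bernoulli(q)) in nats."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0:
            total += a * math.log(a / b)
    return total


def seq_prob(theta, seq):
    """Probability of a binary sequence under i.i.d. Bernoulli(theta)."""
    k = sum(seq)
    return theta ** k * (1.0 - theta) ** (len(seq) - k)


def cumulative_bayes_risk(thetas, prior, n):
    """Expected total relative entropy loss of the Bayes predictor, t = 1..n.

    Assumes every theta lies strictly between 0 and 1, so every past
    sequence has positive marginal probability.
    """
    risk = 0.0
    for t in range(1, n + 1):
        # Enumerate every possible past Y_1, ..., Y_{t-1}.
        for past in product((0, 1), repeat=t - 1):
            # joint[i] = prior(theta_i) * P_theta_i(past)
            joint = [w * seq_prob(th, past) for th, w in zip(thetas, prior)]
            marginal = sum(joint)
            # Bayes predictive probability that the next observation is 1.
            pred = sum(j * th for j, th in zip(joint, thetas)) / marginal
            # Average the loss over theta and over the past.
            for th, j in zip(thetas, joint):
                risk += j * kl_bernoulli(th, pred)
    return risk


def mutual_information(thetas, prior, n):
    """I(Theta*; Y_1..Y_n) computed directly from the joint distribution."""
    mi = 0.0
    for seq in product((0, 1), repeat=n):
        probs = [seq_prob(th, seq) for th in thetas]
        mix = sum(w * p for w, p in zip(prior, probs))
        for w, p in zip(prior, probs):
            if p > 0:
                mi += w * p * math.log(p / mix)
    return mi


if __name__ == "__main__":
    thetas = [0.2, 0.5, 0.8]          # hypothetical parameter set
    prior = [1 / 3, 1 / 3, 1 / 3]     # uniform prior (an assumption)
    n = 6
    print("cumulative Bayes risk:", cumulative_bayes_risk(thetas, prior, n))
    print("mutual information:   ", mutual_information(thetas, prior, n))
```

Under these assumptions the two printed values should agree up to floating-point error, which reflects the chain rule for relative entropy. Because the parameter set is finite here, the mutual information is also bounded by the logarithm of the number of parameters; the paper's metric entropy bounds concern the much more general case.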