CHEBYSHEV POLYNOMIALS, MOMENT MATCHING, AND OPTIMAL ESTIMATION OF THE UNSEEN

成果类型:
Article
署名作者:
Wu, Yihong; Yang, Pengkun
署名单位:
Yale University; University of Illinois System; University of Illinois Urbana-Champaign
刊物名称:
ANNALS OF STATISTICS
ISSN/ISSBN:
0090-5364
DOI:
10.1214/17-AOS1665
发表日期:
2019
页码:
857-883
关键词:
nonparametric-estimation support size number sample coverage entropy
摘要:
We consider the problem of estimating the support size of a discrete distribution whose minimum nonzero mass is at least 1/k. Under the independent sampling model, we show that the sample complexity, that is, the minimal sample size to achieve an additive error of epsilon k with probability at least 0.1 is within universal constant factors of k/log k log(2) 1/epsilon, which improves the state-of-the-art result of k/epsilon(2) log k in [In Advances in Neural Information Processing Systems (2013) 2157-2165]. Similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties, which can be evaluated in O(n + log(2) k) time and attains the sample complexity within constant factors. The superiority of the proposed estimator in terms of accuracy, computational efficiency and scalability is demonstrated in a variety of synthetic and real datasets.