ESTIMATING THE NUMBER OF SPECIES - A REVIEW

Document type:
Review
Authors:
BUNGE, J; FITZPATRICK, M
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.2307/2290733
Publication date:
1993
Pages:
364-373
Keywords:
capture recapture experiment; population-size; nonparametric-estimation; parameter-estimation; probabilities vary; finite population; abundance; model; time; vocabulary
Abstract:
How many kinds are there? Suppose that a population is partitioned into C classes. In many situations interest focuses not on estimation of the relative sizes of the classes, but on estimation of C itself. For example, biologists and ecologists may be interested in estimating the number of species in a population of plants or animals, numismatists may be concerned with estimating the number of dies used to produce an ancient coin issue, and linguists may be interested in estimating the size of an author's vocabulary. In this article we review the problem of statistical estimation of C. Many approaches have been proposed, some purely data-analytic and others based in sampling theory. In the latter case numerous variations have been considered. The population may be finite or infinite. If finite, samples may be taken with replacement (multinomial sampling) or without replacement (hypergeometric sampling), or by Bernoulli sampling; if infinite, sampling may be multinomial or Bernoulli, or the sample may be the result of random Poisson contributions of each class. Given a sampling model, one may approach estimation of C via a parametric or nonparametric formulation; in either case there may be frequentist and Bayesian procedures. We begin by discussing the existing literature on this problem (over 120 references), organizing it by sampling model, population specification, and philosophy of estimation. 
We find that (a) the problem is quite resistant to statistical solution, essentially because no matter how many classes have been observed, there may still be a large number of very small unobserved classes; (b) many closely related estimation procedures have been developed independently and have not yet been compared; (c) there is not as yet a globally preferable estimator of C, although for some models there is an acceptable estimator (for some not even this is true); and (d) there are promising directions for research to pursue; for example, it appears possible to exploit estimates of the "coverage" of the sample (the total proportion of the population represented by the observed classes) to improve the accuracy of estimators of the number of classes. Finally, we make specific recommendations for future research, regarding parametric estimation, coverage-based estimation, resampling methods, Poisson process representation of sampling models, and frequentist decision theory.
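
Illustrative note (not part of the published abstract): the idea of using sample coverage to estimate C can be sketched in a few lines. The sketch below assumes the Good-Turing coverage estimate, 1 - f1/n, where f1 is the number of classes seen exactly once and n is the sample size, and then divides the observed class count by that coverage. Both choices are assumptions made for illustration; the abstract does not say which coverage estimator the review favors. The function name `coverage_estimates` is hypothetical, and the simple ratio is most natural when classes are roughly equal in size.

```python
from collections import Counter


def coverage_estimates(labels):
    """Return (observed_classes, estimated_coverage, estimated_C).

    Assumption: coverage is estimated by the Good-Turing formula
    1 - f1/n, which is one standard choice among several.
    """
    counts = Counter(labels)          # class label -> times observed
    n = sum(counts.values())          # total sample size
    if n == 0:
        raise ValueError("sample must contain at least one observation")

    d = len(counts)                   # distinct classes observed
    f1 = sum(1 for c in counts.values() if c == 1)  # classes seen once

    coverage = 1.0 - f1 / n           # Good-Turing coverage estimate
    if coverage == 0.0:
        # Every observation is a singleton: the ratio is undefined,
        # so report an unbounded estimate instead of dividing by zero.
        return d, coverage, float("inf")

    return d, coverage, d / coverage  # inflate observed count by coverage
```

For a sample with class counts 4, 3, 2, 1, 1 (n = 11, five observed classes, two singletons), the estimated coverage is 9/11 and the estimate of C is 55/9, or about 6.1. When every observation is a singleton the estimate is unbounded, which mirrors finding (a): many very small classes may remain unseen.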