Testing homogeneity in a mixture distribution via the L2 distance between competing models

Document type:
Article
Authors:
Charnigo, R; Sun, JY
Affiliations:
University of Kentucky; University of Kentucky; University System of Ohio; Case Western Reserve University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1198/016214504000000494
Publication date:
2004
Pages:
488-498
Keywords:
likelihood ratio test; components; parameter; number
Abstract:
Ascertaining the number of components in a mixture distribution is an interesting and challenging problem for statisticians. Chen, Chen, and Kalbfleisch recently proposed a modified likelihood ratio test (MLRT), which is distribution-free and locally most powerful, asymptotically. In this article we present a new method for testing whether a finite mixture distribution is homogeneous. Our method, the D test, is based on the L2 distance between a fitted homogeneous model and a fitted heterogeneous model. For mixture components from standard parametric families, the D-test statistic has a closed-form expression in terms of parameter estimators, whereas likelihood ratio-type test statistics do not; the latter test statistics are nontrivial functions of both the parameter estimators and the full dataset. The convergence rates of the D-test statistic under a null hypothesis of homogeneity and an alternative hypothesis of heterogeneity are established. The D test is shown to be competitive with the MLRT when the mixture components come from a normal location family. However, in the exponential scale and normal location/scale cases, the relative performances of the D test and the MLRT are mixed. In cases such as these two, we propose to use a weighted D test, in which the measure underlying the L2 distance is changed to accentuate the disparities between the homogeneous and heterogeneous models. Changing the measure is equivalent to computing the D-test statistic using a weighting function or to transforming the data before conducting the D test. Appropriately weighted D tests are competitive in both the exponential scale and normal location/scale cases. After applying the D test to a dataset in which the observations are measurements of firms' financial performances, we conclude with discussion and remarks.
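
The closed-form property mentioned in the abstract can be illustrated for normal components. The integral of the product of two normal densities is itself a normal density evaluated at the difference of the means, so the squared L2 distance between a fitted homogeneous normal model and a fitted normal mixture reduces to a finite sum over parameter estimates. The sketch below is an illustration only: the function names (`gauss_overlap`, `l2_distance_sq`) and the parameter values are hypothetical, and the quantity computed is the unscaled squared L2 distance, which is not necessarily the D-test statistic exactly as the article defines it.

```python
import math


def gauss_overlap(m1, v1, m2, v2):
    """Integral over x of N(x; m1, v1) * N(x; m2, v2).

    Standard identity: the result equals the N(0, v1 + v2) density
    evaluated at m1 - m2 (v1, v2 are variances).
    """
    var = v1 + v2
    diff = m1 - m2
    return math.exp(-diff * diff / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def l2_distance_sq(hom, het):
    """Squared L2 distance between a single normal and a normal mixture.

    hom: (mean, variance) of the homogeneous model (hypothetical input format).
    het: list of (weight, mean, variance) for the heterogeneous model.
    """
    # Write f_hom - f_het as one signed mixture, then expand the square
    # of the difference into pairwise overlap integrals.
    signed = [(1.0, hom[0], hom[1])] + [(-w, m, v) for w, m, v in het]
    return sum(
        wi * wj * gauss_overlap(mi, vi, mj, vj)
        for wi, mi, vi in signed
        for wj, mj, vj in signed
    )


# Hypothetical parameter estimates, chosen only for illustration.
homogeneous_fit = (0.0, 1.0)
heterogeneous_fit = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
distance_sq = l2_distance_sq(homogeneous_fit, heterogeneous_fit)
```

Note that only the parameter estimates enter the computation; the raw observations are not needed once the two models have been fitted, which is the contrast with likelihood ratio-type statistics drawn in the abstract.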