GENERALIZED PEARSON-FISHER CHI-SQUARE GOODNESS-OF-FIT TESTS, WITH APPLICATIONS TO MODELS WITH LIFE-HISTORY DATA

Type:

Article

Authors:

LI, G; DOSS, H

Affiliations:

Purdue University System; Purdue University; State University System of Florida; Florida State University

Journal:

ANNALS OF STATISTICS

ISSN/ISBN:

0090-5364

DOI:

10.1214/aos/1176349151

Publication date:

1993

Pages:

772-797

Keywords:

randomly censored-data; truncated data; large-sample

Abstract:

Suppose that $X_1,\ldots,X_n$ are i.i.d. $\sim F$, and we wish to test the null hypothesis that $F$ is a member of the parametric family $\mathcal{F} = \{F_\theta(x);\ \theta \in \Theta\}$, where $\Theta \subset \mathbb{R}^q$. The classical Pearson-Fisher chi-square test involves partitioning the real axis into $k$ cells $I_1,\ldots,I_k$ and forming the chi-square statistic $X^2 = \sum_{i=1}^{k} (O_i - nF_{\hat\theta}(I_i))^2 / (nF_{\hat\theta}(I_i))$, where $O_i$ is the number of observations falling into cell $i$ and $\hat\theta$ is the value of $\theta$ minimizing $\sum_{i=1}^{k} (O_i - nF_\theta(I_i))^2 / (nF_\theta(I_i))$. We obtain a generalization of this test to any situation for which there is available a nonparametric estimator $\hat F$ of $F$ for which $n^{1/2}(\hat F - F) \xrightarrow{d} W$, where $W$ is a continuous zero-mean Gaussian process satisfying a mild regularity condition. We allow the cells to be data dependent. Essentially, we estimate $\theta$ by the value $\hat\theta$ that minimizes a "distance" between the vectors $(\hat F(I_1),\ldots,\hat F(I_k))$ and $(F_\theta(I_1),\ldots,F_\theta(I_k))$, where distance is measured through an arbitrary positive definite quadratic form, and then form a chi-square type test statistic based on the difference between $(\hat F(I_1),\ldots,\hat F(I_k))$ and $(F_{\hat\theta}(I_1),\ldots,F_{\hat\theta}(I_k))$. We prove that this test statistic has asymptotically a chi-square distribution with $k-q-1$ degrees of freedom, and point out some errors in the literature on chi-square tests in survival analysis. Our procedure is very general and applies to a number of well-known models in survival analysis, such as right censoring and left truncation. We apply our method to deal with questions of model selection in the problem of estimating the distribution of the length of the incubation period of the AIDS virus using the CDC's data on blood-transfusion related AIDS. Our analysis suggests some models that seem to fit better than those used in the literature.
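
As a rough illustration of the classical (uncensored) Pearson-Fisher statistic described in the abstract, the following minimal Python sketch estimates $\theta$ by minimum chi-square and reports $k-q-1$ degrees of freedom. It is not the authors' generalized procedure: the exponential family, the cut points, the synthetic sample, the golden-section minimizer, and every function name are assumptions made only for this example. The search assumes the objective is unimodal on the bracket `[lo, hi]`.

```python
import bisect
import math


def exp_cell_probs(rate, edges):
    """Cell probabilities F_theta(I_i) under an assumed Exponential(rate) model.

    `edges` are interior cut points; the cells are
    [0, e1), [e1, e2), ..., [e_last, infinity).
    """
    cdf = [0.0] + [1.0 - math.exp(-rate * e) for e in edges] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]


def pearson_stat(counts, probs):
    """Sum over cells of (O_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def pearson_fisher_test(data, edges, lo=1e-3, hi=10.0, tol=1e-8):
    """Return (theta_hat, statistic, degrees_of_freedom) for the sketch model."""
    # Observed counts O_i: number of observations falling into each cell.
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect.bisect_right(edges, x)] += 1

    def objective(rate):
        return pearson_stat(counts, exp_cell_probs(rate, edges))

    # Golden-section search for the minimum chi-square estimate of the rate.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if objective(c) < objective(d):
            b = d
        else:
            a = c
    theta_hat = (a + b) / 2.0

    q = 1  # number of estimated parameters in the assumed family
    df = len(counts) - q - 1
    return theta_hat, objective(theta_hat), df


# Synthetic sample: evenly spaced quantiles of Exponential(1), not real data.
n = 1000
sample = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
edges = [0.5, 1.0, 2.0]  # hypothetical fixed cut points (k = 4 cells)

theta_hat, stat, df = pearson_fisher_test(sample, edges)
print(theta_hat, stat, df)
```

To obtain a p-value, the statistic would be compared with a chi-square distribution on `df` degrees of freedom. The generalized test in the abstract would instead replace the empirical proportions $O_i/n$ by a nonparametric estimate $\hat F(I_i)$ and use a general quadratic form; that step is not attempted here.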