Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy

Document type:
Article
Authors:
Wahba, G; Wang, YD; Gu, C; Klein, R; Klein, B
Affiliations:
Purdue University System; Purdue University; University of Michigan System; University of Michigan; University of Wisconsin System; University of Wisconsin Madison
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
Publication date:
1995
Pages:
1865-1895
Keywords:
generalized cross-validation; Bayesian confidence intervals; adaptive regression splines; noisy data; asymptotic optimality; ridge regression; 4-year incidence; models; progression; diagnosis
Abstract:
Let $y_i$, $i = 1, \ldots, n$, be independent observations with the density of $y_i$ of the form $h(y_i, f_i) = \exp[y_i f_i - b(f_i) + c(y_i)]$, where $b$ and $c$ are given functions and $b$ is twice continuously differentiable and bounded away from 0. Let $f_i = f(t(i))$, where $t = (t_1, \ldots, t_d) \in \mathcal{J}^{(1)} \times \cdots \times \mathcal{J}^{(d)} = \mathcal{J}$, the $\mathcal{J}^{(\alpha)}$ are measurable spaces of rather general form and $f$ is an unknown function on $\mathcal{J}$ with some assumed "smoothness" properties. Given $\{y_i, t(i), i = 1, \ldots, n\}$, it is desired to estimate $f(t)$ for $t$ in some region of interest contained in $\mathcal{J}$. We develop the fitting to these data of smoothing spline ANOVA models of the form $f(t) = C + \sum_{\alpha} f_{\alpha}(t_{\alpha}) + \sum_{\alpha < \beta} f_{\alpha\beta}(t_{\alpha}, t_{\beta}) + \cdots$. The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of $f$ is obtained as the minimizer, in an appropriate function space, of $L(y, f) + \sum_{\alpha} \lambda_{\alpha} J_{\alpha}(f_{\alpha}) + \sum_{\alpha < \beta} \lambda_{\alpha\beta} J_{\alpha\beta}(f_{\alpha\beta}) + \cdots$, where $L(y, f)$ is the negative log likelihood of $y = (y_1, \ldots, y_n)'$ given $f$, the $J_{\alpha}, J_{\alpha\beta}, \ldots$ are quadratic penalty functionals and the ANOVA decomposition is terminated in some manner. There are five major parts required to turn this program into a practical data analysis tool: (1) methods for deciding which terms in the ANOVA decomposition to include (model selection), (2) methods for choosing good values of the smoothing parameters $\lambda_{\alpha}, \lambda_{\alpha\beta}, \ldots$, (3) methods for making confidence statements concerning the estimate, (4) numerical algorithms for the calculations and, finally, (5) public software. In this paper we carry out this program, relying on earlier work and filling in important gaps. 
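The decomposition into a constant, main effects and interactions, with side conditions of the parametric-ANOVA kind, can be illustrated on a grid. The following is only a minimal sketch under stated assumptions: it takes d = 2, tabulates a made-up function on a small grid, and uses plain uniform averaging over grid points as the averaging operator. The function name anova_decompose and the test function are hypothetical and are not taken from the paper or its software.

```python
import numpy as np

def anova_decompose(F):
    """Split a table F[i, j] = f(t1_i, t2_j) into ANOVA components.

    Uniform averaging over the grid is assumed as the averaging operator.
    """
    C = F.mean()                              # constant term
    f1 = F.mean(axis=1) - C                   # main effect of t1
    f2 = F.mean(axis=0) - C                   # main effect of t2
    f12 = F - C - f1[:, None] - f2[None, :]   # two-factor interaction
    return C, f1, f2, f12

# Hypothetical test function tabulated on a 5 x 4 grid.
t1 = np.linspace(0.0, 1.0, 5)
t2 = np.linspace(0.0, 1.0, 4)
F = np.sin(2 * t1)[:, None] + t2[None, :] ** 2 + t1[:, None] * t2[None, :]

C, f1, f2, f12 = anova_decompose(F)

# The components add back up to the original table.
recon = C + f1[:, None] + f2[None, :] + f12
```

In this sketch each main effect averages to zero and the interaction averages to zero over either argument, which is the discrete analogue of the side conditions mentioned in the abstract.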
The overall scheme is applied to Bernoulli data from the Wisconsin Epidemiologic Study of Diabetic Retinopathy to model the risk of progression of diabetic retinopathy as a function of glycosylated hemoglobin, duration of diabetes and body mass index. It is believed that the results have wide practical application to the analysis of data from large epidemiologic studies.
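For Bernoulli data the exponential-family form has b(f) = log(1 + e^f) and c(y) = 0, so f is the logit of the event probability and the estimate minimizes a penalized negative log likelihood. The sketch below shows that idea in a deliberately simplified setting and is not the authors' algorithm or software: it assumes a single covariate on an equally spaced grid, replaces the quadratic penalty functional with a discrete sum of squared second differences, fixes the smoothing parameter lam by hand instead of choosing it from the data, and uses simulated data. All names are hypothetical.

```python
import numpy as np

def fit_penalized_bernoulli(y, lam, n_iter=50, tol=1e-8):
    """Minimize sum_i [b(f_i) - y_i f_i] + (lam / 2) f' P f by damped Newton.

    b(f) = log(1 + exp(f)); P = D'D with D the second-difference matrix
    (an assumed stand-in for a spline roughness penalty).
    """
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)       # (n - 2) x n second differences
    P = D.T @ D

    def objective(f):
        return np.sum(np.logaddexp(0.0, f) - y * f) + 0.5 * lam * f @ P @ f

    f = np.zeros(n)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-f))          # fitted probabilities
        grad = (p - y) + lam * P @ f
        if np.linalg.norm(grad) < tol:
            break
        hess = np.diag(p * (1.0 - p)) + lam * P
        step = np.linalg.solve(hess, grad)
        # Halve the step until the objective does not increase.
        s = 1.0
        while objective(f - s * step) > objective(f) and s > 1e-8:
            s *= 0.5
        f = f - s * step
    return f, P

# Simulated Bernoulli responses with a smooth true logit (illustrative only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
true_f = 2.0 * np.sin(2.0 * np.pi * t)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_f))).astype(float)

lam = 5.0
f_hat, P = fit_penalized_bernoulli(y, lam)
p_hat = 1.0 / (1.0 + np.exp(-f_hat))          # estimated risk at each t
```

Larger values of lam pull the fitted logit toward a straight line in t, since linear sequences have zero second differences; smaller values let it follow the data more closely.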