
Ensemble estimation and variable selection with semiparametric regression models

Document type:

Article

Authors:

Shin, Sunyoung; Liu, Yufeng; Cole, Stephen R.; Fine, Jason P.

Affiliations:

University of Texas System; University of Texas Dallas; University of North Carolina; University of North Carolina Chapel Hill

Journal:

BIOMETRIKA

ISSN:

0006-3444

DOI:

10.1093/biomet/asaa012

Publication date:

2020

Pages:

433-448

Keywords:

proportional hazards model; efficient estimation; transformation models; adaptive lasso; likelihood; sex; men; consistency; algorithm; partners

Abstract:

We consider scenarios in which the likelihood function for a semiparametric regression model factors into separate components, with an efficient estimator of the regression parameter available for each component. An optimal weighted combination of the component estimators, named an ensemble estimator, may be employed as an overall estimate of the regression parameter, and may be fully efficient under uncorrelatedness conditions. This approach is useful when the full likelihood function may be difficult to maximize, but the components are easy to maximize. It covers settings where the nuisance parameter may be estimated at different rates in the component likelihoods. As a motivating example we consider proportional hazards regression with prospective doubly censored data, in which the likelihood factors into a current status data likelihood and a left-truncated right-censored data likelihood. Variable selection is important in such regression modelling, but the applicability of existing techniques is unclear in the ensemble approach. We propose ensemble variable selection using the least squares approximation technique on the unpenalized ensemble estimator, followed by ensemble re-estimation under the selected model. The resulting estimator has the oracle property such that the set of nonzero parameters is successfully recovered and the semiparametric efficiency bound is achieved for this parameter set. Simulations show that the proposed method performs well relative to alternative approaches. Analysis of an AIDS cohort study illustrates the practical utility of the method.
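The two ideas in the abstract, combining component estimators with optimal weights and then selecting variables by least squares approximation, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the component estimators are uncorrelated, so the optimal weights reduce to inverse-variance weights; that each coefficient is combined separately; and that the covariance of the ensemble estimator is diagonal, so the adaptive-lasso-penalized least squares approximation has a closed-form soft-thresholding solution. The function names `ensemble_estimate` and `lsa_adaptive_lasso`, the tuning value `lam`, and all numbers are hypothetical.

```python
from math import copysign


def ensemble_estimate(estimates, variances):
    """Combine uncorrelated estimates of one scalar parameter.

    Uses inverse-variance weights, which minimize the variance of a
    weighted average of uncorrelated unbiased estimators (an assumption
    of this sketch, not a statement of the paper's general weights).
    Returns (combined estimate, its variance, weights).
    """
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    combined = sum(w * b for w, b in zip(weights, estimates))
    return combined, 1.0 / total, weights


def lsa_adaptive_lasso(beta_hat, variances, lam):
    """Least squares approximation with an adaptive lasso penalty.

    Minimizes sum_j (b_j - beta_hat_j)^2 / v_j + lam * sum_j |b_j| / |beta_hat_j|
    under the simplifying assumption of a diagonal covariance, in which
    case each coordinate is soft-thresholded independently.
    """
    selected = []
    for b, v in zip(beta_hat, variances):
        if b == 0.0:
            selected.append(0.0)
            continue
        threshold = lam * v / (2.0 * abs(b))
        selected.append(copysign(max(abs(b) - threshold, 0.0), b))
    return selected


# Hypothetical component estimates for two coefficients, e.g. one set from
# each likelihood component, with their estimated variances.
component_1 = {"beta": [1.0, 0.05], "var": [0.04, 0.04]}
component_2 = {"beta": [1.2, -0.02], "var": [0.01, 0.01]}

ensemble_beta, ensemble_var = [], []
for j in range(2):
    est, var, _ = ensemble_estimate(
        [component_1["beta"][j], component_2["beta"][j]],
        [component_1["var"][j], component_2["var"][j]],
    )
    ensemble_beta.append(est)
    ensemble_var.append(var)

# Variable selection on the unpenalized ensemble estimator.
penalized = lsa_adaptive_lasso(ensemble_beta, ensemble_var, lam=0.1)
active_set = [j for j, b in enumerate(penalized) if b != 0.0]
```

In this toy setting the second coefficient is shrunk exactly to zero, leaving only the first in `active_set`. The re-estimation step described in the abstract would then refit each component under the selected model and combine the refitted estimates again with `ensemble_estimate`.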