High-Dimensional Variable Selection for Survival Data
Publication type:
Article
Authors:
Ishwaran, Hemant; Kogalur, Udaya B.; Gorodeski, Eiran Z.; Minn, Andy J.; Lauer, Michael S.
Affiliations:
Cleveland Clinic Foundation; Cleveland Clinic Foundation; University of Pennsylvania; National Institutes of Health (NIH) - USA; NIH National Heart Lung & Blood Institute (NHLBI)
Journal:
Journal of the American Statistical Association
ISSN/ISBN:
0162-1459
DOI:
10.1198/jasa.2009.tm08622
Publication date:
2010
Pages:
205-217
Keywords:
gene-expression profiles
predict survival
classification
regression
chemotherapy
signature
cancer
model
Abstract:
The minimal depth of a maximal subtree is a dimensionless order statistic measuring the predictiveness of a variable in a survival tree. We derive the distribution of the minimal depth and use it for high-dimensional variable selection using random survival forests. In big p and small n problems (where p is the dimension and n is the sample size), the distribution of the minimal depth reveals a ceiling effect in which a tree simply cannot be grown deep enough to properly identify predictive variables. Motivated by this limitation, we develop a new regularized algorithm, termed RSF-Variable Hunting. This algorithm exploits maximal subtrees for effective variable selection under such scenarios. Several applications are presented demonstrating the methodology, including the problem of gene selection using microarray data. In this work we focus only on survival settings, although our methodology also applies to other random forests applications, including regression and classification settings. All examples presented here use the R-software package randomSurvivalForest.
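Illustrative note (not part of the published abstract): the minimal depth of a variable in a single tree can be read as the depth of the shallowest node that splits on that variable, with the root at depth 0. The sketch below computes this for a toy tree. It is written in Python rather than with the randomSurvivalForest package, and the tuple-based tree encoding, the variable names, and the function `minimal_depths` are all hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical toy tree: an internal node is (split_variable, left, right);
# a terminal node is None. Variable names are made up for illustration.
toy_tree = (
    "age",
    ("gene_7", None, ("age", None, None)),
    ("bp", ("gene_7", None, None), None),
)


def minimal_depths(tree):
    """Map each variable to the depth of the shallowest node splitting on it."""
    depths = {}
    queue = deque([(tree, 0)])  # breadth-first, so shallower nodes come first
    while queue:
        node, depth = queue.popleft()
        if node is None:  # terminal node: nothing to record
            continue
        variable, left, right = node
        depths.setdefault(variable, depth)  # keep only the first (smallest) depth
        queue.append((left, depth + 1))
        queue.append((right, depth + 1))
    return depths


print(minimal_depths(toy_tree))
```

A smaller minimal depth means the variable splits closer to the root. In a forest, one would presumably average this quantity over trees; that step, and the thresholding based on the derived distribution, are not shown here.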