High dimensional linear discriminant analysis: optimality, adaptive algorithm and missing data

成果类型:
Article
署名作者:
Cai, T. Tony; Zhang, Linjun
署名单位:
University of Pennsylvania
刊物名称:
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN/ISSBN:
1369-7412
DOI:
10.1111/rssb.12326
发表日期:
2019
页码:
675-705
关键词:
covariance-matrix estimation CLASSIFICATION regression cancer
摘要:
The paper develops optimality theory for linear discriminant analysis in the high dimensional setting. A data-driven and tuning-free classification rule, which is based on an adaptive constrained l(1)-minimization approach, is proposed and analysed. Minimax lower bounds are obtained and this classification rule is shown to be simultaneously rate optimal over a collection of parameter spaces. In addition, we consider classification with incomplete data under the missingness completely at random model. An adaptive classifier with theoretical guarantees is introduced and the optimal rate of convergence for high dimensional linear discriminant analysis under the missingness completely at random model is established. The technical analysis for the case of missing data is much more challenging than that for complete data. We establish a large deviation result for the generalized sample covariance matrix, which serves as a key technical tool and can be of independent interest. An application to lung cancer and leukaemia studies is also discussed.
来源URL: