Penalized classification using Fisher's linear discriminant

Publication type:
Article
Authors:
Witten, Daniela M.; Tibshirani, Robert
Affiliations:
University of Washington; University of Washington Seattle; Stanford University
Journal:
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN/ISBN:
1369-7412
DOI:
10.1111/j.1467-9868.2011.00783.x
Publication date:
2011
Pages:
753-772
Keywords:
multiclass cancer-diagnosis; shrunken centroids; variable selection; optimization; prediction; regression
Abstract:
We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high dimensional setting where p ≫ n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule that is obtained from LDA, since it involves all p features. We propose penalized LDA, which is a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization-maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher's discriminant problem as a biconvex problem. We evaluate the performances of the resulting methods on a simulation study, and on three gene expression data sets. We also survey past methods for extending LDA to the high dimensional setting and explore their relationships with our proposal.
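Illustrative sketch (not part of the original record): the minorization-maximization idea in the abstract can be made concrete for a single L1-penalized discriminant vector. The code below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a diagonal within-class covariance estimate, an unweighted L1 penalty, and an initial vector taken from the leading eigenvector of the between-class covariance. The names `soft_threshold`, `penalized_lda_l1`, and `lam` are hypothetical. Each step replaces the convex quadratic term by its linear minorizer at the current iterate, which leaves a soft-thresholding update followed by rescaling onto the constraint.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def penalized_lda_l1(X, y, lam, n_iter=100, tol=1e-8):
    """Sketch: first L1-penalized discriminant vector via MM iterations.

    Assumed problem (an illustration, not taken from the record):
        maximize  b' S_b b - lam * sum_j |b_j|
        subject to  sum_j var_w[j] * b_j**2 <= 1,
    where S_b is the between-class covariance and var_w holds the
    per-feature within-class variances (diagonal covariance estimate).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    classes = np.unique(y)

    # Center the features, then compute class means (K x p) and class weights.
    Xc = X - X.mean(axis=0)
    means = np.stack([Xc[y == k].mean(axis=0) for k in classes])
    weights = np.array([np.mean(y == k) for k in classes])

    # Diagonal within-class variance estimate, floored to stay positive.
    resid = Xc - means[np.searchsorted(classes, y)]
    var_w = np.maximum((resid ** 2).sum(axis=0) / n, 1e-12)

    def sb_dot(b):
        # S_b @ b without forming the p x p matrix.
        return means.T @ (weights * (means @ b))

    # Assumed initialization: leading eigenvector of S_b, rescaled to
    # satisfy the constraint with equality.
    _, _, vt = np.linalg.svd(np.sqrt(weights)[:, None] * means,
                             full_matrices=False)
    beta = vt[0] / np.sqrt(np.sum(var_w * vt[0] ** 2))

    for _ in range(n_iter):
        # Linear minorizer of b' S_b b at the current beta has gradient
        # 2 S_b beta; maximizing it under the penalty gives this update.
        d = soft_threshold(2.0 * sb_dot(beta), lam) / var_w
        norm = np.sqrt(np.sum(var_w * d ** 2))
        if norm == 0.0:
            return np.zeros(p)  # penalty large enough to zero every feature
        new_beta = d / norm
        converged = np.linalg.norm(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

Features whose entry in the returned vector is exactly zero drop out of the rule; new observations could then be classified by projecting onto the vector and assigning the nearest projected class mean. Larger values of `lam` zero out more features.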
Source URL: