Median-Based Classifiers for High-Dimensional Data

Type:

Article

Authors:

Hall, Peter; Titterington, D. M.; Xue, Jing-Hao

Affiliations:

University of Melbourne; University of Glasgow; University of London; University College London

Journal:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION

ISSN:

0162-1459

DOI:

10.1198/jasa.2009.tm08107

Publication date:

2009

Pages:

1597-1608

Keywords:

microarray data; data depth; shrunken centroids; class prediction; classification; cancer; inference; variance

Abstract:

Conventional distance-based classifiers use standard Euclidean distance, and so can suffer from excessive volatility if vector components have heavy-tailed distributions. This difficulty can be alleviated by replacing the L2 distance by its L1 counterpart. For example, the L1 version of the popular centroid classifier would allocate a new data value to the population to whose centroid it was closest in L1 terms. However, this approach can lead to inconsistency, because the centroid is defined using L2, rather than L1, distance. In particular, by mixing L1 and L2 approaches, we produce a classifier that can seriously misidentify data in cases where the means and medians of marginal distributions take different values. These difficulties motivate replacing centroids by medians. However, in the very-high-dimensional settings commonly encountered today, this can be problematic if we attempt to work with a conventional spatial median. Therefore, we suggest using componentwise medians to construct a robust classifier that is relatively insensitive to the difficulties caused by heavy-tailed data and entails straightforward computation. We also consider generalizations and extensions of this approach based on, for example, using data truncation to achieve additional robustness. Using both empirical and theoretical arguments, we explore the properties of these methods, and show that the resulting classifiers can be particularly effective. Supplementary materials are available online.
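
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the simplest reading of the text: each class is summarized by the median of every coordinate taken separately, and a new point goes to the class whose summary is nearest in L1 distance. The function names (`componentwise_median`, `fit`, `predict`) and the toy data are hypothetical. A mean-based variant is included only to show the contrast the abstract describes.

```python
from statistics import mean, median


def componentwise_median(rows):
    """Median of each coordinate, taken separately across the rows."""
    return [median(col) for col in zip(*rows)]


def componentwise_mean(rows):
    """Ordinary centroid: mean of each coordinate across the rows."""
    return [mean(col) for col in zip(*rows)]


def l1_distance(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def fit(train_by_class, center=componentwise_median):
    """Map each class label to the chosen center of its training rows."""
    return {label: center(rows) for label, rows in train_by_class.items()}


def predict(centers, x):
    """Assign x to the class whose center is nearest in L1 distance."""
    return min(centers, key=lambda label: l1_distance(x, centers[label]))


# Made-up toy data: class "a" contains one extreme observation,
# standing in for a heavy-tailed component.
train = {
    "a": [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [50.0, 40.0]],
    "b": [[2.0, 2.1], [1.9, 2.0], [2.2, 1.8], [2.1, 2.2]],
}
new_point = [0.3, 0.2]

median_label = predict(fit(train), new_point)
centroid_label = predict(fit(train, center=componentwise_mean), new_point)
print(median_label, centroid_label)
```

On this toy data the single extreme observation pulls the mean of class "a" far from the bulk of its points, so the L1 centroid rule and the componentwise-median rule disagree about `new_point`. The truncation-based extensions mentioned in the abstract are not sketched here, because the record gives no details about them.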
