-
Authors: Yang, Yun; Wainwright, Martin J.; Jordan, Michael I.
Affiliations: University of California System; University of California Berkeley; University of California System; University of California Berkeley
Abstract: We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix. We then demonstrate that the statistical criterion of posterior concentration need not imply the computational desideratum of rapid mixing of the MCMC algorithm. By introducing a truncated sparsity prior ...
-
Authors: Jin, Jiashun; Wang, Wanjie
Affiliations: Carnegie Mellon University; National University of Singapore
Abstract: We consider a clustering problem where we observe feature vectors X_i ∈ R^p, i = 1, 2, ..., n, from K possible classes. The class labels are unknown and the main interest is to estimate them. We are primarily interested in the modern regime of p >> n, where classical clustering methods face challenges. We propose Influential Features PCA (IF-PCA) as a new clustering procedure. In IF-PCA, we select a small fraction of features with the largest Kolmogorov-Smirnov (KS) scores, obtai...
-
Authors: Stepanova, Natalia A.; Tsybakov, Alexandre B.
Affiliations: Carleton University; Institut Polytechnique de Paris; ENSAE Paris
-
Authors: Devroye, Luc; Lerasle, Matthieu; Lugosi, Gabor; Oliveira, Roberto I.
Affiliations: McGill University; Universite Cote d'Azur; Centre National de la Recherche Scientifique (CNRS); Pompeu Fabra University
Abstract: We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a nonasymptotic point of view. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.
-
Authors: Fromont, Magalie; Lerasle, Matthieu; Reynaud-Bouret, Patricia
Affiliations: Universite de Rennes; Centre National de la Recherche Scientifique (CNRS); CNRS - National Institute for Mathematical Sciences (INSMI); Universite Cote d'Azur; Centre National de la Recherche Scientifique (CNRS); CNRS - National Institute for Mathematical Sciences (INSMI); Universite de Rennes; Universite Rennes 2
Abstract: Starting from a parallel between some minimax adaptive tests of a single null hypothesis, based on aggregation approaches, and some tests of multiple hypotheses, we propose a new second kind error-related evaluation criterion, as the core of an emergent minimax theory for multiple tests. Aggregation-based tests, proposed for instance by Baraud [Bernoulli 8 (2002) 577-606], Baraud, Huet and Laurent [Ann. Statist. 31 (2003) 225-251] or Fromont and Laurent [Ann. Statist. 34 (2006) 680-720], are j...
-
Authors: Yu, Zhou; Dong, Yuexiao; Shao, Jun
Affiliations: East China Normal University; Pennsylvania Commonwealth System of Higher Education (PCSHE); Temple University; University of Wisconsin System; University of Wisconsin Madison
Abstract: Model-free variable selection has been implemented under the sufficient dimension reduction framework since the seminal paper of Cook [Ann. Statist. 32 (2004) 1062-1092]. In this paper, we extend the marginal coordinate test for sliced inverse regression (SIR) in Cook (2004) and propose a novel marginal SIR utility for the purpose of ultrahigh dimensional feature selection. Two distinct procedures, Dantzig selector and sparse precision matrix estimation, are incorporated to get two versions of...
-
Authors: Nadler, Boaz
Affiliations: Weizmann Institute of Science
-
Authors: Yuan, Ming; Zhou, Ding-Xuan
Affiliations: University of Wisconsin System; University of Wisconsin Madison; City University of Hong Kong
Abstract: We establish minimax optimal rates of convergence for estimation in a high dimensional additive model assuming that it is approximately sparse. Our results reveal a behavior universal to this class of high dimensional problems. In the sparse regime when the components are sufficiently smooth or the dimensionality is sufficiently large, the optimal rates are identical to those for high dimensional linear regression and, therefore, there is no additional cost to entertain a nonparametric model. ...