-
Authors: Blanchard, Gilles; Neuvial, Pierre; Roquain, Etienne
Affiliations: University of Potsdam; Centre National de la Recherche Scientifique (CNRS); CNRS - National Institute for Mathematical Sciences (INSMI); Universite Federale Toulouse Midi-Pyrenees (ComUE); Universite de Toulouse; Institut National des Sciences Appliquees de Toulouse; Universite Toulouse III - Paul Sabatier; Sorbonne Universite; Universite Paris Cite
Abstract: We follow a post hoc, user-agnostic approach to false discovery control in a large-scale multiple testing framework, as introduced by Genovese and Wasserman [J. Amer. Statist. Assoc. 101 (2006) 1408-1417], Goeman and Solari [Statist. Sci. 26 (2011) 584-597]: the statistical guarantee on the number of correct rejections must hold for any set of candidate items, possibly selected by the user after having seen the data. To this end, we introduce a novel point of view based on a family of referenc...
-
Authors: Chen, Xi; Zhou, Wen-Xin
Affiliations: New York University; University of California System; University of California San Diego
Abstract: This paper investigates the theoretical underpinnings of two fundamental statistical inference problems, the construction of confidence sets and large-scale simultaneous hypothesis testing, in the presence of heavy-tailed data. With heavy-tailed observation noise, finite sample properties of the least squares-based methods, typified by the sample mean, are suboptimal both theoretically and empirically. In this paper, we demonstrate that the adaptive Huber regression, integrated with the multip...
-
Authors: Abbe, Emmanuel; Fan, Jianqing; Wang, Kaizheng; Zhong, Yiqiao
Affiliations: Princeton University
Abstract: Recovering low-rank structures via eigenvector perturbation analysis is a common problem in statistical machine learning, such as in factor analysis, community detection, ranking, matrix completion, among others. While a large variety of bounds are available for average errors between empirical and population statistics of eigenvectors, few results are tight for entrywise analyses, which are critical for a number of problems such as community detection. This paper investigates entrywise behavi...
-
Authors: Cannings, Timothy I.; Berrett, Thomas B.; Samworth, Richard J.
Affiliations: University of Edinburgh; University of Cambridge
Abstract: We derive a new asymptotic expansion for the global excess risk of a local-k-nearest neighbour classifier, where the choice of k may depend upon the test point. This expansion elucidates conditions under which the dominant contribution to the excess risk comes from the decision boundary of the optimal Bayes classifier, but we also show that if these conditions are not satisfied, then the dominant contribution may arise from the tails of the marginal distribution of the features. Moreover, we p...
-
Authors: Xue, Kaijie; Yao, Fang
Affiliations: Nankai University; Peking University
Abstract: We propose a two-sample test for high-dimensional means that requires neither distributional nor correlational assumptions, besides some weak conditions on the moments and tail properties of the elements in the random vectors. This two-sample test based on a nontrivial extension of the one-sample central limit theorem (Ann. Probab. 45 (2017) 2309-2352) provides a practically useful procedure with rigorous theoretical guarantees on its size and power assessment. In particular, the proposed test...
-
Authors: Cai, T. Tony; Wu, Yihong
Affiliations: University of Pennsylvania; Yale University
Abstract: This paper investigates the fundamental limits for detecting a high-dimensional sparse matrix contaminated by white Gaussian noise from both the statistical and computational perspectives. We consider p × p matrices whose rows and columns are individually k-sparse. We provide a tight characterization of the statistical and computational limits for sparse matrix detection, which precisely describe when achieving optimal detection is easy, hard or impossible, respectively. Although the sparse ma...
-
Authors: Li, Haoran; Aue, Alexander; Paul, Debashis; Peng, Jie; Wang, Pei
Affiliations: University of California System; University of California Davis; Icahn School of Medicine at Mount Sinai
Abstract: We propose a two-sample test for detecting the difference between mean vectors in a high-dimensional regime based on a ridge-regularized Hotelling's T^2. To choose the regularization parameter, a method is derived that aims at maximizing power within a class of local alternatives. We also propose a composite test that combines the optimal tests corresponding to a specific collection of local alternatives. Weak convergence of the stochastic process corresponding to the ridge-regularized Hotelli...
-
Authors: Zhao, Qingyuan; Wang, Jingshu; Hemani, Gibran; Bowden, Jack; Small, Dylan S.
Affiliations: University of Cambridge; University of Chicago; University of Bristol; University of Exeter; University of Pennsylvania
Abstract: Mendelian randomization (MR) is a method of exploiting genetic variation to unbiasedly estimate a causal effect in the presence of unmeasured confounding. MR is being widely used in epidemiology and other related areas of population science. In this paper, we study statistical inference in the increasingly popular two-sample summary-data MR design. We show a linear model for the observed associations approximately holds in a wide variety of settings when all the genetic variants satisfy the exclus...
-
Authors: Cai, T. Tony; Han, Xiao; Pan, Guangming
Affiliations: University of Pennsylvania; Chinese Academy of Sciences; University of Science & Technology of China, CAS; Nanyang Technological University
Abstract: We study the asymptotic distributions of the spiked eigenvalues and the largest nonspiked eigenvalue of the sample covariance matrix under a general covariance model with divergent spiked eigenvalues, while the other eigenvalues are bounded but otherwise arbitrary. The limiting normal distribution for the spiked sample eigenvalues is established. It has distinct features that the asymptotic mean relies on not only the population spikes but also the nonspikes and that the asymptotic variance in...
-
Authors: Kim, Kyongwon; Li, Bing; Yu, Zhou; Li, Lexin
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; East China Normal University; University of California System; University of California Berkeley
Abstract: The methodologies of sufficient dimension reduction have undergone extensive developments in the past three decades. However, there has been a lack of systematic and rigorous development of post dimension reduction inference, which has seriously hindered its applications. The current common practice is to treat the estimated sufficient predictors as the true predictors and use them as the starting point of the downstream statistical inference. However, this naive inference approach would gross...