-
Authors: Barber, Rina Foygel; Candes, Emmanuel J.
Affiliations: University of Chicago; Stanford University
Abstract: This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups: the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of th...
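A minimal sketch of the generic split-then-infer idea described in this abstract. It assumes marginal-correlation screening on the first half and ordinary least squares on the second half. The screening rule, the function name `split_screen_infer` and the parameter `n_keep` are hypothetical illustrations, not the procedure developed in the paper.

```python
import numpy as np

def split_screen_infer(X, y, n_keep, seed=0):
    """Hypothetical sketch: screen features on one half, fit OLS on the other half."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    first, second = perm[: n // 2], perm[n // 2 :]

    # Screening step (first half only): rank features by absolute marginal correlation.
    Xs, ys = X[first], y[first]
    Xc = Xs - Xs.mean(axis=0)
    yc = ys - ys.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    selected = np.sort(np.argsort(scores)[::-1][:n_keep])

    # Inference step (second half only): OLS restricted to the screened features,
    # so the fitted coefficients do not reuse the data that drove the selection.
    beta, *_ = np.linalg.lstsq(X[second][:, selected], y[second], rcond=None)
    return selected, beta
```

Because the two halves are disjoint, the second-half fit can be analyzed as if the selected set had been fixed in advance; that separation is the point of the split.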
-
Authors: Han, Qiyang; Wang, Tengyao; Chatterjee, Sabyasachi; Samworth, Richard J.
Affiliations: University of Washington; University of Washington Seattle; University of Cambridge; University of Chicago; University of Illinois System; University of Illinois Urbana-Champaign
Abstract: We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^d$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-\min\{2/(d+2),\,1/d\}}$ in the empirical $L_2$ loss, up to polylogarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise consta...
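As a quick worked check of the rate $n^{-\min\{2/(d+2),\,1/d\}}$, the sketch below computes the exponent exactly for a given dimension $d$, ignoring polylogarithmic factors. The function name is a hypothetical helper introduced only for illustration.

```python
from fractions import Fraction

def monotone_ls_rate_exponent(d: int) -> Fraction:
    """Exponent r in the rate n^{-r}, with r = min{2/(d+2), 1/d} (polylog factors ignored)."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    # Exact rational arithmetic avoids floating-point ties at d = 2.
    return min(Fraction(2, d + 2), Fraction(1, d))
```

For $d = 1$ this gives $2/3$, the familiar $n^{-2/3}$ rate of univariate isotonic regression; for every $d \ge 2$ the $1/d$ term is the smaller one, since $1/d \le 2/(d+2)$ exactly when $d \ge 2$ (the two coincide at $d = 2$).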
-
Authors: Song, Yanglei; Fellouris, Georgios
Affiliations: University of Illinois System; University of Illinois Urbana-Champaign
Abstract: The sequential multiple testing problem is considered under two generalized error metrics. Under the first, the probability of at least $k$ mistakes, of any kind, is controlled. Under the second, the probabilities of at least $k_1$ false positives and at least $k_2$ false negatives are simultaneously controlled. For each formulation, the optimal expected sample size is characterized, to a first-order asymptotic approximation as the error probabilities go to 0, and a novel multiple testing proc...
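To make the two error metrics concrete, the sketch below estimates them by Monte Carlo from simulated decisions. It only evaluates the metrics; it does not implement any sequential testing procedure. The function name and the array layout (one row of decisions per replication) are hypothetical choices for illustration.

```python
import numpy as np

def generalized_error_rates(decisions, truth, k, k1, k2):
    """Monte Carlo estimates of the two generalized error metrics (hypothetical helper).

    decisions: (reps, m) boolean array, True = signal declared for that hypothesis.
    truth:     (m,) boolean array, True = signal actually present.
    Returns (P[>= k mistakes], P[>= k1 false positives], P[>= k2 false negatives]).
    """
    decisions = np.asarray(decisions, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    false_pos = (decisions & ~truth).sum(axis=1)  # nulls wrongly declared signals
    false_neg = (~decisions & truth).sum(axis=1)  # signals that were missed
    p_any = np.mean(false_pos + false_neg >= k)   # first metric: mistakes of any kind
    p_fp = np.mean(false_pos >= k1)               # second metric, false-positive part
    p_fn = np.mean(false_neg >= k2)               # second metric, false-negative part
    return p_any, p_fp, p_fn
```

With $k = k_1 = k_2 = 1$ these reduce to the usual probabilities of at least one error, at least one false positive and at least one false negative.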
-
Authors: Wu, Yihong; Yang, Pengkun
Affiliations: Yale University; University of Illinois System; University of Illinois Urbana-Champaign
Abstract: We consider the problem of estimating the support size of a discrete distribution whose minimum nonzero mass is at least $1/k$. Under the independent sampling model, we show that the sample complexity, that is, the minimal sample size to achieve an additive error of $\epsilon k$ with probability at least 0.1, is within universal constant factors of $\frac{k}{\log k}\log^2\frac{1}{\epsilon}$, which improves the state-of-the-art result of $\frac{k}{\epsilon^2 \log k}$ in [In Advances in Neural Information Processing Systems (2013...
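A small arithmetic sketch comparing the two sample-complexity orders, $\frac{k}{\log k}\log^2\frac{1}{\epsilon}$ and $\frac{k}{\epsilon^2\log k}$. Universal constants are ignored, so only ratios are meaningful; the function name is hypothetical.

```python
import math

def support_size_sample_orders(k: int, eps: float) -> tuple[float, float]:
    """Return (new_order, old_order) with universal constants dropped (hypothetical helper)."""
    if k < 2 or not 0.0 < eps < 1.0:
        raise ValueError("need k >= 2 and 0 < eps < 1")
    new_order = k / math.log(k) * math.log(1.0 / eps) ** 2  # (k / log k) * log^2(1/eps)
    old_order = k / (eps ** 2 * math.log(k))                # k / (eps^2 * log k)
    return new_order, old_order
```

The ratio of the two orders is $\epsilon^2\log^2(1/\epsilon)$, which does not depend on $k$ and is at most $1/e^2 < 1$ on $(0,1)$ (the maximum of $\epsilon\log(1/\epsilon)$ is $1/e$, at $\epsilon = 1/e$), so the improvement holds for every accuracy level and grows as $\epsilon \to 0$.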
-
Authors: Bao, Zhigang
Affiliations: Hong Kong University of Science & Technology
Abstract: In this paper, we study a high-dimensional random matrix model from nonparametric statistics called the Kendall rank correlation matrix, which is a natural multivariate extension of the Kendall rank correlation coefficient. We establish the Tracy-Widom law for its largest eigenvalue. It is the first Tracy-Widom law for a nonparametric random matrix model, and also the first Tracy-Widom law for a high-dimensional U-statistic.
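A minimal sketch of the Kendall rank correlation matrix and its largest eigenvalue, assuming the standard pairwise-sign definition $K = \binom{n}{2}^{-1}\sum_{i<j}\operatorname{sign}(x_i - x_j)\operatorname{sign}(x_i - x_j)^\top$ for $n$ observations $x_i \in \mathbb{R}^p$. The function names are hypothetical, and the all-pairs construction uses $O(n^2 p)$ memory, so it suits small $n$ only.

```python
import numpy as np

def kendall_rank_correlation_matrix(X):
    """Kendall rank correlation matrix of an (n, p) data matrix (pairwise-sign U-statistic)."""
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    i, j = np.triu_indices(n, k=1)   # every pair of observations with i < j
    S = np.sign(X[i] - X[j])         # (n choose 2, p) matrix of sign vectors
    return S.T @ S / len(i)          # average of the outer products s s^T

def largest_eigenvalue(K):
    """Largest eigenvalue of a symmetric matrix (eigvalsh returns ascending order)."""
    return np.linalg.eigvalsh(K)[-1]
```

Each off-diagonal entry is the ordinary Kendall $\tau$ between two coordinates, and with continuous data (no ties) the diagonal is identically 1, so the trace equals $p$ and the largest eigenvalue is at least 1.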
-
Authors: Wei, Yuting; Wainwright, Martin J.; Guntuboyina, Adityanand
Affiliations: University of California System; University of California Berkeley
Abstract: We consider a compound testing problem within the Gaussian sequence model in which the null and alternative are specified by a pair of closed, convex cones. Such cone testing problems arise in various applications, including detection of treatment effects, trend detection in econometrics, signal detection in radar processing and shape-constrained inference in nonparametric statistics. We provide a sharp characterization of the GLRT testing radius up to a universal multiplicative constant in te...
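A sketch of the GLRT statistic for nested cones $C_0 \subseteq C_1$ in the Gaussian sequence model $y = \theta + z$, assuming unit noise variance: $T(y) = \|y - \Pi_{C_0}y\|^2 - \|y - \Pi_{C_1}y\|^2$, where $\Pi_C$ is Euclidean projection onto $C$. The pair $C_0 = \{0\}$ versus the nonnegative orthant is chosen here only as an illustrative example (for instance, nonnegative treatment effects); the helper names are hypothetical, and the sketch computes the statistic only, not a critical value or testing radius.

```python
import numpy as np

def glrt_statistic(y, proj_null, proj_alt):
    """GLRT statistic ||y - P0 y||^2 - ||y - P1 y||^2 for nested cones, unit noise variance."""
    y = np.asarray(y, dtype=float)
    return np.sum((y - proj_null(y)) ** 2) - np.sum((y - proj_alt(y)) ** 2)

def proj_zero(y):
    """Projection onto the trivial cone {0}."""
    return np.zeros_like(y)

def proj_orthant(y):
    """Projection onto the nonnegative orthant: clip negative coordinates to zero."""
    return np.maximum(y, 0.0)
```

For this pair the statistic simplifies to $\|y_+\|^2$, the squared norm of the positive part of $y$, because $\|y\|^2 = \|y_+\|^2 + \|y_-\|^2$ and the orthant residual is exactly the negative part.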
-
Authors: Boettcher, Bjoern; Keller-Ressel, Martin; Schilling, Rene L.
Affiliations: Technische Universitat Dresden
Abstract: We introduce two new measures for the dependence of $n \ge 2$ random variables: distance multivariance and total distance multivariance. Both measures are based on the weighted $L^2$-distance of quantities related to the characteristic functions of the underlying random variables. These extend distance covariance (introduced by Szekely, Rizzo and Bakirov) from pairs of random variables to $n$-tuples of random variables. We show that total distance multivariance can be used to detect the independence...
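A sample-level sketch, assuming Euclidean distances and the matrix form $A_i = -C B_i C$, where $B_i$ is the pairwise distance matrix of the $i$-th sample and $C = I - \frac{1}{N}\mathbf{1}\mathbf{1}^\top$. Multivariance is then taken as $\frac{1}{N^2}\sum_{j,k}\prod_i (A_i)_{jk}$. This is our reading of the construction; the paper allows more general distances and its normalizations may differ, and the function names are hypothetical. For two variables the value coincides with the squared sample distance covariance (V-statistic form) of Székely, Rizzo and Bakirov.

```python
import numpy as np

def centered_distance_matrix(x):
    """A = -C B C with B_jk = |x_j - x_k| (Euclidean) and C = I - 11^T / N."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                # treat a 1-D sample as N points in R^1
    B = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    N = B.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N
    return -C @ B @ C

def distance_multivariance(*samples):
    """(1/N^2) * sum_jk prod_i (A_i)_jk over all given samples."""
    A = [centered_distance_matrix(s) for s in samples]
    N = A[0].shape[0]
    prod = np.ones((N, N))
    for Ai in A:
        prod *= Ai                    # entrywise (Hadamard) product
    return prod.sum() / N**2

def total_distance_multivariance(*samples):
    """(1/N^2) * sum_jk [prod_i (1 + (A_i)_jk) - 1]."""
    A = [centered_distance_matrix(s) for s in samples]
    N = A[0].shape[0]
    prod = np.ones((N, N))
    for Ai in A:
        prod *= 1.0 + Ai
    return (prod - 1.0).sum() / N**2
```

Expanding $\prod_i(1 + A_i) - 1$ gives one term per nonempty subset of variables, and the single-variable terms vanish because every doubly centered matrix has zero row sums. The total measure is therefore the sum of the multivariances of all subsets of size at least two.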
-
Authors: Tong, Xingwei; Gao, Fuqing; Chen, Kani; Cai, Dingjiao; Sun, Jianguo
Affiliations: Beijing Normal University; Wuhan University; Hong Kong University of Science & Technology; Henan University of Economics & Law; University of Missouri System; University of Missouri Columbia
Abstract: This paper discusses transformed linear regression with non-normal error distributions, a problem that often arises in economics, the social sciences and medical studies. The linear transformation model is an important tool in survival analysis, partly because of its flexibility. In particular, it includes the Cox model and the proportional odds model as special cases when the error follows the extreme value distribution and the logistic distribution, respectively. Desp...
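To see why extreme value and logistic errors yield the Cox and proportional odds models, here is a short derivation under one common parametrization of the linear transformation model, $H(T) = -\beta^\top Z + \varepsilon$, with $H$ an unknown increasing function and $\varepsilon$ independent of $Z$ with distribution function $F$. The sign convention is our assumption; the paper may parametrize the model differently.

```latex
\begin{align*}
S(t \mid Z) &= P\{T > t \mid Z\} = P\{\varepsilon > H(t) + \beta^\top Z\}
             = 1 - F\bigl(H(t) + \beta^\top Z\bigr).\\[4pt]
\text{Extreme value, } F(s) = 1 - e^{-e^{s}}:\quad
S(t \mid Z) &= \exp\bigl\{-e^{H(t)}\, e^{\beta^\top Z}\bigr\},\\
&\text{a Cox model with cumulative baseline hazard } \Lambda_0(t) = e^{H(t)}.\\[4pt]
\text{Logistic, } F(s) = \frac{e^{s}}{1 + e^{s}}:\quad
\frac{1 - S(t \mid Z)}{S(t \mid Z)} &= e^{H(t)}\, e^{\beta^\top Z},\\
&\text{so the odds of failure by time } t \text{ are proportional across covariate values.}
\end{align*}
```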
-
Authors: Bachoc, Francois; Leeb, Hannes; Potscher, Benedikt M.
Affiliations: Universite de Toulouse; Universite Toulouse III - Paul Sabatier; University of Vienna
Abstract: We consider post-model-selection inference in linear regression. In this setting, Berk et al. [Ann. Statist. 41 (2013a) 802-837] recently introduced a class of confidence sets, the so-called PoSI intervals, that cover a certain nonstandard quantity of interest with a user-specified minimal coverage probability, irrespective of the model selection procedure being used. In this paper, we generalize the PoSI intervals to confidence intervals for post-model-selection predictors.
-
Authors: Saegusa, Takumi
Affiliations: University System of Maryland; University of Maryland College Park
Abstract: We develop large sample theory for merged data from multiple sources. The main statistical issues treated in this paper are that (1) the same unit may appear in multiple datasets from overlapping data sources, (2) duplicated items are not identified and (3) a sample from a single data source is dependent because it is drawn without replacement. We propose and study a new weighted empirical process and extend empirical process theory to dependent and biased samples with duplication. Specifically, ...