-
Authors: Cao, Hongyuan; Chen, Jun; Zhang, Xianyang
Affiliations: State University System of Florida; Florida State University; Mayo Clinic; Texas A&M University System; Texas A&M University College Station
Abstract: Large-scale multiple testing is a fundamental problem in high-dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship i...
-
Authors: Gao, Chao; Zhang, Anderson Y.
Affiliations: University of Chicago; University of Pennsylvania
Abstract: We propose a general modeling and algorithmic framework for discrete structure recovery that can be applied to a wide range of problems. Under this framework, we are able to study the recovery of clustering labels, ranks of players, signs of regression coefficients, cyclic shifts and even group elements from a unified perspective. A simple iterative algorithm is proposed for discrete structure recovery, which generalizes methods including Lloyd's algorithm and the power method. A linear conver...
-
Authors: Roquain, Etienne; Verzelen, Nicolas
Affiliations: Universite Paris Cite; Centre National de la Recherche Scientifique (CNRS); Universite Paris Cite; Sorbonne Universite; INRAE; Universite de Montpellier; Institut Agro
Abstract: Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations to understand the limitations caused by ignoring the null distribution, and how it can be properly learned from the same data set, when possible. We explore this issue in the setting where the null distributions are Gaussian with unknown rescaling parameters (mean and variance) whereas the alternative di...
-
Authors: Bradic, Jelena; Fan, Jianqing; Zhu, Yinchu
Affiliations: University of California System; University of California San Diego; University of California System; University of California San Diego; Princeton University; Brandeis University; Brandeis University
Abstract: Understanding statistical inference under possibly nonsparse high-dimensional models has gained much interest recently. For a given component of the regression coefficient, we show that the difficulty of the problem depends on the sparsity of the corresponding row of the precision matrix of the covariates, not the sparsity of the regression coefficients. We develop new concepts of uniform and essentially uniform nontestability that allow the study of limitations of tests across a broad set of ...
-
Authors: Chan, Kin Wai
Affiliations: Chinese University of Hong Kong
Abstract: Multiple imputation (MI) is a technique especially designed for handling missing data in public-use datasets. It allows analysts to perform incomplete-data inference straightforwardly by using several already imputed datasets released by the dataset owners. However, the existing MI tests require either a restrictive assumption on the missing-data mechanism, known as equal odds of missing information (EOMI), or an infinite number of imputations. Some of them also require analysts to have access ...
-
Authors: Zhang, Yuan; Xia, Dong
Affiliations: University System of Ohio; Ohio State University; Hong Kong University of Science & Technology
Abstract: Network method of moments (Ann. Statist. 39 (2011) 2280-2301) is an important tool for nonparametric network inference. However, there has been little investigation on accurate descriptions of the sampling distributions of network moment statistics. In this paper, we present the first higher-order accurate approximation to the sampling CDF of a studentized network moment by Edgeworth expansion. In sharp contrast to classical literature on noiseless U-statistics, we show that the Edgeworth expa...
-
Authors: Bhattacharya, Bhaswar B.; Das, Sayan; Mukherjee, Sumit
Affiliations: University of Pennsylvania; Columbia University; Columbia University
Abstract: Network sampling is an indispensable tool for understanding features of large complex networks where it is practically impossible to search over the entire graph. In this paper, we develop a framework for statistical inference for counting network motifs, such as edges, triangles and wedges, in the widely used subgraph sampling model, where each vertex is sampled independently, and the subgraph induced by the sampled vertices is observed. We derive necessary and sufficient conditions for the c...
-
Authors: Ghosal, Promit; Sen, Bodhisattva
Affiliations: Massachusetts Institute of Technology (MIT); Columbia University
Abstract: In this paper, we study multivariate ranks and quantiles, defined using the theory of optimal transport, and build on the work of Chernozhukov et al. (Ann. Statist. 45 (2017) 223-256) and Hallin et al. (Ann. Statist. 49 (2021) 1139-1165). We study the characterization, computation and properties of the multivariate rank and quantile functions and their empirical counterparts. We derive the uniform consistency of these empirical estimates to their population versions, under certain assumptions....
-
Authors: Zhao, Yue; Gijbels, Irene; Van Keilegom, Ingrid
Affiliations: KU Leuven; KU Leuven; KU Leuven; University of York - UK
Abstract: We consider a multivariate response regression model where each coordinate is described by a location-scale non- or semiparametric regression and where the dependence structure of the noise term is described by a parametric copula. Our goal is to estimate the associated Euclidean copula parameter, given a sample of the response and the covariate. In the absence of the copula sample, the usual oracle ranks are no longer computable. Instead, we study the normal scores estimator for the Gaussian ...
-
Authors: Feng, Long; Jiang, Tiefeng; Liu, Binghui; Xiong, Wei
Affiliations: Nankai University; Nankai University; University of Minnesota System; University of Minnesota Twin Cities; Northeast Normal University - China; Northeast Normal University - China; University of International Business & Economics
Abstract: We consider a testing problem for cross-sectional independence for high-dimensional panel data, where the number of cross-sectional units is potentially much larger than the number of observations. The cross-sectional independence is described through linear regression models. We study three tests named the sum, the max and the max-sum tests, where the latter two are new. The sum test was initially proposed by Breusch and Pagan (1980). We design the max and sum tests for sparse and nonsparse co...