-
Authors: Jiang, Fei; Zhou, Yeqing; Liu, Jianxuan; Ma, Yanyuan
Affiliations: University of California System; University of California San Francisco; Tongji University; Syracuse University; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
Abstract: We study estimation and testing in the Poisson regression model with noisy high-dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a nonconvex target function to minimize. Treating the high-dimensional issue further leads us to augment the target function with an amenable penalty term. We propose to estimate the regression parameter through minimizing the penalized target function. We derive th...
-
Authors: Aragam, Bryon; Yang, Ruiyi
Affiliations: University of Chicago; Princeton University
Abstract: We study uniform consistency in nonparametric mixture models as well as closely related mixture of regression (also known as mixed regression) models, where the regression functions are allowed to be nonparametric and the error distributions are assumed to be convolutions of a Gaussian density. We construct uniformly consistent estimators under general conditions while simultaneously highlighting several pain points in extending existing point-wise consistency results to uniform results. The ...
-
Authors: Bates, Stephen; Candes, Emmanuel; Lei, Lihua; Romano, Yaniv; Sesia, Matteo
Affiliations: University of California System; University of California Berkeley; University of California System; University of California Berkeley; Stanford University; Stanford University; Stanford University; Technion Israel Institute of Technology; Technion Israel Institute of Technology; University of Southern California
Abstract: This paper studies the construction of p-values for nonparametric outlier detection, from a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a general framework yielding p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery r...
-
Authors: Richardson, Thomas S.; Evans, Robin J.; Robins, James M.; Shpitser, Ilya
Affiliations: University of Washington; University of Washington Seattle; University of Oxford; Harvard University; Johns Hopkins University
Abstract: Conditional independence models associated with directed acyclic graphs (DAGs) may be characterized in at least three different ways: via a factorization, the global Markov property (given by the d-separation criterion), and the local Markov property. Marginals of DAG models also imply equality constraints that are not conditional independences; the well-known "Verma constraint" is an example. Constraints of this type are used for testing edges, and in a computationally efficient marginal...
-
Authors: Belloni, Alexandre; Chen, Mingli; Padilla, Oscar Hernan Madrid; Wang, Zixuan (Kevin)
Affiliations: Duke University; University of Warwick; University of California System; University of California Los Angeles; Harvard University
Abstract: We propose a generalization of the linear panel quantile regression model to accommodate both sparse and dense parts: sparse means that while the number of covariates available is large, potentially only a much smaller number of them have a nonzero impact on each conditional quantile of the response variable, while the dense part is represented by a low-rank matrix that can be approximated by latent factors and their loadings. Such a structure poses problems for traditional sparse estimators, su...
-
Authors: Ma, Xinwei; Wang, Jingshen; Wu, Chong
Affiliations: University of California System; University of California San Diego; University of California System; University of California Berkeley; University of Texas System; UTMD Anderson Cancer Center
Abstract: Developments in genome-wide association studies and the increasing availability of summary genetic association data have made the application of two-sample Mendelian Randomization (MR) with summary data increasingly popular. Conventional two-sample MR methods often employ the same sample for selecting relevant genetic variants and for constructing final causal estimates. Such a practice often leads to biased causal effect estimates due to the well-known "winner's curse" phenomenon. To a...
-
Authors: Doss, Natalie; Wu, Yihong; Yang, Pengkun; Zhou, Harrison H.
Affiliations: Yale University; Tsinghua University
Abstract: This paper studies the optimal rate of estimation in a finite Gaussian location mixture model in high dimensions without separation conditions. We assume that the number of components k is bounded and that the centers lie in a ball of bounded radius, while allowing the dimension d to be as large as the sample size n. Extending the one-dimensional result of Heinrich and Kahn (Ann. Statist. 46 (2018) 2844-2870), we show that the minimax rate of estimating the mixing distribution in Wasserstein d...
-
Authors: Butucea, Cristina; Mammen, Enno; Ndaoud, Mohamed; Tsybakov, Alexandre B.
Affiliations: Institut Polytechnique de Paris; ENSAE Paris; Ruprecht Karls University Heidelberg; ESSEC Business School
Abstract: In the pivotal variable selection problem, we derive the exact nonasymptotic minimax selector over the class of all s-sparse vectors, which is also the Bayes selector with respect to the uniform prior. While this optimal selector is, in general, not realizable in polynomial time, we show that its tractable counterpart (the scan selector) attains the minimax expected Hamming risk to within factor 2, and is also exact minimax with respect to the probability of wrong recovery. As a consequence, w...
-
Authors: Chandrasekher, Kabir Aladin; Pananjady, Ashwin; Thrampoulidis, Christos
Affiliations: Stanford University; University System of Georgia; Georgia Institute of Technology; University System of Georgia; Georgia Institute of Technology; University System of Georgia; Georgia Institute of Technology; University of British Columbia
Abstract: We consider a general class of regression models with normally distributed covariates, and the associated nonconvex problem of fitting these models from data. We develop a general recipe for analyzing the convergence of iterative algorithms for this task from a random initialization. In particular, provided each iteration can be written as the solution to a convex optimization problem satisfying some natural conditions, we leverage Gaussian comparison theorems to derive a deterministic seque...
-
Authors: Bu, Zhiqi; Klusowski, Jason M.; Rush, Cynthia; Su, Weijie J.
Affiliations: University of Pennsylvania; Princeton University; Columbia University; University of Pennsylvania
Abstract: Sorted ℓ1 regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems, including the SLOPE estimator in linear regression. In this paper, we study how this relatively new regularization technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion (TPP) or, equivalently, between measures of type I error and power. Assuming a regime of linea...