-
Authors: Paul, Subhadeep; Chen, Yuguo
Affiliations: University System of Ohio; Ohio State University; University of Illinois System; University of Illinois Chicago; University of Illinois Chicago Hospital
Abstract: We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network using methods based on the spectral clustering or a low-rank matrix factorization. As a general theme, these intermediate fusion methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored. In t...
-
Authors: Mogensen, Soren Wengel; Hansen, Niels Richard
Affiliations: University of Copenhagen
Abstract: Symmetric independence relations are often studied using graphical representations. Ancestral graphs or acyclic directed mixed graphs with m-separation provide classes of symmetric graphical independence models that are closed under marginalization. Asymmetric independence relations appear naturally for multivariate stochastic processes, for instance, in terms of local independence. However, no class of graphs representing such asymmetric independence relations, which is also closed under marg...
-
Authors: Bertsimas, Dimitris; Van Parys, Bart
Affiliations: Massachusetts Institute of Technology (MIT)
Abstract: We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve to provable optimality the sparse regression problem for sample sizes n and number of regressors p in the 100,000s, that is, two orders of magnitude better than the current state of the art, in seconds. The ability to solve the problem for very high dimensions allows us to observe new phase transi...
-
Authors: Hong, Han; Li, Jessie
Affiliations: Stanford University; University of California System; University of California Santa Cruz
Abstract: This paper proposes a numerical bootstrap method that is consistent in many cases where the standard bootstrap is known to fail and where the m-out-of-n bootstrap and subsampling have been the most commonly used inference approaches. We provide asymptotic analysis under both fixed and drifting parameter sequences, and we compare the approximation error of the numerical bootstrap with that of the m-out-of-n bootstrap and subsampling. Finally, we discuss applications of the numerical bootstrap, ...
-
Authors: Tang, Niansheng; Yan, Xiaodong; Zhao, Xingqiu
Affiliations: Yunnan University; Shandong University; Hong Kong Polytechnic University
Abstract: This article considers simultaneous variable selection and parameter estimation as well as hypothesis testing in censored survival models where a parametric likelihood is not available. For the problem, we utilize certain growing dimensional general estimating equations and propose a penalized generalized empirical likelihood, where the general estimating equations are constructed based on the semiparametric efficiency bound of estimation with given moment conditions. The proposed penalized ge...
-
Authors: Candes, Emmanuel J.; Sur, Pragya
Affiliations: Stanford University; Harvard University
Abstract: This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp phase transition. We introduce an explicit boundary curve h_MLE, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes n and number of features p proportioned in such a way that p/n -> kappa...
-
Authors: Bunea, Florentina; Giraud, Christophe; Luo, Xi; Royer, Martin; Verzelen, Nicolas
Affiliations: Cornell University; Centre National de la Recherche Scientifique (CNRS); Universite Paris Saclay; University of Texas System; University of Texas Health Science Center Houston; University of Texas School Public Health; Universite de Montpellier; Institut Agro; Montpellier SupAgro; INRAE
Abstract: The problem of variable clustering is that of estimating groups of similar components of a p-dimensional vector X = (X_1, ..., X_p) from n independent copies of X. There exists a large number of algorithms that return data-dependent groups of variables, but their interpretation is limited to the algorithm that produced them. An alternative is model-based clustering, in which one begins by defining population level clusters relative to a model that embeds notions of similarity. Algorithms tai...
-
Authors: Cox, Gregory
Affiliations: Columbia University
Abstract: This paper establishes the argmin of a random objective function to be unique almost surely. This paper first formulates a general result that proves almost sure uniqueness without convexity of the objective function. The general result is then applied to a variety of applications in statistics. Four applications are discussed, including uniqueness of M-estimators, both classical likelihood and penalized likelihood estimators, and two applications of the argmin theorem, threshold regression an...
-
Authors: Dobriban, Edgar; Leeb, William; Singer, Amit
Affiliations: University of Pennsylvania; University of Minnesota System; University of Minnesota Twin Cities; Princeton University
Abstract: We consider the linearly transformed spiked model, where the observations Y_i are noisy linear transforms of unobserved signals of interest X_i: Y_i = A_i X_i + epsilon_i, for i = 1, ..., n. The transform matrices A_i are also observed. We model the unobserved signals (or regression coefficients) X_i as vectors lying on an unknown low-dimensional space. Given only Y_i and A_i, how should we predict or recover their values? The naive approach of performing regression for each observation sep...
-
Authors: Chen, Xi; Lee, Jason D.; Tong, Xin T.; Zhang, Yichen
Affiliations: New York University; University of Southern California; National University of Singapore
Abstract: The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions. Our main contributions are two...