-
Authors: Schiebinger, Geoffrey; Wainwright, Martin J.; Yu, Bin
Affiliations: University of California System; University of California Berkeley
Abstract: Clustering of data sets is a standard problem in many areas of science and engineering. The method of spectral clustering is based on embedding the data set using a kernel function, and using the top eigenvectors of the normalized Laplacian to recover the connected components. We study the performance of spectral clustering in recovering the latent labels of i.i.d. samples from a finite mixture of nonparametric distributions. The difficulty of this label recovery problem depends on the overlap...
-
Authors: Armstrong, Timothy
Affiliations: Yale University
Abstract: We consider the problem of inference on a regression function at a point when the entire function satisfies a sign or shape restriction under the null. We propose a test that achieves the optimal minimax rate adaptively over a range of Hölder classes, up to a log log n term, which we show to be necessary for adaptation. We apply the results to adaptive one-sided tests for the regression discontinuity parameter under a monotonicity restriction, the value of a monotone regression function at the...
-
Authors: Li, Kang; Zheng, Wei; Ai, Mingyao
Affiliations: Peking University; Peking University; Purdue University System; Purdue University; Purdue University in Indianapolis
Abstract: The interference model has been widely used and studied in block experiments where the treatment for a particular plot has effects on its neighbor plots. In this paper, we study optimal circular designs for the proportional interference model, in which the neighbor effects of a treatment are proportional to its direct effect. Kiefer's equivalence theorems for estimating both the direct and total treatment effects are developed with respect to the criteria of A, D, E and T. Parallel studies are...
-
Authors: Lee, Young K.; Mammen, Enno; Nielsen, Jens P.; Park, Byeong U.
Affiliations: Kangwon National University; Ruprecht Karls University Heidelberg; City St Georges, University of London; Seoul National University (SNU)
Abstract: This paper generalizes recent proposals of density forecasting models and develops theory for this class of models. In density forecasting, the density of observations is estimated in regions where the density is not observed. Identification of the density in such regions is guaranteed by structural assumptions on the density that allow exact extrapolation. In this paper, the structural assumption is made that the density is a product of one-dimensional functions. The theory is quite gener...
-
Authors: Chatterjee, Sourav
Affiliations: Stanford University
Abstract: Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candès and collaborators. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has a little bit of structure. Surprisingly, this simple e...
-
Authors: Jin, Jiashun
Affiliations: Carnegie Mellon University
Abstract: Consider a network where the nodes split into K different communities. The community labels for the nodes are unknown and it is of major interest to estimate them (i.e., community detection). The Degree Corrected Block Model (DCBM) is a popular network model. How to detect communities with the DCBM is an interesting problem, where the main challenge lies in the degree heterogeneity. We propose a new approach to community detection which we call the Spectral Clustering On Ratios-of-Eigenvectors (SC...
-
Authors: McGoff, Kevin; Mukherjee, Sayan; Nobel, Andrew; Pillai, Natesh
Affiliations: Duke University; Duke University; Duke University; University of North Carolina; University of North Carolina Chapel Hill; Harvard University
Abstract: We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Our proof involves ideas from both information theory and dynamical systems. Furthermore, we show how some well-studied properties of dynamical systems imply the general statistical properties related to maximum likelihood estimatio...
-
Authors: Ma, Shujie; Carroll, Raymond J.; Liang, Hua; Xu, Shizhong
Affiliations: University of California System; University of California Riverside; Texas A&M University System; Texas A&M University College Station; University of Technology Sydney; George Washington University; University of California System; University of California Riverside
Abstract: In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise penalization-based procedure to distinguish significant covariates for the large p small n setting...
-
Authors: Fan, Jianqing; Ke, Zheng Tracy; Liu, Han; Xia, Lucy
Affiliations: Princeton University; University of Chicago
Abstract: We propose a novel Rayleigh-quotient-based sparse quadratic dimension reduction method, named QUADRO (Quadratic Dimension Reduction via Rayleigh Optimization), for analyzing high-dimensional data. Unlike in the linear setting where Rayleigh quotient optimization coincides with classification, these two problems are very different under nonlinear settings. In this paper, we clarify this difference and show that Rayleigh quotient optimization may be of independent scientific interest. One major c...
-
Authors: Fan, Yingying; Kong, Yinfei; Li, Daoji; Zheng, Zemin
Affiliations: University of Southern California; University of Southern California; University of Southern California
Abstract: This paper is concerned with the problems of interaction screening and nonlinear classification in a high-dimensional setting. We propose a two-step procedure, IIS-SQDA, where in the first step an innovated interaction screening (IIS) approach based on transforming the original p-dimensional feature vector is proposed, and in the second step a sparse quadratic discriminant analysis (SQDA) is proposed for further selecting important interactions and main effects and simultaneously conducting cl...