-
Authors: Chen, Yen-Chi
Affiliations: University of Washington; University of Washington Seattle
Abstract: In this paper we study the alpha-cluster tree (alpha-tree) under both singular and nonsingular measures. The alpha-tree uses probability contents within a set created by the ordering of points to construct a cluster tree so that it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an alpha-tree under nonsingular measures. For singular measures, we study how the kern...
-
Authors: Carpentier, Alexandra; Verzelen, Nicolas
Affiliations: Otto von Guericke University; INRAE; Institut Agro; Montpellier SupAgro; Universite de Montpellier
Abstract: Consider the Gaussian vector model with mean value $\theta$. We study the twin problems of estimating the number $\|\theta\|_0$ of nonzero components of $\theta$ and testing whether $\|\theta\|_0$ is smaller than some value. For testing, we establish the minimax separation distances for this model and introduce a minimax adaptive test. Extensions to the case of unknown variance are also discussed. Rewriting the estimation of $\|\theta\|_0$ as a multiple te...
-
Authors: Fan, Jianqing; Wang, Dong; Wang, Kaizheng; Zhu, Ziwei
Affiliations: Princeton University; University of Michigan System; University of Michigan
Abstract: Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that contribute to the most variation of the data. When data are stored across multiple machines, however, communication cost can prohibit the computation of PCA in a central location and distributed algorithms for PCA are thus needed. This paper proposes and studies a distributed PCA algorithm: each node machine computes the top K eigenvectors and transmits them to the centr...
-
Authors: Rinaldo, Alessandro; Wasserman, Larry; G'Sell, Max
Affiliations: Carnegie Mellon University
Abstract: Several new methods have been recently proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and the rest for inference. In this paper, we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-lean approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootst...
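
The sample-splitting idea described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy selection rule (pick the coordinate with the largest sample mean), the target parameter (that coordinate's mean), and the names `split_and_infer`, `rows`, and `alpha`. Only the Normal approximation is shown; the bootstrap variant is not.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def split_and_infer(rows, alpha=0.05, seed=0):
    """Toy sample splitting: select on one half, infer on the other half.

    Hypothetical selection rule: the coordinate with the largest sample mean.
    Inference: Normal-approximation confidence interval for that mean.
    """
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    half = len(idx) // 2
    select_part = [rows[i] for i in idx[:half]]
    infer_part = [rows[i] for i in idx[half:]]

    # Model selection uses only the first half of the data.
    d = len(rows[0])
    j = max(range(d), key=lambda k: mean(r[k] for r in select_part))

    # Inference uses only the held-out second half.
    values = [r[j] for r in infer_part]
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return j, center - half_width, center + half_width


# Synthetic data: the third coordinate has mean 3, the others mean 0.
rng = random.Random(1)
data = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(3, 1)] for _ in range(200)]
selected, lower, upper = split_and_infer(data)
print(selected, round(lower, 2), round(upper, 2))
```

Because the interval is computed on data that played no role in choosing the coordinate, the selection step does not distort the Normal approximation for the held-out half.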
-
Authors: Alquier, Pierre; Cottet, Vincent; Lecue, Guillaume
Affiliations: Universite Paris Saclay; Institut Polytechnique de Paris; ENSAE Paris; Centre National de la Recherche Scientifique (CNRS); Institut Polytechnique de Paris; ENSAE Paris
Abstract: We obtain estimation error rates and sharp oracle inequalities for regularization procedures of the form $\hat{f} \in \operatorname{argmin}_{f \in F} \left( \frac{1}{N} \sum_{i=1}^{N} \ell_f(X_i, Y_i) + \lambda \|f\| \right)$ when $\|\cdot\|$ is any norm, $F$ is a convex class of functions and $\ell$ is a Lipschitz loss function satisfying a Bernstein condition over $F$. We explore both the bounded and sub-Gaussian stochastic frameworks for the distribution of the $f(X_i)$'s, wit...
-
Authors: Chen, Xi; Liu, Weidong; Zhang, Yichen
Affiliations: New York University; Shanghai Jiao Tong University; Shanghai Jiao Tong University
Abstract: This paper studies the inference problem in quantile regression (QR) for a large sample size n but under a limited memory constraint, where the memory can only store a small batch of data of size m. A natural method is the naive divide-and-conquer approach, which splits data into batches of size m, computes the local QR estimator for each batch and then aggregates the estimators via averaging. However, this method only works when $n = o(m^2)$ and is computationally expensive. This paper propose...
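
The naive divide-and-conquer baseline described in this abstract can be sketched as follows. To stay self-contained, the sketch assumes an intercept-only quantile regression model, for which the local QR estimator reduces to a sample quantile; the function names `local_quantile` and `naive_dc_quantile` are hypothetical. This illustrates the baseline only, not the method the paper proposes.

```python
import math
import random


def local_quantile(batch, tau):
    """Intercept-only QR estimate: a sample quantile that minimizes the check loss."""
    ordered = sorted(batch)
    k = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[k]


def naive_dc_quantile(stream, tau, m):
    """Average local estimates over batches of size m, holding one batch in memory."""
    total, count, batch = 0.0, 0, []
    for y in stream:
        batch.append(y)
        if len(batch) == m:
            total += local_quantile(batch, tau)
            count += 1
            batch = []
    if batch:  # a leftover partial batch is counted as one more batch
        total += local_quantile(batch, tau)
        count += 1
    return total / count


# Stream n = 10000 standard normal draws through a memory of m = 100 points.
rng = random.Random(0)
stream = (rng.gauss(0, 1) for _ in range(10000))
estimate = naive_dc_quantile(stream, tau=0.5, m=100)
print(round(estimate, 3))
```

Each local estimate carries a bias that depends on the batch size m and that averaging does not remove, which is the intuition behind the abstract's restriction on how large n may be relative to m.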
-
Authors: Tan, Falong; Zhu, Lixing
Affiliations: Hunan University; Beijing Normal University; Hong Kong Baptist University
Abstract: In this paper, we construct an adaptive-to-model residual-marked empirical process as the basis for constructing a goodness-of-fit test for parametric single-index models with a diverging number of predictors. To study the relevant asymptotic properties, we first investigate, under the null and alternative hypotheses, the estimation consistency and asymptotically linear representation of the nonlinear least squares estimator for the parameters of interest and then the convergence of the empirical ...
-
Authors: Volgushev, Stanislav; Chao, Shih-Kang; Cheng, Guang
Affiliations: University of Toronto; Purdue University System; Purdue University
Abstract: The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big data, we propose a two-step procedure: (i) estimate conditional quantile functions at different levels in a parallel computing environment; (ii) construct a conditional quantile regression process through projection based on these estimated quantile curves. Our ...
-
Authors: Chang, Ming-Chung; Cheng, Shao-Wei; Cheng, Ching-Shui
Affiliations: National Central University; National Tsing Hua University; Academia Sinica - Taiwan; University of California System; University of California Berkeley
Abstract: Signal aliasing is an inevitable consequence of using fractional factorial designs. Unlike linear models with fixed factorial effects, for Gaussian random field models advocated in some Bayesian design and computer experiment literature, the issue of signal aliasing has not received comparable attention. In the present article, this issue is tackled for experiments with qualitative factors. The signals in a Gaussian random field can be characterized by the random effects identified from the co...
-
Authors: Drton, Mathias; Fox, Christopher; Wang, Y. Samuel
Affiliations: University of Washington; University of Washington Seattle; University of Chicago
Abstract: Software for computation of maximum likelihood estimates in linear structural equation models typically employs general techniques from nonlinear optimization, such as quasi-Newton methods. In practice, careful tuning of initial values is often required to avoid convergence issues. As an alternative approach, we propose a block-coordinate descent method that cycles through the considered variables, updating only the parameters related to a given variable in each step. We show that the resultin...
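
The general pattern of cycling through variables and updating one block at a time can be illustrated with a generic sketch. This is not the authors' algorithm for structural equation models: it is plain cyclic coordinate descent with single-coordinate blocks on an assumed toy objective, the quadratic $\frac{1}{2} x^\top A x - b^\top x$ with a symmetric positive definite matrix $A$. The function name `cyclic_coordinate_descent` is hypothetical.

```python
def cyclic_coordinate_descent(A, b, sweeps=100):
    """Minimize 0.5 * x^T A x - b^T x for a symmetric positive definite A.

    Each inner step minimizes exactly over one coordinate with the others fixed.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Setting the i-th partial derivative to zero gives a closed-form update.
            rest = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - rest) / A[i][i]
    return x


# Toy problem whose minimizer solves A x = b, namely x = (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = cyclic_coordinate_descent(A, b)
print(solution)
```

No step size or starting-value tuning is needed in this toy case because every coordinate update has a closed form, which mirrors the motivation stated in the abstract.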