-
Authors: Cai, T. Tony; Ma, Jing; Zhang, Linjun
Affiliation: University of Pennsylvania
Abstract: Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. In this paper, we study clustering of high-dimensional Gaussian mixtures and propose a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both theoretical and numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess misclustering error and show that CHIME ...
-
Authors: Ma, Shujie; Zhu, Liping; Zhang, Zhiwei; Tsai, Chih-Ling; Carroll, Raymond J.
Affiliations: University of California System; University of California Riverside; Renmin University of China; University of California System; University of California Davis; Texas A&M University System; Texas A&M University College Station; University of Technology Sydney
Abstract: A fundamental assumption used in causal inference with observational data is that treatment assignment is ignorable given measured confounding variables. This assumption of no missing confounders is plausible if a large number of baseline covariates are included in the analysis, as we often have no prior knowledge of which variables can be important confounders. Thus, estimation of treatment effects with a large number of covariates has received considerable attention in recent years. Most exi...
-
Authors: Hu, Jiang; Li, Weiming; Liu, Zhi; Zhou, Wang
Affiliations: Northeast Normal University - China; Northeast Normal University - China; Shanghai University of Finance & Economics; University of Macau; National University of Singapore
Abstract: This paper discusses fluctuations of linear spectral statistics of high-dimensional sample covariance matrices when the underlying population follows an elliptical distribution. Such populations often possess high-order correlations among their coordinates, which have a great impact on the asymptotic behavior of linear spectral statistics. Taking this kind of dependency into account, we establish a new central limit theorem for the linear spectral statistics for a class of ...
-
Author: Rohe, Karl
Affiliations: University of Wisconsin System; University of Wisconsin Madison
Abstract: Web crawling, snowball sampling, and respondent-driven sampling (RDS) are three types of network sampling techniques used to contact individuals in hard-to-reach populations. This paper studies these procedures as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree (instead of a chain) allows for the sampled units to refer multiple future units into the s...
-
Authors: Tewes, Johannes; Politis, Dimitris N.; Nordman, Daniel J.
Affiliations: Ruhr University Bochum; University of California System; University of California San Diego; Iowa State University
Abstract: The block bootstrap approximates sampling distributions from dependent data by resampling data blocks. A fundamental problem is establishing its consistency for the distribution of a sample mean, as a prototypical statistic. We use a structural relationship with subsampling to characterize the bootstrap in a new and general manner. While subsampling and block bootstrap differ, the block bootstrap distribution of a sample mean equals that of a k-fold self-convolution of a subsampling distributi...
-
Authors: Sadhanala, Veeranjaneyulu; Tibshirani, Ryan J.
Affiliations: Carnegie Mellon University; Carnegie Mellon University
Abstract: We study additive models built with trend filtering, that is, additive models whose components are each regularized by the (discrete) total variation of their kth (discrete) derivative, for a chosen integer k >= 0. This results in kth degree piecewise polynomial components (e.g., k = 0 gives piecewise constant components, k = 1 gives piecewise linear, k = 2 gives piecewise quadratic). Analogous to its advantages in the univariate case, additive trend filtering has favorable theoretical ...
-
Authors: Berthet, Quentin; Rigollet, Philippe; Srivastava, Piyush
Affiliations: University of Cambridge; Massachusetts Institute of Technology (MIT); Tata Institute of Fundamental Research (TIFR)
Abstract: We consider the problem of recovering the block structure of an Ising model given independent observations on the binary hypercube. This new model, called the Ising blockmodel, is a perturbation of the mean field approximation of the Ising model known as the Curie-Weiss model: the sites are partitioned into two blocks of equal size and the interaction between those of the same block is stronger than across blocks, to account for more order within each block. We study probabilistic, ...
-
Author: Koike, Yuta
Affiliations: University of Tokyo; Japan Science & Technology Agency (JST)
Abstract: This paper establishes an upper bound for the Kolmogorov distance between the maximum of a high-dimensional vector of smooth Wiener functionals and the maximum of a Gaussian random vector. As a special case, we show that the maximum of multiple Wiener-Ito integrals with common orders is well approximated by its Gaussian analog in terms of the Kolmogorov distance if their covariance matrices are close to each other and the maximum of the fourth cumulants of the multiple Wiener-Ito integrals is ...
-
Authors: Steinberger, Lukas; Leeb, Hannes
Affiliations: University of Freiburg; University of Vienna
Abstract: We study linear subset regression in the context of a high-dimensional linear model. Consider y = v + theta' z + epsilon with univariate response y and a d-vector of random regressors z, and a submodel where y is regressed on a set of p explanatory variables that are given by x = M' z, for some d x p matrix M. Here, high-dimensional means that the number d of available explanatory variables in the overall model is much larger than the number p of variables in the submodel. In this paper, we pr...
-
Author: Lopes, Miles E.
Affiliations: University of California System; University of California Davis
Abstract: Although the methods of bagging and random forests are some of the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are not many theoretical guarantees for deciding when an ensemble is large enough, so that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of algorithmic varianc...