-
Author: Rohe, Karl
Affiliations: University of Wisconsin System; University of Wisconsin Madison
Abstract: Web crawling, snowball sampling, and respondent-driven sampling (RDS) are three types of network sampling techniques used to contact individuals in hard-to-reach populations. This paper studies these procedures as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree (instead of a chain) allows the sampled units to refer multiple future units into the s...
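To make the tree-indexed referral idea concrete, here is a minimal Python sketch under simplifying assumptions that are not taken from the paper: every participant refers exactly m people, each referral is an independent uniform draw from the referrer's neighbours (so people can be sampled more than once), and the network is a small hypothetical graph. The function name `tree_indexed_sample` is illustrative only.

```python
import random


def tree_indexed_sample(adj, seed_node, m, depth, rng):
    """Sketch of a tree-indexed Markov referral process (assumed model).

    Each sampled node refers m participants, each drawn independently and
    uniformly from its neighbours, so the observations form an m-ary tree of
    the given depth. Returns (parent_index, node) pairs; the seed's parent
    index is None.
    """
    sample = [(None, seed_node)]
    frontier = [0]  # indices into `sample` whose referrals are still pending
    for _ in range(depth):
        next_frontier = []
        for idx in frontier:
            node = sample[idx][1]
            for _ in range(m):
                child = rng.choice(adj[node])  # one referral = one edge of the tree
                sample.append((idx, child))
                next_frontier.append(len(sample) - 1)
        frontier = next_frontier
    return sample


# Hypothetical social network: a cycle on 10 people plus a few chords.
adj = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
for a, b in [(0, 5), (2, 7), (3, 8)]:
    adj[a].append(b)
    adj[b].append(a)

sample = tree_indexed_sample(adj, seed_node=0, m=2, depth=3, rng=random.Random(1))
print(len(sample))  # 1 + 2 + 4 + 8 = 15 observations
```

Setting m = 1 reduces the tree to a chain, i.e. an ordinary random walk on the network; m > 1 gives the branching structure the abstract describes.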
-
Authors: Tewes, Johannes; Politis, Dimitris N.; Nordman, Daniel J.
Affiliations: Ruhr University Bochum; University of California System; University of California San Diego; Iowa State University
Abstract: The block bootstrap approximates sampling distributions from dependent data by resampling data blocks. A fundamental problem is establishing its consistency for the distribution of a sample mean, as a prototypical statistic. We use a structural relationship with subsampling to characterize the bootstrap in a new and general manner. While subsampling and block bootstrap differ, the block bootstrap distribution of a sample mean equals that of a k-fold self-convolution of a subsampling distributi...
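The structural link in the last sentence can be illustrated numerically. The sketch below assumes the moving (overlapping) block bootstrap with the series length n divisible by the block length b; it is an illustration, not the paper's construction. Because every block has length b, the mean of a block-bootstrap resample equals the average of k = n/b block means drawn independently and uniformly from the n - b + 1 overlapping block means, which form the subsampling distribution; the distribution of such an average is, up to scaling, a k-fold self-convolution of that distribution. The AR(1) data and the function name are hypothetical.

```python
import numpy as np


def block_bootstrap_mean(x, b, rng):
    """One moving-block-bootstrap replicate of the sample mean.

    Draws k = n // b overlapping blocks of length b uniformly at random and
    returns both the mean of the concatenated resample and the average of
    the k block means; the two coincide because all blocks have length b.
    """
    n = len(x)
    k = n // b
    # Means of all n - b + 1 overlapping blocks: the subsampling distribution.
    block_means = np.convolve(x, np.ones(b) / b, mode="valid")
    starts = rng.integers(0, n - b + 1, size=k)
    resample = np.concatenate([x[s:s + b] for s in starts])
    return resample.mean(), block_means[starts].mean()


rng = np.random.default_rng(0)
# Hypothetical AR(1) series, just to have some serial dependence.
e = rng.standard_normal(200)
x = np.empty(200)
x[0] = e[0]
for t in range(1, 200):
    x[t] = 0.5 * x[t - 1] + e[t]

reps = [block_bootstrap_mean(x, b=10, rng=rng) for _ in range(1000)]
boot_means = np.array([r[0] for r in reps])  # bootstrap distribution of the mean
```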
-
Authors: Sadhanala, Veeranjaneyulu; Tibshirani, Ryan J.
Affiliations: Carnegie Mellon University
Abstract: We study additive models built with trend filtering, that is, additive models whose components are each regularized by the (discrete) total variation of their kth (discrete) derivative, for a chosen integer k ≥ 0. This results in kth-degree piecewise polynomial components (e.g., k = 0 gives piecewise constant components, k = 1 gives piecewise linear, k = 2 gives piecewise quadratic, etc.). Analogous to its advantages in the univariate case, additive trend filtering has favorable theoretical ...
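The penalty in the first sentence can be written down directly. Assuming unit-spaced inputs (unequal spacing rescales the differences), the discrete total variation of the kth discrete derivative of a vector θ is the l1 norm of its (k + 1)th-order differences, which is zero exactly when θ is a polynomial of degree at most k. The sketch uses NumPy's `np.diff`; the function name is illustrative.

```python
import numpy as np


def trend_filtering_penalty(theta, k):
    """Discrete total variation of the k-th discrete derivative of theta.

    Assumes unit-spaced inputs, so the penalty is the l1 norm of the
    (k + 1)-th order differences of theta (constant scaling factors dropped).
    """
    return np.abs(np.diff(theta, n=k + 1)).sum()


# k = 0: piecewise constant, penalty = total size of the jumps.
step = np.array([1.0, 1.0, 1.0, 4.0, 4.0, 2.0])
print(trend_filtering_penalty(step, k=0))  # |4 - 1| + |2 - 4| = 5.0

# k = 1: piecewise linear with one kink, penalty = change in slope.
kink = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0])
print(trend_filtering_penalty(kink, k=1))  # slope goes from +1 to -1: 2.0
```

In the additive model, as we read the abstract, each component receives its own such penalty, applied to that component's fitted values ordered by the corresponding covariate; fitting the full model requires a convex solver and is not shown here.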
-
Authors: Berthet, Quentin; Rigollet, Philippe; Srivastava, Piyush
Affiliations: University of Cambridge; Massachusetts Institute of Technology (MIT); Tata Institute of Fundamental Research (TIFR)
Abstract: We consider the problem of recovering the block structure of an Ising model given independent observations on the binary hypercube. This new model, called the Ising blockmodel, is a perturbation of the mean-field approximation of the Ising model known as the Curie-Weiss model: the sites are partitioned into two blocks of equal size and the interaction between sites in the same block is stronger than across blocks, to account for more order within each block. We study probabilistic, ...
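Observations from an Ising blockmodel can be simulated with a Gibbs sampler. The parameterization below is an assumption made for illustration, since the abstract does not give one: p sites split into two equal blocks, pairwise coupling beta/p within a block and alpha/p across blocks, with beta > alpha. Independent observations are obtained by running independent chains, and the number of sweeps is an arbitrary choice, so the draws are only approximately from the model. The function name is hypothetical.

```python
import numpy as np


def gibbs_ising_blockmodel(p, beta, alpha, n_sweeps, rng):
    """Gibbs sampler for a two-block Curie-Weiss-type Ising model (a sketch).

    Assumed parameterization: sites 0..p/2-1 form block A, the rest block B,
    and P(sigma) is proportional to exp(sum over pairs i < j of
    J_ij * sigma_i * sigma_j), with J_ij = beta / p within a block and
    alpha / p across blocks (beta > alpha means more order within blocks).
    """
    assert p % 2 == 0, "two blocks of equal size"
    block = np.repeat([0, 1], p // 2)
    J = np.where(block[:, None] == block[None, :], beta / p, alpha / p)
    np.fill_diagonal(J, 0.0)  # no self-interaction
    sigma = rng.choice([-1, 1], size=p)
    for _ in range(n_sweeps):
        for i in range(p):
            h = J[i] @ sigma  # local field from all other sites
            # P(sigma_i = +1 | rest) = e^h / (e^h + e^-h) = 1 / (1 + e^(-2h))
            prob_plus = 1.0 / (1.0 + np.exp(-2.0 * h))
            sigma[i] = 1 if rng.random() < prob_plus else -1
    return sigma, block


rng = np.random.default_rng(0)
# 30 (approximately) independent observations on p = 20 sites.
draws = np.array([
    gibbs_ising_blockmodel(20, beta=1.5, alpha=0.2, n_sweeps=50, rng=rng)[0]
    for _ in range(30)
])
```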
-
Author: Koike, Yuta
Affiliations: University of Tokyo; Japan Science & Technology Agency (JST)
Abstract: This paper establishes an upper bound for the Kolmogorov distance between the maximum of a high-dimensional vector of smooth Wiener functionals and the maximum of a Gaussian random vector. As a special case, we show that the maximum of multiple Wiener-Itô integrals with common orders is well approximated by its Gaussian analog in terms of the Kolmogorov distance if their covariance matrices are close to each other and the maximum of the fourth cumulants of the multiple Wiener-Itô integrals is ...
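The Kolmogorov distance is the supremum distance between distribution functions. The sketch below computes it between the empirical CDFs of two samples of maxima. As an illustrative choice (an assumption, not an example from the paper), each coordinate of the non-Gaussian vector is a sum of m centred chi-square(1) variables divided by sqrt(2m); such a variable lies in the second Wiener chaos, has unit variance, and has fourth cumulant 12/m. This is a Monte Carlo illustration of the quantity being bounded, not the paper's bound.

```python
import numpy as np


def kolmogorov_distance(a, b):
    """sup_x |F_a(x) - F_b(x)| between the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the sup is attained at a data point
    F_a = np.searchsorted(a, grid, side="right") / len(a)
    F_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(F_a - F_b))


rng = np.random.default_rng(0)
d, m, reps = 20, 50, 1000
# Second-chaos coordinates: normalized sums of m centred chi-square(1) terms,
# independent across coordinates, so the covariance is the identity.
chaos = ((rng.standard_normal((reps, d, m)) ** 2 - 1) / np.sqrt(2)).sum(axis=2) / np.sqrt(m)
gauss = rng.standard_normal((reps, d))  # Gaussian analog with identity covariance
dist = kolmogorov_distance(chaos.max(axis=1), gauss.max(axis=1))
print(dist)
```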
-
Authors: Steinberger, Lukas; Leeb, Hannes
Affiliations: University of Freiburg; University of Vienna
Abstract: We study linear subset regression in the context of a high-dimensional linear model. Consider $y = \nu + \theta' z + \epsilon$ with univariate response $y$ and a $d$-vector of random regressors $z$, and a submodel where $y$ is regressed on a set of $p$ explanatory variables given by $x = M' z$ for some $d \times p$ matrix $M$. Here, high-dimensional means that the number $d$ of available explanatory variables in the overall model is much larger than the number $p$ of variables in the submodel. In this paper, we pr...
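The model setup can be simulated to see what the submodel fit estimates. The sketch assumes mean-zero Gaussian regressors z with identity covariance and uses hypothetical values for the intercept, θ, M and the noise level. Under these assumptions, the least-squares fit of y on x = M'z targets the best-linear-predictor coefficient (M'M)^{-1} M'θ. The abstract is truncated before stating what the paper proves about this fit, so the code only illustrates the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 20_000, 50, 3
nu, sigma_eps = 1.0, 1.0                      # hypothetical intercept and noise level
theta = rng.standard_normal(d) / np.sqrt(d)   # hypothetical full-model coefficients
M = rng.standard_normal((d, p)) / np.sqrt(d)  # hypothetical d x p matrix

z = rng.standard_normal((n, d))               # regressors, covariance I_d (assumed)
y = nu + z @ theta + sigma_eps * rng.standard_normal(n)
x = z @ M                                     # submodel regressors x = M'z, row-wise

# Least-squares fit of y on (1, x).
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Population target of the submodel fit when Cov(z) = I_d and E z = 0:
# beta = (M'M)^{-1} M' theta, with intercept nu.
beta_star = np.linalg.solve(M.T @ M, M.T @ theta)
print(coef[1:], beta_star)
```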
-
Author: Lopes, Miles E.
Affiliations: University of California System; University of California Davis
Abstract: Although bagging and random forests are among the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are few theoretical guarantees for deciding when an ensemble is large enough that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of algorithmic varianc...
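Algorithmic variance can be illustrated by brute-force simulation. Under assumptions made only for this sketch (binary classification, a fixed pool of already-trained base classifiers summarized by which test points each one gets right, ensembles formed by drawing members from the pool with replacement, and odd ensemble sizes so majority votes cannot tie), the function below estimates the variance of the majority-vote test error across random ensembles of size t. This is not the estimation or extrapolation procedure proposed in the paper, and the pool is synthetic.

```python
import numpy as np


def algorithmic_variance(pool_correct, t, n_ensembles, rng):
    """Variance, over random ensembles of size t, of the majority-vote test error.

    pool_correct is a (B, n_test) boolean array: entry [b, i] says whether base
    classifier b predicts test point i correctly (binary problem). With t odd,
    the majority vote is correct exactly when more than t // 2 members are.
    """
    B = pool_correct.shape[0]
    errors = np.empty(n_ensembles)
    for r in range(n_ensembles):
        members = rng.integers(0, B, size=t)  # draw t classifiers with replacement
        votes_correct = pool_correct[members].sum(axis=0)
        errors[r] = np.mean(votes_correct <= t // 2)  # majority-vote test error
    return errors.var()


rng = np.random.default_rng(0)
# Hypothetical pool: 200 base classifiers, each right on a test point w.p. 0.7.
pool = rng.random((200, 300)) < 0.7
variances = {t: algorithmic_variance(pool, t, n_ensembles=500, rng=rng) for t in (1, 5, 25, 101)}
for t, v in variances.items():
    print(t, v)
```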
-
Authors: Cuesta-Albertos, Juan A.; Garcia-Portugues, Eduardo; Febrero-Bande, Manuel; Gonzalez-Manteiga, Wenceslao
Affiliations: Universidad de Cantabria; Universidade de Santiago de Compostela
Abstract: We consider marked empirical processes indexed by a randomly projected functional covariate to construct goodness-of-fit tests for the functional linear model with scalar response. The test statistics are built from continuous functionals over the projected process, resulting in computationally efficient tests that exhibit root-n convergence rates and circumvent the curse of dimensionality. The weak convergence of the empirical process is obtained conditionally on a random direction, whilst th...
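The projected process can be computed in a few lines once the functional covariates are discretized. The sketch assumes curves observed on a common equispaced grid, approximates the inner product ⟨X_i, h⟩ by a Riemann sum, and takes the residuals of an already-fitted functional linear model as given (here replaced by synthetic stand-ins). It returns Kolmogorov-Smirnov and Cramér-von Mises functionals of R_n(u) = n^{-1/2} Σ_i ε̂_i 1{⟨X_i, h⟩ ≤ u}; calibrating them is not shown, and the paper's exact statistics may differ. The function name is hypothetical.

```python
import numpy as np


def projected_mep_statistics(X, residuals, h, grid_step):
    """KS and CvM statistics of a marked empirical process on a projection.

    X: (n, m) curves on an equispaced grid with spacing grid_step;
    h: projection direction on the same grid; residuals: the marks.
    R_n(u) = n^{-1/2} * sum_i residuals_i * 1{<X_i, h> <= u}.
    """
    n = len(residuals)
    proj = X @ h * grid_step                       # Riemann sum for <X_i, h>
    order = np.argsort(proj)
    R = np.cumsum(residuals[order]) / np.sqrt(n)   # R_n at the sorted projections
    ks = np.max(np.abs(R))
    cvm = np.mean(R ** 2)  # integral of R_n^2 w.r.t. the empirical law of proj
    return ks, cvm


rng = np.random.default_rng(0)
n, m = 100, 50
grid = np.linspace(0, 1, m)
grid_step = grid[1] - grid[0]
X = np.cumsum(rng.standard_normal((n, m)), axis=1) * np.sqrt(grid_step)  # Brownian-like curves
h = rng.standard_normal(m)                  # hypothetical random direction
h /= np.sqrt(np.sum(h ** 2) * grid_step)    # unit L2 norm on [0, 1]
res = rng.standard_normal(n)                # stand-in residuals
ks, cvm = projected_mep_statistics(X, res, h, grid_step)
print(ks, cvm)
```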
-
Authors: Chen, Xiaohui; Kato, Kengo
Affiliations: University of Illinois System; University of Illinois Urbana-Champaign; Cornell University
Abstract: This paper studies inference for the mean vector of a high-dimensional U-statistic. In the era of big data, the dimension d of the U-statistic and the sample size n of the observations tend both to be large, and the computation of the U-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for U-statistics are even more computationally expensive. To overcome such a computational bottleneck, incomplete U-statistics obtained by sampling few...
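Bernoulli sampling of index pairs is one way to form an incomplete U-statistic. The sketch uses an order-2 vector kernel chosen for illustration, h(x, y) = (x - y)∘(x - y)/2 coordinatewise, whose complete U-statistic is the vector of unbiased sample variances; each pair is kept independently with probability p and the kept kernel values are averaged. For clarity it still enumerates all n(n - 1)/2 pairs, so it does not realize the computational savings, and the paper's sampling scheme and bootstrap are not reproduced.

```python
import numpy as np


def u_statistic_variances(X, p=1.0, rng=None):
    """Complete (p = 1) or Bernoulli-sampled incomplete U-statistic of order 2.

    Kernel h(x, y) = (x - y)**2 / 2 coordinatewise, so the complete
    U-statistic is the vector of unbiased sample variances. Each of the
    n(n-1)/2 pairs is kept independently with probability p.
    """
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)  # all pairs i < j
    if p < 1.0:
        keep = rng.random(len(i)) < p
        i, j = i[keep], j[keep]
        if len(i) == 0:
            raise ValueError("no pairs sampled; increase p")
    return ((X[i] - X[j]) ** 2 / 2).mean(axis=0)


rng = np.random.default_rng(0)
X = rng.standard_normal((300, 50))           # hypothetical data: n = 300, d = 50
complete = u_statistic_variances(X)          # uses all 44,850 pairs
incomplete = u_statistic_variances(X, p=0.05, rng=rng)  # about 5% of the pairs
print(np.max(np.abs(complete - incomplete)))
```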
-
Authors: Dette, Holger; Wu, Weichi
Affiliations: Ruhr University Bochum; Tsinghua University
Abstract: This paper considers the problem of testing whether a sequence of means $(\mu_t)_{t=1,\ldots,n}$ of a nonstationary time series $(X_t)_{t=1,\ldots,n}$ is stable in the sense that the difference between the mean $\mu_1$ at the initial time $t = 1$ and the mean $\mu_t$ at any other time is smaller than a given threshold, that is, $|\mu_1 - \mu_t| \le c$ for all $t = 1, \ldots, n$. A test for hypotheses of this type is developed using a bias-corrected monotone rearranged local linear estimator and asymptotic...
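A naive plug-in version of the quantity being tested is max_t |μ̂_1 - μ̂_t|, computed from a local linear estimate of the mean function on the rescaled time grid t/n. The sketch uses an Epanechnikov kernel and a hand-picked bandwidth h, both assumptions. It omits the bias correction, the monotone rearrangement and the asymptotic critical values mentioned in the abstract, so it only shows what gets compared with c. Function names and the simulated series are hypothetical.

```python
import numpy as np


def local_linear(x, h):
    """Local linear estimates of mu(t/n) at every t = 1..n (Epanechnikov kernel)."""
    n = len(x)
    u = np.arange(1, n + 1) / n
    est = np.empty(n)
    for k, u0 in enumerate(u):
        d = u - u0
        w = np.maximum(1 - (d / h) ** 2, 0)  # Epanechnikov weights (unnormalized)
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d ** 2).sum()
        # Closed-form intercept of the weighted least-squares line fit at u0.
        est[k] = ((s2 - s1 * d) * w * x).sum() / (s0 * s2 - s1 ** 2)
    return est


def max_deviation_from_start(x, h):
    """Plug-in statistic max_t |mu_hat(1) - mu_hat(t)|, to be compared with c."""
    mu_hat = local_linear(x, h)
    return np.max(np.abs(mu_hat[0] - mu_hat))


rng = np.random.default_rng(0)
n = 500
t = np.arange(1, n + 1) / n
# Hypothetical series: smooth mean plus noise whose scale grows over time.
x = 0.3 * np.sin(2 * np.pi * t) + rng.standard_normal(n) * (0.5 + 0.5 * t)
print(max_deviation_from_start(x, h=0.1))
```

Because a local linear fit reproduces linear trends exactly, a series with linear mean 2 + 0.5 t/n and no noise gives max deviation 0.5(n - 1)/n, which is a convenient sanity check.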