-
Authors: Volgushev, Stanislav; Chao, Shih-Kang; Cheng, Guang
Affiliations: University of Toronto; Purdue University System; Purdue University
Abstract: The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big data, we propose a two-step procedure: (i) estimate conditional quantile functions at different levels in a parallel computing environment; (ii) construct a conditional quantile regression process through projection based on these estimated quantile curves. Our ...
-
Authors: Chen, Song Xi; Li, Jun; Zhong, Ping-Shou
Affiliations: Peking University; Peking University; University System of Ohio; Kent State University; Kent State University Salem; Kent State University Kent; Michigan State University
Abstract: This paper considers testing the equality of two high-dimensional means. Two approaches are utilized to formulate L2-type tests for better power performance when the two high-dimensional mean vectors differ only in sparsely populated coordinates and the differences are faint. One is to conduct thresholding to remove the non-signal-bearing dimensions for variance reduction of the test statistics. The other is to transform the data via the precision matrix for signal enhancement. It is shown tha...
-
Authors: Shu, Hai; Nan, Bin
Affiliations: University of Michigan System; University of Michigan; University of California System; University of California Irvine
Abstract: We consider the estimation of large covariance and precision matrices from high-dimensional sub-Gaussian or heavier-tailed observations with slowly decaying temporal dependence. The temporal dependence is allowed to be long-range, with longer memory than that considered in the current literature. We show that several commonly used methods for independent observations can be applied to the temporally dependent data. In particular, the rates of convergence are obtained for the generalized thr...
-
Authors: Williams, Jonathan P.; Hannig, Jan
Affiliations: University of North Carolina; University of North Carolina Chapel Hill
Abstract: Standard penalized methods of variable selection and parameter estimation rely on the magnitude of coefficient estimates to decide which variables to include in the final model. However, coefficient estimates are unreliable when the design matrix is collinear. To overcome this challenge, an entirely new perspective on variable selection is presented within a generalized fiducial inference framework. This new procedure is able to effectively account for linear dependencies among subsets of cova...
-
Authors: Bobkov, Sergey G.
Affiliations: University of Minnesota System; University of Minnesota Twin Cities; HSE University (National Research University Higher School of Economics)
Abstract: Let F_n denote the distribution function of the normalized sum of n i.i.d. random variables. In this paper, polynomial rates of approximation of F_n by the corrected normal laws are considered in the model where the underlying distribution has a convolution structure. As a basic tool, the convergence part of Khinchine's theorem in the metric theory of Diophantine approximations is extended to the class of product characteristic functions.
-
Authors: Chen, Ningyuan; Lee, Donald K. K.; Negahban, Sahand N.
Affiliations: Hong Kong University of Science & Technology; Hong Kong University of Science & Technology; Yale University; Yale University; Yale University
Abstract: Exploiting the fact that most arrival processes exhibit cyclic behaviour, we propose a simple procedure for estimating the intensity of a nonhomogeneous Poisson process. The estimator is the super-resolution analogue to Shao (2010) and Shao and Lii [J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 (2011) 99-122], which is a sum of p sinusoids where p and the amplitude and phase of each wave are not known and need to be estimated. This results in an interpretable yet flexible specification that is s...
-
Authors: Song, Yanglei; Fellouris, Georgios
Affiliations: University of Illinois System; University of Illinois Urbana-Champaign
Abstract: The sequential multiple testing problem is considered under two generalized error metrics. Under the first one, the probability of at least k mistakes, of any kind, is controlled. Under the second, the probabilities of at least k_1 false positives and at least k_2 false negatives are simultaneously controlled. For each formulation, the optimal expected sample size is characterized, to a first-order asymptotic approximation as the error probabilities go to 0, and a novel multiple testing proc...
-
Authors: Bachoc, Francois; Leeb, Hannes; Potscher, Benedikt M.
Affiliations: Universite de Toulouse; Universite Toulouse III - Paul Sabatier; University of Vienna
Abstract: We consider inference post-model-selection in linear regression. In this setting, Berk et al. [Ann. Statist. 41 (2013a) 802-837] recently introduced a class of confidence sets, the so-called PoSI intervals, that cover a certain nonstandard quantity of interest with a user-specified minimal coverage probability, irrespective of the model selection procedure that is being used. In this paper, we generalize the PoSI intervals to confidence intervals for post-model-selection predictors.
-
Authors: Saegusa, Takumi
Affiliations: University System of Maryland; University of Maryland College Park
Abstract: We develop large sample theory for merged data from multiple sources. The main statistical issues treated in this paper are (1) the same unit potentially appears in multiple datasets from overlapping data sources, (2) duplicated items are not identified and (3) a sample from the same data source is dependent due to sampling without replacement. We propose and study a new weighted empirical process and extend empirical process theory to a dependent and biased sample with duplication. Specifically, ...
-
Authors: Wang, Nanwei; Rauh, Johannes; Massam, Helene
Affiliations: York University - Canada; York University - Canada; Max Planck Society
Abstract: The existence of the maximum likelihood estimate in a hierarchical log-linear model is crucial to the reliability of inference for this model. Determining whether the estimate exists is equivalent to finding whether the sufficient statistics vector t belongs to the boundary of the marginal polytope of the model. The dimension of the smallest face F_t containing t determines the dimension of the reduced model which should be considered for correct inference. For higher-dimensional problems, it ...