-
Authors: Chen, Xi; Jing, Wenbo; Liu, Weidong; Zhang, Yichen
Affiliations: New York University; Shanghai Jiao Tong University; Purdue University System; Purdue University
Abstract: The development of modern technology has enabled data collection of unprecedented size, which poses new challenges to many statistical estimation and inference problems. This paper studies the maximum score estimator of a semiparametric binary choice model under a distributed computing environment without prespecifying the noise distribution. An intuitive divide-and-conquer estimator is computationally expensive and restricted by a nonregular constraint on the number of machines, due to the hig...
-
Authors: Ascolani, Filippo; Zanella, Giacomo
Affiliations: Duke University; Bocconi University
Abstract: Gibbs samplers are popular algorithms to approximate posterior distributions arising from Bayesian hierarchical models. Despite their popularity and good empirical performance, however, there are still relatively few quantitative results on their convergence properties, for example, much less than for gradient-based sampling methods. In this work, we analyse the behaviour of total variation mixing times of Gibbs samplers targeting hierarchical models using tools from Bayesian asymptotics. We o...
-
Authors: Manole, Tudor; Balakrishnan, Sivaraman; Niles-Weed, Jonathan; Wasserman, Larry
Affiliations: Carnegie Mellon University; New York University; New York University
Abstract: We analyze a number of natural estimators for the optimal transport map between two distributions and show that they are minimax optimal. We adopt the plugin approach: our estimators are simply optimal couplings between measures derived from our observations, appropriately extended so that they define functions on $\mathbb{R}^d$. When the underlying map is assumed to be Lipschitz, we show that computing the optimal coupling between the empirical measures, and extending it using linear smoothers, alrea...
-
Authors: Song, Yanglei; Fellouris, Georgios
Affiliations: Queens University - Canada; University of Illinois System; University of Illinois Urbana-Champaign
Abstract: A novel sequential change detection problem is proposed, in which the goal is to not only detect but also accelerate the change. Specifically, it is assumed that the sequentially collected observations are responses to treatments selected in real time. The assigned treatments determine the pre-change and post-change distributions of the responses and also influence when the change happens. The goal is to find a treatment assignment rule and a stopping rule that minimize the expected total numb...
-
Authors: Zrnic, Tijana; Fithian, William
Affiliations: University of California System; University of California Berkeley; University of California System; University of California Berkeley
Abstract: Selective inference is the problem of giving valid answers to statistical questions chosen in a data-driven manner. A standard solution to selective inference is simultaneous inference, which delivers valid answers to the set of all questions that could possibly have been asked. However, simultaneous inference can be unnecessarily conservative if this set includes many questions that were unlikely to be asked in the first place. We introduce a less conservative solution to selective inferenc...
-
Authors: Xu, Haotian; Wang, Daren; Zhao, Zifeng; Yu, Yi
Affiliations: University of Warwick; University of Notre Dame; University of Notre Dame
Abstract: This paper concerns the limiting distributions of change-point estimators, in a high-dimensional linear regression time-series context, where a regression object $(y_t, X_t) \in \mathbb{R} \times \mathbb{R}^p$ is observed at every time point $t \in \{1, \dots, n\}$. At unknown time points, called change points, the regression coefficients change, with the jump sizes measured in $\ell_2$-norm. We provide limiting distributions of the change-point estimators in the regimes where the minimal jump size v...
-
Authors: Li, Huiqin; Pan, Guangming; Yin, Yanqing; Zhou, Wang
Affiliations: Chongqing University; Nanyang Technological University; National University of Singapore
Abstract: Motivated by the statistical inference using the Gram matrix in the context of missing at random observations, this paper investigates the spectral resents a Hadamard random matrix with entries determined by independent Bernoulli variables D. Operating within the high-dimensional framework, we establish the convergence of the empirical spectral distribution of $S_n$ to a well-defined limiting distribution. In addition, we explore the impact of the missing mechanism on the second-order properties ...
-
Authors: Sell, Torben; Berrett, Thomas B.; Cannings, Timothy I.
Affiliations: University of Edinburgh; Heriot Watt University; University of Edinburgh; University of Warwick
Abstract: We introduce a new nonparametric framework for classification problems in the presence of missing data. The key aspect of our framework is that the regression function decomposes into an ANOVA-type sum of orthogonal functions, of which some (or even many) may be zero. Working under a general missingness setting, which allows features to be missing not at random, our main goal is to derive the minimax rate for the excess risk in this problem. In addition to the decomposition property, the rate ...
-
Authors: Yu, Haihan; Kaiser, Mark S.; Nordman, Daniel J.
Affiliations: University of Rhode Island; Iowa State University
Abstract: Frequency domain analysis of time series is often difficult, as periodogram-based statistics involve non-linear averages with complicated variances. Due to the latter, nonparametric approximations from resampling or empirical likelihood (EL) are useful. However, current versions of periodogram-based EL for time series are highly restricted: these are valid only for linear processes and for special parameters (i.e., ratios). For general frequency domain inference with stationary, weakly depende...
-
Authors: Zhao, Bingxin; Zheng, Shurong; Zhu, Hongtu
Affiliations: University of Pennsylvania; Northeast Normal University - China; University of North Carolina; University of North Carolina Chapel Hill; University of North Carolina School of Medicine
Abstract: Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external refe...