-
Authors: Dau, Hai-Dang; Chopin, Nicolas
Affiliations: Institut Polytechnique de Paris; ENSAE Paris
Abstract: In the context of state-space models, skeleton-based smoothing algorithms rely on a backward sampling step, which, by default, has an O(N^2) complexity (where N is the number of particles). Existing improvements in the literature are unsatisfactory: a popular rejection sampling-based approach, as we shall show, might lead to badly behaved execution time; another rejection sampler with stopping lacks complexity analysis; yet another MCMC-inspired algorithm comes with no stability guarantee. We ...
-
Authors: Liu, Zhijun; Hu, Jiang; Bai, Zhidong; Song, Haiyan
Affiliations: Northeast Normal University - China
Abstract: In this paper, we establish the central limit theorem (CLT) for linear spectral statistics (LSSs) of a large-dimensional sample covariance matrix when the population covariance matrices are involved with diverging spikes. This constitutes a nontrivial extension of the Bai-Silverstein theorem (BST) (Ann. Probab. 32 (2004) 553-605), a theorem that has strongly influenced the development of high-dimensional statistics, especially in the applications of random matrix theory to statistics. Recently...
-
Authors: Huang, Tzu-Jung; Luedtke, Alex; McKeague, Ian W.
Affiliations: Fred Hutchinson Cancer Center; University of Washington; University of Washington Seattle; Columbia University
Abstract: This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally scalable in high dimensions. Machine learning tools are commonly used to provide predictions of survival outcomes, but the estimated effect of a selected predictor suffers from confirmation b...
-
Authors: Klopp, Olga; Panov, Maxim; Sigalla, Suzanne; Tsybakov, Alexandre B.
Affiliations: ESSEC Business School; Institut Polytechnique de Paris; ENSAE Paris
Abstract: Topic models provide a useful tool to organize and understand the structure of large corpora of text documents, in particular, to discover hidden thematic structure. Clustering documents from big unstructured corpora into topics is an important task in various fields, such as image analysis, e-commerce, social networks and population genetics. Since the number of topics is typically substantially smaller than the size of the corpus and of the dictionary, the methods of topic modeling can lead ...
-
Authors: Chen, Song Xi; Qiu, Yumou; Zhang, Shuyi
Affiliations: Peking University; Peking University; Peking University; East China Normal University
Abstract: This paper considers one-sample testing of a high-dimensional covariance matrix by deriving the detection boundary as a function of the signal sparsity and signal strength under the sparse alternative hypotheses. It first shows that the optimal detection boundary for testing sparse means is the minimax detection lower boundary for testing the covariance matrix. A multilevel thresholding test is proposed and is shown to be able to attain the detection lower boundary over a substantial range of ...
-
Authors: Fan, Jianqing; Lou, Zhipeng; Yu, Mengxin
Affiliations: Princeton University; Pennsylvania Commonwealth System of Higher Education (PCSHE); University of Pittsburgh; University of Pennsylvania
Abstract: A stylized feature of high-dimensional data is that many variables have heavy tails, and robust statistical inference is critical for valid large-scale statistical inference. Yet, the existing developments such as Winsorization, Huberization and median of means require the bounded second moments and involve variable-dependent tuning parameters, which hamper their fidelity in applications to large-scale problems. To liberate these constraints, this paper revisits the celebrated Hodges-Lehmann (...