-
Authors: Donoho, David; Gavish, Matan; Romanov, Elad
Affiliations: Stanford University; Hebrew University of Jerusalem
Abstract: We derive a formula for optimal hard thresholding of the singular value decomposition in the presence of correlated additive noise; although it nominally involves unobservables, we show how to apply it even where the noise covariance structure is not a priori known or is not independently estimable. The proposed method, which we call ScreeNOT, is a mathematically solid alternative to Cattell's ever-popular but vague scree plot heuristic from 1966. ScreeNOT has a surprising oracle property: it...
-
Authors: Huang, Tzu-Jung; Luedtke, Alex; McKeague, Ian W.
Affiliations: Fred Hutchinson Cancer Center; University of Washington; University of Washington Seattle; Columbia University
Abstract: This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally scalable in high dimensions. Machine learning tools are commonly used to provide predictions of survival outcomes, but the estimated effect of a selected predictor suffers from confirmation b...
-
Authors: Klopp, Olga; Panov, Maxim; Sigalla, Suzanne; Tsybakov, Alexandre B.
Affiliations: ESSEC Business School; Institut Polytechnique de Paris; ENSAE Paris
Abstract: Topic models provide a useful tool to organize and understand the structure of large corpora of text documents, in particular, to discover hidden thematic structure. Clustering documents from big unstructured corpora into topics is an important task in various fields, such as image analysis, e-commerce, social networks and population genetics. Since the number of topics is typically substantially smaller than the size of the corpus and of the dictionary, the methods of topic modeling can lead ...
-
Authors: Spencer, Neil A.; Shalizi, Cosma Rohilla
Affiliations: University of Connecticut; Carnegie Mellon University
Abstract: When modeling network data using a latent position model, it is typical to assume that the nodes' positions are independently and identically distributed. However, this assumption implies the average node degree grows linearly with the number of nodes, which is inappropriate when the graph is thought to be sparse. We propose an alternative assumption, that the latent positions are generated according to a Poisson point process, and show that it is compatible with various levels of sparsity. Unli...
-
Authors: Chen, Song Xi; Qiu, Yumou; Zhang, Shuyi
Affiliations: Peking University; Peking University; Peking University; East China Normal University
Abstract: This paper considers one-sample testing of a high-dimensional covariance matrix by deriving the detection boundary as a function of the signal sparsity and signal strength under the sparse alternative hypotheses. It first shows that the optimal detection boundary for testing sparse means is the minimax detection lower boundary for testing the covariance matrix. A multilevel thresholding test is proposed and is shown to be able to attain the detection lower boundary over a substantial range of ...
-
Authors: Roycraft, Benjamin; Krebs, Johannes; Polonik, Wolfgang
Affiliations: University of California System; University of California Davis
Abstract: We investigate multivariate bootstrap procedures for general stabilizing statistics, with specific application to topological data analysis. The work relates to other general results in the area of stabilizing statistics, including central limit theorems for geometric and topological functionals of Poisson and binomial processes in the critical regime, where limit theorems prove difficult to use in practice, motivating the use of a bootstrap approach. A smoothed bootstrap procedure is shown to...
-
Authors: Chernozhukov, Victor; Hansen, Christian; Liao, Yuan; Zhu, Yinchu
Affiliations: Massachusetts Institute of Technology (MIT); University of Chicago; Rutgers University System; Rutgers University New Brunswick; Brandeis University
Abstract: This paper studies inference in linear models with a high-dimensional parameter matrix that can be well approximated by a spiked low-rank matrix. A spiked low-rank matrix has rank that grows slowly compared to its dimensions and nonzero singular values that diverge to infinity. We show that this framework covers a broad class of models of latent variables, which can accommodate matrix completion problems, factor models, varying coefficient models and heterogeneous treatment effects. For infere...
-
Authors: Han, Xiao; Yang, Qing; Fan, Yingying
Affiliations: Chinese Academy of Sciences; University of Science & Technology of China, CAS; University of Southern California
Abstract: Determining the precise rank is an important problem in many large-scale applications with matrix data exploiting low-rank plus noise models. In this paper, we suggest a universal approach to rank inference via residual sub-sampling (RIRS) for testing and estimating rank in a wide family of models, including many popularly used network models such as the degree corrected mixed membership model as a special case. Our procedure constructs a test statistic via subsampling entries of the residual ...
-
Authors: Steinberger, Lukas; Leeb, Hannes
Affiliations: University of Vienna
Abstract: We investigate generically applicable and intuitively appealing prediction intervals based on k-fold cross-validation. We focus on the conditional coverage probability of the proposed intervals, given the observations in the training sample (hence, training conditional validity), and show that it is close to the nominal level, in an appropriate sense, provided that the underlying algorithm used for computing point predictions is sufficiently stable when feature-response pairs are omitted. Our...
-
Authors: Komarova, Tatiana; Hidalgo, Javier
Affiliations: University of Manchester; University of London; London School Economics & Political Science
Abstract: We describe and examine a test for a general class of shape constraints, such as signs of derivatives, U-shape, quasi-convexity, log-convexity, among others, in a nonparametric framework using partial sums empirical processes. We show that, after a suitable transformation, its asymptotic distribution is a functional of a Brownian motion indexed by the c.d.f. of the regressor. As a result, the test is distribution-free and critical values are readily available. However, due to the possible poor a...