-
Authors: Moscovich, Amit; Rosset, Saharon
Affiliations: Tel Aviv University
Abstract: Cross-validation is the de facto standard for predictive model evaluation and selection. In proper use, it provides an unbiased estimate of a model's predictive performance. However, data sets often undergo various forms of data-dependent preprocessing, such as mean-centring, rescaling, dimensionality reduction and outlier removal. It is often believed that such preprocessing stages, if done in an unsupervised manner (that does not incorporate the class labels or response values) are generally...
-
Authors: Rubin-Delanchy, Patrick; Cape, Joshua; Tang, Minh; Priebe, Carey E.
Affiliations: University of Bristol; Pennsylvania Commonwealth System of Higher Education (PCSHE); University of Pittsburgh; North Carolina State University; Johns Hopkins University
Abstract: Spectral embedding is a procedure which can be used to obtain vector representations of the nodes of a graph. This paper proposes a generalisation of the latent position network model known as the random dot product graph, to allow interpretation of those vector representations as latent position estimates. The generalisation is needed to model heterophilic connectivity (e.g. 'opposites attract') and to cope with negative eigenvalues more generally. We show that, whether the adjacency or norma...
-
Authors: Avella-Medina, Marco
Affiliations: Columbia University
-
Authors: Dong, Jinshuo; Roth, Aaron; Su, Weijie J.
Affiliations: University of Pennsylvania
-
Authors: Follain, Bertille; Wang, Tengyao; Samworth, Richard J.
Affiliations: University of Cambridge; Inria; Universite PSL; Ecole Normale Superieure (ENS); University of London; London School Economics & Political Science; University of London; University College London
Abstract: We propose a new method for changepoint estimation in partially observed, high-dimensional time series that undergo a simultaneous change in mean in a sparse subset of coordinates. Our first methodological contribution is to introduce a 'MissCUSUM' transformation (a generalisation of the popular cumulative sum statistics), that captures the interaction between the signal strength and the level of missingness in each coordinate. In order to borrow strength across the coordinates, we propose to ...
-
Authors: Ogburn, Elizabeth L.; Cai, Junhui; Kuchibhotla, Arun K.; Berk, Richard A.; Buja, Andreas
Affiliations: Johns Hopkins University; Johns Hopkins Bloomberg School of Public Health; University of Pennsylvania; Carnegie Mellon University; University of Pennsylvania; Simons Foundation; Flatiron Institute
-
Authors: de Fondeville, Raphael; Davison, Anthony C.
Affiliations: Swiss Federal Institutes of Technology Domain; Ecole Polytechnique Federale de Lausanne
Abstract: Peaks-over-threshold analysis using the generalised Pareto distribution is widely applied in modelling tails of univariate random variables, but much information may be lost when complex extreme events are studied using univariate results. In this paper, we extend peaks-over-threshold analysis to extremes of functional data. Threshold exceedances defined using a functional r are modelled by the generalised r-Pareto process, a functional generalisation of the generalised Pareto distribution tha...
-
Authors: Rosset, Saharon; Heller, Ruth; Painsky, Amichai; Aharoni, Ehud
Affiliations: Tel Aviv University; Tel Aviv University; International Business Machines (IBM); IBM ISRAEL
Abstract: Multiple testing problems (MTPs) are a staple of modern statistical analysis. The fundamental objective of MTPs is to reject as many false null hypotheses as possible (that is, maximize some notion of power), subject to controlling an overall measure of false discovery, like family-wise error rate (FWER) or false discovery rate (FDR). In this paper we provide generalizations to MTPs of the optimal Neyman-Pearson test for a single hypothesis. We show that for simple hypotheses, for both FWER an...
-
Authors: Choi, Anna; Wong, Weng Kee
Affiliations: Stanford University; University of California System; University of California Los Angeles
-
Authors: Wang, Zhonglei; Peng, Liuhua; Kim, Jae Kwang
Affiliations: Xiamen University; Xiamen University; University of Melbourne; Iowa State University
Abstract: Bootstrap is a useful computational tool for statistical inference, but it may lead to erroneous analysis under complex survey sampling. In this paper, we propose a unified bootstrap method for stratified multi-stage cluster sampling, Poisson sampling, simple random sampling without replacement and probability proportional to size sampling with replacement. In the proposed bootstrap method, we first generate bootstrap finite populations, apply the same sampling design to each bootstrap populat...