-
Authors: McShane, Blakeley B.; Wyner, Abraham J.
Affiliations: Northwestern University; University of Pennsylvania
Abstract: Predicting historic temperatures based on tree rings, ice cores, and other natural proxies is a difficult endeavor. The relationship between proxies and temperature is weak and the number of proxies is far larger than the number of target data points. Furthermore, the data contain complex spatial and temporal dependence structures which are not easily captured with simple models. In this paper, we assess the reliability of such reconstructions and their statistical significance against various...
-
Authors: Wang, Sijian; Nan, Bin; Rosset, Saharon; Zhu, Ji
Affiliations: University of Wisconsin System; University of Wisconsin Madison; Tel Aviv University; University of Michigan System; University of Michigan; University of Michigan System; University of Michigan
Abstract: We propose a computationally intensive method, the random lasso method, for variable selection in linear models. The method consists of two major steps. In step 1, the lasso method is applied to many bootstrap samples, each using a set of randomly selected covariates. A measure of importance is yielded from this step for each covariate. In step 2, a similar procedure to the first step is implemented with the exception that for each bootstrap sample, a subset of covariates is randomly selected ...
-
Authors: Jung, Sungkyu; Foskey, Mark; Marron, J. S.
Affiliations: University of North Carolina; University of North Carolina Chapel Hill; University of North Carolina; University of North Carolina Chapel Hill
Abstract: We propose a new approach to analyze data that naturally lie on manifolds. We focus on a special class of manifolds, called direct product manifolds, whose intrinsic dimension could be very high. Our method finds a low-dimensional representation of the manifold that can be used to find and visualize the principal modes of variation of the data, as Principal Component Analysis (PCA) does in linear spaces. The proposed method improves upon earlier manifold extensions of PCA by more concisely cap...
-
Authors: Sain, Stephan R.; Furrer, Reinhard; Cressie, Noel
Affiliations: National Center for Atmospheric Research (NCAR), USA; University of Zurich; University System of Ohio; Ohio State University
Abstract: Climate models have become an important tool in the study of climate and climate change, and ensemble experiments consisting of multiple climate-model runs are used in studying and quantifying the uncertainty in climate-model output. However, there are often only a limited number of model runs available for a particular experiment, and one of the statistical challenges is to characterize the distribution of the model output. To that end, we have developed a multivariate hierarchical approach, ...
-
Authors: Latouche, Pierre; Birmele, Etienne; Ambroise, Christophe
Affiliations: Centre National de la Recherche Scientifique (CNRS); CNRS - National Institute for Mathematical Sciences (INSMI); Universite Paris Saclay; INRAE
Abstract: Complex systems in nature and in society are often represented as networks, describing the rich set of interactions between objects of interest. Many deterministic and probabilistic clustering methods have been developed to analyze such structures. Given a network, almost all of them partition the vertices into disjoint clusters, according to their connection profile. However, recent studies have shown that these techniques were too restrictive and that most of the existing networks contained ...
-
Authors: Sutton, Charles; Jordan, Michael I.
Affiliations: University of Edinburgh; University of California System; University of California Berkeley; University of California System; University of California Berkeley
Abstract: Modern Internet services, such as those at Google, Yahoo!, and Amazon, handle billions of requests per day on clusters of thousands of computers. Because these services operate under strict performance requirements, a statistical understanding of their performance is of great practical interest. Such services are modeled by networks of queues, where each queue models one of the computers in the system. A key challenge is that the data are incomplete, because recording detailed information abou...