-
Authors: Meinshausen, Nicolai
Affiliations: University of Oxford
Abstract: When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy. To give a classical example, classification and regression trees are easy to understand and interpret. Tree ensembles like Random Forests usually provide more accurate predictions. Yet tree ensembles are also more difficult to analyze than single trees and are often criticized, perhaps unfairly, as ...
-
Authors: Bissantz, Nicolai; Holzmann, Hajo; Pawlak, Miroslaw
Affiliations: Ruhr University Bochum; Philipps University Marburg; University of Manitoba
Abstract: A method for estimating the axis of reflectional symmetry of an image $f(x, y)$ on the unit disc $D = \{(x, y) : x^2 + y^2 \le 1\}$ is proposed, given that noisy data of $f(x, y)$ are observed on a discrete grid of edge width $\Delta$. Our estimation procedure is based on minimizing over $\beta \in [0, \pi)$ the $L^2$ distance between empirical versions of $f$ and $\tau_\beta f$, the image of $f$ after reflection at the axis along $(\cos\beta, \sin\beta)$. Here, $f$ and $\tau_\beta f$ are estimated using truncat...
-
Authors: Kriegler, Brian; Berk, Richard
Affiliations: University of Pennsylvania
Abstract: In many metropolitan areas efforts are made to count the homeless to ensure proper provision of social services. Some areas are very large, which makes spatial sampling a viable alternative to an enumeration of the entire terrain. Counts are observed in sampled regions but must be imputed in unvisited areas. Along with the imputation process, the costs of underestimating and overestimating may be different. For example, if precise estimation in areas with large homeless counts is critical, th...
-
Authors: Sakov, Anat; Golani, Ilan; Lipkind, Dina; Benjamini, Yoav
Affiliations: Tel Aviv University; Tel Aviv University
Abstract: In recent years, a growing need has arisen in different fields for the development of computational systems for automated analysis of large amounts of data (high-throughput). Dealing with nonstandard noise structure and outliers, that could have been detected and corrected in manual analysis, must now be built into the system with the aid of robust methods. We discuss such problems and present insights and solutions in the context of behavior genetics, where data consists of a time series of l...
-
Authors: Rizzo, Maria L.; Szekely, Gabor J.
Affiliations: University System of Ohio; Bowling Green State University
Abstract: In classical analysis of variance, dispersion is measured by considering squared distances of sample elements from the sample mean. We consider a measure of dispersion for univariate or multivariate response based on all pairwise distances between sample elements, and derive an analogous distance components (DISCO) decomposition for powers of distance in (0, 2]. The ANOVA F statistic is obtained when the index (exponent) is 2. For each index in (0, 2), this decomposition determines a nonparame...
-
Authors: Francis, Brian; Dittrich, Regina; Hatzinger, Reinhold
Affiliations: Lancaster University; Vienna University of Economics & Business
Abstract: This paper is motivated by a Eurobarometer survey on science knowledge. As part of the survey, respondents were asked to rank sources of science information in order of importance. The official statistical analysis of these data however failed to use the complete ranking information. We instead propose a method which treats ranked data as a set of paired comparisons which places the problem in the standard framework of generalized linear models and also allows respondent covariates to be incor...
-
Authors: Mannshardt-Shamseldin, Elizabeth C.; Smith, Richard L.; Sain, Stephan R.; Mearns, Linda O.; Cooley, Daniel
Affiliations: Duke University; University of North Carolina; University of North Carolina Chapel Hill; National Center Atmospheric Research (NCAR) - USA; National Center Atmospheric Research (NCAR) - USA; Colorado State University System; Colorado State University Fort Collins
Abstract: There is substantial empirical and climatological evidence that precipitation extremes have become more extreme during the twentieth century, and that this trend is likely to continue as global warming becomes more intense. However, understanding these issues is limited by a fundamental issue of spatial scaling: most evidence of past trends comes from rain gauge data, whereas trends into the future are produced by climate models, which rely on gridded aggregates. To study this further, we fit ...
-
Authors: Murphy, Thomas Brendan; Dean, Nema; Raftery, Adrian E.
Affiliations: University College Dublin; University of Glasgow; University of Washington; University of Washington Seattle
Abstract: Food authenticity studies are concerned with determining if food samples have been correctly labeled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classificati...
-
Authors: Jang, Woncheol; Loh, Ji Meng
Affiliations: University System of Georgia; University of Georgia; Columbia University
Abstract: Line transect sampling is a method used to estimate wildlife populations, with the resulting data often grouped in intervals. Estimating the density from grouped data can be challenging. In this paper we propose a kernel density estimator of wildlife population density for such grouped data. Our method uses a combined cross-validation and smoothed bootstrap approach to select the optimal bandwidth for grouped data. Our simulation study shows that with the smoothing parameter selected with this...
-
Authors: Wilson, Melanie A.; Iversen, Edwin S.; Clyde, Merlise A.; Schmidler, Scott C.; Schildkraut, Joellen M.
Affiliations: Duke University; Duke University
Abstract: Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data....