-
Authors: Mariadassou, Mahendra; Robin, Stephane; Vacher, Corinne
Affiliations: Universite Paris Saclay; AgroParisTech; INRAE; Universite de Bordeaux; INRAE
Abstract: As more and more network-structured data sets are available, the statistical analysis of valued graphs has become commonplace. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case. We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graph models and allows covariates to be included. Variational to...
-
Authors: Sakov, Anat; Golani, Ilan; Lipkind, Dina; Benjamini, Yoav
Affiliations: Tel Aviv University; Tel Aviv University
Abstract: In recent years, a growing need has arisen in different fields for the development of computational systems for automated (high-throughput) analysis of large amounts of data. Handling of nonstandard noise structure and outliers, which could have been detected and corrected in manual analysis, must now be built into the system with the aid of robust methods. We discuss such problems and present insights and solutions in the context of behavior genetics, where data consists of a time series of l...
-
Authors: Rizzo, Maria L.; Szekely, Gabor J.
Affiliations: University System of Ohio; Bowling Green State University
Abstract: In classical analysis of variance, dispersion is measured by considering squared distances of sample elements from the sample mean. We consider a measure of dispersion for univariate or multivariate response based on all pairwise distances between sample elements, and derive an analogous distance components (DISCO) decomposition for powers of distance in (0, 2]. The ANOVA F statistic is obtained when the index (exponent) is 2. For each index in (0, 2), this decomposition determines a nonparame...
-
Authors: Jang, Woncheol; Loh, Ji Meng
Affiliations: University System of Georgia; University of Georgia; Columbia University
Abstract: Line transect sampling is a method used to estimate wildlife populations, with the resulting data often grouped in intervals. Estimating the density from grouped data can be challenging. In this paper we propose a kernel density estimator of wildlife population density for such grouped data. Our method uses a combined cross-validation and smoothed bootstrap approach to select the optimal bandwidth for grouped data. Our simulation study shows that with the smoothing parameter selected with this...
-
Authors: Silva, Ricardo; Heller, Katherine; Ghahramani, Zoubin; Airoldi, Edoardo M.
Affiliations: University of London; University College London; University of Cambridge; Harvard University
Abstract: Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects S = {A^(1) : B^(1), A^(2) : B^(2), ..., A^(N) : B^(N)}, measures how well other pairs A : B fit in with the set S. Our work addresses the following question: is the relation between objects A and B analogous to those relations found in S? Such questions are particularly relevant in information...
-
Authors: Xu, Ya; Dyer, Justin S.; Owen, Art B.
Affiliations: Stanford University
Abstract: In semi-supervised learning on graphs, response variables observed at one node are used to estimate missing values at other nodes. The methods exploit correlations between nearby nodes in the graph. In this paper we prove that many such proposals are equivalent to kriging predictors based on a fixed covariance matrix driven by the link structure of the graph. We then propose a data-driven estimator of the correlation structure that exploits patterns among the observed response values. By incor...
-
Authors: Zanghi, Hugo; Picard, Franck; Miele, Vincent; Ambroise, Christophe
Affiliations: Dassault Systemes; VetAgro Sup; Universite Claude Bernard Lyon 1; INRAE; Universite Paris Saclay; Centre National de la Recherche Scientifique (CNRS); CNRS - National Institute for Mathematical Sciences (INSMI)
Abstract: In this paper we adapt online estimation strategies to perform model-based clustering on large networks. Our work focuses on two algorithms, the first based on the SAEM algorithm, and the second on variational methods. These two strategies are compared with existing approaches on simulated and real data. We use the method to decipher the connexion structure of the political websphere during the US political campaign in 2008. We show that our online EM-based algorithms offer a good trade-off be...