-
Authors: Silva, Ricardo; Heller, Katherine; Ghahramani, Zoubin; Airoldi, Edoardo M.
Affiliations: University of London; University College London; University of Cambridge; Harvard University
Abstract: Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects S = {A^(1) : B^(1), A^(2) : B^(2), ..., A^(N) : B^(N)}, measures how well other pairs A : B fit in with the set S. Our work addresses the following question: is the relation between objects A and B analogous to those relations found in S? Such questions are particularly relevant in information...
-
Authors: Chipman, Hugh A.; George, Edward I.; McCulloch, Robert E.
Affiliations: Acadia University; University of Pennsylvania; University of Texas System; University of Texas Austin
Abstract: We develop a Bayesian sum-of-trees model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical mode...
-
Authors: Liang, Hua; Miao, Hongyu; Wu, Hulin
Affiliations: University of Rochester
Abstract: Modeling viral dynamics in HIV/AIDS studies has resulted in a deep understanding of the pathogenesis of HIV infection, from which novel antiviral treatment guidance and strategies have been derived. Viral dynamics models based on nonlinear differential equations have been proposed and well developed over the past few decades. However, it is quite challenging to use experimental or clinical data to estimate the unknown parameters (both constant and time-varying parameters) in complex nonlinear diffe...
-
Authors: Kolar, Mladen; Song, Le; Ahmed, Amr; Xing, Eric P.
Affiliations: Carnegie Mellon University
Abstract: Stochastic networks are a plausible representation of the relational information among entities in dynamic systems such as living cells or social communities. While there is a rich literature on estimating a static or temporally invariant network from observation data, little has been done toward estimating time-varying networks from time series of entity attributes. In this paper we present two new machine learning methods for estimating time-varying networks, which both build on a temporally...
-
Authors: Ghosh, Samiran
Affiliations: Purdue University System; Purdue University; Purdue University in Indianapolis
Abstract: This paper describes a novel approach based on proportional imputation when identical units produced in a batch have random but independent installation and failure times. The current problem is motivated by a real-life industrial production-delivery supply chain where identical units are shipped after production to a third-party warehouse and then sold at a future date for possible installation. Due to practical limitations, at any given time point, the exact installation as well as the failu...
-
Authors: Xu, Ya; Dyer, Justin S.; Owen, Art B.
Affiliations: Stanford University
Abstract: In semi-supervised learning on graphs, response variables observed at one node are used to estimate missing values at other nodes. The methods exploit correlations between nearby nodes in the graph. In this paper we prove that many such proposals are equivalent to kriging predictors based on a fixed covariance matrix driven by the link structure of the graph. We then propose a data-driven estimator of the correlation structure that exploits patterns among the observed response values. By incor...
-
Authors: Zanghi, Hugo; Picard, Franck; Miele, Vincent; Ambroise, Christophe
Affiliations: Dassault Systemes; VetAgro Sup; Universite Claude Bernard Lyon 1; INRAE; Universite Paris Saclay; Centre National de la Recherche Scientifique (CNRS); CNRS - National Institute for Mathematical Sciences (INSMI)
Abstract: In this paper we adapt online estimation strategies to perform model-based clustering on large networks. Our work focuses on two algorithms, the first based on the SAEM algorithm, and the second on variational methods. These two strategies are compared with existing approaches on simulated and real data. We use the method to decipher the connexion structure of the political websphere during the US political campaign in 2008. We show that our online EM-based algorithms offer a good trade-off be...
-
Authors: Chatterjee, Snigdhansu; Qiu, Peihua
Affiliations: University of Minnesota System; University of Minnesota Twin Cities
Abstract: This paper deals with phase II, univariate, statistical process control when a set of in-control data is available, and when both the in-control and out-of-control distributions of the process are unknown. Existing process control techniques typically require substantial knowledge about the in-control and out-of-control distributions of the process, which is often difficult to obtain in practice. We propose (a) using a sequence of control limits for the cumulative sum (CUSUM) control charts, w...
-
Authors: Finley, Andrew O.; Banerjee, Sudipto; McRoberts, Ronald E.
Affiliations: Michigan State University; University of Minnesota System; University of Minnesota Twin Cities; United States Department of Agriculture (USDA); United States Forest Service
Abstract: Spatially explicit data layers of tree species assemblages, referred to as forest types or forest type groups, are a key component in large-scale assessments of forest sustainability, biodiversity, timber biomass, carbon sinks and forest health monitoring. This paper explores the utility of coupling georeferenced national forest inventory (NFI) data with readily available and spatially complete environmental predictor variables through spatially-varying multinomial logistic regression models t...
-
Authors: Chernoff, Herman; Lo, Shaw-Hwa; Zheng, Tian
Affiliations: Harvard University; Columbia University
Abstract: A trend in all scientific disciplines, based on advances in technology, is the increasing availability of high-dimensional data in which important information is buried. A current urgent challenge to statisticians is to develop effective methods of finding the useful information in the vast amounts of messy and noisy data available, most of which are noninformative. This paper presents a general computer-intensive approach, based on a method pioneered by Lo and Zheng for detecting which, of...