-
Authors: Tang, Yunfan; Ma, Li; Nicolae, Dan L.
Affiliations: University of Chicago; Duke University
Abstract: In this paper, we introduce the phylogenetic scan test (PhyloScan) for investigating cross-group differences in microbiome compositions using the Dirichlet-tree multinomial (DTM) model. DTM models the microbiome data through a cascade of independent local Dirichlet-multinomial (DM) models on the internal nodes of the phylogenetic tree. Each of the local DMs captures the count distributions of a certain number of operational taxonomic units at a given resolution. Since distributional differences tend to occur in clusters al...
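To make the "cascade of local DMs" concrete, here is a minimal sketch of drawing one sample from a Dirichlet-tree multinomial: at every internal node, branching probabilities are drawn from a Dirichlet and the node's count is split among its children by a multinomial. The function name `sample_dtm`, the dict-based tree layout and the toy parameters are hypothetical illustrations, not the authors' PhyloScan code.

```python
import numpy as np

def sample_dtm(tree, alpha, root, total, rng):
    """Draw leaf (OTU) counts from a Dirichlet-tree multinomial (illustrative sketch).

    tree:  hypothetical layout, dict mapping each internal node to its list of children
    alpha: dict mapping each internal node to Dirichlet parameters, one per child
    """
    counts = {root: total}
    stack = [root]
    while stack:
        node = stack.pop()
        # Local DM at this node: Dirichlet branching probabilities, then a multinomial split.
        p = rng.dirichlet(alpha[node])
        split = rng.multinomial(counts[node], p)
        for child, c in zip(tree[node], split):
            counts[child] = int(c)
            if child in tree:  # internal node: its count is split further down
                stack.append(child)
    # Leaves are the nodes that are never keys of the tree dict.
    return {k: v for k, v in counts.items() if k not in tree}

# Toy tree: root -> (A, n1), n1 -> (B, C)
tree = {"root": ["A", "n1"], "n1": ["B", "C"]}
alpha = {"root": [1.0, 2.0], "n1": [0.5, 0.5]}
leaves = sample_dtm(tree, alpha, "root", 100, np.random.default_rng(0))
print(sum(leaves.values()))  # → 100
```

Because each internal node has its own Dirichlet parameters, the model can place different dispersion at different resolutions of the tree, which is what a node-by-node scan can exploit.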
-
Authors: Chiquet, Julien; Mariadassou, Mahendra; Robin, Stephane
Affiliations: Universite Paris Saclay; INRAE; AgroParisTech; INRAE; Universite Paris Saclay
Abstract: Many application domains, such as ecology or genomics, have to deal with multivariate non-Gaussian observations. A typical example is the joint observation of the respective abundances of a set of species in a series of sites aiming to understand the covariations between these species. The Gaussian setting provides a canonical way to model such dependencies but does not apply in general. We consider here the multivariate exponential family framework for which we introduce a generic model with ...
-
Authors: Mankad, Shawn; Hu, Shengli; Gopal, Anandasivam
Affiliations: Cornell University; University System of Maryland; University of Maryland College Park
Abstract: Mobile apps are one of the building blocks of the mobile digital economy. A feature that differentiates mobile apps from traditional enterprise software is online reviews, which are available on app marketplaces and represent a valuable source of consumer feedback on the app. We create a supervised topic modeling approach that lets app developers use mobile reviews as sources of quality and customer feedback, thereby complementing traditional software testing. The approach is based on a const...
-
Authors: Nalenz, Malte; Villani, Mattias
Affiliations: Linkoping University
Abstract: We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu [Ann. Appl. Stat. 2 (2008) 916-954], where rules from decision trees and linear terms are used in an L1-regularized regression. We modify RuleFit by replacing the L1-regularization with a horseshoe prior, which is well known to give aggressive shrinkage of noise predictors while leaving the important signal essentially untouc...
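The claim that the horseshoe shrinks noise aggressively while leaving signal alone can be seen from its shrinkage weights. The sketch below assumes the standard horseshoe formulation, beta_j ~ N(0, lambda_j^2 tau^2) with half-Cauchy local scales lambda_j, a fixed global scale `tau` and unit noise variance; `horseshoe_shrinkage` is a hypothetical helper, not part of the paper's software.

```python
import numpy as np

def horseshoe_shrinkage(n_draws, tau=1.0, seed=0):
    """Sample shrinkage weights kappa = 1 / (1 + tau^2 * lambda^2) under a horseshoe prior.

    Assumes beta_j ~ N(0, lambda_j^2 tau^2), lambda_j ~ C+(0, 1), unit noise variance.
    kappa near 1 means the coefficient is shrunk to zero; kappa near 0 means no shrinkage.
    """
    rng = np.random.default_rng(seed)
    lam = np.abs(rng.standard_cauchy(n_draws))  # half-Cauchy local scales
    return 1.0 / (1.0 + (tau * lam) ** 2)

kappa = horseshoe_shrinkage(200_000)
# Share of draws near "no shrinkage" and near "full shrinkage"
print(np.mean(kappa < 0.1), np.mean(kappa > 0.9))
```

With `tau = 1`, kappa follows a Beta(1/2, 1/2) distribution, so roughly 20% of the draws fall below 0.1 and another 20% above 0.9. The mass piles up at both ends (the "horseshoe"), unlike an L1 penalty, which applies the same soft-threshold to every coefficient.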
-
Authors: Li, Xin; Belianinov, Alex; Dyck, Ondrej; Jesse, Stephen; Park, Chiwoo
Affiliations: State University System of Florida; Florida State University; United States Department of Energy (DOE); Oak Ridge National Laboratory; Center for Nanophase Materials Sciences
Abstract: This paper presents a regularized regression model with a two-level structural sparsity penalty to locate individual atoms in a noisy scanning transmission electron microscopy (STEM) image. In crystals, the locations of atoms are symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. ...
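One generic way to get sparsity at two levels (individual coefficients and whole groups) is the sparse group lasso penalty lam1 * ||x||_1 + lam2 * ||x||_2 per group. The sketch below shows its proximal operator for a single group; this penalty is swapped in for illustration only, since the abstract does not spell out the authors' penalty, and `sparse_group_prox` is a hypothetical name.

```python
import numpy as np

def sparse_group_prox(x, lam1, lam2):
    """Proximal operator of lam1 * ||x||_1 + lam2 * ||x||_2 for one group of coefficients."""
    x = np.asarray(x, dtype=float)
    # Level 1: elementwise soft-thresholding zeroes out small individual coefficients.
    z = np.sign(x) * np.maximum(np.abs(x) - lam1, 0.0)
    # Level 2: group soft-thresholding removes the whole group if its norm is small.
    norm = np.linalg.norm(z)
    if norm <= lam2:
        return np.zeros_like(z)
    return (1.0 - lam2 / norm) * z

print(sparse_group_prox([3.0, -1.0, 0.5], lam1=1.0, lam2=1.0))  # group kept, shrunk
print(sparse_group_prox([3.0, -1.0, 0.5], lam1=1.0, lam2=3.0))  # whole group removed
```

Applying elementwise soft-thresholding first and group shrinkage second gives the exact proximal map for this combined penalty, which is why it is a common building block in proximal-gradient solvers for group selection problems.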
-
Authors: Cheng, Yicheng; Dundar, Murat; Mohler, George
Affiliations: Purdue University System; Purdue University; Purdue University in Indianapolis
Abstract: The epidemic-type aftershock sequence (ETAS) point process is a common model for the occurrence of earthquake events. The ETAS model consists of a stationary background Poisson process modeling spontaneous earthquakes and a triggering kernel representing the space-time-magnitude distribution of aftershocks. Popular nonparametric methods for estimation of the background intensity include histograms and kernel density estimators. While these methods are able to capture local spatial heterogeneity in...
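The "background plus triggering kernel" structure can be illustrated with a simplified, purely temporal ETAS conditional intensity. The constant background rate `mu` and the Omori-type kernel with parameters `K`, `alpha`, `c`, `p` and reference magnitude `m0` are a common textbook parameterization assumed here; the sketch omits the spatial component and the nonparametric background estimators the paper studies, and `etas_intensity` is a hypothetical helper.

```python
import math

def etas_intensity(t, events, mu, K, alpha, c, p, m0):
    """Temporal ETAS conditional intensity at time t (simplified sketch).

    events: list of (t_i, m_i) pairs (event time, magnitude).
    Background: constant rate mu. Triggering: K * exp(alpha * (m_i - m0)) * (t - t_i + c) ** (-p).
    """
    rate = mu
    for t_i, m_i in events:
        if t_i < t:  # only past events can trigger aftershocks
            rate += K * math.exp(alpha * (m_i - m0)) * (t - t_i + c) ** (-p)
    return rate

# One magnitude-3 event at t = 0, evaluated at t = 1
print(etas_intensity(1.0, [(0.0, 3.0)], mu=0.1, K=1.0, alpha=1.0, c=1.0, p=1.0, m0=3.0))
```

Each past event adds its own decaying contribution, scaled up for larger magnitudes, on top of the background rate; replacing the constant `mu` with a spatially varying estimate is exactly where histogram or kernel density estimators enter.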
-
Authors: Balakrishnan, Sivaraman; Wasserman, Larry
Affiliations: Carnegie Mellon University
Abstract: The statistical analysis of discrete data has been the subject of extensive statistical research dating back to the work of Pearson. In this survey we review some recently developed methods for testing hypotheses about high-dimensional multinomials. Traditional tests like the χ² test and the likelihood ratio test can have poor power in the high-dimensional setting. Much of the research in this area has focused on finding tests with asymptotically normal limits and developing (stringent) co...
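For reference, the two traditional statistics mentioned above can be computed for a multinomial goodness-of-fit test of H0: cell probabilities equal `p0`. This is a minimal sketch of the classical textbook formulas, not of the newer high-dimensional tests the survey reviews; `multinomial_gof` is a hypothetical helper.

```python
import math

def multinomial_gof(counts, p0):
    """Pearson chi-squared and likelihood-ratio (G) statistics for H0: probabilities = p0."""
    n = sum(counts)
    chi2 = 0.0
    g = 0.0
    for x, p in zip(counts, p0):
        e = n * p  # expected count in this cell under H0
        chi2 += (x - e) ** 2 / e
        if x > 0:  # cells with x = 0 contribute 0 (0 * log 0 := 0)
            g += x * math.log(x / e)
    return chi2, 2.0 * g

chi2, g = multinomial_gof([10, 20, 30], [1 / 3, 1 / 3, 1 / 3])
print(chi2, g)
```

Both statistics are classically compared with a chi-squared distribution on k - 1 degrees of freedom, an approximation that relies on large expected counts in every cell. When the number of cells k grows with the sample size n, many cells have small expected counts and that approximation is no longer reliable.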
-
Authors: Meng, Xiao-Li
Affiliations: Harvard University
Abstract: Statisticians are increasingly posed with thought-provoking and even paradoxical questions, challenging our qualifications for entering the statistical paradises created by Big Data. By developing measures for data quality, this article suggests a framework to address such a question: Which one should I trust more: a 1% survey with a 60% response rate or a self-reported administrative dataset covering 80% of the population? A 5-element Euler-formula-like identity shows that for any dataset of siz...
-
Authors: Golchi, Shirin; Lockhart, Richard
Affiliations: Simon Fraser University
Abstract: The statistical procedure used in the search for new particles is investigated in this paper. The discovery of the Higgs particle is used to lay out the problem and the existing procedures. A Bayesian hierarchical model is proposed to address inference about the parameters of interest while incorporating uncertainty about the nuisance parameters into the model. In addition to inference, a decision-making procedure is proposed. A loss function is introduced that mimics the important features o...
-
Authors: Ertefaie, Ashkan; Nguyen, Anh; Harding, David J.; Morenoff, Jeffrey D.; Yang, Wei
Affiliations: University of Rochester; University of Michigan System; University of Michigan; University of California System; University of California Berkeley; University of Pennsylvania
Abstract: This article discusses an instrumental variable approach for analyzing censored data that includes many instruments that are weakly associated with the endogenous variable. We study the effect of imprisonment on time to employment using administrative data on all individuals sentenced for a felony in Michigan in the years 2003-2006. Despite the large body of research on the effect of prison on employment, this is still a controversial topic, especially since some of the studies could have bee...