-
Authors: Mankad, Shawn; Hu, Shengli; Gopal, Anandasivam
Affiliations: Cornell University; University System of Maryland; University of Maryland College Park
Abstract: Mobile apps are one of the building blocks of the mobile digital economy. A feature that differentiates mobile apps from traditional enterprise software is online reviews, which are available on app marketplaces and represent a valuable source of consumer feedback on the app. We create a supervised topic modeling approach that lets app developers use mobile reviews as sources of quality and customer feedback, thereby complementing traditional software testing. The approach is based on a const...
-
Authors: Nalenz, Malte; Villani, Mattias
Affiliations: Linkoping University
Abstract: We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu [Ann. Appl. Stat. 2 (2008) 916-954], where rules from decision trees and linear terms are used in an L1-regularized regression. We modify RuleFit by replacing the L1-regularization with a horseshoe prior, which is well known to give aggressive shrinkage of noise predictors while leaving the important signal essentially untouc...
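The "aggressive shrinkage of noise while leaving signal untouched" property of the horseshoe prior can be seen by simulating from it. The following is a minimal sketch of the generic horseshoe prior only, not the full RuleFit-based model of the paper. It assumes a fixed global scale tau = 1 and the normal-means setting y_j ~ N(beta_j, 1), where the posterior mean is (1 - kappa_j) y_j and kappa_j = 1 / (1 + lambda_j^2 tau^2). The function names are hypothetical.

```python
import numpy as np

def horseshoe_prior_draws(p, tau=1.0, seed=0):
    """Draw p coefficients from a horseshoe prior with global scale tau (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    lam = np.abs(rng.standard_cauchy(p))  # local scales lambda_j ~ half-Cauchy C+(0, 1)
    beta = rng.normal(0.0, lam * tau)     # beta_j | lambda_j, tau ~ N(0, lambda_j^2 tau^2)
    return beta, lam

def shrinkage_weights(lam, tau=1.0):
    """kappa_j = 1 / (1 + lambda_j^2 tau^2): the fraction by which y_j is shrunk toward 0
    in the normal-means model y_j ~ N(beta_j, 1) (assumed setting)."""
    return 1.0 / (1.0 + (lam * tau) ** 2)

beta, lam = horseshoe_prior_draws(100_000)
kappa = shrinkage_weights(lam)

# With tau = 1, kappa ~ Beta(1/2, 1/2): a U-shaped density with mass near 0
# (almost no shrinkage, signal kept) and near 1 (almost full shrinkage, noise removed).
near_zero = np.mean(kappa < 0.1)
near_one = np.mean(kappa > 0.9)
print(f"P(kappa < 0.1) = {near_zero:.3f}, P(kappa > 0.9) = {near_one:.3f}")
```

Under Beta(1/2, 1/2), each tail probability is (2/pi) * arcsin(sqrt(0.1)), or about 0.205. Both printed fractions should therefore land close to that value. The same density is what an L1 penalty lacks, because the lasso shrinks large and small coefficients by the same constant amount.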
-
Authors: Li, Xin; Belianinov, Alex; Dyck, Ondrej; Jesse, Stephen; Park, Chiwoo
Affiliations: State University System of Florida; Florida State University; United States Department of Energy (DOE); Oak Ridge National Laboratory; Center for Nanophase Materials Sciences
Abstract: This paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy (STEM) image. In crystals, the locations of atoms are symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. ...
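One widely used two-level sparsity penalty is the sparse group lasso. It adds lam1 * ||beta||_1, which gives within-group sparsity, and lam2 * sum_g ||beta_g||_2, which gives group-level selection. We assume this penalty purely for illustration; the paper's exact penalty may differ. The sketch shows the penalty's proximal operator, the step that proximal-gradient solvers repeat. It soft-thresholds each coordinate by lam1 and then shrinks or zeroes each whole group by lam2. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sparse_group_lasso(v, groups, lam1, lam2):
    """Proximal operator of lam1*||x||_1 + lam2*sum_g ||x_g||_2 (unit step size).

    v: input vector; groups: integer group label per coordinate.
    Assumed penalty for illustration, not necessarily the paper's."""
    v = np.asarray(v, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(v)
    for g in np.unique(groups):
        idx = groups == g
        z = soft_threshold(v[idx], lam1)   # level 1: sparsity within the group
        norm = np.linalg.norm(z)
        if norm > lam2:                    # level 2: keep the group only if it is strong enough
            out[idx] = (1.0 - lam2 / norm) * z
        # otherwise the whole group stays exactly zero (group deselected)
    return out

# Group 0 survives with one nonzero entry; group 1 is removed entirely.
x = prox_sparse_group_lasso([3.0, -0.5, 0.2, 0.1], groups=[0, 0, 1, 1], lam1=0.5, lam2=1.0)
print(x)
```

For the example input, group 0 soft-thresholds to [2.5, 0.0], which has norm 2.5. It is then scaled by 1 - 1/2.5 = 0.6 to give [1.5, 0.0]. Group 1 soft-thresholds to all zeros and is dropped.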
-
Authors: Cheng, Yicheng; Dundar, Murat; Mohler, George
Affiliations: Purdue University System; Purdue University; Purdue University in Indianapolis
Abstract: The epidemic-type aftershock sequence (ETAS) point process is a common model for the occurrence of earthquake events. The ETAS model consists of a stationary background Poisson process modeling spontaneous earthquakes and a triggering kernel representing the space-time-magnitude distribution of aftershocks. Popular nonparametric methods for estimating the background intensity include histograms and kernel density estimators. While these methods are able to capture local spatial heterogeneity in...
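The decomposition described above (background rate plus triggering from past events) can be written as a conditional intensity. The background term is mu(x, y). The triggering term is a sum over past events of productivity(m_i) * g(t - t_i) * h(x - x_i, y - y_i). The minimal sketch below evaluates that intensity and makes several assumptions. It uses common textbook kernel choices: exponential magnitude productivity, a normalized modified Omori law in time, and an isotropic Gaussian in space. The background mu is passed as a plain callable, whereas the paper is concerned with estimating it nonparametrically. All parameter names and default values are hypothetical.

```python
import math

def etas_intensity(t, x, y, events, mu, K=0.5, alpha=1.0, m0=3.0, c=0.01, p=1.2, d=1.0):
    """Conditional intensity lambda(t, x, y | history) of a space-time ETAS model (sketch).

    events: iterable of (t_i, x_i, y_i, m_i); mu: callable background rate mu(x, y).
    Kernel forms and parameters are illustrative assumptions."""
    rate = mu(x, y)  # spontaneous (background) component
    for t_i, x_i, y_i, m_i in events:
        if t_i >= t:
            continue  # only events strictly in the past can trigger aftershocks
        productivity = K * math.exp(alpha * (m_i - m0))      # larger quakes trigger more
        dt = t - t_i
        temporal = (p - 1) / c * (1 + dt / c) ** (-p)         # normalized modified Omori law (p > 1)
        r2 = (x - x_i) ** 2 + (y - y_i) ** 2
        spatial = math.exp(-r2 / (2 * d * d)) / (2 * math.pi * d * d)  # isotropic Gaussian density
        rate += productivity * temporal * spatial
    return rate

history = [(0.0, 0.0, 0.0, 3.0)]  # one past event at the origin with magnitude m0
lam = etas_intensity(1.0, 0.0, 0.0, history, mu=lambda x, y: 0.1, K=1.0, c=1.0, p=2.0)
print(lam)  # 0.1 + 0.25 / (2*pi), approximately 0.1398
```

Because the temporal and spatial kernels each integrate to one, K * exp(alpha * (m - m0)) is the expected number of direct aftershocks of a magnitude-m event under these assumed forms.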
-
Authors: Balakrishnan, Sivaraman; Wasserman, Larry
Affiliations: Carnegie Mellon University
Abstract: The statistical analysis of discrete data has been the subject of extensive statistical research dating back to the work of Pearson. In this survey we review some recently developed methods for testing hypotheses about high-dimensional multinomials. Traditional tests like the χ²-test and the likelihood ratio test can have poor power in the high-dimensional setting. Much of the research in this area has focused on finding tests with asymptotically normal limits and developing (stringent) co...
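For reference, the two traditional statistics mentioned above compare observed multinomial counts X_i with expected counts n * p_i. Pearson's chi-squared is sum (X_i - n p_i)^2 / (n p_i). The likelihood ratio (G) statistic is 2 * sum X_i log(X_i / (n p_i)). The sketch below simply computes both for a fully specified null. It does not implement any of the high-dimensional tests the survey reviews, and the function name is hypothetical.

```python
import math

def multinomial_gof(counts, probs):
    """Pearson chi-squared and likelihood-ratio (G) statistics for H0: cell probabilities = probs."""
    n = sum(counts)
    chi2 = 0.0
    g = 0.0
    for x, p in zip(counts, probs):
        e = n * p                        # expected count under H0
        chi2 += (x - e) ** 2 / e
        if x > 0:                        # a cell with x = 0 contributes 0 to G (x log x -> 0)
            g += 2.0 * x * math.log(x / e)
    return chi2, g

chi2, g = multinomial_gof([10, 20, 30], [1 / 3, 1 / 3, 1 / 3])
print(chi2, g)  # chi2 = 10.0, G is about 10.465
```

When the number of cells k is large relative to n, many expected counts n * p_i are small. The usual chi-squared reference distribution with k - 1 degrees of freedom is then unreliable, which is the high-dimensional regime the survey addresses.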
-
Authors: Meng, Xiao-Li
Affiliations: Harvard University
Abstract: Statisticians are increasingly posed with thought-provoking and even paradoxical questions, challenging our qualifications for entering the statistical paradises created by Big Data. By developing measures for data quality, this article suggests a framework to address such a question: Which one should I trust more: a 1% survey with 60% response rate or a self-reported administrative dataset covering 80% of the population? A 5-element Euler-formula-like identity shows that for any dataset of siz...
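Consider a finite population of N values G_j. Let R_j = 1 mark the n units that end up in the dataset and write f = n / N. The identity referred to above is, as we understand it, the exact decomposition mean(G over the sample) - mean(G over the population) = rho_{R,G} * sqrt((1 - f) / f) * sigma_G. Here rho_{R,G} is the population correlation between R and G, which measures data quality. The factor sqrt((1 - f) / f) captures data quantity, and sigma_G, the population standard deviation, captures problem difficulty. The sketch below checks this numerically on a simulated population with self-selection. The simulation design and function name are hypothetical.

```python
import numpy as np

def meng_decomposition(G, R):
    """Return (actual estimation error, data-defect product) for a finite population.

    G: population values; R: 0/1 inclusion indicators (hypothetical helper)."""
    G = np.asarray(G, dtype=float)
    R = np.asarray(R, dtype=float)
    N = G.size
    n = R.sum()
    f = n / N                                      # sampling fraction (data quantity)
    error = G[R == 1].mean() - G.mean()            # sample mean minus population mean
    rho = np.corrcoef(R, G)[0, 1]                  # data defect correlation (data quality)
    sigma = G.std()                                # population SD, divisor N (problem difficulty)
    return error, rho * np.sqrt((1 - f) / f) * sigma

rng = np.random.default_rng(1)
G = rng.normal(size=10_000)
# Self-selection: units with larger G are more likely to be recorded, so rho > 0.
R = (rng.random(10_000) < 1 / (1 + np.exp(-G))).astype(int)
error, product = meng_decomposition(G, R)
print(error, product)  # the two numbers agree up to floating-point rounding
```

The identity is exact and not an approximation, because it follows algebraically from Cov(R, G) = f * (sample mean - population mean) and SD(R) = sqrt(f(1 - f)). It holds for any inclusion pattern, probabilistic or not.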
-
Authors: Golchi, Shirin; Lockhart, Richard
Affiliations: Simon Fraser University
Abstract: The statistical procedure used in the search for new particles is investigated in this paper. The discovery of the Higgs particles is used to lay out the problem and the existing procedures. A Bayesian hierarchical model is proposed to address inference about the parameters of interest while incorporating uncertainty about the nuisance parameters into the model. In addition to inference, a decision-making procedure is proposed. A loss function is introduced that mimics the important features o...
-
Authors: Ertefaie, Ashkan; Anh Nguyen; Harding, David J.; Morenoff, Jeffrey D.; Yang, Wei
Affiliations: University of Rochester; University of Michigan System; University of Michigan; University of California System; University of California Berkeley; University of Pennsylvania
Abstract: This article discusses an instrumental variable approach for analyzing censored data that includes many instruments that are weakly associated with the endogenous variable. We study the effect of imprisonment on time to employment using administrative data on all individuals sentenced for a felony in Michigan in the years 2003-2006. Despite the large body of research on the effect of prison on employment, this is still a controversial topic, especially since some of the studies could have bee...
-
Authors: Griffin, Maryclare; Gile, Krista J.; Fredricksen-Goldsen, Karen I.; Handcock, Mark S.; Erosheva, Elena A.
Affiliations: University of Washington; University of Washington Seattle; University of Massachusetts System; University of Massachusetts Amherst; University of California System; University of California Los Angeles
Abstract: Respondent-driven sampling (RDS) is a method for sampling from a target population by leveraging social connections. RDS is invaluable to the study of hard-to-reach populations. However, RDS is costly and can be infeasible. RDS is infeasible when RDS point estimators have small effective sample sizes (large design effects), or when RDS interval estimators have poor coverage or confidence intervals that are wide relative to estimates obtained in previous studies. As a result, researchers need tools to a...
-
Authors: Yuan, Mia; Tang, Cheng Yong; Hong, Yili; Yang, Jian
Affiliations: Virginia Polytechnic Institute & State University; Pennsylvania Commonwealth System of Higher Education (PCSHE); Temple University; University of Colorado System; University of Colorado Denver
Abstract: Measuring corporate default risk is broadly important in economics and finance. Quantitative methods have been developed to predictively assess future corporate default probabilities. However, evaluating the uncertainties associated with the default predictions, a more difficult yet crucial problem, remains little explored. In this paper, we attempt to fill this gap by developing a procedure for quantifying the level of associated uncertainties upon carefully disentangling multiple con...