-
Authors: Hwang, Youngdeok; Lu, Siyuan; Kim, Jae-Kwang
Affiliations: Sungkyunkwan University (SKKU); International Business Machines (IBM); IBM USA; Iowa State University; Korea Advanced Institute of Science & Technology (KAIST)
Abstract: Accurately forecasting solar power using the data from multiple sources is an important but challenging problem. Our goal is to combine two different physics model forecasting outputs with real measurements from an automated monitoring network so as to better predict solar power in a timely manner. To this end, we propose a new approach of analyzing large-scale multilevel models with great computational efficiency requiring minimum monitoring and intervention. This approach features a division...
-
Authors: Chiquet, Julien; Mariadassou, Mahendra; Robin, Stephane
Affiliations: Universite Paris Saclay; INRAE; AgroParisTech; INRAE; Universite Paris Saclay
Abstract: Many application domains, such as ecology or genomics, have to deal with multivariate non-Gaussian observations. A typical example is the joint observation of the respective abundances of a set of species in a series of sites aiming to understand the covariations between these species. The Gaussian setting provides a canonical way to model such dependencies but does not apply in general. We consider here the multivariate exponential family framework for which we introduce a generic model with ...
-
Authors: Mankad, Shawn; Hu, Shengli; Gopal, Anandasivam
Affiliations: Cornell University; University System of Maryland; University of Maryland College Park
Abstract: Mobile apps are one of the building blocks of the mobile digital economy. A differentiating feature of mobile apps to traditional enterprise software is online reviews, which are available on app marketplaces and represent a valuable source of consumer feedback on the app. We create a supervised topic modeling approach for app developers to use mobile reviews as useful sources of quality and customer feedback, thereby complementing traditional software testing. The approach is based on a const...
-
Authors: Nalenz, Malte; Villani, Mattias
Affiliations: Linkoping University
Abstract: We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu [Ann. Appl. Stat. 2 (2008) 916-954] where rules from decision trees and linear terms are used in a L1-regularized regression. We modify RuleFit by replacing the L1-regularization by a horseshoe prior, which is well known to give aggressive shrinkage of noise predictors while leaving the important signal essentially untouc...
-
Authors: Ertefaie, Ashkan; Nguyen, Anh; Harding, David J.; Morenoff, Jeffrey D.; Yang, Wei
Affiliations: University of Rochester; University of Michigan System; University of Michigan; University of California System; University of California Berkeley; University of Pennsylvania
Abstract: This article discusses an instrumental variable approach for analyzing censored data that includes many instruments that are weakly associated with the endogenous variable. We study the effect of imprisonment on time to employment using administrative data on all individuals sentenced for felony in Michigan in the years 2003-2006. Despite the large body of research on the effect of prison on employment, this is still a controversial topic, especially since some of the studies could have bee...
-
Authors: Griffin, Maryclare; Gile, Krista J.; Fredricksen-Goldsen, Karen I.; Handcock, Mark S.; Erosheva, Elena A.
Affiliations: University of Washington; University of Washington Seattle; University of Washington; University of Washington Seattle; University of Massachusetts System; University of Massachusetts Amherst; University of California System; University of California Los Angeles
Abstract: Respondent-driven sampling (RDS) is a method for sampling from a target population by leveraging social connections. RDS is invaluable to the study of hard-to-reach populations. However, RDS is costly and can be infeasible. RDS is infeasible when RDS point estimators have small effective sample sizes (large design effects) or when RDS interval estimators have large confidence intervals relative to estimates obtained in previous studies or poor coverage. As a result, researchers need tools to a...
-
Authors: Yuan, Mia; Tang, Cheng Yong; Hong, Yili; Yang, Jian
Affiliations: Virginia Polytechnic Institute & State University; Pennsylvania Commonwealth System of Higher Education (PCSHE); Temple University; University of Colorado System; University of Colorado Denver
Abstract: Measuring the corporate default risk is broadly important in economics and finance. Quantitative methods have been developed to predictively assess future corporate default probabilities. However, as a more difficult yet crucial problem, evaluating the uncertainties associated with the default predictions remains little explored. In this paper, we attempt to fill this blank by developing a procedure for quantifying the level of associated uncertainties upon carefully disentangling multiple con...
-
Authors: Liu, Lydia T.; Dobriban, Edgar; Singer, Amit
Affiliations: University of California System; University of California Berkeley; University of Pennsylvania; Princeton University; Princeton University
Abstract: Many applications involve large datasets with entries from exponential family distributions. Our main motivating application is photon-limited imaging, where we observe images with Poisson distributed pixels. We focus on X-ray Free Electron Lasers (XFEL), a quickly developing technology whose goal is to reconstruct molecular structure. In XFEL, estimating the principal components of the noiseless distribution is needed for denoising and for structure determination. However, the standard method...
-
Authors: Petersen, Ashley; Simon, Noah; Witten, Daniela
Affiliations: University of Minnesota System; University of Minnesota Twin Cities; University of Washington; University of Washington Seattle; University of Washington; University of Washington Seattle
Abstract: In the past few years, new technologies in the field of neuroscience have made it possible to simultaneously image activity in large populations of neurons at cellular resolution in behaving animals. In mid-2016, a huge repository of this so-called calcium imaging data was made publicly available. The availability of this large-scale data resource opens the door to a host of scientific questions for which new statistical methods must be developed. In this paper we consider the first step in th...