-
Authors: Denti, Francesco; Camerlenghi, Federico; Guindani, Michele; Mira, Antonietta
Affiliations: University of California System; University of California Irvine; University of Milano-Bicocca; Università della Svizzera italiana; University of Insubria; University of Milano-Bicocca; Collegio Carlo Alberto; Bocconi University
Abstract: The use of large datasets for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested common atoms model (CAM) that is...
-
Authors: Matechou, Eleni; Argiento, Raffaele
Affiliations: University of Kent; University of Bergamo
Abstract: We propose a novel approach for modeling capture-recapture (CR) data on open populations that exhibit temporary emigration, while also accounting for individual heterogeneity to allow for differences in visit patterns and capture probabilities between individuals. Our modeling approach combines changepoint processes (fitted using an adaptive approach) for inferring individual visits with Bayesian mixture modeling (fitted using a nonparametric approach) for identifying clusters of individuals with ...
-
Authors: McCulloch, Charles E.; Neuhaus, John M.
Affiliations: University of California System; University of California San Francisco
Abstract: Statistical models that generate predicted random effects are widely used to evaluate the performance of and rank patients, physicians, hospitals and health plans from longitudinal and clustered data. Predicted random effects have been proven to outperform treating clusters as fixed effects (essentially a categorical predictor variable) and using standard regression models, on average. These predicted random effects are often used to identify extreme or outlying values, such as poorly performi...
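To illustrate what a "predicted random effect" is, here is a minimal sketch, not the authors' method, assuming a Gaussian random-intercept model y_ij = mu + b_i + e_ij with b_i ~ N(0, sigma_b2) and e_ij ~ N(0, sigma_e2), where mu and both variances are treated as known (in practice they would be estimated). Under these assumptions the predicted effect is the cluster mean deviation multiplied by the shrinkage factor sigma_b2 / (sigma_b2 + sigma_e2 / n_i), whereas the fixed-effects estimate is the unshrunk deviation. The function names and the data are hypothetical.

```python
from statistics import fmean


def fixed_effect_estimates(clusters, mu):
    """Unshrunk estimates: each cluster mean minus the overall mean mu."""
    return {name: fmean(ys) - mu for name, ys in clusters.items()}


def predicted_random_effects(clusters, mu, sigma_b2, sigma_e2):
    """Shrinkage (BLUP) predictions of cluster random intercepts.

    Assumes y_ij = mu + b_i + e_ij, b_i ~ N(0, sigma_b2), e_ij ~ N(0, sigma_e2),
    with mu, sigma_b2 and sigma_e2 known (a simplifying assumption).
    """
    preds = {}
    for name, ys in clusters.items():
        n = len(ys)
        # Shrinkage factor: near 1 for large clusters, smaller for small ones.
        weight = sigma_b2 / (sigma_b2 + sigma_e2 / n)
        preds[name] = weight * (fmean(ys) - mu)
    return preds


if __name__ == "__main__":
    # Hypothetical data: two hospitals with the same mean but very different sizes.
    clusters = {"small": [14.0, 16.0], "large": [15.0] * 50}
    mu, sigma_b2, sigma_e2 = 10.0, 4.0, 16.0
    print(fixed_effect_estimates(clusters, mu))
    print(predicted_random_effects(clusters, mu, sigma_b2, sigma_e2))
```

Both clusters have the same fixed-effects estimate (5.0), but the two-observation cluster is shrunk to about 1.67 while the fifty-observation cluster stays near 4.63. This size-dependent pull toward zero is why small clusters rarely look extreme when ranked by predicted random effects.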
-
Authors: Zhong, Wenxuan; Liu, Yiwen; Zeng, Peng
Affiliations: University System of Georgia; University of Georgia; University of Arizona; Auburn University System; Auburn University
Abstract: With rapid advances in information technology, massive datasets are collected in all fields of science, such as biology, chemistry, and social science. Useful or meaningful information is extracted from these data often through statistical learning or model fitting. In massive datasets, both sample size and number of predictors can be large, in which case conventional methods face computational challenges. Recently, an innovative and effective sampling scheme based on leverage scores via singu...
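The leverage-score idea can be sketched as generic leverage-based subsampling for least squares; this is not necessarily the specific scheme the article studies. The sketch assumes a full-column-rank design X: it computes each row's leverage h_i = x_i^T (X^T X)^{-1} x_i, samples r rows with probability h_i / p, and fits weighted least squares with inverse-probability weights. For such X, h_i equals the squared row norm of U in a thin SVD X = U S V^T. The sketch deliberately uses the normal equations instead of an SVD so that it stays dependency-free. All function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [u - f * v for u, v in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def gram(X, w=None):
    """Weighted Gram matrix X^T W X (W is the identity when w is None)."""
    if w is None:
        w = [1.0] * len(X)
    p = len(X[0])
    return [[sum(wi * row[j] * row[k] for wi, row in zip(w, X)) for k in range(p)]
            for j in range(p)]


def leverage_scores(X):
    """Diagonal of the hat matrix: h_i = x_i^T (X^T X)^{-1} x_i."""
    p = len(X[0])
    G = gram(X)
    # Column j of G^{-1}; G is symmetric, so this is also row j.
    G_inv = [solve(G, [1.0 if i == j else 0.0 for i in range(p)]) for j in range(p)]
    return [sum(row[j] * G_inv[j][k] * row[k] for j in range(p) for k in range(p))
            for row in X]


def leverage_subsample_ols(X, y, r, seed=0):
    """Least squares on r rows drawn with probability proportional to leverage."""
    rng = random.Random(seed)
    h = leverage_scores(X)
    total = sum(h)  # equals p, the number of columns, for full-rank X
    probs = [hi / total for hi in h]
    idx = rng.choices(range(len(X)), weights=probs, k=r)
    Xs = [X[i] for i in idx]
    ys = [y[i] for i in idx]
    # Inverse-probability weights make the weighted subsample sum of squares
    # an unbiased estimate of the full-data sum of squares for a fixed coefficient vector.
    w = [1.0 / (r * probs[i]) for i in idx]
    rhs = [sum(wi * row[j] * yi for wi, row, yi in zip(w, Xs, ys))
           for j in range(len(X[0]))]
    return solve(gram(Xs, w), rhs)


if __name__ == "__main__":
    # Hypothetical noise-free data y = 2 + 3x: any full-rank subsample recovers (2, 3).
    X = [[1.0, float(i)] for i in range(200)]
    y = [2.0 + 3.0 * row[1] for row in X]
    print(leverage_subsample_ols(X, y, r=20))
```

Rows far from the center of the design have the largest leverage, so they are oversampled. The weights then undo that bias in the fitted criterion.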
-
Authors: He, Jingyu; Hahn, P. Richard
Affiliations: City University of Hong Kong; Arizona State University; Arizona State University-Tempe
Abstract: This article develops a novel stochastic tree ensemble method for nonlinear regression, referred to as accelerated Bayesian additive regression trees, or XBART. By combining regularization and stochastic search strategies from Bayesian modeling with computationally efficient techniques from recursive partitioning algorithms, XBART attains state-of-the-art performance at prediction and function estimation. Simulation studies demonstrate that XBART provides accurate point-wise estimates of the m...
-
Authors: Chen, Hao; Xia, Yin
Affiliations: University of California System; University of California Davis; Fudan University
Abstract: Many statistical methodologies for high-dimensional data assume the population is normal. Although a few multivariate normality tests have been proposed, to the best of our knowledge, none of them can properly control the Type I error when the dimension is larger than the number of observations. In this work, we propose a novel nonparametric test that uses the nearest neighbor information. The proposed method guarantees the asymptotic Type I error control under the high-dimensional setting. Si...
-
Authors: McFowland, Edward, III; Shalizi, Cosma Rohilla
Affiliations: University of Minnesota System; University of Minnesota Twin Cities; Carnegie Mellon University; The Santa Fe Institute
Abstract: Social influence cannot be identified from purely observational data on social networks, because such influence is generically confounded with latent homophily, that is, with a node's network partners being informative about the node's attributes and therefore its behavior. If the network grows according to either a latent community (stochastic block) model, or a continuous latent space model, then latent homophilous attributes can be consistently estimated from the global pattern of social ti...
-
Authors: Zhou, Jie; Sun, Will Wei; Zhang, Jingfei; Li, Lexin
Affiliations: University of Miami; Purdue University System; Purdue University; University of California System; University of California Berkeley
Abstract: In modern data science, dynamic tensor data prevail in numerous applications. An important task is to characterize the relationship between dynamic tensor datasets and external covariates. However, the tensor data are often only partially observed, rendering many existing methods inapplicable. In this article, we develop a regression model with a partially observed dynamic tensor as the response and external covariates as the predictor. We introduce the low-rankness, sparsity, and fusion struc...
-
Authors: Dubey, Paromita; Müller, Hans-Georg
Affiliations: University of Southern California; University of California System; University of California Davis
-
Authors: Liu, Yi; Ročková, Veronika
Affiliations: University of Chicago; University of Chicago
Abstract: Thompson sampling is a heuristic algorithm for the multi-armed bandit problem which has a long tradition in machine learning. The algorithm has a Bayesian spirit in the sense that it selects arms based on posterior samples of reward probabilities of each arm. By forging a connection between combinatorial binary bandits and spike-and-slab variable selection, we propose a stochastic optimization approach to subset selection called Thompson variable selection (TVS). TVS is a framework for interpr...
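The basic mechanism described above, selecting arms from posterior samples of their reward probabilities, can be illustrated with standard Beta-Bernoulli Thompson sampling. This is a minimal sketch of generic Thompson sampling, not the TVS procedure. It assumes Bernoulli rewards and uniform Beta(1, 1) priors, and the arm probabilities are hypothetical values used only to simulate rewards.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling with uniform Beta(1, 1) priors.

    true_probs are hypothetical reward probabilities used only to simulate
    rewards; the algorithm itself sees only the observed 0/1 rewards.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # 1 + observed successes per arm
    beta = [1.0] * k   # 1 + observed failures per arm
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one posterior sample of each arm's reward probability...
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        # ...and play the arm whose sample is largest.
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update of the chosen arm's Beta posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.3, 0.8], n_rounds=2000))
```

The random posterior draw balances exploration and exploitation. Arms with uncertain posteriors still occasionally produce the largest sample and get tried, while the arm that is clearly best ends up receiving most of the pulls.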