-
Authors: Tipton, John R.; Hooten, Mevin B.; Nolan, Connor; Booth, Robert K.; McLachlan, Jason
Affiliations: University of Arkansas System; University of Arkansas Fayetteville; Colorado State University System; Colorado State University Fort Collins; United States Department of the Interior; United States Geological Survey; University of Arizona; Lehigh University; University of Notre Dame
Abstract: Multivariate compositional count data arise in many applications including ecology, microbiology, genetics and paleoclimate. A frequent question in the analysis of multivariate compositional count data is what underlying values of a covariate(s) give rise to the observed composition. Learning the relationship between covariates and the compositional count allows for inverse prediction of unobserved covariates given compositional count observations. Gaussian processes provide a flexible framewo...
-
Authors: Nethery, Rachel C.; Mealli, Fabrizia; Dominici, Francesca
Affiliations: Harvard University; Harvard T.H. Chan School of Public Health; University of Florence
Abstract: Most causal inference studies rely on the assumption of overlap to estimate population or sample average causal effects. When data suffer from non-overlap, estimation of these estimands requires reliance on model specifications due to poor data support. All existing methods to address non-overlap, such as trimming or down-weighting data in regions of poor data support, change the estimand so that inference cannot be made on the sample or the underlying population. In environmental health resea...
-
Authors: Berger, Moritz; Wagner, Michael; Schmid, Matthias
Affiliations: University of Bonn; University of Bonn; Helmholtz Association; German Center for Neurodegenerative Diseases (DZNE)
Abstract: We propose a regression model termed extended GB2 model, which is designed to analyze ratios of biomarkers in epidemiological and medical research. Typical examples of biomarker ratios are given by the LDL/HDL cholesterol ratio in cardiovascular research and the amyloid-beta 42/40 ratio in dementia research. Unlike regression modeling with a log-transformed response, which is often used to describe ratio outcomes in observational studies, the extended GB2 model directly links the expectation o...
-
Authors: Marchetti-Bowick, Micol; Yu, Yaoliang; Wu, Wei; Xing, Eric P.
Affiliations: Carnegie Mellon University; University of Waterloo
Abstract: In this work, we present a new approach for jointly performing eQTL mapping and gene network inference while encouraging a transfer of information between the two tasks. We address this problem by formulating it as a multiple-output regression task in which we aim to learn the regression coefficients while simultaneously estimating the conditional independence relationships among the set of response variables. The approach we develop uses structured sparsity penalties to encourage the sharing ...
-
Authors: Zhang, Ningshan; Schmaus, Kyle; Perry, Patrick O.
Affiliations: New York University
Abstract: We consider a particular instance of a common problem in recommender systems, using a database of book reviews to inform user-targeted recommendations. In our dataset, books are categorized into genres and subgenres. To exploit this nested taxonomy, we use a hierarchical model that enables information pooling across similar items at many levels within the genre hierarchy. The main challenge in deploying this model is computational. The data sizes are large and fitting the model at scale...
-
Authors: Katsevich, Eugene; Sabatti, Chiara
Affiliations: Stanford University
Abstract: We tackle the problem of selecting from among a large number of variables those that are important for an outcome. We consider situations where groups of variables are also of interest. For example, each variable might be a genetic polymorphism, and we might want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorphisms. In this context, to discover that a variable is relevant for the outcome implies discovering that the larger ent...
-
Authors: Berg, Stephen; Zhu, Jun; Clayton, Murray K.; Shea, Monika E.; Mladenoff, David J.
Affiliations: University of Wisconsin System; University of Wisconsin Madison; University of Wisconsin System; University of Wisconsin Madison
Abstract: The Wisconsin Public Land Survey database describes historical forest composition at high spatial resolution and is of interest in ecological studies of forest composition in Wisconsin just prior to significant Euro-American settlement. For such studies it is useful to identify recurring subpopulations of tree species known as communities, but standard clustering approaches for subpopulation identification do not account for dependence between spatially nearby observations. Here, we develop an...
-
Authors: Liang, Kun
Affiliations: University of Waterloo
Abstract: Finding differentially expressed genes is a common task in high-throughput transcriptome studies. While traditional statistical methods rank the genes by their test statistics alone, we analyze an RNA sequencing dataset using the auxiliary information of gene length and the test statistics from a related microarray study. Given the auxiliary information, we propose a novel nonparametric empirical Bayes procedure to estimate the posterior probability of differential expression for each gene. We...
-
Authors: Zhang, Hongbin; Wu, Lang
Affiliations: City University of New York (CUNY) System; University of British Columbia
Abstract: For a time-to-event outcome with censored time-varying covariates, a joint Cox model with a linear mixed effects model is the standard modeling approach. In some applications such as AIDS studies, mechanistic nonlinear models are available for some covariate process such as viral load during anti-HIV treatments, derived from the underlying data-generation mechanisms and disease progression. Such a mechanistic nonlinear covariate model may provide better-predicted values when the covariates are...
-
Authors: Dobra, Adrian; Valdes, Camilo; Ajdic, Dragana; Clarke, Bertrand; Clarke, Jennifer
Affiliations: University of Washington; University of Washington Seattle; State University System of Florida; Florida International University; University of Miami; University of Miami; University of Nebraska System; University of Nebraska Lincoln
Abstract: There is a growing awareness of the important roles that microbial communities play in complex biological processes. Modern investigation of these often uses next generation sequencing of metagenomic samples to determine community composition. We propose a statistical technique based on clique loglinear models and Bayes model averaging to identify microbial components in a metagenomic sample at various taxonomic levels that have significant associations. We describe the model class, a stochast...