-
Authors: Mao, Jialiang; Ma, Li
Affiliations: Duke University
Abstract: Studying the human microbiome has gained substantial interest in recent years, and a common task in the analysis of these data is to cluster microbiome compositions into subtypes. This subdivision of samples into subgroups serves as an intermediary step in achieving personalized diagnosis and treatment. In applying existing clustering methods to modern microbiome studies, including the American Gut Project (AGP) data, we found that this seemingly standard task, however, is very challenging in ...
-
Authors: Kahmann, Sydney; Hartman, Erin; Leap, Jorja; Brantingham, P. Jeffrey
Affiliations: University of California System; University of California Los Angeles; University of California Berkeley
Abstract: In 2011, the Los Angeles Police Department (LAPD), in conjunction with other governmental and nonprofit groups, launched the Community Safety Partnership (CSP) in several public housing developments in Los Angeles. Following a relationship-based policing model, officers were assigned to work collaboratively with community members to reduce crime and build trust. However, evaluating the causal impact of this policy intervention is difficult, given the notable differences between communities whe...
-
Authors: Severn, Katie E.; Dryden, Ian L.; Preston, Simon P.
Affiliations: University of Nottingham; State University System of Florida; Florida International University
Abstract: Networks arise in many applications, such as in the analysis of text documents, social interactions and brain activity. We develop a general framework for extrinsic statistical analysis of samples of networks, motivated by networks representing text documents in corpus linguistics. We identify networks with their graph Laplacian matrices for which we define metrics, embeddings, tangent spaces and a projection from Euclidean space to the space of graph Laplacians. This framework provides a way ...
-
Authors: Janicki, Ryan; Raim, Andrew M.; Holan, Scott H.; Maples, Jerry J.
Affiliations: University of Missouri System; University of Missouri Columbia
Abstract: Leveraging multivariate spatial dependence to improve the precision of estimates using American Community Survey data and other sample survey data has been a topic of recent interest among data users and federal statistical agencies. One strategy is to use a multivariate spatial mixed effects model with a Gaussian observation model and latent Gaussian process model. In practice, this works well for a wide range of tabulations. Nevertheless, in situations in which the data exhibit heterogeneity...
-
Authors: Mou, Xichen; Zhang, Hongmei; Arshad, S. Hasan
Affiliations: University of Memphis; University of Southampton
Abstract: DNA methylation can be transmitted through generations. This paper proposes a clustering method to identify the intergenerational patterns from parents to their offspring. Motivated by the potential of correlation between DNA methylation sites, we use the multivariate generalized beta distribution to model the blockwise correlation structure among the sites. A stochastic EM algorithm is implemented to estimate the parameters, and BIC is applied to determine the optimal number of clusters. Simu...
-
Authors: Yan, Han; Wu, Jiexing; Li, Yang; Liu, Jun S.
Affiliations: Harvard University; Alphabet Inc.; Google Incorporated
Abstract: Bi-clustering is a useful approach in analyzing large biological data sets when the observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach in tackling bi-clustering problems in moderate to high dimensions and propose three Bayesian bi-clustering models on categorical data which increase in complexities in their modeling of the distributions of features across bi-clusters. Our proposed methods apply to a wide range of scenarios: f...
-
Authors: Chen, Jiahua; Liu, Yukun; Taylor, Carilyn G.; Zidek, James V.
Affiliations: University of British Columbia; East China Normal University
Abstract: The distribution of lumber strength of any grade may evolve, for example, due to climate change, forest fire, changes in processing methods, and other factors. So, in North America the forest products industry monitors the evolution of their means, percentiles, or other parameters to ensure the wood products meet the industrial standard. For administrative convenience and informativeness, one may adopt a rotating sampling plan by sampling 36 mills in the initial occasion and having six of them...
-
Authors: D'Angelo, Nicoletta; Adelfio, Giada; Abbruzzo, Antonino; Mateu, Jorge
Affiliations: University of Palermo; Universitat Jaume I
Abstract: We analyse the spatio-temporal distribution of visitors' stops by touristic attractions in Palermo (Italy), using theory of stochastic point processes living on linear networks. We first propose an inhomogeneous Poisson point process model with a separable parametric spatio-temporal first-order intensity. We account for the spatial interaction among points on the given network, fitting a Gibbs point process model with mixed effects for the purely spatial component. This allows us to study firs...
-
Authors: Barata, Raquel; Prado, Raquel; Sanso, Bruno
Affiliations: University of California System; University of California Santa Cruz
Abstract: Atmospheric rivers (ARs) are elongated regions of water vapor in the atmosphere that play a key role in global water cycles, particularly in western U.S. precipitation. The primary component of many AR detection schemes is the thresholding of the integrated water vapor transport (IVT) magnitude at a single quantile over time. Utilizing a recently developed family of parametric distributions for quantile regression, this paper develops a flexible dynamic quantile linear model (exDQLM) which ena...
-
Authors: Chen, Aiyou; Au, Timothy C.
Affiliations: Alphabet Inc.; Google Incorporated
Abstract: Evaluating the incremental return on ad spend (iROAS) of a prospective online marketing strategy (i.e., the ratio of the strategy's causal effect on some response metric of interest relative to its causal effect on the ad spend) has become increasingly more important. Although randomized geo experiments are frequently employed for this evaluation, obtaining reliable estimates of iROAS can be challenging, as oftentimes only a small number of highly heterogeneous units are used. Moreover, advert...