-
Authors: Savitsky, Terrance D.; Paddock, Susan M.
Affiliations: RAND Corporation
Abstract: We develop a dependent Dirichlet process (DDP) model for repeated measures multiple membership (MM) data. This data structure arises in studies under which an intervention is delivered to each client through a sequence of elements which overlap with those of other clients on different occasions. Our interest concentrates on study designs for which the overlaps of sequences occur for clients who receive an intervention in a shared or grouped fashion whose memberships may change over multiple tr...
-
Authors: Vu, Duy Q.; Hunter, David R.; Schweinberger, Michael
Affiliations: University of Melbourne; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; Rice University
Abstract: We describe a network clustering framework, based on finite mixture models, that can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other recent model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to much larger data s...
-
Authors: Chandler, Richard B.; Royle, J. Andrew
Affiliations: United States Department of the Interior; United States Geological Survey
Abstract: Recently developed spatial capture-recapture (SCR) models represent a major advance over traditional capture-recapture (CR) models because they yield explicit estimates of animal density instead of population size within an unknown area. Furthermore, unlike nonspatial CR methods, SCR models account for heterogeneity in capture probability arising from the juxtaposition of animal activity centers and sample locations. Although the utility of SCR methods is gaining recognition, the requirement t...
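A common way for SCR models to encode the heterogeneity described above is a detection function that decays with the distance between an animal's activity center and a trap. The sketch below assumes the half-normal form p = p0 * exp(-d^2 / (2 * sigma^2)), one standard choice, and simulates binomial capture counts on a hypothetical 5 x 5 trap grid. The function names, parameter values and grid are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical helpers illustrating a half-normal SCR detection model (not the authors' code).

def detection_prob(centers, traps, p0, sigma):
    """Per-occasion capture probability for each (animal, trap) pair."""
    # Squared Euclidean distances, shape (n_animals, n_traps).
    d2 = ((centers[:, None, :] - traps[None, :, :]) ** 2).sum(axis=2)
    # Probability is p0 at the activity center and decays with distance.
    return p0 * np.exp(-d2 / (2.0 * sigma**2))

def simulate_captures(centers, traps, p0, sigma, n_occasions, rng):
    """Number of occasions (out of n_occasions) each animal is caught in each trap."""
    p = detection_prob(centers, traps, p0, sigma)
    return rng.binomial(n_occasions, p)

rng = np.random.default_rng(2)
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
traps = np.column_stack([gx.ravel(), gy.ravel()])   # assumed 5 x 5 grid, unit spacing
centers = rng.uniform(-1.0, 5.0, size=(30, 2))      # assumed activity centers in a buffered square
y = simulate_captures(centers, traps, p0=0.3, sigma=0.8, n_occasions=5, rng=rng)
print(y.shape, y.sum())
```

Animals whose activity centers lie near many traps are caught more often, which is exactly the source of capture heterogeneity that nonspatial CR models ignore.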
-
Authors: Gaydos, Travis L.; Heckman, Nancy E.; Kirkpatrick, Mark; Stinchcombe, J. R.; Schmitt, Johanna; Kingsolver, Joel; Marron, J. S.
Affiliations: MITRE Corporation; University of British Columbia; University of Texas System; University of Texas Austin; University of Toronto; University of California System; University of California Davis; University of North Carolina; University of North Carolina Chapel Hill
Abstract: Principal Components Analysis (PCA) is a common way to study the sources of variation in a high-dimensional data set. Typically, the leading principal components are used to understand the variation in the data or to reduce the dimension of the data for subsequent analysis. The remaining principal components are ignored since they explain little of the variation in the data. However, evolutionary biologists gain important insights from these low variation directions. Specifically, they are int...
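The contrast between leading and trailing principal components can be illustrated with a minimal NumPy sketch. This is generic PCA on simulated data, not the authors' method; the three hypothetical traits are constructed so that the first two are nearly collinear, which makes their difference a direction of very low variation.

```python
import numpy as np

def principal_components(X):
    """Eigenvalues (descending) and eigenvectors (as columns) of the sample covariance of X."""
    cov = np.cov(X, rowvar=False)            # np.cov centers each column internally
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = eigvals.argsort()[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical data: traits 1 and 2 share a latent factor z; trait 3 is independent.
rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal(n)
X = np.column_stack([
    z + 0.05 * rng.standard_normal(n),
    z + 0.05 * rng.standard_normal(n),
    rng.standard_normal(n),
])

vals, vecs = principal_components(X)
leading = vecs[:, 0]   # most variation, close to (1, 1, 0) / sqrt(2) up to sign
least = vecs[:, -1]    # least variation, close to (1, -1, 0) / sqrt(2) up to sign
print("eigenvalues:", np.round(vals, 4))
print("least-variation direction:", np.round(least, 3))
```

A dimension-reduction analysis would discard `least`; the abstract's point is that such low-variance directions can themselves carry the information of interest.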
-
Authors: Stein, Michael L.; Chen, Jie; Anitescu, Mihai
Affiliations: University of Chicago; United States Department of Energy (DOE); Argonne National Laboratory
Abstract: We discuss the statistical properties of a recently introduced unbiased stochastic approximation to the score equations for maximum likelihood calculation for Gaussian processes. Under certain conditions, including bounded condition number of the covariance matrix, the approach achieves O(n) storage and nearly O(n) computational effort per optimization step, where n is the number of data sites. Here, we prove that if the condition number of the covariance matrix is bounded, then the approximat...
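For a zero-mean Gaussian process with covariance K(θ), the score for θ is 0.5 yᵀK⁻¹(∂K/∂θ)K⁻¹y − 0.5 tr(K⁻¹ ∂K/∂θ). An unbiased stochastic version can be obtained by replacing the trace with an average of uᵀK⁻¹(∂K/∂θ)u over random vectors u satisfying E[uuᵀ] = I. The sketch below assumes Rademacher (±1) probes in the style of Hutchinson's trace estimator and a hypothetical exponential covariance with a nugget. It is an illustration of the idea under these assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical covariance model and helper names, chosen for illustration only.

def exp_cov(x, theta, nugget=0.1):
    """Exponential covariance exp(-|x_i - x_j| / theta) plus a nugget on the diagonal."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-d / theta) + nugget * np.eye(len(x))

def exp_cov_grad(x, theta):
    """Derivative of exp_cov with respect to theta (the nugget does not depend on theta)."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-d / theta) * d / theta**2

def score_exact(y, K, dK):
    """0.5 * y'K^{-1} dK K^{-1} y - 0.5 * tr(K^{-1} dK)."""
    Kinv_y = np.linalg.solve(K, y)
    return 0.5 * Kinv_y @ dK @ Kinv_y - 0.5 * np.trace(np.linalg.solve(K, dK))

def score_stochastic(y, K, dK, n_probes, rng):
    """Same score, with the trace replaced by an unbiased Rademacher-probe average."""
    Kinv_y = np.linalg.solve(K, y)
    U = rng.choice([-1.0, 1.0], size=(K.shape[0], n_probes))
    KinvU = np.linalg.solve(K, U)
    # Column j gives u_j' K^{-1} dK u_j, using the symmetry of K.
    trace_est = np.mean(np.sum(KinvU * (dK @ U), axis=0))
    return 0.5 * Kinv_y @ dK @ Kinv_y - 0.5 * trace_est

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 50))
theta = 1.5
K = exp_cov(x, theta)
dK = exp_cov_grad(x, theta)
y = np.linalg.cholesky(K) @ rng.standard_normal(len(x))   # one simulated realization
n_probes = 20000
exact = score_exact(y, K, dK)
approx = score_stochastic(y, K, dK, n_probes, rng)
print(exact, approx)
```

The dense `np.linalg.solve` calls are used only for clarity. Because each probe needs just solves with K and products with ∂K/∂θ, a scalable version would presumably substitute an iterative solver, which is where storage savings of the kind described in the abstract would come from.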
-
Authors: Crossett, Andrew; Lee, Ann B.; Klei, Lambertus; Devlin, Bernie; Roeder, Kathryn
Affiliations: Pennsylvania State System of Higher Education (PASSHE); West Chester University of Pennsylvania; Carnegie Mellon University; Pennsylvania Commonwealth System of Higher Education (PCSHE); University of Pittsburgh
Abstract: Recent technological advances coupled with large sample sets have uncovered many factors underlying the genetic basis of traits and the predisposition to complex disease, but much is left to discover. A common thread to most genetic investigations is familial relationships. Close relatives can be identified from family records, and more distant relatives can be inferred from large panels of genetic markers. Unfortunately these empirical estimates can be noisy, especially regarding distant rel...
-
Authors: Konomi, Bledar A.; Dhavala, Soma S.; Huang, Jianhua Z.; Kundu, Subrata; Huitink, David; Liang, Hong; Ding, Yu; Mallick, Bani K.
Affiliations: Texas A&M University System; Texas A&M University College Station
Abstract: The properties of materials synthesized with nanoparticles (NPs) are highly correlated to the sizes and shapes of the nanoparticles. The transmission electron microscopy (TEM) imaging technique can be used to measure the morphological characteristics of NPs, which can be simple circles or more complex irregular polygons with varying degrees of scales and sizes. A major difficulty in analyzing the TEM images is the overlapping of objects, having different morphological properties with no specif...
-
Authors: Rusch, Thomas; Hofmarcher, Paul; Hatzinger, Reinhold; Hornik, Kurt
Affiliations: Vienna University of Economics & Business; Johannes Kepler University Linz
Abstract: The WikiLeaks Afghanistan war logs contain nearly 77,000 reports of incidents in the US-led Afghanistan war, covering the period from January 2004 to December 2009. The recent growth of data on complex social systems and the potential to derive stories from them has shifted the focus of journalistic and scientific attention increasingly toward data-driven journalism and computational social science. In this paper we advocate the usage of modern statistical methods for problems of data journali...