-
Authors: Qiu, Yixuan; Wang, Xiao
Affiliations: Carnegie Mellon University; Purdue University System; Purdue University
Abstract: Latent variable models cover a broad range of statistical and machine learning models, such as Bayesian models, linear mixed models, and Gaussian mixture models. Existing methods often suffer from two major challenges in practice: (a) a proper latent variable distribution is difficult to specify; (b) exact likelihood inference is formidable because of intractable computation. We propose a novel framework for the inference of latent variable models that overcomes these two limita...
-
Authors: Lin, Kevin Z.; Lei, Jing; Roeder, Kathryn
Affiliations: University of Pennsylvania; Carnegie Mellon University
Abstract: Scientists often embed cells into a lower-dimensional space when studying single-cell RNA-seq data for improved downstream analyses such as developmental trajectory analyses, but the statistical properties of such nonlinear embedding methods are often not well understood. In this article, we develop the exponential-family SVD (eSVD), a nonlinear embedding method for both cells and genes jointly with respect to a random dot product model using exponential-family distributions. Our estimator use...
-
Authors: Mohan, Karthika; Pearl, Judea
Affiliations: University of California System; University of California Berkeley; University of California System; University of California Los Angeles
Abstract: This article reviews recent advances in missing data research using graphical models to represent multivariate dependencies. We first examine the limitations of traditional frameworks from three different perspectives: transparency, estimability, and testability. We then show how procedures based on graphical models can overcome these limitations and provide meaningful performance guarantees even when data are missing not at random (MNAR). In particular, we identify conditions that guarantee c...
-
Authors: Eckles, Dean; Bakshy, Eytan
Affiliations: Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT); Facebook Inc
Abstract: Peer effects, in which an individual's behavior is affected by peers' behavior, are posited by multiple theories in the social sciences. Randomized field experiments that identify peer effects, however, are often expensive or infeasible, so many studies of peer effects use observational data, which is expected to suffer from confounding. Here we show, in the context of information and media diffusion, that high-dimensional adjustment of a nonexperimental control group (660 million observations...
-
Authors: Xie, Fangzheng; Xu, Yanxun
Affiliations: Johns Hopkins University
Abstract: We develop a Bayesian approach called the Bayesian projected calibration to address the problem of calibrating an imperfect computer model using observational data from an unknown complex physical system. The calibration parameter and the physical system are parameterized in an identifiable fashion via the L2-projection. A Gaussian process prior distribution is imposed on the physical system, which naturally induces a prior distribution on the calibration parameter through the L2-projection con...
-
Authors: Delaigle, Aurore; Hall, Peter; Huang, Wei; Kneip, Alois
Affiliations: University of Melbourne; University of Melbourne; University of Bonn; University of Bonn
Abstract: We consider the problem of estimating the covariance function of functional data which are only observed on a subset of their domain, such as fragments observed on small intervals or related types of functional data. We focus on situations where the data allow the empirical covariance function, or smooth versions of it, to be computed only on a subset of its domain which contains a diagonal band. We show that estimating the covariance function consistently outside that subset is possible as long as...
-
Authors: Jiang, Bei; Raftery, Adrian E.; Steele, Russell J.; Wang, Naisyin
Affiliations: University of Alberta; University of Washington; University of Washington Seattle; McGill University; University of Michigan System; University of Michigan
Abstract: There is a growing expectation that data collected by government-funded studies should be openly available to ensure research reproducibility, which also increases concerns about data privacy. A strategy to protect individuals' identity is to release multiply imputed (MI) synthetic datasets with masked sensitivity values. However, information loss or incorrectly specified imputation models can weaken or invalidate the inferences obtained from the MI-datasets. We propose a new masking framework...
-
Authors: Qiu, Hongxiang; Carone, Marco; Sadikova, Ekaterina; Petukhova, Maria; Kessler, Ronald C.; Luedtke, Alex
Affiliations: University of Washington; University of Washington Seattle; Harvard University; Harvard Medical School; University of Washington; University of Washington Seattle
-
Authors: Sarkar, Abhra; Pati, Debdeep; Mallick, Bani K.; Carroll, Raymond J.
Affiliations: University of Texas System; University of Texas Austin; Texas A&M University System; Texas A&M University College Station; University of Technology Sydney
Abstract: Estimating the marginal and joint densities of the long-term average intakes of different dietary components is an important problem in nutritional epidemiology. Since these variables cannot be directly measured, data are usually collected in the form of 24-hr recalls of the intakes, which show marked patterns of conditional heteroscedasticity. Significantly compounding the challenges, the recalls for episodically consumed dietary components also include exact zeros. The problem of estimating ...
-
Authors: Hu, Jianwei; Zhang, Jingfei; Qin, Hong; Yan, Ting; Zhu, Ji
Affiliations: Central China Normal University; University of Miami; Zhongnan University of Economics & Law; University of Michigan System; University of Michigan
Abstract: The stochastic block model is widely used for detecting community structures in network data. Testing the goodness of fit of the model is a fundamental problem that has attracted growing interest in recent years. In this article, we propose a novel goodness-of-fit test based on the maximum entry of the centered and rescaled adjacency matrix for the stochastic block model. One noticeable advantage of the proposed test is that the number of communities can be allowed to grow linearly ...