-
Authors: Balakrishnan, Sivaraman; Wasserman, Larry
Affiliation: Carnegie Mellon University
Abstract: The statistical analysis of discrete data has been the subject of extensive statistical research dating back to the work of Pearson. In this survey we review some recently developed methods for testing hypotheses about high-dimensional multinomials. Traditional tests like the χ² test and the likelihood ratio test can have poor power in the high-dimensional setting. Much of the research in this area has focused on finding tests with asymptotically normal limits and developing (stringent) co...
-
Author: Meng, Xiao-Li
Affiliation: Harvard University
Abstract: Statisticians are increasingly posed with thought-provoking and even paradoxical questions, challenging our qualifications for entering the statistical paradises created by Big Data. By developing measures for data quality, this article suggests a framework to address such a question: Which one should I trust more: a 1% survey with 60% response rate or a self-reported administrative dataset covering 80% of the population? A five-element Euler-formula-like identity shows that for any dataset of siz...
-
Authors: Zhang, Yilin; Poux-Berthe, Marie; Wells, Chris; Koc-Michalska, Karolina; Rohe, Karl
Affiliations: University of Wisconsin System; University of Wisconsin Madison; Audencia; Boston University
Abstract: We propose a graph contextualization method, pairGraphText, to study political engagement on Facebook during the 2012 French presidential election. It is a spectral algorithm that contextualizes graph data with text data for online discussion threads. In particular, we examine the Facebook posts of the eight leading candidates and the comments beneath these posts. We find evidence of both (i) candidate-centered structure, where citizens primarily comment on the wall of one candidate and (ii) is...
-
Authors: Shin, Yei Eun; Ding, Yu; Huang, Jianhua Z.
Affiliations: Texas A&M University System; Texas A&M University College Station; National Institutes of Health (NIH) - USA; NIH National Cancer Institute (NCI); NIH National Cancer Institute - Division of Cancer Epidemiology & Genetics
Abstract: In the wind industry, engineers perform retrofitting upgrades on in-service wind turbines to improve their power production capabilities. Considering how costly an upgrade can be, people often wonder about the upgrade effect: whether it indeed improves turbine performance, and if so, by how much. One cannot simply compare power outputs to assess a turbine's improvement, as wind power generation is affected by an array of environmental covariates, including wind spee...
-
Authors: Donnat, Claire; Holmes, Susan
Affiliation: Stanford University
Abstract: From longitudinal biomedical studies to social networks, graphs have emerged as essential objects for describing evolving interactions between agents in complex systems. In such studies, after pre-processing, the data are encoded by a set of graphs, each representing a system's state at a different point in time or space. The analysis of the system's dynamics depends on the selection of the appropriate analytical tools. In particular, after specifying properties characterizing similarities bet...
-
Authors: Snoke, Joshua; Brick, Timothy R.; Slavkovic, Aleksandra; Hunter, Michael D.
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; University of Oklahoma System; University of Oklahoma Health Sciences Center
Abstract: This paper focuses on the privacy paradigm of giving researchers remote access to carry out analyses on sensitive data stored behind separate firewalls. We address the situation where the analysis demands data from multiple physically separate databases which cannot be combined. Motivating this work is a real model based on research data on kinship foster placement that came from multiple sources and could only be combined through a lengthy process with a trusted research network. We d...