-
Authors: Chang, Andersen; Allen, Genevera I.
Affiliations: Rice University
Abstract: With modern calcium imaging technology, activities of thousands of neurons can be recorded in vivo. These experiments can potentially provide new insights into intrinsic functional neuronal connectivity, defined as contemporaneous correlations between neuronal activities. As a common tool for estimating conditional dependencies in high-dimensional settings, graphical models are a natural choice for estimating functional connectivity networks. However, raw neuronal activity data presents a uniq...
-
Authors: Fan, Yimei; Liao, Yuan; Ryzhov, Ilya O.; Zhang, Kunpeng
Affiliations: University System of Maryland; University of Maryland College Park; Rutgers University System; Rutgers University New Brunswick; Rutgers University Newark
Abstract: In many applications of business and marketing analytics, predictive models are fit using hierarchically structured data: common characteristics of products, customers, or web pages are represented as categorical variables, and each category can be split up into multiple subcategories at a lower level of the hierarchy. The model may thus contain hundreds of thousands of binary variables, necessitating the use of variable selection to screen out large numbers of irrelevant or insignificant feat...
-
Authors: Rhodes, Grace; Davidian, Marie; Lu, Wenbin
Affiliations: North Carolina State University
Abstract: Sepsis, a complex medical condition that involves severe infections with life-threatening organ dysfunction, is a leading cause of death worldwide. Treatment of sepsis is highly challenging. When making treatment decisions, clinicians and patients desire accurate predictions of mean residual life (MRL) that leverage all available patient information, including longitudinal biomarker data. Biomarkers are biological, clinical, and other variables reflecting disease progression that are often mea...
-
Authors: Dai, Ben; Shen, Xiaotong; Chen, Lin Yee; Li, Chunlin; Pan, Wei
Affiliations: Chinese University of Hong Kong; University of Minnesota System; University of Minnesota Twin Cities
Abstract: In explainable artificial intelligence, discriminative feature localization is critical to reveal a black-box model's decision-making process from raw data to prediction. In this article, we use two real datasets, the MNIST handwritten digits and MIT-BIH electrocardiogram (ECG) signals, to motivate key characteristics of discriminative features, namely, adaptiveness, predictive importance, and effectiveness. Then we develop a localization framework, based on adversarial attacks, to effectively l...
-
Authors: De Iorio, Maria; Favaro, Stefano; Guglielmi, Alessandra; Ye, Lifeng
Affiliations: National University of Singapore; University of London; University College London
Abstract: The study of temporal dynamics of gender and ethnic stereotypes is an important topic in many disciplines at the intersection between statistics and social sciences. In this paper, we make use of word embeddings, a common tool in natural language processing, and of Bayesian nonparametric mixture modeling for the analysis of temporal dynamics of gender stereotypes in adjectives and occupation over the 20th and 21st centuries in the United States. Our Bayesian nonparametric approach relies on a no...
-
Authors: Huang, Theodore; Ploenzke, Matthew; Braun, Danielle
Affiliations: Harvard University; Harvard T.H. Chan School of Public Health
Abstract: Pedigree data contain family history information that is used to analyze hereditary diseases. These clinical data sets may contain duplicate records due to the same family visiting a clinic multiple times or a clinician entering multiple versions of the family for testing purposes. Inferences drawn from the data or using them for training or validation without removing the duplicates could lead to invalid conclusions, and hence identifying the duplicates is essential. Since family structures c...
-
Authors: Zhang, Hong; Liu, Ming; Jin, Jiashun; Wu, Zheyang
Affiliations: Pfizer; Pfizer USA; Worcester Polytechnic Institute; Carnegie Mellon University
Abstract: The SNP-set analysis is a powerful tool for dissecting the genetics of complex human diseases. There are three fundamental genetic association approaches to SNP-set analysis: the marginal model fitting approach, the joint model fitting approach, and the decorrelation approach. A problem of primary interest is how these approaches compare with each other. To address this problem, we develop a theoretical platform to compare the signal-to-noise ratio (SNR) of these approaches under the generalize...
-
Authors: Kim, Yura; Kessler, Daniel; Levina, Elizaveta
Affiliations: University of Michigan System; University of Michigan
Abstract: Functional connections in the brain are frequently represented by weighted networks, with nodes representing locations in the brain and edges representing the strength of connectivity between these locations. One challenge in analyzing such data is that inference at the individual edge level is not particularly biologically meaningful; interpretation is more useful at the level of so-called functional systems or groups of nodes and connections between them; this is often called graph-aware inf...
-
Authors: Näf, Jeffrey; Spohn, Meta-Lina; Michel, Loris; Meinshausen, Nicolai
Affiliations: Swiss Federal Institutes of Technology Domain; ETH Zurich
Abstract: Given the prevalence of missing data in modern statistical research, a broad range of methods is available for any given imputation task. How does one choose the best imputation method in a given application? The standard approach is to select some observations, set their status to missing, and compare the prediction accuracy of the methods under consideration on these observations. Besides having to somewhat artificially mask observations, a shortcoming of this approach is that imputations based ...
-
Authors: Nguyen, Thi Kim Hue; van den Berge, Koen; Chiogna, Monica; Risso, Davide
Affiliations: University of Padua; Ghent University; University of Bologna
Abstract: The problem of estimating the structure of a graph from observed data is of growing interest in the context of high-throughput genomic data and single-cell RNA sequencing in particular. These, however, are challenging applications, since the data consist of high-dimensional counts with high variance and overabundance of zeros. Here we present a general framework for learning the structure of a graph from single-cell RNA-seq data, based on the zero-inflated negative binomial distribution. We de...