-
Authors: Chen, Jianmin; Aseltine, Robert H.; Wang, Fei; Chen, Kun
Affiliations: University of Connecticut; Cornell University; Weill Cornell Medicine
Abstract: Statistical learning with a large number of rare binary features is commonly encountered in analyzing electronic health records (EHR) data, especially in the modeling of disease onset with prior medical diagnoses and procedures. Dealing with the resulting highly sparse and large-scale binary feature matrix is notoriously challenging as conventional methods may suffer from a lack of power in testing and inconsistency in model fitting, while machine learning methods may suffer from the inability...
-
Authors: Bormetti, Giacomo
Affiliations: University of Bologna
-
Authors: Zheng, Lili; Allen, Genevera I.
Affiliations: Rice University; Baylor College of Medicine; Baylor College Medical Hospital
Abstract: In this article, we investigate the Gaussian graphical model inference problem in a novel setting that we call erose measurements, referring to irregularly measured or observed data. For graphs, this results in different node pairs having vastly different sample sizes, which frequently arises in data integration, genomics, neuroscience, and sensor networks. Existing works characterize the graph selection performance using the minimum pairwise sample size, which provides little insight for eros...
-
Authors: Duan, Leo L.; Roy, Arkaprava
Affiliations: State University System of Florida; University of Florida
Abstract: Spectral clustering views the similarity matrix as a weighted graph, and partitions the data by minimizing a graph-cut loss. Since it minimizes the across-cluster similarity, there is no need to model the distribution within each cluster. As a result, one reduces the chance of model misspecification, which is often a risk in mixture model-based clustering. Nevertheless, compared to the latter, spectral clustering has no direct way of quantifying the clustering uncertainty (such as the assignm...
-
Authors: Guo, Zijian
Affiliations: Rutgers University System; Rutgers University New Brunswick
Abstract: Integrative analysis of data from multiple sources is critical to making generalizable discoveries. Associations consistently observed across multiple source populations are more likely to be generalized to target populations with possible distributional shifts. In this article, we model the heterogeneous multi-source data with multiple high-dimensional regressions and make inferences for the maximin effect (Meinshausen and Bühlmann, AoS, 43(4), 1801-1830). The maximin effect provides a...
-
Authors: He, Yi
Affiliations: University of Amsterdam
Abstract: This article establishes a comprehensive theory of the optimality, robustness, and cross-validation selection consistency for the ridge regression under factor-augmented models with possibly dense idiosyncratic information. Using spectral analysis for random matrices, we show that the ridge regression is asymptotically efficient in capturing both factor and idiosyncratic information by minimizing the limiting predictive loss among the entire class of spectral regularized estimators under large...
-
Authors: Westling, Ted; Luedtke, Alex; Gilbert, Peter B.; Carone, Marco
Affiliations: University of Massachusetts System; University of Massachusetts Amherst; University of Washington; University of Washington Seattle; Fred Hutchinson Cancer Center
Abstract: In the absence of data from a randomized trial, researchers may aim to use observational data to draw causal inference about the effect of a treatment on a time-to-event outcome. In this context, interest often focuses on the treatment-specific survival curves, that is, the survival curves were the population under study to be assigned to receive the treatment or not. Under certain conditions, including that all confounders of the treatment-outcome relationship are observed, the treatment-spec...
-
Authors: Katzfuss, Matthias; Schafer, Florian
Affiliations: Texas A&M University System; Texas A&M University College Station; University System of Georgia; Georgia Institute of Technology
Abstract: A multivariate distribution can be described by a triangular transport map from the target distribution to a simple reference distribution. We propose Bayesian nonparametric inference on the transport map by modeling its components using Gaussian processes. This enables regularization and uncertainty quantification of the map estimation, while resulting in a closed-form and invertible posterior map. We then focus on inferring the distribution of a nonstationary spatial field from a small numbe...
-
Authors: Mao, Huiying; Martin, Ryan; Reich, Brian J.
Affiliations: North Carolina State University
Abstract: Predicting the response at an unobserved location is a fundamental problem in spatial statistics. Given the difficulty in modeling spatial dependence, especially in nonstationary cases, model-based prediction intervals are at risk of misspecification bias that can negatively affect their validity. Here we present a new approach for model-free nonparametric spatial prediction based on the conformal prediction machinery. Our key observation is that spatial data can be treated as exactly or appro...
-
Authors: Wu, Xiao; Mealli, Fabrizia; Kioumourtzoglou, Marianthi-Anna; Dominici, Francesca; Braun, Danielle
Affiliations: Columbia University; University of Florence; European University Institute; Harvard University; Harvard T.H. Chan School of Public Health; Harvard University Medical Affiliates; Dana-Farber Cancer Institute
Abstract: In the context of a binary treatment, matching is a well-established approach in causal inference. However, in the context of a continuous treatment or exposure, matching is still underdeveloped. We propose an innovative matching approach to estimate an average causal exposure-response function under the setting of continuous exposures that relies on the generalized propensity score (GPS). Our approach maintains the following attractive features of matching: a) clear separation between the des...