-
Authors: Zhou, Doudou; Liu, Molei; Li, Mengyan; Cai, Tianxi
Affiliations: Harvard University; Harvard T.H. Chan School of Public Health; National University of Singapore; Columbia University; Bentley University; Harvard University; Harvard Medical School
Abstract: Transfer learning is crucial for training models that generalize to unlabeled target populations using labeled source data, especially in real-world studies where label scarcity and covariate shift are common. While most research focuses on model estimation, there is limited literature on transfer inference for model accuracy despite its importance. We introduce a novel Doubly Robust Augmented Model Accuracy Transfer InferenCe (DRAMATIC) method for point and interval estimation of commonly us...
-
Authors: Joseph, V. Roshan
Affiliations: University System of Georgia; Georgia Institute of Technology
Abstract: This article proposes a new kriging that has a rational form. It is shown that the generalized least squares estimator of the mean from rational kriging is much more well behaved than that of ordinary kriging. Parameter estimation and uncertainty quantification for rational kriging are proposed using a Gaussian process framework. A generalized version of rational kriging is also proposed, which includes ordinary and rational kriging as special cases. Extensive simulations carried out over a wi...
-
Authors: Wang, Tengyao; Dobriban, Edgar; Gataric, Milana; Samworth, Richard J.
Affiliations: University of London; London School Economics & Political Science; University of Pennsylvania; University of Cambridge; CRUK Cambridge Institute; Cancer Research UK
Abstract: We propose a new method for high-dimensional semi-supervised learning problems based on the careful aggregation of the results of a low-dimensional procedure applied to many axis-aligned random projections of the data. Our primary goal is to identify important variables for distinguishing between the classes; existing low-dimensional methods can then be applied for final class assignment. To this end, we score projections according to their class-distinguishing ability; for instance, motivated...
-
Authors: Yang, Yachong; Kuchibhotla, Arun Kumar
Affiliations: University of Pennsylvania; Carnegie Mellon University
Abstract: Conformal prediction is a generic methodology for finite-sample valid distribution-free prediction. This technique has garnered a lot of attention in the literature partly because it can be applied with any machine learning algorithm that provides point predictions to yield valid prediction regions. Of course, the efficiency (width/volume) of the resulting prediction region depends on the performance of the machine learning algorithm. In the context of point prediction, several techniques (suc...
-
Authors: Yuan, Yubai; Zhang, Yijiao; Shahbaba, Babak; Fortin, Norbert; Cooper, Keiland; Nie, Qing; Qu, Annie
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; Fudan University; University of California System; University of California Irvine; University of California System; University of California Irvine; University of California System; University of California Irvine; University of California System; University of California Santa Barbara
Abstract: Detecting dynamic patterns shared across heterogeneous datasets is a critical yet challenging task in many scientific domains, particularly within the biomedical sciences. Systematic heterogeneity inherent in diverse data sources can significantly hinder the effectiveness of existing machine learning methods in uncovering shared underlying dynamics. Additionally, practical and technical constraints in real-world experimental designs often limit data collection to only a small number of subject...
-
Authors: Sun, Dayu; Sun, Zhuowei; Zhao, Xingqiu; Cao, Hongyuan
Affiliations: Indiana University System; Indiana University Bloomington; Jilin University; Dalian Medical University; Hong Kong Polytechnic University; State University System of Florida; Florida State University
Abstract: We study the transformed hazards model with time-dependent covariates observed intermittently for the censored outcome. Existing work assumes the availability of the whole trajectory of the time-dependent covariates, which is unrealistic. We propose combining kernel-weighted log-likelihood and sieve maximum log-likelihood estimation to conduct statistical inference. The method is robust and easy to implement. We establish the asymptotic properties of the proposed estimator and contribute to a ...
-
Authors: Lyu, Zhongyuan; Chen, Ling; Gu, Yuqi
Affiliations: Columbia University; Columbia University
Abstract: The latent class model is a widely used mixture model for multivariate discrete data. Besides the existence of qualitatively heterogeneous latent classes, real data often exhibit additional quantitative heterogeneity nested within each latent class. The modern latent class analysis also faces extra challenges, including the high-dimensionality, sparsity, and heteroscedastic noise inherent in discrete data. Motivated by these phenomena, we introduce the Degree-heterogeneous Latent Class Model a...
-
Authors: Bu, Qiushi; Liang, Hua; Zhang, Xinyu; Zou, Jiahui
Affiliations: Chinese Academy of Sciences; Academy of Mathematics & System Sciences, CAS; Chinese Academy of Sciences; University of Chinese Academy of Sciences, CAS; George Washington University; Chinese Academy of Sciences; University of Science & Technology of China, CAS; Capital University of Economics & Business
Abstract: Tensors have broad applications in neuroimaging, data mining, digital marketing, etc. CANDECOMP/PARAFAC (CP) tensor decomposition can effectively reduce the number of parameters to gain dimensionality-reduction and thus plays a key role in tensor regression. However, in CP decomposition, there is uncertainty about which rank to use. In this article, we develop a model averaging method to handle this uncertainty by weighting the estimators from candidate tensor regression models with different ...
-
Authors: Zhai, Qingqing; Ye, Zhisheng; Li, Cheng; Revie, Matthew; Dunson, David
Affiliations: Shanghai University; National University of Singapore; National University of Singapore; University of Strathclyde; Duke University
Abstract: Many lifeline infrastructure systems consist of thousands of components configured in a complex directed network. Disruption of the infrastructure constitutes a recurrent failure process over a directed network. Statistical inference for such network recurrence data is challenging because of the large number of nodes with irregular connections among them. Motivated by 16 years of Scottish Water operation records, we propose a network Gamma-Poisson Autoregressive NHPP (GPAN) model for recurrent...
-
Authors: Xia, Xintao; Zhang, Linjun; Cai, Zhanrui
Affiliations: Iowa State University; Rutgers University System; Rutgers University New Brunswick; University of Hong Kong
Abstract: Privacy preservation has become a critical concern in high-dimensional data analysis due to the growing prevalence of data-driven applications. Since its proposal, sliced inverse regression has emerged as a widely used statistical technique to reduce the dimensionality of covariates while maintaining sufficient statistical information. In this paper, we propose optimally differentially private algorithms specifically designed to address privacy concerns in the context of sufficient dimension re...