-
Authors: Gu, Zhiling; Yu, Shan; Wang, Guannan; Wang, Lily
Affiliations: Yale University; University of Virginia; George Mason University
Abstract: Generative artificial intelligence (AI) has transformed the biomedical imaging field through image synthesis, addressing challenges of data availability, privacy, and diversity in biomedical research. This article proposes a novel nonparametric method within the functional data framework to discern significant differences between the mean and covariance functions of original and synthetic biomedical imaging data, thereby enhancing the fidelity and utility of synthetic data. Focusing on surface...
-
Authors: Choi, Jungjun; Kwon, Hyukjun; Liao, Yuan
Affiliations: University of Rhode Island; Princeton University; Rutgers University System; Rutgers University New Brunswick
Abstract: This article studies inference about linear functionals of high-dimensional low-rank matrices. While most existing inference methods require consistent estimation of the true rank, our procedure is robust to rank misspecification, making it a promising approach in applications where rank estimation can be unreliable. We estimate the low-rank spaces using pre-specified weighting matrices, known as diversified projections. A novel statistical insight is that, unlike the usual statistic...
-
Authors: Perry, Ronan; Panigrahi, Snigdha; Bien, Jacob; Witten, Daniela
Affiliations: University of Washington; University of Washington Seattle; University of Michigan System; University of Michigan; University of Southern California; University of Washington; University of Washington Seattle
Abstract: Principal component analysis (PCA) is a longstanding approach for dimension reduction. It rests upon the assumption that the underlying signal has low rank, and thus can be well summarized using a small number of dimensions. The output of PCA is typically represented using a scree plot, which displays the proportion of variance explained (PVE) by each principal component. While the PVE is extensively reported in routine analyses, to the best of our knowledge the notion of inference on the PVE ...
-
Authors: Kang, Seungwoo; Oh, Hee-Seok
Affiliations: Seoul National University (SNU); Seoul National University (SNU)
Abstract: A new measure, L-1 centrality, is proposed to assess the centrality of vertices in an undirected and connected graph. The proposed measure can adequately handle graphs with weights assigned to vertices and edges. This study provides tools for graphical and multiscale analysis based on the L-1 centrality. Specifically, the suggested analysis tools include the target plot, L-1 centrality-based neighborhood, and local L-1 centrality. Most importantly, our work is closely associated with the conce...
-
Authors: Chen, Elynn; Chen, Xi; Jing, Wenbo; Zhang, Yichen
Affiliations: New York University; Purdue University System; Purdue University
Abstract: As tensors become widespread in modern data analysis, Tucker low-rank Principal Component Analysis (PCA) has become essential for dimensionality reduction and structural discovery in tensor datasets. Motivated by the common scenario where large-scale tensors are distributed across diverse geographic locations, this article investigates tensor PCA within a distributed framework where direct data pooling is theoretically suboptimal or practically infeasible. We offer a comprehensive analysis of ...
-
Authors: Li, Sai; Zhang, Linjun
Affiliations: Renmin University of China; Rutgers University System; Rutgers University New Brunswick
Abstract: In conventional statistical and machine learning methods, it is typically assumed that the test data are identically distributed with the training data. However, this assumption does not always hold, especially in applications where the target population is not well represented in the training data. This is a notable issue in health-related studies, where specific ethnic populations may be underrepresented, posing a significant challenge for researchers aiming to make statistical inferences a...
-
Authors: Han, Larry; Hou, Jue; Cho, Kelly; Duan, Rui; Cai, Tianxi
Affiliations: Harvard University; Northeastern University; University of Minnesota System; University of Minnesota Twin Cities; US Department of Veterans Affairs; Harvard University; Harvard Medical School
Abstract: Federated learning of causal estimands may greatly improve estimation efficiency by leveraging data from multiple study sites, but robustness to heterogeneity and model misspecifications is vital for ensuring validity. We develop a Federated Adaptive Causal Estimation (FACE) framework to incorporate heterogeneous data from multiple sites to provide treatment effect estimation and inference for a flexibly specified target population of interest. FACE accounts for site-level heterogeneity in the...
-
Authors: Shao, Meijia; Xia, Dong; Zhang, Yuan
Affiliations: Hong Kong University of Science & Technology; University System of Ohio; Ohio State University
Abstract: U-statistics play central roles in many statistical learning tools but face the haunting issue of scalability. Despite extensive research on accelerating computation by U-statistic reduction, existing results have focused almost exclusively on power analysis. Little work addresses risk control accuracy, which requires distinct and much more challenging techniques. In this article, we establish the first statistical inference procedure with provably higher-order accurate risk control for incomplete ...
-
Authors: Zhang, Shuangjie; Shen, Yuning; Chen, Irene A.; Lee, Juhee
Affiliations: University of California System; University of California Santa Cruz; University of California System; University of California Los Angeles
Abstract: Group factor models have been developed to infer relationships between multiple co-occurring multivariate continuous responses. Motivated by complex count data from multi-domain microbiome studies using next-generation sequencing, we develop a sparse Bayesian group factor model (Sp-BGFM) for multiple count table data that captures the interaction between microorganisms in different domains. Sp-BGFM uses a rounded kernel mixture model with a Dirichlet process (DP) prior with log-normal mixture...
-
Authors: Qiu, Yixuan; Gao, Qingyi; Wang, Xiao
Affiliations: Shanghai University of Finance & Economics; Purdue University System; Purdue University
Abstract: Generative models based on latent variables, such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), have attracted considerable interest due to their impressive performance in many fields. However, many kinds of data, such as natural images, usually do not populate the ambient Euclidean space but instead reside on a lower-dimensional manifold. Thus an inappropriate choice of the latent dimension fails to uncover the structure of the data, possibly resulting in a mismatch of latent re...