-
Authors: Tian, Zhiyi; Xu, Jiaming; Tang, Jen
Affiliations: IQVIA; Duke University; Purdue University System; Purdue University
Abstract: Clustering is a widely used unsupervised learning technique that groups data into homogeneous clusters. However, when dealing with real-world data that contain categorical values, existing algorithms can be computationally costly in high dimensions and can struggle with noisy data that has missing values. Furthermore, except for one algorithm, no others provide theoretical guarantees of clustering accuracy. In this article, we propose a general categorical data encoding method and a computatio...
-
Authors: Zhou, Doudou; Zhang, Yufeng; Sonabend-W, Aaron; Wang, Zhaoran; Lu, Junwei; Cai, Tianxi
Affiliations: Harvard University; Harvard T.H. Chan School of Public Health; Northwestern University; Harvard University; Harvard Medical School
Abstract: Evidence-based or data-driven dynamic treatment regimes are essential for personalized medicine, which can benefit from offline reinforcement learning (RL). Although massive healthcare data are available across medical institutions, they are prohibited from sharing due to privacy constraints. Besides, heterogeneity exists in different sites. As a result, federated offline RL algorithms are necessary and promising to deal with the problems. In this article, we propose a multi-site Markov decisi...
-
Authors: Zhang, Jingnan; Wang, Junhui; Wang, Xueqin
Affiliations: Chinese Academy of Sciences; University of Science & Technology of China, CAS; Chinese University of Hong Kong
Abstract: Community detection in multi-layer networks, which aims at finding groups of nodes with similar connective patterns among all layers, has attracted tremendous interest in multi-layer network analysis. Most existing methods are extended from those for single-layer networks, which assume that different layers are independent. In this article, we propose a novel community detection method in multi-layer networks with inter-layer dependence, which integrates the stochastic block model (SBM) and t...
-
Authors: Wu, Xiaoyang; Huo, Yuyang; Ren, Haojie; Zou, Changliang
Affiliations: Nankai University; Shanghai Jiao Tong University
Abstract: In the big data era, subsampling or sub-data selection techniques are often adopted to extract a fraction of informative individuals from the massive data. Existing subsampling algorithms focus mainly on obtaining a representative subset to achieve the best estimation accuracy under a given class of models. In this article, we consider a semi-supervised setting wherein a small or moderate sized labeled data is available in addition to a much larger sized unlabeled data. The goal is to sample f...
-
Authors: Chen, Elynn Y.; Song, Rui; Jordan, Michael I.
Affiliations: New York University; Amazon.com; University of California System; University of California Berkeley
Abstract: Reinforcement Learning holds great promise for data-driven decision-making in various social contexts, including healthcare, education, and business. However, classical methods that focus on the mean of the total return may yield misleading results when dealing with heterogeneous populations typically found in large-scale datasets. To address this issue, we introduce the K-Value Heterogeneous Markov Decision Process, a framework designed to handle sequential decision problems with latent popul...
-
Authors: Li, Jingyi Jessica; Zhou, Heather J.; Bickel, Peter J.; Tong, Xin
Affiliations: University of California System; University of California Los Angeles; University of California System; University of California Berkeley; University of Southern California
Abstract: Motivated by the pressing needs for dissecting heterogeneous relationships in gene expression data, here we generalize the squared Pearson correlation to capture a mixture of linear dependences between two real-valued variables, with or without an index variable that specifies the line memberships. We construct the generalized Pearson correlation squares by focusing on three aspects: variable exchangeability, no parametric model assumptions, and inference of population-level parameters. To com...
-
Authors: Ben-Michael, Eli; Imai, Kosuke; Jiang, Zhichao
Affiliations: Carnegie Mellon University; Carnegie Mellon University; Harvard University; Sun Yat Sen University
Abstract: Data-driven decision making plays an important role even in high stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility function is more properly characterized by the joint set of potential outcomes under all actions. ...
-
Authors: Wang, Zihang; Gaynanova, Irina; Aravkin, Aleksandr; Risk, Benjamin B.
Affiliations: Emory University; University of Michigan System; University of Michigan; University of Washington; University of Washington Seattle
Abstract: Independent component analysis (ICA) is widely used to estimate spatial resting-state networks and their time courses in neuroimaging studies. It is thought that independent components correspond to sparse patterns of co-activating brain locations. Previous approaches for introducing sparsity to ICA replace the non-smooth objective function with smooth approximations, resulting in components that do not achieve exact zeros. We propose a novel Sparse ICA method that enables sparse estimation of...
-
Authors: Li, Haoran; Aue, Alexander; Paul, Debashis; Peng, Jie
Affiliations: Auburn University System; Auburn University; University of California System; University of California Davis
Abstract: We consider the problem of testing linear hypotheses under a multivariate regression model with a high-dimensional response and spiked noise covariance. The proposed family of tests consists of test statistics based on a weighted sum of projections of the data onto the estimated latent factor directions, with the weights acting as the regularization parameters. We establish asymptotic normality of the test statistics under the null hypothesis. We also establish the power characteristics of the...
-
Authors: Chen, Xi; Lai, Zehua; Li, He; Zhang, Yichen
Affiliations: New York University; University of Chicago; Purdue University System; Purdue University
Abstract: This article investigates the problem of online statistical inference of model parameters in stochastic optimization problems via the Kiefer-Wolfowitz algorithm with random search directions. We first present the asymptotic distribution for the Polyak-Ruppert-averaging type Kiefer-Wolfowitz (AKW) estimators, whose asymptotic covariance matrices depend on the distribution of search directions and the function-value query complexity. The distributional result reflects the tradeoff between statis...