-
Authors: Ward, Kes; Dilillo, Giuseppe; Eckley, Idris; Fearnhead, Paul
Affiliations: Lancaster University; Istituto Nazionale Astrofisica (INAF); Lancaster University
-
Authors: Ma, Ping; Chen, Yongkai; Lu, Haoran; Zhong, Wenxuan
Affiliations: University System of Georgia; University of Georgia
Abstract: With the rapid development of quantum computers, researchers have shown quantum advantages in physics-oriented problems. Quantum algorithms tackling computational biology problems are still lacking. In this article, we demonstrate the quantum advantage in analyzing CITE-seq data. CITE-seq, a single-cell technology, enables researchers to simultaneously measure expressions of RNA and surface protein detected by antibody-derived tags (ADTs) in the same cells. CITE-seq data hold tremendous potent...
-
Authors: Pensia, Ankit; Jog, Varun; Loh, Po-Ling
Affiliations: University of California System; University of California Berkeley; University of Cambridge
Abstract: We study the problem of linear regression where both covariates and responses are potentially (i) heavy-tailed and (ii) adversarially contaminated. Several computationally efficient estimators have been proposed for the simpler setting where the covariates are sub-Gaussian and uncontaminated; however, these estimators may fail when the covariates are either heavy-tailed or contain outliers. In this work, we show how to modify the Huber regression, least trimmed squares, and least absolute devi...
-
Authors: Cai, Tianxi; Li, Mengyan; Liu, Molei
Affiliations: Harvard University; Harvard T.H. Chan School of Public Health; Harvard University; Harvard Medical School; Bentley University; Columbia University
Abstract: In this work, we propose a Semi-supervised Triply Robust Inductive transFer LEarning (STRIFLE) approach, which integrates heterogeneous data from a label-rich source population and a label-scarce target population and uses a large amount of unlabeled data simultaneously to improve the learning accuracy in the target population. Specifically, we consider a high dimensional covariate shift setting and employ two nuisance models, a density ratio model and an imputation model, to combine transfer ...
-
Authors: Ma, Shujie; Niu, Po-Yao; Zhang, Yichong; Zhu, Yinchu
Affiliations: University of California System; University of California Riverside; Singapore Management University; Brandeis University
Abstract: This article investigates statistical inference for noisy matrix completion in a semi-supervised model when auxiliary covariates are available. The model consists of two parts. One part is a low-rank matrix induced by unobserved latent factors; the other part models the effects of the observed covariates through a coefficient matrix which is composed of high-dimensional column vectors. We model the observational pattern of the responses through a logistic regression of the covariates, and allo...
-
Authors: Chen, Canyi; Qiao, Nan; Zhu, Liping
Affiliations: University of Michigan System; University of Michigan; Renmin University of China; Renmin University of China
Abstract: This article concerns efficiently classifying high-dimensional data over decentralized networks. Penalized support vector machines (SVMs) are widely used for high-dimensional classification tasks. However, the double nonsmoothness of the objective function poses significant challenges in developing efficient decentralized learning methods. Existing approaches frequently suffer from slow, sublinear convergence rates. To address this issue, we consider a convolution-based smoothing technique for...
-
Authors: Wang, Yibo; Lee, Sunghee; Elliott, Michael R.
Affiliations: University of Michigan System; University of Michigan; University of Michigan System; University of Michigan
Abstract: Respondent-driven sampling (RDS) is widely used to collect data from hidden populations in social and biomedical science. Although RDS may provide comprehensive coverage of the target hidden population through social network recruitment, its nonrandom sampling process poses challenges for generalizing findings beyond the sample. Current analytical methods rely on the network size (degree) reported by respondents to adjust for unequal sampling probabilities. However, the accuracy of the reporte...
-
Authors: Cai, Leheng; Guo, Xu; Lian, Heng; Zhu, Liping
Affiliations: Tsinghua University; Beijing Normal University; City University of Hong Kong; Renmin University of China
Abstract: High-dimensional penalized rank regression is a powerful tool for modeling high-dimensional data due to its robustness and estimation efficiency. However, the non-smoothness of the rank loss brings great challenges to the computation. To solve this critical issue, high-dimensional convoluted rank regression has been recently proposed, introducing penalized convoluted rank regression estimators. However, these developed estimators cannot be directly used to make inference. In this article, we i...
-
Authors: Chi, Chien-Ming; Fan, Yingying; Ing, Ching-Kang; Lv, Jinchi
Affiliations: Academia Sinica - Taiwan; University of Southern California; National Tsing Hua University
Abstract: We make some initial attempt to establish the theoretical and methodological foundation for the model-X knockoffs inference for time series data. We suggest the method of time series knockoffs inference (TSKI) by exploiting the ideas of subsampling and e-values to address the difficulty caused by the serial dependence. We also generalize the robust knockoffs inference in Barber, Candès, and Samworth to the time series setting to relax the assumption of known covariate distribution req...
-
Authors: Kamath, Gautam; Mouzakis, Argyris; Regehr, Matthew; Singhal, Vikrant; Steinke, Thomas; Ullman, Jonathan
Affiliations: University of Waterloo; Alphabet Inc.; DeepMind; Northeastern University
Abstract: Differential privacy (DP) is a rigorous notion of data privacy, used for private statistics. The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. This tradeoff is inherent: we prove that no algorithm can simultaneously have low bias, low error, and...
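Illustrative note: the clip-then-add-noise recipe described in the abstract above can be sketched as follows. This is a minimal sketch and not the authors' code. The abstract does not say which noise distribution is used; Laplace noise (the standard Laplace mechanism) is assumed here, the sample size is treated as public, and the function name `clipped_private_mean` and its parameters are hypothetical. Floating-point Laplace sampling of this kind is for illustration only and is not suitable for production privacy guarantees.

```python
import random


def clipped_private_mean(samples, lower, upper, epsilon, rng=None):
    """Clip samples to [lower, upper], average them, then add Laplace noise.

    Assumption: Laplace noise calibrated to the sensitivity of the clipped
    mean; the abstract only says "add noise".
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    # Step 1: clip every sample to the bounded range.
    clipped = [min(max(x, lower), upper) for x in samples]
    mean = sum(clipped) / len(clipped)

    # Replacing one sample changes the clipped mean by at most this amount,
    # so a narrower clipping range means less noise (but more clipping bias).
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon

    # Step 2: add Laplace(0, scale) noise, sampled as the difference of two
    # independent exponential variables with rate 1 / scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return mean + noise
```

With samples `[0.0, 1.0, 100.0]` clipped to `[0, 10]`, the clipped mean is 11/3 while the unclipped mean is about 33.67, which shows the bias that clipping introduces in exchange for bounded sensitivity.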