-
Authors: Cai, Zhibo; Xia, Yingcun; Hang, Weiqiang
Affiliations: National University of Singapore; University of Electronic Science & Technology of China
Abstract: Sufficient dimension reduction (SDR) has progressed steadily. However, its ability to improve general function estimation or classification has not been well received, especially for high-dimensional data. In this article, we first devise a local linear smoother for high-dimensional nonparametric regression and then utilise it in the outer-product-of-gradient (OPG) approach of SDR. We call the method high-dimensional OPG (HOPG). To apply SDR to classification in high-dimensional data, we propo...
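As a hedged illustration of the OPG idea named in this abstract, the sketch below implements the standard low-dimensional OPG estimator with NumPy. At each sample point, a kernel-weighted local linear fit gives a gradient estimate; the leading eigenvectors of the averaged outer products of those gradients then estimate the SDR directions. This is not the authors' high-dimensional smoother (HOPG). The function name `opg_directions`, the Gaussian kernel, and the single bandwidth `h` are assumptions made for illustration.

```python
import numpy as np


def opg_directions(X, y, d, h):
    """Estimate d SDR directions by the outer product of gradients (OPG).

    Hypothetical helper: at each sample point x_j, fit the local linear model
    y_i ~ a_j + b_j^T (x_i - x_j) with Gaussian kernel weights of bandwidth h,
    average the outer products b_j b_j^T, and return the top-d eigenvectors.
    """
    n, p = X.shape
    M = np.zeros((p, p))
    for j in range(n):
        diff = X - X[j]  # predictors centered at x_j, shape (n, p)
        w = np.exp(-0.5 * np.sum(diff**2, axis=1) / h**2)  # Gaussian kernel weights
        Z = np.hstack([np.ones((n, 1)), diff])  # local design [1, x_i - x_j]
        sw = np.sqrt(w)
        # Weighted least squares via rescaling rows by sqrt(weight).
        coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        b = coef[1:]  # estimated gradient of the regression function at x_j
        M += np.outer(b, b)
    M /= n
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :d]  # eigenvectors of the d largest eigenvalues
```

For data from a single-index model such as `y = sin(X @ beta) + noise`, the first returned column should be close to `beta` up to sign. The loop over every sample point is O(n^2 p), which is fine for a sketch but is one reason dedicated high-dimensional smoothers are needed at scale.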
-
Authors: Dai, Xiaowu; Li, Lexin
Affiliations: University of California System; University of California Berkeley
Abstract: Multimodal imaging has transformed neuroscience research. While it presents unprecedented opportunities, it also imposes serious challenges. Particularly, it is difficult to combine the merits of the interpretability attributed to a simple association model with the flexibility achieved by a highly adaptive nonlinear model. In this article, we propose an orthogonalized kernel debiased machine learning approach, which is built upon the Neyman orthogonality and a form of decomposition orthogonal...
-
Authors: Nie, Lizhen; Rockova, Veronika
Affiliations: University of Chicago
Abstract: The impracticality of posterior sampling has prevented the widespread adoption of spike-and-slab priors in high-dimensional applications. To alleviate the computational burden, optimization strategies have been proposed that quickly find local posterior modes. Trading off uncertainty quantification for computational speed, these strategies have enabled spike-and-slab deployments at scales that would previously have been infeasible. We build on one recent development in this strand of work: the Spike-...
-
Authors: Yang, Ying; Yao, Fang
Affiliations: Peking University
Abstract: Functional data analysis has attracted considerable interest and is facing new challenges, one of which is the increasingly available data in a streaming manner. In this article, we develop an online nonparametric method to dynamically update the estimates of mean and covariance functions for functional data. The kernel-type estimates can be decomposed into two sufficient statistics depending on the data-driven bandwidths. We propose to approximate the future optimal bandwidths by a sequence of...
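The statement that kernel-type estimates decompose into two sufficient statistics can be illustrated, under the simplifying assumption of a fixed bandwidth, with a Nadaraya-Watson mean estimate on a fixed grid: its numerator and denominator are plain sums that can be accumulated batch by batch without storing past observations. The class name `OnlineKernelMean`, the Epanechnikov kernel, and the fixed bandwidth `h` are hypothetical choices. The article's method targets data-driven bandwidths that change as data arrive, which this sketch does not handle.

```python
import math


class OnlineKernelMean:
    """Nadaraya-Watson mean-function estimate on a grid, updated online.

    Hypothetical sketch with a FIXED bandwidth h. At grid point t the estimate
    is S1(t) / S0(t), where S0(t) = sum_i K((t - t_i) / h) and
    S1(t) = sum_i K((t - t_i) / h) * y_i; both sums are updated per batch.
    """

    def __init__(self, grid, h):
        self.grid = list(grid)
        self.h = h
        self.s0 = [0.0] * len(self.grid)  # running sum of kernel weights
        self.s1 = [0.0] * len(self.grid)  # running sum of weighted responses

    @staticmethod
    def _kernel(u):
        # Epanechnikov kernel, supported on [-1, 1].
        return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

    def update(self, times, values):
        # Fold a new batch of pooled (t_i, y_i) observations into the sums.
        for t_i, y_i in zip(times, values):
            for g, t in enumerate(self.grid):
                w = self._kernel((t - t_i) / self.h)
                self.s0[g] += w
                self.s1[g] += w * y_i

    def estimate(self):
        # Grid points with no nearby data get NaN instead of dividing by zero.
        return [s1 / s0 if s0 > 0 else math.nan for s0, s1 in zip(self.s0, self.s1)]
```

Because only `s0` and `s1` are stored, processing the data in two batches gives exactly the same estimate as processing it all at once, while memory stays proportional to the grid size rather than to the number of observations.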
-
Authors: Insua, David Rios; Naveiro, Roi; Gallego, Victor; Poulos, Jason
Affiliations: Consejo Superior de Investigaciones Cientificas (CSIC); CSIC - Instituto de Ciencias Matematicas (ICMAT); CUNEF Universidad; Harvard University; Harvard Medical School
Abstract: Adversarial Machine Learning (AML) is emerging as a major field aimed at protecting Machine Learning (ML) systems against security threats: in certain scenarios there may be adversaries that actively manipulate input data to fool learning systems. This creates a new class of security vulnerabilities that ML systems may face, and a new desirable property, called adversarial robustness, that is essential for trusting operations based on ML outputs. Most work in AML is built upon a game-theoretic modeling of t...
-
Authors: Xue, Haoran; Shen, Xiaotong; Pan, Wei
Affiliations: University of Minnesota System; University of Minnesota Twin Cities
Abstract: Transcriptome-Wide Association Studies (TWAS) have recently emerged as a popular tool to discover (putative) causal genes by integrating an outcome GWAS dataset with another gene expression/transcriptome GWAS (called eQTL) dataset. In our motivating and target application, we would like to identify causal genes for Low-Density Lipoprotein cholesterol (LDL), which is crucial for developing new treatments for hyperlipidemia and cardiovascular diseases. The statistical principle underlying TWAS is (t...
-
Authors: Yao, Shunan; Rava, Bradley; Tong, Xin; James, Gareth
Affiliations: University of Southern California
Abstract: Label noise in data has long been an important problem in supervised learning applications as it affects the effectiveness of many widely used classification methods. Recently, important real-world applications, such as medical diagnosis and cybersecurity, have generated renewed interest in the Neyman-Pearson (NP) classification paradigm, which constrains the more severe type of error (e.g., the Type I error) under a preferred level while minimizing the other (e.g., the Type II error). However...
-
Authors: Liang, Decai; Huang, Hui; Guan, Yongtao; Yao, Fang
Affiliations: Nankai University; Sun Yat Sen University; University of Miami; Peking University
Abstract: For spatially dependent functional data, a generalized Karhunen-Loeve expansion is commonly used to decompose data into an additive form of temporal components and spatially correlated coefficients. This structure provides a convenient model to investigate the space-time interactions, but may not hold for complex spatio-temporal processes. In this work, we introduce the concept of weak separability, and propose a formal test to examine its validity for non-replicated spatially stationary funct...
-
Authors: Fang, Ethan X.; Wang, Zhaoran; Wang, Lan
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; Northwestern University; University of Miami; Duke University
Abstract: There has recently been a surge in methodological development for optimal individualized treatment rule (ITR) estimation. The standard methods in the literature are designed to maximize the potential average performance (assuming larger outcomes are desirable). A notable drawback of the standard approach, due to heterogeneity in treatment response, is that the estimated optimal ITR may be suboptimal or even detrimental to certain disadvantaged subpopulations. Motivated by the importance of...
-
Authors: Hallin, Marc; Hlubinka, Daniel; Hudecova, Sarka
Affiliations: Universite Libre de Bruxelles; Charles University Prague
Abstract: Extending rank-based inference to a multivariate setting such as multiple-output regression or MANOVA with unspecified d-dimensional error density has remained an open problem for more than half a century. None of the many solutions proposed so far enjoys the combination of distribution-freeness and efficiency that makes rank-based inference a successful tool in the univariate setting. A concept of center-outward multivariate ranks and signs based on measure transportation ideas has been ...