-
Authors: Yao, Zhigang; Zhang, Zhenyue
Affiliations: National University of Singapore; Zhejiang University
Abstract: We consider the classification problem and focus on nonlinear methods for classification on manifolds. For multivariate datasets lying on an embedded nonlinear Riemannian manifold within the higher-dimensional ambient space, we aim to acquire a classification boundary for the classes with labels, using the intrinsic metric on the manifolds. Motivated by finding an optimal boundary between the two classes, we invent a novel approach, the principal boundary. From the perspective of classification...
-
Authors: Li, Meng; Dunson, David B.
Affiliations: Rice University; Duke University
Abstract: We propose a new approach for assigning weights to models using a divergence-based method (D-probabilities), relying on evaluating parametric models relative to a nonparametric Bayesian reference using Kullback-Leibler divergence. D-probabilities are useful in goodness-of-fit assessments, in comparing imperfect models, and in providing model weights to be used in model aggregation. D-probabilities avoid some of the disadvantages of Bayesian model probabilities, such as large sensitivity to pri...
-
Authors: Shen, Cencheng; Priebe, Carey E.; Vogelstein, Joshua T.
Affiliations: University of Delaware; Johns Hopkins University
Abstract: Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age. In this paper, we establish a new framework that generalizes distance correlation (Dcorr), a correlation measure that was recently proposed and shown to be universally consistent for dependence testing against all joint distributions of finite moments, to the multiscale graph correl...
-
Authors: Zhao, Yichuan
Affiliations: University System of Georgia; Georgia State University
-
Authors: Sun, Qiang; Zhou, Wen-Xin; Fan, Jianqing
Affiliations: University of Toronto; University of California System; University of California San Diego; Fudan University; Princeton University
Abstract: Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our theoretical framework deals with heavy-tailed distributions with bounded t...
-
Authors: Guan, Leying; Chen, Xi; Wong, Wing Hung
Affiliations: Stanford University
Abstract: The perturbation of a transcription factor should affect the expression levels of its direct targets. However, not all genes showing changes in expression are direct targets. To increase the chance of detecting direct targets, we propose a modified two-group model where the null group corresponds to genes which are not direct targets, but can have small nonzero effects. We model the behavior of genes from the null set by a Gaussian distribution with unknown variance. To estimate this variance, we focus on...
-
Authors: Luo, Shan; Chen, Zehua
Affiliations: Shanghai Jiao Tong University; National University of Singapore
Abstract: High-dimensional multiresponse models with complex group structures in both the response variables and the covariates arise from current research in important fields such as genetics and medicine. However, not enough research has been done on such models. One of the few studies, if not the only one, is the article by Li, Nan, and Zhu where the sparse group Lasso approach is extended to such models. In this article, we propose a novel approach named the sequential canonical correlation search...
-
Authors: Florens, Jean-Pierre; Simar, Leopold; Van Keilegom, Ingrid
Affiliations: Universite de Toulouse; Universite Toulouse 1 Capitole; Toulouse School of Economics; Universite Catholique Louvain; KU Leuven
Abstract: Consider the model Y = X + epsilon with X = tau + Z, where tau is an unknown constant (the boundary of X), Z is a random variable defined on R+, epsilon is a symmetric error, and epsilon and Z are independent. Based on an iid sample of Y, we aim at identifying and estimating the boundary tau when the law of epsilon is unknown (apart from symmetry) and in particular its variance is unknown. We propose an estimation procedure based on a minimal distance approach and by making use of Laguerre polynomials. Asymptotic results ...
-
Authors: Shu, Hai; Wang, Xiao; Zhu, Hongtu
Affiliations: University of Texas System; UTMD Anderson Cancer Center; Purdue University System; Purdue University; University of North Carolina; University of North Carolina Chapel Hill
Abstract: A typical approach to the joint analysis of two high-dimensional datasets is to decompose each data matrix into three parts: a low-rank common matrix that captures the shared information across datasets, a low-rank distinctive matrix that characterizes the individual information within a single dataset, and an additive noise matrix. Existing decomposition methods often focus on the orthogonality between the common and distinctive matrices, but inadequately consider the more necessary orthogona...
-
Authors: Leng, Ling
Affiliations: Amazon.com