-
Authors: Sriperumbudur, Bharath K.; Sterge, Nicholas
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
Abstract: Kernel methods are powerful learning methodologies that allow one to perform nonlinear data analysis. Despite their popularity, they suffer from poor scalability in big data scenarios. Various approximation methods, including random feature approximation, have been proposed to alleviate the problem. However, the statistical consistency of most of these approximate kernel methods is not well understood, except for kernel ridge regression, wherein it has been shown that the random feature approximatio...
-
Authors: Donoho, David L.; Kipnis, Alon
Affiliations: Stanford University; Reichman University
Abstract: We adapt Higher Criticism (HC) to the comparison of two frequency tables which may, or may not, exhibit moderate differences between the tables in some unknown, relatively small subset out of a large number of categories. Our analysis of the power of the proposed HC test quantifies the rarity and size of assumed differences and applies moderate-deviations analysis to determine the asymptotic powerfulness/powerlessness of our proposed HC procedure. Our analysis considers the null hypothesis of no...
-
Authors: Rodriguez-Casal, Alberto; Saavedra-Nieves, Paula
Affiliations: Universidade de Santiago de Compostela; Universidade de Santiago de Compostela
Abstract: Given a random sample of points from some unknown density, we propose a method for estimating density level sets, for a positive threshold t, under the r-convexity assumption. This shape condition generalizes the convexity property and allows one to consider level sets with more than one connected component. The main problem in practice is that r is an unknown geometric characteristic of the set related to its curvature, which may depend on t. A stochastic algorithm is proposed for selecting its v...
-
Authors: Li, Bing; Song, Jun
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; Korea University
Abstract: We develop a general theory and estimation methods for functional linear sufficient dimension reduction, where both the predictor and the response can be random functions, or even vectors of functions. Unlike the existing dimension reduction methods, our approach does not rely on the estimation of conditional mean and conditional variance. Instead, it is based on a new statistical construction, the weak conditional expectation, which is based on Carleman operators and their inducing functions. ...
-
Authors: Pananjady, Ashwin; Samworth, Richard J.
Affiliations: University System of Georgia; Georgia Institute of Technology; University System of Georgia; Georgia Institute of Technology; University of Cambridge
Abstract: Motivated by models for multiway comparison data, we consider the problem of estimating a coordinatewise isotonic function on the domain [0, 1]^d from noisy observations collected on a uniform lattice, but where the design points have been permuted along each dimension. While the univariate and bivariate versions of this problem have received significant attention, our focus is on the multivariate case d >= 3. We study both the minimax risk of estimation (in empirical L_2 loss) and the fundam...
-
Authors: Yadlowsky, Steve; Namkoong, Hongseok; Basu, Sanjay; Duchi, John; Tian, Lu
Affiliations: Alphabet Inc.; Google Incorporated; Columbia University; Stanford University; Stanford University
Abstract: For observational studies, we study the sensitivity of causal inference when treatment assignments may depend on unobserved confounders. We develop a loss minimization approach for estimating bounds on the conditional average treatment effect (CATE) when unobserved confounders have a bounded effect on the odds ratio of treatment selection. Our approach is scalable and allows flexible use of model classes in estimation, including nonparametric and black-box machine learning methods. Based on th...
-
Authors: Guo, Zijian; Cevid, Domagoj; Buhlmann, Peter
Affiliations: Rutgers University System; Rutgers University New Brunswick; Swiss Federal Institutes of Technology Domain; ETH Zurich
Abstract: Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected by hidden confounding, and propose the doubly debiased lasso estimator for individual components of the regression coefficient vector. Our advocated method simultaneously corrects both the bias due to estimation of high-dimensional parameters as well as the bi...
-
Authors: Dalalyan, Arnak S.; Minasyan, Arshak
Affiliations: Institut Polytechnique de Paris; ENSAE Paris; Yerevan State University
Abstract: The goal of this paper is to show that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties. First, it is computationally tractable in the sense that it can be computed in time that is at most polynomial in the dimension, the sample size and the logarithm of the inverse of the contamination rate. Second, it is equivariant under translations, uniform scaling and orthogonal transformations. Third, it has a high breakdown point equal to 0.5, a...
-
Authors: Lee, Kuang-Yao; Li, Lexin
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Temple University; University of California System; University of California Berkeley
Abstract: Sufficient dimension reduction (SDR) embodies a family of methods that aim to reduce dimensionality without loss of information in a regression setting. In this article, we propose a new method for nonparametric function-on-function SDR, where both the response and the predictor are functions. We first develop the notions of functional central mean subspace and functional central subspace, which form the population targets of our functional SDR. We then introduce an average Fréchet der...
-
Authors: Beyhum, Jad; El Ghouch, Anouar; Portier, Francois; Van Keilegom, Ingrid
Affiliations: KU Leuven; Universite Catholique Louvain; IMT - Institut Mines-Telecom; Institut Polytechnique de Paris; Telecom Paris; Ecole Nationale de la Statistique et de l'Analyse de l'Information (ENSAI); Institut Polytechnique de Paris; ENSAE Paris
Abstract: We consider the problem of estimating the distribution of time-to-event data that are subject to censoring and for which the event of interest might never occur, that is, some subjects are cured. To model this kind of data in the presence of covariates, one of the leading semiparametric models is the promotion time cure model (Stochastic Models of Tumor Latency and Their Biostatistical Applications (1996) World Scientific), which adapts the Cox model to the presence of cured subjects. Estimatin...