-
Authors: Sriperumbudur, Bharath K.; Sterge, Nicholas
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
Abstract: Kernel methods are powerful learning methodologies that allow one to perform nonlinear data analysis. Despite their popularity, they suffer from poor scalability in big data scenarios. Various approximation methods, including random feature approximation, have been proposed to alleviate the problem. However, the statistical consistency of most of these approximate kernel methods is not well understood except for kernel ridge regression wherein it has been shown that the random feature approximatio...
-
Authors: Yadlowsky, Steve; Namkoong, Hongseok; Basu, Sanjay; Duchi, John; Tian, Lu
Affiliations: Alphabet Inc.; Google Incorporated; Columbia University; Stanford University; Stanford University
Abstract: For observational studies, we study the sensitivity of causal inference when treatment assignments may depend on unobserved confounders. We develop a loss minimization approach for estimating bounds on the conditional average treatment effect (CATE) when unobserved confounders have a bounded effect on the odds ratio of treatment selection. Our approach is scalable and allows flexible use of model classes in estimation, including nonparametric and black-box machine learning methods. Based on th...
-
Authors: Argiento, Raffaele; De Iorio, Maria
Affiliations: University of Bergamo; National University of Singapore
Abstract: Mixture models are one of the most widely used statistical tools when dealing with data from heterogeneous populations. Following a Bayesian nonparametric perspective, we introduce a new class of priors: the Normalized Independent Point Process. We investigate the probabilistic properties of this new class and present many special cases. In particular, we provide an explicit formula for the distribution of the implied partition, as well as the posterior characterization of the new process in t...
-
Authors: Efromovich, Sam
Affiliations: University of Texas System; University of Texas Dallas
Abstract: It is well known that estimation of a bivariate cumulative distribution function of a pair of right censored lifetimes presents challenges unparalleled in the univariate case, where the product-limit Kaplan-Meier methodology typically yields optimal estimation, and the literature on optimal estimation of the joint probability density is next to none. The paper, for the first time in the survival analysis literature, develops the theory and methodology of sharp minimax and adaptive nonparametric...
-
Authors: Giordano, Matteo; Ray, Kolyan
Affiliations: University of Cambridge; Imperial College London
Abstract: We study nonparametric Bayesian models for reversible multidimensional diffusions with periodic drift. For continuous observation paths, reversibility is exploited to prove a general posterior contraction rate theorem for the drift gradient vector field under approximation-theoretic conditions on the induced prior for the invariant measure. The general theorem is applied to Gaussian priors and p-exponential priors, which are shown to converge to the truth at the optimal nonparametric rate over...
-
Authors: Jeon, Jeong Min; Park, Byeong U.; Van Keilegom, Ingrid
Affiliations: KU Leuven; Seoul National University (SNU)
Abstract: This paper develops a foundation of methodology and theory for non-parametric regression with Lie group-valued predictors contaminated by measurement errors. Our methodology and theory are based on harmonic analysis on Lie groups, which is largely unknown in statistics. We establish a novel deconvolution regression estimator, and study its rate of convergence and asymptotic distribution. We also provide asymptotic confidence intervals based on the asymptotic distribution of the estimator and o...
-
Authors: Lopes, Miles E.
Affiliations: University of California System; University of California Davis
Abstract: Nonasymptotic bounds for Gaussian and bootstrap approximation have recently attracted significant interest in high-dimensional statistics. This paper studies Berry-Esseen bounds for such approximations with respect to the multivariate Kolmogorov distance, in the context of a sum of n random vectors that are p-dimensional and i.i.d. Up to now, a growing line of work has established bounds with mild logarithmic dependence on p. However, the problem of developing corresponding bounds with near n(...
-
Authors: Montanari, Andrea; Zhong, Yiqiao
Affiliations: Stanford University; Stanford University
Abstract: Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here, we study these phenomena in t...
-
Authors: Gao, Fengnan; Wang, Tengyao
Affiliations: Fudan University; University of London; London School Economics & Political Science
Abstract: We introduce a new method for two-sample testing of high-dimensional linear regression coefficients without assuming that those coefficients are individually estimable. The procedure works by first projecting the matrices of covariates and response vectors along directions that are complementary in sign in a subset of the coordinates, a process which we call complementary sketching. The resulting projected covariates and responses are aggregated to form two test statistics, which are shown to ...
-
Authors: Barber, Rina Foygel; Janson, Lucas
Affiliations: University of Chicago; Harvard University
Abstract: Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for composite null hypotheses are far more constrained, as the test statistic must have a tractable distr...