-
Authors: Moitra, Ankur; Wein, Alexander S.
Affiliations: Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT); University of California System; University of California Davis
Abstract: We revisit the fundamental question of simple-versus-simple hypothesis testing with an eye toward computational complexity, as the statistically optimal likelihood ratio test is often computationally intractable in high-dimensional settings. In the classical spiked Wigner model with a general i.i.d. spike prior, we show (conditional on a conjecture) that an existing test based on linear spectral statistics achieves the best possible trade-off curve between type-I and type-II error rates among a...
-
Authors: Montanari, Andrea; Ruan, Feng; Sohn, Youngtak; Yan, Jun
Affiliations: Stanford University; Stanford University; Northwestern University; Massachusetts Institute of Technology (MIT)
Abstract: Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that map the data into linearly separable classes. Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data. We consider a stylized setting in which data $(y_i, x_i)$, $i \le n$, are i.i.d. with $x_i \sim N(0, \Sigma)$ a p-dimensional Gaussian fe...
-
Authors: Bhattacharjee, Satarupa; Li, Bing; Xue, Lingzhou
Affiliations: State University System of Florida; University of Florida; Florida State University; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
Abstract: Random objects are complex non-Euclidean data taking values in general metric spaces, possibly devoid of any underlying vector space structure. Such data are becoming increasingly abundant with the rapid advancement in technology. Examples include probability distributions, positive semidefinite matrices and data on Riemannian manifolds. However, except for regression for object-valued response with Euclidean predictors and distribution-on-distribution regression, there has been limited develop...
-
Authors: Bhattacharya, Anirban; Pati, Debdeep; Yang, Yun
Affiliations: Texas A&M University System; Texas A&M University College Station; University of Wisconsin System; University of Wisconsin Madison; University System of Maryland; University of Maryland College Park
Abstract: As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming more and more popular for approximating intractable posterior distributions in large-scale Bayesian models due to its comparable efficacy and superior efficiency. Several recent works provide theoretical justifications of VI by proving its statistical optimality for parameter estimation under various settings; meanwhile, formal analysis on the algorithmic convergence aspects of VI is s...
-
Authors: Cleanthous, G.; Georgiadis, A. G.; Lepski, O. V.
Affiliations: Maynooth University; Trinity College Dublin; Centre National de la Recherche Scientifique (CNRS); Aix-Marseille Universite
Abstract: This is the second part of the research project initiated in Cleanthous, Georgiadis and Lepski (2024a). We deal with the problem of the adaptive estimation of the $L_2$-norm of a probability density on $\mathbb{R}^d$, $d \ge 1$, from independent observations. The unknown density is assumed to be uniformly bounded by an unknown constant and to belong to the union of balls in the isotropic/anisotropic Nikolskii's spaces. In Cleanthous, Georgiadis and Lepski (2024a), we have proved that the optimally adaptiv...
-
Authors: Qiu, Jingkun; Chen, Song Xi; Shao, Qi-Man
Affiliations: Peking University; Tsinghua University; Southern University of Science & Technology
Abstract: Berry-Esseen type bounds for Gaussian approximation of standardized sums have been extensively studied under exponential type moment conditions. In this paper, a Cramér-type moderate deviation theorem is established for self-normalized Gaussian approximation under finite moment conditions. More specifically, let $X_1, X_2, \ldots, X_n$ be i.i.d. $\mathbb{R}^p$-valued random vectors with zero means. Let $S_{n,j} = \sum_{i=1}^{n} X_{ij}$ and $V_{n,j}^2 = \sum_{i=1}^{n} X_{ij}^2$. We show that if the correlation matrix of X-...
-
Authors: Kotekal, Subhodh; Kundu, Soumyabrata
Affiliations: University of Chicago
Abstract: Heteroskedasticity testing in nonparametric regression is a classic statistical problem with important practical applications, yet fundamental limits are unknown. Adopting a minimax perspective, this article considers the testing problem in the context of an $\alpha$-Hölder mean and a $\beta$-Hölder variance function. For $\alpha > 0$ and $\beta \in (0, 1/2)$, the sharp minimax separation rate $n^{-4\alpha} + n^{-4\beta/(4\beta+1)} + n^{-2\beta}$ is established. To achieve the minim...
-
Authors: Zhao, Junlong; Liu, Xiumin; Du, Bin; Liu, Yufeng
Affiliations: Beijing Normal University; Beijing Technology & Business University; University of North Carolina; University of North Carolina Chapel Hill; University of North Carolina School of Medicine
Abstract: Converting a continuous variable into a discrete one is a commonly used technique for solving various problems in both statistics and machine learning. It is well known that discretizations result in biases. However, this issue has not been studied systematically. In this paper, a general framework is proposed to understand and compare the approximation errors of different slicing strategies. Poincaré-type inequalities are first established for univariate discretizations and then gene...
-
Authors: Ignatiadis, Nikolaos; Sen, Bodhisattva
Affiliations: University of Chicago; University of Chicago; Columbia University
Abstract: A common task in high-throughput biology is to screen for associations across thousands of units of interest, for example, genes or proteins. Often, the data for each unit are modeled as Gaussian measurements with unknown mean and variance and are summarized as per-unit sample averages and sample variances. The downstream goal is multiple testing for the means. In this domain, it is routine to moderate (i.e., to shrink) the sample variances through parametric empirical Bayes methods before com...
-
Authors: Amorino, Chiara; Gloter, Arnaud
Affiliations: University of Luxembourg; Centre National de la Recherche Scientifique (CNRS); Universite Paris Saclay
Abstract: Our research analyses the balance between maintaining privacy and preserving statistical accuracy when dealing with multivariate data that is subject to componentwise local differential privacy (CLDP). With CLDP, each component of the private data is made public through a separate privacy channel. This allows for varying levels of privacy protection for different components or for the privatization of each component by different entities, each with their own distinct privacy policies. It also ...