-
Authors: Dieuleveut, Aymeric; Durmus, Alain; Bach, Francis
Affiliations: Institut Polytechnique de Paris; Ecole Polytechnique; Centre National de la Recherche Scientifique (CNRS); Universite Paris Saclay; Universite PSL; Ecole Normale Superieure (ENS); Inria; Centre National de la Recherche Scientifique (CNRS)
Abstract: We consider the minimization of a strongly convex objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step size. While the detailed analysis was only performed for quadratic functions, we provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that outlines the dependence on initial conditions, the effect of noise and the step size, as well as the lack of convergence in the general (nonquadra...
-
Authors: Fang, Xiao; Li, Jian; Siegmund, David
Affiliations: Chinese University of Hong Kong; Adobe Systems Inc.; Stanford University
Abstract: To segment a sequence of independent random variables at an unknown number of change-points, we introduce new procedures that are based on thresholding the likelihood ratio statistic, and give approximations for the probability of a false positive error when there are no change-points. We also study confidence regions based on the likelihood ratio statistic for the change-points and joint confidence regions for the change-points and the parameter values. Applications to segment array CGH data ...
-
Authors: Biau, Gerard; Cadre, Benoit; Sangnier, Maxime; Tanielian, Ugo
Affiliations: Universite Paris Cite; Sorbonne Universite; Ecole Normale Superieure de Rennes (ENS Rennes); Universite de Rennes
Abstract: Generative Adversarial Networks (GANs) are a class of generative algorithms that have been shown to produce state-of-the-art samples, especially in the domain of image creation. The fundamental principle of GANs is to approximate the unknown distribution of a given data set by optimizing an objective function through an adversarial game between a family of generators and a family of discriminators. In this paper, we offer a better theoretical understanding of GANs by analyzing some of their ma...
-
Authors: Cattaneo, Matias D.; Farrell, Max H.; Feng, Yingjie
Affiliations: Princeton University; University of Chicago; Princeton University
Abstract: We present large sample results for partitioning-based least squares nonparametric regression, a popular method for approximating conditional expectation functions in statistics, econometrics and machine learning. First, we obtain a general characterization of their leading asymptotic bias. Second, we establish integrated mean squared error approximations for the point estimator and propose feasible tuning parameter selection. Third, we develop point-wise inference methods based on undersmooth...
-
Authors: Johnstone, Iain M.; Onatski, Alexei
Affiliations: Stanford University; University of Cambridge
Abstract: We consider the five classes of multivariate statistical problems identified by James (Ann. Math. Stat. 35 (1964) 475-501), which together cover much of classical multivariate analysis, plus a simpler limiting case, symmetric matrix denoising. Each of James' problems involves the eigenvalues of $E^{-1}H$ where $H$ and $E$ are proportional to high-dimensional Wishart matrices. Under the null hypothesis, both Wisharts are central with identity covariance. Under the alternative, the noncentrality or the ...
-
Authors: Han, Lei; Tan, Kean Ming; Yang, Ting; Zhang, Tong
Affiliations: Tencent; University of Michigan System; University of Michigan; Hong Kong University of Science & Technology; Hong Kong University of Science & Technology
Abstract: A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be handled by available computational resources. We propose a general subsampling scheme for large-scale multiclass logistic regression and examine the variance of the resulting estimator. We show that asymptotically, the proposed method always achieves a smaller va...
-
Authors: Kneip, Alois; Liebl, Dominik
Affiliations: University of Bonn
Abstract: We propose a new reconstruction operator that aims to recover the missing parts of a function given the observed parts. This new operator belongs to a new, very large class of functional operators which includes the classical regression operators as a special case. We show the optimality of our reconstruction operator and demonstrate that the usually considered regression operators generally cannot be optimal reconstruction operators. Our estimation theory allows for autocorrelated functional ...
-
Authors: Alquier, Pierre; Ridgway, James
Affiliations: RIKEN
Abstract: While Bayesian methods are extremely popular in statistics and machine learning, their application to massive data sets is often challenging, when possible at all. The classical MCMC algorithms are prohibitively slow when both the model dimension and the sample size are large. Variational Bayesian methods aim at approximating the posterior by a distribution in a tractable family F. Thus, MCMC are replaced by an optimization algorithm which is orders of magnitude faster. VB methods have been ap...
-
Authors: Nickl, Richard; Ray, Kolyan
Affiliations: University of Cambridge; University of London; King's College London
Abstract: The problem of determining a periodic Lipschitz vector field $b = (b_1, \dots, b_d)$ from an observed trajectory of the solution $(X_t : 0 \le t \le T)$ of the multi-dimensional stochastic differential equation $dX_t = b(X_t)\,dt + dW_t$, $t \ge 0$, where $W_t$ is a standard $d$-dimensional Brownian motion, is considered. Convergence rates of a penalised least squares estimator, which equals the maximum a posteriori (MAP) estimate corresponding to a high-dimensional Gaussian product prior, are derived. The...
-
Authors: Shah, Rajen D.; Peters, Jonas
Affiliations: University of Cambridge; University of Copenhagen
Abstract: It is a common saying that testing for conditional independence, that is, testing whether two random vectors X and Y are independent, given Z, is a hard statistical problem if Z is a continuous random variable (or vector). In this paper, we prove that conditional independence is indeed a particularly difficult hypothesis to test for. Valid statistical tests are required to have a size that is smaller than a pre-defined significance level, and different tests usually have power against ...