-
Author: Nadler, Boaz
Affiliations: Weizmann Institute of Science
Abstract: Principal component analysis (PCA) is a standard tool for dimensional reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n, and those of the limiting population PCA as n -> infinity. As in machine learning, we present a finite sample theorem which holds with high probability for the closeness between the...
-
Authors: Wang, Lie; Brown, Lawrence D.; Cai, T. Tony; Levine, Michael
Affiliations: University of Pennsylvania; Purdue University System; Purdue University
Abstract: Variance function estimation in nonparametric regression is considered and the minimax rate of convergence is derived. We are particularly interested in the effect of the unknown mean on the estimation of the variance function. Our results indicate that, contrary to the common practice, it is not desirable to base the estimator of the variance function on the residuals from an optimal estimator of the mean when the mean function is not smooth. Instead it is more desirable to use estimators of ...
-
Authors: Yu, Kyusang; Park, Byeong U.; Mammen, Enno
Affiliations: University of Mannheim; Seoul National University (SNU)
Abstract: Generalized additive models have been popular among statisticians and data analysts in multivariate nonparametric regression with non-Gaussian responses including binary and count data. In this paper, a new likelihood approach for fitting generalized additive models is proposed. It aims to maximize a smoothed likelihood. The additive functions are estimated by solving a system of nonlinear integral equations. An iterative algorithm based on smooth backfitting is developed from the Newton-Kanto...
-
Author: Robinson, P. M.
Affiliations: University of London; London School of Economics & Political Science
Abstract: Moving from univariate to bivariate jointly dependent long-memory time series introduces a phase parameter (gamma), at the frequency of principal interest, zero; for short-memory series gamma = 0 automatically. The latter case has also been stressed under long memory, along with the fractional differencing case gamma = (delta_2 - delta_1)pi/2, where delta_1, delta_2 are the memory parameters of the two series. We develop time domain conditions under which these are and are not relevant, an...
-
Authors: Li, Jun; Liu, Regina Y.
Affiliations: University of California System; University of California Riverside; Rutgers University System; Rutgers University New Brunswick
Abstract: This paper introduces and studies multivariate spacings. The spacings are developed using the order statistics derived from data depth. Specifically, the spacing between two consecutive order statistics is the region which bridges the two order statistics, in the sense that the region contains all the points whose depth values fall between the depth values of the two consecutive order statistics. These multivariate spacings can be viewed as a data-driven realization of the so-called statistica...
-
Authors: Zou, Hui; Yuan, Ming
Affiliations: University of Minnesota System; University of Minnesota Twin Cities; University System of Georgia; Georgia Institute of Technology
Abstract: Coefficient estimation and variable selection in multiple linear regression is routinely done in the (penalized) least squares (LS) framework. The concept of model selection oracle introduced by Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348-1360] characterizes the optimal behavior of a model selection procedure. However, the least-squares oracle theory breaks down if the error variance is infinite. In the current paper we propose a new regression method called composite quantile regressi...
-
Authors: Groeneboom, Piet; Maathuis, Marloes H.; Wellner, Jon A.
Affiliations: Delft University of Technology; Swiss Federal Institutes of Technology Domain; ETH Zurich; University of Washington; University of Washington Seattle; Vrije Universiteit Amsterdam
Abstract: We study nonparametric estimation for current status data with competing risks. Our main interest is in the nonparametric maximum likelihood estimator (MLE), and for comparison we also consider a simpler naive estimator. Groeneboom, Maathuis and Wellner [Ann. Statist. (2008) 36 1031-1063] proved that both types of estimators converge globally and locally at rate n^{1/3}. We use these results to derive the local limiting distributions of the estimators. The limiting distribution of the naive esti...
-
Authors: Anderes, Ethan B.; Stein, Michael L.
Affiliations: University of California System; University of California Berkeley; University of Chicago
Abstract: This paper presents a new approach to the estimation of the deformation of an isotropic Gaussian random field on R^2 based on dense observations of a single realization of the deformed random field. Under this framework we investigate the identification and estimation of deformations. We then present a complete methodological package, from model assumptions to algorithmic recovery of the deformation, for the class of nonstationary processes obtained by deforming isotropic Gaussian random fields.
-
Authors: Leonenko, Nikolai; Pronzato, Luc; Savani, Vippal
Affiliations: Cardiff University; Centre National de la Recherche Scientifique (CNRS); Universite Cote d'Azur
Abstract: A class of estimators of the Renyi and Tsallis entropies of an unknown distribution f in R^m is presented. These estimators are based on the kth nearest-neighbor distances computed from a sample of N i.i.d. vectors with distribution f. We show that entropies of any order q, including Shannon's entropy, can be estimated consistently with minimal assumptions on f. Moreover, we show that it is straightforward to extend the nearest-neighbor method to estimate the statistical distance between two d...
-
Authors: Zhou, Jianhui; He, Xuming
Affiliations: University of Virginia; University of Illinois System; University of Illinois Urbana-Champaign
Abstract: The curse of dimensionality has remained a challenge for high-dimensional data analysis in statistics. The sliced inverse regression (SIR) and canonical correlation (CANCOR) methods aim to reduce the dimensionality of data by replacing the explanatory variables with a small number of composite directions without losing much information. However, the estimated composite directions generally involve all of the variables, making their interpretation difficult. To simplify the direction estimates...