-
Authors: Ehrlinger, John; Ishwaran, Hemant
Affiliations: Cleveland Clinic Foundation; University of Miami
Abstract: We consider L2Boosting, a special case of Friedman's generic boosting algorithm applied to linear regression under L2-loss. We study L2Boosting for an arbitrary regularization parameter and derive an exact closed-form expression for the number of steps taken along a fixed coordinate direction. This relationship is used to describe L2Boosting's solution path, to describe new tools for studying its path, and to characterize some of the algorithm's unique properties, including active set c...
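For orientation, the following is a minimal sketch of componentwise L2Boosting for linear regression with a step-size (regularization) parameter. It is not the authors' implementation and does not reproduce their closed-form step count; the function name `l2boost` and its parameters `nu` and `steps` are assumptions made for illustration. The returned `path` records which coordinate was selected at each step, so runs of a repeated index correspond to consecutive steps along a fixed coordinate direction.

```python
def l2boost(X, y, nu=0.1, steps=200):
    """Componentwise L2Boosting sketch (hypothetical helper, not from the paper).

    X: list of rows (each a list of floats), y: list of responses.
    nu: step-size / regularization parameter in (0, 1].
    Assumes no column of X is identically zero.
    Returns (beta, path), where path[k] is the coordinate chosen at step k.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    norms = [sum(v * v for v in c) for c in cols]  # squared column norms
    beta = [0.0] * p
    resid = list(y)
    path = []
    for _ in range(steps):
        # Least-squares coefficient of the current residual on each column.
        coefs = [
            sum(cols[j][i] * resid[i] for i in range(n)) / norms[j]
            for j in range(p)
        ]
        # Choose the coordinate that reduces the residual sum of squares most.
        j = max(range(p), key=lambda k: coefs[k] ** 2 * norms[k])
        # Take a shrunken step along that coordinate and update the residual.
        beta[j] += nu * coefs[j]
        resid = [resid[i] - nu * coefs[j] * cols[j][i] for i in range(n)]
        path.append(j)
    return beta, path


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    y = [3.0 * a - 2.0 * b for a, b in X]
    beta, path = l2boost(X, y, nu=0.1, steps=500)
    print(beta)
    print(path[:20])
```

A smaller `nu` typically produces longer runs on the same coordinate before the algorithm switches direction, which is the quantity the abstract describes in closed form.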
-
Authors: Lecue, Guillaume; Mendelson, Shahar
Affiliations: Universite Paris-Est-Creteil-Val-de-Marne (UPEC); Universite Gustave-Eiffel; Centre National de la Recherche Scientifique (CNRS); Technion Israel Institute of Technology
Abstract: We show that empirical risk minimization procedures and regularized empirical risk minimization procedures satisfy nonexact oracle inequalities in an unbounded framework, under the assumption that the class has a subexponential envelope function. The main novelty, in addition to the setup being free of any boundedness assumption, is that those inequalities can yield fast rates even in situations in which exact oracle inequalities only hold with slower rates. We apply these results to show that procedures b...
-
Authors: Wang, Qiying; Phillips, Peter C. B.
Affiliations: University of Sydney; Yale University
Abstract: We provide a limit theory for a general class of kernel smoothed U-statistics that may be used for specification testing in time series regression with nonstationary data. The test framework allows for linear and nonlinear models with endogenous regressors that have autoregressive unit roots or near unit roots. The limit theory for the specification test depends on the self-intersection local time of a Gaussian process. A new weak convergence result is developed for certain partial sums of fun...
-
Authors: Comminges, Laetitia; Dalalyan, Arnak S.
Affiliations: Universite Gustave-Eiffel; Institut Polytechnique de Paris; Ecole Nationale des Ponts et Chaussees; Institut Polytechnique de Paris; ENSAE Paris
Abstract: We address the issue of variable selection in the regression model with very high ambient dimension, that is, when the number of variables is very large. The main focus is on the situation where the number of relevant variables, called intrinsic dimension, is much smaller than the ambient dimension d. Without assuming any parametric form of the underlying regression function, we get tight conditions making it possible to consistently estimate the set of relevant variables. These conditions rel...
-
Authors: VanderWeele, Tyler J.; Richardson, Thomas S.
Affiliations: Harvard University; Harvard T.H. Chan School of Public Health; University of Washington; University of Washington Seattle
Abstract: The sufficient-component cause framework assumes the existence of sets of sufficient causes that bring about an event. For a binary outcome and an arbitrary number of binary causes, any set of potential outcomes can be replicated by positing a set of sufficient causes; typically this representation is not unique. A sufficient cause interaction is said to be present if within all representations there exists a sufficient cause in which two or more particular causes are all present. A singular in...
-
Authors: Jing, Bing-Yi; Kong, Xin-Bing; Liu, Zhi
Affiliations: Hong Kong University of Science & Technology; Fudan University; Xiamen University
Abstract: It is generally accepted that asset price processes contain jumps. In fact, pure jump models have been widely used to model asset prices and/or stochastic volatilities. The question is: is there any statistical evidence from high-frequency financial data to support using pure jump models alone? The purpose of this paper is to develop such a statistical test against the necessity of a diffusion component. The test is very simple to use and yet effective. Asymptotic properties of the pro...
-
Authors: Dawid, Philip; Lauritzen, Steffen; Parry, Matthew
Affiliations: University of Cambridge; University of Oxford; University of Otago
Abstract: A scoring rule is a loss function measuring the quality of a quoted probability distribution Q for a random variable X, in the light of the realized outcome x of X; it is proper if the expected score, under any distribution P for X, is minimized by quoting Q = P. Using the fact that any differentiable proper scoring rule on a finite sample space X is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quo...
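As a small illustration of the propriety definition in the abstract above, the sketch below uses the logarithmic score on a finite sample space, a standard example of a proper scoring rule that depends only on the probability quoted for the realized outcome. The function names `log_score` and `expected_score` and the example distributions are assumptions for illustration and are not taken from the paper.

```python
import math


def log_score(q, x):
    """Logarithmic score (a loss): depends only on q[x], the quoted
    probability of the realized outcome x."""
    return -math.log(q[x])


def expected_score(p, q):
    """Expected log score of the quoted distribution q when X ~ p.

    p and q are lists of probabilities over outcomes 0..len(p)-1;
    q is assumed strictly positive wherever p is positive.
    """
    return sum(p_x * log_score(q, x) for x, p_x in enumerate(p) if p_x > 0)


if __name__ == "__main__":
    p = [0.2, 0.5, 0.3]
    candidates = [p, [0.3, 0.4, 0.3], [1 / 3, 1 / 3, 1 / 3], [0.1, 0.6, 0.3]]
    for q in candidates:
        # The expected score is smallest when the quoted q equals p.
        print(q, expected_score(p, q))
```

By Gibbs' inequality, the expected log score under P exceeds its value at Q = P by the Kullback-Leibler divergence from P to Q, which is why quoting Q = P minimizes it.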
-
Authors: Belitser, Eduard; Ghosal, Subhashis; van Zanten, Harry
Affiliations: Eindhoven University of Technology; North Carolina State University; University of Amsterdam
Abstract: We propose a two-stage procedure for estimating the location mu and size M of the maximum of a smooth d-variate regression function f. In the first stage, a preliminary estimator of mu obtained from a standard nonparametric smoothing method is used. In the second stage, we zoom in on the vicinity of the preliminary estimator and make further observations at some design points in that vicinity. We fit an appropriate polynomial regression model to estimate the location and size of the maximum....
-
Authors: Spokoiny, Vladimir
Affiliations: Leibniz Association; Weierstrass Institute for Applied Analysis & Stochastics; Humboldt University of Berlin; Moscow Institute of Physics & Technology
Abstract: The paper aims at reconsidering the famous Le Cam LAN theory. The main features of the approach which make it different from the classical one are as follows: (1) the study is nonasymptotic, that is, the sample size is fixed and does not tend to infinity; (2) the parametric assumption is possibly misspecified and the underlying data distribution can lie beyond the given parametric family. These two features make it possible to bridge the gap between parametric and nonparametric theory and to build a uni...
-
Authors: Amini, Arash A.; Wainwright, Martin J.
Affiliations: University of California System; University of California Berkeley
Abstract: We consider the sampling problem for functional PCA (fPCA), where the simplest example is the case of taking time samples of the underlying functional components. More generally, we model the sampling operation as a continuous linear map from H to R^m, where the functional components lie in some Hilbert subspace H of L^2, such as a reproducing kernel Hilbert space of smooth functions. This model includes time and frequency sampling as special cases. In contrast to the classical approach in fPCA...