-
Authors: Zhang, T; Yu, B
Affiliations: International Business Machines (IBM); IBM USA; University of California System; University of California Berkeley
Abstract: Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on CV or a te...
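A minimal sketch of the procedure this abstract describes, not the authors' own algorithm: greedy L2 boosting with regression stumps as the base learner, stopped early using a held-out validation set. The function names, the step size, the `patience` rule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares regression stump for residuals r on a 1-D input x."""
    best = None
    for t in np.unique(x)[:-1]:  # skip the largest value so both sides are non-empty
        left = x <= t
        cl, cr = r[left].mean(), r[~left].mean()
        sse = ((r[left] - cl) ** 2).sum() + ((r[~left] - cr) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, cl, cr)
    return best[1:]  # (threshold, left value, right value)

def predict_stump(stump, x):
    t, cl, cr = stump
    return np.where(x <= t, cl, cr)

def boost(x_tr, y_tr, x_val, y_val, max_iter=200, step=0.1, patience=10):
    """Greedy additive fit; keeps the number of stumps with lowest validation MSE."""
    f_tr = np.zeros(len(y_tr))
    f_val = np.zeros(len(y_val))
    stumps, best_err, best_m = [], np.inf, 0
    for m in range(max_iter):
        stump = fit_stump(x_tr, y_tr - f_tr)  # base learner applied to current residuals
        stumps.append(stump)
        f_tr += step * predict_stump(stump, x_tr)
        f_val += step * predict_stump(stump, x_val)
        err = np.mean((y_val - f_val) ** 2)
        if err < best_err:
            best_err, best_m = err, m + 1
        elif m + 1 - best_m >= patience:  # early stopping: no recent improvement
            break
    return stumps[:best_m], best_err

# Hypothetical synthetic data: a noisy sine curve, 200 training and 100 validation points.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 300)
y = np.sin(x) + rng.normal(0, 0.3, 300)
stumps, val_mse = boost(x[:200], y[:200], x[200:], y[200:])
baseline_mse = np.mean(y[200:] ** 2)  # validation error of the zero function
print(len(stumps), val_mse, baseline_mse)
```

The number of retained stumps plays the role of the regularization parameter here; a cross-validated choice would replace the single validation split.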
-
Authors: Ishwaran, H; Rao, JS
Affiliations: Cleveland Clinic Foundation; University System of Ohio; Case Western Reserve University
Abstract: Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on t...
-
Authors: James, LF
Affiliations: Hong Kong University of Science & Technology
Abstract: Suppose that $P_\theta(g)$ is a linear functional of a Dirichlet process with shape $\theta H$, where $\theta > 0$ is the total mass and $H$ is a fixed probability measure. This paper describes how one can use the well-known Bayesian prior to posterior analysis of the Dirichlet process, and a posterior calculus for Gamma processes to ascertain properties of linear functionals of Dirichlet processes. In particular, in conjunction with a Gamma identity, we show easily that a generalized Cauchy-Stieltjes tr...
-
Authors: Gelman, A
Affiliations: Columbia University
Abstract: Analysis of variance (ANOVA) is an extremely important method in exploratory and confirmatory data analysis. Unfortunately, in complex problems (e.g., split-plot designs), it is not always easy to set up an appropriate ANOVA. We propose a hierarchical analysis that automatically gives the correct ANOVA comparisons even in complex scenarios. The inferences for all means and variances are performed under a model with a separate batch of effects for each row of the ANOVA table. We connect to clas...
-
Authors: Yao, F; Müller, HG; Wang, JL
Affiliations: Colorado State University System; Colorado State University Fort Collins; University of California System; University of California Davis
Abstract: We propose nonparametric methods for functional linear regression which are designed for sparse longitudinal data, where both the predictor and response are functions of a covariate such as time. Predictor and response processes have smooth random trajectories, and the data consist of a small number of noisy repeated measurements made at irregular times for a sample of subjects. In longitudinal studies, the number of repeated measurements per subject is often small and may be modeled as a disc...
-
Authors: Koltchinskii, V; Panchenko, D
Affiliations: University of New Mexico; Massachusetts Institute of Technology (MIT)
Abstract: We introduce and study several measures of complexity of functions from the convex hull of a given base class. These complexity measures take into account the sparsity of the weights of a convex combination as well as certain clustering properties of the base functions involved in it. We prove new upper confidence bounds on the generalization error of ensemble (voting) classification algorithms that utilize the new complexity measures along with the empirical distributions of classification ma...
-
Authors: Stute, W; Zhu, LX
Affiliations: Justus Liebig University Giessen; University of Hong Kong
Abstract: In this paper we study goodness-of-fit testing of single-index models. The large sample behavior of certain score-type test statistics is investigated. As a by-product, we obtain asymptotically distribution-free maximin tests for a large class of local alternatives. Furthermore, characteristic function based goodness-of-fit tests are proposed which are omnibus and able to detect peak alternatives. Simulation results indicate that the approximation through the limit distribution is acceptable a...
-
Authors: Zhou, HH; Hwang, JTG
Affiliations: Yale University; Cornell University
Abstract: Many statistical practices involve choosing between a full model and reduced models where some coefficients are reduced to zero. Data were used to select a model with estimated coefficients. Is it possible to do so and still come up with an estimator always better than the traditional estimator based on the full model? The James-Stein estimator is such an estimator, having a property called minimaxity. However, the estimator considers only one reduced model, namely the origin. Hence it reduces...
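For orientation, a small Monte Carlo sketch of the classical James-Stein estimator that the abstract takes as its starting point (shrinkage toward the origin only), not the estimator the paper proposes. It assumes observations X ~ N(theta, I_p) with known unit variance and p >= 3; the dimension, the value of theta, and the replication count are hypothetical choices.

```python
import numpy as np

def james_stein(x):
    """Classical James-Stein estimate, shrinking each row of x toward the origin.

    Assumes each row is one draw from N(theta, I_p) with p >= 3.
    """
    p = x.shape[-1]
    norm2 = np.sum(x ** 2, axis=-1, keepdims=True)
    return (1.0 - (p - 2) / norm2) * x

# Hypothetical setup: p = 10 coordinates, true mean 0.5 in every coordinate.
rng = np.random.default_rng(0)
p, n_rep = 10, 20000
theta = np.full(p, 0.5)
x = theta + rng.standard_normal((n_rep, p))

# Average squared-error loss of the full-model estimator (x itself) and of James-Stein.
risk_mle = np.mean(np.sum((x - theta) ** 2, axis=1))
risk_js = np.mean(np.sum((james_stein(x) - theta) ** 2, axis=1))
print(risk_mle, risk_js)
```

The full-model estimator has risk p under this model, so the simulated `risk_mle` should sit near 10, with `risk_js` below it.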
-
Authors: Cai, TT; Low, MG
Affiliations: University of Pennsylvania
Abstract: Adaptive estimation of linear functionals over a collection of parameter spaces is considered. A between-class modulus of continuity, a geometric quantity, is shown to be instrumental in characterizing the degree of adaptability over two parameter spaces in the same way that the usual modulus of continuity captures the minimax difficulty of estimation over a single parameter space. A general construction of optimally adaptive estimators based on an ordered modulus of continuity is given. The r...
-
Authors: Ma, SG; Kosorok, MR
Affiliations: University of Wisconsin System; University of Wisconsin Madison
Abstract: We consider partly linear transformation models applied to current status data. The unknown quantities are the transformation function, a linear regression parameter and a nonparametric regression effect. It is shown that the penalized MLE for the regression parameter is asymptotically normal and efficient and converges at the parametric rate, although the penalized MLE for the transformation function and nonparametric regression effect are only $n^{1/3}$-consistent. Inference for the regression ...