-
Authors: Jiang, Bo; Liu, Jun S.
Affiliations: Harvard University
Abstract: Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential variables under the general index model, in which the response depends on the predictors through an unknown function of one or more linear combinations of them. Instead of building a predictive model of the response given combinations of predictors, we model the...
-
Authors: Fithian, William; Hastie, Trevor
Affiliations: Stanford University
Abstract: For classification problems with significant class imbalance, subsampling can reduce computational costs at the price of inflated variance in estimating model parameters. We propose a method for subsampling efficiently for logistic regression by adjusting the class balance locally in feature space via an accept-reject scheme. Our method generalizes standard case-control sampling, using a pilot estimate to preferentially select examples whose responses are conditionally rare given their featu...
-
Authors: He, Xu; Qian, Peter Z. G.
Affiliations: Chinese Academy of Sciences; Academy of Mathematics & System Sciences, CAS; University of Wisconsin System; University of Wisconsin Madison
Abstract: Orthogonal array based space-filling designs (Owen [Statist. Sinica 2 (1992a) 439-452]; Tang [J. Amer. Statist. Assoc. 88 (1993) 1392-1397]) have become popular in computer experiments, numerical integration, stochastic optimization and uncertainty quantification. As improvements of ordinary Latin hypercube designs, these designs achieve stratification in multi-dimensions. If the underlying orthogonal array has strength t, such designs achieve uniformity up to t dimensions. Existing central lim...
-
Authors: Liu, Weidong; Shao, Qi-Man
Affiliations: Shanghai Jiao Tong University; Shanghai Jiao Tong University; Chinese University of Hong Kong
Abstract: Applying the Benjamini and Hochberg (B-H) method to multiple Student's t tests is a popular technique for gene selection in microarray data analysis. Given the nonnormality of the population, the true p-values of the hypothesis tests are typically unknown. Hence it is common to use the standard normal distribution N(0, 1), Student's t distribution t(n-1) or the bootstrap method to estimate the p-values. In this paper, we prove that when the population has a finite 4th moment and the dimensio...
-
Authors: Castillo, Ismael; Nickl, Richard
Affiliations: Centre National de la Recherche Scientifique (CNRS); Sorbonne Universite; Universite Paris Cite; Centre National de la Recherche Scientifique (CNRS); University of Cambridge
Abstract: We continue the investigation of Bernstein-von Mises theorems for nonparametric Bayes procedures from [Ann. Statist. 41 (2013) 1999-2028]. We introduce multiscale spaces on which nonparametric priors and posteriors are naturally defined, and prove Bernstein-von Mises theorems for a variety of priors in the setting of Gaussian nonparametric regression and in the i.i.d. sampling model. From these results we deduce several applications where posterior-based inference coincides with efficient f...
-
Authors: Chernozhukov, Victor; Chetverikov, Denis; Kato, Kengo
Affiliations: Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT); University of California System; University of California Los Angeles; University of Tokyo
Abstract: Modern construction of uniform confidence bands for nonparametric densities (and other functions) often relies on the classical Smirnov-Bickel-Rosenblatt (SBR) condition; see, for example, Gine and Nickl [Probab. Theory Related Fields 143 (2009) 569-596]. This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equivalently, for the supremum of a Gaussian process with the same covariance function as that of the ...
-
Authors: Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert
Affiliations: Simon Fraser University; Stanford University; Carnegie Mellon University; Carnegie Mellon University; Stanford University
-
Authors: Cheng, Ming-Yen; Honda, Toshio; Li, Jialiang; Peng, Heng
Affiliations: National Taiwan University; Hitotsubashi University; National University of Singapore; Hong Kong Baptist University
Abstract: Ultra-high dimensional longitudinal data are increasingly common and the analysis is challenging both theoretically and methodologically. We offer a new automatic procedure for finding a sparse semivarying coefficient model, which is widely accepted for longitudinal data analysis. Our proposed method first reduces the number of covariates to a moderate order by employing a screening procedure, and then identifies both the varying and constant coefficients using a group SCAD estimator, which is...
-
Authors: Segers, Johan; van den Akker, Ramon; Werker, Bas J. M.
Affiliations: Universite Catholique Louvain; Tilburg University
Abstract: We propose, for multivariate Gaussian copula models with unknown margins and structured correlation matrices, a rank-based, semiparametrically efficient estimator for the Euclidean copula parameter. This estimator is defined as a one-step update of a rank-based pilot estimator in the direction of the efficient influence function, which is calculated explicitly. Moreover, finite-dimensional algebraic conditions are given that completely characterize efficiency of the pseudo-likelihood estimator...
-
Authors: Bochkina, Natalia A.; Green, Peter J.
Affiliations: University of Edinburgh; University of Bristol; University of Technology Sydney
Abstract: We study the asymptotic behaviour of the posterior distribution in a broad class of statistical models where the true solution occurs on the boundary of the parameter space. We show that in this case Bayesian inference is consistent, and that the posterior distribution has not only Gaussian components, as in the case of regular models (the Bernstein-von Mises theorem), but also Gamma distribution components whose form depends on the behaviour of the prior distribution near the boundary and h...