-
Authors: Tan, Kean Ming; Sun, Qiang; Witten, Daniela
Affiliations: University of Michigan System; University of Michigan; University of Toronto; University of Washington; University of Washington Seattle
Abstract: We propose a sparse reduced rank Huber regression for analyzing large and complex high-dimensional data with heavy-tailed random noise. The proposed method is based on a convex relaxation of a rank- and sparsity-constrained nonconvex optimization problem, which is then solved using a block coordinate descent and an alternating direction method of multipliers algorithm. We establish nonasymptotic estimation error bounds under both Frobenius and nuclear norms in the high-dimensional setting. This...
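As a reading aid for the entry above, here is a minimal sketch of the standard Huber loss that gives Huber regression its robustness to heavy-tailed noise. It shows only the scalar loss, not the authors' sparse reduced rank estimator or their optimization algorithm. The function name `huber_loss` and the default threshold `delta=1.345` are illustrative assumptions, not taken from the paper.

```python
def huber_loss(residual: float, delta: float = 1.345) -> float:
    """Standard Huber loss (illustrative; `delta` default is an assumption).

    Quadratic for small residuals, linear for large ones, so a single
    extreme residual contributes far less than under squared error.
    """
    a = abs(residual)
    if a <= delta:
        # Quadratic region: identical to half the squared error.
        return 0.5 * residual * residual
    # Linear region: grows like delta * |r|, continuous at |r| = delta.
    return delta * (a - 0.5 * delta)


def mean_huber_loss(residuals, delta: float = 1.345) -> float:
    """Average Huber loss over a non-empty sequence of residuals."""
    return sum(huber_loss(r, delta) for r in residuals) / len(residuals)
```

For a residual of 10 with `delta=1.0`, the Huber loss is 9.5, whereas half the squared error would be 50, which is why outliers have limited influence on the fit.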
-
Authors: Chevallier, Augustin; Fearnhead, Paul; Sutton, Matthew
Affiliations: Lancaster University; Queensland University of Technology (QUT)
Abstract: A new class of Markov chain Monte Carlo (MCMC) algorithms, based on simulating piecewise deterministic Markov processes (PDMPs), has recently shown great promise: they are nonreversible, can mix better than standard MCMC algorithms, and can use subsampling ideas to speed up computation in big data scenarios. However, current PDMP samplers can only sample from posterior densities that are differentiable almost everywhere, which precludes their use for model choice. Motivated by variable selecti...
-
Authors: Li, Zhu; Su, Weijie J.; Sejdinovic, Dino
Affiliations: University of London; University College London; University of Pennsylvania; University of Oxford
Abstract: Modern machine learning models often exhibit the benign overfitting phenomenon, which has recently been characterized using the double descent curves. In addition to the classical U-shaped learning curve, the learning risk undergoes another descent as we increase the number of parameters beyond a certain threshold. In this article, we examine the conditions under which benign overfitting occurs in the random feature (RF) models, that is, in a two-layer neural network with fixed first layer wei...
-
Authors: Chen, Yinyin; He, Shishuang; Yang, Yun; Liang, Feng
Affiliations: University of Illinois System; University of Illinois Urbana-Champaign
Abstract: Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, lacking in the literature is a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation. In this article, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood that is naturally connected to the...
-
Authors: Deng, Hang; Han, Qiyang; Sen, Bodhisattva
Affiliations: Rutgers University System; Rutgers University New Brunswick; Columbia University
Abstract: In this article, we develop automated inference methods for local parameters in a collection of convexity constrained models based on the natural constrained tuning-free estimators. A canonical example is given by the univariate convex regression model, in which automated inference is drawn for the function value, the function derivative at a fixed interior point, and the anti-mode of the convex regression function, based on the widely used tuning-free, piecewise linear convex least squares es...
-
Authors: Nishimura, Akihiko; Suchard, Marc A.
Affiliations: Johns Hopkins University; University of California System; University of California Los Angeles
Abstract: In a modern observational study based on healthcare databases, the number of observations and of predictors typically range in the order of 10^5-10^6 and of 10^4-10^5. Despite the large sample size, data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being the Bayesian method based on shrinkage priors. In the large n and large p setting, however, the required poster...
-
Authors: Papadogeorgou, Georgia; Bello, Carolina; Ovaskainen, Otso; Dunson, David B.
Affiliations: State University System of Florida; University of Florida; Swiss Federal Institutes of Technology Domain; ETH Zurich; University of Jyvaskyla; University of Helsinki; Norwegian University of Science & Technology (NTNU); Duke University
Abstract: Reductions in natural habitats urge that we better understand species' interconnection and how biological communities respond to environmental changes. However, ecological studies of species' interactions are limited by their geographic and taxonomic focus which can distort our understanding of interaction dynamics. We focus on bird-plant interactions that refer to situations of potential fruit consumption and seed dispersal. We develop an approach for predicting species' interactions that acc...
-
Authors: Frazier, David T.; Nott, David J.; Drovandi, Christopher; Kohn, Robert
Affiliations: Monash University; National University of Singapore; University of Queensland; University of New South Wales Sydney
Abstract: Implementing Bayesian inference is often computationally challenging in complex models, especially when calculating the likelihood is difficult. Synthetic likelihood is one approach for carrying out inference when the likelihood is intractable, but it is straightforward to simulate from the model. The method constructs an approximate likelihood by taking a vector summary statistic as being multivariate normal, with the unknown mean and covariance estimated by simulation. Previous research demo...
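The synthetic likelihood construction described in the entry above (treat a vector summary statistic as multivariate normal, with mean and covariance estimated by simulation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the simulator `toy_simulate`, the two-dimensional summary (sample mean and sample variance), and the parameters `n_sims` and `seed` are all hypothetical choices made so that the example runs with the standard library alone.

```python
import math
import random


def toy_simulate(theta, rng, n=50):
    """Hypothetical simulator: n draws from Normal(theta, 1), reduced to
    the summary vector (sample mean, sample variance)."""
    xs = [rng.gauss(theta, 1.0) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return (mean, var)


def synthetic_loglik(theta, observed, simulate, n_sims=200, seed=0):
    """Gaussian synthetic log-likelihood for a 2-dimensional summary.

    Simulates n_sims summaries at theta, estimates their mean and
    covariance, and evaluates the bivariate normal log-density at the
    observed summary. Restricted to two dimensions for simplicity.
    """
    rng = random.Random(seed)
    sims = [simulate(theta, rng) for _ in range(n_sims)]

    # Simulation-based estimate of the summary mean.
    m0 = sum(s[0] for s in sims) / n_sims
    m1 = sum(s[1] for s in sims) / n_sims

    # Simulation-based (unbiased) estimate of the summary covariance.
    c00 = sum((s[0] - m0) ** 2 for s in sims) / (n_sims - 1)
    c11 = sum((s[1] - m1) ** 2 for s in sims) / (n_sims - 1)
    c01 = sum((s[0] - m0) * (s[1] - m1) for s in sims) / (n_sims - 1)

    # Closed-form determinant and quadratic form for a 2x2 covariance.
    det = c00 * c11 - c01 * c01
    d0 = observed[0] - m0
    d1 = observed[1] - m1
    quad = (c11 * d0 * d0 - 2.0 * c01 * d0 * d1 + c00 * d1 * d1) / det

    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)
```

With an observed summary of `(0.0, 1.0)`, the synthetic log-likelihood at `theta=0.0` is much larger than at `theta=3.0`, since summaries simulated at 3.0 have a mean far from the observed one. In a Bayesian analysis this quantity would replace the intractable log-likelihood inside an MCMC sampler.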
-
Authors: Tang, Weijing; He, Kevin; Xu, Gongjun; Zhu, Ji
Affiliations: University of Michigan System; University of Michigan
Abstract: This article introduces an Ordinary Differential Equation (ODE) notion for survival analysis. The ODE notion not only provides a unified modeling framework, but more importantly, also enables the development of a widely applicable, scalable, and easy-to-implement procedure for estimation and inference. Specifically, the ODE modeling framework unifies many existing survival models, such as the proportional hazards model, the linear transformation model, the accelerated failure time model, and t...
-
Authors: Dai, Chenguang; Lin, Buyu; Xing, Xin; Liu, Jun S.
Affiliations: Harvard University; Virginia Polytechnic Institute & State University
Abstract: Selecting relevant features associated with a given response variable is an important problem in many scientific fields. Quantifying quality and uncertainty of a selection result via false discovery rate (FDR) control has been of recent interest. This article introduces a data-splitting method (referred to as DS) to asymptotically control the FDR while maintaining a high power. For each feature, DS constructs a test statistic by estimating two independent regression coefficients via data split...