-
Authors: Linero, Antonio R.; Yang, Yun
Affiliations: State University System of Florida; Florida State University; University of Illinois System; University of Illinois Urbana-Champaign
Abstract: Ensembles of decision trees are a useful tool for obtaining flexible estimates of regression functions. Examples of these methods include gradient-boosted decision trees, random forests and Bayesian classification and regression trees. Two potential shortcomings of tree ensembles are their lack of smoothness and their vulnerability to the curse of dimensionality. We show that these issues can be overcome by instead considering sparsity inducing soft decision trees in which the decisions are tr...
-
Authors: Zhu, Yunzhang; Li, Lexin
Affiliations: University System of Ohio; Ohio State University; University of California System; University of California Berkeley
Abstract: Matrix-valued data, where the sampling unit is a matrix consisting of rows and columns of measurements, are emerging in numerous scientific and business applications. Matrix Gaussian graphical models are a useful tool to characterize the conditional dependence structure of rows and columns. We employ non-convex penalization to tackle the estimation of multiple graphs from matrix-valued data under a matrix normal distribution. We propose a highly efficient non-convex optimization algorithm that...
-
Authors: Zheng, Yao; Zhu, Qianqian; Li, Guodong; Xiao, Zhijie
Affiliations: University of Hong Kong; Shanghai University of Finance & Economics; Boston College
Abstract: Estimating conditional quantiles of financial time series is essential for risk management and many other financial applications. For time series models with conditional heteroscedasticity, although it is the generalized auto-regressive conditional heteroscedastic (GARCH) model that has the greatest popularity, quantile regression for this model usually gives rise to non-smooth non-convex optimization which may hinder its practical feasibility. The paper proposes an easy-to-implement hybrid qu...
-
Author: Fogarty, Colin B.
Affiliation: Massachusetts Institute of Technology (MIT)
Abstract: Although attractive from a theoretical perspective, finely stratified experiments such as paired designs suffer from certain analytical limitations that are not present in block-randomized experiments with multiple treated and control individuals in each block. In short, when using a weighted difference in means to estimate the sample average treatment effect, the traditional variance estimator in a paired experiment is conservative unless the pairwise average treatment effects are constant ac...
-
Authors: Bloem-Reddy, Benjamin; Orbanz, Peter
Affiliations: University of Oxford; Columbia University
Abstract: We introduce a class of generative network models that insert edges by connecting the starting and terminal vertices of a random walk on the network graph. Within the taxonomy of statistical network models, this class is distinguished by permitting the location of a new edge to depend explicitly on the structure of the graph, but being nonetheless statistically and computationally tractable. In the limit of infinite walk length, the model converges to an extension of the preferential attachmen...
-
Authors: Deligiannidis, George; Doucet, Arnaud; Pitt, Michael K.
Affiliations: University of Oxford; University of London; King's College London
Abstract: The pseudomarginal algorithm is a Metropolis-Hastings-type scheme which samples asymptotically from a target probability density when we can only estimate unbiasedly an unnormalized version of it. In a Bayesian context, it is a state of the art posterior simulation technique when the likelihood function is intractable but can be estimated unbiasedly by using Monte Carlo samples. However, for the performance of this scheme not to degrade as the number T of data points increases, it is typically...
-
Authors: Liang, Faming; Jia, Bochao; Xue, Jingnan; Li, Qizhai; Luo, Ye
Affiliations: Purdue University System; Purdue University; State University System of Florida; University of Florida; Chinese Academy of Sciences
Abstract: Missing data are frequently encountered in high dimensional problems, but they are usually difficult to deal with by using standard algorithms, such as the expectation-maximization algorithm and its variants. To tackle this difficulty, some problem-specific algorithms have been developed in the literature, but a general algorithm is still lacking. This work fills the gap: we propose a general algorithm for high dimensional missing data problems. The algorithm works by iterating between a...
-
Authors: Liu, Yang; Liu, Yukun; Li, Pengfei; Qin, Jing
Affiliations: East China Normal University; University of Waterloo; National Institutes of Health (NIH) - USA; NIH National Institute of Allergy & Infectious Diseases (NIAID)
Abstract: Capture-recapture experiments are widely used cost-effective sampling techniques for estimating population sizes or abundances in biology, ecology, demography, epidemiology and reliability studies. For continuous time capture-recapture data, existing estimation methods are based on conditional likelihoods and an inverse weighting estimating equation. The corresponding Wald-type confidence intervals for the abundance may have severe undercoverage, and their lower limits can be below the number ...
-
Authors: Xie, Jichun; Li, Ruosha
Affiliations: Duke University; University of Texas System; University of Texas Health Science Center Houston
Abstract: Motivated by gene coexpression pattern analysis, we propose a novel sample quantile contingency (SQUAC) statistic to infer quantile associations conditioning on covariates. It features enhanced flexibility in handling variables with both arbitrary distributions and complex association patterns conditioning on covariates. We first derive its asymptotic null distribution, and then develop a multiple-testing procedure based on the SQUAC statistic to test simultaneously the independence between on...
-
Authors: Chown, Justin; Mueller, Ursula U.
Affiliations: Ruhr University Bochum; Texas A&M University System; Texas A&M University College Station
Abstract: Heteroscedastic errors can lead to inaccurate statistical conclusions if they are not properly handled. We introduce a test for heteroscedasticity for the non-parametric regression model with multiple covariates. It is based on a suitable residual-based empirical distribution function. The residuals are constructed by using local polynomial smoothing. Our test statistic involves a 'detection function' that can verify heteroscedasticity by exploiting just the independence-dependence structure be...