-
Authors: He, Jingyu; Hahn, P. Richard
Affiliations: City University of Hong Kong; Arizona State University; Arizona State University-Tempe
Abstract: This article develops a novel stochastic tree ensemble method for nonlinear regression, referred to as accelerated Bayesian additive regression trees, or XBART. By combining regularization and stochastic search strategies from Bayesian modeling with computationally efficient techniques from recursive partitioning algorithms, XBART attains state-of-the-art performance at prediction and function estimation. Simulation studies demonstrate that XBART provides accurate pointwise estimates of the m...
-
Authors: Dai, Chenguang; Lin, Buyu; Xing, Xin; Liu, Jun S.
Affiliations: Harvard University; Virginia Polytechnic Institute & State University
Abstract: The Generalized Linear Model (GLM) has been widely used in practice to model counts or other types of non-Gaussian data. This article introduces a framework for feature selection in the GLM that can achieve robust False Discovery Rate (FDR) control. The main idea is to construct a mirror statistic based on data perturbation to measure the importance of each feature. FDR control is achieved by taking advantage of the mirror statistic's property that its sampling distribution is (asymptotically)...
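The selection step of a mirror-statistic procedure can be sketched as follows. This is a minimal sketch that assumes the mirror statistics M_j have already been computed (their construction via data perturbation is not reproduced) and that, as is usual for mirror statistics, null M_j are roughly symmetric about zero. Under that symmetry, the count of M_j below -t estimates the number of false discoveries above t, and the cutoff is the smallest observed |M_j| at which this estimated false discovery proportion is at most q. The function names `mirror_threshold` and `select_features` are hypothetical.

```python
import numpy as np


def mirror_threshold(mirror_stats, q):
    """Data-driven cutoff tau for mirror statistics at target FDR level q.

    Assumes null mirror statistics are (approximately) symmetric about zero,
    so #{M_j < -t} estimates the number of false positives among {M_j > t}.
    """
    m = np.asarray(mirror_stats, dtype=float)
    # Candidate cutoffs: observed nonzero |M_j| values, scanned in increasing order.
    for t in np.sort(np.abs(m[m != 0])):
        false_est = np.sum(m < -t)        # estimated false discoveries above t
        selected = max(np.sum(m > t), 1)  # guard against division by zero
        if false_est / selected <= q:
            return t
    return np.inf  # only reached when there are no nonzero statistics


def select_features(mirror_stats, q=0.1):
    """Indices j whose mirror statistic exceeds the cutoff tau_q."""
    tau = mirror_threshold(mirror_stats, q)
    return np.flatnonzero(np.asarray(mirror_stats, dtype=float) > tau)
```

For example, with toy statistics `[5, 4, 3, 2.5, 2, -1, 1.5, -0.5, 0.8, -0.2]` and `q=0.2`, the cutoff is 0.5 and features 0, 1, 2, 3, 4, 6 and 8 are selected. Large positive values signal relevant features, while the negative tail serves as a built-in estimate of the null behaviour.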
-
Authors: Tian, Ye; Feng, Yang
Affiliations: Columbia University; New York University
Abstract: In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aims to improve the fit on target data by borrowing information from useful source data. Given which sources to transfer, we propose a transfer learning algorithm for GLMs and derive its ℓ1/ℓ2-estimation error bounds as well as a bound for a prediction error measure. The theoretical analysis shows that when the target and sources are sufficiently close to each other, these boun...
-
Authors: Li, Lexin; Zeng, Jing; Zhang, Xin
Affiliations: University of California System; University of California Berkeley; State University System of Florida; Florida State University; Chinese Academy of Sciences; University of Science & Technology of China, CAS
Abstract: Multimodal data are now prevalent in scientific research. One of the central questions in multimodal integrative analysis is to understand how two data modalities associate and interact with each other given another modality or demographic variables. The problem can be formulated as studying the associations among three sets of random variables, a question that has received relatively little attention in the literature. In this article, we propose a novel generalized liquid association analysis...
-
Authors: Kramlinger, Peter; Krivobokova, Tatyana; Sperlich, Stefan
Affiliations: University of Vienna; University of Geneva
Abstract: In spite of its high practical relevance, cluster-specific multiple inference for linear mixed model predictors has hardly been addressed so far. While marginal inference for population parameters is well understood, conditional inference for the cluster-specific predictors is more intricate. This work introduces a general framework for multiple inference in linear mixed models for cluster-specific predictors. Consistent confidence sets for multiple inference are constructed under both the mar...
-
Authors: Ma, Haiqiang; Jiang, Jiming
Affiliations: Jiangxi University of Finance & Economics; University of California System; University of California Davis
Abstract: We propose a new classified mixed model prediction (CMMP) procedure, called pseudo-Bayesian CMMP, that uses network information in matching the group index between the training data and new data, whose characteristics of interest one wishes to predict. Existing CMMP procedures do not incorporate such information; as a result, they are not consistent in matching the group index. Although, as the number of training data groups increases, the current CMMP method can predict the...
-
Authors: Chen, Hui; Ren, Haojie; Yao, Fang; Zou, Changliang
Affiliations: Nankai University; Shanghai Jiao Tong University; Peking University
Abstract: In multiple change-point analysis, one of the main difficulties is to determine the number of change-points. Various consistent selection methods, including the Schwarz information criterion and cross-validation, have been proposed to balance model fit and complexity. However, there is a lack of systematic approaches that provide a theoretical guarantee of significance when determining the number of changes. In this paper, we introduce a data-adaptive selection procedure via error rate ...
-
Authors: Chen, Yuxin; Fan, Jianqing; Wang, Bingyan; Yan, Yuling
Affiliations: Princeton University
Abstract: We investigate the effectiveness of convex relaxation and nonconvex optimization in solving bilinear systems of equations under two different designs (i.e., a sort of random Fourier design and a Gaussian design). Despite their wide applicability, the theoretical understanding of these two paradigms remains largely inadequate in the presence of random noise. The current article makes two contributions by demonstrating that (i) a two-stage nonconvex algorithm attains minimax-optimal accuracy withi...
-
Authors: Fan, Jianqing; Guo, Yongyi; Wang, Kaizheng
Affiliations: Princeton University; Columbia University
Abstract: When the data are stored in a distributed manner, direct applications of traditional statistical inference procedures are often prohibitive due to communication costs and privacy concerns. This article develops and investigates two communication-efficient accurate statistical estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcas...
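The iterative scheme described above can be illustrated with a minimal sketch for least squares. The update rule here is an assumption, since the abstract is truncated, and is not presented as the article's exact estimator: each node minimizes its local loss corrected by the gap between the global and local gradients, plus a proximal term (α/2)‖θ − θ_old‖², and the central processor averages the node solutions. For the least-squares loss this local problem has the closed form θ_j = θ_old − (H_j + αI)^{-1} g, where H_j is node j's Hessian and g is the aggregated gradient. The names `local_stats`, `distributed_ls`, `alpha` and `n_iter` are hypothetical, and equal sample sizes across nodes are assumed.

```python
import numpy as np


def local_stats(X, y):
    """Per-node quantities for f_j(theta) = ||y - X theta||^2 / (2 n_j)."""
    n = X.shape[0]
    return X.T @ X / n, X.T @ y / n  # local Hessian H_j and vector b_j


def distributed_ls(nodes, alpha=0.1, n_iter=50):
    """Multi-round distributed least squares (hypothetical sketch).

    Assumed local problem on node j:
        argmin f_j(theta) - <grad f_j(theta_old) - grad f(theta_old), theta>
               + alpha / 2 * ||theta - theta_old||^2,
    whose least-squares solution is theta_old - (H_j + alpha I)^{-1} grad f(theta_old).
    """
    stats = [local_stats(X, y) for X, y in nodes]
    p = stats[0][0].shape[0]
    theta = np.zeros(p)
    for _ in range(n_iter):
        # Nodes compute local gradients H_j theta - b_j in parallel.
        grads = [H @ theta - b for H, b in stats]
        # Central processor aggregates (valid as the global gradient for equal node sizes).
        g = np.mean(grads, axis=0)
        # After g is broadcast, each node solves its proximal problem in closed form.
        updates = [theta - np.linalg.solve(H + alpha * np.eye(p), g) for H, _ in stats]
        # Central processor averages the node solutions.
        theta = np.mean(updates, axis=0)
    return theta
```

In this sketch, any fixed point has a zero global gradient, so the iterates approach the pooled least-squares fit when the local Hessians are similar. Each round communicates only p-dimensional vectors. A larger α makes each step more conservative but more stable when the nodes' data differ.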
-
Authors: Chen, Hao; Xia, Yin
Affiliations: University of California System; University of California Davis; Fudan University
Abstract: Many statistical methodologies for high-dimensional data assume the population is normal. Although a few multivariate normality tests have been proposed, to the best of our knowledge, none of them can properly control the Type I error when the dimension is larger than the number of observations. In this work, we propose a novel nonparametric test that uses nearest neighbor information. The proposed method guarantees asymptotic Type I error control in the high-dimensional setting. Si...