-
Authors: Blackwell, Matthew; Pashley, Nicole E.
Affiliations: Harvard University; Harvard University; Rutgers University System; Rutgers University New Brunswick
Abstract: Factorial experiments are widely used to assess the marginal, joint, and interactive effects of multiple concurrent factors. While a robust literature covers the design and analysis of these experiments, there is less work on how to handle treatment noncompliance in this setting. To fill this gap, we introduce a new methodology that uses the potential outcomes framework for analyzing $2^K$ factorial experiments with noncompliance on any number of factors. This framework builds on and extends th...
-
Authors: Ren, Zhimei; Wei, Yuting; Candes, Emmanuel
Affiliations: University of Chicago; University of Pennsylvania; Stanford University
Abstract: Model-X knockoffs is a general procedure that can leverage any feature importance measure to produce a variable selection algorithm, which discovers true effects while rigorously controlling the number or fraction of false positives. Model-X knockoffs is a randomized procedure which relies on the one-time construction of synthetic (random) variables. This article introduces a derandomization method by aggregating the selection results across multiple runs of the knockoffs algorithm. The derand...
-
Authors: He, Jingyu; Hahn, P. Richard
Affiliations: City University of Hong Kong; Arizona State University; Arizona State University-Tempe
Abstract: This article develops a novel stochastic tree ensemble method for nonlinear regression, referred to as accelerated Bayesian additive regression trees, or XBART. By combining regularization and stochastic search strategies from Bayesian modeling with computationally efficient techniques from recursive partitioning algorithms, XBART attains state-of-the-art performance at prediction and function estimation. Simulation studies demonstrate that XBART provides accurate point-wise estimates of the m...
-
Authors: Dai, Chenguang; Lin, Buyu; Xing, Xin; Liu, Jun S.
Affiliations: Harvard University; Virginia Polytechnic Institute & State University
Abstract: The Generalized Linear Model (GLM) has been widely used in practice to model counts or other types of non-Gaussian data. This article introduces a framework for feature selection in the GLM that can achieve robust False Discovery Rate (FDR) control. The main idea is to construct a mirror statistic based on data perturbation to measure the importance of each feature. FDR control is achieved by taking advantage of the mirror statistic's property that its sampling distribution is (asymptotically)...
-
Authors: Tian, Ye; Feng, Yang
Affiliations: Columbia University; New York University
Abstract: In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aims to improve the fit on target data by borrowing information from useful source data. Given which sources to transfer, we propose a transfer learning algorithm on GLM, and derive its $\ell_1/\ell_2$-estimation error bounds as well as a bound for a prediction error measure. The theoretical analysis shows that when the target and sources are sufficiently close to each other, these boun...
-
Authors: Li, Lexin; Zeng, Jing; Zhang, Xin
Affiliations: University of California System; University of California Berkeley; State University System of Florida; Florida State University; Chinese Academy of Sciences; University of Science & Technology of China, CAS
Abstract: Multimodal data are now prevailing in scientific research. One of the central questions in multimodal integrative analysis is to understand how two data modalities associate and interact with each other given another modality or demographic variables. The problem can be formulated as studying the associations among three sets of random variables, a question that has received relatively less attention in the literature. In this article, we propose a novel generalized liquid association analysis...
-
Authors: Kramlinger, Peter; Krivobokova, Tatyana; Sperlich, Stefan
Affiliations: University of Vienna; University of Geneva
Abstract: In spite of its high practical relevance, cluster-specific multiple inference for linear mixed model predictors has hardly been addressed so far. While marginal inference for population parameters is well understood, conditional inference for the cluster-specific predictors is more intricate. This work introduces a general framework for multiple inference in linear mixed models for cluster-specific predictors. Consistent confidence sets for multiple inference are constructed under both, the mar...
-
Authors: Ma, Haiqiang; Jiang, Jiming
Affiliations: Jiangxi University of Finance & Economics; University of California System; University of California Davis
Abstract: We propose a new classified mixed model prediction (CMMP) procedure, called pseudo-Bayesian CMMP, that uses network information in matching the group index between the training data and new data, whose characteristics of interest one wishes to predict. The current CMMP procedures do not incorporate such information; as a result, the methods are not consistent in terms of matching the group index. Although, as the number of training data groups increases, the current CMMP method can predict the...
-
Authors: Chen, Hui; Ren, Haojie; Yao, Fang; Zou, Changliang
Affiliations: Nankai University; Nankai University; Shanghai Jiao Tong University; Peking University
Abstract: In multiple change-point analysis, one of the main difficulties is to determine the number of change-points. Various consistent selection methods, including the use of Schwarz information criterion and cross-validation, have been proposed to balance the model fitting and complexity. However, there is a lack of systematic approaches that provide a theoretical guarantee of significance in determining the number of changes. In this paper, we introduce a data-adaptive selection procedure via error rate ...
-
Authors: Chen, Yuxin; Fan, Jianqing; Wang, Bingyan; Yan, Yuling
Affiliations: Princeton University; Princeton University
Abstract: We investigate the effectiveness of convex relaxation and nonconvex optimization in solving bilinear systems of equations under two different designs (i.e., a sort of random Fourier design and Gaussian design). Despite the wide applicability, the theoretical understanding about these two paradigms remains largely inadequate in the presence of random noise. The current article makes two contributions by demonstrating that (i) a two-stage nonconvex algorithm attains minimax-optimal accuracy withi...