-
Authors: Tian, Ye; Feng, Yang
Affiliations: Columbia University; New York University
Abstract: Variable screening methods have been shown to be effective in dimension reduction under the ultra-high dimensional setting. Most existing screening methods are designed to rank the predictors according to their individual contributions to the response. As a result, variables that are marginally independent but jointly dependent with the response could be missed. In this work, we propose a new framework for variable screening, random subspace ensemble (RaSE), which works by evaluating the quali...
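The failure mode of marginal screening described in this abstract can be reproduced with a toy simulation. This is a minimal sketch, not RaSE itself. The data-generating model (Y equal to the product of two independent ±1 predictors plus small noise) and the least-squares fit on the two-variable subspace are assumptions chosen purely for illustration.

```python
import numpy as np

# Hypothetical toy model: Y depends on X1 and X2 only through their product,
# so each predictor is marginally uncorrelated with Y.
rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.choice([-1.0, 1.0], size=(n, p))  # independent Rademacher predictors
y = X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)

# Marginal screening: score each predictor by |Pearson correlation| with y.
marginal = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])

# Joint evaluation of the subspace {X1, X2}: least squares on an intercept,
# both main effects, and their interaction.
Z = np.column_stack([np.ones(n), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ coef
r2_joint = 1 - resid.var() / y.var()

print(np.round(marginal[:2], 3), round(r2_joint, 3))
```

The marginal scores of X1 and X2 are near zero, comparable to the pure-noise predictors, while the joint fit on the pair explains almost all of the variance in Y. Evaluating subspaces rather than single predictors is what lets a subspace-based screener detect such pairs.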
-
Authors: Bai, Peiliang; Safikhani, Abolfazl; Michailidis, George
Affiliations: State University System of Florida; University of Florida
Abstract: We study the problem of detecting and locating change points in high-dimensional Vector Autoregressive (VAR) models, whose transition matrices exhibit low rank plus sparse structure. We first address the problem of detecting a single change point using an exhaustive search algorithm and establish a finite sample error bound for its accuracy. Next, we extend the results to the case of multiple change points that can grow as a function of the sample size. Their detection is based on a two-step a...
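The idea of an exhaustive search for a single change point can be shown on a much simpler model. In this sketch, which is a deliberately simplified analogue and not the authors' estimator, every candidate split is tried and the one minimizing the total within-segment loss is kept. Here each segment is modeled by a constant mean rather than a low-rank plus sparse VAR transition matrix, and the function name `single_change_point` and the `min_seg` parameter are hypothetical.

```python
import numpy as np

def single_change_point(x, min_seg=10):
    """Exhaustive search for one change in the mean of a 1-D series.

    Simplified analogue: each segment is fit by its own constant mean,
    not by a low-rank plus sparse VAR transition matrix.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    best_tau, best_loss = None, np.inf
    # Try every split that leaves at least min_seg points on each side.
    for tau in range(min_seg, n - min_seg + 1):
        left, right = x[:tau], x[tau:]
        # Total within-segment sum of squared errors for a split at tau.
        loss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if loss < best_loss:
            best_tau, best_loss = tau, loss
    return best_tau

# Simulated series whose mean shifts from 0 to 2 after 150 observations.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(2, 1, 100)])
print(single_change_point(x))
```

The search costs one segment fit per candidate split. That cost is manageable for a single change point but grows quickly with multiple change points, which is one reason multi-change-point procedures typically use a multi-step strategy instead of a joint exhaustive search.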
-
Authors: Zhong, Wei; Qian, Chen; Liu, Wanjun; Zhu, Liping; Li, Runze
Affiliations: Xiamen University; Virginia Polytechnic Institute & State University; Renmin University of China; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
Abstract: It is important to quantify the differences in returns to skills using online job advertisement data, which has attracted great interest in both labor economics and statistics. In this article, we study the relationship between the posted salary and the job requirements in online labor markets. There are two challenges to deal with. First, the posted salary is always presented in an interval-valued form, for example, 5k-10k yuan per month. Simply taking the mid-point or the lower ...
-
Authors: Park, Yeonjoo; Li, Bo; Li, Yehua
Affiliations: University of Texas System; University of Texas at San Antonio; University of Illinois System; University of Illinois Urbana-Champaign; University of California System; University of California Riverside
Abstract: Reliable prediction for crop yield is crucial for economic planning, food security monitoring, and agricultural risk management. This study aims to develop a crop yield forecasting model at large spatial scales using meteorological variables closely related to crop growth. The influence of climate patterns on agricultural productivity can be spatially inhomogeneous due to local soil and environmental conditions. We propose a Bayesian spatially varying functional model (BSVFM) to predict county...
-
Authors: Ibriga, Hilda S.; Sun, Will Wei
Affiliations: Purdue University System; Purdue University
Abstract: We aim to provably complete a sparse and highly missing tensor in the presence of covariate information along tensor modes. Our motivation comes from online advertising, where users' click-through rates (CTR) on ads across various devices form a CTR tensor that has about 96% missing entries and many zeros among the nonmissing entries, which makes standalone tensor completion methods unsatisfactory. Besides the CTR tensor, additional ad features or user characteristics are often available. In this ...
-
Authors: Cai, Jian-Feng; Li, Jingyang; Xia, Dong
Affiliations: Hong Kong University of Science & Technology
Abstract: We investigate a generalized framework to estimate a latent low-rank plus sparse tensor, where the low-rank tensor often captures the multi-way principal components and the sparse tensor accounts for potential model mis-specifications or heterogeneous signals that are unexplainable by the low-rank part. The framework flexibly covers both linear and generalized linear models, and can easily handle continuous or categorical variables. We propose a fast algorithm by integrating the Riemannian gra...
-
Authors: Yao, Shunan; Rava, Bradley; Tong, Xin; James, Gareth
Affiliations: University of Southern California
Abstract: Label noise in data has long been an important problem in supervised learning applications as it affects the effectiveness of many widely used classification methods. Recently, important real-world applications, such as medical diagnosis and cybersecurity, have generated renewed interest in the Neyman-Pearson (NP) classification paradigm, which constrains the more severe type of error (e.g., the Type I error) under a preferred level while minimizing the other (e.g., the Type II error). However...
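To make the NP constraint concrete, the sketch below sets a score threshold from held-out class-0 scores so that the Type I error exceeds the level alpha with probability at most delta. It follows the order-statistic rule of the NP umbrella algorithm. It assumes clean class-0 labels and continuous scores, so it does not address the label-noise problem studied in this paper. The function names and the default values of `alpha` and `delta` are hypothetical.

```python
from math import exp, lgamma, log

def binom_pmf(n, j, p):
    # P(Bin(n, p) = j), computed in log space to avoid overflow for large n.
    return exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
               + j * log(p) + (n - j) * log(1 - p))

def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Order-statistic threshold in the style of the NP umbrella algorithm.

    Predict class 1 when score > threshold. Assumes clean class-0 labels
    (no label noise) and continuous scores. Returns None when there are
    too few class-0 scores for the requested (alpha, delta).
    """
    s = sorted(class0_scores)
    n = len(s)
    tail, k_star = 0.0, None
    # Scan k downward; tail = P(Bin(n, 1 - alpha) >= k) bounds the probability
    # that thresholding at the k-th smallest score gives Type I error > alpha.
    for k in range(n, 0, -1):
        tail += binom_pmf(n, k, 1 - alpha)
        if tail > delta:
            break
        k_star = k  # smallest k found so far that satisfies the bound
    return None if k_star is None else s[k_star - 1]

print(np_threshold(list(range(59))))  # 58
print(np_threshold(list(range(58))))  # None
```

With alpha = delta = 0.05, at least 59 class-0 scores are required, because even the largest order statistic only meets the bound when 0.95^n <= 0.05.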
-
Authors: Liang, Decai; Huang, Hui; Guan, Yongtao; Yao, Fang
Affiliations: Nankai University; Sun Yat Sen University; University of Miami; Peking University
Abstract: For spatially dependent functional data, a generalized Karhunen-Loeve expansion is commonly used to decompose data into an additive form of temporal components and spatially correlated coefficients. This structure provides a convenient model to investigate the space-time interactions, but may not hold for complex spatio-temporal processes. In this work, we introduce the concept of weak separability, and propose a formal test to examine its validity for non-replicated spatially stationary funct...
-
Authors: Du, Lilun; Guo, Xu; Sun, Wenguang; Zou, Changliang
Affiliations: Hong Kong University of Science & Technology; Beijing Normal University; University of Southern California; Nankai University
Abstract: We develop a new class of distribution-free multiple testing rules for false discovery rate (FDR) control under general dependence. A key element in our proposal is a symmetrized data aggregation (SDA) approach to incorporating the dependence structure via sample splitting, data screening, and information pooling. The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data-driven threshold along the ranking to contr...
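The second step in this abstract, choosing a data-driven threshold along symmetric ranking statistics, can be illustrated with a generic counting rule. This is a minimal sketch that assumes statistics W_j symmetric about zero under the null, so the number of large negative values estimates the number of false discoveries among the large positive ones. It is not the authors' SDA construction: sample splitting, screening, and pooling are omitted. The function name and the `offset` parameter are hypothetical. Setting `offset=1` gives the more conservative knockoff+ style estimate.

```python
import numpy as np

def symmetric_threshold(W, alpha=0.1, offset=0):
    """Smallest t with (offset + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= alpha.

    Assumes null statistics are (approximately) symmetric about zero.
    Returns (threshold, indices of rejected hypotheses).
    """
    W = np.asarray(W, dtype=float)
    # The estimated FDP only changes at the distinct nonzero magnitudes |W_j|,
    # so those are the only candidate thresholds (np.unique returns them sorted).
    for t in np.unique(np.abs(W[W != 0])):
        fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= alpha:
            return t, np.flatnonzero(W >= t)
    return np.inf, np.array([], dtype=int)  # no threshold meets the target

W = np.array([6.0, 5.0, 4.0, 3.0, -0.5, 0.2, -0.1])
t, rejected = symmetric_threshold(W, alpha=0.2)
print(t, rejected)  # 0.2 [0 1 2 3 5]
```

The rule scans candidates from small to large and stops at the first threshold whose estimated false discovery proportion meets the target. This makes as many rejections as the estimate allows.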
-
Authors: Gil-Leyva, Maria F.; Mena, Ramses H.
Abstract: Our object of study is the general class of stick-breaking processes with exchangeable length variables. These generalize well-known Bayesian nonparametric priors in an unexplored direction. We give conditions to ensure the respective species sampling process is proper and the corresponding prior has full support. For a rich subclass we explain how, by tuning a single [0,1]-valued parameter, the stochastic ordering of the weights can be modulated, and Dirichlet and Geometric priors can be reco...