-
Authors: Williamson, Brian D.; Gilbert, Peter B.; Simon, Noah R.; Carone, Marco
Affiliations: Fred Hutchinson Cancer Center; University of Washington; University of Washington Seattle
Abstract: In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response, in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading...
-
Authors: Alquier, Pierre; Cherief-Abdellatif, Badr-Eddine; Derumigny, Alexis; Fermanian, Jean-David
Affiliations: RIKEN; University of Oxford; Delft University of Technology; Institut Polytechnique de Paris; ENSAE Paris
Abstract: This article deals with robust inference for parametric copula models. Estimation using canonical maximum likelihood might be unstable, especially in the presence of outliers. We propose a procedure based on the maximum mean discrepancy (MMD) principle. We derive nonasymptotic oracle inequalities, consistency, and asymptotic normality of this new estimator. In particular, the oracle inequality holds without any assumption on the copula family, and can be applied in the presence of outlie...
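For readers unfamiliar with the MMD principle, the minimal Python sketch below computes the empirical (biased, V-statistic) squared MMD between two samples. The Gaussian kernel, the `bandwidth` parameter, and the function names are illustrative assumptions; this is not the authors' copula estimator, which would minimize such a discrepancy over the parameters of a copula family.

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as equal-length tuples."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    where each mean runs over all pairs drawn from the indicated samples.
    """
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Example: pseudo-observations on [0, 1]^2 compared with a shifted copy.
xs = [(0.1, 0.2), (0.4, 0.5), (0.7, 0.9)]
ys = [(x + 0.3, y + 0.3) for x, y in xs]
```

For identical samples the estimate is zero, and moving one sample away from the other makes it positive; a minimum-MMD estimator would choose copula parameters whose simulated sample makes this quantity small.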
-
Authors: Li, Sai; Cai, T. Tony; Li, Hongzhe
Affiliations: Renmin University of China; University of Pennsylvania; University of Pennsylvania
Abstract: Transfer learning for high-dimensional Gaussian graphical models (GGMs) is studied. The target GGM is estimated by incorporating the data from similar and related auxiliary studies, where the similarity between the target graph and each auxiliary graph is characterized by the sparsity of a divergence matrix. An estimation algorithm, Trans-CLIME, is proposed and shown to attain a faster convergence rate than the minimax rate in the single-task setting. Furthermore, we introduce a universal debi...
-
Authors: Ma, Pulong; Bhadra, Anindya
Affiliations: Clemson University; Purdue University System; Purdue University
Abstract: The Matérn covariance function is a popular choice for prediction in the spatial statistics and uncertainty quantification literature. A key benefit of the Matérn class is that it allows precise control over the degree of mean-square differentiability of the random process. However, the Matérn class possesses exponentially decaying tails and thus may not be suitable for modeling polynomially decaying dependence. This problem can be remedied using polynomial covariances; however, one ...
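As background for the contrast drawn in this abstract, the sketch below evaluates the Matérn covariance in its three common closed-form (half-integer) cases and, for comparison, a generalized Cauchy covariance, one standard example of a polynomially decaying family. The parameter names (`sigma2`, `rho`, `alpha`, `beta`) are assumptions for illustration, and the generalized Cauchy model is not necessarily the covariance class the authors propose.

```python
import math

def matern(d, nu, sigma2=1.0, rho=1.0):
    """Matérn covariance at distance d for the closed-form cases nu in {0.5, 1.5, 2.5}.

    A process with smoothness nu is k times mean-square differentiable iff nu > k,
    so these three cases give 0, 1 and 2 mean-square derivatives respectively.
    All three decay exponentially in d.
    """
    r = d / rho
    if nu == 0.5:
        return sigma2 * math.exp(-r)
    if nu == 1.5:
        s = math.sqrt(3.0) * r
        return sigma2 * (1.0 + s) * math.exp(-s)
    if nu == 2.5:
        s = math.sqrt(5.0) * r
        return sigma2 * (1.0 + s + s * s / 3.0) * math.exp(-s)
    raise ValueError("this sketch only covers nu = 0.5, 1.5, 2.5")

def generalized_cauchy(d, alpha=1.0, beta=1.0, sigma2=1.0, rho=1.0):
    """Generalized Cauchy covariance (valid for 0 < alpha <= 2, beta > 0).

    Its tail decays polynomially, like d ** (-beta), rather than exponentially.
    """
    return sigma2 * (1.0 + (d / rho) ** alpha) ** (-beta / alpha)
```

At distance 50 with unit range, the nu = 1/2 Matérn value is exp(-50), while the generalized Cauchy value with alpha = beta = 1 is 1/51, which illustrates the gap between exponential and polynomial tails.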
-
Authors: Zhen, Yaoming; Wang, Junhui
Affiliations: City University of Hong Kong
Abstract: Conventional network analysis has largely focused on pairwise interactions between two entities, yet multi-way interactions among multiple entities are frequently observed in real-life hypergraph networks. In this article, we propose a novel method for detecting community structure in general hypergraph networks, uniform or nonuniform. The proposed method introduces a null vertex to augment a nonuniform hypergraph into a uniform multi-hypergraph, and then embeds the multi-hypergraph in a l...
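The null-vertex augmentation described in this abstract can be illustrated with a small sketch. It assumes, hypothetically, that each hyperedge is padded with repeated copies of a single null vertex up to the size of the largest hyperedge; the label `NULL_VERTEX = 0` and the function name are illustrative, and the authors' exact construction (and the subsequent embedding step) may differ.

```python
NULL_VERTEX = 0  # hypothetical label for the added null vertex; real vertices are 1..n

def augment_to_uniform(hyperedges):
    """Pad every hyperedge with copies of NULL_VERTEX so all edges share the maximum size.

    The result is a uniform multi-hypergraph: every edge has the same size m,
    and the null vertex may appear more than once within a single edge.
    """
    m = max(len(e) for e in hyperedges)
    return [tuple(sorted(e)) + (NULL_VERTEX,) * (m - len(e)) for e in hyperedges]

# A nonuniform hypergraph with edges of sizes 2, 3 and 1.
edges = [(1, 2), (1, 2, 3), (3,)]
uniform_edges = augment_to_uniform(edges)
```

Here every augmented edge has size 3, with the null vertex repeated as needed (for example, `(3,)` becomes `(3, 0, 0)`).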
-
Authors: Dudek, Anna E.; Lenart, Lukasz
Affiliations: AGH University of Krakow; Cracow University of Economics
Abstract: We introduce a new approach to nonparametric spectral density estimation based on the subsampling technique, which we apply to an important class of nonstationary time series: almost periodically correlated sequences. In contrast to existing methods, our technique does not require demeaning of the data. In simulated data examples, we compare our estimator of the spectral density function with the classical one. Additionally, we propose a modified estimator, which allows one to reduce t...
-
Authors: Painsky, Amichai
Affiliations: Tel Aviv University
Abstract: Consider a finite sample from an unknown distribution over a countable alphabet. The missing mass refers to the probability of symbols that do not appear in the sample. Estimating the missing mass is a basic problem in statistics and related fields, which dates back to the early work of Laplace and the more recent seminal contribution of Good and Turing. In this article, we introduce a generalized Good-Turing (GT) framework for missing mass estimation. We derive an upper bound for the risk (i...
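For context, the classic Good-Turing estimate that this framework generalizes takes the missing mass to be N1/n, the fraction of the sample made up of symbols seen exactly once. The minimal Python sketch below implements only that classic estimator, not the generalized estimator proposed in the article; the function name is illustrative.

```python
from collections import Counter

def good_turing_missing_mass(sample):
    """Classic Good-Turing estimate of the missing mass: N1 / n.

    N1 is the number of distinct symbols observed exactly once,
    and n is the total sample size.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("the sample must be nonempty")
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n
```

For the sample `"abracadabra"` (n = 11), only `c` and `d` appear once, so the estimate is 2/11.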
-
Authors: Lee, Kuang-Yao; Li, Lexin; Li, Bing; Zhao, Hongyu
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Temple University; University of California System; University of California Berkeley; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; Yale University
Abstract: In this article, we develop a nonparametric graphical model for multivariate random functions. Most existing graphical models are restricted by the assumptions of multivariate Gaussian or copula Gaussian distributions, which also imply linear relations among the random variables or functions on different nodes. We relax those assumptions by building our graphical model on a new statistical object, the functional additive regression operator. By carrying out regression and neighborhood sel...
-
Authors: Liu, Yan; Wang, Dewei; Li, Li; Li, Dingsheng
Affiliations: Nevada System of Higher Education (NSHE); University of Nevada Reno; University of South Carolina System; University of South Carolina Columbia
Abstract: The National Health and Nutrition Examination Survey (NHANES) has been continuously biomonitoring Americans' exposure to two families of harmful environmental chemicals: polychlorinated biphenyls (PCBs) and polybrominated diphenyl ethers (PBDEs). However, biomonitoring these chemicals is expensive. To save cost, in 2005 NHANES resorted to pooled biomonitoring, that is, amalgamating individual specimens to form a pool and measuring chemical levels from pools. Despite being publicly available, ...
-
Authors: Zhang, Jingfei; Li, Yi
Affiliations: University of Miami; University of Michigan System; University of Michigan
Abstract: Though Gaussian graphical models have been widely used in many scientific fields, relatively limited progress has been made in linking graph structures to external covariates. We propose a Gaussian graphical regression model, which regresses both the mean and the precision matrix of a Gaussian graphical model on covariates. In the context of co-expression quantitative trait locus (QTL) studies, our method can determine how genetic variants and clinical conditions modulate the subject-level network ...