-
Authors: Zhao, Junlong; Liu, Chao; Niu, Lu; Leng, Chenlei
Affiliations: Beijing Normal University; Beihang University; University of Warwick; Alan Turing Institute
Abstract: Influence diagnosis is an integrated component of data analysis but has been severely underinvestigated in a high dimensional regression setting. One of the key challenges, even in a fixed dimensional setting, is how to deal with multiple influential points that give rise to masking and swamping effects. The paper proposes a novel group deletion procedure referred to as multiple influential point detection by studying two extreme statistics based on a marginal-correlation-based influence measu...
-
Authors: Greenewald, Kristjan; Zhou, Shuheng; Hero, Alfred, III
Affiliations: International Business Machines (IBM); IBM USA; University of California System; University of California Riverside; University of Michigan System; University of Michigan
Abstract: The paper introduces a multiway tensor generalization of the bigraphical lasso which uses a two-way sparse Kronecker sum multivariate normal model for the precision matrix to model parsimoniously conditional dependence relationships of matrix variate data based on the Cartesian product of graphs. We call this tensor graphical lasso generalization TeraLasso. We demonstrate by using theory and examples that the TeraLasso model can be accurately and scalably estimated from very limited data sampl...
-
Authors: Bhattacharya, Bhaswar B.
Affiliations: University of Pennsylvania
Abstract: Testing equality of two multivariate distributions is a classical problem for which many non-parametric tests have been proposed over the years. Most of the popular two-sample tests, which are asymptotically distribution free, are based either on geometric graphs constructed by using interpoint distances between the observations (multivariate generalizations of the Wald-Wolfowitz runs test) or on multivariate data depth (generalizations of the Mann-Whitney rank test). The paper introduces a ge...
-
Authors: Liang, Tengyuan; Su, Weijie J.
Affiliations: University of Chicago; University of Pennsylvania
Abstract: Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint informs us only how well the solution is approximated numerically but overlooks the sampling nature of the data. In contrast, recognizing the randomness in the data, statisticians are keen to provide uncertainty quantification, or confidence, for the solution obtained by using iterative optimization methods. The paper makes progress along ...
-
Authors: Zhao, Qingyuan; Small, Dylan S.; Bhattacharya, Bhaswar B.
Affiliations: University of Pennsylvania
Abstract: To identify the estimand in missing data problems and observational studies, it is common to base the statistical estimation on the 'missingness at random' and 'no unmeasured confounder' assumptions. However, these assumptions are unverifiable by using empirical data and pose serious threats to the validity of the qualitative conclusions of statistical inference. A sensitivity analysis asks how the conclusions may change if the unverifiable assumptions are violated to a certain degree. We cons...
-
Authors: Heller, Ruth; Meir, Amit; Chatterjee, Nilanjan
Affiliations: Tel Aviv University; University of Washington; University of Washington Seattle; Johns Hopkins University
Abstract: The practice of pooling several individual test statistics to form aggregate tests is common in many statistical applications where individual tests may be underpowered. Although selection by aggregate tests can serve to increase power, the selection process invalidates inference based on the individual test statistics, making it difficult to identify those that drive the signal in follow-up inference. Here, we develop a general approach for valid inference following selection by aggregate tes...
-
Authors: Athey, Susan; Imbens, Guido W.; Wager, Stefan
Affiliations: Stanford University
Abstract: There are many settings where researchers are interested in estimating average treatment effects and are willing to rely on the unconfoundedness assumption, which requires that the treatment assignment be as good as random conditional on pretreatment variables. The unconfoundedness assumption is often more plausible if a large number of pretreatment variables are included in the analysis, but this can worsen the performance of standard approaches to treatment effect estimation. We develop a me...
-
Authors: Sengupta, Srijan; Chen, Yuguo
Affiliations: Virginia Polytechnic Institute & State University; University of Illinois System; University of Illinois Urbana-Champaign
Abstract: The community structure that is observed in empirical networks has been of particular interest in the statistics literature, with a strong emphasis on the study of block models. We study an important network feature called node popularity, which is closely associated with community structure. Neither the classical stochastic block model nor its degree-corrected extension can satisfactorily capture the dynamics of node popularity as observed in empirical networks. We propose a popularity-adjust...
-
Authors: Goncalves, Flavio B.; Gamerman, Dani
Affiliations: Universidade Federal de Minas Gerais; Universidade Federal do Rio de Janeiro
Abstract: We present a novel inference methodology to perform Bayesian inference for spatiotemporal Cox processes where the intensity function depends on a multivariate Gaussian process. Dynamic Gaussian processes are introduced to enable evolution of the intensity function over discrete time. The novelty of the method lies on the fact that no discretization error is involved despite the non-tractability of the likelihood function and infinite dimensionality of the problem. The method is based on a Mark...
-
Authors: Schouten, Barry
Affiliations: Utrecht University
Abstract: In most real life studies, auxiliary variables are available and are employed to explain and understand missing data patterns and to evaluate and control causal relationships with variables of interest. Usually their availability is assumed to be a fact, even if the variables are measured without the objectives of the study in mind. As a result, inference with missing data and causal inference require some assumptions that cannot easily be validated or checked. In this paper, a framework is co...