-
Authors: Zhao, Junlong; Liu, Chao; Niu, Lu; Leng, Chenlei
Affiliations: Beijing Normal University; Beihang University; University of Warwick; Alan Turing Institute
Abstract: Influence diagnosis is an integrated component of data analysis but has been severely underinvestigated in a high dimensional regression setting. One of the key challenges, even in a fixed dimensional setting, is how to deal with multiple influential points that give rise to masking and swamping effects. The paper proposes a novel group deletion procedure referred to as multiple influential point detection by studying two extreme statistics based on a marginal-correlation-based influence measu...
-
Authors: Greenewald, Kristjan; Zhou, Shuheng; Hero, Alfred, III
Affiliations: International Business Machines (IBM); IBM USA; University of California System; University of California Riverside; University of Michigan System; University of Michigan
Abstract: The paper introduces a multiway tensor generalization of the bigraphical lasso which uses a two-way sparse Kronecker sum multivariate normal model for the precision matrix to model parsimoniously conditional dependence relationships of matrix variate data based on the Cartesian product of graphs. We call this tensor graphical lasso generalization TeraLasso. We demonstrate by using theory and examples that the TeraLasso model can be accurately and scalably estimated from very limited data sampl...
-
Authors: Bhattacharya, Bhaswar B.
Affiliations: University of Pennsylvania
Abstract: Testing equality of two multivariate distributions is a classical problem for which many non-parametric tests have been proposed over the years. Most of the popular two-sample tests, which are asymptotically distribution free, are based either on geometric graphs constructed by using interpoint distances between the observations (multivariate generalizations of the Wald-Wolfowitz runs test) or on multivariate data depth (generalizations of the Mann-Whitney rank test). The paper introduces a ge...
-
Authors: Liang, Tengyuan; Su, Weijie J.
Affiliations: University of Chicago; University of Pennsylvania
Abstract: Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint informs us only how well the solution is approximated numerically but overlooks the sampling nature of the data. In contrast, recognizing the randomness in the data, statisticians are keen to provide uncertainty quantification, or confidence, for the solution obtained by using iterative optimization methods. The paper makes progress along ...
-
Authors: Zhao, Qingyuan; Small, Dylan S.; Bhattacharya, Bhaswar B.
Affiliations: University of Pennsylvania
Abstract: To identify the estimand in missing data problems and observational studies, it is common to base the statistical estimation on the 'missingness at random' and 'no unmeasured confounder' assumptions. However, these assumptions are unverifiable by using empirical data and pose serious threats to the validity of the qualitative conclusions of statistical inference. A sensitivity analysis asks how the conclusions may change if the unverifiable assumptions are violated to a certain degree. We cons...
-
Authors: Heller, Ruth; Meir, Amit; Chatterjee, Nilanjan
Affiliations: Tel Aviv University; University of Washington; University of Washington Seattle; Johns Hopkins University
Abstract: The practice of pooling several individual test statistics to form aggregate tests is common in many statistical applications where individual tests may be underpowered. Although selection by aggregate tests can serve to increase power, the selection process invalidates inference based on the individual test statistics, making it difficult to identify those that drive the signal in follow-up inference. Here, we develop a general approach for valid inference following selection by aggregate tes...