-
Authors: Hentschel, Manuel; Engelke, Sebastian; Segers, Johan
Affiliations: University of Geneva; Universite Catholique Louvain
Abstract: The severity of multivariate extreme events is driven by the dependence between the largest marginal observations. The Hüsler-Reiss distribution is a versatile model for this extremal dependence, and it is usually parameterized by a variogram matrix. In order to represent conditional independence relations and obtain sparse parameterizations, we introduce the novel Hüsler-Reiss precision matrix. Similarly to the Gaussian case, this matrix appears naturally in density representati...
-
Authors: Han, Yang; Wu, Weichi; Zhang, Wenyang
Affiliations: University of Manchester; Tsinghua University
Abstract: In panel data analysis, individual attributes are of importance in many real applications. With the advancement of data collection, it is often possible to acquire enough information for individual attributes in a collected panel dataset, and data from other individuals may contain the information for the attributes of the individual under concern. Homogeneity pursuit is an important topic in panel data analysis when individual attributes are of interest. Existing approaches are mainly based o...
-
Authors: Shen, Shuting; Lu, Junwei; Lin, Xihong
Affiliations: National University of Singapore; Harvard University; Harvard T.H. Chan School of Public Health; Harvard University
Abstract: In light of the rapidly growing large-scale data in federated ecosystems, the traditional principal component analysis (PCA) is often not applicable due to privacy protection considerations and large computational burden. Algorithms were proposed to lower the computational cost, but few can handle both high dimensionality and massive sample size under distributed settings. In this article, we propose the FAst DIstributed (FADI) PCA method for federated data when both the dimension d and the sa...
-
Authors: Jin, Jiashun; Ke, Zheng Tracy; Tang, Jiajun; Wang, Jingming
Affiliations: Carnegie Mellon University; Harvard University; University of Virginia
Abstract: The block-model family has four popular network models (SBM, DCBM, MMSBM, and DCMM). A fundamental problem is, how well each of these models fits with real networks. We propose GoF-MSCORE as a new Goodness-of-Fit (GoF) metric for DCMM (the broadest one among the four), with two main ideas. The first is to use cycle count statistics as a general recipe for GoF. The second is a novel network fitting scheme. GoF-MSCORE is a flexible GoF approach, and we further extend it to SBM, DCBM, and MMSBM. ...
-
Authors: Parikh, Harsh; Ross, Rachael K.; Stuart, Elizabeth; Rudolph, Kara E.
Affiliations: Johns Hopkins University; Columbia University
Abstract: Randomized controlled trials (RCTs) serve as the cornerstone for understanding causal effects, yet extending inferences to target populations presents challenges due to effect heterogeneity and underrepresentation. Our article addresses the critical issue of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, Rashomon Set of Optimal Trees (ROOT), ...
-
Authors: Liu, Weidong; Mao, Xiaojun; Tu, Jiyuan
Affiliations: Shanghai Jiao Tong University; Shanghai Jiao Tong University; Shanghai University of Finance & Economics
Abstract: This article introduces two highly efficient distributed non-convex sparse learning algorithms. Our approach accommodates non-convexity in both the loss function and penalty, acknowledging the potential non-uniqueness of local minimizers due to the inherent non-convexity. The development of an algorithm that ensures convergence to a locally minimal solution with desired statistical properties becomes imperative in this context. To overcome this challenge, we propose a strategy involving the re...
-
Authors: Hamura, Yasuyuki; Irie, Kaoru; Sugasawa, Shonosuke
Affiliations: Kyoto University; University of Tokyo; Keio University
Abstract: Count data with zero inflation and large outliers are ubiquitous in many scientific applications. However, posterior analysis under a standard statistical model, such as Poisson or negative binomial distribution, is sensitive to such contamination. This study introduces a novel framework for Bayesian modeling of counts that is robust to both zero inflation and large outliers. In doing so, we introduce rescaled beta distribution and adopt it to absorb undesirable effects from zero and outlying ...
-
Authors: Peng, Jingfu; Li, Yang; Yang, Yuhong
Affiliations: Tsinghua University; Renmin University of China; Renmin University of China
Abstract: In the past decades, model averaging (MA) has attracted much attention as it has emerged as an alternative tool to the model selection (MS) statistical approach. Hansen introduced a Mallows model averaging (MMA) method with model weights selected by minimizing a Mallows' Cp criterion. The main theoretical justification for MMA is an asymptotic optimality (AOP), which states that the risk/loss of the resulting MA estimator is asymptotically equivalent to that of the best but infeasible averaged...
-
Authors: Zhan, Wentao; Datta, Abhirup
Affiliations: Johns Hopkins University
Abstract: Analysis of geospatial data has traditionally been model-based, with a mean model, customarily specified as a linear regression on the covariates, and a Gaussian process covariance model, encoding the spatial dependence. While nonlinear machine learning algorithms like neural networks are increasingly being used for spatial analysis, current approaches depart from the model-based setup and cannot explicitly incorporate spatial covariance. We propose NN-GLS, embedding neural networks directly w...
-
Authors: Spector, Asher; Janson, Lucas
Affiliations: Stanford University; Harvard University
Abstract: Scientists often must simultaneously localize and discover signals. For instance, in genetic fine-mapping, high correlations between nearby genetic variants make it hard to identify the exact locations of causal variants. So the statistical task is to output as many disjoint regions containing a signal as possible, each as small as possible, while controlling false positives. Similar problems arise, for example, when locating stars in astronomical surveys and in changepoint detection. Common B...