-
Authors: Aleshin-Guendel, Serge; Sadinle, Mauricio
Affiliations: University of Washington; University of Washington Seattle
Abstract: Merging datafiles containing information on overlapping sets of entities is a challenging task in the absence of unique identifiers, and is further complicated when some entities are duplicated in the datafiles. Most approaches to this problem have focused on linking two files assumed to be free of duplicates, or on detecting which records in a single file are duplicates. However, it is common in practice to encounter scenarios that fit somewhere in between or beyond these two settings. We pro...
-
Authors: Zhang, B.; Small, D. S.; Lasater, K. B.; McHugh, M.; Silber, J. H.; Rosenbaum, P. R.
Affiliations: University of Pennsylvania; University of Pennsylvania; University of Pennsylvania
Abstract: Multivariate matching has two goals: (i) to construct treated and control groups that have similar distributions of observed covariates, and (ii) to produce matched pairs or sets that are homogeneous in a few key covariates. When there are only a few binary covariates, both goals may be achieved by matching exactly for these few covariates. Commonly, however, there are many covariates, so goals (i) and (ii) come apart, and must be achieved by different means. As is also true in a randomized exp...
-
Authors: Liu, Hua; You, Jinhong; Cao, Jiguo
Affiliations: Xi'an Jiaotong University; Shanghai Lixin University of Accounting & Finance; Shanghai University of Finance & Economics; Simon Fraser University
Abstract: Motivated by recent work studying massive functional data, such as the COVID-19 data, we propose a new dynamic interaction semiparametric function-on-scalar (DISeF) model. The proposed model is useful to explore the dynamic interaction among a set of covariates and their effects on the functional response. The proposed model includes many important models investigated recently as special cases. By tensor product B-spline approximating the unknown bivariate coefficient functions, a three-step e...
-
Authors: Zhang, Qingzhao; Ma, Shuangge
Affiliations: Xiamen University; Yale University
-
Authors: Balocchi, Cecilia; Deshpande, Sameer K.; George, Edward I.; Jensen, Shane T.
Affiliations: University of Edinburgh; University of Wisconsin System; University of Wisconsin Madison; University of Pennsylvania
Abstract: Accurate estimation of the change in crime over time is a critical first step toward better understanding of public safety in large urban environments. Bayesian hierarchical modeling is a natural way to study spatial variation in urban crime dynamics at the neighborhood level, since it facilitates principled sharing of information between spatially adjacent neighborhoods. Typically, however, cities contain many physical and social boundaries that may manifest as spatial discontinuities in cri...
-
Authors: Du, Jin-Hong; Guo, Yifeng; Wang, Xueqin
Affiliations: Carnegie Mellon University; University of Hong Kong; Chinese Academy of Sciences; University of Science & Technology of China, CAS
Abstract: The expanding number of assets offers more opportunities for investors but poses new challenges for modern portfolio management (PM). As a central plank of PM, portfolio selection by expected utility maximization (EUM) faces uncontrollable estimation and optimization errors in ultrahigh-dimensional scenarios. Past strategies for high-dimensional PM mainly concern only large-cap companies and select many stocks, making PM impractical. We propose a sample-average-approximation-based portfolio st...
-
Authors: Yuan, Yubai; Qu, Annie
Affiliations: University of California System; University of California Irvine
Abstract: Link prediction infers potential links from observed networks, and is one of the essential problems in network analyses. In contrast to traditional graph representation modeling which only predicts two-way pairwise relations, we propose a novel tensor-based joint network embedding approach on simultaneously encoding pairwise links and hyperlinks onto a latent space, which captures the dependency between pairwise and multi-way links in inferring potential unobserved hyperlinks. The major advant...
-
Authors: Zhu, Wanrong; Chen, Xi; Wu, Wei Biao
Affiliations: University of Chicago; New York University; University of Chicago
Abstract: The stochastic gradient descent (SGD) algorithm is widely used for parameter estimation, especially for huge datasets and online learning. While this recursive algorithm is popular for computation and memory efficiency, quantifying variability and randomness of the solutions has been rarely studied. This article aims at conducting statistical inference of SGD-based estimates in an online setting. In particular, we propose a fully online estimator for the covariance matrix of averaged SGD (ASGD...
-
Authors: Zhang, Yao; Zhao, Qingyuan
Affiliations: Stanford University; University of Cambridge
Abstract: The meaning of randomization tests has become obscure in statistics education and practice over the last century. This article makes a fresh attempt at rectifying this core concept of statistics. A new term, "quasi-randomization test," is introduced to define significance tests based on theoretical models and distinguish these tests from the randomization tests based on the physical act of randomization. The practical importance of this distinction is illustrated through a real stepped-wedge clust...
-
Authors: Laga, Ian; Bao, Le; Niu, Xiaoyue
Affiliations: Montana State University System; Montana State University Bozeman; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
Abstract: Aggregated Relational Data (ARD), formed from "How many X's do you know?" questions, is a powerful tool for learning important network characteristics with incomplete network data. Compared to traditional survey methods, ARD is attractive as it does not require a sample from the target population and does not ask respondents to self-reveal their own status. This is helpful for studying hard-to-reach populations like female sex workers who may be hesitant to reveal their status. From December 20...