-
Authors: Aleshin-Guendel, Serge; Sadinle, Mauricio
Affiliations: University of Washington; University of Washington Seattle
Abstract: Merging datafiles containing information on overlapping sets of entities is a challenging task in the absence of unique identifiers, and is further complicated when some entities are duplicated in the datafiles. Most approaches to this problem have focused on linking two files assumed to be free of duplicates, or on detecting which records in a single file are duplicates. However, it is common in practice to encounter scenarios that fit somewhere in between or beyond these two settings. We pro...
-
Authors: Yuan, Yubai; Qu, Annie
Affiliations: University of California System; University of California Irvine
Abstract: Link prediction infers potential links from observed networks, and is one of the essential problems in network analyses. In contrast to traditional graph representation modeling, which only predicts two-way pairwise relations, we propose a novel tensor-based joint network embedding approach that simultaneously encodes pairwise links and hyperlinks onto a latent space, which captures the dependency between pairwise and multi-way links in inferring potential unobserved hyperlinks. The major advant...
-
Authors: Laga, Ian; Bao, Le; Niu, Xiaoyue
Affiliations: Montana State University System; Montana State University Bozeman; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
Abstract: Aggregated Relational Data (ARD), formed from "How many X's do you know?" questions, is a powerful tool for learning important network characteristics with incomplete network data. Compared to traditional survey methods, ARD is attractive as it does not require a sample from the target population and does not ask respondents to self-reveal their own status. This is helpful for studying hard-to-reach populations like female sex workers who may be hesitant to reveal their status. From December 20...
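To make the "How many X's do you know?" idea concrete, here is a minimal sketch of the classic network scale-up estimator that ARD is traditionally fed into. It is not the authors' model, which this truncated abstract does not describe; the function name `scale_up_estimate` and its arguments are hypothetical. Each respondent's network size is first estimated from groups of known size, and the hidden group's size is then estimated as a ratio.

```python
import numpy as np

def scale_up_estimate(ard, known_sizes, total_pop, hidden_col):
    """Classic network scale-up estimate of a hidden population's size (sketch).

    ard:         (n_respondents, n_groups) answers to "How many X's do you know?"
    known_sizes: sizes of the known groups, in column order, excluding hidden_col
    total_pop:   size N of the whole population
    hidden_col:  index of the column for the hidden (unknown-size) group
    """
    ard = np.asarray(ard, dtype=float)
    known_cols = [k for k in range(ard.shape[1]) if k != hidden_col]
    # Step 1: estimate each respondent's personal network size (degree)
    degrees = ard[:, known_cols].sum(axis=1) / np.sum(known_sizes) * total_pop
    # Step 2: ratio estimator, i.e. hidden contacts per unit of network size, times N
    return ard[:, hidden_col].sum() / degrees.sum() * total_pop

# Two respondents, two known groups (sizes 100 and 50), one hidden group
print(scale_up_estimate([[2, 1, 0], [4, 3, 1]], known_sizes=[100, 50],
                        total_pop=1000, hidden_col=2))  # approximately 15
```

This simple estimator assumes respondents know about everyone's group membership, which is one of the assumptions ARD-based models typically try to relax.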
-
Authors: Qi, Zhengling; Pang, Jong-Shi; Liu, Yufeng
Affiliations: George Washington University; University of Southern California; University of North Carolina; University of North Carolina Chapel Hill
Abstract: With the emergence of precision medicine, estimating optimal individualized decision rules (IDRs) has attracted tremendous attention in many scientific areas. Most existing literature has focused on finding optimal IDRs that can maximize the expected outcome for each individual. Motivated by complex individualized decision making procedures and the popular conditional value at risk (CVaR) measure, we propose a new robust criterion to estimate optimal IDRs in order to control the average lower ...
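The CVaR measure mentioned above summarizes the lower tail of an outcome distribution rather than its mean. The sketch below computes an empirical lower-tail CVaR using the standard Rockafellar-Uryasev representation; it only illustrates the quantity, not the authors' IDR criterion or estimator, and the function name `lower_cvar` is hypothetical.

```python
import numpy as np

def lower_cvar(outcomes, alpha):
    """Empirical lower-tail CVaR at level alpha in (0, 1] (sketch).

    Uses max over t of { t - E[(t - Y)_+] / alpha }. For an empirical
    distribution this concave, piecewise-linear function of t has its kinks
    at the sample points, so checking the distinct sample values suffices.
    """
    y = np.asarray(outcomes, dtype=float)
    candidates = np.unique(y)
    values = [t - np.mean(np.maximum(t - y, 0.0)) / alpha for t in candidates]
    return max(values)

# Average of the worst 50% of outcomes under a hypothetical decision rule
print(lower_cvar([1, 2, 3, 4], alpha=0.5))  # 1.5
```

When alpha times the sample size is an integer, this equals the mean of the smallest alpha-fraction of outcomes; at alpha = 1 it reduces to the ordinary sample mean.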
-
Authors: Dai, Xiongtao; Lopez-Pintado, Sara
Affiliations: Iowa State University; Northeastern University
Abstract: We develop a novel exploratory tool for non-Euclidean object data based on data depth, extending the celebrated Tukey depth for Euclidean data. The proposed metric halfspace depth, applicable to data objects in a general metric space, assigns to data points depth values that characterize the centrality of these points with respect to the distribution and provides an interpretable center-outward ranking. Desirable theoretical properties that generalize standard depth properties postulated for Euc...
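A brute-force sketch of an empirical metric halfspace depth, written under an assumed definition since the truncated abstract does not state one: the metric halfspace for a pair of points (a, b) is taken to be H(a, b) = {z : d(z, a) <= d(z, b)}, and the depth of x is the smallest sample fraction in any such halfspace (with a, b drawn from the sample) that contains x. The function name `metric_halfspace_depth` is hypothetical; only a distance function is required, which is what makes the idea applicable to non-Euclidean objects.

```python
import itertools

def metric_halfspace_depth(x, sample, dist):
    """Empirical metric halfspace depth of x (sketch, assumed definition).

    H(a, b) = {z : dist(z, a) <= dist(z, b)} for ordered sample pairs (a, b).
    depth(x) = min over pairs with x in H(a, b) of the fraction of the
    sample lying in H(a, b). Cost is O(n^3) distance evaluations.
    """
    n = len(sample)
    depth = 1.0
    for a, b in itertools.permutations(sample, 2):
        if dist(x, a) <= dist(x, b):  # x lies in the halfspace H(a, b)
            frac = sum(dist(z, a) <= dist(z, b) for z in sample) / n
            depth = min(depth, frac)
    return depth

sample = [0.0, 1.0, 2.0, 3.0, 4.0]
d = lambda u, v: abs(u - v)
print(metric_halfspace_depth(2.0, sample, d))  # 0.6 (central point)
print(metric_halfspace_depth(0.0, sample, d))  # 0.2 (extreme point)
```

On the real line with the absolute-value distance, these halfspaces are half-lines, so the example reproduces the univariate Tukey depth, consistent with the abstract's claim that the metric version extends it.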
-
Authors: Dai, Xiaowu; Lyu, Xiang; Li, Lexin
Affiliations: University of California System; University of California Berkeley
Abstract: Thanks to its fine balance between model flexibility and interpretability, the nonparametric additive model has been widely used, and variable selection for this type of model has been frequently studied. However, none of the existing solutions can control the false discovery rate (FDR) unless the sample size tends to infinity. The knockoff framework is a recent proposal that can address this issue, but few knockoff solutions are directly applicable to nonparametric models. In this article, we...
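For orientation, the knockoff framework mentioned above ends with a data-dependent threshold applied to per-variable statistics W_j, where large positive values favor the original variable over its knockoff. The sketch below shows only this generic knockoff+ selection step (Barber and Candes); it assumes the W_j are already computed and says nothing about how the authors build knockoffs for additive models. The function name is hypothetical.

```python
import numpy as np

def knockoff_plus_select(W, q):
    """Knockoff+ selection at target FDR level q, given statistics W_j (sketch).

    Threshold T = smallest t in {|W_j| : W_j != 0} with
        (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q,
    and the selected set is {j : W_j >= T}.
    """
    W = np.asarray(W, dtype=float)
    for t in np.unique(np.abs(W[W != 0])):  # np.unique returns sorted values
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)  # no threshold achieves the target level

print(knockoff_plus_select([5, 4, 3, 2, -1, 1.5, 6, 7], q=0.2))  # [0 1 2 3 5 6 7]
```

The "+1" in the numerator is what gives finite-sample FDR control, which is the property the abstract contrasts with methods that only control FDR asymptotically.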
-
Authors: Luo, Lan; Zhou, Ling; Song, Peter X-K
Affiliations: University of Iowa; Southwestern University of Finance & Economics - China; University of Michigan System; University of Michigan
Abstract: This article develops an incremental learning algorithm based on quadratic inference function (QIF) to analyze streaming datasets with correlated outcomes such as longitudinal data and clustered data. We propose a renewable QIF (RenewQIF) method within a paradigm of renewable estimation and incremental inference, in which parameter estimates are recursively renewed with current data and summary statistics of historical data, but with no use of any historical subject-level raw data. We compare ...
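The renewal principle described above (update the estimate from the current batch plus stored summary statistics, never revisiting raw historical data) can be illustrated in its simplest setting: ordinary least squares with independent outcomes, where the renewal is exact. This is a simplified stand-in, not RenewQIF, which works with QIF estimating equations for correlated outcomes; the class name `RenewableOLS` is hypothetical.

```python
import numpy as np

class RenewableOLS:
    """Renewable least-squares sketch: stores only the current estimate and
    the aggregated information matrix X^T X, never past raw data."""

    def __init__(self, p):
        self.info = np.zeros((p, p))  # cumulative X^T X over past batches
        self.beta = np.zeros(p)       # current estimate

    def update(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # info @ beta equals the cumulative X^T y of past batches, so solving
        # (J_old + J_new) beta = J_old beta_old + X^T y reproduces full-data OLS.
        rhs = self.info @ self.beta + X.T @ y
        self.info = self.info + X.T @ X
        self.beta = np.linalg.solve(self.info, rhs)  # first batch needs full rank
        return self.beta

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(300)
model = RenewableOLS(p=3)
for start in range(0, 300, 100):  # three streaming batches of 100 rows
    model.update(X[start:start + 100], y[start:start + 100])
print(model.beta)  # matches np.linalg.lstsq on all 300 rows
```

For nonlinear estimating equations the renewal is typically only approximate, which is part of what makes the correlated-outcome case studied in the article harder.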
-
Author: Chen, Yen-Chi
Affiliations: University of Washington; University of Washington Seattle
Abstract: We study the statistical properties of an estimator derived by applying a gradient ascent method with multiple initializations to a multi-modal likelihood function. We derive the population quantity that is the target of this estimator and study the properties of confidence intervals (CIs) constructed from asymptotic normality and the bootstrap approach. In particular, we analyze the coverage deficiency due to a finite number of random initializations. We also investigate the CIs by inverting th...
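A minimal sketch of the kind of estimator studied above: plain gradient ascent run from several random initializations on a multi-modal log-likelihood, keeping the endpoint with the highest value. The one-parameter bimodal objective, the uniform initialization range, and the step size are all hypothetical choices for illustration; they are not taken from the paper.

```python
import numpy as np

def multistart_gradient_ascent(objective, grad, n_starts, low, high,
                               step=0.1, n_iter=500, seed=0):
    """Gradient ascent from n_starts uniform random initializations (sketch).

    Returns the endpoint with the largest objective value and all endpoints.
    """
    rng = np.random.default_rng(seed)
    endpoints = []
    for theta in rng.uniform(low, high, size=n_starts):
        for _ in range(n_iter):
            theta = theta + step * grad(theta)  # fixed-step ascent update
        endpoints.append(theta)
    endpoints = np.array(endpoints)
    best = endpoints[np.argmax([objective(t) for t in endpoints])]
    return best, endpoints

# Hypothetical bimodal log-likelihood: local mode near -2, global mode near +2
def loglik(t):
    return np.log(0.3 * np.exp(-(t + 2) ** 2 / 2) + 0.7 * np.exp(-(t - 2) ** 2 / 2))

def loglik_grad(t):
    a = 0.3 * np.exp(-(t + 2) ** 2 / 2)
    b = 0.7 * np.exp(-(t - 2) ** 2 / 2)
    return (-(t + 2) * a - (t - 2) * b) / (a + b)

best, ends = multistart_gradient_ascent(loglik, loglik_grad, n_starts=20, low=-5, high=5)
print(best)  # close to 2
```

Each endpoint lands near one of the two modes, so with only a few starts there is a positive chance that none reaches the global mode. This finite-initialization effect is what the abstract's coverage-deficiency analysis refers to.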
-
Authors: Law, Michael; Ritov, Ya'acov
Affiliations: University of Michigan System; University of Michigan
Abstract: We consider three problems in high-dimensional linear mixed models. Without any assumptions on the design for the fixed effects, we construct asymptotic statistics for testing whether a collection of random effects is zero, derive an asymptotic confidence interval for a single random effect at the parametric rate root-n, and propose an empirical Bayes estimator for a part of the mean vector in ANOVA type models that performs asymptotically as well as the oracle Bayes estimator. We support our ...
-
Authors: Chiang, Harold D.; Kato, Kengo; Sasaki, Yuya
Affiliations: University of Wisconsin System; University of Wisconsin Madison; Cornell University; Vanderbilt University
Abstract: We consider inference for high-dimensional separately and jointly exchangeable arrays where the dimensions may be much larger than the sample sizes. For both types of exchangeable arrays, we first derive high-dimensional central limit theorems over the rectangles and subsequently develop novel multiplier bootstraps with theoretical guarantees. These theoretical results rely on new technical tools such as Hoeffding-type decomposition and maximal inequalities for the degenerate components in the Hoeffidi...
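To show what a multiplier bootstrap does in the high-dimensional setting, here is a sketch of the simpler i.i.d. analogue: a Gaussian multiplier bootstrap critical value for the max statistic sqrt(n) * max_j |mean_j| over many coordinates. It does not implement the exchangeable-array version in the abstract, which relies on a Hoeffding-type decomposition; the function name and defaults are hypothetical.

```python
import numpy as np

def multiplier_bootstrap_max_quantile(X, alpha=0.05, B=2000, seed=0):
    """Gaussian multiplier bootstrap critical value for sqrt(n) * max_j |mean_j|
    with i.i.d. rows of X (sketch; not the exchangeable-array procedure)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    e = rng.standard_normal((B, n))  # i.i.d. N(0, 1) multipliers, one row per draw
    # Each draw: max_j | n^{-1/2} * sum_i e_i (X_ij - Xbar_j) |
    stats = np.abs(e @ centered).max(axis=1) / np.sqrt(n)
    return np.quantile(stats, 1 - alpha)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))           # n = 200 rows, p = 50 coordinates
crit = multiplier_bootstrap_max_quantile(X, seed=1)
T = np.sqrt(X.shape[0]) * np.abs(X.mean(axis=0)).max()
print(T > crit)  # rejects H0: all means are zero, at level 0.05
```

The bootstrap targets the distribution of the maximum directly rather than correcting coordinate-wise tests, which is why it remains usable when the dimension exceeds the sample size.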