-
Authors: Wang, Fan; Zhang, Wei; Yao, Fang
Affiliations: Columbia University; Peking University
Abstract: The identification of genetic signal regions in the human genome is critical for understanding the genetic architecture of complex traits and diseases. Numerous methods based on scan algorithms (i.e., QSCAN, SCANG, SCANG-STAAR) have been developed to allow dynamic window sizes in whole-genome association studies. Beyond scan algorithms, we have recently developed the binary and re-search (BiRS) algorithm, which is more computationally efficient than scan-based methods and exhibits superior stat...
-
Authors: Huang, Tao; Pei, Youquan; You, Jinhong; Zhang, Wenyang
Affiliations: Shanghai University of Finance & Economics; Shandong University; University of Macau
Abstract: The development of an appropriate statistical modelling strategy is of paramount importance for the successful analysis of data. The trade-off between flexibility and parsimony is of vital importance in statistical modelling. In the context of clustered data analysis, it is essential to account for the inherent heterogeneity between clusters while simultaneously ensuring parsimony to mitigate the potential for complexity and to preserve the homogeneity within clusters. The objective of this pa...
-
Authors: Stanley, Kyle; Lazar, Nicole; Reimherr, Mathew
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
Abstract: Many fMRI analyses examine functional connectivity, or statistical dependencies among remote brain regions. Factor analysis, which parsimoniously describes correlations between many observed variables, offers a natural framework in which to study such dependencies. However, multivariate factor models break down when applied to functional and spatiotemporal data, like fMRI. We present a factor model for discretely-observed multidimensional functional data that is well suited to the study of fun...
-
Authors: Wang, Tianying; Ionita-Laza, Iuliana; Wei, Ying
Affiliations: Colorado State University System; Colorado State University Fort Collins; Columbia University; Lund University
Abstract: Transcriptome-wide association studies (TWAS) are powerful tools for identifying gene-level associations by integrating genome-wide association studies and gene expression data. However, most TWAS methods focus on linear associations between genes and traits, ignoring the complex nonlinear relationships that may be present in biological systems. To address this limitation, we propose a novel framework, QTWAS, which integrates a quantile-based gene expression model into the TWAS model, allowing ...
-
Authors: Duan, Chenyang; Jiang, Yuan
Affiliations: AbbVie; Oregon State University
Abstract: To classify biological roles of different species in an ecological system, modern studies collect longitudinal and compositional counts of DNA sequences of taxonomically diagnostic genetic markers to measure the abundance of species over time. The major challenges of conducting this analysis are twofold: how to accommodate the complex dependence in this data type and how to model the longitudinal trajectories of the species' abundances. In this paper we propose a novel method named COMPARING t...
-
Authors: Jiang, Bei; Raftery, Adrian E.; Steele, Russell J.; Wang, Naisyin
Affiliations: University of Alberta; University of Washington; University of Washington Seattle; McGill University; University of Michigan System; University of Michigan
Abstract: Responsible data sharing anchors research reproducibility and promotes the integrity of scientific research. Motivated by Canadian Scleroderma Research Group (CSRG) patient registry data, we present a risk-based method to produce privacy-preserved and high-utility synthetic datasets, which also simultaneously imputes missing data of mixed continuous and categorical types in the original dataset. This method divides all individuals into different subgroups, based on their reidentification risks...
-
Authors: Boxer, Kate S.; Hong, Boyeong; Kontokosta, Constantine E.; Neill, Daniel B.
Affiliations: New York University
Abstract: Systems such as 311 enable residents of a community to report on their environments and to request nonemergency municipal services. While such systems provide an important link between community and government, resident-generated data suffer from reporting bias, with some subpopulations reporting at lower rates than others. Our research focuses on defining the underreporting of heating and hot water problems to New York City's 311 system and developing methods to estimate underreporting. Firs...
-
Authors: Cabello, Esteban; Morales, Domingo; Perez, Agustin
Affiliations: Universidad Miguel Hernandez de Elche
Abstract: Exposure indices measure the degree of contact between two groups and are used to quantify occupational discrepancies between genders in a set of occupational sectors. This paper presents a novel methodology for predicting area-level proportions of employed men and women across various occupation sectors, along with estimating exposure indices. The challenge arises from the compositional nature of the direct estimators of proportions, which tend to be imprecise when sample sizes are small. To ...
-
Authors: Ma, Yingying; Lan, Wei; Leng, Chenlei; Li, Ting; Wang, Hansheng
Affiliations: Beihang University; Southwestern University of Finance & Economics - China; University of Warwick; Hong Kong Polytechnic University; Peking University
Abstract: The social characteristics of players in a social network are closely associated with their network positions and relational importance. Identifying those influential players in a network is of great importance, as it helps to understand how ties are formed, how information is propagated, and, in turn, can guide the dissemination of new information. Motivated by a Sina Weibo social network analysis of the 2021 Henan Floods, where response variables for each Sina Weibo user are available, we pr...
-
Authors: Moon, Haeun; Du, Jin-Hong; Lei, Jing; Roeder, Kathryn
Affiliations: Seoul National University (SNU); Carnegie Mellon University
Abstract: Quantitative measurements produced by mass spectrometry proteomics experiments offer a direct way to explore the role of proteins in molecular mechanisms. However, analysis of such data is challenging due to the large proportion of missing values. A common strategy to address this issue is to utilize an imputed dataset, which often introduces systematic bias into downstream analyses if the imputation errors are ignored. In this paper we propose a statistical framework, inspired by doubly robus...