-
Authors: Park, Seyoung; Zhao, Hongyu
Affiliations: Sungkyunkwan University (SKKU); Yale University
Abstract: Principal component analysis (PCA) is a commonly used statistical method in a wide range of applications. However, it does not work well when the number of features is larger than the sample size. We consider the estimation of the sparse principal subspace in the high dimensional setting with missing data motivated by the analysis of single-cell RNA sequence data. We propose a two-step estimation procedure, and establish the rates of convergence for estimating the principal subspace. Simulated...
-
Authors: D'Angelo, Silvia; Murphy, Thomas Brendan; Alfo, Marco
Affiliations: Sapienza University Rome; University College Dublin
Abstract: The Eurovision Song Contest is a popular TV singing competition held annually among country members of the European Broadcasting Union. In this competition, each member can be both contestant and jury, as it can participate with a song and/or vote for other countries' tunes. Over the years, the voting system has repeatedly been accused of being biased by tactical voting; votes would represent strategic interests rather than actual musical preferences of the voting countries. In this work, we...
-
Authors: Relion, Jesus D. Arroyo; Kessler, Daniel; Levina, Elizaveta; Taylor, Stephan F.
Affiliations: Johns Hopkins University; University of Michigan System; University of Michigan
Abstract: While statistical analysis of a single network has received a lot of attention in recent years, with a focus on social networks, analysis of a sample of networks presents its own challenges which require a different set of analytic tools. Here we study the problem of classification of networks with labeled nodes, motivated by applications in neuroimaging. Brain networks are constructed from imaging data to represent functional connectivity between regions of the brain, and previous work has sh...
-
Authors: Bertolacci, Michael; Cripps, Edward; Rosen, Ori; Lau, John W.; Cripps, Sally
Affiliations: University of Western Australia; University of Sydney; University of Texas System; University of Texas El Paso
Abstract: Daily precipitation has an enormous impact on human activity, and the study of how it varies over time and space, and what global indicators influence it, is of paramount importance to Australian agriculture. We analyze over 294 million daily rainfall measurements since 1876, spanning 17,606 sites across continental Australia. The data are not only large but also complex, and the topic would benefit from a common and publicly available statistical framework. We propose a Bayesian hierarchical ...
-
Authors: Marino, Maria Francesca; Ranalli, Maria Giovanna; Salvati, Nicola; Alfo, Marco
Affiliations: University of Florence; University of Perugia; University of Pisa
Abstract: The Italian National Institute for Statistics regularly provides estimates of unemployment indicators using data from the labor force survey. However, direct estimates of unemployment incidence cannot be released for local labor market areas. These are unplanned domains defined as clusters of municipalities; many are out-of-sample areas, and the majority is characterized by a small sample size which renders direct estimates inadequate. The empirical best predictor represents an appropriate, mo...
-
Authors: Dinsdale, Daniel; Salibian-Barrera, Matias
Affiliations: University of British Columbia
Abstract: In the last 25 years there has been an important increase in the amount of data collected from animal-mounted sensors (bio-probes) which are often used to study the animals' behaviour or environment. We focus here on an example of the latter, where the interest is in sea surface temperature (SST), and measurements are taken from sensors mounted on elephant seals in the southern Indian Ocean. We show that standard geostatistical models may not be reliable for this type of data, due to the possi...
-
Authors: Fukuyama, Julia
Affiliations: Indiana University System; Indiana University Bloomington
Abstract: Exploratory analysis is an important first step for discovering latent structure and generating hypotheses in large biological data sets. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results that are unstable and difficult to interpret. Here, we present adaptive generalized principal components analysis (adaptive gPCA), a new method that solves these problems by incorporating information about the ...
-
Authors: Sohn, Michael B.; Li, Hongzhe
Affiliations: University of Pennsylvania
Abstract: Motivated by recent advances in causal mediation analysis and problems in the analysis of microbiome data, we consider the setting where the effect of a treatment on an outcome is transmitted through perturbing the microbial communities or compositional mediators. The compositional and high-dimensional nature of such mediators makes the standard mediation analysis not directly applicable to our setting. We propose a sparse compositional mediation model that can be used to estimate the causal d...
-
Authors: Liebl, Dominik
Affiliations: University of Bonn
Abstract: This work is motivated by the problem of testing for differences in the mean electricity prices before and after Germany's abrupt nuclear phaseout after the nuclear disaster in Fukushima Daiichi, Japan, in mid-March 2011. Taking into account the nature of the data and the auction design of the electricity market, we approach this problem using a Local Linear Kernel (LLK) estimator for the nonparametric mean function of sparse covariate-adjusted functional data. We build upon recent theoretical...
-
Authors: Wang, Miaoyan; Fischer, Jonathan; Song, Yun S.
Affiliations: University of Wisconsin System; University of Wisconsin Madison; University of California System; University of California Berkeley; Chan Zuckerberg Initiative (CZI)
Abstract: The advent of high-throughput sequencing technologies has led to an increasing availability of large multi-tissue data sets which contain gene expression measurements across different tissues and individuals. In this setting, variation in expression levels arises due to contributions specific to genes, tissues, individuals, and interactions thereof. Classical clustering methods are ill-suited to explore these three-way interactions and struggle to fully extract the insights into transcriptome ...