-
Authors: Park, Seyoung; Zhao, Hongyu
Affiliations: Sungkyunkwan University (SKKU); Yale University
Abstract: Principal component analysis (PCA) is a commonly used statistical method in a wide range of applications. However, it does not work well when the number of features is larger than the sample size. We consider the estimation of the sparse principal subspace in the high-dimensional setting with missing data, motivated by the analysis of single-cell RNA sequence data. We propose a two-step estimation procedure, and establish the rates of convergence for estimating the principal subspace. Simulated...
-
Authors: D'Angelo, Silvia; Murphy, Thomas Brendan; Alfo, Marco
Affiliations: Sapienza University Rome; University College Dublin
Abstract: The Eurovision Song Contest is a popular TV singing competition held annually among country members of the European Broadcasting Union. In this competition, each member can be both contestant and jury, as it can participate with a song and/or vote for other countries' tunes. Over the years, the voting system has repeatedly been accused of being biased by tactical voting; votes would represent strategic interests rather than actual musical preferences of the voting countries. In this work, we...
-
Authors: Bertolacci, Michael; Cripps, Edward; Rosen, Ori; Lau, John W.; Cripps, Sally
Affiliations: University of Western Australia; University of Sydney; University of Texas System; University of Texas El Paso
Abstract: Daily precipitation has an enormous impact on human activity, and the study of how it varies over time and space, and what global indicators influence it, is of paramount importance to Australian agriculture. We analyze over 294 million daily rainfall measurements since 1876, spanning 17,606 sites across continental Australia. The data are not only large but also complex, and the topic would benefit from a common and publicly available statistical framework. We propose a Bayesian hierarchical ...
-
Authors: Marino, Maria Francesca; Ranalli, Maria Giovanna; Salvati, Nicola; Alfo, Marco
Affiliations: University of Florence; University of Perugia; University of Pisa
Abstract: The Italian National Institute for Statistics regularly provides estimates of unemployment indicators using data from the labor force survey. However, direct estimates of unemployment incidence cannot be released for local labor market areas. These are unplanned domains defined as clusters of municipalities; many are out-of-sample areas, and the majority are characterized by a small sample size, which renders direct estimates inadequate. The empirical best predictor represents an appropriate, mo...
-
Authors: Dinsdale, Daniel; Salibian-Barrera, Matias
Affiliations: University of British Columbia
Abstract: In the last 25 years there has been an important increase in the amount of data collected from animal-mounted sensors (bio-probes), which are often used to study the animals' behaviour or environment. We focus here on an example of the latter, where the interest is in sea surface temperature (SST), and measurements are taken from sensors mounted on elephant seals in the southern Indian Ocean. We show that standard geostatistical models may not be reliable for this type of data, due to the possi...
-
Authors: Fukuyama, Julia
Affiliations: Indiana University System; Indiana University Bloomington
Abstract: Exploratory analysis is an important first step for discovering latent structure and generating hypotheses in large biological data sets. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results that are unstable and difficult to interpret. Here, we present adaptive generalized principal components analysis (adaptive gPCA), a new method that solves these problems by incorporating information about the ...
-
Authors: Liebl, Dominik
Affiliations: University of Bonn
Abstract: This work is motivated by the problem of testing for differences in the mean electricity prices before and after Germany's abrupt nuclear phaseout after the nuclear disaster in Fukushima Daiichi, Japan, in mid-March 2011. Taking into account the nature of the data and the auction design of the electricity market, we approach this problem using a Local Linear Kernel (LLK) estimator for the nonparametric mean function of sparse covariate-adjusted functional data. We build upon recent theoretical...
-
Authors: Wang, Miaoyan; Fischer, Jonathan; Song, Yun S.
Affiliations: University of Wisconsin System; University of Wisconsin Madison; University of California System; University of California Berkeley; Chan Zuckerberg Initiative (CZI)
Abstract: The advent of high-throughput sequencing technologies has led to an increasing availability of large multi-tissue data sets which contain gene expression measurements across different tissues and individuals. In this setting, variation in expression levels arises due to contributions specific to genes, tissues, individuals, and interactions thereof. Classical clustering methods are ill-suited to explore these three-way interactions and struggle to fully extract the insights into transcriptome ...
-
Authors: Park, Soyoung; Carriquiry, Alicia
Affiliations: Iowa State University
Abstract: Glass fragments are often compared in the course of a forensic evaluation using their chemical composition determined with technologies such as LA-ICP-MS. At present, forensic scientists advocate the use of two comparison criteria based on univariate intervals around all mean elemental concentrations for fragments originating from a known piece of broken glass. The main drawback of this approach is that it does not consider the dependencies between concentrations. Further, when the elemental co...
-
Authors: McDavid, Andrew; Gottardo, Raphael; Simon, Noah; Drton, Mathias
Affiliations: University of Rochester; University of Washington; University of Washington Seattle; Fred Hutchinson Cancer Center; University of Copenhagen
Abstract: Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene coregulatory networks from such data, we propose a multi...