The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples
Type:
Article
Authors:
Evans, Steven N.; Matsen, Frederick A.
Affiliations:
Fred Hutchinson Cancer Center; University of California System; University of California Berkeley
Journal:
Journal of the Royal Statistical Society Series B: Statistical Methodology
ISSN/ISBN:
1369-7412
DOI:
10.1111/j.1467-9868.2011.01018.x
Publication date:
2012
Pages:
569-592
Keywords:
microbial communities
quadratic-forms
maximum-likelihood
diversity
distributions
placement
alignment
definite
Abstract:
It is now common to survey microbial communities by sequencing nucleic acid material extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, which gives a somewhat ad hoc phylogenetics-based distance between two communities, is one of the most commonly used tools for these analyses. We provide a foundation for such methods by establishing that, if we equate a metagenomic sample with its empirical distribution on a reference phylogenetic tree, then the weighted UniFrac distance between two samples is just the classical Kantorovich-Rubinstein, or earth mover's, distance between the corresponding empirical distributions. We demonstrate that this Kantorovich-Rubinstein distance and extensions incorporating uncertainty in the sample locations can be written as a readily computable integral over the tree, we develop L^p Zolotarev-type generalizations of the metric, and we show how the p-value of the resulting natural permutation test of the null hypothesis 'no difference between two communities' can be approximated by using a Gaussian process functional. We relate the L^2 case to an analysis-of-variance type of decomposition, finding that the distribution of its associated Gaussian functional is that of a computable linear combination of independent random variables.
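The "readily computable integral over the tree" mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simplified setting that the record itself does not spell out: each sample is a probability distribution placed on the nodes of a rooted tree, so the mass difference below any point is constant along each edge and the integral reduces to a sum over edges of branch length times |P(subtree) - Q(subtree)|^p. The function name `kr_distance` and the dictionary-based tree encoding (`parent`, `length`) are hypothetical choices for this illustration, not the authors' software or notation.

```python
def kr_distance(parent, length, p_mass, q_mass, p=1.0):
    """Sketch of an L^p Kantorovich-Rubinstein-type distance on a rooted tree.

    parent : dict mapping each non-root node to its parent node
    length : dict mapping each non-root node to the length of the edge above it
    p_mass, q_mass : dicts mapping nodes to probability mass (each sums to 1)
    p : exponent; p = 1 gives the earth mover's (weighted UniFrac-style) case
    """
    # diff[node] accumulates P(subtree below edge) - Q(subtree below edge),
    # where each edge is identified by the node at its lower end.
    diff = {node: 0.0 for node in length}
    for masses, sign in ((p_mass, 1.0), (q_mass, -1.0)):
        for node, m in masses.items():
            # Walk from the node up to the root; the root has no parent entry.
            while node in parent:
                diff[node] += sign * m
                node = parent[node]
    # The integrand is constant on each edge, so the integral is a finite sum.
    total = sum(length[e] * abs(d) ** p for e, d in diff.items())
    return total ** (1.0 / p)


# Toy tree (assumed for illustration): root R with an internal node I and leaf C;
# I has leaves A and B.
parent = {"I": "R", "A": "I", "B": "I", "C": "R"}
length = {"I": 1.0, "A": 1.0, "B": 1.0, "C": 2.0}

sample_1 = {"A": 0.5, "B": 0.5}
sample_2 = {"A": 0.5, "C": 0.5}

print(kr_distance(parent, length, sample_1, sample_2))         # p = 1
print(kr_distance(parent, length, sample_1, sample_2, p=2.0))  # p = 2
```

For p = 1 this toy example moves half of the mass from B to C along a path of length 4, which matches the earth mover's interpretation. Handling mass placed in the interior of edges, or uncertain sample locations, would require integrating along each edge rather than summing, and is not covered by this sketch.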
Source URL: