Authors: Moore, Jonathan W.; Pitman, Kara J.; Whited, Diane; Marsden, Naxginkw Tara; Sexton, Erin K.; Sergeant, Christopher J.; Connor, Mark
Affiliations: Simon Fraser University; University of Montana System; University of Montana
Authors: Turner, Michael S.
Affiliations: University of Chicago; University of California System; University of California Los Angeles
Authors: Wai, Jonathan; Wai, Maya
Affiliations: University of Arkansas System; University of Arkansas Fayetteville; University of Arkansas Medical Sciences
Authors: Altae-Tran, Han; Kannan, Soumya; Suberski, Anthony J.; Mears, Kepler S.; Demircioglu, F. Esra; Moeller, Lukas; Kocalar, Selin; Oshiro, Rachel; Makarova, Kira S.; Macrae, Rhiannon K.; Koonin, Eugene V.; Zhang, Feng
Affiliations: Howard Hughes Medical Institute; Massachusetts Institute of Technology (MIT); Harvard University; Broad Institute; National Institutes of Health (NIH) - USA; NIH National Library of Medicine (NLM)
Abstract: Microbial systems underpin many biotechnologies, including CRISPR, but the exponential growth of sequence databases makes it difficult to find previously unidentified systems. In this work, we develop the fast locality-sensitive hashing-based clustering (FLSHclust) algorithm, which performs deep clustering on massive datasets in linearithmic time. We incorporated FLSHclust into a CRISPR discovery pipeline and identified 188 previously unreported CRISPR-linked gene modules, revealing many addit...