Analysis and comparison of very large metagenomes with fast clustering and functional annotation

BMC Bioinformatics
Weizhong Li


The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "...
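To make the idea of comparing metagenomes by cluster membership concrete, here is a minimal sketch. It assumes each metagenome has already been reduced to a table of read counts per cluster, and it uses a standard pooled two-proportion z-score as a stand-in statistic. This is an illustrative assumption, not the comparison method implemented in RAMMCAP, which the abstract does not specify. The function names `two_proportion_z` and `compare_metagenomes` and the cluster labels are hypothetical.

```python
import math


def two_proportion_z(count_a, total_a, count_b, total_b):
    """Pooled two-proportion z-score for one cluster.

    Positive values mean the cluster holds a larger share of reads in
    sample A than in sample B. This is a textbook statistic used here
    for illustration only; it is not taken from the RAMMCAP paper.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each sample needs at least one read")
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Cluster is empty in both samples or contains every read in both.
        return 0.0
    return (p_a - p_b) / se


def compare_metagenomes(counts_a, counts_b):
    """Return {cluster_id: z-score} over the union of clusters.

    counts_a and counts_b map a cluster id to the number of reads from
    that sample assigned to the cluster (hypothetical input layout).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    scores = {}
    for cluster in sorted(set(counts_a) | set(counts_b)):
        scores[cluster] = two_proportion_z(
            counts_a.get(cluster, 0), total_a,
            counts_b.get(cluster, 0), total_b,
        )
    return scores


if __name__ == "__main__":
    # Toy counts, invented for the example.
    sample_a = {"cluster_1": 90, "cluster_2": 10}
    sample_b = {"cluster_1": 10, "cluster_2": 90}
    for cluster, z in compare_metagenomes(sample_a, sample_b).items():
        print(cluster, round(z, 2))
```

With real data, the same tables could be built from functional annotations instead of clusters, and the per-cluster scores would need a multiple-testing correction before any cluster is called over- or under-represented.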



