FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets

BMC Research Notes
Anna Shcherbina

Abstract

High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS d...
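To make the characterization step more concrete, the following is a minimal Python sketch of how a read-length distribution and a mean quality score per read position could be computed from FASTQ data. It is not FASTQSim's implementation: the function name `characterize_fastq`, the four-line record layout, and the Phred+33 quality encoding are assumptions made for illustration. Indel and point mutation rates are left out because they require aligning reads to a reference.

```python
from collections import Counter
from io import StringIO


def characterize_fastq(handle, phred_offset=33):
    """Return (read-length histogram, mean quality per read position).

    Hypothetical helper: assumes four-line FASTQ records and Phred+33
    quality encoding, neither of which is stated in the abstract.
    """
    length_counts = Counter()
    qual_sums = []    # qual_sums[i]: summed quality at 0-based position i
    qual_counts = []  # qual_counts[i]: number of reads covering position i

    while True:
        header = handle.readline()
        if header == "":          # end of input
            break
        if not header.strip():    # tolerate blank lines between records
            continue
        seq = handle.readline().strip()
        handle.readline()         # '+' separator line, content ignored
        qual = handle.readline().strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: " + header.strip())

        length_counts[len(seq)] += 1
        for i, ch in enumerate(qual):
            if i == len(qual_sums):
                qual_sums.append(0)
                qual_counts.append(0)
            qual_sums[i] += ord(ch) - phred_offset
            qual_counts[i] += 1

    mean_qual = [s / n for s, n in zip(qual_sums, qual_counts)]
    return length_counts, mean_qual


# Two toy reads: one of length 4, one of length 2.
example = "@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n!I\n"
lengths, mean_qual = characterize_fastq(StringIO(example))
print(dict(lengths))
print(mean_qual)
```

Distributions like these are the kind of summary a read simulator could sample from when assigning lengths and quality strings to generated reads, though the abstract does not describe how FASTQSim itself does so.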


