Analyzing gene expression data in terms of gene sets: methodological issues
Abstract
Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferential methodology of gene set testing. We identify some crucial assumptions which are needed by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models are based on a crucial and unrealistic independence assumption between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene ...
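The claim that gene-sampling P-values become anti-conservative when genes are correlated can be illustrated with a small simulation. The sketch below is not the authors' simulation experiment; it is a minimal illustration under assumptions chosen here for convenience: 200 genes, a hypothetical gene set made of the first 20 genes, two groups of 10 subjects with no true differential expression, correlation inside the gene set induced by one shared per-subject factor, the absolute difference in group means as the per-gene score, and the top 50 genes called "differentially expressed". The gene-sampling test is a one-sided hypergeometric test on the overlap between those top genes and the gene set. All function and parameter names are hypothetical.

```python
import math
import random


def hypergeom_upper_tail(overlap, n_genes, set_size, n_selected):
    """P(X >= overlap) if the n_selected genes were a random draw from n_genes.

    This is the gene-sampling null model: genes are treated as independent
    sampling units, and subjects play no role.
    """
    total = math.comb(n_genes, n_selected)
    upper = min(set_size, n_selected)
    tail = sum(
        math.comb(set_size, k) * math.comb(n_genes - set_size, n_selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def simulate_null_pvalue(rng, rho, n_genes=200, set_size=20,
                         n_per_group=10, n_selected=50):
    """One null data set (no gene differs between groups); returns the P-value.

    Genes 0..set_size-1 form the gene set and have pairwise correlation rho.
    Every gene has unit variance, so only the correlation structure differs.
    """
    n_subjects = 2 * n_per_group
    # Assumption: one shared factor per subject drives the within-set correlation.
    shared = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    scores = []
    for g in range(n_genes):
        if g < set_size:
            values = [a * shared[s] + b * rng.gauss(0.0, 1.0)
                      for s in range(n_subjects)]
        else:
            values = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        # Per-gene score: absolute difference in group means.
        diff = (sum(values[:n_per_group]) - sum(values[n_per_group:])) / n_per_group
        scores.append(abs(diff))
    # Call the n_selected highest-scoring genes "differentially expressed".
    top = sorted(range(n_genes), key=scores.__getitem__, reverse=True)[:n_selected]
    overlap = sum(1 for g in top if g < set_size)
    return hypergeom_upper_tail(overlap, n_genes, set_size, n_selected)


def rejection_rate(rho, n_sim=400, alpha=0.05, seed=1):
    """Fraction of null data sets in which the gene-sampling test rejects."""
    rng = random.Random(seed)
    hits = sum(simulate_null_pvalue(rng, rho) <= alpha for _ in range(n_sim))
    return hits / n_sim
```

Comparing `rejection_rate(0.0)` with `rejection_rate(0.8)` contrasts independent genes with a correlated gene set. With independent genes the rejection rate should stay at or below the nominal 5%, while with correlated genes it should rise well above it, even though no gene is differentially expressed in either case. A subject-sampling test would instead build its null distribution by permuting the group labels of the subjects, which leaves the correlation between genes intact.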