A glossary for big data in population and public health: discussion and commentary on terminology and research methods

Journal of Epidemiology and Community Health
Daniel Fuller, Kevin Stanley


The volume and velocity of data are growing rapidly and big data analytics are being applied to these data in many fields. Population and public health researchers may be unfamiliar with the terminology and statistical methods used in big data. This creates a barrier to the application of big data analytics. The purpose of this glossary is to define terms used in big data and big data analytics and to contextualise these terms. We define the five Vs of big data and provide definitions and distinctions for data mining, machine learning and deep learning, among other terms. We provide key distinctions between big data and statistical analysis methods applied to big data. We contextualise the glossary by providing examples where big data analysis methods have been applied to population and public health research problems and provide brief guidance on how to learn big data analysis methods.




