Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values

Computers in Biology and Medicine
Pedro J García-Laencina, …, Noémia Afonoso

Abstract

Breast cancer is the most frequently diagnosed cancer in women. Using historical patient information stored in clinical datasets, data mining and machine learning approaches can be applied to predict the survival of breast cancer patients. A common drawback is the absence of information, i.e., missing data, in certain clinical trials. Most standard prediction methods cannot handle incomplete samples, so missing data imputation is widely applied to overcome this limitation. Therefore, taking into account the characteristics of each breast cancer dataset, a detailed analysis is required to determine the most appropriate imputation and prediction methods for each clinical environment. This research work analyzes a real breast cancer dataset from the Portuguese Institute of Oncology of Porto with a high percentage of unknown categorical information (most of the patients' clinical data are incomplete), which is a challenge in terms of complexity. Four scenarios are evaluated: (I) 5-year survival prediction without imputation, and 5-year survival prediction from the cleaned dataset with (II) Mode imputation, (III) Expectation-Maximization imputation and (IV) K-Nearest Neighbors imputation…
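The abstract does not describe how the imputation methods were implemented. The sketch below illustrates, under our own assumptions, how two of the evaluated approaches, Mode imputation and K-Nearest Neighbors imputation, can fill unknown categorical values. The toy records, their column meanings (grade, nodal status, receptor status), the overlap distance and k = 3 are all hypothetical choices, not the study's data or settings; Expectation-Maximization imputation is not shown.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical representation: each record is a list of categorical values,
# and None marks an unknown (missing) value.
Row = List[Optional[str]]


def mode_impute(data: List[Row]) -> List[Row]:
    """Replace each missing entry with the most frequent observed value of its column."""
    modes = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        # most_common keeps first-seen order on ties; None if the column is fully missing
        modes.append(Counter(observed).most_common(1)[0][0] if observed else None)
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in data]


def overlap_distance(a: Row, b: Row) -> float:
    """Fraction of mismatching values over attributes observed in both rows (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Fill each missing entry by majority vote among the k nearest rows observing that attribute."""
    result = [list(row) for row in data]  # copy so the input is not mutated
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where attribute j is known
            donors = [
                (overlap_distance(row, other), other[j])
                for t, other in enumerate(data)
                if t != i and other[j] is not None
            ]
            if not donors:
                continue  # no donor available; leave the value missing
            donors.sort(key=lambda d: d[0])  # stable sort: ties keep row order
            votes = Counter(v for _, v in donors[:k])
            result[i][j] = votes.most_common(1)[0][0]
    return result


# Hypothetical toy records: [tumor grade, nodal status, receptor status]
records: List[Row] = [
    ["G2", "N0", "ER+"],
    ["G2", "N0", None],
    ["G3", "N1", "ER-"],
    ["G3", None, "ER-"],
    ["G2", "N0", "ER+"],
    ["G3", "N1", "ER-"],
]

mode_filled = mode_impute(records)
knn_filled = knn_impute(records, k=3)

for original, by_mode, by_knn in zip(records, mode_filled, knn_filled):
    print(original, "-> mode:", by_mode, "| knn:", by_knn)
```

On these toy records, Mode imputation fills record 2 with "ER-" and record 4 with "N0" (the global column modes), whereas KNN imputation uses the most similar records and fills them with "ER+" and "N1". This contrast is one reason why the choice of imputation method can affect downstream survival prediction.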
