An index-based algorithm for fast on-line query processing of latent semantic analysis

PloS One
Mingxi Zhang, Wei Wang

Abstract

Latent Semantic Analysis (LSA) is widely used to find documents that are semantically similar to a keyword query. Although LSA yields promising similarity results, existing LSA algorithms perform many unnecessary operations in similarity computation and candidate checking during on-line query processing. This is expensive in time and cannot respond efficiently to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, with the goal of efficiently searching for documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation and then develop an efficient algorithm for building the partial index, which skips partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA to support fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumul...
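
The indexing idea in the abstract can be illustrated with a small sketch. It is one plausible reading under stated assumptions, not the authors' ILSA implementation: the abstract does not give the similarity equation, so the sketch assumes that the partial similarity of a term and a document is the dot product of their latent vectors divided by the document vector's norm, and that a query's score for a document is the weighted sum of the partial similarities of its terms. The names `build_partial_index`, `query`, `term_vecs`, and `doc_vecs` are hypothetical, and the toy vectors are made up for illustration (in practice they would come from a truncated SVD of the term-document matrix).

```python
from collections import defaultdict
from math import sqrt


def build_partial_index(term_vecs, doc_vecs, theta):
    """Off-line step: map each term to {doc_id: partial similarity}.

    Assumed definition (not taken from the paper):
    partial(term, doc) = dot(term_vec, doc_vec) / ||doc_vec||.
    Entries lower than the threshold theta are skipped.
    """
    index = {}
    for term, tvec in term_vecs.items():
        postings = {}
        for doc_id, dvec in doc_vecs.items():
            norm = sqrt(sum(x * x for x in dvec))
            if norm == 0.0:
                continue  # a zero vector has no direction to compare against
            partial = sum(a * b for a, b in zip(tvec, dvec)) / norm
            if partial >= theta:
                postings[doc_id] = partial
        index[term] = postings
    return index


def query(index, weights, top_k=3):
    """On-line step: accumulate stored partial similarities per document.

    `weights` maps query terms to weights. The query's own norm is the same
    for every document, so it is left out; it does not affect the ranking.
    """
    scores = defaultdict(float)
    for term, weight in weights.items():
        for doc_id, partial in index.get(term, {}).items():
            scores[doc_id] += weight * partial
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Toy 2-dimensional latent vectors, invented for this example.
term_vecs = {"index": (1.0, 0.0), "query": (0.8, 0.2), "virus": (0.0, 1.0)}
doc_vecs = {"d1": (0.9, 0.1), "d2": (0.1, 0.9), "d3": (0.6, 0.6)}

partial_index = build_partial_index(term_vecs, doc_vecs, theta=0.5)
results = query(partial_index, {"index": 1.0, "query": 1.0})
print(results)
```

Under these assumptions, a document is only touched at query time if at least one query term has a stored entry for it, which is where the reduction in search space comes from. With `theta` set to a value no larger than the smallest partial similarity, the accumulated score equals the exact dot product between the pseudo document vector and the normalized document vector; raising `theta` trades accuracy for a smaller index.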

Software Mentioned

Windows
Visual
ILSA
SimRank
QUIC
Studio
LSA
Compute Unified Device Architecture (CUDA)
