Video Captioning by Adversarial LSTM

IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society
Yang Yang, Yanli Ji

Abstract

In this paper, we propose a novel approach to video captioning based on adversarial learning and Long Short-Term Memory (LSTM). Our aim is to compensate for the deficiencies of LSTM-based video captioning methods, which generally show potential to handle the temporal nature of video data effectively when generating captions, but which also typically suffer from exponential error accumulation. Specifically, we adopt a standard Generative Adversarial Network (GAN) architecture, characterized by an interplay of two competing processes: a "generator", which generates textual sentences given the visual content of a video, and a "discriminator", which controls the accuracy of the generated sentences. The discriminator acts as an "adversary" towards the generator, and its controlling mechanism helps the generator become more accurate. For the generator module, we take an existing video captioning concept using an LSTM network. For the discriminator, we propose a novel realization specifically tuned for the video captioning problem that takes both the sentences and the video features as input. This leads to our proposed LSTM-GAN system architecture, which we show experimentally to significantly outperform the exist...
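
The generator/discriminator interplay described in the abstract can be illustrated with a toy sketch of the standard GAN losses. This is not the paper's model: the abstract does not specify the discriminator's internals, so the `discriminator` function below (a logistic score over the element-wise interaction of a video feature vector and a sentence feature vector), the feature values, and the weights are all hypothetical. The generator loss shown is the common non-saturating variant, which is also an assumption.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(video_feat, sent_feat, weights):
    """Hypothetical toy discriminator (not the paper's design).

    Takes both the video features and the sentence features as input and
    returns the estimated probability that the sentence is a real caption
    for the video.
    """
    score = sum(w * v * s for w, v, s in zip(weights, video_feat, sent_feat))
    return sigmoid(score)


def discriminator_loss(d_real: float, d_fake: float) -> float:
    # Standard GAN discriminator loss: reward high scores on real captions
    # and low scores on generated ones.
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    # Non-saturating generator loss: the generator is penalized when the
    # discriminator assigns its caption a low probability of being real.
    return -math.log(d_fake)


# Made-up feature vectors, for illustration only.
video = [0.5, -1.0, 2.0]
real_caption = [0.4, -0.9, 1.8]    # stands in for a human-written caption
fake_caption = [-0.3, 0.7, -1.2]   # stands in for a generator (LSTM) output
weights = [1.0, 1.0, 1.0]

d_real = discriminator(video, real_caption, weights)
d_fake = discriminator(video, fake_caption, weights)

print("D(video, real caption):", d_real)
print("D(video, generated caption):", d_fake)
print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake))
```

In an actual system both networks would be trained in alternation, with the generator's parameters updated to lower its loss and the discriminator's updated to lower its own; the sketch only evaluates the two losses once.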


