Image Caption Generation: A Comprehensive Survey
Sailee P. Pawaskar¹, J. A. Laxminarayana²
- ¹ Computer Engineering Department, Goa College of Engineering, Goa University, Farmagudi-Ponda, Goa, India.
- ² Computer Engineering Department, Goa College of Engineering, Goa University, Farmagudi-Ponda, Goa, India.
Section: Survey Paper, Product Type: Journal Paper
Volume-6, Issue-3, Page no. 230-234, Mar-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6i3.230234
Online published on Mar 30, 2018
Copyright © Sailee P. Pawaskar, J. A. Laxminarayana. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Style Citation: Sailee P. Pawaskar, J. A. Laxminarayana, “Image Caption Generation: A Comprehensive Survey,” International Journal of Computer Sciences and Engineering, Vol.6, Issue.3, pp.230-234, 2018.
MLA Style Citation: Sailee P. Pawaskar, J. A. Laxminarayana. "Image Caption Generation: A Comprehensive Survey." International Journal of Computer Sciences and Engineering 6.3 (2018): 230-234.
APA Style Citation: Sailee P. Pawaskar, J. A. Laxminarayana (2018). Image Caption Generation: A Comprehensive Survey. International Journal of Computer Sciences and Engineering, 6(3), 230-234.
BibTex Style Citation:
@article{Pawaskar_2018,
author = {Sailee P. Pawaskar and J. A. Laxminarayana},
title = {Image Caption Generation: A Comprehensive Survey},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {3 2018},
volume = {6},
Issue = {3},
month = {3},
year = {2018},
issn = {2347-2693},
pages = {230-234},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1788},
doi = {https://doi.org/10.26438/ijcse/v6i3.230234},
publisher = {IJCSE, Indore, INDIA},
}
RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i3.230234
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=1788
TI  - Image Caption Generation: A Comprehensive Survey
T2  - International Journal of Computer Sciences and Engineering
AU  - Sailee P. Pawaskar
AU  - J. A. Laxminarayana
PY  - 2018
DA  - 2018/03/30
PB  - IJCSE, Indore, INDIA
SP  - 230
EP  - 234
IS  - 3
VL  - 6
SN  - 2347-2693
ER  -
Abstract
Humans and computers interpret images differently. To a human, an image is a description or scene of some action or environment; to a computer, it is just a combination of pixels, that is, digital numbers. Image captioning is the process of assigning descriptive data, in the form of captions or keywords, to a digital image. This paper is a comprehensive survey of methodologies for generating appropriate image captions. We compare the various approaches available for implementing image captioning and describe the evaluation metrics that such systems could use. Appropriate captions will help users search for images with long queries. Automatic image captioning could also help visually impaired people understand pictures.
Keywords / Index Terms
Automatic image captioning, Deep CNN, Hidden Markov Model, LSTM, Neural Network, RNN