Discriminatory Image Caption Generation Based on Recurrent Neural Networks and Ranking Objective
Geetika¹, Tulsi Jain²
- ¹ Dept. of CSE, National Institute of Technology, Kurukshetra, India.
- ² Dept. of CSE, Indian Institute of Technology, Delhi, India.
Correspondence should be addressed to: geetika.jain220694@gmail.com.
Section: Research Paper, Product Type: Journal Paper
Volume-5, Issue-10, Page no. 260-265, Oct-2017
CrossRef-DOI: https://doi.org/10.26438/ijcse/v5i10.260265
Published online: Oct 30, 2017
Copyright © Geetika, Tulsi Jain. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Citation
IEEE Style Citation: Geetika, Tulsi Jain, “Discriminatory Image Caption Generation Based on Recurrent Neural Networks and Ranking Objective,” International Journal of Computer Sciences and Engineering, Vol.5, Issue.10, pp.260-265, 2017.
MLA Citation
MLA Style Citation: Geetika, Tulsi Jain. "Discriminatory Image Caption Generation Based on Recurrent Neural Networks and Ranking Objective." International Journal of Computer Sciences and Engineering 5.10 (2017): 260-265.
APA Citation
APA Style Citation: Geetika, Tulsi Jain (2017). Discriminatory Image Caption Generation Based on Recurrent Neural Networks and Ranking Objective. International Journal of Computer Sciences and Engineering, 5(10), 260-265.
BibTex Citation
BibTex Style Citation:
@article{Jain_2017,
  author = {Geetika and Tulsi Jain},
  title = {Discriminatory Image Caption Generation Based on Recurrent Neural Networks and Ranking Objective},
  journal = {International Journal of Computer Sciences and Engineering},
  issue_date = {October 2017},
  volume = {5},
  number = {10},
  month = {oct},
  year = {2017},
  issn = {2347-2693},
  pages = {260-265},
  url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1510},
  doi = {10.26438/ijcse/v5i10.260265},
  publisher = {IJCSE, Indore, INDIA},
}
RIS Citation
RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v5i10.260265
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=1510
TI  - Discriminatory Image Caption Generation Based on Recurrent Neural Networks and Ranking Objective
T2  - International Journal of Computer Sciences and Engineering
AU  - Geetika
AU  - Jain, Tulsi
PY  - 2017
DA  - 2017/10/30
PB  - IJCSE, Indore, INDIA
SP  - 260
EP  - 265
IS  - 10
VL  - 5
SN  - 2347-2693
ER  -
Abstract
This paper proposes a novel approach to image caption generation. Describing the content of an image in natural-language sentences is a challenging task, yet one with broad impact: the vast and growing volume of image data far outstrips the resources available for manual annotation. The growing importance of image captioning reflects needs such as image-based search and image understanding for visually impaired persons. We develop a model based on a deep recurrent neural network that generates a brief statement describing a given image. Our model uses a convolutional neural network (CNN) to extract features from the image, and a ranking objective to attend to the subtle differences between similar images so that discriminative captions are generated. We use the MS COCO dataset, with roughly half of the images for training and one quarter each for validation and testing; each image is paired with five reference captions for training. Our model with the ranking objective consistently outperforms the baseline models. We evaluate our model using BLEU, METEOR and CIDEr scores.
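The page does not reproduce the paper's implementation beyond what the abstract and keywords indicate (VGG, LSTM, ranking objective), so the following is only a minimal PyTorch sketch of the kind of architecture and loss the abstract describes: a VGG-16 encoder, an LSTM decoder trained with teacher forcing, and a margin-based ranking term that pushes an image's own caption to score higher than a caption drawn from a similar image. All class names, hyperparameters, and the exact form of the ranking term are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    """CNN encoder + LSTM decoder (illustrative sketch, not the paper's exact model)."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(vgg.features), nn.Flatten())
        self.img_proj = nn.Linear(512 * 7 * 7, embed_dim)  # VGG-16 conv output for 224x224 inputs
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # The projected image feature acts as the first input token; the ground-truth
        # caption (shifted by one) provides the rest of the inputs (teacher forcing).
        feats = self.img_proj(self.encoder(images)).unsqueeze(1)  # (B, 1, E)
        words = self.embed(captions[:, :-1])                      # (B, T-1, E)
        hidden, _ = self.lstm(torch.cat([feats, words], dim=1))   # (B, T, H)
        return self.out(hidden)                                   # (B, T, V) logits

def caption_score(logits, captions):
    """Sum of per-token log-probabilities of a caption; padding masks omitted for brevity."""
    logp = logits.log_softmax(dim=-1)
    return logp.gather(2, captions.unsqueeze(2)).squeeze(2).sum(dim=1)  # (B,)

def ranking_loss(model, images, pos_caps, neg_caps, margin=1.0):
    """Hinge ranking term: an image's own caption should outscore a caption
    taken from a visually similar image by at least `margin`."""
    s_pos = caption_score(model(images, pos_caps), pos_caps)
    s_neg = caption_score(model(images, neg_caps), neg_caps)
    return torch.clamp(margin - s_pos + s_neg, min=0).mean()
```

In training, such a ranking term would typically be added to the standard cross-entropy caption loss; how the paper weights the two terms, and how it selects the similar-image negatives, is not stated on this page.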
Key-Words / Index Term
Visual Geometry Group, Long Short-Term Memory, Ranking Objective, Image Captioning
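The abstract reports BLEU, METEOR, and CIDEr scores. As a small, hedged illustration of how BLEU is commonly computed (the paper's actual evaluation pipeline is not given here; METEOR and CIDEr are usually obtained from the MS COCO caption evaluation toolkit):

```python
# Illustrative BLEU-4 computation with NLTK; the example captions are invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# MS COCO provides five reference captions per image.
references = [
    "a man riding a wave on a surfboard".split(),
    "a surfer rides a large ocean wave".split(),
    "a person on a surfboard in the water".split(),
    "a man surfing a wave in the sea".split(),
    "someone balancing on a surfboard on a wave".split(),
]
candidate = "a man riding a wave on top of a surfboard".split()

# Smoothing helps because short captions often have no 3- or 4-gram overlap.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score:.3f}")
```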