Open Access Article

Discriminatory Image Caption Generation Based on Recurrent Neural Networks and Ranking Objective

Geetika 1, Tulsi Jain 2

  1. Dept. of CSE, National Institute of Technology, Kurukshetra, India.
  2. Dept. of CSE, Indian Institute of Technology, Delhi, India.

Correspondence should be addressed to: geetika.jain220694@gmail.com.

Section: Research Paper, Product Type: Journal Paper
Volume-5, Issue-10, Page no. 260-265, Oct-2017

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v5i10.260265

Online published on Oct 30, 2017

Copyright © Geetika, Tulsi Jain . This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: Geetika, Tulsi Jain, “Discriminatory Image Caption Generation Based on Recurrent Neural Networks and Ranking Objective,” International Journal of Computer Sciences and Engineering, Vol.5, Issue.10, pp.260-265, 2017.



Abstract

This paper proposes a novel approach to image caption generation. Describing the content of an image in natural-language sentences is a challenging task, yet one with great potential impact, since the vast and growing availability of image datasets demands substantial resources to process. The growing importance of image captioning is driven by applications such as image-based search and image understanding for visually impaired persons. In this paper, we develop a model based on a deep recurrent neural network that generates a brief statement describing a given image. Our models use a convolutional neural network (CNN) to extract features from an image. We use a ranking objective to attend to the subtle differences between similar images and thereby generate discriminatory captions. The MS COCO dataset is used: nearly half of the dataset for training and one quarter each for validation and testing. Five captions are provided per image to train the model. Our model with the ranking objective consistently outperforms the other models. We evaluate our models using BLEU, METEOR, and CIDEr scores.
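The abstract does not give the exact formulation of the ranking objective. The following is a minimal NumPy sketch of a pairwise margin ranking loss of the kind commonly used to push matching image-caption pairs above mismatched ones; the margin value, cosine similarity, and bidirectional hinge are assumptions for illustration, not the paper's published form.

```python
import numpy as np

def ranking_loss(image_vecs, caption_vecs, margin=0.2):
    """Pairwise margin ranking loss over image-caption similarity.

    Row i of each array is assumed to be a matching pair; matching
    pairs should score at least `margin` higher than any mismatch.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    cap = caption_vecs / np.linalg.norm(caption_vecs, axis=1, keepdims=True)
    S = img @ cap.T            # S[i, j] = sim(image_i, caption_j)
    pos = np.diag(S)           # scores of the matching pairs

    # Hinge on violations: a wrong caption for a given image (rows),
    # and a wrong image for a given caption (columns).
    cost_cap = np.maximum(0.0, margin + S - pos[:, None])
    cost_img = np.maximum(0.0, margin + S - pos[None, :])

    # Matching pairs on the diagonal incur no cost.
    mask = 1.0 - np.eye(S.shape[0])
    return float(((cost_cap + cost_img) * mask).sum() / S.shape[0])
```

With perfectly aligned embeddings the loss is zero; shuffling the captions relative to the images produces a positive loss, which is the signal that would drive the captioner toward discriminatory descriptions.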

Key-Words / Index Term

Visual Geometry Group, Long Short Term Memory, Ranking Objective, Image Captioning

References

[1] Andrej Karpathy and Li Fei-Fei, “Deep visual-semantic alignments for generating image descriptions”. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3128–3137, 2015.
[2] Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation”, arXiv preprint arXiv:1406.1078, 2014.
[3] Ryan Kiros, Ruslan Salakhutdinov, and Rich Zemel, “Multimodal neural language models”. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 595–603, 2014.
[4] Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille, “Deep captioning with multimodal recurrent neural networks (m-rnn)”, arXiv preprint arXiv:1412.6632, 2014.
[5] Ryan Kiros, Ruslan Salakhutdinov, and Richard S Zemel, “Unifying visual-semantic embeddings with multimodal neural language models”, arXiv preprint arXiv:1411.2539, 2014.
[6] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan, “Show and tell: A neural image caption generator”. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3156–3164, 2015.
[7] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, “Imagenet: A large-scale hierarchical image database”. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
[8] Karen Simonyan and Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition”, arXiv preprint arXiv:1409.1556, 2014.
[9] Hochreiter, Sepp, and Jrgen Schmidhuber, “Long Short-Term Memory”, Neural Computation 9.8 (1997): 1735-780. Web. 23 Apr. 2016
[10] Lin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollr, and C. Lawrence Zitnick, “Microsoft COCO: Common Objects in Context” Computer Vision ECCV 2014 Lecture Notes in Computer Science (2014): 740-55. Web. 27 May 2016
[11] Papineni, Kishore, Salim Roukos, ToddWard, Wei-Jing Zhu, Bleu: a method for automatic evaluation of machine translation” Proceedings of the 40th Annual Meeting on Association for Computation Linguistics (ACL): 311-318 (2002). Web. 24 May 2016
[12] Karpathy, Andrej, and Li Fei-Fei, “Deep Visual-semantic Alignments for Generating Image Descriptions” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015). Web. 29 May 2016
[13] Chen, Xinlei and C. Lawrence Zitnick, “Learning a Recurrent Visual Representation for Image Caption Generation”, CoRR abs/1411.5654 (2014). Web. 19 May 2016
[14] Donahue, Jeff, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell, and Kate Saenko, “Long-term Recurrent Convolutional Networks for Visual Recognition and Description”, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015). Web. 20 Apr. 2016