Using WordNet-based Semantic Relatedness Measure for Reducing Redundancy and Improving Multi-document Text Summarization

Santanu Dam, Kamal Sarkar

Section: Research Paper, Product Type: Journal Paper
Volume-07, Issue-01, Page no. 268-273, Jan-2019

Published online on Jan 20, 2019

Copyright © Santanu Dam, Kamal Sarkar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: S. Dam and K. Sarkar, “Using WordNet-based Semantic Relatedness Measure for Reducing Redundancy and Improving Multi-document Text Summarization,” International Journal of Computer Sciences and Engineering, Vol. 07, Issue 01, pp. 268-273, 2019.

MLA Style Citation: Dam, Santanu, and Kamal Sarkar. "Using WordNet-based Semantic Relatedness Measure for Reducing Redundancy and Improving Multi-document Text Summarization." International Journal of Computer Sciences and Engineering 07.01 (2019): 268-273.

APA Style Citation: Dam, S., & Sarkar, K. (2019). Using WordNet-based Semantic Relatedness Measure for Reducing Redundancy and Improving Multi-document Text Summarization. International Journal of Computer Sciences and Engineering, 07(01), 268-273.

BibTeX Style Citation:
@article{Dam_2019,
author = {Santanu Dam and Kamal Sarkar},
title = {Using WordNet-based Semantic Relatedness Measure for Reducing Redundancy and Improving Multi-document Text Summarization},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {1 2019},
volume = {07},
number = {01},
month = {1},
year = {2019},
issn = {2347-2693},
pages = {268-273},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=630},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=630
TI  - Using WordNet-based Semantic Relatedness Measure for Reducing Redundancy and Improving Multi-document Text Summarization
T2  - International Journal of Computer Sciences and Engineering
AU  - Dam, Santanu
AU  - Sarkar, Kamal
PY  - 2019
DA  - 2019/01/20
PB  - IJCSE, Indore, INDIA
SP  - 268
EP  - 273
IS  - 01
VL  - 07
SN  - 2347-2693
ER  -

Abstract

Multi-document text summarization (MDS) is the task of generating a single summary from a set of articles related to the same topic or event. Because all input articles cover the same topic or event, the generated summary tends to contain redundant sentences, that is, sentences that convey almost the same information. This paper presents a sentence similarity measure for reducing redundancy in multi-document summaries. The proposed measure combines a WordNet-based semantic sentence similarity measure with the traditional cosine similarity measure. We conducted experiments on the DUC 2004 benchmark multi-document summarization dataset to judge whether the proposed measure is useful for removing redundancy and improving multi-document summarization performance. The experiments show that the proposed similarity measure is effective for both purposes.
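
To make the idea in the abstract concrete, the following is a minimal sketch of a hybrid sentence similarity used for greedy redundancy removal. It is not the authors' implementation: the abstract does not state how the two scores are combined, so the linear weight `alpha`, the `threshold`, the best-match averaging in `semantic_similarity`, and the small `TOY_RELATEDNESS` table (a hypothetical stand-in for a real WordNet relatedness lookup) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a WordNet relatedness score in [0, 1].
# A real system would query WordNet here; these values are invented.
TOY_RELATEDNESS = {
    frozenset(("car", "automobile")): 1.0,
    frozenset(("crash", "collision")): 0.9,
    frozenset(("killed", "died")): 0.8,
}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between raw term-frequency vectors."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_relatedness(w1, w2):
    """Identical words score 1.0; other pairs use the toy table, else 0.0."""
    if w1 == w2:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset((w1, w2)), 0.0)


def semantic_similarity(tokens_a, tokens_b):
    """Average, in both directions, of each word's best match on the other side."""
    if not tokens_a or not tokens_b:
        return 0.0

    def directed(src, dst):
        return sum(max(word_relatedness(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a))


def hybrid_similarity(sent_a, sent_b, alpha=0.5):
    """Assumed combination: alpha * cosine + (1 - alpha) * semantic."""
    a, b = tokenize(sent_a), tokenize(sent_b)
    return alpha * cosine_similarity(a, b) + (1 - alpha) * semantic_similarity(a, b)


def remove_redundant(ranked_sentences, threshold=0.6, alpha=0.5):
    """Walk sentences in rank order; keep one only if it is not too similar
    to any sentence already selected for the summary."""
    selected = []
    for sentence in ranked_sentences:
        if all(hybrid_similarity(sentence, kept, alpha) < threshold for kept in selected):
            selected.append(sentence)
    return selected
```

With these toy values, "The car crash killed two people" and "Two people died in the automobile collision" share only three surface words, so cosine similarity alone stays below the 0.6 threshold, while the hybrid score rises above it and the second sentence is dropped as redundant. No stop-word removal or stemming is applied in this sketch, which a practical system would likely add.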

Keywords / Index Terms

Text Summarization; WordNet; Semantic Relatedness Measure; Hybrid Sentence Similarity Measure; Redundancy Removal
