
Information Extraction from Unstructured Documents

R.Jayanthi, D.Nirmala

Section: Research Paper, Product Type: Journal Paper
Volume-07, Issue-05, Page no. 146-151, Mar-2019

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v7si5.146151

Online published on Mar 10, 2019

Copyright © R.Jayanthi, D.Nirmala. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: R.Jayanthi, D.Nirmala, “Information Extraction from Unstructured Documents,” International Journal of Computer Sciences and Engineering, Vol.07, Issue.05, pp.146-151, 2019.

MLA Style Citation: R.Jayanthi, D.Nirmala "Information Extraction from Unstructured Documents." International Journal of Computer Sciences and Engineering 07.05 (2019): 146-151.

APA Style Citation: R.Jayanthi, D.Nirmala (2019). Information Extraction from Unstructured Documents. International Journal of Computer Sciences and Engineering, 07(05), 146-151.

BibTex Style Citation:
@article{_2019,
author = {R.Jayanthi and D.Nirmala},
title = {Information Extraction from Unstructured Documents},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {3 2019},
volume = {07},
Issue = {05},
month = {3},
year = {2019},
issn = {2347-2693},
pages = {146-151},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=822},
doi = {https://doi.org/10.26438/ijcse/v7i5.146151},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - https://doi.org/10.26438/ijcse/v7i5.146151
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=822
TI - Information Extraction from Unstructured Documents
T2 - International Journal of Computer Sciences and Engineering
AU - R.Jayanthi
AU - D.Nirmala
PY - 2019
DA - 2019/03/10
PB - IJCSE, Indore, INDIA
SP - 146
EP - 151
IS - 05
VL - 07
SN - 2347-2693
ER -


Abstract

In today's scenario, organizing textual information has become a necessity because of the wide availability of digital information. The purpose of Text Mining is to process unstructured (textual) information, extract meaningful numeric indices from the text, and thus make the information contained in the text accessible to various data mining (statistical and machine learning) algorithms. Information Extraction is the technique of automatically extracting information from unstructured and/or semi-structured machine-readable documents. An Information Extraction system targets a specific topic or domain based on the user's interest and searches for the information most relevant to that domain. Information Extraction tools make it possible to pull information from text documents, databases, websites, or multiple sources. Information Extraction depends on named entity recognition, a sub-task used to find the target information to extract. This paper presents a review of various Information Extraction techniques, namely Supervised, Unsupervised, and Semi-supervised Information Extraction, and their applications.
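
As an illustration only, the two ideas named in the abstract (turning text into numeric indices, and locating target entities to extract) can be sketched in a few lines of Python. This is not the method reviewed in the paper: the function names `term_counts` and `extract_entities`, the `PATTERNS` table, and the sample sentence are all hypothetical, and the hand-written regular expressions stand in for a trained named entity recognizer.

```python
import re
from collections import Counter

# Hypothetical hand-written patterns for two entity types.
# A real system would typically use a trained NER model instead.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def term_counts(text):
    """Turn raw text into numeric indices: lower-cased word frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def extract_entities(text):
    """Return (label, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    return [(label, value) for _, label, value in sorted(found)]


sample = ("The paper was published on 10 March 2019. "
          "Contact: editor@example.org for reprints.")
print(term_counts(sample).most_common(3))
print(extract_entities(sample))
```

The word counts are the kind of numeric representation a downstream statistical or machine learning algorithm could consume, while the pattern matcher shows the extraction step in its simplest rule-based form.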

Keywords / Index Terms

Text Mining, Information Extraction, Machine Learning, Supervised, Unsupervised, Semi-supervised
