Open Access Article

Developing & Deploying Algorithms for Information Extraction using Classification Measures for Named Entity Recognition

Rehan Khan, A.J. Singh

Section: Research Paper, Product Type: Journal Paper
Volume-6, Issue-10, Page no. 235-248, Oct-2018

CrossRef-DOI: https://doi.org/10.26438/ijcse/v6i10.235248

Online published on Oct 31, 2018

Copyright © Rehan Khan, A.J. Singh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Rehan Khan, A.J. Singh, “Developing & Deploying Algorithms for Information Extraction using Classification Measures for Named Entity Recognition,” International Journal of Computer Sciences and Engineering, Vol.6, Issue.10, pp.235-248, 2018.

MLA Style Citation: Rehan Khan, A.J. Singh. "Developing & Deploying Algorithms for Information Extraction using Classification Measures for Named Entity Recognition." International Journal of Computer Sciences and Engineering 6.10 (2018): 235-248.

APA Style Citation: Rehan Khan, A.J. Singh (2018). Developing & Deploying Algorithms for Information Extraction using Classification Measures for Named Entity Recognition. International Journal of Computer Sciences and Engineering, 6(10), 235-248.

BibTex Style Citation:
@article{Khan_2018,
author = {Rehan Khan and A.J. Singh},
title = {Developing \& Deploying Algorithms for Information Extraction using Classification Measures for Named Entity Recognition},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {10 2018},
volume = {6},
Issue = {10},
month = {10},
year = {2018},
issn = {2347-2693},
pages = {235-248},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3012},
doi = {https://doi.org/10.26438/ijcse/v6i10.235248},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - https://doi.org/10.26438/ijcse/v6i10.235248
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=3012
TI - Developing & Deploying Algorithms for Information Extraction using Classification Measures for Named Entity Recognition
T2 - International Journal of Computer Sciences and Engineering
AU - Rehan Khan
AU - A.J. Singh
PY - 2018
DA - 2018/10/31
PB - IJCSE, Indore, INDIA
SP - 235
EP - 248
IS - 10
VL - 6
SN - 2347-2693
ER -

Abstract

Much of the content on the web is fully or partly unstructured, and retrieving the essential data from it is difficult, so information extraction (IE) that takes the necessary parameters into account becomes essential. This paper presents a comparative study of how the IE problem can be handled for a dataset, focusing on named entity recognition (NER), the first step towards IE. Various classifiers and techniques for NER are discussed, along with the impact of a pipeline on some of them. Based on the results, and taking response time into consideration, conditional random fields prove to be the best technique for NER, with an average recall and precision of 0.97 each, efficiently predicting whether a given word is part of a named entity. Automating the search for patients for clinical trials in clinical databases is at present the most important area of concern in medical science. This paper provides an approach for choosing a technique according to these parameters and discusses the results of the novel algorithmic approach.
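As an illustration of the two measures reported above, the following minimal sketch computes precision and recall for the binary question of whether a word is part of a named entity. It is not the authors' evaluation code: the function name `precision_recall`, the 0/1 label encoding, and the toy sentence are assumptions made only for this example, and the paper's averaged figures may be computed over several entity classes rather than one binary label.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels, where 1 = part of a named entity."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # True positives: entity words that were predicted as entity words.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    # False positives: non-entity words that were predicted as entity words.
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    # False negatives: entity words that the classifier missed.
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy sentence: "Rehan Khan visited Indore yesterday"
gold = [1, 1, 0, 1, 0]        # assumed reference labels
predicted = [1, 0, 0, 1, 1]   # assumed classifier output
precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision penalises words wrongly marked as entities, while recall penalises entity words that were missed, which is why the abstract reports both values.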

Keywords / Index Terms

Information extraction, Natural language processing, Named entity recognition, Conditional random fields
