Open Access Article

Extracting Tasks of Text Files using Dictionary Based Approach for Classification and Indexing

Prachi Rayate, Devendra Singh Thakore

Section: Research Paper, Product Type: Journal Paper
Volume 4, Issue 7, Page no. 44-50, Jul 2016

Online published on Jul 31, 2016

Copyright © Prachi Rayate, Devendra Singh Thakore. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Prachi Rayate, Devendra Singh Thakore, “Extracting Tasks of Text Files using Dictionary Based Approach for Classification and Indexing,” International Journal of Computer Sciences and Engineering, Vol.4, Issue.7, pp.44-50, 2016.

MLA Style Citation: Prachi Rayate, Devendra Singh Thakore "Extracting Tasks of Text Files using Dictionary Based Approach for Classification and Indexing." International Journal of Computer Sciences and Engineering 4.7 (2016): 44-50.

APA Style Citation: Prachi Rayate, Devendra Singh Thakore, (2016). Extracting Tasks of Text Files using Dictionary Based Approach for Classification and Indexing. International Journal of Computer Sciences and Engineering, 4(7), 44-50.

BibTex Style Citation:
@article{Rayate_2016,
author = {Prachi Rayate and Devendra Singh Thakore},
title = {Extracting Tasks of Text Files using Dictionary Based Approach for Classification and Indexing},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {7 2016},
volume = {4},
number = {7},
month = {7},
year = {2016},
issn = {2347-2693},
pages = {44--50},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=998},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
AU  - Rayate, Prachi
AU  - Thakore, Devendra Singh
TI  - Extracting Tasks of Text Files using Dictionary Based Approach for Classification and Indexing
T2  - International Journal of Computer Sciences and Engineering
PY  - 2016
DA  - 2016/07/31
PB  - IJCSE, Indore, INDIA
VL  - 4
IS  - 7
SP  - 44
EP  - 50
SN  - 2347-2693
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=998
ER  - 

Abstract

In software development, the product knowledge and software requirements captured in documentation are important for improving product quality. During maintenance, developers cannot read the entire documentation of a large corpus; they need to retrieve the relevant documentation entities (development, design, testing, and so on) in a short period of time. Although software documentation records this important information, a gap exists between the information developers want and what the documentation readily offers. The gap becomes apparent whenever developers try to find the right information in the right form at the right time. To address this problem, we describe an approach for extracting relevant tasks from documentation under four categories of software entities (documentation, development, testing, and other). The main idea is to extract tasks from software documentation using Natural Language Processing (NLP) and present them through a customized portal, so that developers can easily obtain the required information and the category of each task can be generated from existing applications. The approach uses a supervised machine learning technique, with text files as the training dataset, based on text mining. It uses the WordNet library to identify relevant tasks, calculates the frequency of each word so that developers can discover word usage in a piece of software, and assigns a part-of-speech (POS) tag to each word. The results report how many sentences, tokens, and tasks appear in a document and indicate whether each extracted task is relevant. The approach reduces the gap between the information developers want and the software documentation, and developer feedback is used to improve the performance of the system. Results are presented through the customized portal, which helps developers get information in a short period of time.
Based on developer feedback given in the form of comments, the system is 80% precise in extracting tasks.
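
As a rough illustration of the pipeline outlined in the abstract (counting sentences and tokens, computing word frequencies, extracting tasks by dictionary lookup, and measuring precision from developer feedback), the following Python sketch may help. It is not the authors' implementation: the dictionary `TASK_DICTIONARY`, the stop-word list, the three-word object window, and the function names are all hypothetical, and a plain dictionary lookup stands in for the WordNet and POS-tagging steps the paper describes.

```python
import re
from collections import Counter

# Hypothetical dictionary mapping action verbs to task categories.
# The paper's actual dictionary and categories may differ.
TASK_DICTIONARY = {
    "install": "development",
    "configure": "development",
    "implement": "development",
    "run": "testing",
    "verify": "testing",
    "describe": "documentation",
}

# Small illustrative stop-word list (an assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "in", "on", "for", "with", "then"}

# Number of words after the verb inspected as its object (an assumption).
OBJECT_WINDOW = 3


def split_sentences(text):
    """Split text after sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return lowercase word tokens; punctuation is discarded."""
    return re.findall(r"[a-z][a-z0-9_-]*", sentence.lower())


def extract_tasks(text):
    """Count sentences and tokens, compute word frequencies, and extract tasks."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    tokens = [tok for words in tokenized for tok in words]
    frequencies = Counter(t for t in tokens if t not in STOP_WORDS)

    tasks = []
    for words in tokenized:
        i = 0
        while i < len(words):
            category = TASK_DICTIONARY.get(words[i])
            if category is None:
                i += 1
                continue
            # Treat the non-stop-words that follow the verb as its object.
            window = words[i + 1 : i + 1 + OBJECT_WINDOW]
            obj = [w for w in window if w not in STOP_WORDS]
            if obj:
                tasks.append((words[i] + " " + " ".join(obj), category))
            i += 1 + OBJECT_WINDOW  # skip the words already consumed
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "frequencies": frequencies,
        "tasks": tasks,
    }


def precision(feedback):
    """Fraction of extracted tasks that developers marked as relevant."""
    return sum(feedback) / len(feedback) if feedback else 0.0


if __name__ == "__main__":
    sample = "Install the database driver. Then run the unit tests. The weather was nice."
    result = extract_tasks(sample)
    print(result["sentences"], result["tokens"], result["tasks"])
    # Feedback is one boolean per extracted task (True = relevant).
    print(precision([True, True, True, True, False]))
```

In this sketch, a sentence with no dictionary verb yields no task, which is one simple way to separate relevant from irrelevant text; a POS tagger would let the verb and its object be identified more reliably than a fixed window.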

Keywords / Index Terms

Natural language processing, text mining, part-of-speech tagging, text files, machine learning techniques, WordNet library