Open Access Article

Clustering as a Tool for Categorization of Unstructured Data

Ngor Gogo, E. O. Bennett

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 8, Page no. 116-121, Aug 2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i8.116121

Online published on Aug 31, 2019

Copyright © Ngor Gogo, E. O. Bennett. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Ngor Gogo, E. O. Bennett, “Clustering as a Tool for Categorization of Unstructured Data,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.8, pp.116-121, 2019.

MLA Style Citation: Gogo, Ngor, and E. O. Bennett. "Clustering as a Tool for Categorization of Unstructured Data." International Journal of Computer Sciences and Engineering 7.8 (2019): 116-121.

APA Style Citation: Gogo, N., & Bennett, E. O. (2019). Clustering as a Tool for Categorization of Unstructured Data. International Journal of Computer Sciences and Engineering, 7(8), 116-121.

BibTex Style Citation:
@article{Gogo_2019,
author = {Ngor Gogo and E. O. Bennett},
title = {Clustering as a Tool for Categorization of Unstructured Data},
journal = {International Journal of Computer Sciences and Engineering},
volume = {7},
number = {8},
month = {8},
year = {2019},
issn = {2347-2693},
pages = {116--121},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4798},
doi = {10.26438/ijcse/v7i8.116121},
publisher = {IJCSE, Indore, INDIA}
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i8.116121
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4798
TI  - Clustering as a Tool for Categorization of Unstructured Data
T2  - International Journal of Computer Sciences and Engineering
AU  - Gogo, Ngor
AU  - Bennett, E. O.
PY  - 2019
DA  - 2019/08/31
PB  - IJCSE, Indore, INDIA
SP  - 116
EP  - 121
IS  - 8
VL  - 7
SN  - 2347-2693
ER  - 

Abstract

A large volume of untapped information is locked up in text documents (unstructured data). This information could help the economy, government, individuals, and corporate organisations improve quality of life and develop better working systems, so there is a clear need to extract it and give it a structure that facilitates proper storage and access when required. This research explores clustering as a tool for categorising unstructured data (text documents). The K-Prototype algorithm was applied to cluster the unstructured data and give it structure. The approach has two major phases: a pre-processing phase (tokenization, stemming, and stop-word removal) followed by a clustering phase. The results indicate that the system built performs well and can be used to categorise text documents for proper and easy storage and access.
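
The pre-processing phase named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the suffix list, and the function names are hypothetical, and the naive suffix stripper is only a stand-in for a real stemmer such as Porter's. The sketch also assumes that stop words are removed before stemming, so that stemming cannot alter them; the abstract does not state the order used. The clustering phase is not shown.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "the", "to"}

# Hypothetical suffix list for a naive stemmer (a stand-in, not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem (assumed order)."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("Clustering the documents"))  # ['cluster', 'document']
```

The resulting term lists would then be turned into feature vectors and passed to the clustering phase.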

Keywords / Index Terms

Unstructured data, Clustering, Categorisation, K-Prototype Algorithm, pre-processing
