Open Access Article

Efficient Clustering of Text Documents for Feature Selection on the use of side Information

Sonal S. Deshmukh, R. N. Phursule

Section: Research Paper, Product Type: Journal Paper
Volume 3, Issue 10, Pages 10-16, Oct 2015

Published online on Oct 31, 2015

Copyright © Sonal S. Deshmukh, R. N. Phursule. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: Sonal S. Deshmukh and R. N. Phursule, “Efficient Clustering of Text Documents for Feature Selection on the use of side Information,” International Journal of Computer Sciences and Engineering, vol. 3, no. 10, pp. 10-16, Oct. 2015.

MLA Style Citation: Deshmukh, Sonal S., and R. N. Phursule. "Efficient Clustering of Text Documents for Feature Selection on the use of side Information." International Journal of Computer Sciences and Engineering 3.10 (2015): 10-16.

APA Style Citation: Deshmukh, S. S., & Phursule, R. N. (2015). Efficient Clustering of Text Documents for Feature Selection on the use of side Information. International Journal of Computer Sciences and Engineering, 3(10), 10-16.

BibTex Style Citation:
@article{S.Deshmukh_2015,
author = {Sonal S. Deshmukh and R. N. Phursule},
title = {Efficient Clustering of Text Documents for Feature Selection on the use of side Information},
journal = {International Journal of Computer Sciences and Engineering},
volume = {3},
number = {10},
month = oct,
year = {2015},
issn = {2347-2693},
pages = {10--16},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=695},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=695
TI  - Efficient Clustering of Text Documents for Feature Selection on the use of side Information
T2  - International Journal of Computer Sciences and Engineering
AU  - Deshmukh, Sonal S.
AU  - Phursule, R. N.
PY  - 2015
DA  - 2015/10/31
PB  - IJCSE, Indore, INDIA
SP  - 10
EP  - 16
IS  - 10
VL  - 3
SN  - 2347-2693
ER  - 

Abstract

This paper presents an efficient approach to clustering text documents with side information using Probabilistic Latent Semantic Indexing (PLSI). Meta information of this kind (side information) is available in many text mining applications. Adding side information can be useful, but it can also be risky. The aim of this work is to address the clustering problem in data mining settings where such auxiliary information is available, in order to improve the extraction of information from text documents. The proposed approach, based on PLSI, improves efficiency by taking class labels into account and is also applicable to a large number of clusters. The goal is to use the side information available with the documents to make clustering more effective and to reduce the time required to form the clusters.
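The abstract names PLSI as the core of the approach but gives no algorithmic details. The following is a minimal, generic sketch of PLSI fitted by expectation-maximization (EM) in pure Python, not the authors' implementation. It models `P(w|d) = Σ_z P(w|z) P(z|d)` and assigns each document to the latent topic with the highest `P(z|d)`. How side information enters is an assumption made only for illustration: each document's side attributes are appended as hypothetical pseudo-terms such as `tag:sports`. The function name `plsi_cluster` and its parameters are likewise illustrative.

```python
import random
from collections import Counter


def _normalize(values):
    """Scale non-negative values so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsi_cluster(docs, n_topics, n_iter=50, seed=0):
    """Hypothetical helper: fit PLSI, P(w|d) = sum_z P(w|z) P(z|d), with EM on
    tokenized documents and assign each document to its most probable topic."""
    rng = random.Random(seed)
    counts = [Counter(doc) for doc in docs]
    vocab = sorted({w for c in counts for w in c})
    index = {w: i for i, w in enumerate(vocab)}

    # Random, strictly positive initialisation of P(z|d) and P(w|z).
    p_z_d = [_normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in docs]
    p_w_z = [_normalize([rng.random() + 0.1 for _ in vocab]) for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts n(d,w) * P(z|d,w), accumulated during the E-step.
        acc_w_z = [[0.0] * len(vocab) for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in docs]
        for d, c in enumerate(counts):
            for w, n in c.items():
                i = index[w]
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = _normalize([p_w_z[z][i] * p_z_d[d][z] for z in range(n_topics)])
                for z in range(n_topics):
                    acc_w_z[z][i] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: renormalise the expected counts into new distributions.
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]

    # Hard clustering: each document goes to its most probable topic.
    labels = [max(range(n_topics), key=lambda z: p_z_d[d][z]) for d in range(len(docs))]
    return labels, p_z_d


# Toy corpus; the "tag:..." tokens stand in for side information (an assumption).
docs = [
    ["goal", "match", "team", "tag:sports"],
    ["team", "score", "goal", "tag:sports"],
    ["stock", "market", "price", "tag:finance"],
    ["market", "shares", "price", "tag:finance"],
]
labels, p_z_d = plsi_cluster(docs, n_topics=2)
print(labels)
```

EM for PLSI only reaches a local optimum, so the resulting cluster labels can depend on `seed`; comparing fits from several seeds is a common precaution.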

Key-Words / Index Term

Text Mining, Side information, Clustering, LSI, Probabilistic Latent Semantic Indexing
