Spam Detection on Social Media Text

G. Jain¹, Manisha², B. Agarwal³

¹ Department of Computer Science, Banasthali University, Banasthali, India.
² Department of Computer Science, Banasthali University, Banasthali, India.
³ Department of Computer Science and Engineering, SKIT, Rajasthan University, India.

Correspondence should be addressed to: jain.gauri@gmail.com.

Section: Research Paper, Product Type: Journal Paper
Volume-5, Issue-5, Page no. 63-70, May-2017

Online published on May 30, 2017

Copyright © G. Jain, Manisha, B. Agarwal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: G. Jain, Manisha, B. Agarwal, “Spam Detection on Social Media Text,” International Journal of Computer Sciences and Engineering, Vol.5, Issue.5, pp.63-70, 2017.

MLA Style Citation: G. Jain, Manisha, B. Agarwal. "Spam Detection on Social Media Text." International Journal of Computer Sciences and Engineering 5.5 (2017): 63-70.

APA Style Citation: G. Jain, Manisha, B. Agarwal (2017). Spam Detection on Social Media Text. International Journal of Computer Sciences and Engineering, 5(5), 63-70.

BibTex Style Citation:
@article{Jain_2017,
author = {G. Jain and Manisha and B. Agarwal},
title = {Spam Detection on Social Media Text},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {5 2017},
volume = {5},
number = {5},
month = {5},
year = {2017},
issn = {2347-2693},
pages = {63--70},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1265},
publisher = {IJCSE, Indore, INDIA}
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=1265
TI  - Spam Detection on Social Media Text
T2  - International Journal of Computer Sciences and Engineering
AU  - Jain, G.
AU  - Manisha
AU  - Agarwal, B.
PY  - 2017
DA  - 2017/05/30
PB  - IJCSE, Indore, INDIA
SP  - 63
EP  - 70
IS  - 5
VL  - 5
SN  - 2347-2693
ER  -

Abstract

Communication has grown stronger with the exponential increase in social media use over the last few years. People use social media to communicate with friends, find new friends, share important events in their lives, and so on. Among the different types of social media, the most important are social networking sites and mobile networks. Because of their growing popularity and deep reach, these media are infiltrated with a huge volume of spam messages. In this paper, we discuss five traditional machine learning techniques for detecting spam in short text messages on two datasets: the SMS Spam Collection dataset taken from the UCI Repository and a Twitter dataset. The Twitter dataset was compiled by crawling public live tweets using the Twitter API. The bag-of-words (BoW) model with TF and TF-IDF weighting schemes is used for feature selection. The performance of the classifiers is evaluated using precision, recall, accuracy and F1 score. The results show that Random Forest with 100 estimators gave the highest accuracy.
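
The feature weighting and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the IDF variant log(N/df) + 1, the function names, and the toy messages are all assumptions made for illustration, and the classifier itself is left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # preprocessing is not specified in the abstract.
    return text.lower().split()


def tf_vectors(docs):
    """Bag-of-words with raw term-frequency (TF) weights, one Counter per message."""
    return [Counter(tokenize(d)) for d in docs]


def tfidf_vectors(docs):
    """Bag-of-words with TF-IDF weights.

    Assumption: idf(t) = log(N / df(t)) + 1, one common variant.
    """
    tfs = tf_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of messages containing each term.
    df = Counter(term for tf in tfs for term in tf)
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    return [{term: freq * idf[term] for term, freq in tf.items()} for tf in tfs]


def binary_metrics(y_true, y_pred, positive="spam"):
    """Precision, recall, accuracy and F1 for a two-class (spam/ham) problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    messages = ["free prize now", "see you now"]  # toy data, not from the paper
    print(tfidf_vectors(messages)[0])
    print(binary_metrics(["spam", "spam", "ham", "ham"],
                         ["spam", "ham", "ham", "ham"]))
```

In this sketch a term that occurs in every message (here "now") keeps only its raw count, while rarer terms such as "free" are weighted up. In practice the weighted vectors would be passed to a classifier; a library such as scikit-learn offers one option, for example `RandomForestClassifier(n_estimators=100)`.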

Keywords / Index Terms

Spam Detection, Machine Learning, Traditional Classifiers, Twitter Spam, SMS Spam, Text Classification

