Open Access Article

Extraction of Sequential Patterns Using PREFIXSPAN

Elliot S.J.¹, Bennett E.O.²

  1. Department of Computer Science, Rivers State University, Port Harcourt, Nigeria.
  2. Department of Computer Science, Rivers State University, Port Harcourt, Nigeria.

Section: Research Paper, Product Type: Journal Paper
Volume-12, Issue-6, Page no. 21-29, Jun-2024

CrossRef-DOI: https://doi.org/10.26438/ijcse/v12i6.2129

Online published on Jun 30, 2024

Copyright © Elliot S.J., Bennett E.O. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Elliot S.J., Bennett E.O., “Extraction of Sequential Patterns Using PREFIXSPAN,” International Journal of Computer Sciences and Engineering, Vol.12, Issue.6, pp.21-29, 2024.

MLA Style Citation: Elliot S.J., Bennett E.O. "Extraction of Sequential Patterns Using PREFIXSPAN." International Journal of Computer Sciences and Engineering 12.6 (2024): 21-29.

APA Style Citation: Elliot S.J., Bennett E.O., (2024). Extraction of Sequential Patterns Using PREFIXSPAN. International Journal of Computer Sciences and Engineering, 12(6), 21-29.

BibTex Style Citation:
@article{S.J._2024,
author = {Elliot, S.J. and Bennett, E.O.},
title = {Extraction of Sequential Patterns Using PREFIXSPAN},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {6 2024},
volume = {12},
number = {6},
month = {6},
year = {2024},
issn = {2347-2693},
pages = {21--29},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=5699},
doi = {10.26438/ijcse/v12i6.2129},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v12i6.2129
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=5699
TI  - Extraction of Sequential Patterns Using PREFIXSPAN
T2  - International Journal of Computer Sciences and Engineering
AU  - Elliot, S.J.
AU  - Bennett, E.O.
PY  - 2024
DA  - 2024/06/30
PB  - IJCSE, Indore, INDIA
SP  - 21
EP  - 29
IS  - 6
VL  - 12
SN  - 2347-2693
ER  - 

Abstract

Many people want to exploit the internet's wealth of information, which can also be used to enrich existing data. The main challenge is uncovering the valuable information concealed within HTML elements. This study proposes a framework for web usage mining that examines web server log files using sequential pattern mining. Web log patterns reveal information about user behavior, preferences, and website interactions. The web data is first preprocessed, with the primary aim of improving data integrity while reducing the volume of data to be evaluated. Before the data enters the pattern discovery phase, noise must be removed, which requires distinguishing between different users and sessions. A sequential pattern mining method is developed to identify frequent sequential accesses in large data sets at low support thresholds. The technique finds recurring sequential patterns in multidimensional web log files for a given minimum support; multidimensional sequential pattern mining is primarily concerned with improving the quality of the patterns returned to the user. The PrefixSpan algorithm is used to extract both tabular and unstructured data from HTML tags. PrefixSpan prunes web data by computing the support value at each node of the projected sub-database and cutting away large portions of the search space that are guaranteed not to yield any results. The system is implemented in the Matlab programming language, which is used here to extract valuable information from the web, including user records and content. In particular, when mining long sequences containing numerous records, the method substantially reduces execution time and avoids large memory access costs.
The PrefixSpan algorithm enhanced with the starting position and innertagcount parameters performs better than the Markov model and the GSP algorithm, with an execution time of 2.35 seconds.
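
The prefix-projection and support-pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper reports a Matlab system and an enhancement using starting position and innertagcount parameters, neither of which is reproduced here. The sketch below is plain textbook-style PrefixSpan in Python, simplified to sequences of single items, and the function name `prefixspan` and the sample `sessions` are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern (tuple): support} for all frequent sequential patterns.

    Simplifying assumption: every sequence element is a single item
    (for example one page per click), so itemset elements are not handled.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` is a pseudo-projection of the database for `prefix`:
        # a list of (sequence index, start offset) pairs instead of copies.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # count a sequence once per item
                    seen.add(item)
                    # keep only the suffix after the first occurrence
                    occurrences[item].append((seq_id, pos + 1))
        for item, next_projected in sorted(occurrences.items()):
            support = len(next_projected)
            if support < min_support:
                continue  # prune: no extension of this prefix can be frequent
            pattern = prefix + (item,)
            results[pattern] = support
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical user sessions, as they might look after log preprocessing.
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "cart"],
    ["home", "blog", "products"],
    ["products", "cart", "checkout"],
]

patterns = prefixspan(sessions, min_support=3)
for pattern, support in sorted(patterns.items()):
    print(" -> ".join(pattern), support)
```

With a minimum support of 3, the branch for `checkout` is discarded as soon as its support (2) is counted, so none of its extensions are ever explored. Storing offsets rather than copying suffixes is one common way to keep memory use low when projecting long sequences.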

Keywords / Index Terms

Web Usage Mining, Sequential Patterns, Web Access Pattern, PrefixSpan, Web Server Logs, Preprocessing
