Characteristic mining of Mathematical Formulas from Document - A Comparative Study on Sequence Matcher and Levenshtein Distance procedure

G. Appa Rao, G. Srinivas, K. Venkata Rao, P.V.G.D. Prasad Reddy

Department of CSE, GIT, GITAM, Visakhapatnam, India.
Department of IT, ANITS, Visakhapatnam, India.
Department of CSSE, Andhra University, Visakhapatnam, India.

Section: Research Paper, Product Type: Journal Paper
Volume-6, Issue-4, Page no. 400-404, Apr-2018

CrossRef-DOI: https://doi.org/10.26438/ijcse/v6i4.400404

Online published on Apr 30, 2018

Copyright © G. Appa Rao, G. Srinivas, K. Venkata Rao, P.V.G.D. Prasad Reddy. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Citation

IEEE Style Citation: G. Appa Rao, G. Srinivas, K. Venkata Rao, P.V.G.D. Prasad Reddy, “Characteristic mining of Mathematical Formulas from Document - A Comparative Study on Sequence Matcher and Levenshtein Distance procedure,” International Journal of Computer Sciences and Engineering, Vol. 6, Issue 4, pp. 400-404, 2018.

MLA Citation

MLA Style Citation: G. Appa Rao, G. Srinivas, K. Venkata Rao, P.V.G.D. Prasad Reddy. "Characteristic mining of Mathematical Formulas from Document - A Comparative Study on Sequence Matcher and Levenshtein Distance procedure." International Journal of Computer Sciences and Engineering 6.4 (2018): 400-404.

APA Citation

APA Style Citation: G. Appa Rao, G. Srinivas, K. Venkata Rao, P.V.G.D. Prasad Reddy (2018). Characteristic mining of Mathematical Formulas from Document - A Comparative Study on Sequence Matcher and Levenshtein Distance procedure. International Journal of Computer Sciences and Engineering, 6(4), 400-404.

BibTex Citation

BibTex Style Citation:
@article{Rao_2018,
author = {G. Appa Rao and G. Srinivas and K. Venkata Rao and P.V.G.D. Prasad Reddy},
title = {Characteristic mining of Mathematical Formulas from Document - A Comparative Study on Sequence Matcher and Levenshtein Distance procedure},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {4 2018},
volume = {6},
number = {4},
month = {4},
year = {2018},
issn = {2347-2693},
pages = {400-404},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=1909},
doi = {10.26438/ijcse/v6i4.400404},
publisher = {IJCSE, Indore, INDIA},
}

RIS Citation

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v6i4.400404
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=1909
TI  - Characteristic mining of Mathematical Formulas from Document - A Comparative Study on Sequence Matcher and Levenshtein Distance procedure
T2  - International Journal of Computer Sciences and Engineering
AU  - G. Appa Rao
AU  - G. Srinivas
AU  - K. Venkata Rao
AU  - P.V.G.D. Prasad Reddy
PY  - 2018
DA  - 2018/04/30
PB  - IJCSE, Indore, INDIA
SP  - 400
EP  - 404
IS  - 4
VL  - 6
SN  - 2347-2693
ER  - 

Abstract

The key problem addressed here is how to identify the mathematics-related keywords in a given text file and store them in a single math text file, which contains only keywords related to mathematics. The math dataset is built from a large collection of tested documents and stored in this math text file. The dataset is trained on a large number of text files and grows as it is trained on more varied text samples, so that in the end it contains only math-related keywords. The proposed approaches are evaluated on text containing individual formulas and repeated formulas. Two approaches are proposed, Sequence Matcher and Levenshtein distance, both of which measure string similarity. Retrieval performance is measured on datasets of repeated formulas and of formulas appearing only once, and the time taken for retrieval is also measured.
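
The two similarity measures named in the abstract can be illustrated with a minimal sketch. It assumes Python's standard `difflib.SequenceMatcher` as a stand-in for the Sequence Matcher approach (the abstract does not say which implementation the authors used) and a textbook dynamic-programming Levenshtein distance. The function names, the normalisation of the distance into a 0 to 1 score, the 0.8 threshold, and the tiny vocabulary are all illustrative assumptions, not details taken from the paper.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def sequence_similarity(a: str, b: str) -> float:
    """Ratio from difflib: 2 * matched characters / total characters."""
    return SequenceMatcher(None, a, b).ratio()


def math_keywords(text, vocabulary, score, threshold=0.8):
    """Hypothetical helper: keep the tokens of text that are close enough
    to at least one entry of the math vocabulary under the given score."""
    found = []
    for token in text.lower().split():
        token = token.strip(".,;:()")
        if token and any(score(token, word) >= threshold for word in vocabulary):
            found.append(token)
    return found


vocabulary = ["integral", "matrix"]  # illustrative stand-in for the math dataset
sample = "The integrl of a matrix."
by_levenshtein = math_keywords(sample, vocabulary, levenshtein_similarity)
by_sequence = math_keywords(sample, vocabulary, sequence_similarity)
```

Both scores tolerate the misspelling `integrl` in this sample, so the two lists agree here; on other inputs the measures can rank candidates differently. Wrapping each `math_keywords` call in `time.perf_counter()` readings would give a rough analogue of the retrieval-time comparison the abstract mentions.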

Key-Words / Index Term

Levenshtein distance, Sequence matcher

ISSN: 2347-2693 (Online)