Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates
S. Bharti, H. Singh
Section: Research Paper, Product Type: Journal Paper
Volume-06, Issue-05, Page no. 43-49, Jun-2018
Online published on Jun 30, 2018
Copyright © S. Bharti, H. Singh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
IEEE Citation
IEEE Style Citation: S. Bharti, H. Singh, “Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates,” International Journal of Computer Sciences and Engineering, Vol.06, Issue.05, pp.43-49, 2018.
MLA Citation
MLA Style Citation: S. Bharti, H. Singh. "Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates." International Journal of Computer Sciences and Engineering 06.05 (2018): 43-49.
APA Citation
APA Style Citation: S. Bharti, H. Singh (2018). Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates. International Journal of Computer Sciences and Engineering, 06(05), 43-49.
BibTex Citation
BibTex Style Citation:
@article{Bharti_2018,
author = {S. Bharti and H. Singh},
title = {Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {6 2018},
volume = {06},
number = {05},
month = {6},
year = {2018},
issn = {2347-2693},
pages = {43-49},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=418},
publisher = {IJCSE, Indore, INDIA},
}
RIS Citation
RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=418
TI - Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates
T2 - International Journal of Computer Sciences and Engineering
AU - S. Bharti
AU - H. Singh
PY - 2018
DA - 2018/06/30
PB - IJCSE, Indore, INDIA
SP - 43
EP - 49
IS - 05
VL - 06
SN - 2347-2693
ER -




Abstract
The clone research community has described several techniques for detecting code duplicates in a code base, broadly categorized into four classes: textual (text-based) techniques, lexical (token-based) techniques, syntactic techniques (including tree-based and metrics-based approaches), and semantic techniques. The literature lists various clone detection tools in each category that can detect clones in batch mode as well as in a real-time development environment. However, most of these tools use tokens as the intermediate representation of the source code to which their clone detection algorithms are applied. This paper therefore focuses on the token-based intermediate representation and its pragmatic aspects with respect to detecting code duplication. By discussing the practical process of converting source code into tokens as an intermediate representation, and how code duplicates are then detected, the authors shed light on the less obvious pros and cons of the token-based approach, to help researchers select and implement, or reject, it as the intermediate representation for their duplication detection algorithms.
Keywords / Index Terms
Code Clone Detection, Clone Detection Techniques, Token-based Clone Detection Technique
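As an illustration of the general idea summarized in the abstract (source code is first converted to tokens, and duplicates are then searched for in the token stream), the following is a minimal sketch, not the authors' implementation. It assumes Python source as input and uses Python's standard `tokenize` module as the lexer. The names `normalized_tokens`, `find_clones`, and `SAMPLE`, the placeholder tokens `ID`, `NUM`, and `STR`, and the fixed-size window comparison are all assumptions made for this example; production token-based detectors typically rely on more scalable matching structures.

```python
import io
import keyword
import tokenize
from collections import defaultdict

# Token types that carry no lexical content for this sketch (layout, comments).
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source):
    """Return (token_text, line_number) pairs for a Python source string.

    Identifiers and literals are replaced by placeholders so that code
    differing only in names or constants maps to the same token sequence.
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            text = "ID"
        elif tok.type == tokenize.NUMBER:
            text = "NUM"
        elif tok.type == tokenize.STRING:
            text = "STR"
        else:
            text = tok.string  # keywords and operators are kept verbatim
        result.append((text, tok.start[0]))
    return result


def find_clones(source, window):
    """Return sorted lists of start lines whose next `window` tokens match.

    Naive approach (an assumption of this sketch): every run of `window`
    consecutive normalized tokens is used as a dictionary key, and keys
    that occur more than once are reported as clone candidates.
    """
    tokens = normalized_tokens(source)
    occurrences = defaultdict(list)
    for i in range(len(tokens) - window + 1):
        key = tuple(text for text, _ in tokens[i:i + window])
        occurrences[key].append(tokens[i][1])
    return sorted(lines for lines in occurrences.values() if len(lines) > 1)


# Two functions that differ only in identifier names and one constant.
SAMPLE = '''
def total_price(items):
    result = 0
    for item in items:
        result = result + item * 2
    return result

def sum_weights(values):
    acc = 0
    for v in values:
        acc = acc + v * 3
    return acc
'''
```

Calling `find_clones(SAMPLE, window=23)` compares whole-function-sized token runs in the sample. Smaller windows report more, and shorter, candidate fragments, which reflects the trade-off between sensitivity and noise that any minimum clone length setting has to balance.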