Movie Recommendation Model Using Stochastic Gradient Descent For Collaborative Filtering In Social Media Mining
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.1-7, Feb-2019
Abstract
Nowadays, many people want to watch TV shows or series anytime and anywhere. In recent years, online TV has experienced exponential growth, and Netflix is one of the companies that has entered the world of online streaming services. Many existing movie recommendation approaches learn a user ranking model from user feedback with respect to the movie's content. Unfortunately, this approach suffers from the sparsity problem inherent in SMR data. Collaborative filtering (CF) is the workhorse of recommender engines because it can perform feature learning on its own, meaning it learns for itself which features to use. CF can be split into memory-based collaborative filtering and model-based collaborative filtering. This paper compares results from memory-based CF, model-based CF and a third approach that uses stochastic gradient descent (SGD) for collaborative filtering. The proposed SGD algorithm is applied to a movie recommender system. The proposed system uses the MovieLens dataset, one of the most common datasets used to implement and test recommender engines; it contains 100,000 movie ratings from 943 users on a selection of 1682 movies. The results are evaluated using the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).
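The abstract names SGD as the collaborative filtering method and RMSE/MAE as the metrics, but it gives no update rules. As a minimal sketch, assuming a plain latent-factor model in which a rating is predicted as the dot product of a user vector and an item vector (no bias terms), SGD visits the observed ratings one at a time in random order and nudges both vectors along the gradient of the regularized squared error. The names `train_sgd_mf`, `predict`, `rmse` and `mae` and the hyperparameters `k`, `lr`, `reg` and `epochs` are illustrative assumptions, not the author's settings.

```python
import math
import random


def train_sgd_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples by SGD."""
    rng = random.Random(seed)
    # Small random initial factors break the symmetry between latent dimensions.
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": one observed rating at a time, in random order
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error plus an L2 penalty (reg).
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)
```

The sketch assumes zero-based user and item indices. For MovieLens 100K, the (user, item, rating) triples would be read from the dataset, re-indexed from zero, and split into training and test sets, with RMSE and MAE computed on the held-out pairs only.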
Key-Words / Index Term
Movie Recommendation System, Memory-Based Collaborative Filtering, Model-Based Collaborative Filtering, Stochastic Gradient Descent
Citation
C. Premila Rosy, "Movie Recommendation Model Using Stochastic Gradient Descent For Collaborative Filtering In Social Media Mining", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.1-7, 2019.
Social Media Mining: Retrieving, Preprocessing, Storing and Analyzing Bone Cancer Related Tweets Using R
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.8-11, Feb-2019
Abstract
Social media provides an easily accessible platform for users to share information. Mining social media has the potential to extract actionable patterns that can benefit businesses, users, and consumers. Social media data are vast, noisy, unstructured, and dynamic in nature, and thus novel challenges arise. This paper deals with social media mining: we retrieve tweets, preprocess them, and store them in a CSV file in order to compare them with a cancer-related ontology created using Protégé. The preprocessed cancer-related tweets are also analyzed using R.
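The paper performs these steps in R and does not list its preprocessing rules. The Python sketch below shows one common set of assumed steps: lowercasing, stripping URLs and @mentions, dropping punctuation (which keeps hashtag words but removes the `#`), writing the cleaned text to CSV, and matching tokens against a set of ontology terms. The function names and the term set are hypothetical; in practice the term list would be exported from the Protégé ontology.

```python
import csv
import io
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lowercase, drop URLs and @mentions, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Removing non-alphanumerics drops '#' but keeps the hashtag word itself.
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())


def tweets_to_csv(tweets):
    """Write (tweet_id, raw_text) pairs as CSV rows of cleaned text; returns the CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "clean_text"])
    for tweet_id, raw in tweets:
        writer.writerow([tweet_id, clean_tweet(raw)])
    return buf.getvalue()


def match_terms(clean_text, ontology_terms):
    """Return the ontology terms (single words) that occur in a cleaned tweet, sorted."""
    return sorted(set(clean_text.split()) & ontology_terms)
```

Note that this simple character filter also removes non-ASCII letters, which matters if tweets in other languages are kept.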
Key-Words / Index Term
Social media, Mining, Preprocessing, CSV file, Ontology, Tweets, R
Citation
S. Mahalakshmi, "Social Media Mining: Retrieving, Preprocessing, Storing and Analyzing Bone Cancer Related Tweets Using R", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.8-11, 2019.
High Utility Text and Data Mining Methods
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.12-17, Feb-2019
Abstract
As the volume of text data continues to grow, the automated extraction of implicit, previously unknown, and potentially useful information becomes increasingly necessary to make proper use of this vast source of knowledge. Text mining extends the data mining approach to textual data and is concerned with various tasks, such as extracting information implicitly contained in a collection of documents, or similarity-based structuring. This paper provides the reader with a brief introduction to some of the theory and methods of text data mining, with the intent of introducing some of the current text mining methods employed in this discipline.
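As one illustration of similarity-based structuring, the sketch below builds TF-IDF vectors and compares documents by cosine similarity, a common building block for document clustering. It is not taken from the paper; whitespace tokenization and the plain log(N/df) inverse document frequency are simplifying assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

With this weighting, a term that appears in every document gets an IDF of zero and does not contribute to similarity; documents sharing no weighted terms have similarity 0.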
Key-Words / Index Term
Text Mining, Text Processing, Text Mining Methods, Document Clustering
Citation
R. Kanimozhi, "High Utility Text and Data Mining Methods", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.12-17, 2019.
Data Mining Approaches to Predict the Factors that Affect the Agriculture Growth using Stochastic Model
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.18-23, Feb-2019
Abstract
In recent times, there has been increasing demand for efficient data mining strategies for agricultural prediction. Combined with stochastic modeling, data mining can serve as a tool for effective prediction. This paper proposes an efficient approach for identifying the factors that affect agricultural growth, using data such as rainfall, groundwater and temperature, by adopting stochastic modeling and data mining approaches. A novel stochastic model is proposed to predict the factors affecting agricultural growth, and numerical illustrations of the various expected estimates demonstrate the soundness of the proposed approach.
Key-Words / Index Term
Data Mining, Agricultural Production, Rainfall, Groundwater, Temperature, Stochastic Model
Citation
P. Rajesh, M. Karthikeyan, "Data Mining Approaches to Predict the Factors that Affect the Agriculture Growth using Stochastic Model", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.18-23, 2019.
Earthquake Prediction using SVM based Time Predictable Technique
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.24-28, Feb-2019
Abstract
As with so many natural phenomena, earthquakes are the product of what scientists call "complex systems", systems that are more than the sum of their parts. Quite literally, and not just proverbially, precise earthquake prediction has long been a matter of life and death for the frightened inhabitants of earthquake-prone areas, and it has occupied forecasters and scientists ranging from Nostradamus to Dr. Vladimir Keilis-Borok over the past several centuries. Although experts still do not understand many details of the physical processes involved, or how to predict these events, several prediction and chaos theories have been put forward with varying degrees of success. Despite the inherent complexity of such a system, research is ongoing. The time-predictable model of earthquake prediction is based on the theory that earthquakes in fault zones are caused by the constant build-up and release of strain in the Earth's crust. This model has become a standard tool for hazard prediction in many earthquake-prone regions; it is therefore not surprising that scientists in the United States and other Pacific Rim countries, such as Japan and New Zealand, routinely use this technique for long-range hazard assessment when adequate data are available.
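In its commonly used form, the time-predictable model says that the strain released by the last large event on a fault must be rebuilt at a constant rate before the next one, so with coseismic slip D of the last event and long-term loading (slip) rate V, the next event is expected at T_next = T_last + D / V. The sketch below encodes only that relation; the function names and any input values are hypothetical, and the SVM component named in the title is not modeled here.

```python
def next_event_time(last_event_time, coseismic_slip, slip_rate):
    """Time-predictable estimate of the next large earthquake on a fault segment.

    last_event_time: time of the previous event (e.g. in years)
    coseismic_slip:  slip released by that event (e.g. in metres)
    slip_rate:       long-term loading rate, same length unit per time unit
    """
    if slip_rate <= 0:
        raise ValueError("slip_rate must be positive")
    # Strain released by the last event is rebuilt at a constant rate.
    return last_event_time + coseismic_slip / slip_rate


def forecast_series(events, slip_rate):
    """Apply the estimate to each (time, slip) event; returns the predicted next times."""
    return [next_event_time(t, d, slip_rate) for t, d in events]
```

Comparing each predicted time with the actual time of the following event gives a simple check of how well a fault's history fits the model, which is why adequate historical data are a precondition for its use.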
Key-Words / Index Term
Earthquakes; Time-predictable model; Forecasters
Citation
M.A. Shanti, "Earthquake Prediction using SVM based Time Predictable Technique", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.24-28, 2019.
Versatile Distributed Computing Taxonomy
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.29-35, Feb-2019
Abstract
The NIST definition of cloud computing lists five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Mobile computing, by contrast, focuses on device mobility and context awareness, taking into account networking and mobile resource/data access. Versatile distributed computing is usually viewed as building on cloud computing and mobile computing; however, it has some unique features, such as service offloading, migration, and composition. Versatile distributed computing advances mobile computing technologies and uses unified elastic resources from varied clouds and network technologies. This paper gives an overview of important concepts closely related to versatile distributed computing and outlines their relationships through real examples.
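Offloading, listed above as a distinguishing feature, is often decided by comparing device energy for running a task locally against idling while the cloud runs it plus transmitting its input. The sketch assumes that basic energy model; all parameter names are hypothetical, and real offloading frameworks also weigh latency, connectivity and cloud cost.

```python
def offload_saves_energy(cycles, local_speed, cloud_speed, data_bytes, bandwidth,
                         p_compute, p_idle, p_transmit):
    """Return True if offloading a task is expected to use less device energy.

    cycles: CPU cycles the task needs; local_speed / cloud_speed: cycles per second
    data_bytes: input data sent to the cloud; bandwidth: bytes per second
    p_compute / p_idle / p_transmit: device power draw (watts) in each state
    """
    local_energy = p_compute * cycles / local_speed
    # While the cloud computes, the device idles; it also pays to transmit the input.
    remote_energy = p_idle * cycles / cloud_speed + p_transmit * data_bytes / bandwidth
    return remote_energy < local_energy
```

The comparison favors offloading compute-heavy tasks with small inputs and keeps data-heavy tasks on the device, since transmission energy grows with the amount of data sent.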
Key-Words / Index Term
Data Collection, Measurement Sensor, Radio Communication, Distributed System, Network Protocol, Energy Consumption, Taxonomy
Citation
J. Mary Ramya Poovizhi, "Versatile Distributed Computing Taxonomy", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.29-35, 2019.
Architecture for Automated Data Quality Checking in Big Data Migration Process
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.36-39, Feb-2019
Abstract
Data are gathered from different sources, many of which have serious quality issues. The volume of information in digital libraries keeps increasing, and many systems are affected by replicas (duplicate records). Data cleaning is an important process for removing replicas through de-duplication. It involves parsing, data transformation, duplicate elimination and statistical methods, and clearing repeated documents is one of its most challenging stages. Data cleaning deals with detecting and removing errors, filling in missing values, and smoothing noisy data to improve data quality. De-duplication is a key function when integrating data from various sources: it is the process of identifying all records in a data set that represent the same real-world entity. This paper introduces a methodology for automated data quality checking with a de-duplication algorithm.
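A minimal de-duplication sketch, assuming records are tuples of string fields: each record is normalized (lowercased, punctuation removed, whitespace collapsed) and compared with the records already kept using the standard library's `difflib.SequenceMatcher`; a record whose similarity ratio reaches the threshold is treated as a duplicate. The 0.9 threshold and the function names are illustrative assumptions, not the paper's algorithm.

```python
import re
from difflib import SequenceMatcher


def normalize(record):
    """Join the fields, lowercase, drop punctuation and collapse whitespace."""
    text = " ".join(str(field) for field in record).lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def deduplicate(records, threshold=0.9):
    """Keep the first record of each group whose normalized forms are near-identical."""
    kept, kept_norm = [], []
    for record in records:
        norm = normalize(record)
        is_duplicate = any(SequenceMatcher(None, norm, other).ratio() >= threshold
                           for other in kept_norm)
        if not is_duplicate:
            kept.append(record)
            kept_norm.append(norm)
    return kept
```

Comparing every record with every kept record is quadratic, so large migrations usually add a blocking step (for example, comparing only records that share a key prefix) before the similarity test.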
Key-Words / Index Term
Data Quality, Data Cleansing, De-Duplication
Citation
V. Rathika, "Architecture for Automated Data Quality Checking in Big Data Migration Process", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.36-39, 2019.
Parameter-Free Algorithm for Mining Rare Association Rules
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.40-46, Feb-2019
Abstract
This paper presents a parameter-free, grammar-guided genetic programming algorithm for mining rare association rules. The algorithm uses a context-free grammar to represent individuals, encoding each solution as a tree that conforms to the grammar, which makes the solutions more expressive and flexible. The algorithm retains the advantages of using evolutionary algorithms for mining rare association rules, and it additionally addresses the problem of tuning the large number of parameters these algorithms require. Its principal feature is the small number of parameters required, which makes it easy for non-expert users to discover rare association rules. We compare our approach with existing evolutionary and exhaustive search algorithms, obtaining important results and overcoming the drawbacks of both. The experimental stage reveals that this approach discovers infrequent and reliable rules without parameter tuning.
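The measures behind "rare" and "reliable" rules can be made concrete with a brute-force sketch: support is the fraction of transactions containing an itemset, confidence of A → C is support(A ∪ C) / support(A), and a rare rule is one whose support falls below a maximum threshold while its confidence stays high. This exhaustive baseline covers only single-item rules and is not the grammar-guided genetic programming algorithm; the `max_support` and `min_confidence` values are hypothetical, and they are exactly the kind of parameters the proposed method aims to avoid tuning.

```python
from itertools import permutations


def support(itemset, transactions):
    """Fraction of transactions (sets) that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rare_rules(transactions, max_support=0.3, min_confidence=0.8):
    """Exhaustively find single-item rules a -> c that are infrequent but reliable."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(items, 2):
        s_ac = support({a, c}, transactions)
        # Skip rules that never occur or that are too frequent to count as rare.
        if s_ac == 0 or s_ac >= max_support:
            continue
        confidence = s_ac / support({a}, transactions)
        if confidence >= min_confidence:
            rules.append((a, c, s_ac, confidence))
    return rules
```

Even on a few items the number of candidate rules grows quickly with rule length, which is one reason evolutionary search is attractive for this task.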
Key-Words / Index Term
Genetic Programming, Association Rules, Free Parameters, Data Mining
Citation
S. Selvarani, M. Jeyakarthic, "Parameter-Free Algorithm for Mining Rare Association Rules", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.40-46, 2019.
Prediction of Data Ware House Model using Dynamic Function Point Analysis
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.47-51, Feb-2019
Abstract
This paper presents an approach for estimating Data Warehouse (DWH) and Data Mart projects, particularly their ETL development in an enterprise setting, using Function Point Analysis. One motivation is that the burden the data warehouse carries in maintaining the composite enterprise data model is not directly recognized. A development team's "hidden" effort in delivering the supporting architecture of a data warehouse often compares unfavorably with more traditional, highly visible user functionality. Unlike traditional database systems, a data warehouse uses other software systems as data sources and does not create new information, and its data are generally more static. Applying Function Point Analysis to data warehouse applications is therefore more difficult, because data warehousing has peculiarities compared with traditional OLTP (Online Transaction Processing) applications. Data warehouses and data marts are a special type of application with particular characteristics: users use the system only for queries and report generation, not for data updates; development is based on existing data from other systems without generating new information; and development follows a different process from traditional OLTP software systems. It is therefore necessary to adapt (rather than follow exactly) the size measurement approach defined for traditional OLTP systems so that it accounts for the specific characteristics of data warehouses and data marts and generates more accurate estimates. The proposed approach helps estimate data warehouse and data mart projects using Function Point Analysis, especially for ETL operations, in a more traditional and systematic way.
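For reference, the standard IFPUG calculation that the proposed approach adapts can be sketched as follows: each counted function (EI, EO, EQ, ILF, EIF) is weighted by its complexity to give unadjusted function points, and a value adjustment factor VAF = 0.65 + 0.01 × (sum of the 14 general system characteristics, each rated 0 to 5) scales the result. The paper's DWH-specific counting rules are not shown; treating, say, ETL staging tables as ILFs and source-system extracts as EIFs would be an assumption of the counter, not something this sketch decides.

```python
# IFPUG complexity weights per function type.
WEIGHTS = {
    "EI": {"low": 3, "average": 4, "high": 6},    # external inputs
    "EO": {"low": 4, "average": 5, "high": 7},    # external outputs
    "EQ": {"low": 3, "average": 4, "high": 6},    # external inquiries
    "ILF": {"low": 7, "average": 10, "high": 15},  # internal logical files
    "EIF": {"low": 5, "average": 7, "high": 10},   # external interface files
}


def unadjusted_fp(counts):
    """counts maps (function_type, complexity) -> number of such functions."""
    return sum(WEIGHTS[ftype][complexity] * n for (ftype, complexity), n in counts.items())


def adjusted_fp(counts, gsc_ratings):
    """Scale unadjusted FP by the value adjustment factor from 14 GSC ratings (0-5)."""
    if len(gsc_ratings) != 14 or not all(0 <= g <= 5 for g in gsc_ratings):
        raise ValueError("expected 14 general system characteristic ratings in 0..5")
    vaf = 0.65 + 0.01 * sum(gsc_ratings)
    return unadjusted_fp(counts) * vaf
```

For instance, with hypothetical counts of two average ILFs, three low EIFs, four high EIs and two average EOs, the unadjusted count is 20 + 15 + 24 + 10 = 69; with all 14 characteristics rated 3, VAF = 1.07 and the adjusted count is about 73.83.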
Key-Words / Index Term
FPA - Function Point Analysis, OLAP - Online Analytical Processing, ETL - Extraction, Transformation, Loading, OLTP - Online Transaction Processing, DWH - Data Warehouse
References
[1]. INMON, W. H. Definition of a Data Warehouse. 1999.
[2]. KIMBALL, R., THORNTHWAITE, W., REEVES, L., ROSS, M. The Data Warehouse Lifecycle Toolkit. New York: John Wiley & Sons, 1998.
[3]. FENTON, N., PFLEEGER, S. Software Metrics: A Rigorous & Practical Approach. Boston: PWS Publishing Company, 1997.
[4]. ISO/IEC 9126:2001. Software engineering: Product quality. 2001.
[5]. IFPUG. International Function Point Users Group. Function Point Counting Practices Manual: Release 4.1. Ohio: IFPUG. 2000.
[6]. Adapting function point analysis to estimate data mart size, by Angelica Toffano Calazans, Marcal De Oliveira, Rildo Ribeiro Dos Santos.
Citation
K. Bhuvaneswari , "Prediction of Data Ware House Model using Dynamic Function Point Analysis", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.47-51, 2019.
Self Organzied Wireless Sensor Network Model for Military Decentralized Applications
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.52-58, Feb-2019
Abstract
Developments in integrated circuit design technology are expected to make the mass production of sensor devices relatively inexpensive, so large sensor networks are likely to become common. A cluster-based scheme is proposed to organize such large networks. The proposed scheme extends the First Input High Energy (FIHE) clustering algorithm and enables multi-hop transmission among clusters by incorporating the selection of cooperative sending and receiving nodes. We propose a sensor network architecture based on a cluster-tree multi-hop model with optimized cluster head election, together with a corresponding node design method that meets tactical requirements. Earlier systems provided such networks for transmitting information but lacked security mechanisms to protect the transmitted data: attackers could enter the network without authentication, attack it, and access whatever data or services they wanted. With the proposed WSN architecture, sensor networks for military use in remote, large-scale environments can be designed easily.
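The abstract names cluster head election and multi-hop transmission among clusters without giving the rules, so the sketch below illustrates the general idea under stated assumptions; it is not FIHE as specified in the paper. It assumes cluster membership is already known, elects in each cluster the node with the highest residual energy (ties go to the lowest node id), and then routes from one cluster head toward the sink through other heads using greedy geographic forwarding, a swapped-in stand-in for the paper's multi-hop scheme. Cooperative sending and receiving node selection and security are not modeled. `Node`, `elect_cluster_heads`, `greedy_route`, and the example coordinates, energies, and radio range are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    cluster: int   # hypothetical: cluster membership is assumed to be known already
    x: float
    y: float
    energy: float  # hypothetical residual energy, in joules


def elect_cluster_heads(nodes):
    """Per cluster, pick the node with the highest residual energy; ties go to the lowest node_id."""
    heads = {}
    for n in nodes:
        best = heads.get(n.cluster)
        if best is None or (n.energy, -n.node_id) > (best.energy, -best.node_id):
            heads[n.cluster] = n
    return heads


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def greedy_route(heads, start_cluster, sink, radio_range):
    """Relay from one cluster head toward the sink via other heads.

    Each hop picks the in-range head closest to the sink; every hop strictly
    reduces the distance to the sink, so the loop always terminates.
    Returns the list of head ids, or None if no head makes forward progress.
    """
    current = heads[start_cluster]
    path = [current.node_id]
    while dist((current.x, current.y), sink) > radio_range:
        here = (current.x, current.y)
        candidates = [
            h for h in heads.values()
            if dist((h.x, h.y), here) <= radio_range
            and dist((h.x, h.y), sink) < dist(here, sink)
        ]
        if not candidates:
            return None
        current = min(candidates, key=lambda h: dist((h.x, h.y), sink))
        path.append(current.node_id)
    return path


# Hypothetical field: three clusters laid out toward a sink at (110, 0).
nodes = [
    Node(1, 0, 0, 0, 0.8),
    Node(2, 0, 1, 1, 1.2),
    Node(3, 1, 40, 0, 0.9),
    Node(4, 1, 41, 1, 0.9),
    Node(5, 2, 80, 0, 1.5),
    Node(6, 2, 79, 2, 0.4),
]

if __name__ == "__main__":
    heads = elect_cluster_heads(nodes)
    print({c: h.node_id for c, h in heads.items()})
    print(greedy_route(heads, 0, (110, 0), 45))
```

With a 45-unit radio range, the head of cluster 0 reaches the sink in two relayed hops through the heads of clusters 1 and 2; shrinking the range to 30 leaves no head in reach, so the route fails and `None` is returned.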
Key-Words / Index Term
Military sensor networks, Architecture, Design, Self-organization, Cluster head election
References
[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor networks: A survey. Computer Networks (Elsevier) Journal, pages 393–422, March 2002.
[2] V. Mhatre, C. Rosenberg, D. Kofman, R. Mazumdar, and N. Shroff. A minimum cost surveillance sensor network with a lifetime constraint. IEEE Transactions on Mobile Computing, 4(1):4–15, January 2005.
[3] W. Ye, J. Heidemann, and D. Estrin, An Energy-efficient MAC Protocol for Wireless Sensor Networks, Proc. IEEE INFOCOM 2002, (Jun. 2002), pp. 1567–1576.
[4] W. Ye, J. Heidemann, and D. Estrin, Medium Access Control with Coordinated Adaptive Sleeping for Wireless Sensor Networks, IEEE/ACM Transactions on Networking, Vol. 12, No. 3, (Jun. 2004), pp. 493–506.
[5] A. Tanenbaum, Computer Networks, 4th ed., Prentice Hall, 2003.
[6] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, Energy-Efficient Communication Protocols for Wireless Microsensor Networks, Proc. of the Hawaii International Conference on System Sciences (HICSS 2000), (Jan. 2000).
M. Miller and N. Vaidya, Minimizing Energy Consumption in Sensor Networks.
[7] Y.-B. Ko and N. H. Vaidya, Location-aided routing (LAR) in mobile ad hoc networks, ACM/Baltzer WINET J., vol. 6, no. 4, 2000, pp. 307–321.
[8] A. Manjeshwar and D. P. Agrawal, "TEEN: A Routing Protocol for Enhanced efficiency in Wireless Sensor Networks," in Proc. 15th Int. Parallel and Distributed Processing Symp. (IPDPS 2001), San Francisco, CA, April 2001.
[9] W. R. Heinzelman, A. P. Chandrakasan, and H. Balakrishnan, "An application-specific protocol architecture for wireless microsensor networks," IEEE Trans. Wireless Commun., Vol. 1, No. 4, pp. 660-670, Oct. 2002.
[10] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "Energy-efficient communication protocol for wireless microsensor networks," in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS), January 2000, pp. 3005–3014. [Online]. Available: citeseer.ist.psu.edu/rabinerheinzelman00energyefficient.html
[11] Timothy J. Shepard, A channel access scheme for large dense packet radio networks (SIGCOMM '96), pp. 219-230.
Citation
T. Manivannan, "Self Organzied Wireless Sensor Network Model for Military Decentralized Applications", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.52-58, 2019.