An Efficient Duplicate Detection Algorithm Using Data Cleansing
Survey Paper | Journal Paper
Vol.07 , Issue.04 , pp.277-280, Feb-2019
Abstract
The aim of this technique is to minimize data duplication in web mining patterns during web-based search in large data mining applications. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures such as XML data. This paper presents a novel method for XML duplicate detection, called XML Dup. XML Dup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Through experiments, we show that the algorithm achieves high precision and recall scores on several data sets. XML Dup also outperforms another state-of-the-art duplicate detection solution in terms of both efficiency and effectiveness.
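The idea of scoring two XML elements by both their values and their structure, with early pruning, can be illustrated with a much simplified sketch. This is not the Bayesian network of XML Dup: the averaging rule, the function names `value_similarity` and `duplicate_score`, and the sample movie records are all assumptions made for illustration, and a plain string-similarity ratio stands in for a per-value probability.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def value_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a per-value probability."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def duplicate_score(e1, e2, threshold=0.0):
    """Average the similarity of same-tag children, recursing into nested ones.

    Pruning: each remaining child can add at most 1.0, so once the best
    achievable average falls below `threshold` we stop and return 0.0.
    """
    if e1.tag != e2.tag:
        return 0.0
    tags = sorted({c.tag for c in e1} | {c.tag for c in e2})
    if not tags:  # leaf element: compare the text values
        return value_similarity(e1.text or "", e2.text or "")
    total = 0.0
    for i, tag in enumerate(tags):
        c1, c2 = e1.find(tag), e2.find(tag)
        if c1 is not None and c2 is not None:
            total += duplicate_score(c1, c2)
        # a child that is missing on one side contributes 0
        remaining = len(tags) - (i + 1)
        if (total + remaining) / len(tags) < threshold:
            return 0.0  # pruned: the threshold can no longer be reached
    return total / len(tags)


# Hypothetical records used only to exercise the sketch.
a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year>"
                  "<cast><actor>Keanu Reeves</actor></cast></movie>")
b = ET.fromstring("<movie><title>Matrix, The</title><year>1999</year>"
                  "<cast><actor>K. Reeves</actor></cast></movie>")
c = ET.fromstring("<movie><title>Casablanca</title><year>1942</year>"
                  "<cast><actor>Humphrey Bogart</actor></cast></movie>")
score_ab = duplicate_score(a, b)
score_ac = duplicate_score(a, c)
```

In this toy setting the near-duplicate pair `a`, `b` scores higher than the unrelated pair `a`, `c`, and passing a high `threshold` lets the comparison of `a` and `c` stop after the first child.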
Key-Words / Index Term
Duplicate Detection, Network Evaluation, Efficiency, Effectiveness
Citation
J. Selvi, R. Gayathri, "An Efficient Duplicate Detection Algorithm Using Data Cleansing", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.277-280, 2019.
A Review of Customer Churn Prediction Related Issues Using Data Mining Methods
Review Paper | Journal Paper
Vol.07 , Issue.04 , pp.281-284, Feb-2019
Abstract
Customer churn prediction is a challenging but essential task in emerging service-oriented businesses. It is also one of the important issues in customer relationship management. A number of data mining techniques have been applied to predict which customers will churn; this paper reviews some recent developments and compares them in terms of data pre-processing and prediction techniques.
Key-Words / Index Term
Customer Churn, Customer Retention, Customer Relationship Management, Logistic regression, Linear regression, Knowledge discovery, Data mining
Citation
S. Venkatesh, M. Jeyakarthic, "A Review of Customer Churn Prediction Related Issues Using Data Mining Methods", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.281-284, 2019.
K-Subspaces Quantization for Approximate Nearest Neighbour Search
Survey Paper | Journal Paper
Vol.07 , Issue.04 , pp.285-288, Feb-2019
Abstract
Approximate Nearest Neighbour (ANN) search has become a popular approach for performing fast and efficient retrieval on very large-scale datasets in recent years, as the size and dimension of data grow continuously. In this paper, we propose a novel vector quantization method for ANN search which enables faster and more accurate retrieval on publicly available datasets. We define vector quantization as a multiple affine subspace learning problem and explore the quantization centroids on multiple affine subspaces. We propose an iterative approach to minimize the quantization error in order to create a novel quantization scheme, which outperforms the state-of-the-art algorithms. The computational cost of our method is also comparable to that of the competing methods.
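The notion of iteratively minimizing quantization error can be illustrated with the classical baseline, plain Lloyd (k-means) iterations. This is a sketch of that baseline only, not of the affine-subspace method the abstract proposes; the function names and the synthetic 2D data are assumptions made for illustration.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(centroids, x):
    """Index of the centroid closest to x (the code assigned to x)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], x))


def quantization_error(data, centroids):
    """Mean squared distance between each vector and its assigned centroid."""
    return sum(sq_dist(x, centroids[nearest(centroids, x)]) for x in data) / len(data)


def lloyd(data, k, iters=10, seed=0):
    """Plain Lloyd iterations; returns centroids and the error after each pass."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(data, k)]
    errors = [quantization_error(data, centroids)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for x in data:
            buckets[nearest(centroids, x)].append(x)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cell is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        errors.append(quantization_error(data, centroids))
    return centroids, errors


# Synthetic data: three Gaussian blobs, generated only for this sketch.
rng = random.Random(1)
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(50)]
centroids, errors = lloyd(data, k=3)
```

Each pass alternates assignment and centroid update, so the recorded error never increases from one pass to the next. At query time a vector is replaced by the index returned by `nearest`, which is what makes the search approximate.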
Key-Words / Index Term
Approximate Nearest Neighbour Search, Binary Codes, Large-Scale Retrieval, Subspace Clustering, Vector Quantization.
Citation
A. Ramya, S. Sangeetha, "K-Subspaces Quantization for Approximate Nearest Neighbour Search", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.285-288, 2019.
A Novel Recommender System based on Artificial Neural Network Learning Vector Quantization Classification Approach
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.289-299, Feb-2019
Abstract
Recommender systems have become increasingly important in various domains for reducing information overload. Traditional recommender systems rely on collaborative filtering and content-based filtering. However, these methods suffer from data sparsity and the cold-start problem. This paper therefore proposes an ANN-based recommender system. Artificial Neural Network Learning Vector Quantization (ANNLVQ) and Optimized Learning Vector Quantization (ANNOLVQ) algorithms are used to develop a multi-categorical classification model that predicts the class of a rating in recommender systems. Rating prediction is treated as a multi-label classification problem in which each rating is treated as a label. A book dataset is used for this research. The accuracy of the ANN recommender system is compared with that of a collaborative filtering recommender system, and the ANN recommender system predicts ratings more accurately than the traditional collaborative filtering method.
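For readers unfamiliar with Learning Vector Quantization, the basic LVQ1 update rule can be sketched as follows. This is the textbook rule, not necessarily the ANNLVQ or ANNOLVQ variants evaluated in the paper; the two-feature samples, the "low"/"high" rating classes, and the function names are assumptions made for illustration.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_lvq1(samples, labels, prototypes, proto_labels, lr=0.1, epochs=20, seed=0):
    """LVQ1: pull the winning prototype toward a sample of the same class,
    push it away otherwise. The learning rate decays linearly over epochs."""
    rng = random.Random(seed)
    protos = [list(p) for p in prototypes]
    order = list(range(len(samples)))
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            w = min(range(len(protos)), key=lambda j: sq_dist(protos[j], x))
            sign = 1.0 if proto_labels[w] == y else -1.0
            protos[w] = [p + sign * rate * (xi - p) for p, xi in zip(protos[w], x)]
    return protos


def predict(x, protos, proto_labels):
    """Class of the nearest prototype."""
    w = min(range(len(protos)), key=lambda j: sq_dist(protos[j], x))
    return proto_labels[w]


# Synthetic two-feature samples for two hypothetical rating classes.
rng = random.Random(42)
samples = ([(rng.gauss(1, 0.3), rng.gauss(1, 0.3)) for _ in range(30)]
           + [(rng.gauss(4, 0.3), rng.gauss(4, 0.3)) for _ in range(30)])
labels = ["low"] * 30 + ["high"] * 30
proto_labels = ["low", "high"]
protos = train_lvq1(samples, labels, [samples[0], samples[30]], proto_labels)
accuracy = sum(predict(x, protos, proto_labels) == y
               for x, y in zip(samples, labels)) / len(samples)
```

A rating classifier built this way stores only a few prototypes per rating class, so prediction reduces to a nearest-prototype lookup.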
Key-Words / Index Term
Artificial Neural Network, collaborative filtering, Learning Vector Quantization, Book Recommendation, Recommender systems.
Citation
S. Prasanna Priya, M. Karthikeyan, "A Novel Recommender System based on Artificial Neural Network Learning Vector Quantization Classification Approach", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.289-299, 2019.
Web Network Statistics Generator and Analyzer
Survey Paper | Journal Paper
Vol.07 , Issue.04 , pp.300-303, Feb-2019
Abstract
The project “Web Network Statistics Generator and Analyzer” has been developed to provide a better security mechanism for intranet services and reporting. It offers a range of functions and reports that provide extensive information on the complete process of official mail management and file blocking for the different sets of users logged into the system. It also tracks the status of all websites and produces modification guides. The web tracking system is developed using PHP as the front end with SQL Server as the back end. The project consists of several modules: admin, user, registration, web upload, web user monitoring, file transfer watcher, and email blocker. It detects hidden images and verifies links and URL connections. It includes a secured gateway for the entire server system and for transactions performed through process-based actions in web product development. The system is platform independent and can be used by any kind of learner. It is designed to be robust and easy to deploy on any network. It is highly scalable and supports updates that keep the project efficient and usable by any kind of user.
Key-Words / Index Term
web network, statistics, generator, gateway, file blocker
Citation
B. Sivaranjani, M. Priyadharshini, "Web Network Statistics Generator and Analyzer", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.300-303, 2019.
Green Wave Sleep Scheduling Algorithm in Wireless Network
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.304-306, Feb-2019
Abstract
Letting the nodes in a wireless network sleep periodically can save energy, but it also incurs higher latency and lower throughput. We consider the problem of designing optimal sleep schedules in wireless networks and show that finding sleep schedules that minimize the latency over a given subset of source-destination pairs is NP-hard. We also derive a latency lower bound of d + O(1/p) for any sleep schedule with a required active rate p (i.e., the fraction of active slots of each node) and shortest path length d. We propose green-wave sleep scheduling (GWSS), inspired by synchronized traffic lights, for scheduling sleep-wake slots and routing data on duty-cycling wireless ad hoc networks. GWSS is shown to meet the latency lower bound (and hence is latency-optimal) for topologies such as the line, grid, ring, torus and tree networks under light traffic. For high traffic loads, we propose non-interfering GWSS, which achieves the maximum throughput scaling law T(n,p) = Θ(p/√n) bits/sec on a grid network of size n, with a latency scaling law D(n,p) = O(√n) + O(1/p).
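The d + O(1/p) behaviour can be seen in a small simulation of a line network. This is a sketch under stated assumptions rather than the paper's model: time is slotted, each node is awake in exactly one slot per period (so p = 1/period), a hop succeeds whenever the receiver is awake, and the names `latency` and `green_wave` are hypothetical.

```python
import random


def latency(offsets, period, start):
    """Slots needed to carry one packet from node 0 to the last node of a line.

    Node i is awake only in slots t with t % period == offsets[i].
    Assumption of this sketch: a hop in slot t succeeds iff the receiver is awake.
    """
    t = start
    for receiver in range(1, len(offsets)):
        wait = (offsets[receiver] - t) % period  # slots until the receiver wakes
        t += wait + 1  # wait, then spend one slot on the hop
    return t - start


def green_wave(num_nodes, period):
    """Stagger wake-up slots by one per hop, like synchronized traffic lights."""
    return [i % period for i in range(num_nodes)]


period = 10  # active rate p = 1 / period = 0.1
hops = 20    # shortest path length d
wave = green_wave(hops + 1, period)
best = min(latency(wave, period, s) for s in range(period))
worst = max(latency(wave, period, s) for s in range(period))

# Unsynchronized baseline: every node picks its wake-up slot at random.
rng = random.Random(0)
unsynced = [rng.randrange(period) for _ in range(hops + 1)]
baseline = latency(unsynced, period, 0)
```

With the staggered schedule the packet waits only once, at the source, for at most `period - 1` slots and then rides the wave one hop per slot, so the latency lies between d and d + 1/p - 1. With random offsets it can wait again at every hop.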
Key-Words / Index Term
Wireless Network, Sleep Scheduling, Greenwave, High Latency, Low Throughput
Citation
K.R. Devi, K. Harini Priya, "Green Wave Sleep Scheduling Algorithm in Wireless Network", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.304-306, 2019.
Crawling Hidden Objects with KNN Queries
Survey Paper | Journal Paper
Vol.07 , Issue.04 , pp.307-309, Feb-2019
Abstract
Many websites offering Location Based Services (LBS) provide a kNN search interface that returns the top-k nearest-neighbor objects (e.g., nearest restaurants) for a given query location. This paper addresses the problem of efficiently crawling all objects from an LBS website through the public kNN web search interface it provides. Specifically, we develop crawling algorithms for 2D and higher-dimensional spaces, respectively, and demonstrate through theoretical analysis that the overhead of our algorithms can be bounded by a function of the number of dimensions and the number of crawled objects, regardless of the underlying distributions of the objects. We also extend the algorithms to leverage scenarios where certain auxiliary information about the underlying data distribution is available, e.g., the population density of an area, which is often positively correlated with the density of LBS objects. Extensive experiments on real-world datasets demonstrate the superiority of our algorithms over the state-of-the-art competitors in the literature.
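The core observation behind crawling through a kNN interface can be shown in one dimension. This is a simplified sketch, not the paper's 2D or higher-dimensional algorithms: the `HiddenDatabase` class, the `crawl` function, and the integer object locations are assumptions made for illustration.

```python
import random


class HiddenDatabase:
    """Stand-in for an LBS site: only a top-k nearest-neighbour query is exposed."""

    def __init__(self, points, k):
        self._points = list(points)
        self.k = k
        self.queries = 0

    def knn(self, q):
        self.queries += 1
        return sorted(self._points, key=lambda p: abs(p - q))[: self.k]


def crawl(db, lo, hi):
    """Collect every object in [lo, hi] using only db.knn.

    Each answer certifies a covered interval around the query point: any object
    strictly closer than the farthest returned neighbour must be in the answer.
    The uncovered remainders on both sides are crawled in turn.
    Assumes distinct object locations and k >= 2, so the radius is positive.
    """
    found = set()
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        if a > b:
            continue
        q = (a + b) / 2
        result = db.knn(q)
        found.update(p for p in result if lo <= p <= hi)
        if len(result) < db.k:  # fewer than k objects exist in total
            break
        r = max(abs(p - q) for p in result)
        stack.append((a, q - r))
        stack.append((q + r, b))
    return found


# Hypothetical object locations: 60 distinct integers on a line.
rng = random.Random(7)
points = rng.sample(range(1000), 60)
db = HiddenDatabase(points, k=5)
crawled = crawl(db, 0, 999)
```

Dense regions produce small covered intervals and therefore more queries, which is why auxiliary density information of the kind mentioned in the abstract can help choose query points.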
Key-Words / Index Term
Knn, Hidden Objects, Location Based Services, Crawling, Density
References
[1] Mcdonalds, “Mcdonalds page, http://www.mcdonalds.com/,” [Accessed: Aug. 6, 2014]. [Online]. Available: url{http://www.mcdonalds.com/us/en/restaurant locator.html}
[2] S. Byers, J. Freire, and C. T. Silva, “Efficient acquisition of web data through restricted query interfaces,” in Poster Proceedingsof the Tenth International World Wide Web Conference, WWW 10,Hong Kong, China, May 1-5, 2001, 2001. [Online].Available:http://www10.org/cdrom/posters/1051.pdf
[3] W. D. Bae, S. Alkobaisi, S. H. Kim, S. Narayanappa, and C. Shahabi, “Web data retrieval: solving spatial range queries using k-nearest neighbor searches,” Geoinformatica, vol. 13, no. 4, pp. 483–514, 2009.
[4] G. E. Glasses, “Great eye glasses page, http://www.greateyeglasses.com/shop/search.php,” [Ac cessed: Jan. 20, 2014]. [Online]. Available: url{http://www.greateyeglasses.com/shop/search.php}
[5] Yahoo, “Yahoo local page, https://local.yahoo.com/,”[Accessed: Dec. 2012]. [Online]. Available: url{https://local.yahoo.com/}
[6] U. Census, “Us census, http://www.census.gov/cgibin/geo/shapefiles2013/layers.cgi,” [Accessed: Dec. 2013].[Online]. Available: url{http://www.census.gov/cgi-bin/geo/shapefiles2013/layers.cgi}
[7] L. Devroye, “Sample-based non-uniform random variate generation,” in Proceedings of the 18th conference on Winter simulation. ACM, 1986, pp. 260–265.
[8] L. Barbosa and J. Freire, “Siphoning hidden-web data throughkeyword-based interfaces,” in SBBD, 2004, pp. 309–321.
[9] A. Ntoulas, P. Pzerfos, and J. Cho, “Downloading textualhidden web content through keyword queries,” in DigitalLibraries, 2005.JCDL’05.Proceedings of the 5th ACM/IEEE-CSJoint Conference on. IEEE, 2005, pp. 100–109.
[10] K. Vieira, L. Barbosa, J. Freire, and A. Silva, “Siphon++: ahidden-webcrawler for keyword-based interfaces,” in Proceedings of the 17th ACM conference on Information and knowledgemanagement. ACM, 2008, pp. 1361–1362.
[11] L. Jiang, Z. Wu, Q. Feng, J. Liu, and Q. Zheng, “Efficient deep web crawling using reinforcement learning,” in Advances inKnowledge Discovery and Data Mining. Springer, 2010, pp. 428–439.
[12] S. Raghavan and H. Garcia-Molina, “Crawling the hidden web,” in VLDB 2001, Proceedings of 27th InternationalConference on Very Large Data Bases, September 11-14, 2001,Roma, Italy, 2001, pp. 129–138. [Online]. Available: http://www.vldb.org/conf/2001/P129.pdf
[13] S. W. Liddle, D. W. Embley, D. T. Scott, and S. H. Yau, “Extracting data behind web forms,” in ConceptualModeling - ER 2002, 21st International Conference on ConceptualModeling, Tampere, Finland, October 7-11, 2002, Proceedings,
2002, pp. 402–413. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-45275-1 35
[14] P. Wu, J. Wen, H. Liu, and W. Ma, “Query selection techniques for efficient crawling of structured web sources,” in Proceedingsof the 22nd International Conference on Data Engineering, ICDE2006, 3-8 April 2006, Atlanta, GA, USA, 2006, p. 47. [Online]. Available: http://dx.doi.org/10.1109/ICDE.2006.124
[15] M. Álvarez, J. Raposo, A. Pan, F. Cacheda, F. Bellas, and V. Carneiro, “Crawling the content hidden behind web forms,” in Computational Science and Its Applications–ICCSA 2007. Springer, 2007, pp. 322–333.
Citation
P. Krithika, G. Sowmiya, "Crawling Hidden Objects with KNN Queries", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.307-309, 2019.
Predicting Ticket Sales Using Web-Based External Factors and Box-Office Data
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.310-312, Feb-2019
Abstract
Posting online reviews and rating purchased products has become an increasingly popular way for consumers to share information with anonymous readers who are interested in purchasing the same product. In addition, people leave traces of their interests and near-future purchasing plans on the web, such as search history and search query volume. As a result, sales performance can be predicted for many products by mining the data that consumers’ online activities leave on the web. In this paper, we focus on movie ticket sales, where the word-of-mouth effect is prominent, and our goal is to forecast sales performance for the coming weekend using box-office data and external factors such as online reviews, star ratings, and search volume. For this work, we gather 1.7 million online reviews and movie ratings, as well as the daily search volume of movie titles for the past three years. Using machine learning techniques and linear modeling, we develop a model for high-accuracy prediction of near-future ticket sales. We also analyze the relationship between weekend ticket sales performance and box-office data, online reviews, star ratings, and search volume. This work supports decisions on the ideal number of screens for a given weekend, and thus can contribute to a substantial increase in profit rates in movie markets.
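As an illustration of the linear-modeling step mentioned in the abstract, the following is a minimal sketch of ordinary least squares fitted through the normal equations. It is not the authors' model: the feature set (weekday sales, mean star rating, search volume), the numbers, and the names `fit_linear_model`, `predict`, `FEATURES`, and `TRUE_WEIGHTS` are all hypothetical, and the data is synthetic.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_model(rows: Sequence[Sequence[float]],
                     targets: Sequence[float]) -> List[float]:
    """Ordinary least squares; returns [intercept, w1, w2, ...]."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(x, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Apply the fitted linear model to one feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Synthetic, hypothetical data: (weekday sales in thousands,
# mean star rating, search volume in thousands) per movie.
FEATURES = [
    (120.0, 4.1, 30.0),
    (80.0, 3.2, 12.0),
    (200.0, 4.6, 55.0),
    (50.0, 2.9, 8.0),
    (150.0, 3.8, 40.0),
    (95.0, 4.4, 20.0),
]
# Assumed ground-truth weights, used only to generate noise-free targets.
TRUE_WEIGHTS = [5.0, 1.5, 10.0, 0.8]
TARGETS = [predict(TRUE_WEIGHTS, row) for row in FEATURES]

weights = fit_linear_model(FEATURES, TARGETS)
forecast = predict(weights, (100.0, 4.0, 25.0))
```

Because the targets here are generated without noise, the fit recovers the assumed weights; with real review and search data, one would expect noisy targets and would hold out recent weekends to evaluate forecast error.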
Key-Words / Index Term
Box-Office Data, Online Reviews, Star Ratings, Ticket Sales
References
[1] W. Duan, B. Gu, and A. B. Whinston, “Do online reviews matter? An empirical investigation of panel data,” vol. 45, pp. 1007–1016, 2008.
[2] S. Goel, J. M. Hofman, S. Lahaie, D. M. Pennock, and D. J. Watts, “What Can Search Predict?” WWW ’10, 2010.
[3] X. Yu, Y. Liu, X. Huang, and A. An, “Mining Online Reviews for Predicting Sales Performance: A Case Study in the Movie Domain,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 4, pp. 720–734, Apr. 2012. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5677530
[4] G. Kulkarni, P. K. Kannan, and W. Moe, “Using online search data to forecast new product sales,” Decision Support Systems, vol. 52, no. 3, pp. 604–611, 2012. [Online]. Available: http://dx.doi.org/10.1016/j.dss.2011.10.017
[5] L. Zhang, J. Luo, and S. Yang, “Forecasting box office revenue of movies with BP neural network,” Expert Systems with Applications, vol. 36, no. 3, pp. 6580–6587, Apr. 2009. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S095741740800496X
[6] K. J. Lee and W. Chang, “Bayesian belief network for box-office performance: A case study on Korean movies,” Expert Systems with Applications, vol. 36, no. 1, pp. 280–291, Jan. 2009. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0957417407004228
[7] MPAA. 2012 theatrical statistics summary. [Online]. Available: http://www.mpaa.org/resources/3037b7a4-58a2-4109-8012-58fca3abdf1b.pdf
[8] R. Sharda and D. Delen, “Predicting box-office success of motion pictures with neural networks,” Expert Systems with Applications, vol. 30, no. 2, pp. 243–254, Feb. 2006. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0957417405001399
[9] E. V. Karniouchina, “Impact of star and movie buzz on motion picture distribution and box office revenue,” International Journal of Research in Marketing, vol. 28, no. 1, pp. 62–74, Mar. 2011. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0167811610000881
[10] H. Rui, Y. Liu, and A. Whinston, “Whose and what chatter matters? The effect of tweets on movie sales,” Decision Support Systems, vol. 55, no. 4, pp. 863–870, Nov. 2013. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0167923612003880
[11] J. Du, H. Xu, and X. Huang, “Box office prediction based on microblog,” Expert Systems with Applications, vol. 41, no. 4, pp. 1680–1689, Mar. 2014. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0957417413006866
[12] S. Moon, P. K. Bergey, and D. Iacobucci, “Dynamic effects among movie ratings, movie revenues, and viewer satisfaction,” Journal of Marketing, vol. 74, no. 1, pp. 108–121, 2010.
[13] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression machines,” Advances in neural information processing systems, vol. 9, pp. 155–161, 1997.
[14] I. Basheer and M. Hajmeer, “Artificial neural networks: fundamentals, computing, design, and application,” Journal of microbiological methods, vol. 43, no. 1, pp. 3–31, 2000.
Citation
M. Keerthana, J. Rabacca Cinthiya, "Predicting Ticket Sales Using Web-Based External Factors and Box-Office Data", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.310-312, 2019.
Web Broadcasting and Multi User System
Research Paper | Journal Paper
Vol.07 , Issue.04 , pp.313-315, Feb-2019
Abstract
This system has been successfully developed, hosted on different server configurations, and subjected to various tests in a multi-user environment, where any number of users can watch videos with simultaneous access. Its functionality extends beyond web TV to any home entertainment system. It offers quick access to thousands of video files in various categories and delivers high-quality playback for viewers. When multiple video sources are live-encoded and transmitted over a common wireless network, each stream needs to adapt its encoding parameters to wireless channel fluctuations, so as to avoid congesting the network. We present a stochastic system model for analyzing multi-user congestion control for live video coding and streaming over a wireless network. Variations in video content complexities and wireless channel conditions are modeled as independent Markov processes, which jointly determine the bottleneck queue size of each stream. Interaction among multiple users is captured by a simple model of random traffic contention. Using the model, we investigate two distributed congestion control policies: an approach based on stochastic dynamic programming (SDP) and a greedy heuristic. Compared to fixed-quality coding with no congestion control, simulation results under various video and channel characteristics show performance gains of 0.5-1.3 dB in average video quality for both schemes.
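To make the queue dynamics described in the abstract concrete, the following is a minimal single-stream sketch: a two-state Markov channel drains a bottleneck queue, and a greedy policy picks, slot by slot, the highest encoding rate that keeps the queue within a limit. It is not the paper's model. The capacities, transition probabilities, rate set, and all function names are assumptions chosen for illustration, and the content-complexity process, multi-user contention, and the SDP policy are omitted.

```python
import random

# All values below are hypothetical, chosen only for illustration.
CAPACITY = {"good": 10, "bad": 4}       # packets drained per slot
STAY_PROB = {"good": 0.9, "bad": 0.7}   # probability of staying in a state
RATES = (4, 6, 8, 10)                   # candidate encoding rates (packets/slot)


def next_state(state, rng):
    """Advance the two-state Markov channel by one slot."""
    if rng.random() < STAY_PROB[state]:
        return state
    return "bad" if state == "good" else "good"


def greedy_rate(queue, capacity, queue_limit):
    """Pick the highest rate that keeps the queue within queue_limit."""
    feasible = [r for r in RATES
                if max(queue + r - capacity, 0) <= queue_limit]
    return max(feasible) if feasible else min(RATES)


def fixed_rate(queue, capacity, queue_limit):
    """Fixed-quality baseline: always encode at the top rate."""
    return max(RATES)


def simulate(policy, slots=1000, queue_limit=5, seed=0):
    """Return (average encoding rate, peak queue size) for a policy."""
    rng = random.Random(seed)
    state, queue = "good", 0
    total_rate, peak_queue = 0, 0
    for _ in range(slots):
        capacity = CAPACITY[state]
        rate = policy(queue, capacity, queue_limit)
        queue = max(queue + rate - capacity, 0)  # bottleneck queue update
        total_rate += rate
        peak_queue = max(peak_queue, queue)
        state = next_state(state, rng)
    return total_rate / slots, peak_queue


greedy_avg, greedy_peak = simulate(greedy_rate)
fixed_avg, fixed_peak = simulate(fixed_rate)
```

In this toy setting the greedy policy trades some encoding rate for a bounded queue, while the fixed-rate baseline lets the queue grow whenever the channel is in the bad state. The abstract's reported quality gains come from the authors' simulations and are not reproduced by this sketch.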
Key-Words / Index Term
Broadcasting, Multiuser, Congestion Control, Fixed Quality, Channel
References
[1] Tutun Juhana, “On the design of FM broadcasting remote monitoring system,” Proceeding of 2015 1st International Conference on Wireless and Telematics, ICWT 2015.
[2] http://www.audemat.com/
[3] http://www.devabroadcast.com/
[4] http://davicom.com
[5] butt - broadcast using this tool, accessed via http://www.danielnoethen.de/
[6] Icecast, accessed via http://icecast.org/
Citation
A. Sangeetha, M. Tamilarasi, "Web Broadcasting and Multi User System", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.313-315, 2019.
Appraisal Evaluation Tool for College Staff
Survey Paper | Journal Paper
Vol.07 , Issue.04 , pp.316-319, Feb-2019
Abstract
An appraisal is an expert estimate of the value of something. Staff appraisal can be tricky: if it is done poorly, it negatively affects the staff. This intranet web application provides a feasible way to boost the productivity of each staff member, regardless of their current achievement level. An effective staff appraisal system ought to be validated by the students of the department. This aspect of the performance appraisal process begins anew unless the employee is scheduled to participate in the Supervisory Performance Appraisal process, the process is warranted as a result of the Performance Growth Plan, or the staff member’s job description has changed significantly. The evaluation cycle gives the employer the opportunity to assess and evaluate the performance of the teacher against the district-adopted teacher performance evaluation criteria. Throughout the evaluation cycle, strengths and areas of growth are identified and communicated to staff. The assessment tool contains criteria, such as specific behaviours, knowledge, and presentation skills, that pertain to all staff. It is assumed that all staff of the college are professionals and, as such, will perform their duties with integrity and maintain a positive, vigilant attitude toward students’ physical safety and emotional well-being.
Key-Words / Index Term
Appraisal System, Supervisory Performance, SVM, College Staff, Evaluation Tool
References
[1] H. Vogt, “On the mechanism of the anode effect in aluminium electrolysis,” Metallurgical & Materials Transactions B, vol. 31, pp 1225–1230, December 2000.
[2] K. Zhou, Z. Lin, D. Yu, B. Cao, Z. Wang, et al., “Cell resistance slope combined with LVQ neural network for prediction of anode effect,” Sixth International Conference on Intelligent Control and Information Processing (ICICIP), pp 47-51, November 2015.
[3] N.A.A. Majid, M.P. Taylor, J.J.J. Chen and B.R. Young, “Multivariate statistical monitoring of the aluminium smelting process,” Computers & Chemical Engineering, vol. 35, pp 2457-2468, November 2011.
[4] S. Zeng, L. Ding, “Frequency characteristic and its application of the cell resistance in aluminum electrolysis,” Seventh International Conference on Computer Science & Education (ICCSE), pp 165-168, July 2012.
[5] J. Li, H. Wu, J. Pian, “The application of the equipment fault diagnosis based on modified Elman neural network,” International Conference on Electronic and Mechanical Engineering and Information Technology, vol. 8, pp 4135-4137, August 2011.
[6] J. Li, F. Qiao, T. Guo, “Neural network fault prediction and its application,” 8th World Congress on Intelligent Control and Automation (WCICA), pp 740-743, July 2010.
[7] A. Meghlaoui, J. Thibault, R.T. Buia, L. Tikasza, R. Santerre, “Neural networks for the identification of the aluminium electrolysis process,” Computers & Chemical Engineering, vol. 22, pp 1419-1428, September 1998.
[8] K. Zhou, D. Yu, Z. Lin, S. Guo, “Anode effect prediction of aluminium electrolysis using GRNN,” Chinese Automation Congress (CAC), pp 853–858, November 2015.
[9] J. Yi, D. Huang, S. Fu, H. He, T. Li, “Optimized Relative Transformation Matrix Using Bacterial Foraging Algorithm for Process Fault Detection,” IEEE Transactions on Industrial Electronics, vol. 63, pp 2595-2605, 2016.
[10] N.A.A. Majid, M.P. Taylor, J.J.J. Chen, Y. Wei, B.R. Young, “Diagnosing faults in aluminium processing by using multivariate statistical approaches,” Journal of Materials Science, vol. 47, pp 1268-1279, February 2012.
[11] N.A.A. Majid, M.P. Taylor, J.J.J. Chen, M.A. Stam, A. Mulder, B.R. Young, “Aluminium process fault detection by Multiway Principal Component Analysis,” Control Engineering Practice, vol. 19, pp 367-379, April 2011.
[12] L. Qu, H. Zhou, “The Multi-class SVM Is Applied in Transformer Fault Diagnosis,” 14th International Symposium on Distributed Computing and Applications for Business Engineering and Science, pp 477-480, 2015.
[13] C. Zanchettin, B.L.D. Bezerra, W.W. Azevedo, “A KNN-SVM hybrid model for cursive handwriting recognition,” International Joint Conference on Neural Networks (IJCNN), pp 1-8, June 2012.
[14] S. Shang, M. Shi, W. Shang, Z. Hong, “Improved Feature Weight Algorithm and Its Application to Text Classification,” Mathematical Problems in Engineering, March 2016.
[15] D. You, X. Gao, S. Katayama, “WPD-PCA-based laser welding process monitoring and defects diagnosis by using FNN and SVM,” IEEE Transactions on Industrial Electronics, vol. 62, pp 628-636, January 2015.
[16] J. Li, Q. Zhang, K. Wang, J. Wang, T. Zhou, et al., “Optimal dissolved gas ratios selected by genetic algorithm for power transformer fault diagnosis based on support vector machine,” IEEE Transactions on Dielectrics & Electrical Insulation, vol. 23, pp 1198-1206, 2016.
[17] L. Ren, W. Lv, S.W. Jiang, Y. Xiao, “Fault Diagnosis Using a Joint Model Based on Sparse Representation and SVM,” IEEE Transactions on Instrumentation & Measurement, vol. 65, pp 2313-2320, October 2016.
Citation
N. Brindha, P. Dhivya, "Appraisal Evaluation Tool for College Staff", International Journal of Computer Sciences and Engineering, Vol.07, Issue.04, pp.316-319, 2019.