Handling of Class Imbalanced Problem in Big Data Sets: An Experimental Evaluation (UCPMOT)
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.1-9, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.19
Abstract
The rapid growth of NoSQL data has created a new context for data processing. A new generation of data handling technologies with massive resources helps to store and process these very large data sets. Current attention focuses on uncovering hidden information by assimilating this bulk data and handling it according to use; the data are then pre-processed and converted for the required analysis. The volume and variety of these data sets keep rising relentlessly. Moreover, class imbalance in many real-world large data sets has become a point of concern in the research domain. The skewed distribution of classes makes learning difficult for traditional classifiers, which tend to favor the majority classes. In recent years, numerous solutions have been proposed to address imbalanced classification. However, they fail to address data characteristics such as overlapping and redundancy that affect classification performance. A rational over_sampling technique, the Updated Class Purity Maximization Over_Sampling Technique (UCPMOT), which uses Safe-Level based synthetic sample creation, is proposed to handle imbalanced data sets efficiently. The newly suggested Lowest versus Highest method addresses the handling of multi-class data sets. Data sets from the UCI repository are processed using MapReduce-based programming on the Hadoop framework. The evaluation parameters F-measure and AUC are used to validate the performance of the proposed technique against benchmark techniques. The results indicate that the proposed technique outperforms the benchmark techniques.
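The abstract does not spell out how UCPMOT builds its synthetic samples, so the following is only a minimal sketch of the general idea behind synthetic over_sampling: new minority points are created by interpolating between an existing minority sample and one of its nearest minority neighbours, in the style of SMOTE. It is not the UCPMOT or Safe-Level procedure. The function names, the toy data, and the parameters `k` and `seed` are assumptions made for illustration.

```python
import random

def nearest_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (squared Euclidean distance)."""
    def dist2(q):
        return sum((a - b) ** 2 for a, b in zip(point, q))
    return sorted(others, key=dist2)[:k]

def oversample_minority(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Illustrative only: no safe-level or class-purity check is applied."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Hypothetical two-dimensional minority class with four samples.
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.8), (1.2, 2.0)]
new_points = oversample_minority(minority, n_new=6)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the region already spanned by the minority class. Safe-level variants additionally weight the interpolation by how many minority neighbours surround each point, which this sketch omits.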
Key-Words / Index Term
Imbalanced datasets, Big Data, Over_sampling techniques, Multi-class, Safe-Level based Synthetic Samples
Citation
S.S. Patil, S. P. Sonavane, "Handling of Class Imbalanced Problem in Big Data Sets: An Experimental Evaluation (UCPMOT)", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.1-9, 2018.
Exploiting Social Relations for Efficient Routing in Delay Tolerant Network Environment
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.10-18, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.1018
Abstract
A delay tolerant network (DTN) is a subclass of mobile ad hoc network (MANET) in which instantaneous end-to-end connectivity between source and destination nodes is not available. Nodes in a DTN are sparsely distributed. Frequent disconnections, along with limited resources, make routing in a DTN more challenging. This paper proposes two routing protocols. The first is Buddy Router with Time Window, which exploits social relations to maximize delivery probability. The second is Buddy Router with Replication, which uses a controlled replication approach along with a social metric for message forwarding. A detailed formulation of the proposed work is presented, along with a simulation-based comparative analysis. The paper also presents the impact of buffer size and TTL variation on the performance of different routing protocols.
Key-Words / Index Term
Delay Tolerant Network (DTN), Routing, Opportunistic Routing and Pocket Switched Networks (PSN)
Citation
Ajit S. Patil, Prakash J. Kulkarni, "Exploiting Social Relations for Efficient Routing in Delay Tolerant Network Environment", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.10-18, 2018.
Factored Language Modeling
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.19-25, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.1925
Abstract
Language modeling is a technique for finding the most probable next word in a sentence. It is the first and an essential task for the successful implementation of natural language processing applications such as machine translation and speech recognition, where it ensures the correctness and fluency of the target output. The n-gram model is the traditional way to implement a language model: only the previous words in the sentence are used to predict the probable next word. Factored language modeling is a method that uses linguistic knowledge about a word, along with the word itself, to construct the language model. The paper describes the factored language modeling technique and compares its results against the traditional n-gram technique using perplexity as the measure.
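To make the n-gram baseline and the perplexity measure concrete, here is a minimal sketch of a bigram model with add-one smoothing. It illustrates only the traditional n-gram side of the comparison; it does not implement factored language models or backoff. The toy corpus, the function names, and the choice of add-one smoothing are assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous word, word) pairs
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return contexts, bigrams, vocab

def bigram_prob(prev, word, model):
    """P(word | prev) with add-one (Laplace) smoothing."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def perplexity(sentence, model):
    """exp of the average negative log-probability per predicted token."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(tokens[:-1], tokens[1:]))
    log_prob = sum(math.log(bigram_prob(p, w, model)) for p, w in pairs)
    return math.exp(-log_prob / len(pairs))

# Hypothetical three-sentence training corpus.
model = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
seen = perplexity("the cat sat", model)
scrambled = perplexity("sat cat the", model)
print(seen, scrambled)
```

A lower perplexity means the model finds the sentence less surprising, so the sentence in natural word order scores lower than the scrambled one. A factored model would condition each prediction on additional factors of the preceding words, such as part-of-speech tags or stems, rather than on the surface word alone.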
Key-Words / Index Term
Language model, Perplexity, Factored language model, Backoff.
Citation
A.R. Babhulgaonkar, S.P. Sonavane, "Factored Language Modeling", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.19-25, 2018.
Outdoor Natural Scene Object Classification Using Probabilistic Neural Network
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.26-31, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.2631
Abstract
Region labeling of outdoor scenes to identify sky, green land, water, snow, and similar regions facilitates content-based image retrieval systems. This paper presents the use of multiple features to classify objects in outdoor natural scene images. The proposed system aims to classify images of sky, water, and green land. Because all of these natural components are irregular in shape, they can be classified using color and texture features. Color features of the object are extracted by segmentation in the L*a*b* color space. To calculate texture features, the image is first divided into smaller grids, and global GLCM-based statistical texture features are calculated from the statistical features of these local grids. Results show that color and statistical texture features are not sufficient to differentiate sky from water bodies. To discriminate between these two objects, a new edge-based horizontal line texture feature is proposed, which differentiates sky and water objects based on the density of horizontal lines. All of these features are used together to train a probabilistic neural network for classification. The system achieves an improvement of 5% to 8% in F-measure when all of these features are used together to classify natural scene objects.
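As a rough illustration of what GLCM-based statistical texture features look like, the sketch below builds a normalized gray-level co-occurrence matrix for one pixel offset and derives three common statistics from it. The abstract does not say which statistics the paper uses or how the grid-level values are combined, so the choice of contrast, energy, and homogeneity, the function names, and the tiny test images are all assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy).
    `image` is a list of rows of integer gray levels in range(levels)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Contrast, energy, and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "energy": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in cells),
    }

# Hypothetical 4x4 grids with 4 gray levels: a uniform patch and a striped patch.
flat = [[1, 1, 1, 1] for _ in range(4)]
stripes = [[0, 3, 0, 3] for _ in range(4)]
flat_features = texture_features(glcm(flat, levels=4))
stripe_features = texture_features(glcm(stripes, levels=4))
print(flat_features, stripe_features)
```

The uniform patch has zero contrast and maximal energy, while the striped patch has high contrast, which is the kind of separation such features provide between smooth and textured regions. In a grid-based scheme like the one described, these values would be computed per grid cell and then summarized over the whole image.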
Key-Words / Index Term
Color feature, Statistical texture features,Horizontal line texture feature, Image classification, PNN
Citation
C.A. Laulkar, P.J. Kulkarni, "Outdoor Natural Scene Object Classification Using Probabilistic Neural Network", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.26-31, 2018.
An Encrypted Neural Network Learning to Build Safe Trained Model
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.32-36, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.3236
Abstract
Neural network learning is a technique used to solve problems of classification, prediction, clustering, and modelling based on a variety of data inputs in structured, semi-structured, and unstructured form. Learning accuracy is considered the key performance index of neural network based learning algorithms. Many organizations that handle huge amounts of data want to outsource that data to the cloud for artificial intelligence based services. Organizations that wish to train a neural network model on their large and complex data usually outsource the learning model to the cloud as well. Outsourcing the learning model to the cloud creates security concerns for both the input data and the learned model. In this paper, we propose a practical system that trains a neural network model that is encrypted during the training process. The training is performed on unencrypted data. The output of the system is a neural network model that has two properties. First, the model is protected from malicious users, which allows users to train it in insecure environments without added risk. Second, the model can make only encrypted predictions. We use homomorphic encryption techniques to fulfill these objectives and test our results on a sentiment analysis dataset.
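The abstract does not name the encryption scheme beyond "homomorphic encryption techniques", so the sketch below swaps in a toy version of the Paillier cryptosystem purely to illustrate the underlying property: encrypted weights can be combined with unencrypted inputs to give an encrypted weighted sum, which only the key holder can read. It is not the paper's method. The tiny primes, the function names, and the integer weights are assumptions, and the key size is far too small to be secure.

```python
import math
import random

def keygen(p=17, q=19):
    """Toy Paillier key pair from two small primes (insecure, for illustration)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                               # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt an integer 0 <= m < n using generator g = n + 1."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

rng = random.Random(0)
n, private_key = keygen()

# Hypothetical integer model weights (kept encrypted) and plaintext inputs.
weights = [3, 5, 2]
inputs = [4, 1, 6]
enc_weights = [encrypt(n, w, rng) for w in weights]

# Enc(w)^x encrypts w * x, and multiplying ciphertexts adds their plaintexts.
enc_sum = 1
for c, x in zip(enc_weights, inputs):
    enc_sum = (enc_sum * pow(c, x, n * n)) % (n * n)

weighted_sum = decrypt(n, private_key, enc_sum)
print(weighted_sum)
```

The party computing `enc_sum` never sees the weights or the result in the clear, which mirrors the two properties described above: the model stays protected, and its predictions come out encrypted. Results are only correct while the true sum stays below `n`, so real schemes use much larger moduli and an encoding for negative and fractional values.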
Key-Words / Index Term
Homomorphic encryption, neural network
Citation
S. S. Sayyad, D. B. Kulkarni, "An Encrypted Neural Network Learning to Build Safe Trained Model", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.32-36, 2018.
On Applying Document Similarity Measures for Template based Clustering of Web Documents
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.37-42, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.3742
Abstract
The World Wide Web is a useful and easy way to access information on the Internet. To reduce content generation and publishing time, templates are used to populate the contents of web documents. Templates give readers easy access to web document contents through their layout and structure. For search engines, however, the irrelevant terms in templates degrade accuracy and performance. Templates are also used by the wrapper induction tools that information extractors rely on to extract and integrate information from various e-commerce sites. Template handling has therefore received a lot of attention as a way to improve search engine performance and content integration. In this paper we discuss how heterogeneous web documents, i.e. web documents generated from different templates, can be clustered. We apply document similarity measures to cluster the heterogeneous web documents generated from templates. Our experimental results on real data sets show that the cosine similarity measure is more suitable for template-based clustering of heterogeneous web documents.
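As a rough illustration of the two similarity measures named in the keywords, the sketch below computes cosine and Jaccard similarity over token lists. The token lists are hypothetical tag paths standing in for page structure; they are an assumption for illustration and not the feature representation used in the paper.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the term-frequency vectors of two token lists."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical pages: page_1 and page_2 share a template, page_3 does not.
page_1 = ["html", "body", "div.header", "div.product", "span.price"]
page_2 = ["html", "body", "div.header", "div.product", "span.price", "div.reviews"]
page_3 = ["html", "body", "table.news", "td.headline"]

for name, other in (("page_2", page_2), ("page_3", page_3)):
    print(name,
          round(cosine_similarity(page_1, other), 3),
          round(jaccard_similarity(page_1, other), 3))
```

A pairwise similarity matrix built this way could then be passed to an agglomerative hierarchical clustering routine, which is the clustering family listed in the keywords.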
Key-Words / Index Term
Template, Clustering, Cosine, Jaccard, Agglomerative Hierarchical Clustering
References
[1] Bar-Yossef, Z., Rajagopalan, S,“Template detection via data mining and its applications”,WWW ’02: Proceedings of the 11th International Conference on World Wide Web, New York, NY, USA, ACM Press 580–591, 2002.
[2] Lin, S.H., Ho, J.M,“Discovering informative content blocks from web documents”, KDD ’02: Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press 588–593, 2002.
[3] Debnath, S., Mitra, P., Giles, C.L,”Automatic extraction of informative blocks from webpages”, SAC ’05: Proceedings of the 2005 ACM Symposium on Applied Computing, New York, NY, USA, ACM Press 1722–1726,2005.
[4] Yi, L., Liu, B., Li, X,”Eliminating noisy information in web pages for data mining”, KDD ’03: Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press 296–305, 2003
[5] Reis, D.C., Golgher, P.B., Silva, A.S., Laender, A.F, “Automatic web news extraction using tree edit distance”, WWW ’04: Proceedings of the 13th International Conference on World Wide Web, New York, NY, USA, ACM Press 502–511, 2004
[6] Gibson, D., Punera, K., Tomkins, A, “The volume and evolution of web page templates”, WWW ’05: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, New York, NY, USA, ACM Press, 830–839, 2005
[7] Cruz, I.F., Borisov, S., Marks, M.A., Webbs, T.R, “Measuring structural similarity among web documents: preliminary results”, EP ’98: Proceedings of the 7th International Conference on Electronic Publishing, Artistic Imaging, and Digital Typography, 513–524, 1998
[8] Buttler, D, “A short survey of document structure similarity algorithms”, IC ’04: Proceedings of the International Conference on Internet Computing, CSREA Press 3–9, 2004
[9] Broder, A.Z., Glassman, S.C., Manasse, M.S., Zweig, G, “Syntactic clustering of the web”, Computer Networks 29(8-13) 1157–1166, 1997
[10] A. Arasu and H. Garcia-Molina,“Extracting Structured Data from Web Pages”, Proc. ACM SIGMOD, 2003.
[11] M. de Castro Reis, P.B. Golgher, A.S. da Silva, and A.H.F. Laender,“Automatic Web News Extraction Using Tree Edit Distance”, Proc. 13th Int’l Conf. World Wide Web (WWW), 2004.
[12] M.N. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri, and K. Shim,“Xtract: A System for Extracting Document Type Descriptors from Xml Documents”, Proc. ACM SIGMOD, 2000.
[13] Y. Zhai and B. Liu,“Web Data Extraction Based on Partial Tree Alignment”, Proc. 14th Int’l Conf. World Wide Web (WWW), 2005.
[14] V. Crescenzi, G. Mecca, and P. Merialdo,“Roadrunner: Towards Automatic Data Extraction from Large Web Sites”, Proc. 27th Int’l Conf. Very Large Data Bases (VLDB), 2001.
[15] K. Vieira, A.S. da Silva, N. Pinto, E.S. de Moura, J.M.B. Cavalcanti, and J. Freire, “A Fast and Robust Method for Web Page Template Detection and Removal”, Proc. 15th ACM Int’l Conf. Information and Knowledge Management (CIKM), 2006.
[16] S. Zheng, D. Wu, R. Song, and J.-R. Wen, “Joint Optimization of Wrapper Generation and Template Detection”, Proc. ACM SIGKDD, 2007.
[17] Chulyun Kim and Kyuseok Shim,”TEXT: Automatic Template Extraction from Heterogeneous Web Pages”,IEEE Transaction on Knowledge and Data Engineering, 2011
Citation
T.I. Bagban, P. J. Kulkarni, "On Applying Document Similarity Measures for Template based Clustering of Web Documents", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.37-42, 2018.
Construction of Basis Matrices for (k, n) and Progressive Visual Cryptography Schemes
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.43-47, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.4347
Abstract
Security of digital information plays an important role in preserving the integrity of the original media. A secret is something kept from the knowledge of everyone except those privileged to access it. A secret sharing scheme provides a mechanism for sharing secrets securely among different users, where each user receives a part of the encoded secret information, called a share. A sufficient number of shares must be combined to reconstruct the secret information. Text, images, audio and video can all be used to share secret information in a secret sharing scheme. A secret sharing scheme in which the secret information is encoded in the form of concealed images is called Visual Cryptography. There are various Visual Cryptography Schemes, and the functionality of each scheme depends on its basis matrices. This paper elaborates the construction of basis matrices for various OR-based and XOR-based Visual Cryptography Schemes.
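To make the role of basis matrices concrete, here is a minimal sketch of the classic (2, 2) scheme of Naor and Shamir (reference [3]), which is a standard textbook case and not necessarily one of the constructions elaborated in the paper. Rows are shares, columns are subpixels, and 1 denotes a black subpixel; the function names are illustrative assumptions.

```python
import random

# Basis matrices of the (2, 2) scheme: one row per share, one column per subpixel.
S0 = [[1, 0], [1, 0]]  # white secret pixel: both shares receive the same pattern
S1 = [[1, 0], [0, 1]]  # black secret pixel: shares receive complementary patterns


def encode_pixel(secret_bit, rng):
    """Pick the basis matrix for the secret bit and apply a random column permutation."""
    basis = S1 if secret_bit else S0
    cols = list(range(len(basis[0])))
    rng.shuffle(cols)
    return [[row[c] for c in cols] for row in basis]


def stack_or(shares):
    """OR-based reconstruction: a subpixel is black if it is black on any share."""
    return [max(col) for col in zip(*shares)]


def stack_xor(shares):
    """XOR-based reconstruction: a subpixel is black if an odd number of shares are black."""
    return [sum(col) % 2 for col in zip(*shares)]


rng = random.Random(0)
for bit in (0, 1):
    shares = encode_pixel(bit, rng)
    print(bit, shares, stack_or(shares), stack_xor(shares))
```

In this sketch each share on its own always has exactly one black subpixel, so a single share reveals nothing about the secret bit. OR stacking yields one black subpixel for white and two for black, while XOR stacking yields zero and two, which is why XOR-based schemes can give better contrast.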
Key-Words / Index Term
Secret sharing scheme, Visual Cryptography, Data hiding
References
[1] Shamir, A. 1979. How to Share a Secret. Communications of the ACM. 22: 612-613.
[2] Blakley, G. R. 1979. Safeguarding Cryptographic Keys. Proceedings of the National Computer Conference, American Federation of Information Processing Societies Proceedings. 48: 313-317.
[3] Moni Naor and Adi Shamir, “Visual cryptography”. In Proceedings of Advances in Cryptology, EUROCRYPT 94, Lecture Notes in Computer Science, 1995, (950):pp. 1-12.
[4] S. J. Shyu, S. Y. Huang, Y. K. Lee, R. Z. Wang, and K. Chen, “Sharing multiple secrets in visual cryptography”, Pattern Recognition, Vol. 40, Issue 12, pp. 3633 - 3651, 2007.
[5] Nakajima, M. and Yamaguchi, Y., “Extended visual cryptography for natural images” Journal of WSCG. v10 i2. 303-310.
[6] Jin, W. Q. Yan, and M. S. Kankanhalli, “Progressive color visual cryptography,” J. Electron. Imag., vol. 14, no. 3, pp. 1–13, 2005.
[7] Pim Tuyls, Henk D. L. Hollmann, Jack H. van Lint, and Ludo M. G. M. Tolhuizen. XOR-based visual cryptography schemes. Designs, Codes and Cryptography, 37(1):169–186, 2005
[8] C.-N. Yang and D.-S. Wang, “Property analysis of XOR-based visual cryptography,” IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 2, pp. 189–197, Feb. 2014.
[9] X. Wu and W. Sun, “Extended capabilities for XOR-based visual cryptography,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 10, pp. 1592–1605, Oct. 2014.
[10] E. Verheul and H. V. Tilborg, “Constructions And Properties Of K Out Of N Visual Secret Sharing Schemes”, Designs, Codes and Cryptography, 11(2), pp. 179–196, 1997.
[11] G. Ateniese, C. Blundo, A. DeSantis, and D. R. Stinson, “Visual cryptography for general access structures”, Proc. ICAL96, Springer, Berlin,1996,pp.416-428.
[12] Jin, W. Q. Yan, and M. S. Kankanhalli, “Progressive color visual cryptography,” J. Electron. Imag., vol. 14, no. 3, pp. 1–13, 2005.
Citation
S.B. Bhagate, P.J. Kulkarni, "Construction of Basis Matrices for (k, n) and Progressive Visual Cryptography Schemes", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.43-47, 2018.
Improved Genetic Particle Swarm Optimization and Feature Subset Selection for Extreme Learning Machine
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.48-54, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.4854
Abstract
Particle Swarm Optimization (PSO) is a heuristic global optimization method that is commonly used for the feature subset selection problem. However, PSO requires a fixed number of optimal features as an input, and determining in advance how many features in a given dataset are relevant and non-redundant is a critical task. To solve this problem, this paper proposes the Improved Genetic PSO (IG-PSO) algorithm for Extreme Learning Machine (ELM), which returns the optimal features as well as the optimal number of features. The IG-PSO algorithm is evaluated on six benchmark datasets for medical dataset classification, where it improves classification accuracy by using the optimal features. The simulation results also demonstrate that the IG-PSO algorithm can handle optimization, dimensionality reduction and supervised binary classification problems.
Key-Words / Index Term
Feature Subset Selection Problem, Pattern Classification Problem, Extreme Learning Machine, Particle Swarm Optimization
References
[1] L. Yu and H. Liu, “Efficient feature selection via analysis of relevance and redundancy”, Journal of machine learning research, pp. 1205-1224, 2014.
[2] Kittler, J., “Feature Set Search Algorithms”, Pattern Recognition and Signal Processing, Alphen aan den Rijn, Netherlands, pp. 41-60, 1978.
[3] D. Koller and M. Sahami, Toward optimal feature selection, Tech.rep. Stanford InfoLab, 1996.
[4] Zhi-Hui Zhan, Jun Zhang, Yun Li and Henry Shu-Hung Chung, “ Adaptive Particle Swarm Optimization”, IEEE Trans. On Systems, Man, and Cybernetics- Part B, vol. 39,no. 6, December 2009.
[5] Iftikhar Ahmad,”Feature Selection Using Particle Swarm Optimization in Intrusion Detection”,International Journal of Distributed Sensor Network, January 2015.
[6] G.-B. Huang, Q.-Y. Zhu and C.-K. Siew, “Extreme learning machine: a new learning scheme of feedforward neural networks”, In proceedings. IEEE International Joint Conference., vol. 2., pp. 985-990, 2004.
[7] G.-B. Huang, Q.-Y. Zhu and C.-K. Siew, “Extreme learning machine: theory and applications”, Neurocomputing 70 (1), pp. 489-501, 2006.
[9] G.-B. Huang, L. Chen, and C.-K. Siew, “Universal approximation using incremental constructive feedforward networks with random hidden node”, IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.
[10] G.-B. Huang, H. Zhou, X. Ding and R. Zhang, “Extreme learning machine for regression and multiclass classification”, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42 (2), pp. 513-529, 2012.
[11] G.-B. Huang, “An insight into extreme learning machines: Random neurons, random features and kernels”, Cognit. Computat, vol. 6, no. 3, pp. 376-390, 2014.
[12] R. Eberhart and J. Kennedy, “New optimizer using particle swarm theory,” In Proceedings International Symposium on Micro Machine and Human Science, pp. 39–43,October 1995.
[13] B. Xue, M. Zhang, and W. N. Browne, “Particle swarm optimization for feature selection in classification: a multi-objective approach,” IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 1656–1671, 2013.
[14] B. Xue, M. Zhang, and W. N. Browne, “Particle swarm optimisation for feature selection in classification: novel initialisation and updating mechanisms”, Applied Soft Computing Journal, vol. 18, pp. 261–276, 2014.
[15] M. Lichman, UCI machine learning repository. URL http://archive.ics.uci.edu/ml, 2013.
[16] Akusok, A., Björk, K.-M., Miche, Y., Lendasse, A., “High-performance extreme learning machines: a complete toolbox for big data applications”. IEEE Access 3, 1011–1025, 2015.
[17] G. Karakaya, S. Galelli, S. D. Ahipaşaoğlu and R. Taormina, “Identifying (quasi) equally informative subsets in feature selection problems for classification: a max-relevance min-redundancy approach”, IEEE Transactions on Cybernetics 46 (6), pp. 1424-1437, 2016.
[18] Nahato, K. B., Nehemiah, K. H., Kannan, A, “ Hybrid approach using fuzzy sets and extreme learning machine for classifying clinical datasets”, Elsevier Journal of Informatics in Medicine Unlocked 2, 1–11, 2016.
[19] Han, J., Pei, J., Kamber, M., 2011. Data mining: concepts and techniques. Elsevier.
[20] Mahdiyah, U., Irawan, M. I., Imah, E. M., “Integrating data selection and extreme learning machine for imbalanced data”. Procedia Computer Science 59, 221–229, 2015.
[21] Parikh, R., Mathai, A., Parikh, S., Sekhar, G. C., Thomas, R., Understanding and using sensitivity, specificity and predictive values. Indian journal of ophthalmology 56 (1), 45, 2008.
[22] Archana Kale and Shefali Sonavane, “Optimal Feature Subset Selection for Fuzzy Extreme Learning Machine using Genetic Algorithm with Multilevel Parameter Optimization”, IEEE International Conference on Signal and Image Processing Applications, pp. 445-450, September 2017.
[23] A. Kale and S. Sonavane, “Hybrid Feature Subset Selection Approach for Fuzzy-Extreme Learning Machine”, Springer journal of Computational Intelligence and Complexity - Data Enabled and Discovery Applications, September 2017.
[24] D. Çalişir, E. Doğantekin, “An automatic diabetes diagnosis system based on LDA-wavelet support vector machine classifier”, Expert Syst. Appl. 38(7), 8311–8315, 2011.
[25] H. Temurtas, N. Yumusak, F. Temurtas, “A comparative study on diabetes disease diagnosis using neural networks”. Expert Syst. Appl. 36(4), 8610–8615, 2009.
[26] C.V. Subbulakshmi, S.N. Deepa, “Medical dataset classification: a machine learning paradigm integrating particle swarm optimization with extreme learning machine classifier”, Scientific World Journal 2015.
[27] F.J. Martínez-Estudillo, C. Hervás-Martínez, P.A. Gutiérrez, A.C. Martínez-Estudillo, “Evolutionary product-unit neural networks classifiers”. Neurocomputing 72(1), 548–561, 2008.
[27] C. Hervás-Martínez, F.J. Martínez-Estudillo, M. Carbonero-Ruz, “Multilogistic regression by means of evolutionary product-unit neural networks”. Neural Netw. 21(7), 951–961, 2008.
Citation
A.P. Kale (IEEE and IEICE Student Member), S.P. Sonavane, "Improved Genetic Particle Swarm Optimization and Feature Subset Selection for Extreme Learning Machine", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.48-54, 2018.
Architecture for Personalized Meta Search Engine
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.55-59, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.5559
Abstract
Information available on the web is growing rapidly. A major problem in web search is that the interaction between users and search engines is limited by factors such as the unknown capabilities of the search engine in use and ill-constructed queries from the user. The user therefore has to submit several queries repeatedly before reaching the pages of most interest. Any search engine can give its best performance when well-constructed and detailed queries are used. In practice, however, users tend to submit short, insufficient or ambiguous queries, which yield unwanted search lists. To return highly relevant results, search engines must be able to profile users’ interests and personalize the search results according to those profiles. This paper discusses the need for and specific requirements of a personalized search engine, its architecture, the prototype model developed and the results obtained. Sample sessions performed on the designed model are also given for a selected user profile.
Key-Words / Index Term
Web Search Engines, Personalized Web Searching, Meta Search Engines
References
[1] K Wai-Ting Leung, D Lee, W Lee, “PMSE: A Personalized Mobile Search Engine”, IEEE Transactions On Knowledge And Data Engineering, Vol. 25, Issue: 4, pp.820-834, April 2013.
[2] S. Prakasha, H. Shashidhar, G.T. Raju, “Structured Intelligent Search Engine for Effective Information Retrieval using Query Clustering Technique and Semantic Web”, International Conference on Contemporary Computing and Informatics (IC3I), pp. 688-695, DOI: 10.1109/IC3I.2014.7019820.
[3] A Annadurai, “Architecture of personalized web search engine using suffix tree clustering”, International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN 2011), pp. 604-608, 2011.
[4] K.W.-T. Leung, W. Ng, and D.L. Lee, “Personalized Concept-Based Clustering of Search Engine Queries,” IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp. 1505-1518, Nov. 2008.
[5] J. Teevan, S.T. Dumais, and E. Horvitz, “Personalizing Search via Automated Analysis of Interests and Activities”, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’05), pp. 449–456, 2005.
[6] Adah, S.; Bufi, C.; Temtanapat, Y., “Integrated Search Engine”, IEEE Knowledge and Data Engineering Exchange Workshop, pp. 140–147, 1997.
[7] O. Zamir, O.Etzioni, “A Dynamic Clustering interface to Web search results,” Computer Networks, Netherlands, Amsterdam, 31(11-16):1361-1374, 1999.
[8] M. Ilic, P. Spalevic, M. Veinovic, “Suffix Tree Clustering – Data mining algorithm”, Twenty-Third International Electrotechnical and Computer Science Conference ERK`2014, Portorož, ISSN 1581-4572, pp. 15-18, September 22-24, 2014.
[9] K A Heller, Z Ghahramani. “Bayesian hierarchical clustering”, Proceedings of the 22nd international conference on Machine learning, pp. 297-304, 2005.
[10] R.E. Ruviaro Christ, E. Talavera, C. Maciel, “Gaussian Hierarchical Bayesian Clustering Algorithm”, ISDA 2007, pp. 133-13.
Citation
N. A. Borkar, S. V. Kulkarni, "Architecture for Personalized Meta Search Engine", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.55-59, 2018.
MPI performance guidelines for scalability
Research Paper | Journal Paper
Vol.06 , Issue.01 , pp.60-65, Feb-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si1.6065
Abstract
MPI (Message Passing Interface) is the most widely used parallel programming paradigm. It is used for application development on small as well as large high-performance computing systems. The MPI standard specifies a range of functions but gives no performance guarantee for implementations. Various implementations from both vendors and research groups are now available, and users expect consistent performance from all implementations on all platforms. In the literature, performance guidelines are defined for MPI communication, I/O functions and derived data types. Using these guidelines as a base, we have defined guidelines for the scalability of MPI communication functions. We have verified these guidelines with a benchmark application on different MPI implementations such as MPICH and Open MPI. The experimental results show that point-to-point communication functions are scalable. This is expected, because only a pair of processes is involved in point-to-point communication; for these functions the guidelines are therefore defined as performance requirements based on the semantics of the functions. All processes are involved in collective communication functions, so defining performance guidelines for collective communication is difficult. In this paper, we define those performance guidelines by considering the amount of data transferred in the function. We also verify the defined guidelines and elaborate the reasons for violations.
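For orientation, the self-consistent guidelines from the literature that this abstract builds on (reference [2]) state that a specialized MPI call should not be slower than an equivalent combination of more general calls, for example that MPI_Allreduce should cost no more than MPI_Reduce followed by MPI_Bcast on the same data. The sketch below checks that one guideline against per-process-count timings. The timing values are hypothetical placeholders rather than measurements from the paper, the 5% tolerance is an assumption, and the paper's own scalability guidelines are not reproduced here.

```python
def violates_guideline(t_specialized, t_emulation, tolerance=0.05):
    """Return True if the specialized call is slower than its emulation beyond the tolerance."""
    return t_specialized > t_emulation * (1.0 + tolerance)


# Hypothetical placeholder timings in microseconds, keyed by number of processes.
timings = {
    4: {"allreduce": 12.0, "reduce": 8.0, "bcast": 6.0},
    16: {"allreduce": 30.0, "reduce": 11.0, "bcast": 9.0},
}

# Guideline checked: MPI_Allreduce(n) should not exceed MPI_Reduce(n) + MPI_Bcast(n).
violations = [
    procs
    for procs, t in sorted(timings.items())
    if violates_guideline(t["allreduce"], t["reduce"] + t["bcast"])
]
print("process counts violating the guideline:", violations)
```

Running the same check over increasing process counts is one simple way to see where a collective stops scaling as expected on a given implementation.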
Key-Words / Index Term
Performance guidelines for MPI functions, Scalability of MPI functions, High-performance computing
References
[1] A. Mallón, Guillermo L. Taboada, Carlos Teijeiro, Juan Touriño, Basilio B. Fraguela, Andrés Gómez, Ramón Doallo, J. Carlos Mouriño, “Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures”, Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2009. Lecture Notes in Computer Science, pp. 174–184, 2009.
[2] William D. Gropp, Rajeev Thakur, “Self-consistent MPI performance guidelines”, IEEE Transaction on parallel and distributed systems, 2005.
[3] William D. Gropp, Dries Kimpe, Robert Ross, Rajeev Thakur and Jesper Larsson Traff, “Self-consistent MPI-IO performance requirements and expectations”, Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2008. Lecture Notes in Computer Science, 2008.
[4] William D. Gropp, Dries Kimpe, Robert Ross, Rajeev Thakur and Jesper Larsson Traff, “Performance Expectations and Guidelines for MPI Derived Datatypes”, Recent Advances in the Message Passing Interface. EuroMPI 2011. Lecture Notes in Computer Science, 2011.
[5] Sascha Hunold, Alexandra Carpen-Amarie, Felix Donatus Lübbe, and Jesper Larsson Träff TU Wien, “Automatic verification of self-consistent MPI performance guidelines”, Parallel Processing, Euro-Par 2016. Lecture Notes in Computer Science, 2016.
[6] Ralf Reussner, Peter Sanders, and Jesper Larsson Träff, “SKaMPI: A Comprehensive Benchmark for Public Benchmarking of MPI,” Journal of Scientific Programming, vol. 10, issue 1, pp. 55-65, 2002.
[7] WCE Rock Cluster, High performance computing cluster, URL: http://wce.ac.in/it/landing-page.php?id=9.
[8] J. Liu, B. Chandrasekaran, W. Yu, J. Wu, D. Buntinas, S. Kini, P. Wyckoff, and D. K. Panda, “Micro-Benchmark Performance Comparison of High-Speed Cluster Interconnects” , Proceedings of 11th Symposium on High Performance Interconnects, 2003.
[9] Hunold, S., Carpen-Amarie, A., “Reproducible MPI benchmarking is still not as easy as you think”, IEEE Transactions on Parallel and Distributed Systems , vol. 27, issue 12, 2016.
[10] Subhash Saini, Robert Ciotti,Brian T. N. Gunney, Thomas E. Spelce, Alice Koniges, Don Dossa, Panagiotis Adamidis, Rolf Rabenseifner, Sunil R. Tiyyagura, Matthias Mueller, “Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks”, Journal of Computer and System Sciences, vol. 74, issue 6, 2008.
Citation
K.B. Manwade, D.B. Kulkarni, "MPI performance guidelines for scalability", International Journal of Computer Sciences and Engineering, Vol.06, Issue.01, pp.60-65, 2018.