Open Access Article

A Survey on Energy-Aware Fault Tolerant Strategies in Cloud Computing

Kamaljit Kaur, Kuljit Kaur

Section: Survey Paper, Product Type: Journal Paper
Volume 7, Issue 5, Page no. 787-800, May 2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i5.787800

Online published on May 31, 2019

Copyright © Kamaljit Kaur, Kuljit Kaur. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: Kamaljit Kaur, Kuljit Kaur, “A Survey on Energy-Aware Fault Tolerant Strategies in Cloud Computing,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.5, pp.787-800, 2019.

MLA Style Citation: Kamaljit Kaur, Kuljit Kaur. "A Survey on Energy-Aware Fault Tolerant Strategies in Cloud Computing." International Journal of Computer Sciences and Engineering 7.5 (2019): 787-800.

APA Style Citation: Kamaljit Kaur, Kuljit Kaur (2019). A Survey on Energy-Aware Fault Tolerant Strategies in Cloud Computing. International Journal of Computer Sciences and Engineering, 7(5), 787-800.

BibTex Style Citation:
@article{Kaur_2019,
author = {Kamaljit Kaur and Kuljit Kaur},
title = {A Survey on Energy-Aware Fault Tolerant Strategies in Cloud Computing},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {5 2019},
volume = {7},
number = {5},
month = {5},
year = {2019},
issn = {2347-2693},
pages = {787-800},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4315},
doi = {10.26438/ijcse/v7i5.787800},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - 10.26438/ijcse/v7i5.787800
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=4315
TI - A Survey on Energy-Aware Fault Tolerant Strategies in Cloud Computing
T2 - International Journal of Computer Sciences and Engineering
AU - Kaur, Kamaljit
AU - Kaur, Kuljit
PY - 2019
DA - 2019/05/31
PB - IJCSE, Indore, INDIA
SP - 787
EP - 800
IS - 5
VL - 7
SN - 2347-2693
ER -

  
  
           

Abstract

As technology advances, the computational demands of users keep growing. Cloud computing is among the most widely adopted technologies for meeting these computationally intensive demands. It exploits virtualization to provision resources on demand, which increases the complexity of the cloud infrastructure and makes faults inevitable. These faults may result in failures that cause serious losses to organizations. Fault management techniques usually require additional resources, which increases energy consumption. Moreover, cloud infrastructure itself consumes a great deal of energy and is a major contributor to carbon emissions. Growing demand and limited renewable resources have led to serious energy crises. Energy-efficient fault-tolerant solutions are therefore needed to tolerate faults and provide reliable, scalable, and flexible availability of cloud services, preventing system failure while minimizing energy consumption. Fault tolerance and energy efficiency are crucial issues that must be considered together to ensure the availability, performance, and reliability of cloud computing services. This paper describes the basic concepts of faults, errors, and failures. It also discusses different fault tolerance strategies and the trade-off between energy efficiency and fault tolerance.

Keywords / Index Terms

Checkpointing, Energy efficiency, Fault Tolerance, Migration, Replication
