Straggler Problem – Tail Latency in Distributed Network

Md. Nesar Rahman, Ayesha Siddika, Muhammad Shafiqul Islam, Md. Shahajada

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 8, Page no. 168-178, Aug. 2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i8.168178

Published online on Aug. 31, 2019

Copyright © Md. Nesar Rahman, Ayesha Siddika, Muhammad Shafiqul Islam, Md. Shahajada. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Md. Nesar Rahman, Ayesha Siddika, Muhammad Shafiqul Islam, and Md. Shahajada, “Straggler Problem – Tail Latency in Distributed Network,” International Journal of Computer Sciences and Engineering, vol. 7, no. 8, pp. 168–178, Aug. 2019.

MLA Style Citation: Rahman, Md. Nesar, Ayesha Siddika, Muhammad Shafiqul Islam, and Md. Shahajada. "Straggler Problem – Tail Latency in Distributed Network." International Journal of Computer Sciences and Engineering 7.8 (2019): 168-178.

APA Style Citation: Rahman, M. N., Siddika, A., Islam, M. S., & Shahajada, M. (2019). Straggler Problem – Tail Latency in Distributed Network. International Journal of Computer Sciences and Engineering, 7(8), 168–178. https://doi.org/10.26438/ijcse/v7i8.168178

BibTeX Style Citation:
@article{Rahman_2019,
author = {Md. Nesar Rahman and Ayesha Siddika and Muhammad Shafiqul Islam and Md. Shahajada},
title = {Straggler Problem -- Tail Latency in Distributed Network},
journal = {International Journal of Computer Sciences and Engineering},
volume = {7},
number = {8},
month = aug,
year = {2019},
issn = {2347-2693},
pages = {168--178},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4805},
doi = {10.26438/ijcse/v7i8.168178},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i8.168178
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4805
TI  - Straggler Problem – Tail Latency in Distributed Network
T2  - International Journal of Computer Sciences and Engineering
AU  - Rahman, Md. Nesar
AU  - Siddika, Ayesha
AU  - Islam, Muhammad Shafiqul
AU  - Shahajada, Md.
PY  - 2019
DA  - 2019/08/31
PB  - IJCSE, Indore, INDIA
SP  - 168
EP  - 178
IS  - 8
VL  - 7
SN  - 2347-2693
ER  - 

Abstract

Distributed processing frameworks split a data-intensive computation job into multiple smaller tasks, which are then executed in parallel on commodity clusters to achieve faster job completion. A natural consequence of this parallel execution model is that slow-running tasks, commonly called stragglers, can delay overall job completion. Stragglers take longer to complete their tasks than their peers, which can happen for many reasons, such as load imbalance, blocking I/O, garbage collection, and differences in hardware configuration. Straggler tasks continue to be a major hurdle to the fast completion of data-intensive applications running on modern data-processing frameworks. The trouble with stragglers is that when parallel computations are followed by a synchronization step such as a reduction, every parallel task must wait for all the others, so the parallel runtime is dominated by the slowest straggler. In a large-scale distributed system comprising many worker nodes, this performance bottleneck is caused by the unpredictable latency of waiting for the slowest nodes (the stragglers) to finish their tasks. Such stragglers increase the average job duration by 52% in the data clusters of Facebook and Bing, even though these companies use state-of-the-art straggler mitigation techniques [1]. This is because current mitigation techniques all involve an element of waiting and speculation. Existing straggler mitigation techniques are inefficient because of their reactive and replicative nature: they rely on a wait-speculate-execute mechanism, which leads to delayed straggler detection and inefficient resource utilization. Hence, a system called Dolly proposes full cloning of small jobs, avoiding waiting and speculation altogether [8]. The trade-off is that Dolly consumes extra resources for the replicated tasks.
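The barrier effect described in the abstract, and the cloning remedy, can be illustrated with a small Monte Carlo sketch. It assumes a hypothetical task-time model in which each task independently becomes a straggler with probability p and then runs several times slower; all names and parameter values (`task_duration`, `straggler_prob`, `slowdown`, `job_time`, `simulate`) are illustrative and do not come from the paper. Under that assumption, a job of n tasks escapes stragglers only with probability (1 - p)^n, whereas if every task runs c clones and finishes with its fastest copy, a task straggles only when all c copies do (probability p^c), so the job escapes with probability (1 - p^c)^n. The per-task cloning below is a simplified version of the full-cloning idea behind Dolly, not Dolly's actual scheduler, and it ignores the extra resources and contention that clones cost in a real cluster.

```python
import random
import statistics


def task_duration(rng: random.Random, straggler_prob: float = 0.05,
                  slowdown: float = 8.0) -> float:
    """Hypothetical task-time model (an assumption, not from the paper).

    A normal task takes roughly 1 time unit; with probability
    `straggler_prob` the task is a straggler and runs `slowdown` times longer.
    """
    base = rng.uniform(0.9, 1.1)
    if rng.random() < straggler_prob:
        return base * slowdown
    return base


def job_time(task_times: list[float]) -> float:
    """A barrier (e.g. a reduction) after the parallel phase means the job
    finishes only when its slowest task does."""
    return max(task_times)


def simulate(num_tasks: int, clones: int, trials: int,
             seed: int = 0) -> tuple[float, float]:
    """Return (mean job time without mitigation, mean job time with
    `clones` copies per task).

    Each task's first copy is reused as the uncloned run, so in every trial
    the cloned job is at least as fast as the uncloned one.
    """
    rng = random.Random(seed)
    plain, cloned = [], []
    for _ in range(trials):
        copies = [[task_duration(rng) for _ in range(clones)]
                  for _ in range(num_tasks)]
        # No mitigation: only the first copy of each task runs.
        plain.append(job_time([c[0] for c in copies]))
        # Cloning: each task is done as soon as its fastest copy finishes.
        cloned.append(job_time([min(c) for c in copies]))
    return statistics.mean(plain), statistics.mean(cloned)


if __name__ == "__main__":
    for n in (1, 10, 100):
        plain, cloned = simulate(num_tasks=n, clones=3, trials=2000)
        print(f"{n:>3} tasks: no cloning {plain:.2f}, 3 clones {cloned:.2f}")
```

Under these assumed parameters, the uncloned mean grows with the number of tasks because a larger job is more likely to contain at least one straggler, while three clones per task keep the mean close to the normal task time at the price of three times the work.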

Key-Words / Index Terms

Distributed network, latency, straggler detection, data clusters, slowest performing straggler

References

[1] S. Venkataraman, A. Panda, M. J. Franklin, and I. Stoica, “The Power of Choice in Data-Aware Cluster Scheduling,” in Proc. USENIX Symp. Oper. Syst. Des. Implement. (OSDI), 2014.
[2] D. Ford et al., “Availability in Globally Distributed Storage Systems,” 9th USENIX Symp. Oper. Syst. Des. Implement., pp. 61–74, 2010.
[3] X. Tian, R. Han, L. Wang, J. Zhan, and G. Lu, “Latency critical big data computing in finance,” J. Financ. Data Sci., vol. 1, no. 1, pp. 33–41, 2015.
[4] J. Dean and S. Ghemawat, “Summary of Installed Capacity, Dependable Capacity, Power Generation and Consumption (2003-2016),” pp. 137–149, 2016.
[5] J. Dean and L. A. Barroso, “The tail at scale,” Commun. ACM, vol. 56, no. 2, p. 74, 2013.
[6] W. D. Gray and D. A. Boehm-Davis, “Milliseconds matter: An introduction to microstrategies and to their use in describing and predicting interactive behavior,” J. Exp. Psychol. Appl., vol. 6, no. 4, pp. 322–335, 2000.
[7] M. Kambadur, T. Moseley, R. Hank, and M. A. Kim, “Measuring interference between live datacenter applications,” Int. Conf. High Perform. Comput. Networking, Storage Anal. SC, no. 3, 2012.
[8] G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica, “Effective Straggler Mitigation: Attack of the Clones,” in Proc. USENIX Symp. Netw. Syst. Des. Implement. (NSDI), p. 185, 2013.
[9] G. Ananthanarayanan et al., “Reining in the Outliers in Map-Reduce Clusters using Mantri,” 9th USENIX Symp. Oper. Syst. Des. Implement., pp. 265–278, 2010.
[10] E. Krevat, J. Tucek, and G. R. Ganger, “Disks are like snowflakes: no two are alike,” Proc. 13th USENIX Conf. Hot Top. Oper. Syst., p. 5, 2011.
[11] A. Tumanov, R. H. Katz, M. A. Kozuch, C. Reiss, and G. R. Ganger, “Heterogeneity and dynamicity of clouds at scale,” pp. 1–13, 2012.
[12] P. Beckman, K. Iskra, K. Yoshii, and S. Coghlan, “The influence of operating systems on the performance of collective operations at extreme scale,” Proc. - IEEE Int. Conf. Clust. Comput. ICCC, 2006.
[13] F. Petrini, D. J. Kerbyson, and S. Pakin, “The Case of the Missing Supercomputer Performance,” vol. 836, p. 55, 2011.
[14] K. B. Ferreira, P. G. Bridges, R. Brightwell, and K. T. Pedretti, “The impact of system design parameters on application noise sensitivity,” Cluster Comput., vol. 16, no. 1, pp. 117–129, 2013.
[15] C. Curino, D. E. Difallah, C. Douglas, and S. Krishnan, “Socc14-Paper15.”
[16] A. D. Ferguson, P. Bodik, E. Boutin, and R. Fonseca, “Jockey: Guaranteed Job Latency in Data Parallel Clusters,” Proc. 8th ACM Eur. Conf. Comput. Syst. - EuroSys ’12, pp. 99–112, 2012.
[17] B. Hindman et al., “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” 2011.
[18] C. R. Lumb and R. Golding, “D-SPTF: Decentralized Request Distribution in Brick-based Storage Systems,” ACM SIGOPS Oper. Syst. Rev., vol. 38, p. 37, 2004.
[19] D. Shue, M. Freedman, and A. Shaikh, “Performance Isolation and Fairness for Multi-Tenant Cloud Storage.”
[20] T. Zhu, A. Tumanov, M. A. Kozuch, M. Harchol-Balter, and G. R. Ganger, “PriorityMeister: Tail Latency QoS for Shared Networked Storage,” Symp. Cloud Comput., pp. 1–14, 2014.
[21] M. Capitão, “Mediator Framework for Inserting Data into Hadoop,” Jan. 2015.
[22] “Apache Hadoop.” [Online]. Available: http://hadoop.apache.org/. [Accessed: 08-Mar-2019].
[23] “NameNode and DataNode – Hadoop In Real World.” [Online]. Available: http://www.hadoopinrealworld.com/namenode-and-datanode/. [Accessed: 08-Mar-2019].
[24] “What is Hadoop Distributed File System (HDFS)? - Definition from WhatIs.com.” [Online]. Available: https://searchdatamanagement.techtarget.com/definition/Hadoop-Distributed-File-System-HDFS. [Accessed: 10-Jan-2019].
[25] “20 Essential Hadoop Tools for Crunching Big Data – Data Science IO – Medium.” [Online]. Available: https://medium.com/data-science-io/20-essential-hadoop-tools-for-crunching-big-data-efbc8b5c77ce. [Accessed: 15-Jan-2019].
[26] A. Manzanares et al., “Improving MapReduce performance through data placement in heterogeneous Hadoop clusters,” Ned. Tijdschr. Psychol., vol. 4, no. 4, pp. 1–9, 2010.
[27] “Hadoop Soup: 01/10/14.” [Online]. Available: http://dailyhadoopsoup.blogspot.com/2014_01_10_archive.html. [Accessed: 10-Sep-2018].
[28] “20 essential Hadoop tools for crunching Big Data.” [Online]. Available: https://bigdata-madesimple.com/20-essential-hadoop-tools-for-crunching-big-data/. [Accessed: 08-Sep-2018].
[29] “Apache Spark Introduction.” [Online]. Available: https://www.tutorialspoint.com/apache_spark/apache_spark_introduction.htm. [Accessed: 08-Aug-2018].
[30] “Home - Apache Hive - Apache Software Foundation.” [Online]. Available: https://cwiki.apache.org/confluence/display/HIVE. [Accessed: 08-Aug-2018].
[31] “What is Hive? Architecture & Modes.” [Online]. Available: https://www.guru99.com/introduction-hive.html. [Accessed: 08-Jul-2018].
[32] “Impala Hadoop Tutorial.” [Online]. Available: https://www.dezyre.com/hadoop-tutorial/hadoop-impala-tutorial. [Accessed: 20-Nov-2018].
[33] “Cloudera Impala Overview | 5.3.x | Cloudera Documentation.” [Online]. Available: https://www.cloudera.com/documentation/enterprise/5-3-x/topics/impala_intro.html. [Accessed: 04-Oct-2018].
[34] “Big Data: How to manage Hadoop.” [Online]. Available: https://www.cleverism.com/how-to-manage-hadoop-big-data/. [Accessed: 20-Dec-2018].
[35] “Introduction to batch processing - MapReduce - Data, what now?” [Online]. Available: https://datawhatnow.com/batch-processing-mapreduce/. [Accessed: 05-Jan-2019].
[36] A. Gupta and G. N. Campus, “HIVE- Processing Structured Data in HADOOP,” no. August, 2018.
[37] “Why is Impala faster than Hive? - Quora.” [Online]. Available: https://www.quora.com/Why-is-Impala-faster-than-Hive. [Accessed: 30-Sep-2018].
[38] “What is the advantages of Hadoop and Big data? - Quora.” [Online]. Available: https://www.quora.com/What-is-the-advantages-of-Hadoop-and-Big-data. [Accessed: 12-Jan-2019].
[39] “Advantages of Hadoop MapReduce Programming.” [Online]. Available: https://www.tutorialspoint.com/articles/advantages-of-hadoop-mapreduce-programming. [Accessed: 10-Dec-2018].
