Hadoop Distributed Computing Clusters for Fault Prediction - IEEE Xplore

Hadoop Distributed Computing Clusters for Fault Prediction

Joey Pinto1, Pooja Jain2, Tapan Kumar3
Indian Institute of Information Technology Kota, India
[email protected]1, (pooja2, tapan3)@iiitkota.ac.in

Abstract—The Hadoop architecture provides one level of fault tolerance by rescheduling jobs from faulty nodes to other nodes in the network. However, this approach is inefficient when a fault occurs after most of a job has already executed. It is therefore necessary to predict a fault at a node early enough that rescheduling the job is not costly in terms of time and efficiency. Predicting these faults provides the time needed to shift the task load onto other node(s) and thus prevents loss of data or computation time. An implementation is built on the MATLAB SVM kernel and Ganglia, with Java as the interfacing language; Ganglia is used for monitoring network and system statistics. The system is trained on statistics from a normal task run and can thus detect deviations from them in real time. The experimental results clearly indicate that it is possible to predict the occurrence of a fault using previously gained knowledge with minimal time delay, so that either the job can be rescheduled or the cluster itself can be scaled up. The reinforcement learning module reduces false positives with each run and makes it possible to implement a truly fault-tolerant cluster.

Index Terms—Hadoop cluster, Big data, SVM, Fault tolerance
I. INTRODUCTION

One of the most important challenges in distributed computing is to ensure that services remain correct and available despite faults [1]. Fault detection aims at identifying faulty components so that they can be isolated and repaired [2][3]. As the need for big data grows, new data collection, transmission, and processing techniques are required [4]. To avoid the complications of high-performance computing over big data, distributed systems should be fault tolerant [5]. Map-Reduce frameworks such as Hadoop have built-in fault-tolerance mechanisms that allow jobs to run to completion even in the presence of certain faults [6]. But these jobs can suffer severe performance penalties when a node crashes: the time taken to reschedule or restart a task can have a serious impact on the time taken to complete the entire job. For example, suppose the total execution time of task 'i' on node 'j' is Tij, and task 'i' encounters a failure at time 't'. If t
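The cost argument above can be made concrete under the simplest assumption: with no checkpointing, a failure at time t < Tij loses all work done so far, and the task reruns from scratch on another node, so completion takes roughly t + Tij plus rescheduling overhead. The helper below is a hypothetical illustration of that arithmetic, not a formula from the paper.

```java
// Illustrative cost model (assumption: no checkpointing, so all work
// done before the failure is lost and the task restarts from scratch).
public class RestartCost {
    // tij: total execution time of task i on node j
    // t: time at which node j fails (t < tij)
    // overhead: time to detect the fault and reschedule the task
    static double completionTime(double tij, double t, double overhead) {
        return t + overhead + tij;  // wasted work + rescheduling + full rerun
    }

    public static void main(String[] args) {
        // A 100-minute task failing at minute 90 with 5 minutes of
        // rescheduling overhead finishes in 195 minutes -- almost double
        // the fault-free time. Predicting the fault early (small t)
        // keeps the penalty small, which is the paper's motivation.
        System.out.println(completionTime(100, 90, 5));  // 195.0
        System.out.println(completionTime(100, 10, 5));  // 115.0
    }
}
```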