Implementing a Publish-Subscribe Distributed Notification System on Hadoop

Jyotiska Nath Khasnabish, Ananda Prakash Verma, and Shrisha Rao

International Institute of Information Technology, Bangalore 560 100, India

{jyotiskanath.khasnabish,anandaprakash.verma}@iiitb.org,
[email protected]
Abstract. Apache Hadoop is an open-source framework for processing massive amounts of data in a distributed environment. Hadoop services use a polling mechanism for event notifications. In this paper, we propose a distributed notification system for Hadoop based on the Publish-Subscribe model. Such a notification system can be used for message-passing among Hadoop services. It can also be used to chain multiple MapReduce jobs based on events occurring in a Hadoop cluster. This results in reduced cluster load and lower network bandwidth requirements. We have used two popular Publish-Subscribe-based messaging systems, Apache ActiveMQ and Apache Kafka, for the implementation. Lastly, we have executed performance tests on both messaging systems to measure the time taken for message delivery and reception.
1 Introduction
In recent years, Hadoop [2] has become a de facto standard for processing massive amounts of data in a distributed environment. With the number of Hadoop-based services growing each year, the need for an event-based notification system has grown significantly. At the Hadoop Summit 2011 [13], Yahoo disclosed that their primary workflow manager, ‘Oozie’, manages over 600,000 processed jobs per month internally on their cluster, with more than 300 users. They predicted that the number of jobs would grow further in the coming years. Hadoop services such as MapReduce [10] computations produce a large number of jobs every hour. Often these services run on a Hadoop cluster together with several other Hadoop services to perform complex data-intensive operations.

We have designed and implemented a notification system on Hadoop using the Publish-Subscribe [11] model, which provides a high-performance and scalable solution for passing messages between different services. In this system, a node or a service can play one of two roles, ‘Publisher’ or ‘Subscriber.’ The benefit of using the Publish-Subscribe model is that the Publishers are connected to the Subscribers through one or more message brokers rather than directly; a Publisher need not know a Subscriber, and vice versa. The intermediate message broker filters the messages based on the ‘Topic,’ so that the Subscribers only get a relevant subset of messages instead of all messages sent by all the Publishers.

In Section 2 we describe the Publish-Subscribe model in detail, and how a Topic-based notification message is delivered from a Publisher to the Subscribers who are interested in that Topic. In Section 3 we explain the architecture of our notification system and the different roles and functions in it. We also discuss the JMS (Java Message Service) architecture and how JMS can be used to implement our notification system. In Section 4 we briefly mention three possible use cases of our proposed notification system. First, we discuss how the system can be used to send notifications between different Hadoop services. Second, we discuss how our system can be used to build an event-based job control framework for chaining multiple MapReduce jobs together while respecting the dependencies among them. Lastly, if one job waits for data which is to be produced as the output of another job, the notification system can be used to announce the availability of the data, instead of the job polling the NameNode. In Sections 5 and 6, we analyze the performance and scalability of the notification system using both ActiveMQ and Kafka, both of which are open-source Publish-Subscribe messaging systems. Based on the results, we conclude how the notification system can be scaled up with increasing sizes of input message sets while providing better delivery times. Lastly, we discuss the possible areas where the proposed notification system can be deployed in the future.

S.C. Satapathy et al. (eds.), ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of CSI - Volume I, Advances in Intelligent Systems and Computing 248, DOI: 10.1007/978-3-319-03107-1_60, © Springer International Publishing Switzerland 2014
2 Publish Subscribe Model
Publish-Subscribe [11] is a message-passing model in which senders, known as ‘Publishers’, send messages to receivers, known as ‘Subscribers’. The publish-subscribe model differs from a traditional client-server system in that the Subscribers do not need to be directly connected to Publishers to receive messages. Publishers publish their messages when events occur, and messages are typically classified by a taxonomy of predefined Topics. The intermediate message broker takes care of guaranteed message delivery. Since the message broker performs Topic-based filtering on all incoming messages from the Publishers, the Subscribers do not get all the messages published by all the Publishers. Instead, a Subscriber gets only the messages on Topics to which it is subscribed.

We use a Topic-based publish-subscribe model where a Hadoop service or a node can act as a Publisher and publish a message on a specific Topic. Nodes or services which act as Subscribers and are subscribed to that Topic receive the message. The advantage of this system is that Subscriber nodes do not have to be on the same physical rack, and services need not originate from the same node to receive a message. In fact, the Publishers may not even be aware of the existence of a Subscriber. A Publisher just publishes its message and continues its own operation.

For example, suppose that a MapReduce job depends on the output of a Pig job, so it waits until the data has arrived from the currently executing job. The Pig job, upon successful completion, publishes a message on a sample Topic like ‘Job #413 Complete.’ The MapReduce job, which is subscribed to that Topic, receives the notification as soon as the message is published, and can therefore start its computation immediately rather than repeatedly checking the status of the previous job.

In this system, each process $p_i$ can execute the following operations: $\mathit{publish}_i(m)$ and $\mathit{subscribe}_i(T)$. A Publisher publishes a message $m$ on a Topic $T$, and Subscribers which are listening on that Topic receive it from the message broker.

We have implemented our notification system twice, using Apache ActiveMQ and Apache Kafka, and measured their performance with different input sets. ActiveMQ and Kafka both provide APIs for Topic-based messaging using the Publish-Subscribe model, and both are scalable and can be distributed over large networks.

2.1 Apache ActiveMQ
Apache ActiveMQ [1] is an open-source messaging system which uses JMS (Java Message Service) to send and receive messages. ActiveMQ provides scalability, performance and security for large-scale messaging systems. The ActiveMQ architecture is divided into three components: Publisher, Broker and Subscriber. Following the Publish-Subscribe model, the Publisher publishes a message on a specific Topic to the ActiveMQ message broker. The broker may store the messages, which is known as Persistent ActiveMQ, or may not store them, which is known as Non-Persistent ActiveMQ. For persistence, ActiveMQ uses KahaDB to store the messages. By default, ActiveMQ uses TCP (Transmission Control Protocol) for guaranteed and safe message delivery. Each time a connection needs to be established, a Publisher uses the 3-way handshake that is the norm in TCP. This reliable transport underlies ActiveMQ's guarantee of message delivery without loss.
Fig. 1. ActiveMQ Messaging System Architecture
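The two broker modes can be pictured with a small in-memory model. The sketch below is written in Python purely for illustration and does not use the ActiveMQ or JMS API; the class `SketchBroker` and the function `late_subscriber_view` are hypothetical names. It assumes the simplified behaviour described in this section, in which a persistent broker hands its stored messages to a Subscriber that connects late, while a non-persistent broker delivers only what is published afterwards.

```python
from collections import defaultdict


class SketchBroker:
    """In-memory stand-in for a Topic-based broker (not the ActiveMQ API)."""

    def __init__(self, persistent):
        self.persistent = persistent
        self.stored = defaultdict(list)       # Topic -> messages kept by the broker
        self.subscribers = defaultdict(list)  # Topic -> inboxes of live Subscribers

    def publish(self, topic, message):
        if self.persistent:
            self.stored[topic].append(message)
        for inbox in self.subscribers[topic]:
            inbox.append(message)

    def subscribe(self, topic):
        # A late Subscriber starts with the stored backlog, if the broker kept one.
        inbox = list(self.stored[topic])
        self.subscribers[topic].append(inbox)
        return inbox


def late_subscriber_view(persistent):
    """Publish once before and once after a Subscriber connects."""
    broker = SketchBroker(persistent)
    broker.publish("Job #413 Complete", "before subscribe")
    inbox = broker.subscribe("Job #413 Complete")
    broker.publish("Job #413 Complete", "after subscribe")
    return inbox
```

Under these assumptions, `late_subscriber_view(True)` returns both messages, whereas `late_subscriber_view(False)` returns only the one published after the subscription.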
A Subscriber also follows the same method to create a connection using the 3-way handshake. Then it starts listening on specific Topics. In a persistent system, all the messages on that Topic are delivered to the Subscriber. In a non-persistent system, only those messages that are published after the Subscriber has started are delivered. After receiving a given message, the Subscriber may choose to close the connection or to keep listening for further messages.

2.2 Apache Kafka
Apache Kafka [14] is a distributed publish-subscribe messaging system designed by LinkedIn, the social media company. Kafka supports persistent messaging with O(1) disk structures that provide constant-time performance (even with many TB of stored messages), high throughput, explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics, and support for parallel data load into Hadoop [3]. In Kafka, messages of a specific type are identified by Topics. A set of servers, known as ‘Brokers’, store the messages which are published by the Producers on a Topic. A Consumer can subscribe to one or more Topics from the available brokers and consume the subscribed messages provided by the brokers. In Kafka, a Topic is divided into several partitions, and each broker gets a partition of the Topic. This balances the load so that multiple Producers and Consumers can send and retrieve messages simultaneously. Kafka uses a ZooKeeper [5] instance to start its server. The default port for ZooKeeper is 2181 and for the Kafka server is 9092. After the server has started, Kafka can create a new Topic, which is then added to the list of Topics, or it can choose from the existing Topics. With its default configuration Kafka automatically persists messages, which means that when a Subscriber comes online, it is able to receive messages on a specific Topic from the beginning.
Fig. 2. Kafka Messaging System Architecture (from [14])
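How a partitioned Topic preserves order within a partition, and how a Consumer can read from the beginning, can be illustrated with a toy model. This is a sketch in Python, not Kafka's client API: the class `PartitionedTopic`, the job keys, and the use of a CRC32 hash to pick a partition are all assumptions made for the example, and in a real deployment each partition would live on a broker rather than in a single process.

```python
import zlib


class PartitionedTopic:
    """Toy model of one Topic split into append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash, so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, message):
        index = self.partition_for(key)
        self.partitions[index].append(message)
        return index, len(self.partitions[index]) - 1  # (partition, offset)

    def consume(self, partition, offset=0):
        # Messages are retained, so reading can start from any offset.
        return self.partitions[partition][offset:]


topic = PartitionedTopic(num_partitions=3)
for step in range(4):
    topic.produce("job-413", f"job-413 step {step}")
    topic.produce("job-414", f"job-414 step {step}")

# A Consumer that starts late still sees job-413's messages in publish order.
p = topic.partition_for("job-413")
replayed = [m for m in topic.consume(p) if m.startswith("job-413")]
```

Ordering is only guaranteed inside one partition in this model; messages for different keys that land in different partitions have no defined relative order.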
3 Architecture
Our notification system is based on the JMS [7] publish-subscribe model and an intermediate message broker. The Observer [12] pattern in a publish-subscribe channel helps to inform the Subscribers when there is a change in the system. In this case, the system administrator creates a channel for message passing. The Publishers create a Topic on which the Subscribers receive messages published by the Publishers. As a result, when a Publisher sends a message, the channel makes sure that a copy of the message is sent asynchronously to all the Subscribers listening on that Topic. For details, see [12].

JMS [7], or Java Message Service, is a Java Message Oriented Middleware API which is used to send and receive messages among multiple clients. JMS has two messaging models: Point-to-Point and Publish-and-Subscribe. In our system, we have used the Publish-and-Subscribe model. JMS comprises the following elements: Provider, Client, Producer/Publisher, Consumer/Subscriber, Message, Queue, Topic.

The model is implemented in the following steps in Hadoop:

– The NameNode creates a Publish-Subscribe Channel. (This is represented as a JMS Topic.)
– The Hadoop service acting as the Publisher creates a Topic to send messages on the channel.
– Each of the Hadoop services acting as a Subscriber subscribes to a Topic to receive messages on the channel.

Fig. 3. Publish Subscribe Model of Notification System

For example, let us go back to the scenario mentioned in Section 2. When the Pig job has finished its execution, it loads data into HDFS and the data is divided into blocks of fixed size. Then, the NameNode allocates the blocks to the DataNodes where the data gets written. As soon as the NameNode gets acknowledgement from the DataNodes that the data has been written successfully, it publishes a notification message of data availability to the Topic ‘Job #413 Complete’ using JMS (Java Message Service). The next MapReduce job, which depends on the output of the previous job and has subscribed to the Topic, receives the message of data availability. After receiving the notification message, it triggers its own workflow and begins the computation process. In this way, polling on the NameNode is removed, resulting in reduced network bandwidth usage and improved cluster performance.
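The scenario above can be sketched as a small event flow. The code is an in-memory illustration in Python rather than a JMS client; `TopicChannel`, `NameNodeSketch`, `run_scenario`, and the DataNode names are hypothetical, and the rule that the notification is published only after every expected DataNode has acknowledged is an assumption made for the example.

```python
from collections import defaultdict


class TopicChannel:
    """In-memory stand-in for the publish-subscribe channel (not JMS)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, topic, callback):
        self._listeners[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._listeners[topic]:
            callback(message)


class NameNodeSketch:
    """Publishes once every expected DataNode has acknowledged its write."""

    def __init__(self, channel, topic, expected_datanodes):
        self.channel = channel
        self.topic = topic
        self.pending = set(expected_datanodes)

    def acknowledge(self, datanode):
        if datanode not in self.pending:
            return  # unknown or duplicate acknowledgement
        self.pending.remove(datanode)
        if not self.pending:
            self.channel.publish(self.topic, "data available")


def run_scenario():
    started = []
    channel = TopicChannel()
    # The dependent MapReduce job subscribes instead of polling the NameNode.
    channel.subscribe("Job #413 Complete",
                      lambda message: started.append(("mapreduce", message)))
    namenode = NameNodeSketch(channel, "Job #413 Complete", {"dn1", "dn2", "dn3"})
    namenode.acknowledge("dn1")
    namenode.acknowledge("dn2")
    before_last_ack = list(started)
    namenode.acknowledge("dn3")
    return before_last_ack, started
```

In this sketch the Subscriber's callback runs exactly once, when the last acknowledgement arrives, and nothing is sent to the NameNode in the meantime.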
4 Use Cases
Our notification system consists of two modules, ‘Publish’ and ‘Subscribe’. Each time a Hadoop service wants to publish a message on a specific Topic on a channel, it calls the ‘Publish’ module with the necessary parameters: a predefined Topic name, the message, and the broker URL. The ‘Subscribe’ module works in the same way, except that it needs only a Topic name and the broker URL as parameters in order to receive the messages published by the Publisher. We have identified the following three possible use cases where we can deploy our proposed notification system.

4.1 Passing Messages between Hadoop Services
As discussed before, we can use our notification system to pass notification messages between different running Hadoop services. The messages can be status flags or progress reports about a running job (busy, waiting, or complete). For example, the TaskTrackers keep sending status updates to the JobTrackers at certain intervals, which creates network overhead when there is no status change. Our notification system allows the TaskTrackers to notify the JobTrackers only when there is a status change. If a service wants to publish a message on a specific Topic, it may do so by calling the ‘Publish’ module with the appropriate parameters (Topic, message, broker URL). In the same way, if a Hadoop service is waiting for a certain notification message from another service, it subscribes to a Topic, and whenever the message arrives on the channel it is delivered to the Subscriber by the intermediate message broker.

4.2 Notification for Data Availability
In a large Hadoop environment, many jobs depend on the output of other currently executing jobs. If an organization produces a large number of jobs every hour, many of which are dependent on other jobs, a job cannot start its execution until the jobs on which it depends finish executing and put their output data into HDFS. In this scenario, we can use the notification system to inform currently waiting jobs about the availability of data. For example, suppose a job A is waiting for a batch of log data on user activity on a website over a period of a month; when the data becomes available, it is analyzed and a report is created. Job A does not know whether the data is available or exactly when it will become available, so it has no choice but to keep polling the NameNode. If thousands of jobs, say 10,000 or 15,000, keep polling the NameNode for such information continually, this not only wastes bandwidth but also increases the cluster load significantly. If the jobs instead subscribe to a specific Topic, say ‘Data Available,’ they do not have to poll the NameNode repeatedly. As soon as the data is available, the NameNode, as a ‘Publisher,’ publishes a message on the Topic ‘Data Available’ and the subscribed jobs are able to continue their execution as soon as they receive it. This approach saves network bandwidth and reduces the load on the Hadoop cluster.

4.3 Event Based Job Chaining
If multiple MapReduce or other Hadoop jobs need to be chained in order to accomplish a complex task, this can also be achieved using our notification system. Many complex problems are solved by writing several MapReduce steps which run in series to accomplish a goal. It is also very common in a large organization, where thousands of jobs are created every hour, that many of the jobs are interdependent. This means that there should be an efficient workflow manager within the cluster to handle the jobs. Existing workflow managers like Oozie [4] handle the workflow using a DAG (Directed Acyclic Graph) where the jobs and their dependencies are represented as nodes and edges. The chain of jobs can be depicted as follows: Map1 - Reduce1 - Map2 - Reduce2 - Map3 ... There are existing workflow managers like LinkedIn’s Azkaban [8], Spotify’s Luigi [9] and Yahoo’s Oozie [4] which are capable of chaining jobs. But in certain cases it is better to employ a workflow manager built on the notification system we have implemented, to remove the overhead of chaining jobs one by one and to trigger workflows automatically. In this system, to make a chain of jobs, the ‘Publish’ and ‘Subscribe’ modules can be used so that when a job has finished executing, it can trigger the next set of jobs automatically without the job dependencies having to be set manually.
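Event-based chaining of this kind can be sketched with a few lines of Python. The example is an in-memory illustration, not the authors' ‘Publish’ and ‘Subscribe’ modules: `Broker`, `register_job`, `run_chain`, and the convention of publishing on a Topic named after the finished job (for example ‘Map1 Complete’) are assumptions. Each job subscribes to the completion Topics of the jobs it depends on and starts itself once all of them have reported.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory Topic broker used only for this illustration."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, topic, callback):
        self._listeners[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self._listeners[topic]):
            callback(message)


def register_job(broker, name, depends_on, log):
    """Run `name` once a completion message has arrived for every dependency."""
    waiting = set(depends_on)

    def run():
        log.append(name)  # stands in for executing the real job
        broker.publish(f"{name} Complete", name)

    def on_complete(finished_job):
        waiting.discard(finished_job)
        if not waiting:
            run()

    for dependency in depends_on:
        broker.subscribe(f"{dependency} Complete", on_complete)
    return run  # lets the caller start jobs that have no dependencies


def run_chain():
    broker, log = Broker(), []
    start = register_job(broker, "Map1", [], log)
    register_job(broker, "Reduce1", ["Map1"], log)
    register_job(broker, "Map2", ["Reduce1"], log)
    register_job(broker, "Reduce2", ["Map2"], log)
    start()  # only the first job is started explicitly
    return log
```

Only the first job is started by hand; every later job is triggered by the completion message of its predecessor, and a job with several dependencies waits until all of them have published. The sketch assumes each job publishes its completion message once.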
5 Performance Analysis
We have analyzed the performance with different sets of messages and observed how much time it takes to publish and to receive messages. We have also observed whether any messages are lost. Since Hadoop is designed to run on commodity hardware, we have used ordinary desktop computers and executed our tests within virtual machines (VMs) with limited processing power. We used three VMs, one for the ‘Publisher,’ one for the ‘Subscriber,’ and one for the ‘Message Broker.’ All three were running Hadoop v1.1.1 and ActiveMQ 5.8.0. For our graph we have taken message sets of 100, 200, 500, 1000, 2000 and 5000 and noted the time taken to send those messages. We ran our tests three times and used the average values to increase accuracy.

When the Publisher or Subscriber starts, it first creates a connection with the message broker by the TCP 3-way handshake, using the broker's URL and port number, which is 61616 by default. After the session and Topic have been created, the Publisher prepares a message, sends the message to the broker, and closes the connection. The Subscriber keeps listening on some Topic; if a message is published on that Topic, the broker sends the message to the Subscriber, and upon receiving the desired message the Subscriber may close the connection or keep listening for further messages on that or other Topics. In Kafka, if there are multiple brokers running, a Topic gets divided into multiple partitions to manage load. In our tests, however, we have used a single broker running on Machine #1. All the messages published by the Publishers go to the broker first, and are then sent to the Subscribers based on their Topics.

We have monitored the load on our cluster and the overall network bandwidth consumption both without and with our notification system in use. Based on the results, we show in Figures 5 and 6 how the notification system can be used to bring down the cluster load and reduce network bandwidth by replacing the default polling mechanism.
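The measurement procedure (fixed message-set sizes, three runs each, averaged) can be outlined as a small timing harness. This is a sketch in Python under stated assumptions, not the test code used for the paper: `average_send_time` is a hypothetical helper, and the `send` argument stands in for whatever call publishes one message to the broker; here it simply appends to a list.

```python
import time


def average_send_time(send, message_counts, runs=3, clock=time.perf_counter):
    """Return {count: mean seconds needed to send `count` messages over `runs` runs}."""
    results = {}
    for count in message_counts:
        elapsed = []
        for _ in range(runs):
            start = clock()
            for i in range(count):
                send(f"message {i}")
            elapsed.append(clock() - start)
        results[count] = sum(elapsed) / runs
    return results


# Stand-in publisher: appending to a list replaces the real broker call.
sent = []
timings = average_send_time(sent.append, [100, 200, 500, 1000, 2000, 5000])
```

Passing the clock as a parameter keeps the harness testable with a fake clock; with a real broker the `send` callable would also include connection setup and teardown if those are meant to be part of the measured time.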
Table 1. System Configuration used for Testing

                        Machine #1      Machine #2      Machine #3
Processing Speed        2.3 GHz         2.3 GHz         2.3 GHz
Primary Memory (RAM)    2 GB            2 GB            2 GB
Disk Space              8 GB            8 GB            8 GB
Operating System        Ubuntu 12.04    Ubuntu 12.04    Ubuntu 12.04
Hadoop Version          1.1.1           1.1.1           1.1.1
ActiveMQ Version        5.8.0           5.8.0           5.8.0
Kafka Version           0.8             0.8             0.8
Table 2. Test Results using Apache ActiveMQ

Number of Messages    Iteration 1 (sec)    Iteration 2 (sec)    Iteration 3 (sec)    Average Time (sec)
100                   51                   57                   55                   54.33
200                   104                  100                  104                  102.67
500                   265                  270                  261                  265.33
1000                  509                  519                  520                  516
2000                  1035                 1048                 1043                 1042
5000                  2522                 2497                 2528                 2515.67
Table 3. Test Results using Apache Kafka

Number of Messages    Iteration 1 (sec)    Iteration 2 (sec)    Iteration 3 (sec)    Average Time (sec)
100                   55                   53                   52                   53.33
200                   94                   93                   96                   94.33
500                   253                  246                  247                  248.67
1000                  478                  489                  493                  486.67
2000                  986                  981                  978                  981.67
5000                  2402                 2377                 2362                 2380.33
Fig. 4. Analysis of delivery times of messages using ActiveMQ and Kafka
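The averages in Tables 2 and 3 can be recomputed, and converted to a time per message, with a short script. The iteration values below are copied from the two tables; the function name `summarize` and the per-message column are additions made for illustration.

```python
# Iteration times in seconds, copied from Table 2 (ActiveMQ) and Table 3 (Kafka).
ACTIVEMQ = {
    100: (51, 57, 55), 200: (104, 100, 104), 500: (265, 270, 261),
    1000: (509, 519, 520), 2000: (1035, 1048, 1043), 5000: (2522, 2497, 2528),
}
KAFKA = {
    100: (55, 53, 52), 200: (94, 93, 96), 500: (253, 246, 247),
    1000: (478, 489, 493), 2000: (986, 981, 978), 5000: (2402, 2377, 2362),
}


def summarize(runs_by_count):
    """Map message count -> (mean seconds, mean seconds per message)."""
    summary = {}
    for count, runs in runs_by_count.items():
        mean = sum(runs) / len(runs)
        summary[count] = (round(mean, 2), round(mean / count, 3))
    return summary


activemq_summary = summarize(ACTIVEMQ)
kafka_summary = summarize(KAFKA)
for count in ACTIVEMQ:
    print(count, activemq_summary[count], kafka_summary[count])
```

On these figures the time per message stays close to half a second at every set size for both systems (about 0.50 s for ActiveMQ and 0.48 s for Kafka at 5000 messages), so the total time grows roughly linearly with the number of messages, and Kafka is about 5% faster on the largest set.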
Figure 5 shows how the load on the cluster was significantly reduced over a one-hour period when we used our notification system instead of polling. Figure 6 shows the network bandwidth usage before and after using the notification system. We have used the Ganglia Monitoring System [6] to monitor cluster load and network bandwidth usage, and to retrieve the data for plotting the graphs used to visualize the results.

Fig. 5. Monitoring Cluster Load before and after using Notification System: (a) Using Polling, (b) Using Notification System

Fig. 6. Monitoring Network Bandwidth before and after using Notification System: (a) Using Polling, (b) Using Notification System

In all our tests, we have experienced zero message loss. This is consistent with the delivery guarantees provided by both Kafka and ActiveMQ, which use TCP as the underlying protocol. Apart from reliable transmission, TCP provides error detection, flow control, and congestion control, which help ensure message delivery. ActiveMQ and Kafka also have similar message delivery times according to our results. This means that either of them can be used to implement our notification system without a decrease in performance.
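The reduction in load shown in Figures 5 and 6 can be motivated by a simple count of the messages involved. The model below is a back-of-the-envelope sketch in Python, not a reproduction of the measurements: the function names are hypothetical, and the one-hour wait and 10-second polling interval are assumed values chosen only to make the arithmetic concrete. The figure of 10,000 waiting jobs is taken from Section 4.2.

```python
def polling_requests(num_jobs, wait_seconds, poll_interval_seconds):
    """Requests sent to the NameNode while `num_jobs` jobs poll until data arrives."""
    # Each job polls once per interval, plus the final poll that succeeds.
    polls_per_job = wait_seconds // poll_interval_seconds + 1
    return num_jobs * polls_per_job


def notification_messages(num_jobs):
    """One subscribe per job, one publish, and one delivery per job."""
    return num_jobs + 1 + num_jobs


polling = polling_requests(num_jobs=10_000, wait_seconds=3600, poll_interval_seconds=10)
notified = notification_messages(num_jobs=10_000)
print(polling, notified)  # prints "3610000 20001"
```

Under these assumptions polling generates about 3.6 million requests, against roughly twenty thousand messages with notifications, and the polling cost keeps growing with the waiting time while the notification cost does not.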
6 Conclusion
We have presented a distributed notification system for Hadoop based on the Publish-Subscribe model. Event-based notification systems like this not only reduce the load on Hadoop components such as the NameNode, but also increase the productivity of developers using Hadoop. As we have shown in our use cases, this notification system can be used to lower network bandwidth usage by reducing the number of redundant network packets; it can also chain multiple MapReduce jobs together to accomplish a complex task. Scalability and reliability are further reasons to deploy our notification system in the use cases we have mentioned, such as message-passing between Hadoop services or chaining jobs. Because our proposed system is extensible, it can also be used with other Hadoop services in the future. The scalability of such a system means that large Hadoop clusters (more than 100 nodes) stand to benefit from it to a great extent. In such environments, efficient use of the ‘Publish’ and ‘Subscribe’ modules can help job control frameworks handle heavily interdependent jobs and computations, resulting in better use of hardware and resources. This system can also reduce network bandwidth and resource usage if integrated with the core Hadoop framework. As the complexity of job control frameworks increases over the years, our notification system can help Hadoop users address these difficulties.
References

1. Apache ActiveMQ, http://activemq.apache.org/
2. Apache Hadoop Project, http://hadoop.apache.org
3. Apache Kafka, https://github.com/apache/kafka
4. Apache Oozie, https://oozie.apache.org/
5. Apache ZooKeeper, http://zookeeper.apache.org/
6. Ganglia Monitoring System, http://ganglia.sourceforge.net/
7. Java Message Service, http://en.wikipedia.org/wiki/Java_Message_Service
8. LinkedIn Azkaban, http://azkaban.github.io/azkaban2/
9. Spotify Luigi, https://github.com/spotify/luigi
10. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, OSDI 2004, pp. 137–149 (2004)
11. Eugster, P.T., Felber, P.A., Guerraoui, R., Kermarrec, A.-M.: The many faces of publish/subscribe. ACM Computing Surveys 35(2), 114–131 (2003)
12. Hohpe, G., Woolf, B.: Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley (2012)
13. Islam, M.K.: Oozie: Scheduling workflows on the grid. In: Hadoop Summit (2011)
14. Kreps, J., Narkhede, N., Rao, J.: Kafka: a distributed messaging system for log processing. In: NetDB: Networking Meets Database, NetDB 2011 (2011)