Masterless ACK - Google Groups

Masterless ACK: Enhancing the Reliability and Performance of Flume

Yongkun Wang Next Generation Search Group, Rakuten, Inc.

Flume Reliability

Flume provides an "End-to-End" mode to ensure data delivery.

- First, the agent writes the event to disk in a write-ahead log (WAL).
- After the destination receives the event, an acknowledgment (ACK) is sent back to the originating agent.
- On receiving the ACK, the agent can remove the log entry for this event.
- If the ACK is not received before the timeout expires, the agent resends the event.
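The end-to-end cycle above can be sketched in Java. This is a minimal in-memory illustration, not Flume's actual WAL code: the class `WalSketch`, its method names, and the explicit clock argument are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an agent's write-ahead log (WAL).
// Not Flume's implementation: all names here are assumptions.
class WalSketch {
    private final long ackTimeoutMillis;
    // Event ID -> time the event was last sent, kept in insertion order.
    private final Map<String, Long> pending = new LinkedHashMap<>();

    WalSketch(long ackTimeoutMillis) {
        this.ackTimeoutMillis = ackTimeoutMillis;
    }

    // Log the event before it is sent downstream.
    void append(String eventId, long nowMillis) {
        pending.put(eventId, nowMillis);
    }

    // An ACK arrived, so the log entry can be removed.
    // Returns false for an unknown or already acknowledged event.
    boolean onAck(String eventId) {
        return pending.remove(eventId) != null;
    }

    // Events whose ACK did not arrive within the timeout must be resent.
    // The send time is refreshed so an event is retried once per timeout period.
    List<String> collectResends(long nowMillis) {
        List<String> resend = new ArrayList<>();
        for (Map.Entry<String, Long> entry : pending.entrySet()) {
            if (nowMillis - entry.getValue() >= ackTimeoutMillis) {
                resend.add(entry.getKey());
                entry.setValue(nowMillis);
            }
        }
        return resend;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        WalSketch wal = new WalSketch(1000);
        wal.append("log.0001", 0);
        wal.append("log.0002", 0);
        wal.onAck("log.0001");                        // entry removed
        System.out.println(wal.collectResends(1500)); // prints [log.0002]
        System.out.println(wal.pendingCount());       // prints 1
    }
}
```

The sketch makes the cost of a lost ACK visible: an entry whose ACK never arrives stays in `pending` (disk space in the real WAL) and is returned by every later `collectResends` call (duplicate data downstream).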

Successful delivery of the ACK is critical in this mode. Otherwise:

- Duplicate data is received because of timeout resending.
- A large amount of disk space is consumed because the log entries are not removed.


Yongkun Wang, NGS, ACT, DU, [email protected]

Problems of Current ACK Design

[Figure: agents (Source, Sink) send data to collectors (Source, Sink), which write aggregated data to HDFS. Nodes report liveness to the masters; collectors send ACKs to the master's ACK queue, and agents check for ACKs on heartbeat.]

The ACKs are not directly sent back, but via the master. Issues:

- Once the Master crashes, all data without ACKs needs to be sent again.
- The Master might become a bottleneck because of the large number of ACKs.
- An agent has to wait until the next heartbeat to get its ACKs.

The multi-master scheme with replication has the same issues:

- ACKs can be lost during the replication interval.


Masterless ACK

- Send the ACKs to the previous Agent/Collector instead of sending them to the Master.
- Let the ACKs carry the route information by adding a host list to each ACK (ACK: Host 1, Host 2, Host 3, …).
- Reuse the Event connection.
- The collector pushes ACKs back once they are ready; ACKs are not pulled back by the agent.



Masterless ACK Design (1)

For each Flume node, either agent or collector, start two threads:

- The Source starts a Distributor thread to distribute ACKs.
- The Sink starts a Receiver thread to wait for ACKs.

[Figure: a chain of Flume nodes, each with a Source and a Sink. Each Source starts an Ack Distributor and each Sink starts an Ack Receiver; ACKs flow back from node to node. Decision nodes: "Collector sink?" and "Is destination?". On YES the ACK goes to the WAL Manager; on NO it goes to the Ack Distributor.]
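The "Is destination?" decision in the flow above can be sketched as follows. This is an illustrative sketch only: the slides do not say how a node recognizes that it is the destination, so the code assumes that an ACK whose remaining host list is empty has arrived. `AckReceiverSketch`, `Ack`, and the two handlers are hypothetical names, not Flume classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the dispatch step performed by an Ack Receiver.
class AckReceiverSketch {
    // An ACK: the ID of the acknowledged data plus the hosts still to visit.
    static final class Ack {
        final String id;
        final List<String> remainingHosts;

        Ack(String id, List<String> remainingHosts) {
            this.id = id;
            this.remainingHosts =
                Collections.unmodifiableList(new ArrayList<>(remainingHosts));
        }
    }

    private final Consumer<String> walManager; // told which log entry may be removed
    private final Consumer<Ack> distributor;   // forwards the ACK further upstream

    AckReceiverSketch(Consumer<String> walManager, Consumer<Ack> distributor) {
        this.walManager = walManager;
        this.distributor = distributor;
    }

    // Assumption: no hosts left to visit means this node originated the event.
    boolean isDestination(Ack ack) {
        return ack.remainingHosts.isEmpty();
    }

    void onAck(Ack ack) {
        if (isDestination(ack)) {
            walManager.accept(ack.id); // YES branch: hand over to the WAL Manager
        } else {
            distributor.accept(ack);   // NO branch: hand over to the Ack Distributor
        }
    }
}
```

Passing the two branches in as `Consumer` callbacks keeps the sketch independent of how the WAL Manager and Distributor are actually implemented.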


Masterless ACK Design (2)

[Figure: an Event travels from the Agent on Host 1 through the Collector on Host 2 to the Collector on Host 3, accumulating a host list on the way (Host 1, then Host 1, Host 2). The ACK travels back with the reversed list (Host 2, Host 1), passing from each Source's ACK Distributor to the previous node's ACK Receiver.]

- When the connection between a sink and a source is established, the Distributor records the connection info.
- The Event records each host it passes through.
- The host list is passed to the ACK once the Event is saved at the destination.
- The ACK is sent along the reversed host list.
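The host-list bookkeeping described above can be sketched in a few lines of Java. This is a hedged illustration rather than the patch itself: `AckRouteSketch`, `Event`, `recordHost`, and `ackRouteFor` are hypothetical names, and the sketch assumes, following the figure, that the destination host does not add itself to the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: an event collects the hosts it passes through,
// and the ACK route is that list reversed.
class AckRouteSketch {
    static final class Event {
        final String id;
        private final List<String> hosts = new ArrayList<>();

        Event(String id) {
            this.id = id;
        }

        // Called by each node that forwards the event.
        void recordHost(String host) {
            hosts.add(host);
        }

        List<String> hosts() {
            return Collections.unmodifiableList(hosts);
        }
    }

    // Called at the destination once the event has been saved.
    static List<String> ackRouteFor(Event event) {
        List<String> route = new ArrayList<>(event.hosts());
        Collections.reverse(route); // the nearest upstream host comes first
        return route;
    }

    public static void main(String[] args) {
        Event event = new Event("log.0001");
        event.recordHost("Host 1"); // agent
        event.recordHost("Host 2"); // intermediate collector
        System.out.println(ackRouteFor(event)); // prints [Host 2, Host 1]
    }
}
```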

Implementation with Thrift

- Implemented in Flume-0.9.3.
- The Masterless ACK is transported by thrift-0.6.0.
- It should work with the default thrift-0.5.0.

[Figure: in a FlumeNode, the Source runs the ACK Distributor thread (with its connection list) and the Sink runs the ACK Receiver thread. A Thrift ACK Adaptor maps these to a Thrift ACK Distributor thread (with a Thrift connection list) and a Thrift ACK Receiver thread, which exchange Thrift ACKs over Thrift RPC.]


Reuse Connection with Thrift

When the Source is open, get the connection and add it to Distributor’s Connection List

    // ThriftEventSource class
    synchronized public void open() throws IOException {
        try {
            // Start a new Distributor thread
            FlumeNode.getInstance().setAckDistributor(new ThriftAckDistributor());
            new Thread(FlumeNode.getInstance().getAckDistributor()).start();
            …
            // Override getProcessor() of Thrift and get the connection (Thrift client)
            // when the Event connection is established. The connection is enqueued
            // and used for the ACK transmission.
            TProcessorFactory processorFactory = new TProcessorFactory(null) {
                @Override
                public TProcessor getProcessor(TTransport trans) {
                    // Add a client connection to the queue
                    FlumeNode.getInstance().getAckDistributor().addClient(new AckServiceClient(trans));
                    return new ThriftFlumeEventServer.Processor( … ));
                }
            };
            …
    }


Distribute the ACK

With the Connection List, the Ack Distributor will:

- Pop the next host from the ACK's host list,
- Compare the host name (IP is better?) with the entries in the Connection List,
- Send the ACK to the corresponding host(s).

[Figure: the Ack Distributor takes ACKs (Ack 1, Ack 2, Ack 3) from the Ack Queue, each carrying its host list, and matches them against the Connection List (Conn (Host1), Conn (Host1), Conn (Host2), Conn (Host3)) to reach Agent 1 and Collectors 1 to 3.]
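The pop, compare, and send steps above could look roughly like this. It is a simplified sketch under stated assumptions, not the Thrift-based implementation: `AckDistributorSketch` and its `AckClient` interface are hypothetical, connections are keyed by host name, and an ACK is sent to every connection registered for the popped host, which is one reading of "Host(s)".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an Ack Distributor with a per-host connection list.
class AckDistributorSketch {
    // Stand-in for a client connection that can carry an ACK upstream.
    interface AckClient {
        void send(String ackId, List<String> remainingHosts);
    }

    // Connection list: host name -> connections opened from that host.
    private final Map<String, List<AckClient>> connections = new HashMap<>();

    // Called when an Event connection from the given host is established.
    void addClient(String host, AckClient client) {
        connections.computeIfAbsent(host, h -> new ArrayList<>()).add(client);
    }

    // Returns the number of connections the ACK was sent to (0 if none matched).
    int distribute(String ackId, List<String> hostList) {
        if (hostList.isEmpty()) {
            return 0; // nothing to pop: no further hop
        }
        Deque<String> remaining = new ArrayDeque<>(hostList);
        String nextHost = remaining.pop();                // pop the next hop
        List<AckClient> matches = connections.get(nextHost); // compare by host name
        if (matches == null) {
            return 0; // no connection recorded for that host
        }
        List<String> rest = new ArrayList<>(remaining);
        for (AckClient client : matches) {
            client.send(ackId, rest);                     // send to matching host(s)
        }
        return matches.size();
    }
}
```

Keying the map by host name mirrors the comparison described on the slide; switching to IP addresses, as the slide suggests might be better, would only change the key that `addClient` and `distribute` use.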


Test

Configuration:

agent1 → collector1 → collector2

    agent1: tail("/tmp/log") | agentSink("localhost", 35853);
    collector1: collectorSource(35853) | agentBESink("localhost", 35854);
    collector2: collectorSource(35854) | collectorSink("file:///tmp/temp", "test-");

- Use "agentBESink" for the middle nodes.
- When all nodes are on the same host, use a different port for each source/sink pair.
- When nodes are on different hosts, the same port, or any available port, can be used for all source/sink pairs.

Log messages:

collector2:

    Push back ack: log.xxxxxxxxxxxxxxx

collector1:

    Got ack: log.xxxxxxxxxxxxxxx
    Not destination. Send ack to distributor. Ack ID: log.xxxxxxxxxxxxxxx
    Push back ack: log.xxxxxxxxxxxxxxx

agent1:

    Got ack: log.xxxxxxxxxxxxxxx
    Ack reaches destination. Ack ID: log.xxxxxxxxxxxxxxx

Thank you very much! Q&A Yongkun Wang [email protected] Next Generation Search Group, Rakuten, Inc.
