Masterless ACK: Enhance the Reliability and Performance of Flume
Yongkun Wang Next Generation Search Group, Rakuten, Inc.
Flume Reliability
Flume provides an “End-to-End” mode to ensure the data delivery.
  First, the agent writes the event to disk in a write-ahead log (WAL).
  An acknowledgment (ACK) is sent back to the originating agent after the destination receives the event.
  The agent can then remove the log entry for this event.
  If the ACK is not received before the timeout, the agent resends the event.
The successful delivery of ACKs is critical in this mode; otherwise:
  Duplicate data is received because of timeout resends.
  A large amount of disk space is consumed because the log entries are not removed.
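The end-to-end steps above can be sketched roughly as follows. This is a minimal illustration, not Flume's actual WAL code: the class `WalSketch` and its methods are hypothetical names, and entries are kept in memory for brevity, whereas the real write-ahead log is on disk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the agent-side bookkeeping; not Flume's actual WAL classes.
class WalSketch {
    // Pending entries: event ID -> time (ms) at which the event was last sent.
    private final Map<String, Long> pending = new LinkedHashMap<>();
    private final long timeoutMillis;

    WalSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Log the event before it is sent downstream.
    void append(String eventId, long nowMillis) {
        pending.put(eventId, nowMillis);
    }

    // An ACK arrived: the log entry can be removed. Returns false for unknown IDs.
    boolean onAck(String eventId) {
        return pending.remove(eventId) != null;
    }

    // Return the events whose ACK has timed out and mark them as re-sent now.
    List<String> collectTimedOut(long nowMillis) {
        List<String> resend = new ArrayList<>();
        for (Map.Entry<String, Long> entry : pending.entrySet()) {
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                resend.add(entry.getKey());
                entry.setValue(nowMillis);
            }
        }
        return resend;
    }

    // Number of log entries still waiting for an ACK.
    int size() {
        return pending.size();
    }
}
```

If an ACK is lost, the entry never leaves `pending`: it is returned by every later `collectTimedOut` call (duplicate data) and keeps occupying the log (disk space), which are the two failure modes listed above.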
Yongkun Wang, NGS, ACT, DU,
[email protected]
Problems of Current ACK Design
[Figure: three masters holding an ACK queue; an agent and collectors, each with a Source and a Sink. Arrow labels: Liveness, Data, Aggregated Data, Data to HDFS, Send ACKs, Heartbeat / Check ACKs.]
The ACKs are not directly sent back, but via the master. Issues:
  Once the master crashes, all data without ACKs needs to be sent again.
  The master might become a bottleneck due to the large number of ACKs.
  The agent needs to wait until the next heartbeat to get the ACKs.
Multi-master scheme with replication has the same issues:
  ACKs can be lost during the replication interval.
Masterless ACK
Sending the ACKs to previous Agent/Collector
Instead of sending them to Master
Let the ACKs carry the route information.
  Add host list to ACK: [ACK | Host 1 | Host 2 | Host 3 | …]
Reuse the Event connection.
Push back ACKs by collector once ACKs are ready.
  ACKs are not pulled back by agent.
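As an illustration of an ACK that carries its own route, the following sketch pairs an ACK ID with an ordered host list. The class name `RoutedAck` and its methods are hypothetical and are not taken from the Flume code base.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical ACK message that carries its return route as a host list.
class RoutedAck {
    private final String ackId;
    // Hosts still to visit on the way back; the head of the queue is the next hop.
    private final Deque<String> hosts;

    RoutedAck(String ackId, List<String> returnRoute) {
        this.ackId = ackId;
        this.hosts = new ArrayDeque<>(returnRoute);
    }

    String ackId() {
        return ackId;
    }

    // Remove and return the next host, or null when the route is exhausted.
    String popNextHost() {
        return hosts.pollFirst();
    }

    // Number of hops the ACK still has to travel.
    int remainingHops() {
        return hosts.size();
    }
}
```

Because the route travels with the ACK, no central component needs to know where an ACK should go, which is what removes the master from the ACK path.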
Masterless ACK Design (1)
For each Flume node, either agent or collector, start two threads:
  The Source starts a Distributor thread to distribute ACKs.
  The Sink starts a Receiver thread to wait for ACKs.
[Figure: a chain of Flume nodes, each with a Source that starts an Ack Distributor and a Sink that starts an Ack Receiver; ACKs flow back from Sink to Source. Flowchart labels: Collector sink? Is destination? YES: WAL Manager. NO: Ack Distributor.]
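The decision in the flowchart can be sketched as below. This is a simplified stand-in, not the actual implementation: `AckReceiverSketch` is a hypothetical class, the WAL Manager and Ack Distributor are reduced to lists that record ACK IDs, and the "Is destination?" check is assumed to mean that no hosts remain on the ACK's route.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Ack Receiver's routing decision.
class AckReceiverSketch {
    // Stand-ins for the WAL Manager and the Ack Distributor: they only record ACK IDs.
    final List<String> walRemovals = new ArrayList<>();
    final List<String> distributorQueue = new ArrayList<>();

    // remainingHosts: hosts the ACK still has to pass (assumed empty at the destination).
    // Returns a log line describing the decision.
    String onAck(String ackId, List<String> remainingHosts) {
        if (remainingHosts.isEmpty()) {
            // YES branch: hand the ACK to the WAL Manager so the log entry is removed.
            walRemovals.add(ackId);
            return "Ack reaches destination. Ack ID: " + ackId;
        }
        // NO branch: hand the ACK to this node's Distributor for the next hop.
        distributorQueue.add(ackId);
        return "Not destination. Send ack to distributor. Ack ID: " + ackId;
    }
}
```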
Masterless ACK Design (2)
[Figure: Agents and Collectors on Host 1, Host 2, and Host 3, each with a Source (ACK Distributor) and a Sink (ACK Receiver). The Event collects the host list (Host 1, Host 2) as it moves forward; the ACK travels back carrying the reversed list (Host 2, Host 1).]
When the connection between sink and source is established, the Distributor records the connection info.
The Event records the hosts it passes by.
The host list is passed to the ACK once the Event is saved to the destination.
The ACK is sent back along the reversed host list.
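A minimal sketch of how an Event's recorded path could become the ACK's route is shown below. The class `EventPathSketch` and its method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: an Event records the hosts it passes, and the ACK route is the reverse.
class EventPathSketch {
    private final List<String> visitedHosts = new ArrayList<>();

    // Called on each node the Event passes through on its way to the destination.
    void recordHop(String host) {
        visitedHosts.add(host);
    }

    // Called once the Event is saved at the destination: the ACK goes back the way the Event came.
    List<String> ackRoute() {
        List<String> route = new ArrayList<>(visitedHosts);
        Collections.reverse(route);
        return route;
    }
}
```

For an Event that passed Host 1 and then Host 2, `ackRoute()` yields Host 2 followed by Host 1, so the ACK retraces the Event's path hop by hop.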
Implementation with Thrift
Implemented in Flume-0.9.3.
Masterless ACK is transported by thrift-0.6.0.
  Should work with the default thrift-0.5.0.
[Figure: a FlumeNode with Source and Sink. Labels: ACK Distributor thread with Conn List, ACK Receiver thread, Thrift ACK Adaptor, Thrift ACK Distributor thread with Thrift Conn List, Thrift ACK Receiver thread, Thrift ACK sent over Thrift RPC.]
Reuse Connection with Thrift
When the Source is open, get the connection and add it to Distributor’s Connection List
// ThriftEventSource class
synchronized public void open() throws IOException {
  try {
    // Start a new Distributor thread
    FlumeNode.getInstance().setAckDistributor(new ThriftAckDistributor());
    new Thread(FlumeNode.getInstance().getAckDistributor()).start();
    …
    // Override getProcessor() of Thrift and get the connection (Thrift client) when the Event
    // connection is established. The connection is enqueued and used for the ACK transmission.
    TProcessorFactory processorFactory = new TProcessorFactory(null) {
      @Override
      public TProcessor getProcessor(TTransport trans) {
        // Add a client connection to the queue
        FlumeNode.getInstance().getAckDistributor().addClient(new AckServiceClient(trans));
        return new ThriftFlumeEventServer.Processor( … ));
      }
    };
    …
}
Distribute the ACK
With the Connection List, the Ack Distributor will:
  Pop the host from the ACK's host list,
  Compare the host name (IP is better?) with that in the Connection List,
  Send that ACK to the corresponding host(s).
[Figure: the Ack Distributor with its Connection List (Conn (Host1), Conn (Host1), Conn (Host2), Conn (Host3)) for Agent 1 and Collectors 1 to 3, and an Ack Queue holding Ack 1, Ack 2, and Ack 3, each with its host list.]
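The three distribution steps can be sketched as follows. This is an illustration under stated assumptions rather than the Thrift-based implementation: `AckDistributorSketch` and its nested `Conn` class are hypothetical, and a connection is modeled as an object that simply records the ACK IDs sent to it. Since several connections may share a host name, the sketch sends the ACK to every match.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the Ack Distributor's matching step.
class AckDistributorSketch {
    // Stand-in for a client connection; it only records which ACKs were sent over it.
    static class Conn {
        final String host;
        final List<String> sentAckIds = new ArrayList<>();

        Conn(String host) {
            this.host = host;
        }
    }

    private final List<Conn> connections = new ArrayList<>();

    // Called when an Event connection is established.
    void addClient(Conn conn) {
        connections.add(conn);
    }

    // Pop the next host from the ACK's route and send the ACK over every
    // connection whose host name matches. Returns the number of sends.
    int distribute(String ackId, Deque<String> route) {
        String nextHost = route.pollFirst();
        if (nextHost == null) {
            return 0; // empty route: nothing to forward
        }
        int sent = 0;
        for (Conn conn : connections) {
            if (conn.host.equals(nextHost)) {
                conn.sentAckIds.add(ackId);
                sent++;
            }
        }
        return sent;
    }
}
```

Matching on host name alone cannot tell apart two nodes on the same host, which may be one reason the slide asks whether another key would be better.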
Test
Configuration:
agent1 -> collector1 -> collector2
  agent1: tail("/tmp/log") | agentSink("localhost", 35853);
  collector1: collectorSource(35853) | agentBESink("localhost", 35854);
  collector2: collectorSource(35854) | collectorSink("file:///tmp/temp", "test-");
Should use "agentBESink" for the middle nodes.
Use a different port for each source/sink pair when all nodes are on the same host.
When nodes are not on the same host, the same port can be used for all source/sink pairs, or any available port.
Log message:
collector2:
  Push back ack: log.xxxxxxxxxxxxxxx
collector1:
  Got ack: log.xxxxxxxxxxxxxxx
  Not destination. Send ack to distributor. Ack ID: log.xxxxxxxxxxxxxxx
  Push back ack: log.xxxxxxxxxxxxxxx
agent1:
  Got ack: log.xxxxxxxxxxxxxxx
  Ack reaches destination. Ack ID: log.xxxxxxxxxxxxxxx
[Figure: agent1, collector1, collector2, and Master.]
Thank you very much! Q&A
Yongkun Wang, Next Generation Search Group, Rakuten, Inc.
[email protected]