Reducing Microservice Complexity with Kafka and Reactive Streams
Jim Riecken, Specialist Software Developer
@jimriecken
Agenda • Monolith to Microservices + Complexity • Asynchronous Messaging • Kafka • Reactive Streams + Akka Streams
Anti-Agenda • Details on how to set up a Kafka cluster • In-depth tutorial on Akka Streams
Monolith to Microservices
[Diagrams: Efficiency vs. Time, first for a monolith (M), then as it is split into services S1 to S5]
• Small • Scalable • Independent • Easy to Create • Clear ownership
Network Calls
• Latency • Failure
Reliability
[Diagram: services at 99.9% each, ~99.5% combined]
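The ~99.5% figure is consistent with compounding availabilities: if a request has to pass through several services that each succeed 99.9% of the time, the success probabilities multiply. A minimal sketch of that arithmetic follows. It assumes failures are independent and that every hop must succeed; `Reliability.combined` is an illustrative name, and the choice of five hops is an assumption picked because it reproduces roughly 99.5%.

```scala
object Reliability {
  /** Probability that a request succeeds when it must pass through every
    * service in the chain. Assumes failures are independent of each other. */
  def combined(perService: Double, services: Int): Double =
    math.pow(perService, services.toDouble)
}
```

For example, `Reliability.combined(0.999, 5)` is about 0.995, so five 99.9% hops already cost roughly half a percent of requests.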
Coordination
• Between services • Between teams
Asynchronous Messaging
Synchronous
Asynchronous Message Bus
Why? • Decoupling • Pub/Sub • Less coordination • Additional consumers are easy • Help scale organization
Messaging Requirements • Well-defined delivery semantics • High-Throughput • Highly-Available • Durable • Scalable • Backpressure
Kafka
What is Kafka? • Distributed, partitioned, replicated commit log service • Pub/Sub messaging functionality • Created by LinkedIn, now an Apache open-source project
Producers
Kafka Brokers
Consumers
Topics + Partitions
Topic
P0: 0 | 1 | 2 | 3 | 4 | 5
P1: 0 | 1 | 2 | 3 | 4 | 5 | 6
P2: 0 | 1 | 2 | 3
New Messages Appended
Producers • Send messages to topics • Responsible for choosing which partition to send to • Round-robin • Consistent hashing based on a message key
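The two partition-selection strategies can be sketched in a few lines. This is an illustration only: `PartitionChooser` is a hypothetical class, and the keyed variant uses a plain hash-modulo as a stand-in. It is neither a true consistent-hash ring nor Kafka's own partitioner (which hashes the key bytes with murmur2).

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.concurrent.atomic.AtomicLong

/** Illustrative partition chooser; not Kafka's actual partitioner. */
final class PartitionChooser(numPartitions: Int) {
  require(numPartitions > 0, "need at least one partition")
  private val counter = new AtomicLong(0)

  /** Keyless messages: spread evenly by cycling through the partitions. */
  def roundRobin(): Int =
    (counter.getAndIncrement() % numPartitions).toInt

  /** Keyed messages: the same key always maps to the same partition. */
  def forKey(key: String): Int = {
    val hash = java.util.Arrays.hashCode(key.getBytes(UTF_8))
    Math.floorMod(hash, numPartitions) // floorMod keeps the result non-negative
  }
}
```

Routing by key matters because ordering is only preserved within a partition: sending every message for one key to the same partition keeps that key's messages in order.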
Consumers • Pull messages from topics • Track their own offset in each partition
[Diagram: partitions P0, P1, P2 of a topic read by consumers 1 to 6, split across Group 1 and Group 2]
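The pull model with consumer-owned offsets can be modelled with a toy, in-memory log. This is a sketch under simplifying assumptions (one partition, no persistence, no groups); `PartitionLog` and `PullConsumer` are hypothetical names, not Kafka classes.

```scala
import scala.collection.mutable

/** Toy single-partition log: messages are appended and never removed on read. */
final class PartitionLog[A] {
  private val entries = mutable.ArrayBuffer.empty[A]

  /** Appends a message and returns the offset it was stored at. */
  def append(msg: A): Long = { entries += msg; entries.size - 1L }

  /** Returns up to `max` messages starting at `offset`. */
  def read(offset: Long, max: Int): Vector[A] =
    entries.slice(offset.toInt, offset.toInt + max).toVector
}

/** Each consumer owns its offset; the log knows nothing about its readers. */
final class PullConsumer[A](log: PartitionLog[A]) {
  private var offset = 0L

  /** Pulls the next batch and advances this consumer's own offset. */
  def poll(max: Int): Vector[A] = {
    val batch = log.read(offset, max)
    offset += batch.size
    batch
  }

  def position: Long = offset
}
```

Because reading does not consume anything, a fast and a slow `PullConsumer` can share one log and each progress at its own pace.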
How does Kafka meet the requirements?
Kafka is Fast • Hundreds of MB/s of reads/writes from thousands of concurrent clients • LinkedIn (2015) • 800 billion messages per day (18 million/s peak) • 175 TB of data produced per day • > 1000 servers in 60 clusters
Kafka is Resilient • Brokers • All data is persisted to disk • Partitions replicated to other nodes
• Consumers • Start where they left off
• Producers • Can retry - at-least-once messaging
Kafka is Scalable • Capacity can be added at runtime with zero downtime • More servers => more disk space
• Topics can be larger than any single node could hold • Additional partitions can be added to add more parallelism
Kafka Helps with Back-Pressure • Large storage capacity • Topic retention is a Consumer SLA
• Almost impossible for a fast producer to overload a slow consumer • Allows real-time as well as batch consumption
Message Data Format
Messages • Array[Byte] • Serialization? • JSON? • Protocol Buffers • Binary - Fast • IDL - Code Generation • Message evolution
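Since the broker only sees `Array[Byte]`, every message type needs a codec on both sides. The sketch below hand-rolls a tiny binary layout to show what that involves. `UserEvent` and `UserEventCodec` are hypothetical, and the layout (8-byte id, 4-byte length, UTF-8 bytes) is an assumption for illustration; it is a stand-in for the code Protocol Buffers would generate from an IDL, and unlike Protocol Buffers it has no support for message evolution.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

/** Hypothetical message type used only for this illustration. */
final case class UserEvent(userId: Long, action: String)

object UserEventCodec {
  /** Layout: 8-byte userId, 4-byte action length, then the action as UTF-8. */
  def toBytes(e: UserEvent): Array[Byte] = {
    val action = e.action.getBytes(UTF_8)
    ByteBuffer.allocate(8 + 4 + action.length)
      .putLong(e.userId)
      .putInt(action.length)
      .put(action)
      .array()
  }

  /** Reads the fields back in the order they were written. */
  def fromBytes(bytes: Array[Byte]): UserEvent = {
    val buf = ByteBuffer.wrap(bytes)
    val userId = buf.getLong()
    val action = new Array[Byte](buf.getInt())
    buf.get(action)
    UserEvent(userId, new String(action, UTF_8))
  }
}
```

Adding or reordering a field in this layout would break every existing reader, which is the problem an IDL with explicit field tags is meant to solve.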
Processing Data with Reactive Streams
Reactive Streams • Standard for async stream processing with non-blocking back-pressure • Subscriber signals demand to publisher • Publisher sends no more than demand
• Low-level • Mainly meant for library authors
Publisher[T]: subscribe(s: Subscriber[-T])
Subscriber[T]: onSubscribe(s: Subscription), onNext(t: T), onComplete(), onError(t: Throwable)
Subscription: request(n: Long), cancel()
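To make the demand protocol concrete, here is a minimal synchronous sketch. The three traits are local stand-ins that mirror the signatures above (the real interfaces are the Java ones in `org.reactivestreams`); `RangePublisher` and `BatchingSubscriber` are hypothetical, single-threaded, and skip most of the rules a spec-compliant implementation must follow.

```scala
import scala.collection.mutable

// Local stand-ins for the Reactive Streams interfaces.
trait Subscription { def request(n: Long): Unit; def cancel(): Unit }
trait Subscriber[-T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(t: T): Unit
  def onComplete(): Unit
  def onError(t: Throwable): Unit
}
trait Publisher[T] { def subscribe(s: Subscriber[T]): Unit }

/** Emits from..to, but never more elements than have been requested. */
final class RangePublisher(from: Int, to: Int) extends Publisher[Int] {
  def subscribe(s: Subscriber[Int]): Unit = s.onSubscribe(new Subscription {
    private var next = from
    private var demand = 0L
    private var emitting = false
    private var done = false

    def request(n: Long): Unit =
      if (!done) {
        demand += n
        // A re-entrant request from inside onNext only adds demand;
        // the loop already running further up the stack will serve it.
        if (!emitting) {
          emitting = true
          while (!done && demand > 0 && next <= to) {
            demand -= 1
            val elem = next
            next += 1
            s.onNext(elem)
          }
          emitting = false
          if (!done && next > to) { done = true; s.onComplete() }
        }
      }

    def cancel(): Unit = done = true
  })
}

/** Requests `batch` elements at a time, asking again once a batch has arrived. */
final class BatchingSubscriber(batch: Int) extends Subscriber[Int] {
  val received = mutable.ArrayBuffer.empty[Int]
  var completed = false
  private var subscription: Option[Subscription] = None
  private var outstanding = 0

  def onSubscribe(s: Subscription): Unit = {
    subscription = Some(s)
    outstanding = batch
    s.request(batch.toLong)
  }
  def onNext(t: Int): Unit = {
    received += t
    outstanding -= 1
    if (outstanding == 0) {
      outstanding = batch
      subscription.foreach(_.request(batch.toLong))
    }
  }
  def onComplete(): Unit = completed = true
  def onError(t: Throwable): Unit = throw t
}
```

A subscriber that calls `request(3)` once and never again receives exactly three elements, however many the publisher could produce; that bound is what "non-blocking back-pressure" refers to.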
Processing Data with Akka Streams
Akka Streams • Library on top of Akka Actors and Reactive Streams • Process sequences of elements using bounded buffer space • Strongly Typed
Concepts • Source • Flow • Sink • Fan Out • Fan In • Runnable Graph
Composition
Materialization • Turning on the tap • Create actors • Open files/sockets/other resources
• Materialized values • Source: Actor, Promise, Subscriber • Sink: Actor, Future, Producer
Reactive Kafka
Reactive Kafka • https://github.com/akka/reactive-kafka • Akka Streams wrapper around Kafka API • Consumer Source • Producer Sink
Producer • Sink - sends message to Kafka topic • Flow - sends message to Kafka topic + emits result downstream • When the stream completes/fails the connection to Kafka will be automatically closed
Consumer • Source - pulls messages from Kafka topics • Offset Management • Back-pressure • Materialization • Object that can stop the consumer (and complete the stream)
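Simple Producer Example
import akka.actor.ActorSystem
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.{ByteArraySerializer, StringSerializer}

implicit val system = ActorSystem("producer-test")
implicit val materializer = ActorMaterializer()

// Keys are raw bytes, values are strings
val producerSettings = ProducerSettings(
  system, new ByteArraySerializer, new StringSerializer
).withBootstrapServers("localhost:9092")

// Publish "Message 1" to "Message 100" to the "lower" topic
Source(1 to 100)
  .map(i => s"Message $i")
  .map(m => new ProducerRecord[Array[Byte], String]("lower", m))
  .to(Producer.plainSink(producerSettings)).run()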
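Simple Consumer Example
import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}

implicit val system = ActorSystem("producer-test")
implicit val materializer = ActorMaterializer()

val consumerSettings = ConsumerSettings(
  system, new ByteArrayDeserializer, new StringDeserializer
).withBootstrapServers("localhost:9092").withGroupId("test-group")

// Print every value read from the "lower" topic
val control = Consumer.atMostOnceSource(
    consumerSettings.withClientId("client1"), Subscriptions.topics("lower"))
  .map(record => record.value)
  .to(Sink.foreach(v => println(v))).run()

// Stop the consumer, which also completes the stream
control.stop()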
Combined Example
// Read from "lower", upper-case each value, write it to "upper",
// and commit the consumed offset once the record has been produced
val control = Consumer.committableSource(
    consumerSettings.withClientId("client1"), Subscriptions.topics("lower"))
  .map { msg =>
    val upper = msg.value.toUpperCase
    ProducerMessage.Message(
      new ProducerRecord[Array[Byte], String]("upper", upper),
      msg.committableOffset)
  }.to(Producer.commitableSink(producerSettings)).run()

control.stop()
Demo
Wrap-Up
Wrap-Up • Microservices have many advantages, but can introduce failure and complexity. • Asynchronous messaging can help reduce this complexity and Kafka is a great option. • Akka Streams makes reliably processing data from Kafka with back-pressure easy.
Thank you! Questions? Jim Riecken @jimriecken