Reducing Microservice Complexity with Kafka and Reactive Streams - Scala Up North
Jim Riecken - Specialist Software Developer
@jimriecken - [email protected]


Agenda
• Monolith to Microservices + Complexity
• Asynchronous Messaging
• Kafka
• Reactive Streams + Akka Streams

Anti-Agenda • Details on how to set up a Kafka cluster • In-depth tutorial on Akka Streams

Monolith to Microservices

[Diagram: efficiency over time as a monolith (M) is incrementally split into services (S1-S5)]

Microservices
• Small
• Scalable
• Independent
• Easy to create
• Clear ownership

Network Calls

• Latency • Failure

Reliability

[Diagram: a synchronous chain of services, each individually 99.9% reliable, is only ~99.5% reliable end-to-end]
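The compounding above can be checked directly. Assuming a chain of five services (the exact count in the diagram is an assumption), each hop multiplies in its own failure probability:

```scala
// End-to-end success probability of a synchronous call chain,
// where each of the 5 services is individually 99.9% reliable.
val perService = 0.999
val hops = 5
val endToEnd = math.pow(perService, hops)
println(f"$endToEnd%.4f") // ≈ 0.9950, i.e. roughly 99.5%
```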

Coordination

• Between services • Between teams

Asynchronous Messaging

[Diagram: synchronous request/response calls between services vs. services communicating through an asynchronous message bus]

Why?
• Decoupling
• Pub/Sub
• Less coordination
• Additional consumers are easy
• Helps scale the organization

Messaging Requirements
• Well-defined delivery semantics
• High throughput
• Highly available
• Durable
• Scalable
• Backpressure

Kafka

What is Kafka?
• Distributed, partitioned, replicated commit log service
• Pub/Sub messaging functionality
• Created by LinkedIn, now an Apache open-source project

[Diagram: Producers → Kafka Brokers → Consumers]

Topics + Partitions

Topic
  P0: 0 | 1 | 2 | 3 | 4 | 5
  P1: 0 | 1 | 2 | 3 | 4 | 5 | 6
  P2: 0 | 1 | 2 | 3
(new messages are appended to the end of each partition)

Producers
• Send messages to topics
• Responsible for choosing which partition to send to
  • Round-robin
  • Consistent hashing based on a message key
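The two strategies above can be sketched in a few lines. This is a simplified stand-in (Kafka's default partitioner actually hashes the key bytes with murmur2; `partitionFor` and `roundRobin` are hypothetical names used only for illustration) showing the key property: the same key always lands on the same partition.

```scala
// Simplified sketch of producer partition selection.
// Keyed messages: hash the key so the same key always maps to the
// same partition (Kafka really uses murmur2 over the key bytes).
def partitionFor(key: String, numPartitions: Int): Int =
  math.abs(key.hashCode % numPartitions)

// Unkeyed messages: spread them round-robin using a counter.
def roundRobin(counter: Int, numPartitions: Int): Int =
  counter % numPartitions
```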

Consumers
• Pull messages from topics
• Track their own offset in each partition
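The pull model can be illustrated with a toy in-memory partition (all names here are hypothetical; a real partition is a persisted, replicated log): the log only ever appends, each message gets the next sequential offset, and each consumer owns its own read position.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model of a single partition: an append-only log where each
// message gets the next sequential offset.
class ToyPartition {
  private val log = ArrayBuffer.empty[String]

  def append(msg: String): Long = { log += msg; log.size - 1 }

  def read(offset: Long, max: Int): Seq[String] =
    log.slice(offset.toInt, offset.toInt + max).toSeq
}

// The consumer, not the broker, tracks how far it has read,
// so two consumers can progress through the same log independently.
class ToyConsumer(partition: ToyPartition) {
  private var offset = 0L

  def poll(max: Int): Seq[String] = {
    val msgs = partition.read(offset, max)
    offset += msgs.size
    msgs
  }
}
```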

[Diagram: consumers 1-6 in two consumer groups reading partitions P0-P2; within each group, every partition is assigned to exactly one consumer]

How does Kafka meet the requirements?

Kafka is Fast
• Hundreds of MB/s of reads/writes from thousands of concurrent clients
• LinkedIn (2015):
  • 800 billion messages per day (18 million/s peak)
  • 175 TB of data produced per day
  • > 1000 servers in 60 clusters

Kafka is Resilient
• Brokers
  • All data is persisted to disk
  • Partitions replicated to other nodes
• Consumers
  • Start where they left off
• Producers
  • Can retry - at-least-once messaging

Kafka is Scalable
• Capacity can be added at runtime with zero downtime
  • More servers => more disk space
• Topics can be larger than any single node could hold
• Additional partitions can be added to add more parallelism

Kafka Helps with Back-Pressure
• Large storage capacity
  • Topic retention is a consumer SLA
• Almost impossible for a fast producer to overload a slow consumer
• Allows real-time as well as batch consumption
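Retention as a "consumer SLA" is a configuration choice: the broker keeps data for a fixed window regardless of whether it has been consumed, and consumers must catch up within that window. A sketch of the relevant knobs (the values below are illustrative, not recommendations):

```properties
# Broker-wide default: keep log segments for 7 days
log.retention.hours=168

# Per-topic override: retain 30 days or 50 GB per partition,
# whichever limit is hit first
retention.ms=2592000000
retention.bytes=53687091200
```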

Message Data Format

Messages
• Array[Byte]
• Serialization? JSON?
• Protocol Buffers
  • Binary - fast
  • IDL - code generation
  • Message evolution
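Why message evolution matters can be shown with a hand-rolled sketch (everything here - `MessageCodec`, the version-byte layout - is hypothetical; schema systems like Protocol Buffers automate this): a version byte in front of the payload lets a single decoder keep reading old v1 messages while new producers emit v2.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object MessageCodec {
  // v1: just a UTF-8 name, prefixed with version byte 1
  def encodeV1(name: String): Array[Byte] =
    1.toByte +: name.getBytes(UTF_8)

  // v2: adds an id field, prefixed with version byte 2
  def encodeV2(name: String, id: Long): Array[Byte] =
    2.toByte +: s"$id|$name".getBytes(UTF_8)

  // One decoder handles both versions by dispatching on the
  // leading version byte.
  def decodeName(bytes: Array[Byte]): String = {
    val body = new String(bytes.tail, UTF_8)
    bytes.head.toInt match {
      case 1 => body
      case 2 => body.split('|')(1)
      case v => sys.error(s"unknown version $v")
    }
  }
}
```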

Processing Data with Reactive Streams

Reactive Streams
• Standard for async stream processing with non-blocking back-pressure
  • Subscriber signals demand to publisher
  • Publisher sends no more than demand
• Low-level
• Mainly meant for library authors

Publisher[T]
  • subscribe(s: Subscriber[_ >: T])

Subscriber[T]
  • onSubscribe(s: Subscription)
  • onNext(t: T)
  • onComplete()
  • onError(t: Throwable)

Subscription
  • request(n: Long)
  • cancel()
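The heart of the contract is `request(n)`: the publisher may never emit more than the outstanding demand. A toy single-threaded sketch (the `Subscription` trait is re-declared here to stay self-contained - the real interfaces live in `org.reactivestreams` and carry stricter rules; `RangePublisher` is a hypothetical name):

```scala
trait Subscription { def request(n: Long): Unit; def cancel(): Unit }

// A publisher over a fixed range that honours demand: it emits at
// most n elements per request(n) call and completes exactly once
// when the range is exhausted.
class RangePublisher(from: Int, to: Int) {
  def subscribe(onNext: Int => Unit, onComplete: () => Unit): Subscription =
    new Subscription {
      private var next = from
      private var done = false
      def request(n: Long): Unit = {
        var remaining = n
        while (remaining > 0 && next <= to && !done) {
          onNext(next); next += 1; remaining -= 1
        }
        if (next > to && !done) { done = true; onComplete() }
      }
      def cancel(): Unit = done = true
    }
}
```

A slow subscriber simply requests small batches; the publisher cannot get ahead of it - that is the non-blocking back-pressure the standard mandates.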

Processing Data with Akka Streams

Akka Streams
• Library on top of Akka Actors and Reactive Streams
• Process sequences of elements using bounded buffer space
• Strongly typed

Concepts

[Diagrams: the basic shapes - Source, Flow, Sink, Fan Out, Fan In - and how they compose into a Runnable Graph]

Materialization
• Turning on the tap
  • Create actors
  • Open files/sockets/other resources
• Materialized values
  • Source: Actor, Promise, Subscriber
  • Sink: Actor, Future, Producer

Reactive Kafka

Reactive Kafka
• https://github.com/akka/reactive-kafka
• Akka Streams wrapper around the Kafka API
  • Consumer Source
  • Producer Sink

Producer
• Sink - sends messages to a Kafka topic
• Flow - sends messages to a Kafka topic + emits results downstream
• When the stream completes or fails, the connection to Kafka is automatically closed

Consumer
• Source - pulls messages from Kafka topics
• Offset management
• Back-pressure
• Materialization
  • Object that can stop the consumer (and complete the stream)

Simple Producer Example

implicit val system = ActorSystem("producer-test")
implicit val materializer = ActorMaterializer()

val producerSettings = ProducerSettings(
  system, new ByteArraySerializer, new StringSerializer
).withBootstrapServers("localhost:9092")

Source(1 to 100)
  .map(i => s"Message $i")
  .map(m => new ProducerRecord[Array[Byte], String]("lower", m))
  .to(Producer.plainSink(producerSettings))
  .run()

Simple Consumer Example

implicit val system = ActorSystem("consumer-test")
implicit val materializer = ActorMaterializer()

val consumerSettings = ConsumerSettings(
  system, new ByteArrayDeserializer, new StringDeserializer
).withBootstrapServers("localhost:9092").withGroupId("test-group")

val control = Consumer.atMostOnceSource(
    consumerSettings.withClientId("client1"),
    Subscriptions.topics("lower"))
  .map(record => record.value)
  .to(Sink.foreach(v => println(v)))
  .run()

// Later, to stop the consumer and complete the stream:
control.stop()

Combined Example

val control = Consumer.committableSource(
    consumerSettings.withClientId("client1"),
    Subscriptions.topics("lower"))
  .map { msg =>
    val upper = msg.value.toUpperCase
    ProducerMessage.Message(
      new ProducerRecord[Array[Byte], String]("upper", upper),
      msg.committableOffset)
  }
  .to(Producer.committableSink(producerSettings))
  .run()

control.stop()

Demo

Wrap-Up

Wrap-Up
• Microservices have many advantages, but can introduce failure and complexity.
• Asynchronous messaging can help reduce this complexity, and Kafka is a great option.
• Akka Streams makes reliably processing data from Kafka with back-pressure easy.

Thank you! Questions? Jim Riecken @jimriecken - [email protected]
