NGA2-5_Home24 AWS Summit 2017 - Coordinating External Data ... [PDF]

AWS Summit 2017

Coordinating External Data Importer Services using AWS Step Functions Andre Vella, Director of Data Marcos Rebelo, Principal Data Engineer


home24 “Zuhause ist, was dir gefällt”

THE EUROPEAN MARKET LEADER AND GO-TO DESTINATION FOR HOME & LIVING ONLINE SHOPPING

Significant Scale: started in 2012, €234m net sales in 2015

Dynamic Growth: +49% Y-o-Y sales growth in 2015

Consumer Destination: > 100,000 articles

International Reach: 8 countries, 2 continents


home24 “code sweet code”

1. 100+ in tech department
2. 8 Teams
3. 16 in Data Team, 14 nationalities
4. Data Team working with Scala, Spark, R, ...

home24.tech.blog
home24.de/jobs
LinkedIn - Home24 AG

home24 Data Platform


External Data Sources

Import GBs of Data into S3 every day from multiple Services


Evaluating Options

Potential Buy and Build Options

Data Virtuality

Apache Airflow

Amazon Simple Workflow

funnel.io

and some others ...


Behold … AWS Step Functions

“State Machine” (noun)
1. A concept used by Computer Science professors for torturing undergrads, full of arcane math.
2. A practical way to build and manage modern Serverless Cloud apps.

Core Principles of our External Data Importer

𝝺 SIMPLE

RESILIENT

SERVERLESS


Working with AWS Step Functions

Define in JSON

Visualize in Console

Monitor Executions

{
  "StartAt" : "DispatcherState",
  "Comment" : "An example of the ASF.",
  "States" : {
    "DispatcherState" : { ... },
    ...,
    "FinalState" : { "Type" : "Pass", "End" : true }
  }
}


Our Approach … as an ideal scenario

𝝺1 Starter: AWS Lambda function that starts the Step Function

𝝺2 Downloader: cycle of an AWS Lambda function downloading files from the remote service to the S3 “Raw” bucket

𝝺3 Refiner: cycle of an AWS Lambda function processing each file that arrives in the S3 “Raw” bucket and storing it in the S3 “Refined” bucket
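A minimal sketch of what the Starter could look like, assuming Python and an injected client so it runs without AWS access. In a real Lambda the client would be `boto3.client("stepfunctions")`, whose `start_execution` call takes `stateMachineArn`, `name` and `input`. The payload fields `service` and `day`, the naming scheme and the `FakeStepFunctions` class are hypothetical; only `downloaderFinished` appears in the slides.

```python
import json
from datetime import date


def start_import(sfn_client, state_machine_arn, service, day):
    """Start one state machine execution for one external service and one day."""
    # Hypothetical initial state; "downloaderFinished" is the flag the
    # state machine in the slides branches on.
    payload = {
        "service": service,
        "day": day.isoformat(),
        "downloaderFinished": False,
    }
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=f"{service}-{day.isoformat()}",  # hypothetical naming scheme
        input=json.dumps(payload),
    )


class FakeStepFunctions:
    """Offline stand-in for boto3.client("stepfunctions"); records calls."""

    def __init__(self):
        self.calls = []

    def start_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"executionArn": "arn:example:" + kwargs["name"]}


if __name__ == "__main__":
    fake = FakeStepFunctions()
    print(start_import(fake, "arn:aws:states:example", "ads", date(2017, 5, 1)))
```

Injecting the client keeps the function testable; the Lambda handler itself would only build the real client and call `start_import`.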


Solution Design Challenge #1: 5-Minute Lambda Limit

𝝺4 Dispatcher: one AWS Lambda function that splits the work into smaller loads
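The splitting idea can be sketched as follows, assuming Python and a list of file names as the unit of work. The event fields `files`, `maxPerLoad`, `loads` and `loadIndex` are hypothetical; the slides only say that the Dispatcher splits the work so that each Lambda invocation stays under the time limit.

```python
def dispatch(items, max_per_load):
    """Split a list of work items into loads of at most max_per_load items."""
    if max_per_load < 1:
        raise ValueError("max_per_load must be at least 1")
    return [items[i:i + max_per_load] for i in range(0, len(items), max_per_load)]


def handler(event, context=None):
    """Hypothetical Dispatcher entry point: returns the state for the next step."""
    loads = dispatch(event["files"], event.get("maxPerLoad", 50))
    return {
        "loads": loads,
        "loadIndex": 0,
        # Nothing to download means the downloader cycle can be skipped.
        "downloaderFinished": len(loads) == 0,
    }


if __name__ == "__main__":
    print(handler({"files": ["a.gz", "b.gz", "c.gz"], "maxPerLoad": 2}))
```

The load size would have to be tuned per service so that one load reliably finishes within a single invocation.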

Solution Design Challenge #2: Maximum input of 32,768 characters
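One common way to stay under such a limit is to park large state outside the state machine and pass only a pointer. The slides do not show how the authors handled it, so the following is a sketch under that assumption. A plain dict stands in for an S3 bucket, and the `stateRef` field is hypothetical.

```python
import json

MAX_INPUT_CHARS = 32_768  # limit quoted on the slide


def fits(state):
    """True if the serialized state is within the input limit."""
    return len(json.dumps(state)) <= MAX_INPUT_CHARS


def compact(state, store, key):
    """Return state unchanged if it fits; otherwise store it and return a pointer."""
    body = json.dumps(state)
    if len(body) <= MAX_INPUT_CHARS:
        return state
    store[key] = body  # in practice this could be an S3 put
    return {"stateRef": key}


def expand(state, store):
    """Inverse of compact: follow the pointer if there is one."""
    if "stateRef" in state:
        return json.loads(store[state["stateRef"]])
    return state


if __name__ == "__main__":
    store = {}
    big = {"files": ["x" * 100] * 1000}
    pointer = compact(big, store, "run-1")
    print(pointer, expand(pointer, store) == big)
```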


Solution Design


Our take on AWS Step Functions


Defining AWS Step Function “States”

1. Task: a single unit of work performed by a state machine
2. Wait: delays the state machine from continuing for a specified time
3. Pass: simply passes its input to its output, performing no work
4. Parallel: creates parallel branches of execution in your state machine
5. Choice: adds branching logic
6. Succeed: stops an execution successfully
7. Fail: stops the execution of the state machine and marks it as a failure


Anatomy of the Template - Defining in JSON

{
  "StartAt" : "DispatcherState",
  "Comment" : "An example of the ASF.",
  "States" : {
    "DispatcherState" : { ... },
    ...,
    "FinalState" : { "Type" : "Pass", "End" : true }
  }
}
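A definition like this can also be built and sanity-checked in code before it is deployed. The sketch below, in Python, is not a full validator (it ignores Parallel branches, for instance); it only checks that `StartAt` and every `Next`/`Default` target name an existing state. `DispatcherState` is filled in as a placeholder Pass state because the slide elides its body.

```python
import json


def check_definition(definition):
    """Return a list of dangling-reference problems in a state machine definition."""
    states = definition.get("States", {})
    problems = []
    if definition.get("StartAt") not in states:
        problems.append("StartAt does not name a state")
    for name, state in states.items():
        targets = [state.get("Next"), state.get("Default")]
        targets += [choice.get("Next") for choice in state.get("Choices", [])]
        for target in targets:
            if target is not None and target not in states:
                problems.append(f"{name} points to unknown state {target}")
    return problems


definition = {
    "StartAt": "DispatcherState",
    "Comment": "An example of the ASF.",
    "States": {
        # Placeholder body: the real DispatcherState is not shown in the slides.
        "DispatcherState": {"Type": "Pass", "Next": "FinalState"},
        "FinalState": {"Type": "Pass", "End": True},
    },
}

if __name__ == "__main__":
    print(check_definition(definition))
    print(json.dumps(definition, indent=2))
```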


Cycle on Step Functions

"DownloaderChoiceState" : {
  "Type" : "Choice",
  "Choices" : [
    {
      "Variable" : "$.downloaderFinished",
      "BooleanEquals" : false,
      "Next" : "DownloaderState"
    }
  ],
  "Default" : "RefinerChoiceState"
},
"DownloaderState" : {
  "Type" : "Task",
  "Resource" : "arn:aws:lambda:eu-wes...",
  "Next" : "DownloaderChoiceState"
}

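The control flow of this Choice/Task pair can be traced with a small local simulation in Python. It is a sketch, not the Step Functions runtime: `downloader_task` is a hypothetical stand-in for the Downloader Lambda that handles one file per invocation, and the fields `remaining` and `downloaded` are assumptions. Only the state names and `downloaderFinished` come from the definition above.

```python
def downloader_task(state):
    """Stand-in for the Downloader Lambda: handle one file, report if done."""
    done_now = state["remaining"][:1]
    remaining = state["remaining"][1:]
    return {
        "remaining": remaining,
        "downloaded": state["downloaded"] + done_now,
        "downloaderFinished": not remaining,
    }


def run_cycle(state, max_transitions=1000):
    """Walk DownloaderChoiceState <-> DownloaderState until the Default branch fires."""
    current = "DownloaderChoiceState"
    transitions = 0
    while current != "RefinerChoiceState":
        if transitions >= max_transitions:
            raise RuntimeError("cycle did not terminate")
        if current == "DownloaderChoiceState":
            # Mirrors the Choice rule: BooleanEquals false -> DownloaderState,
            # otherwise fall through to the Default target.
            if state["downloaderFinished"] is False:
                current = "DownloaderState"
            else:
                current = "RefinerChoiceState"
        else:
            state = downloader_task(state)
            current = "DownloaderChoiceState"
        transitions += 1
    return state, transitions


if __name__ == "__main__":
    start = {"remaining": ["a", "b", "c"], "downloaded": [], "downloaderFinished": False}
    print(run_cycle(start))
```

Each file costs two hops in this model (Choice to Task and back), which is why the work done per Task invocation matters for the transition count.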

Retry on AWS Lambda Function Error

"DownloaderState" : {
  "Type" : "Task",
  "Resource" : "arn:aws:lambda:eu-...",
  "Retry" : [
    {
      "ErrorEquals" : [ "States.ALL" ],
      "IntervalSeconds" : 60,
      "MaxAttempts" : 5,
      "BackoffRate" : 2
    }
  ],
  "Next" : "DownloaderChoiceState"
}

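To see what these Retry values mean in practice, the wait before each retry can be computed in a few lines of Python. This assumes the usual reading of the fields: `IntervalSeconds` is the wait before the first retry, `BackoffRate` multiplies the wait on each further retry, and `MaxAttempts` is the number of retries.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Seconds waited before each retry attempt, in order."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


if __name__ == "__main__":
    delays = retry_delays(60, 5, 2)
    print(delays, sum(delays))
```

With the values on the slide the waits are 60, 120, 240, 480 and 960 seconds, so a persistently failing Downloader is retried for about 31 minutes of waiting in total before the error is surfaced.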

Visualizing in Console


Monitoring Execution in Console


Monitoring Errors in Console


Monitoring in Amazon CloudWatch


AWS Step Functions + CloudFormation


Fun Facts

~50 GB of gzipped external data every day

20+ services and increasing

~5 man-days of full dev-cycle effort per service


Price Facts

4,000 state transitions are free each month
$0.025 per 1,000 state transitions thereafter ($0.000025 per state transition)

… coming from a SaaS costing us $5,000/month to ~$45/month for Step Functions + Lambda
We are doing ~1,000 state transitions a day
→ $44 Lambda →
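The Step Functions share of that bill can be estimated from the prices quoted on this slide. The Python sketch below assumes a 30-day month, which is not stated in the slides; the free tier and the per-transition price are the slide's figures.

```python
FREE_TRANSITIONS_PER_MONTH = 4_000  # free tier quoted on the slide
PRICE_PER_TRANSITION = 0.000025     # USD, quoted on the slide


def monthly_step_functions_cost(transitions_per_day, days=30):
    """Estimated monthly Step Functions cost in USD (days=30 is an assumption)."""
    total = transitions_per_day * days
    billable = max(0, total - FREE_TRANSITIONS_PER_MONTH)
    return billable * PRICE_PER_TRANSITION


if __name__ == "__main__":
    print(round(monthly_step_functions_cost(1_000), 2))
```

At ~1,000 transitions a day this comes to roughly $0.65 a month, which is consistent with Lambda accounting for about $44 of the ~$45 total.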