AWS Summit 2017
Coordinating External Data Importer Services using AWS Step Functions
Andre Vella, Director of Data
Marcos Rebelo, Principal Data Engineer
home24 “Zuhause ist, was dir gefällt”
THE EUROPEAN MARKET LEADER AND GO-TO DESTINATION FOR HOME & LIVING ONLINE SHOPPING
Significant Scale: Started in 2012, €234m net sales in 2015
Dynamic Growth: +49% Y-o-Y sales growth in 2015
Consumer Destination: > 100,000 articles
International Reach: 8 countries, 2 continents
home24 “code sweet code”
1. 100+ in tech department
2. 8 Teams
3. 16 in Data Team, 14 nationalities
4. Data Team working with Scala, Spark, R, ...
home24.tech.blog
home24.de/jobs
LinkedIn - Home24 AG
home24 Data Platform
External Data Sources
Import GBs of Data into S3 every day from multiple Services
Evaluating Options
Potential Buy and Build Options
Data Virtuality
Apache Airflow
Amazon Simple Workflow
funnel.io
and some others ...
Behold … AWS Step Functions
“State Machine” (noun)
1. A concept used by Computer Science professors for torturing undergrads, full of arcane math.
2. A practical way to build and manage modern Serverless Cloud apps.
Core Principles of our External Data Importer
𝝺 SIMPLE
RESILIENT
SERVERLESS
Working with AWS Step Functions
Define in JSON
Visualize in Console
Monitor Executions
{
  "StartAt" : "DispatcherState",
  "Comment" : "An example of the ASF.",
  "States" : {
    "DispatcherState" : { ... },
    ...,
    "FinalState" : {
      "Type" : "Pass",
      "End" : true
    }
  }
}
Our Approach … as an ideal scenario
𝝺1 Starter: AWS Lambda function that starts the Step Function
𝝺2 Downloader: Cycle of an AWS Lambda function downloading files from the remote service to the S3 “Raw” bucket
𝝺3 Refiner: Cycle of an AWS Lambda function processing each file that arrives in the S3 “Raw” bucket and storing it in the S3 “Refined” bucket
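The Starter role can be sketched in Python as below. This is a minimal sketch, not the speakers' code: the function name `start_import`, the `service` field, and the shape of the input payload are assumptions made for illustration. The only real API relied on is the Step Functions `start_execution` call, which in a real Lambda would come from `boto3.client("stepfunctions")`; here the client is passed in so the sketch stays self-contained.

```python
import json
import time


def start_import(sfn_client, state_machine_arn, service_name, now=None):
    """Start one Step Functions execution for a single external service.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    service_name and the payload fields are hypothetical, for illustration.
    """
    now = time.time() if now is None else now
    # Give each execution a distinct name by appending a timestamp.
    execution_name = f"{service_name}-{int(now)}"
    # The state machine input must be a JSON string.
    payload = {"service": service_name, "downloaderFinished": False}
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=execution_name,
        input=json.dumps(payload),
    )
    return response["executionArn"]
```

Passing the client as a parameter also makes the function easy to exercise with a stub in unit tests.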
Solution Design Challenge #1: 5 Minute Lambda Limit
𝝺4 Dispatcher: One AWS Lambda Function that splits the work into smaller loads
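One way to picture the Dispatcher is as a function that cuts a list of work items into loads small enough for a single Lambda invocation to finish within the time limit. The sketch below assumes the work can be expressed as a flat list (for example file names or dates); the name `split_work` and the `max_per_load` parameter are hypothetical and do not come from the talk.

```python
def split_work(items, max_per_load):
    """Split a list of work items into loads of at most max_per_load items."""
    if max_per_load < 1:
        raise ValueError("max_per_load must be at least 1")
    # Slice the list in steps of max_per_load; the last load may be shorter.
    return [items[i:i + max_per_load] for i in range(0, len(items), max_per_load)]


loads = split_work(["f1", "f2", "f3", "f4", "f5"], 2)
print(loads)  # [['f1', 'f2'], ['f3', 'f4'], ['f5']]
```

How large a load may be depends on how long one item takes to download, which has to be measured per service.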
Solution Design Challenge #2: Maximum input of 32,768 characters
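The slides do not show how this limit was handled. A common workaround, sketched here purely as an assumption, is to keep large lists out of the state and pass only a reference to where they are stored (for instance an S3 key). The `pending` and `pendingRef` field names and the `store_payload` callback are hypothetical.

```python
import json

MAX_INPUT_CHARS = 32768  # limit quoted on the slide


def fits_state_input(state):
    """Return True if the JSON-serialized state is within the input limit."""
    return len(json.dumps(state)) <= MAX_INPUT_CHARS


def compact_state(state, store_payload):
    """Swap an oversized 'pending' list for a reference.

    store_payload is a hypothetical callback that persists the list
    somewhere (e.g. S3) and returns a short reference string.
    """
    if fits_state_input(state):
        return state
    compact = {key: value for key, value in state.items() if key != "pending"}
    compact["pendingRef"] = store_payload(state["pending"])
    return compact
```

The Lambda functions further along the state machine would then need to resolve `pendingRef` before doing their work.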
Solution Design
Our take on AWS Step Functions
Defining AWS Step Function “States”
1. Task: single unit of work performed by a state machine
2. Wait: delays the state machine from continuing for a specified time
3. Pass: simply passes its input to its output, performing no work
4. Parallel: can be used to create parallel branches of execution in your state machine
5. Choice: adds branching logic
6. Succeed: stops an execution successfully
7. Fail: stops the execution of the state machine and marks it as a failure
Anatomy of the Template - Defining in JSON
{
  "StartAt" : "DispatcherState",
  "Comment" : "An example of the ASF.",
  "States" : {
    "DispatcherState" : { ... },
    ...,
    "FinalState" : {
      "Type" : "Pass",
      "End" : true
    }
  }
}
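Because the template is plain JSON, simple mistakes such as a `Next` pointing at a state that does not exist can be caught before deployment. The helper below is a hypothetical sketch, not part of any AWS tooling: it only follows `StartAt`, `Next`, `Default`, and the `Next` of each choice rule, and it ignores `Parallel` branches. The `DispatcherState` body is elided on the slide, so a `Pass` state stands in for it here.

```python
def undefined_targets(definition):
    """Return state names that are referenced but not defined in 'States'."""
    states = definition["States"]
    referenced = {definition["StartAt"]}
    for state in states.values():
        # Direct transitions of Task/Pass/Wait states and Choice defaults.
        for key in ("Next", "Default"):
            if key in state:
                referenced.add(state[key])
        # Transitions declared inside the rules of a Choice state.
        for choice in state.get("Choices", []):
            if "Next" in choice:
                referenced.add(choice["Next"])
    return sorted(referenced - set(states))


definition = {
    "StartAt": "DispatcherState",
    "Comment": "An example of the ASF.",
    "States": {
        # Placeholder body: the real DispatcherState is not shown on the slide.
        "DispatcherState": {"Type": "Pass", "Next": "FinalState"},
        "FinalState": {"Type": "Pass", "End": True},
    },
}
print(undefined_targets(definition))  # []
```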
Cycle on Step Functions
"DownloaderChoiceState" : {
  "Type" : "Choice",
  "Choices" : [
    {
      "Variable" : "$.downloaderFinished",
      "BooleanEquals" : false,
      "Next" : "DownloaderState"
    }
  ],
  "Default" : "RefinerChoiceState"
},
"DownloaderState" : {
  "Type" : "Task",
  "Resource" : "arn:aws:lambda:eu-wes...",
  "Next" : "DownloaderChoiceState"
}
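For this cycle to terminate, the Lambda behind `DownloaderState` has to return a state object in which `downloaderFinished` eventually becomes `true`. Since the Task state shown has no `ResultPath`, the function's return value replaces the whole state, so it must carry every field forward. The handler below is a sketch under assumptions: the `pending` field, the `batch_size` parameter, and the `download` callback are hypothetical and stand in for whatever the real service integration does.

```python
def downloader_handler(event, context=None, batch_size=10, download=lambda name: None):
    """Process one batch of pending files and report whether work remains."""
    pending = list(event.get("pending", []))
    batch, remaining = pending[:batch_size], pending[batch_size:]
    for name in batch:
        # Hypothetical: copy one remote file into the S3 "Raw" bucket.
        download(name)
    # The Choice state reads $.downloaderFinished from this returned object.
    return {**event, "pending": remaining, "downloaderFinished": not remaining}
```

Each invocation does a bounded amount of work, which keeps a single run under the Lambda time limit while the state machine drives the loop.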
Retry on AWS Lambda Function Error
"DownloaderState" : {
  "Type" : "Task",
  "Resource" : "arn:aws:lambda:eu-...",
  "Retry" : [
    {
      "ErrorEquals" : [ "States.ALL" ],
      "IntervalSeconds" : 60,
      "MaxAttempts" : 5,
      "BackoffRate" : 2
    }
  ],
  "Next" : "DownloaderChoiceState"
}
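To see what this retry policy means in practice, the waits can be computed directly: the first retry happens after `IntervalSeconds`, and each later wait is multiplied by `BackoffRate`, up to `MaxAttempts` retries. The helper name `retry_delays` is made up for this illustration.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Return the wait before each retry, in seconds."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


delays = retry_delays(60, 5, 2)
print(delays)       # [60, 120, 240, 480, 960]
print(sum(delays))  # 1860
```

With the values on the slide, a persistently failing task is therefore retried over roughly 31 minutes of waiting before the execution fails.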
Visualizing in Console
Monitoring Execution in Console
Monitoring Errors in Console
Monitoring in Amazon CloudWatch
AWS Step Functions + CloudFormation
Fun Facts
~50 GB of gzipped external data every day
20+ Services and increasing
~5 man-days full dev-cycle effort per Service
Price Facts
4,000 state transitions are free each month
$0.025 per 1,000 state transitions thereafter ($0.000025 per state transition)
We are doing ~1,000 state transitions a day
… coming from a SaaS costing us $5,000/month to ~$45/month for Step Functions + Lambda (Lambda: $44)
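The Step Functions share of that bill can be estimated from the figures on the slide. The calculation below assumes a 30-day month, which is not stated in the talk; the function name is hypothetical.

```python
FREE_TRANSITIONS_PER_MONTH = 4000   # free tier quoted on the slide
PRICE_PER_TRANSITION = 0.000025     # USD, quoted on the slide


def monthly_step_functions_cost(transitions_per_day, days=30):
    """Estimate the monthly Step Functions charge in USD (days=30 is an assumption)."""
    billable = max(0, transitions_per_day * days - FREE_TRANSITIONS_PER_MONTH)
    return billable * PRICE_PER_TRANSITION


print(round(monthly_step_functions_cost(1000), 2))  # 0.65
```

At ~1,000 transitions a day the state transitions themselves cost well under a dollar a month, which fits with Lambda making up nearly all of the ~$45.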