Amazon Redshift & Amazon DynamoDB
Michael Hanisch, Amazon Web Services
Erez Hadas-Sonnenschein, clipkit GmbH
Witali Stohler, clipkit GmbH
2014-05-15
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Amazon Redshift & Amazon DynamoDB
Amazon Redshift
Amazon Redshift
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/year
A fully managed data warehouse service
• Massively parallel relational data warehouse
• Takes care of cluster management and distribution of your data
• Columnar data store with variable compression
• Optimized for complex queries across many large tables
• Use standard SQL & standard BI tools
Amazon Redshift
Amazon DynamoDB
A fully managed fast key-value store
• Fast, predictable performance
• Simple and fast to deploy
• Easy to scale as you go, up to millions of IOPS
• Pay only for what you use: read/write IOPS + storage
• Data is automatically replicated across data centers
Amazon DynamoDB
Amazon DynamoDB • Fast insert & update • Limited query capability (single table only) • NoSQL database
Amazon Redshift • Fast queries • Flexible queries (JOINs, aggregation functions, …) • SQL
Queries in Amazon DynamoDB
Queries in Amazon DynamoDB • Query or BatchQuery APIs retrieve items • Scan & filter to comb through a whole table • You have to join tables in your own code!
Amazon DynamoDB
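Since DynamoDB has no server-side JOIN, the join happens in application code. A minimal sketch of that, assuming hypothetical "videos" and "publishers" tables keyed by publisher_id (boto3 is imported lazily so the pure join helper works without AWS access):

```python
def join_items(videos, publisher):
    """The 'join' DynamoDB cannot do for us: attach publisher
    attributes to every video item in our own code."""
    return [dict(v, publisher_name=publisher.get("name")) for v in videos]

def videos_with_publisher(publisher_id, region="eu-west-1"):
    import boto3  # lazy import: only needed when actually calling AWS
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb", region_name=region)
    # Query retrieves all items sharing the hash key ...
    videos = dynamodb.Table("videos").query(
        KeyConditionExpression=Key("publisher_id").eq(publisher_id)
    )["Items"]
    # ... and a second request fetches the related item from the other table.
    publisher = dynamodb.Table("publishers").get_item(
        Key={"publisher_id": publisher_id}
    ).get("Item", {})
    return join_items(videos, publisher)
```

Every extra relationship means another round trip and more client-side merging, which is why analytical queries quickly become painful.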
Queries in Amazon DynamoDB (2) • Apache Hive on Amazon EMR can access data in DynamoDB • Run HiveQL queries for bulk processing • Can integrate data in HDFS, Amazon S3, …
HiveQL queries on Amazon EMR
Amazon DynamoDB
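The EMR DynamoDB connector exposes a DynamoDB table to Hive through an external table definition. A sketch of building that DDL, with hypothetical table and attribute names; for brevity every Hive column is declared `string` here, although the real mapping also supports numeric types:

```python
def hive_ddl(hive_table, dynamo_table, column_mapping):
    """Build the CREATE EXTERNAL TABLE statement that maps a DynamoDB
    table into Hive via the connector's storage handler."""
    # column_mapping: {hive_column: dynamodb_attribute}
    cols = ",\n  ".join(f"{h} string" for h in column_mapping)
    mapping = ",".join(f"{h}:{d}" for h, d in column_mapping.items())
    return (
        f"CREATE EXTERNAL TABLE {hive_table} (\n  {cols}\n)\n"
        "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'\n"
        "TBLPROPERTIES (\n"
        f'  "dynamodb.table.name" = "{dynamo_table}",\n'
        f'  "dynamodb.column.mapping" = "{mapping}"\n'
        ")"
    )

ddl = hive_ddl("plays", "clipkit-plays",
               {"video_id": "videoId", "country": "country"})
```

Once the external table exists, ordinary HiveQL (including joins against data in HDFS or Amazon S3) runs against it as bulk MapReduce jobs.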
Queries in Amazon DynamoDB (3) • Import data into Amazon Redshift • Use SQL queries, use BI tools etc. • Powerful analytics and aggregation functions
Amazon Redshift
Amazon DynamoDB
Importing Data into Amazon Redshift
TMTOWTDI – There's More Than One Way To Do It
Query & Insert
#1 Query / BatchQuery against Amazon DynamoDB
#2 Retrieve items
#3 INSERT … INTO (…) on Amazon Redshift
Query & Insert The Good • Full control over queries • Decide which items you want to move to Redshift • Process data on the way
The Bad • Slow • Inefficient on the Redshift side of things • Does not scale well
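The query & insert approach can be sketched in a few lines; the item-at-a-time INSERTs are exactly what makes it slow and inefficient on the Redshift side. Table and column names are hypothetical, and the connection would come from a PostgreSQL driver such as psycopg2:

```python
def insert_statement(table, item, columns):
    """Build one parameterized INSERT for one DynamoDB item."""
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    params = tuple(item.get(c) for c in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", params

def copy_items(dynamo_items, redshift_conn, table, columns):
    """Move already-retrieved items into Redshift, one row per INSERT --
    full control over filtering and transformation, but it does not scale."""
    with redshift_conn.cursor() as cur:
        for item in dynamo_items:
            sql, params = insert_statement(table, item, columns)
            cur.execute(sql, params)
    redshift_conn.commit()
```

The upside of owning this loop is that arbitrary filtering and per-item processing can happen between the Query and the INSERT.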
The COPY Command
#1 COPY FROM … issued on Amazon Redshift
#2 Parallel scans against the Amazon DynamoDB table ("politely ask for the table")
#3 Items are returned and loaded into Amazon Redshift
The COPY Command • COPY a single table at a time • From one Amazon DynamoDB table into one Amazon Redshift table • Fast – executed in parallel on all data nodes in the Amazon Redshift cluster • Can be limited to use a certain percentage of provisioned throughput on the DynamoDB table
The COPY Command
COPY (col1, col2, …)
FROM 'dynamodb://'
CREDENTIALS 'aws_access_key_id=…;aws_secret_access_key=…'
READRATIO 10 -- use 10% of available read capacity
COMPROWS 0   -- how many rows to read to determine compression
[…other options…]
The COPY Command
• Attributes are mapped to columns by name
• Case of column names is ignored
• Attributes that do not map to a column are ignored
• Missing attributes are stored as NULL or empty values
• Only works for STRING and NUMBER attributes
The COPY Command The Good • Easy to use • Fast • Efficient use of resources • Scales linearly with cluster size • Only uses certain percentage of read throughput
The Bad • Whole tables only • No processing in between • Can only copy from a DynamoDB table in the same region • Only works with STRING and NUMBER types
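In an ETL script, the COPY statement shown above is typically assembled and sent over the standard PostgreSQL protocol. A sketch of building it, with hypothetical table names; credentials are passed in rather than hard-coded:

```python
def dynamodb_copy(table, columns, dynamo_table, credentials, readratio=10):
    """Build a Redshift COPY statement that pulls a whole DynamoDB table.

    readratio caps the share of the DynamoDB table's provisioned read
    throughput the load may consume (10 = 10%)."""
    return (
        f"COPY {table} ({', '.join(columns)}) "
        f"FROM 'dynamodb://{dynamo_table}' "
        f"CREDENTIALS '{credentials}' "
        f"READRATIO {readratio}"
    )
```

Executing the returned string via e.g. psycopg2's `cursor.execute()` makes every Redshift data node scan its share of the DynamoDB table in parallel.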
Query & Insert at Scale #1 Query / BatchQuery in parallel
Amazon DynamoDB
#2 Retrieve Items
#3 INSERT … INTO (…) in parallel
Amazon Redshift
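The parallel retrieval side of this pattern maps directly onto DynamoDB's parallel Scan, where the Segment/TotalSegments parameters split one table into disjoint ranges that workers read concurrently. A sketch with a hypothetical table name (boto3 is imported lazily so the pure parameter helper works without AWS access):

```python
from concurrent.futures import ThreadPoolExecutor

def scan_kwargs(table, segment, total_segments, start_key=None):
    """Parameters for one page of a parallel Scan: each worker owns
    one segment out of total_segments."""
    kwargs = {"TableName": table, "Segment": segment,
              "TotalSegments": total_segments}
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key
    return kwargs

def scan_segment(table, segment, total_segments):
    import boto3  # lazy import: only needed when actually calling AWS
    client = boto3.client("dynamodb")
    items, start_key = [], None
    while True:  # page through this worker's segment
        page = client.scan(**scan_kwargs(table, segment, total_segments, start_key))
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items

def parallel_scan(table, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda seg: scan_segment(table, seg, workers),
                         range(workers))
    return [item for part in parts for item in part]
```

The same fan-out works whether the workers run as threads on one machine or as tasks on an Amazon EMR cluster.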
Query & Insert at Scale – on Amazon EMR
#1 Query / BatchQuery in parallel (Amazon EMR → Amazon DynamoDB)
#2 Retrieve items
#3 INSERT … INTO (…) in parallel (Amazon EMR → Amazon Redshift)
Query & Import using Amazon EMR
#1 Query / BatchQuery in parallel (Amazon EMR → Amazon DynamoDB)
#2 Retrieve items
#3 Export file(s) from Amazon EMR to Amazon S3
#4 COPY … FROM s3://
#5 Amazon Redshift retrieves the files from Amazon S3
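The S3 staging step boils down to serializing the retrieved items into delimited files and pointing one COPY at the whole prefix. A sketch, with hypothetical bucket, prefix, and column names:

```python
import csv
import io

def items_to_csv(items, columns, delimiter="|"):
    """Flatten already-retrieved DynamoDB items into one delimited
    staging file (missing attributes become empty fields)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter)
    for item in items:
        writer.writerow([item.get(c, "") for c in columns])
    return buf.getvalue()

def s3_copy(table, bucket, prefix, credentials, delimiter="|"):
    """COPY loads every staged file under the prefix, in parallel
    across the Redshift data nodes."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{prefix}' "
        f"CREDENTIALS '{credentials}' DELIMITER '{delimiter}'"
    )
```

Uploading the files (e.g. with boto3's `put_object`) and executing the COPY would be the remaining glue; the staging detour also allows arbitrary processing between export and load.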
Query & Import using Amazon EMR (2)
#1 Query / BatchQuery in parallel (Amazon EMR → Amazon DynamoDB)
#2 Retrieve items
#3 COPY … FROM emr://
#4 Amazon Redshift retrieves the files from HDFS
Query & Import using Amazon EMR The Good • Decide which items you want to move to Redshift • Full control over queries • Process data on the way • Scales well • Integrates with other data sources easily
The Bad • Additional complexity • Additional cost (for EMR) • Slower than direct COPY from Amazon DynamoDB
Please welcome
Erez Hadas-Sonnenschein, Sr. Product Manager
Witali Stohler, Data Warehouse & BI Specialist
clipkit GmbH
Video Syndication – The Possibilities
Content – Partner Overview
News, Sports, Cars/motor, Business/finances, Music, Gaming, Cinema, Cooking/food, Lifestyle/fashion, Traveling, Computer/mobile, Fitness/wellness, Knowledge/hobby, Entertainment
clipkit Player – Analytics (Metrics)
Full screen, Category, Playlist position, Play/Pause, Progress position, Mute/Unmute, Volume
Location (country, city), Language, Browser, Operating system, Video ID, Publisher URL, etc.
First Implementation (Expensive and Slow)
• Designed in the company's early days
• Not dimensioned for this volume of data
• Slow copy process from S3 to the database (old PHP application architecture)
• Fixed EC2 pricing (expensive to support peak hours)
• PostgreSQL scalability limitations
• At times the copy process was so slow that data arrived with a delay of ~3 days
Analytics / Metrics (Requests Graph)
Analytics / Metrics (Numbers)
• ~6,000,000 new entries per day
• ~1,000 requests per second (peak hours)
• ~25 requests per second (off-peak hours)
• Request volume grows by 4,000% over the course of a day
Second Implementation (Expensive and Slow)
• Inserts all went into one (big) table
• The COPY command only works on whole tables
• The minimum delay was one day
• Our solution had to increase the provisioned throughput, and that was expensive
NO REAL-TIME DATA
Third Implementation (Cheap and Fast)
Third Implementation – DynamoDB
• Java SDK AmazonDynamoDBAsyncClient (fire-and-forget)
• Easy to create and delete tables
• Write latency ~5 ms
• Throughput auto-scales with Dynamic DynamoDB
• One table per day
• Continuous iteration and copy to Redshift
• We only pay for what we use
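The one-table-per-day trick can be sketched as a small rotation routine: write to a table named for the current day, COPY it into Redshift, then delete it so its throughput is no longer billed. Table names, the schema, and the capacity figures are hypothetical:

```python
import datetime

def daily_table_name(prefix, day=None):
    """e.g. 'plays-2014-05-15' -- one DynamoDB table per day."""
    day = day or datetime.date.today()
    return f"{prefix}-{day.isoformat()}"

def rotate(client, prefix, day):
    """After the day's table has been copied into Redshift, drop it
    and pre-create tomorrow's table (boto3-style DynamoDB client assumed)."""
    client.delete_table(TableName=daily_table_name(prefix, day))
    client.create_table(
        TableName=daily_table_name(prefix, day + datetime.timedelta(days=1)),
        AttributeDefinitions=[{"AttributeName": "event_id", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "event_id", "KeyType": "HASH"}],
        # Write-heavy workload: high write capacity, modest read capacity
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 100},
    )
```

Because each day's table is short-lived, COPY never competes with live writes for long, and deleting the table is cheaper than deleting millions of individual items.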
Third Implementation – Redshift • Standard PostgreSQL JDBC • Fully managed by Amazon • Automated Backups and Fast Restores
• ~7,000 items inserted per second
• Queries over more than 1 billion entries in under 2 seconds
• Data available in near real time (maximum 1 minute delay)
Third Implementation – Conclusions
• Java web application – auto-scales (off-peak: 1 small instance)
• DynamoDB – one table per day (deleted after being copied) – auto-scales – ~5 ms PutItem latency
• Redshift – inserts ~7,000 items per second – fully managed
Thank You!