CASE STUDY

DoubleDown Wins Big with Snowflake

ABOUT DOUBLEDOWN INTERACTIVE

DoubleDown Interactive is a leading provider of fun-to-play casino games on the internet. DoubleDown was founded in 2010 in Seattle, Washington, and is part of International Game Technology (NYSE: IGT). Its games are massively popular and available on Facebook, on desktop, and on mobile platforms such as iOS and Android. Although most of its games are free to players, DoubleDown generates revenue from in-game purchases and from working with advertising partners.

DOUBLEDOWN INTERACTIVE’S SCENARIO

According to Rolfe Lindberg, Head of Business Intelligence at DoubleDown Interactive: “We have a lot of analytics projects underway with business analysts and data scientists, for decision-making purposes as well as for many internal departmental needs.” Game developers, customer support, marketing staff, customer experience and loyalty teams, and external marketing partners all make use of data analytics at DoubleDown. By understanding and drilling into its data, DoubleDown finds insights that influence game design, enable rigorous evaluation and management of marketing campaigns, improve understanding of player behavior, assess user experience, and uncover bugs and defects. Metrics based on game event data show stakeholders what players are doing during gaming sessions, which helps them evolve a particular game as well as create new and different games. DoubleDown also runs several production processes that use this data to manage user account balances and handle revenue recognition for its games.

“We had minimal configuration work to do with Snowflake; we did not have to worry about indexes or administration, because it’s already a highly optimized SQL database. Because the Snowflake data warehouse is truly elastic, we can increase and decrease compute power for different, temporary user needs with no changes to data or data locations.”
Rolfe Lindberg, Head of Business Intelligence, DoubleDown Interactive
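The elasticity Rolfe describes is exposed through Snowflake’s virtual warehouses, which can be resized independently of the stored data. As a minimal illustration (the warehouse name below is hypothetical, not one of DoubleDown’s actual objects), a temporary workload can be given more compute and then scaled back down:

    -- Temporarily give a (hypothetical) warehouse more compute for a heavy workload.
    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';

    -- ... run the temporary analysis ...

    -- Scale back down when finished; no data or data locations change.
    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'SMALL';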

Performing these analyses requires bringing together data from multiple sources. Rolfe explains, “For our internal data we get bookings, user information, marketing campaigns and promotions. Separately, game event logs are generated when users go into our online casinos and play those games. We get this data from MySQL databases, the internal production databases, and cloud-based game servers. Some of the operations data is collected from our Splunk system. We also get a lot of external third-party data: we have about 19 vendors who provide data that needs to go into our data warehouse. That includes ad partner data from Facebook, AppLovin, and many other publishers.”

PREVIOUS ENVIRONMENT

DoubleDown’s challenge was to take continuous data feeds from their games and integrate that with other data into a holistic representation of game activity, usability, and trends. “When it came to our event log data, this is where we got into the problem of big data. Our game servers generate roughly 3.5 terabytes of data per day,” says Rolfe.

[Figure: Previous DoubleDown data flow. Game event data, internal data, and third-party data flowed into a noSQL database, then into a staging area (cleanse, normalize, transform, reprocess errors), then into the EDW, and finally out to analysts and BI tools.]

Integrating that data was complex: each source required its own data flow path and ETL transformations, in part because all of the game event data is stored in large JSON log files. In addition to Talend’s enterprise data integration suite for ETL, DoubleDown used a noSQL database, MongoDB, to process the data. “The previous process was to get the data into a noSQL database, and then run a collection of noSQL DB collectors and aggregators. The data was then pulled into a staging area where it got cleaned, transformed, and conformed to the star schema, and then it was loaded into our pre-existing enterprise data warehouse,” says Rolfe. Once in the data warehouse, the data was used for analysis and reporting via commercial tools, including Tableau, and via a homegrown reporting dashboard backed by MySQL and used heavily across the company.

DoubleDown had latency, throughput, and reliability challenges with this data pipeline, along with hidden costs and risks stemming from its unreliability and the number of ETL transformations required. According to Rolfe, “There were a lot of challenges with our previous architecture because it took a really long time to process the event log. There were many times that we had to wait until 3pm the next day to get the data from the previous day. If one of the MongoDB clusters went down, we actually lost data.”

DoubleDown also needed more event data detail, along with more in-depth reporting and analytics, to support more complex ad hoc data science explorations. “We didn’t have any detailed game-level log data because the noSQL system would not scale to process the larger volume that was required. As a result, it was very difficult for us to go back and do any root cause analysis or find issues we observed in that data.”

FINDING A BETTER SOLUTION

DoubleDown turned to the combination of the cloud and Snowflake’s cloud data warehouse for a better solution to host the computing and data flow for all operational and game event analysis data. This combination has given them increased scalability, lower infrastructure costs, and greater agility in meeting new data flow and processing requirements, all of which helps them stay ahead of their growth curve. In fact, within the next year they expect to be running on 100% cloud IT infrastructure.

Intrigued by Snowflake’s scalable cloud architecture and its ability to load and process JSON log data in its native form, DoubleDown decided to replace their MongoDB data store and related MapReduce processing with Snowflake. All previous MongoDB transformations and aggregations, plus several new ones, are now done inside Snowflake after loading the JSON game event data directly into Snowflake. According to Rolfe: “We now take in the data from Amazon Kinesis and load it into an Amazon S3 landing area. Once the data is available there, our Talend process runs every 5 minutes and loads the files directly into an event log table in Snowflake, which makes all the JSON attributes queryable.”

[Figure: DoubleDown data flow with Snowflake]
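The loading pattern Rolfe describes maps onto standard Snowflake SQL. The sketch below is illustrative only; the stage, table, and attribute names are hypothetical, not DoubleDown’s actual objects:

    -- Landing table: each raw JSON event is stored in a single VARIANT column.
    CREATE TABLE IF NOT EXISTS game_events_raw (event VARIANT);

    -- External stage pointing at the S3 landing area written by Kinesis.
    CREATE STAGE IF NOT EXISTS game_events_stage
      URL = 's3://example-bucket/game-events/'
      CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

    -- Load newly arrived files (for example, on a 5-minute schedule driven by Talend).
    COPY INTO game_events_raw
      FROM @game_events_stage
      FILE_FORMAT = (TYPE = 'JSON');

    -- Every JSON attribute is immediately queryable with path notation.
    SELECT event:user_id::STRING   AS user_id,
           event:game::STRING      AS game,
           event:ts::TIMESTAMP_NTZ AS event_time
    FROM   game_events_raw;

Because COPY INTO tracks which staged files have already been loaded, the same command can be re-run on each scheduled cycle without duplicating data.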

Putting Snowflake in place was straightforward and happened quickly. “Snowflake seamlessly integrates between our file system and Amazon S3, and it was simple to integrate with our Talend data integration process. We brought it into production in just three months—development took less than two man-months, and then we migrated the process in the third month, including all of the testing and QA,” says Rolfe.

SEEING RESULTS

Using Snowflake has brought DoubleDown three important advantages: a faster, more reliable data pipeline; lower costs; and the flexibility to access new data using SQL.

Fast, reliable data pipeline

According to Rolfe, “We have huge amounts of event data in JSON files that we need to process. Snowflake manages this very efficiently: because Snowflake can load and flatten a JSON structure of 2.5 million elements in less than two minutes, we’re able to run and process new event data every five minutes. Our daily process now takes about 15 minutes to process a full day’s worth of data, whereas previously it would take more than 24 hours, even while using lower-granularity data.” Using Snowflake also helped DoubleDown eliminate the failures that had created delays waiting for data to be reprocessed. “Since we moved to our new data architecture, we have not had any data loss,” explained Rolfe. The improved reliability means they can now meet their SLAs by getting all game data results to analysts the same day they are generated.
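The flattening Rolfe refers to is typically expressed with Snowflake’s FLATTEN table function. A minimal sketch, continuing with the hypothetical table above and assuming each event carries a nested array of spin results:

    -- Expand a nested array inside each event into one row per array element.
    SELECT e.event:session_id::STRING AS session_id,
           s.value:reel::STRING       AS reel,
           s.value:win::NUMBER        AS win_amount
    FROM   game_events_raw e,
           LATERAL FLATTEN(input => e.event:spins) s;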

Cost savings

Rolfe goes on to say, “Snowflake is extremely cost effective—we have saved nearly 80% by implementing Snowflake.” Part of the cost savings came from being able to stage and store higher-granularity data cost-effectively in Amazon S3, something made possible by the Snowflake architecture. DoubleDown also saw cost savings because they no longer need to allocate resources to constantly monitor and fix their noSQL clusters, and they no longer require specialized resources to write MapReduce jobs to transform their game event data.

Flexibility

Snowflake’s ability to load JSON natively saves DoubleDown several steps in their ETL process. “Snowflake provided an upgrade for our transformation processes that previously ran in MongoDB as MapReduce jobs,” says Rolfe. The ability to process their JSON data using SQL also provided significant benefits, allowing them to open up more data to both Tableau users and users of their internal dashboards. “Because Snowflake has the standard SQL that you would typically use in a relational database, our development pace was really rapid. Using Snowflake, we are able to quickly create queries that enable new features such as verification and validation of payout probabilities for various games and reconciling chip balances across all players.”
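A reconciliation query of the kind Rolfe mentions might look like the following sketch; the user_balances table and chip_delta attribute are hypothetical stand-ins for DoubleDown’s actual schema:

    -- Compare each player's ledger balance with the balance implied by game events.
    SELECT b.user_id,
           b.chip_balance                  AS ledger_balance,
           SUM(e.event:chip_delta::NUMBER) AS event_balance
    FROM   user_balances b
    JOIN   game_events_raw e
      ON   e.event:user_id::STRING = b.user_id
    GROUP BY b.user_id, b.chip_balance
    HAVING b.chip_balance <> SUM(e.event:chip_delta::NUMBER);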

BUSINESS IMPACT

Previously, the lack of high-granularity game event data meant whole categories of decisions went unexamined, and the root causes of problem events were neither understood nor acted upon. Using Snowflake, DoubleDown can now perform root-cause analysis. As a result, many problems and software bugs can be resolved, and often avoided entirely, which improves both productivity and cycle time for product delivery. This in turn positively impacts product quality, customer experience, and customer lifetime value.

By removing processing steps, DoubleDown not only gains a performance advantage that delivers same-day analytic results, it also gains a more reliable infrastructure with fewer maintenance requirements and the ability to build out new specialized ad hoc analyses for different stakeholders.

Looking to the future, Rolfe says, “We have only scratched the surface with this new implementation. We have additional real-time reporting for game performance in development, so that when a new game is launched, we can immediately see how the game is performing. We can put in alerts based on any data outliers and see where and why things are going wrong.”

“Overall, the addition of Snowflake is a giant leap for DoubleDown, and we expect many more good things to come out of this in the future,” says Rolfe.

Copyright © 2015 Snowflake Computing, Inc. All rights reserved. SNOWFLAKE COMPUTING, the Snowflake Logo, and SNOWFLAKE ELASTIC DATA WAREHOUSE are trademarks of Snowflake Computing, Inc. Rev CS_DDI_1_0_122015

www.snowflake.net | @SnowflakeDB