SOLUTION SHEET: HADOOP DATA INGEST

Data Integration for Hadoop Data Lakes
Accelerate and simplify the loading and transformation of large-scale data lakes

Data Lakes enable enterprises to process vast data volumes and address use cases that range from historical analysis to streaming analytics and machine learning. Whether on premises or in the cloud, Data Lakes provide an efficient, scalable and centralized foundation for modern analytics. But traditional tools for integrating this data are neither efficient nor scalable enough for Data Lake implementations. IT organizations often struggle to ingest data from hundreds or even thousands of sources that require custom coding and individual agents, tying up their most talented programmers with repetitive and error-prone work. They also struggle to efficiently transform data into accurate, consistent and analytics-ready systems of record. Attunity removes these obstacles and creates a fully automated data pipeline that delivers analytics-ready data.

CUSTOMER SUCCESS
“Using Attunity, we were able to create our strategic analytical platform, insights analytics, which allows us to make important operational decisions that benefit our staff and students.”
JUERGEN STEGMAIR, LEAD FOR DATABASE ADMIN, UNIVERSITY OF NORTH TEXAS

Attunity Replicate is a simple, universal and real-time data ingestion solution that delivers data efficiently to any major Hadoop/Data Lake platform. With Attunity Replicate, database administrators and data architects gain:

• Ease of use Reduce manual coding with an automated GUI that quickly and easily configures, controls and monitors bulk loads as well as real-time updates. Programmable REST and .NET interfaces also let you monitor and manage replication from enterprise dashboards (see the REST sketch after this list).

• High scale Ingest data from hundreds or thousands of databases. Design, execute and monitor Attunity Replicate processes across your entire environment through a single pane of glass. Minimize production impact and administrative burden by remotely identifying and copying source updates, with no need for disruptive agent software.

• Universal platform support Apply one replication process to any major RDBMS, data warehouse or mainframe source, on premises or in the cloud. Integrate with Cloudera, Hortonworks and MapR, as well as with Kafka and cloud platforms such as Amazon EMR and Azure Event Hub.

• Real-time updates Ingest incremental datasets efficiently and continuously with enterprise-class change data capture (CDC) technology. Optimize throughput and latency by applying transactions in order and in real time, or by processing varying change volumes in configurable batches.

• Transaction consistency Partition updates by time to ensure each transaction update is processed holistically for maximum consistency.
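
Where the GUI is not enough, the REST interface can feed task health into existing dashboards. Below is a minimal Python sketch of that pattern; the base URL, endpoint path, session header and response fields are illustrative assumptions rather than the documented Attunity API, so consult the product's API reference for the real routes and authentication scheme.

    # Minimal sketch: poll one replication task's status over REST so it
    # can be surfaced on an enterprise dashboard. The endpoint path,
    # header name and JSON fields below are hypothetical, not the
    # documented Attunity API.
    import requests

    BASE_URL = "https://replicate.example.com/api/v1"   # hypothetical
    SESSION_TOKEN = "..."  # obtained from the product's login endpoint

    def get_task_status(server: str, task: str) -> dict:
        """Fetch one task's state for dashboard display."""
        resp = requests.get(
            f"{BASE_URL}/servers/{server}/tasks/{task}",
            headers={"APISessionID": SESSION_TOKEN},  # hypothetical header
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()

    status = get_task_status("prod-replicate-01", "oracle-to-hive")
    print(status.get("state"), status.get("cdc_latency"))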


Attunity Compose for Hive: rapid, automated creation of analytics-ready data stores

Once data is ingested and landed in Hadoop, IT often still struggles to create usable analytics data stores. Traditional methods require Hadoop-savvy ETL programmers to manually code the various steps, including data transformation, the creation of Hive SQL structures, and reconciliation of data insertions, updates and deletions to avoid locking and disrupting users. The administrative burden of ensuring data is accurate and consistent can delay and even kill analytics projects.

Attunity Compose for Hive automates the creation and loading of Hadoop Hive structures, as well as the transformation of enterprise data within them. Our solution fully automates the pipeline of BI-ready data into Hive, enabling you to automatically create both Operational Data Stores (ODS) and Historical Data Stores (HDS). And we leverage the latest innovations in Hadoop, such as the new ACID Merge SQL capability, available today in Apache Hive (part of the Hortonworks 2.6 distribution), to automatically and efficiently process data insertions, updates and deletions. Attunity Replicate integrates with Attunity Compose for Hive to simplify and accelerate data ingestion, data landing, SQL schema creation, data transformation, and ODS and HDS creation and updates.
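
To make the single-pass idea concrete, here is a minimal sketch of the kind of Hive ACID MERGE that applies a batch of captured changes to an ODS table, issued from Python via PyHive. The table names, column names and the 'op_type' change-flag convention are hypothetical illustrations, not the structures Attunity Compose actually generates.

    # Sketch: apply one batch of captured changes (inserts, updates,
    # deletes) to an ODS table in a single MERGE pass. All names below
    # are hypothetical; the target must be a Hive ACID (transactional)
    # table for MERGE to work.
    from pyhive import hive  # pip install 'pyhive[hive]'

    conn = hive.connect(host="hive.example.com", port=10000)
    cur = conn.cursor()
    cur.execute("""
        MERGE INTO customers_ods AS t
        USING customers_changes AS s
        ON t.customer_id = s.customer_id
        WHEN MATCHED AND s.op_type = 'D' THEN DELETE
        WHEN MATCHED AND s.op_type = 'U' THEN
            UPDATE SET name = s.name, address = s.address
        WHEN NOT MATCHED THEN
            INSERT VALUES (s.customer_id, s.name, s.address)
    """)

Because the MERGE runs as a single atomic operation, readers never see a half-applied batch of changes.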

[Architecture diagram: cloud, SAP, RDBMS, data warehouse, file, mainframe and on-premises sources flow into Attunity Replicate, which ingests and lands data via initial copy and CDC or in batch; Attunity Compose then creates, transforms and updates the Operational Data Store and Historical Data Store.]

With Attunity Compose for Hive, database administrators and data architects gain:

• Comprehensive automation Automatically generate Hive schemas for ODS and HDS targets and seamlessly apply all necessary data transformations.

• Continuous, non-disruptive data store updates Leverage the ANSI SQL-compliant ACID MERGE operation to process data insertions, updates and deletions in a single pass.

• Improved operational visibility Support slowly changing dimensions to understand change impact through a granular history of updates, such as customer address changes, within the Historical Data Store (see the sketch below).
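
For example, a data architect could read that granular history from the HDS with standard SQL. The sketch below assumes a hypothetical customers_hds table carrying start_ts/end_ts validity columns for each record version; the actual structures Compose generates may differ.

    # Sketch: list every address a customer has had, with the validity
    # window of each version. Table and column names are hypothetical.
    from pyhive import hive

    conn = hive.connect(host="hive.example.com", port=10000)
    cur = conn.cursor()
    cur.execute("""
        SELECT address, start_ts, end_ts
        FROM customers_hds
        WHERE customer_id = 42
        ORDER BY start_ts
    """)
    for address, start_ts, end_ts in cur.fetchall():
        print(address, start_ts, end_ts or "current")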


BUSINESS BENEFITS
• Faster Data Lake operational readiness
• Reduced development time
• Reduced reliance on Hadoop skills
• Standard SQL access for data architects

Americas 866-288-8648 [email protected]

Europe / Middle East / Africa 44 (0) 1932-895024 [email protected]

Asia Pacific (852) 2756-9233 [email protected]

www.attunity.com © 2018 ATTUNITY LTD | ALL RIGHTS RESERVED | 20171123