Document not found! Please try again

Motti Hadas Manager, Matrix Open Source Matan Zohar CTO, Matrix ...

4 downloads 150 Views 5MB Size Report
Analytics 1.0. Descriptive. Analytics 2.0. Diagnostic. Analytics 3.0*. Predictive & Prescriptive. *First referenced
Motti Hadas Manager, Matrix Open Source

Matan Zohar CTO, Matrix Open Source

11

22

Open Data

Open DevOps

Open Cloud

33

44

55

Open source culture

Collaboration *

Transparency

Shared problems are solved faster

Working together creates standardization

66

77

88

99

10 10

11 11

Data

Risk

Cost

Time

Volume Velocity Variety

Always On Secure Global

Open-Source Cloud Commodity

Iterative Agile Short Cycles

12 12

13 13

ONLY

Hortonworks: Big Data Hadoop for the Enterprise

%

Founded in 2011

100 1

ST

open source

HADOOP distribution to go public

TM

Apache Hadoop data platform

Page 6

IPO Fall 2014 (NASDAQ: HDP)

© Hortonworks Inc. 2011 – 2015. All Rights Reserved

1100+ employees

2500+

17

countries

technology partners

14 14

A Connected Data Strategy Solves for All Data DATA IN MOTION

DATA AT REST

15 15

‫מידע מתכלה‬

ACTIONABL E INTELLIGEN CE

‫מידע היסטורי‬

Capture

Store

streaming data

data forever

Deliver

Access

perishable insights

a multi-tenant data lake

Combine

Model

new & old data

with artificial intelligence

DATA IN MOTION

DATA AT REST

16 16

Hortonworks Influences the Apache Community We Employ the Committers --one third of all committers to the Apache® Hadoop™ project, and a majority in other important projects

Our Committers Innovate and expand Open Enterprise Hadoop

We Influence the Hadoop Roadmap by communicating important requirements to the community through our leaders A PA C H E H A D O O P C O M M I T T E R S Page 7

© Hortonworks Inc. 2011 – 2015. All Rights Reserved

17

17 17

Hive

HBase

Spark

Storm

Kafka

Solr

YARN (Resource Manager / Data Operating System) HDFS (Hadoop Distributed File System)

18

18 18

19 19

Or

Big Data ? 20 20

21 21

22 22

23 23

24 24

25 25

26 26

27 27

28 28

29 29

30 30

Real data science, fast and simple.

#1 Data Science Platform #1 Open Source Data Science Platform #1 Marketplace for Expertise

31 31

Analytics 3.0*

Step Five Predictive & Prescriptive

Analytics 2.0 Diagnostic

Proactive

Analytics 1.0 Descriptive

Reactive

Passive Business Intelligence

Data Visualization

Data Science

Databases

Analytic Data Marts

Big Data

Sums & Counts

Drilldowns

Machine Learning

Historical Information

Current Insight

Human / Automated Actions *First

referenced by Thomas H Davenport, HBR December 2013

32 32

Affordability

When the unaffordable becomes affordable the impossible becomes possible

33 33

34 34

[email protected] 35 35

Enterprise Source System

Open Source Project

36 36

37 37

38 38

The RapidMiner Platform RapidMiner Market Place

RapidMiner Web Applications

Industry, Application & ML Extensions

RapidMiner Studio

RapidMiner Server

Visual Workflow Designer Guided Analytics & Reusable Processes Wealth of Predictive Algorithms and Functions

Collaborate Compute Secure Deploy + Maintain Serve

Python

R

SQL

In-memory

Hadoop

RapidMiner Radoop Compile + Execute in Hadoop

Business Applications

Data Visualization Open APIs

Spark

Embed results in all types of business apps & data visualization tools

39 39

RapidMiner Server Seamlessly publish

Operationalized Services

RapidMiner Server

Example: Publish analytics workflow as a web service for batch or real-time execution

Job Service Queue 2

Queue 1

Integrate with

Data Visualization

Job Agent

Job Agent

Job Agent

Job Container Job Container Process Job Container Process Process

Job Container Job Container Process Job Container Process Process

Job Container Job Container Process Job Container Process Process

Example: Bi-directional integration with Qlik or Tableau Dashboard

Integrate with

Business Applications Example: integrate web application with published model exposed as web services end point 40 40

Documents are Rich Data Structures { firstName: ‘Paul’, lastName: ‘Miller’, cell: 447557505611, city: ‘London’, location: { type : ‘Point’, coordinates : [45.123,47.232] }, Profession: [‘banking’, ‘finance’, ‘trader’], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, Fields can contain an array of sub-documents { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ]

Fields

Typed field values

Fields can contain arrays

} 41 41

Data Modelling - Application in Mind Relational Model

Document Model

CATEGORY

TAG

Name URL

Name URL

ARTICLE Name Publish date URL Text

ARTICLE Name Publish date URL Text

COMMENT Text Date Author

COMMENT [] USER Name Email

Text Date Author

TAG [] USER Name Email

Name URL

CATEGORY [] Name URL

42 42

Connected Data Platforms

43 43

Nifi DataFlow Designer

44 44

220+ Processors, 30% Increase FTP SFTP

Hash

Encrypt

GeoEnrich

Merge

Tail

Scan

Extract

Evaluate

Replace

Duplicate

Execute

Translate

Split

Fetch

Convert

HL7 UDP XML

HTTP Email HTML

Route Text

Distribute Load

Route Content

Generate Table Fetch

AMQP

Route Context

Jolt Transform JSON

MQTT

Control Rate

Prioritized Delivery

Image Syslog

All Apache project logos are trademarks of the ASF and the respective projects.

45 45

Extreme Messaging: Apache Kafka

46 46

Extreme Messaging: Apache Kafka

47 47

Hive

HBase

Spark

Storm

Kafka

Solr

YARN (Resource Manager / Data Operating System) HDFS (Hadoop Distributed File System)

48 48

Fast engine for SQL querying on Spark formatted data

streaming data analytics

Machine Learning algorithms

Graph processing algorithms

49 49

50 50

51

51 51

[email protected] 52 52