X - IBM

Technical Computing: the new era. Simulation, Analytics, Big Data
Sylvie Boin, Technical Computing Sales Manager
Emmanuel Lecerf, Platform Computing Sales

Technical Computing: the IBM announcements at a glance

IBM Technical Computing portfolio: refresh of major products and solutions for mainstream technical computing

• Application Ready Solutions for Auto/Aero, Life Sciences, Petroleum, Big Data
• Power Systems™: Engine for faster insights
• Flex Systems™: Integrated hybrid system
• System x®: Redefining x86
• Blue Gene®: Extremely fast, energy efficient supercomputer
• NeXtScale System™ (New): Hyperscale, Density, Flexibility
• Storage System®: High performance storage
• GPFS™ Storage Server: Big data storage
• Intelligent Cluster™: Factory-integrated, interoperability-tested system with compute, storage, networking and cluster management
• Software: Parallel Environment, HPC Cloud, IBM Platform LSF® Family, IBM Platform™ Symphony Family, IBM Platform HPC, IBM Platform Cluster Manager, xCAT, GPFS™

Application Ready Solutions: new enhancements

| Industry | ISV | Applications | IBM platform |
|---|---|---|---|
| Life Sciences | Accelrys | Accelrys Pipeline Pilot NGS collect. | Flex System x240, V7000 Unified, Platform HPC, GPFS |
| Auto/Aero Engineering | ANSYS | ANSYS, FLUENT, Remote 3D | Flex System x240, DS3500, Platform HPC, GPFS |
| Petroleum | Schlumberger | ECLIPSE, INTERSECT | Flex System x240, DS3500, Platform HPC, GPFS |
| Life Sciences | CLC-Bio | CLC Genomics Server | Flex System x240, V7000 Unified, Platform HPC, GPFS |
| Life Sciences Comp Chem. | Gaussian | Gaussian | Flex System p260, p460, DS3500, Platform LSF, GPFS |
| Auto/Aero Engineering | MSC Software | MSC Nastran, Patran, SimManager | Flex System x240, NeXtScale, GPFS, V7000 Unified, Platform HPC |
| Auto/Aero Engineering | Dassault Systemes | ABAQUS | System x3650 M4, Flex System x240, NeXtScale, GPFS, Platform HPC, MIO, integrated storage |
| Life Sciences | mpiBLAST | MpiBLAST | NeXtScale, Platform HPC, GPFS |
| Big Data | IBM SWG | InfoSphere BigInsights | Power Linux R72, Platform PCM, Symphony, GPFS, Int. Storage |

Status (in slide order): Live; Live; Live; Live; Live; 90% go; Live

New October content (in slide order): Adding Flex IVB, NeXtScale, Platform HPC; Adding Flex System p460, with LSF, PCM; Adding IVB, NeXtScale, Platform HPC; NeXtScale, Platform HPC; Adding NeXtScale, Platform HPC

Technical Computing: the new era. Simulation, Analytics, Big Data

Twitter: from 6,000,000 users pushing out 300,000 tweets per day to 500,000,000 users pushing out 400,000,000 tweets per day.

• Users: 83x
• Tweets per day: 1333x
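The two growth factors quoted above follow directly from the user and tweet counts. A minimal sketch of the arithmetic, using only the figures from the slide (the variable names are illustrative):

```python
# Figures as quoted on the slide.
users_before, users_after = 6_000_000, 500_000_000
tweets_before, tweets_after = 300_000, 400_000_000

# Growth factor = later value divided by earlier value.
user_growth = users_after / users_before      # about 83.3
tweet_growth = tweets_after / tweets_before   # about 1333.3

print(f"users: {user_growth:.0f}x, tweets per day: {tweet_growth:.0f}x")
# prints "users: 83x, tweets per day: 1333x"
```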

The characteristics of big data

• Volume: cost-efficiently processing the growing volume. 50x growth from 2010 to 2020, reaching 35 ZB.
• Velocity: responding to the increasing velocity. 30 billion RFID sensors and counting.
• Variety: collectively analyzing the broadening variety. 80% of the world's data is unstructured.
• Veracity: establishing the veracity of big data sources. 1 in 3 business leaders don’t trust the information they use to make decisions.

5 Big Data Patterns

• Big Data Exploration: Find, visualize, understand all big data to improve business knowledge
• Enhanced 360° View of the Customer: Achieve a true unified view, incorporating internal and external sources
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency

Hadoop MapReduce

• De-facto “Big Data” standard
• Pioneered at Google / Yahoo!
• Framework for writing applications to rapidly process vast datasets
• More cost effective than traditional data warehouse / BI infrastructure
• Dramatic performance gains
• Java based
• From our perspective: Just another distributed computing problem
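To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the example. It is plain Python for illustration only and does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    # Map every line, then shuffle by key, then reduce each group.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big insights", "Big clusters"])
print(counts)  # {'big': 3, 'data': 1, 'insights': 1, 'clusters': 1}
```

In a real cluster the map and reduce calls run in parallel on many nodes, which is why the slide treats the workload as a distributed computing problem.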

Common Pain Points in Big Data Hadoop environment

• Limited HA features in the workload engine
• Large performance overhead during job initiation
• Resource silos associated with MapReduce applications
• Single purpose clusters - under utilized resources
• Not adaptive
• Scheduling engine lacks sophistication
• No way to manage a shared services model tied to an SLA
• Difficult to troubleshoot
• Difficult to manage as the cluster scales
• Lack of application life cycle / rolling upgrades
• Scalability concerns
• Lack of reporting tools

IBM's Big Data Architecture

[Architecture diagram. Elements shown:]

• Sources: Data in Motion, Data at Rest, Data in Many Forms
• Streams, Real-time Analytics: Video/Audio, Network/Sensor, Entity Analytics, Predictive
• Information Ingestion and Operational Information: Stream Processing, Data Integration, Master Data
• Hadoop, Landing Area, Analytics Zone and Archive: Raw Data, Structured Data, Text Analytics, Data Mining, Entity Analytics, Machine Learning
• Exploration, Integrated Warehouse, and Mart Zones: Discovery, Deep Reflection, Operational, Predictive
• Consumers: Intelligence Analysis, Decision Management, BI and Predictive Analytics, Navigation and Discovery
• Information Governance, Security and Business Continuity
• IBM Platform Computing – shared infrastructure

The MapReduce Architecture: 3 logical layers … 3 options in IBM

• Applications or End User Access: IBM Software (BigInsights, SPSS, analytics…)
• MapReduce Workload Management: Platform Symphony
• Distributed Parallel File Systems / Data Storage: GPFS-FPO

IBM Platform Symphony: “Analytics meets Infrastructure”

Two distinct value propositions:

1. Analytics: Use a fast, distributed software infrastructure to accelerate and provide greater capacity for business critical analytic workloads
2. Infrastructure (HW & SW): Use sophisticated policies to optimize the use of infrastructure resources and ensure alignment to the goals of the business

As the scale of problems grows, an agile, distributed infrastructure becomes ever more critical to project success.

IBM Platform Symphony Architecture

[Architecture diagram. Components shown:]

• IBM Platform Symphony Management Console
• COMPUTE INTENSIVE: Low-latency Service-oriented Application Middleware
• DATA INTENSIVE: Enhanced Hadoop MapReduce Processing Framework
• IBM Platform Symphony Core, including the Service Instance Manager (SIM)
• IBM Resource Orchestrator
• IBM Platform Symphony Enterprise Reporting Framework

Different workloads demand different SLAs

“I need an updated counterparty credit risk analysis for the final earnings report by 2:00 pm”

“I wonder if teenagers in California still think red shoes are cool?”

Cluster Sprawl: silos of underutilized, incompatible clusters

• A: Risk analytics, Sensitivity analysis, Monte Carlo simulation
• B: Metadata generation, File classification, Batch analysis
• C: Search, Analysis, Concept Recognition
• D: Data Intensive Apps

[Figure: grids of nodes labeled A, B, C and D, split across Cluster 1, Cluster 2 and Cluster 3.]

Support for Diverse Workloads & Platforms: Heterogeneous Application Support

• A: Risk analytics, Sensitivity analysis, Monte Carlo simulation
• B: Metadata generation, File classification, Batch analysis
• C: Search, Analysis, Concept Recognition
• D: Data Intensive Apps

[Figure: a single shared grid of nodes running A, B, C and D workloads side by side, under a Workload Manager layer and a Resource Orchestration layer.]

Multiple Instances of MapReduce in a Single Cluster

Sophisticated, Policy Based, Resource Sharing

• Sharing while preserving ownership
• Near 100% sustained resource utilization – Dynamic Allocation
• Allocations flex during runtime to reflect business priorities
• Enables application level SLA management

Easy to Manage: Sophisticated Workload Management, Interact with Running Jobs
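The idea of sharing while preserving ownership can be sketched as a small allocation routine: each tenant first receives what it needs from the slots it owns, and only slots left idle are lent to tenants with unmet demand. This is an illustrative model under stated assumptions, not Platform Symphony's actual policy engine or API; `Tenant`, `allocate` and the slot counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Tenant:
    name: str
    owned: int   # slots this tenant owns (its guaranteed share)
    demand: int  # slots this tenant currently wants

def allocate(tenants):
    """Serve each tenant from its own slots, then lend idle slots to others."""
    # Step 1: ownership is honored first, capped by actual demand.
    alloc = {t.name: min(t.owned, t.demand) for t in tenants}
    idle = sum(t.owned for t in tenants) - sum(alloc.values())
    # Step 2: lend idle slots in list order (a real policy would rank borrowers).
    for t in tenants:
        extra = min(idle, t.demand - alloc[t.name])
        alloc[t.name] += extra
        idle -= extra
    return alloc

# "search" is mostly idle, so "risk" borrows its unused slots.
print(allocate([Tenant("risk", 60, 100), Tenant("search", 40, 10)]))
# {'risk': 90, 'search': 10}

# When "search" demand rises, rerunning the allocation returns its owned share.
print(allocate([Tenant("risk", 60, 100), Tenant("search", 40, 40)]))
# {'risk': 60, 'search': 40}
```

Recomputing the allocation whenever demand changes is what lets allocations flex at runtime while each owner can always get its share back.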

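GPFS-FPO: Enterprise Class Replacement for HDFS

| Category | Feature | GPFS 3.5 | HDFS |
|---|---|---|---|
| Performance | Terasort: large reads | X | X |
| Performance | HBase: small write | X | X |
| Performance | Metadata intensive | X | X |
| Enterprise readiness | POSIX compliance | X | |
| Enterprise readiness | Meta-data replication | X | |
| Enterprise readiness | Distributed name node | X | |
| Protection & recovery | Snapshot | X | |
| Protection & recovery | Asynchronous replication | X | |
| Protection & recovery | Backup | X | |
| Security & integrity | Access control lists | X | |
| Ease of use | Policy based ingest | X | |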

IBM InfoSphere BigInsights v2.1 Enterprise Edition

[Architecture diagram. Legend: Open Source, IBM BigInsights, IBM. Labels recovered from the figure:]

• Layers: Visualization & Discovery; Applications & Development; Administration; Advanced Analytic Engines; Workload Optimization; Runtime; Data Store; File System; Management; Security; Audit & History; Lineage; Integration
• Visualization & Discovery, Applications & Development, Administration: Big SQL, BigSheets, Dashboard & Visualization, Apps, Text Analytics, Workflow, Pig & Jaql, MapReduce, Hive, Admin Console, Monitoring
• Advanced Analytic Engines: R, Text Processing Engine & Extractor Library, Adaptive Algorithms
• Workload Optimization: Integrated Installer, Enhanced Security, Splittable Text Compression, Adaptive MapReduce, Flexible Scheduler, High Availability
• Runtime and storage: ZooKeeper, Oozie, Jaql, Lucene, Pig, HCatalog, Index, MapReduce, HBase, Hive, HDFS, IBM GPFS-FPO, IBM Platform Symphony Advanced Edition
• Integration: JDBC, Netezza, DB2, Streams, DataStage, Guardium, Platform Computing, Cognos, Flume, Sqoop

Benchmark: Short Running Tasks – Results

Scheduler performance (tasks per second):

• Hadoop 0.20.2: 3.3
• Hadoop: 30.3
• Symphony 6.1: 1516

Symphony 6.1 can schedule ~50x more tasks per second than the current Hadoop release. Production clusters running workloads with large numbers of tasks benefit greatly from a fast scheduler. Hadoop results taken from the Hadoop World 2011 performance presentation, Lipcon & Chen.
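The ~50x figure can be checked against the chart values, read here as 3.3, 30.3 and 1516 tasks per second. A minimal sketch of the arithmetic (the dictionary layout and names are illustrative):

```python
# Tasks per second as read from the chart on the slide.
tasks_per_sec = {"Hadoop 0.20.2": 3.3, "Hadoop": 30.3, "Symphony 6.1": 1516}

symphony = tasks_per_sec["Symphony 6.1"]
# Speedup of Symphony 6.1 over each Hadoop result.
speedup = {
    name: symphony / rate
    for name, rate in tasks_per_sec.items()
    if name != "Symphony 6.1"
}

for name, factor in speedup.items():
    print(f"Symphony 6.1 vs {name}: {factor:.0f}x")
# Symphony 6.1 vs Hadoop 0.20.2: 459x
# Symphony 6.1 vs Hadoop: 50x
```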

Benchmark: SWIM: Facebook 2010 Workload – Results

[Bar chart: completion time in seconds (axis from 0 to 8000), three bars each for Hadoop 1.0.1 and Symphony 6.1. Symphony 6.1: 7.5x faster.]

Thank you!