Optimizing Big Data with Search - IBM

Optimizing Big Data with Search
Mark Myers
Sr. Director of Product Marketing
Vivisimo, An IBM Company

© 2010 IBM Corporation

Topics

• Big Data opportunity
• Evolution of search
• Search, navigation and discovery for Big Data
• Deployment scenarios
• Case studies and examples

What is Big Data?

Definition varies … but you probably know it when you see it…

Data sets that are:
• Too large (volume)
• Too fast-moving (velocity)
• Too diverse (variety)
… for “conventional” data management tools

Can be structured, unstructured, or semistructured

“Data is the New Oil”
Big Data – Big Potential

“Data is the new Oil. Data is just like crude. It’s valuable, but if unrefined it cannot really be used.” – Clive Humby, DunnHumby

“We have for the first time an economy based on a key resource [Information] that is not only renewable, but self-generating. Running out of it is not a problem, but drowning in it is.” – John Naisbitt

Big Data is one of five major trends in IT.

1. Cloud
2. Mobile
3. Social
4. Consumerization
5. Big Data

The other four trends are feeding big data

Government is near the top in volume of data

Stored Data Per Agency (2009): 1,313 TB

Source: US Bureau of Labor Statistics; McKinsey Global Institute analysis

Big Data applications in government

• Government
• Intelligence analysis
• Law enforcement and investigation
• Cyber security / cyber warfare
• Trend forecasting
• Logistics
• Physical sciences
• … many more

Many organizations are losing ground

[Chart. Series shown: amount of data being produced by an organization; percentage of data that can be exploited]

SEARCH, DISCOVERY AND NAVIGATION FOR BIG DATA

We’ve seen this before … search evolves at each stage

[Timeline figure. Time axis marked 1960, 1990 and 2010, with a “Search” label at each stage]

Platforms:
• Transaction Systems
• Mainframe, IMS and CICS
• Web, E-business and SOA
• WebSphere
• New Analytics
• Big Data
• Big Data Platform

Search capabilities:
• Ubiquitous Search
• Mobile
• Enterprise search
• Faceted Navigation
• Clustering & Discovery
• Filtering/alerting
• E-commerce
• Natural-language processing
• Embedded search
• Proprietary online databases
• Boolean full-text and proximity search
• First full-text search systems
• Web search

Search today is ubiquitous and rich with features to connect people and machines with information

• Keyword
• “Natural language” / semantics
• Clustering and tag clouds
• Faceted navigation
• Virtual documents
• Recommendation
• Filtering / alerting
• Federation
• Autocomplete
• Speech

… Delivered across many devices and applications

• Portals
• Web
• Enterprise Apps
• Mobile
• Commerce
• Media
• Social

Challenges emerge when applied to an entire enterprise

[Diagram of Vivisimo Velocity and Users, with the labels below]

Systems and repositories:
• Database Server
• SQL
• Blogs
• Wikis
• Document Management
• Email (Exchange)
• Cloud
• Oracle
• Domain
• Intranet
• SharePoint
• Desktop
• CRM
• ERP
• Federated Sources
• Financial Mgt. System

Content:
• Internal Safety Documents
• Third-Party Research
• Prior Work
• Historical Abstracts
• Critical Correspondence
• Competitive Intelligence
• Personal Documents

Social Tools and Collaboration:
• Commenting
• Tagging
• Rating
• Shared Folders

SEARCH, DISCOVERY AND NAVIGATION WITH BIG DATA

Cost per byte increases with more refined applications

[Chart. Axis: Cost Per Byte / Value. Tiers shown: Data Warehouse & Analytics; Hadoop / Big Data Framework; Search]

Big data adds another layer of challenge

• Disk space has increased massively, but read/write speed has not; the same is true of seek time
  – A 1 TB drive read at 100 MB/s takes about 2.8 hours to read all data from disk
• More hardware means a greater chance that a single piece will fail
• Analytics need to be able to combine the data in some way, which often requires preprocessing and caching
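
The arithmetic behind the first two points can be checked with a short sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), under which a full sequential scan of one drive takes 10,000 seconds, a little under three hours. The 1% per-device failure probability is a made-up figure used only to show how the chance of at least one failure grows with the number of devices; it also assumes failures are independent.

```python
TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)


def full_scan_hours(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read an entire drive sequentially."""
    return capacity_bytes / throughput_bytes_per_s / 3600


def prob_any_failure(p_single: float, n_devices: int) -> float:
    """Probability that at least one of n independent devices fails."""
    return 1 - (1 - p_single) ** n_devices


# One 1 TB drive read at 100 MB/s.
one_drive = full_scan_hours(1 * TB, 100 * MB)

# The same data spread evenly over 100 drives that are read in parallel.
hundred_drives = full_scan_hours(1 * TB / 100, 100 * MB)

# Hypothetical 1% failure probability per device over some period.
risk_1 = prob_any_failure(0.01, 1)
risk_100 = prob_any_failure(0.01, 100)

print(f"1 drive:    {one_drive:.2f} h to scan, failure risk {risk_1:.1%}")
print(f"100 drives: {hundred_drives * 60:.1f} min to scan, failure risk {risk_100:.1%}")
```

Spreading the data cuts the scan time in proportion to the number of drives, but the chance that something in the cluster fails rises at the same time, which is why the two points appear together.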

Tenets of Big Data Processing

• Distributed Processing – Ability to distribute processing across a network of nodes and re-assemble the results. Analysis takes place where the data is stored.
• Fault Tolerance – Failure of a particular node should not bring down the whole system; if a node fails and can be restored, it should be able to re-join the group activity without introducing inconsistencies.
• Linear Scalability – Adding computing resources should increase speed and performance in a linear fashion.
• Graceful Load Response – Increased load should not cause failure, but rather a graceful decline in performance.
• Elasticity of Resources – Readily expand or contract to match the workload at a given time.

A search platform needs to match these demands to function in a Big Data environment
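
To make the first two tenets concrete for search, here is a minimal scatter-gather sketch: a query is sent to every shard in parallel, each shard scores only its own documents, and the partial results are merged into one ranked list. A shard that raises an error is counted and skipped rather than failing the whole query. The shard layout, the term-count scoring, and all function names are assumptions made for illustration; they do not describe how Velocity or any IBM product is implemented.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]              # (score, document id)
Shard = Callable[[str], List[Hit]]   # a shard answers a query with its local hits


def make_shard(docs: Dict[str, str]) -> Shard:
    """Build a toy shard whose score is the number of times the term occurs."""
    def search(term: str) -> List[Hit]:
        hits = []
        for doc_id, text in docs.items():
            score = text.lower().split().count(term.lower())
            if score:
                hits.append((float(score), doc_id))
        return hits
    return search


def failing_shard(term: str) -> List[Hit]:
    """Stand-in for a node that is currently down."""
    raise ConnectionError("node unavailable")


def scatter_gather(shards: List[Shard], term: str, k: int = 3) -> Tuple[List[Hit], int]:
    """Query all shards in parallel; return the top-k hits and the number of failed shards."""
    hits: List[Hit] = []
    failed = 0
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = [pool.submit(shard, term) for shard in shards]
        for future in as_completed(futures):
            try:
                hits.extend(future.result())
            except Exception:
                failed += 1  # degrade gracefully: keep results from healthy shards
    # Highest score first; ties broken by document id for a stable order.
    ranked = sorted(hits, key=lambda hit: (-hit[0], hit[1]))
    return ranked[:k], failed


if __name__ == "__main__":
    shards = [
        make_shard({"a": "big data search", "b": "search search engine"}),
        failing_shard,
        make_shard({"c": "data warehouse", "d": "enterprise search"}),
    ]
    print(scatter_gather(shards, "search", k=2))
```

Returning the failure count alongside the results lets the caller decide whether to show partial results or retry, which is one simple way to get a graceful decline rather than an outright failure.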

Search Platform Architecture

[Architecture diagram with the layers and labels below]

• Applications
• Application Framework: User Profiles, Authentication / Authorization, Query Transformation, Personalization, Display
• Federated Sources: Subscriptions, Feeds, Web Results, Other Apps
• Search Engine: Text Analytics, Thesaurus, Clustering, Ontology Support, Semantic Processing, Entity Extraction, Relevancy, Meta Data, Faceting, Tagging, Taxonomy, Collaboration, Indexing, Converting, Crawling
• Connector Framework: CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems

Deployment/Integration Scenarios for Exploiting Big Data

Velocity Platform

1. Rapid search, discovery and navigation
2. Load data from enterprise applications into Big Data framework
3. Index and search of Big Data analytics
4. Leveraging Big Data Platform for bulk processing and analytics

Rapid search, discovery and navigation

• Rapid, near real-time access
• Immediate answers
• Pinpoint results
• Data fusion

[Architecture diagram: the search platform architecture (Applications, Application Framework, Federated Sources, Search Engine, Connector Framework) with Vivisimo Big Data Connectors and an IBM Big Data Platform containing three Data stores]

Load data from enterprise applications into Big Data framework

• Delivers enterprise content into big data framework for analytics and fusion

[Architecture diagram: the search platform architecture with Vivisimo Big Data Connectors and an IBM Big Data Platform containing three nodes, each labeled Analytics & Conversion, Data, and Meta Data]

Index and search of Big Data analytics

• Ensures ability to access and use products of big data analytics in the future
• Fusion of big data analytics with enterprise data

[Architecture diagram: the search platform architecture with Vivisimo Big Data Connectors and an IBM Big Data Platform containing three nodes, each labeled Analytics & Conversion and Data]
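
As a rough illustration of what indexing the products of big data analytics alongside enterprise data could look like, the sketch below builds a small inverted index over records from two kinds of sources, so that one keyword query returns both. The record layout, the source labels, and the class and method names are assumptions made for illustration; this is not the Velocity or IBM Big Data Platform API.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

DocRef = Tuple[str, str]  # (source label, document id)
_PUNCTUATION = ".,;:!?()\"'"


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    tokens = (raw.strip(_PUNCTUATION) for raw in text.lower().split())
    return [token for token in tokens if token]


class InvertedIndex:
    """Maps each term to the set of documents that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[DocRef]] = defaultdict(set)

    def add(self, source: str, doc_id: str, text: str) -> None:
        for token in tokenize(text):
            self._postings[token].add((source, doc_id))

    def search(self, query: str) -> List[DocRef]:
        """Return documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    index = InvertedIndex()
    # Hypothetical output of an analytics job and two enterprise documents.
    index.add("analytics", "job-17", "Churn risk rising for retail customers")
    index.add("enterprise", "crm-204", "Retail customers complaint log, Q3")
    index.add("enterprise", "hr-9", "Holiday schedule")
    print(index.search("retail customers"))
```

Keeping the source label in each posting is what allows results from analytics output and enterprise systems to be shown together or filtered apart.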

Leveraging Big Data Platform for bulk processing and analytics

• Leverage the framework for text analytics and metadata extraction
• Bulk processing of enormous volumes
• Fusion

[Architecture diagram: the search platform architecture, with BI added alongside Meta Data, Faceting, Tagging, Taxonomy and Collaboration, plus Vivisimo Big Data Connectors and an IBM Big Data Platform containing three nodes, each labeled Analytics & Conversion, Data, and Meta Data]
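
The bulk metadata extraction described for this scenario follows a map-and-merge pattern, sketched here under stated assumptions. Each partition of documents is processed on its own and yields partial counts, which are then merged. The two extractors (email addresses and four-digit years, found with regular expressions) are stand-ins chosen for illustration, not the text analytics of any particular product, and the partitions run sequentially here, whereas a big data framework would run them on the nodes that hold the data.

```python
import re
from collections import Counter
from typing import Iterable, List

# Illustrative extractors only; real text analytics would be far richer.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_metadata(text: str) -> Counter:
    """Map step: count (kind, value) metadata items found in one document."""
    counts: Counter = Counter()
    for address in EMAIL.findall(text):
        counts[("email", address.lower())] += 1
    for year in YEAR.findall(text):
        counts[("year", year)] += 1
    return counts


def process_partition(docs: Iterable[str]) -> Counter:
    """Process one partition independently, as a single node would."""
    partial: Counter = Counter()
    for doc in docs:
        partial.update(extract_metadata(doc))
    return partial


def merge(partials: List[Counter]) -> Counter:
    """Reduce step: combine the partial counts from all partitions."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    partitions = [
        ["Report filed 2009 by jane@example.com.", "Budget for 2010"],
        ["Follow-up 2010, cc JANE@example.com"],
    ]
    totals = merge([process_partition(p) for p in partitions])
    for item, count in sorted(totals.items()):
        print(item, count)
```

Because each partition produces an independent partial result, a failed partition can be re-run on its own without redoing the rest of the job.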

EXAMPLES AND CASE STUDIES

Federation across secure domains at massive scale

Knowledge fusion and collaboration across more than 400,000 users

Powerful social search to drive collaboration and knowledge-sharing

Metadata Catalog

Fusion of enterprise data and analytics – commercial

Continued

360 degree view of the customer

360 degree view of an asset

360 degree view of the citizen (conceptual prototype)

Search across multiple silos

National Archives and Records Administration – Electronic Records Archives
• Challenge: create a single access point and rich discovery environment for the permanent records of the United States
• Online Public Access prototype
  – Streamlined searching
  – Better results
  – Better presentation

National Archives and Records Administration: Projected Data Growth for Electronic Records Archives

QUESTIONS & DISCUSSION
