Graph Database Use Cases - GOTO Conference

Graph Databases Use Cases

What’s a Graph?

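Graph data model

[Figure: property graph with two person nodes, (name: “James”, age: 32, twitter: “@spam”) and (name: “Mary”, age: 35), and a car node (brand: “Volvo”, model: “V70”). Relationships shown: LOVES (twice), LIVES WITH, DRIVES, and OWNS (property type: “car”).]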

Relational Tables

Join this way…

The Problem

• All JOINs are executed every time you query (traverse) the relationship

• Executing a JOIN means searching for a key in another table

• With indices, executing a JOIN means looking up a key

• B-Tree index: O(log(n)) per lookup

• More entries => more lookups => slower JOINs
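The O(log(n)) point can be made concrete with a small Python sketch. It uses a plain binary search over sorted keys as a stand-in for a B-tree index probe (an assumption for illustration, not how any particular RDBMS implements its indexes), and the table sizes are hypothetical.

```python
import math

def index_lookup(sorted_keys, key):
    """Binary search over sorted keys, standing in for a B-tree index probe.

    Returns (found, probes), where probes counts the keys inspected.
    """
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == key:
            return True, probes
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return False, probes

# Hypothetical primary-key columns of two differently sized tables.
small_table = range(1_000)
large_table = range(1_000_000)

_, probes_small = index_lookup(small_table, 999)
_, probes_large = index_lookup(large_table, 999_999)

# The larger table needs more probes for a single key lookup,
# and every hop of a multi-hop JOIN pays that cost again.
print(probes_small, probes_large)
```

Each probe is cheap, but the count grows with table size and is paid once per joined row, which is the slowdown the bullets describe.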

[Figure: three relational tables joined by key. People holds Max (key 143); Attend pairs person key 143 with conference keys 326, 725, and 981; Conferences holds Big Data Tech Con, NoSQL Now, and Chariot Data IO.]

A Property Graph

[Figure: nodes and relationships. A node (uid: MDM, name: Max) is connected by "member" relationships to three nodes: (uid: BDTC, where: Burlingame), (uid: NSN, where: San Francisco), and (uid: CDIO, where: Philadelphia).]

The Neo4j Secret Sauce

• Pointers instead of look-ups

• Do all your “Joining” on creation

• Spin spin spin through this data structure
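A minimal Python sketch of "pointers instead of look-ups", using the Max and conference data from the property graph slide. The class names, the `connect` and `neighbours` helpers, and the direction of the "member" relationship are assumptions made for illustration; this is not Neo4j's storage format.

```python
class Node:
    def __init__(self, **props):
        self.props = props
        self.out = []  # direct references to outgoing Relationship objects


class Relationship:
    def __init__(self, rel_type, start, end):
        self.rel_type = rel_type
        self.start = start
        self.end = end


def connect(start, rel_type, end):
    """Do the 'joining' once, at creation time, by storing a reference."""
    rel = Relationship(rel_type, start, end)
    start.out.append(rel)
    return rel


def neighbours(node, rel_type):
    """Traverse by following stored references; no key search is involved."""
    return [rel.end for rel in node.out if rel.rel_type == rel_type]


max_node = Node(uid="MDM", name="Max")
conferences = [
    Node(uid="BDTC", where="Burlingame"),
    Node(uid="NSN", where="San Francisco"),
    Node(uid="CDIO", where="Philadelphia"),
]
for conference in conferences:
    connect(max_node, "member", conference)  # direction assumed

places = [node.props["where"] for node in neighbours(max_node, "member")]
print(places)
```

The cost of `neighbours` depends only on how many relationships the node itself has, not on how many nodes exist elsewhere.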

Graph Buzz!

The Neo4j Graph Database

• Neo4j is the leading graph database in the world today

• Most widely deployed: 500,000+ downloads

• Largest ecosystem: active forums, code contributions, etc

• Most mature product: in development since 2000, in 24/7 production since 2003

Early Adopters of Graph Tech

Survival of the Fittest: Evolution of Web Search

• Pre-1999: WWW Indexing (Discrete Data)

• 1999 - 2012: Google Invents PageRank (Connected Data, Simple)

• 2012-?: Google Knowledge Graph, Facebook Graph Search (Connected Data, Rich)

Open Source Example

http://maxdemarzi.com/2013/01/28/facebook-graph-search-with-cypher-and-neo4j/

Survival of the Fittest: Evolution of Online Recruiting

• 1999: Keyword Search (Discrete Data)

• 2011-12: Social Discovery (Connected Data)

Open Source Example

http://maxdemarzi.com/2012/10/18/matches-are-the-new-hotness/


Emergent Graph in Other Industries (Actual Neo4j Graphs)

Content Management & Access Control

Insurance Risk Analysis

Geo Routing

(Public Transport)

Network Cell Analysis

BioInformatics

Network Asset Management

Open Source Example

http://maxdemarzi.com/2013/03/18/permission-resolution-with-neo4j-part-1/

Emergent Graph in Other Industries (Actual Neo4j Graphs)

Web Browsing

Gene Sequencing

Portfolio Analytics

Mobile Social Application

Open Source Example

http://maxdemarzi.com/2013/04/19/match-making-with-neo4j/

Curriculum Graph

Early Adopter Segments

(What we expected to happen - view from several years ago)

Core Industries & Use Cases:

Network & Data Center Management

MDM

Social

Geo

Web / ISV

Finance & Insurance

Datacom / Telecom

Neo4j Adoption Snapshot

Select Commercial Customers (Community Users Not Included)

Core Industries:

• Software

• Financial Services

• Telecommunications

• Health Care & Life Sciences

• Web Social, HR & Recruiting

• Media & Publishing

• Energy, Services, Automotive, Gov’t, Logistics, Education, Gaming, Other

Use Cases:

• Network & Data Center Management

• MDM / System of Record

• Social

• Geo

• Identity & Access Mgmt

• Content Management

• Recommendations

• BI, CRM, Impact Analysis, Fraud Detection, Resource Optimization, etc.

[Figure: customer logos placed on an industry by use-case grid; surviving labels include Accenture, Finance, Energy, and Aerospace.]

What Can You Do With Graphs?

MATCH (me:Person)-[:IS_FRIEND_OF]->(friend),
      (friend)-[:LIKES]->(restaurant),
      (restaurant)-[:LOCATED_IN]->(city:Location),
      (restaurant)-[:SERVES]->(cuisine:Cuisine)
WHERE me.name = 'Philip'
  AND city.location = 'New York'
  AND cuisine.cuisine = 'Sushi'
RETURN restaurant.name

http://maxdemarzi.com/?s=facebook

* Cypher query language example

Of course... a graph is a graph is a graph

What drugs will bind to protein X and not interact with drug Y?

Connected Query Performance

Query Response Time* = f(graph density, graph size, query degree)

• Graph density (avg # rel’s / node)

• Graph size (total # of nodes in the graph)

• Query degree (# of hops in one’s query)

RDBMS:
 >> exponential slowdown as each factor increases

Neo4j:
 >> Performance remains constant as graph size increases
 >> Performance slowdown is linear or better as density & degree increase

Connected Query Performance

[Figure: response time plotted against connectedness of data set, RDBMS vs. native graph database (Neo4j). Low-connectedness end: degree < 3, size in the thousands, # hops < 3. High-connectedness end: degree thousands+, size billions+, # hops tens to hundreds.]
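The claim that traversal cost follows density and hop count rather than total graph size can be illustrated with a toy Python sketch. The ring-shaped graph, the `ring_graph` and `edges_touched` helpers, and the chosen sizes are all assumptions for illustration; this is not a benchmark of any database.

```python
from collections import deque

def ring_graph(n, k):
    """Adjacency lists where node i points at the next k nodes on a ring."""
    return [[(i + j) % n for j in range(1, k + 1)] for i in range(n)]

def edges_touched(adj, start, max_hops):
    """Breadth-first traversal that counts every relationship it follows."""
    seen = {start}
    frontier = deque([(start, 0)])
    touched = 0
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the query degree
        for nxt in adj[node]:
            touched += 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return touched

small = ring_graph(1_000, 3)      # 1,000 nodes, density 3
large = ring_graph(100_000, 3)    # 100x more nodes, same density
dense = ring_graph(1_000, 6)      # same size as small, double the density

print(edges_touched(small, 0, 3), edges_touched(large, 0, 3))
print(edges_touched(small, 0, 4), edges_touched(dense, 0, 3))
```

In this sketch the work done is identical for the 1,000-node and 100,000-node graphs, and it grows only when density or the number of hops grows.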

Graph db performance

• a sample social graph with ~1,000 persons

• average 50 friends per person

• pathExists(a,b) limited to depth 4

• caches warmed up to eliminate disk I/O

Database    # persons    query time
MySQL       1,000        2,000 ms
Neo4j       1,000        2 ms
Neo4j       1,000,000    2 ms

*Additional Third Party Benchmark Available in Neo4j in Action: http://www.manning.com/partner/
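For readers unfamiliar with the benchmark query, here is a minimal Python sketch of what a depth-limited pathExists(a,b) check does. The function name `path_exists`, the dictionary-of-sets graph representation, and the six-person chain used as test data are assumptions for illustration; the benchmark's actual implementation is not shown in the slides.

```python
from collections import deque

def path_exists(adj, a, b, max_depth=4):
    """Return True if b is reachable from a in at most max_depth hops."""
    if a == b:
        return True
    seen = {a}
    frontier = deque([(a, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand further
        for nxt in adj.get(node, ()):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False

# Hypothetical chain of friendships: 0 - 1 - 2 - 3 - 4 - 5
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}

print(path_exists(chain, 0, 4))  # four hops away, within the limit
print(path_exists(chain, 0, 5))  # five hops away, beyond the limit
```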

The Zone of SQL Adequacy

[Figure: performance plotted against connectedness of data set, with curves for SQL database and graph database, a "requirement of application" line, and an "optimal comfort zone" region. Workloads marked: Salary List, ERP, CRM, Master Data Management, Network / Data Center Management, Geo, Social.]

Graph Technology Ecosystem

#1: Graph Local Queries

e.g. Recommendations, Friend-of-Friend, Shortest Path

What is a Graph Database

“A graph database... is an online database management system with CRUD methods that expose a graph data model” [1]

• Two important properties:

• Native graph storage engine: written from the ground up to manage graph data

• Native graph processing, including index-free adjacency to facilitate traversals

[1] Robinson, Webber, Eifrem. Graph Databases. O’Reilly, 2013. p. 5. ISBN-10: 1449356265

Graph Databases are Designed to:

1. Store inter-connected data

2. Make it easy to make sense of that data

3. Enable extreme-performance operations for:

• Discovery of connected data patterns

• Relatedness queries > depth 1

• Relatedness queries of arbitrary length

4. Make it easy to evolve the database

Top Reasons People Use Graph Databases

1. Problems with Join performance.

2. Continuously evolving data set (often involves wide and sparse tables)

3. The Shape of the Domain is naturally a graph

4. Open-ended business requirements necessitating fast, iterative development.

Wait what?

New Users? Real Time Updates?

Graph Database Deployment

[Figure: deployment architecture. Components shown: End User; Graph Dashboards & Ad-hoc Analysis; Graph Visualization (ad-hoc visual navigation & discovery); Reporting; Application; Other Databases; ETL; Graph Database Cluster (data storage & business rules execution); Bulk Analytic Infrastructure (e.g. Graph Compute Engine) for graph mining & aggregation; Data Scientist (ad-hoc analysis).]

Graph Dashboards: The Power of Visualization

Fraud Detection & Money Laundering

IT Service Dependencies

Working with Graphs: Case Studies & Working Examples

Cypher: ASCII art Graph Patterns

(A) -[:LOVES]-> (B)

MATCH (A) -[:LOVES]-> (B)
WHERE A.name = "A"
RETURN B as lover

Social Example

Practical Cypher Social Graph - Create

CREATE
  (joe:Person {name:"Joe"}),
  (bob:Person {name:"Bob"}),
  (sally:Person {name:"Sally"}),
  (anna:Person {name:"Anna"}),
  (jim:Person {name:"Jim"}),
  (mike:Person {name:"Mike"}),
  (billy:Person {name:"Billy"}),

  (joe)-[:KNOWS]->(bob),
  (joe)-[:KNOWS]->(sally),
  (bob)-[:KNOWS]->(sally),
  (sally)-[:KNOWS]->(anna),
  (anna)-[:KNOWS]->(jim),
  (anna)-[:KNOWS]->(mike),
  (jim)-[:KNOWS]->(mike),
  (jim)-[:KNOWS]->(billy)

Practical Cypher Social Graph - Friends of Joe's Friends

MATCH (person)-[:KNOWS]-(friend),
      (friend)-[:KNOWS]-(foaf)
WHERE person.name = "Joe"
  AND NOT (person)-[:KNOWS]-(foaf)
RETURN foaf

foaf
{name:"Anna"}

Practical Cypher Social Graph - Common Friends

MATCH (person1)-[:KNOWS]-(friend),
      (person2)-[:KNOWS]-(friend)
WHERE person1.name = "Joe"
  AND person2.name = "Sally"
RETURN friend

friend
{name:"Bob"}
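The friends-of-friends and common-friends queries both reduce to set operations over each person's neighbours. A minimal Python sketch over the same KNOWS data follows; the helper names `undirected`, `friends_of_friends`, and `common_friends` are hypothetical, and relationships are treated as undirected to mirror the direction-less patterns in the Cypher.

```python
from collections import defaultdict

# KNOWS relationships from the Create slide.
KNOWS = [
    ("Joe", "Bob"), ("Joe", "Sally"), ("Bob", "Sally"), ("Sally", "Anna"),
    ("Anna", "Jim"), ("Anna", "Mike"), ("Jim", "Mike"), ("Jim", "Billy"),
]

def undirected(edges):
    """Build neighbour sets, ignoring relationship direction."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def friends_of_friends(adj, person):
    """People two hops away who are not already direct friends."""
    direct = adj[person]
    two_hops = set().union(*(adj[friend] for friend in direct))
    return two_hops - direct - {person}

def common_friends(adj, person1, person2):
    """People who know both person1 and person2."""
    return adj[person1] & adj[person2]

graph = undirected(KNOWS)
print(friends_of_friends(graph, "Joe"))
print(common_friends(graph, "Joe", "Sally"))
```

The results match the slides: Anna is the only friend-of-a-friend of Joe, and Bob is the only friend Joe and Sally share.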

Practical Cypher Social Graph - Shortest Path

MATCH path = shortestPath(
  (person1)-[:KNOWS*..6]-(person2)
)
WHERE person1.name = "Joe"
  AND person2.name = "Billy"
RETURN path

path
{start:"13759",
 nodes:["13759","13757","13756","13755","13753"],
 length:4,
 relationships:["101407","101409","101410","101413"],
 end:"13753"}
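As a rough analogue of shortestPath with a `*..6` bound, the following Python sketch runs a breadth-first search over the same KNOWS data and stops expanding after six hops. The function name `shortest_path` and the use of person names in place of node ids are assumptions for illustration; Neo4j's own algorithm may differ.

```python
from collections import defaultdict, deque

# KNOWS relationships from the Create slide, treated as undirected.
KNOWS = [
    ("Joe", "Bob"), ("Joe", "Sally"), ("Bob", "Sally"), ("Sally", "Anna"),
    ("Anna", "Jim"), ("Anna", "Mike"), ("Jim", "Mike"), ("Jim", "Billy"),
]

adj = defaultdict(set)
for a, b in KNOWS:
    adj[a].add(b)
    adj[b].add(a)

def shortest_path(adj, start, goal, max_hops=6):
    """Breadth-first search; returns the node list of a shortest path or None."""
    parents = {start: None}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk parent links back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        if hops == max_hops:
            continue
        for nxt in sorted(adj[node]):    # sorted only for repeatable output
            if nxt not in parents:
                parents[nxt] = node
                frontier.append((nxt, hops + 1))
    return None

path = shortest_path(adj, "Joe", "Billy")
print(path, len(path) - 1)
```

The path has five nodes and length 4, consistent with the result on the slide.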

Industry: Online Job Search
Use case: Social / Recommendations
Sausalito, CA

[Figure: Person nodes linked to each other by KNOWS relationships and to Company nodes by WORKS_AT relationships.]

Background

• Online jobs and career community, providing anonymized inside information to job seekers

Business problem

• Wanted to leverage known fact that most jobs are found through personal & professional connections

• Needed to rely on an existing source of social network data. Facebook was the ideal choice.

• End users needed to get instant gratification

• Aiming to have the best job search service, in a very competitive market

Solution & Benefits

• First-to-market with a product that let users find jobs through their network of Facebook friends

• Job recommendations served real-time from Neo4j

• Individual Facebook graphs imported real-time into Neo4j

• Glassdoor now stores > 50% of the entire Facebook social graph

• Neo4j cluster has grown seamlessly, with new instances being brought online as graph size and load have increased


Network Management Example

Industry: Communications
Use case: Network Management
Paris, France

[Figure: Service, Switch, Router, Fiber Link, and Oceanfloor Cable nodes connected by DEPENDS_ON and LINKED relationships.]

Background

• Second largest communications company in France

• Part of Vivendi Group, partnering with Vodafone

Business problem

• Infrastructure maintenance took one full week to plan, because of the need to model network impacts

• Needed rapid, automated “what if” analysis to ensure resilience during unplanned network outages

• Identify weaknesses in the network to uncover the need for additional redundancy

• Network information spread across > 30 systems, with daily changes to network infrastructure

• Business needs sometimes changed very rapidly

Solution & Benefits

• Flexible network inventory management system, to support modeling, aggregation & troubleshooting

• Single source of truth (Neo4j) representing the entire network

• Dynamic system loads data from 30+ systems, and allows new applications to access network data

• Modeling efforts greatly reduced because of the near 1:1 mapping between the real world and the graph

• Flexible schema highly adaptable to changing business requirements

Industry: Web/ISV, Communications
Use case: Network Management
Global (U.S., France)

Background

• World’s largest provider of IT infrastructure, software & services

• HP’s Unified Correlation Analyzer (UCA) application is a key application inside HP’s OSS Assurance portfolio

• Carrier-class resource & service management, problem determination, root cause & service impact analysis

• Helps communications operators manage large, complex and fast changing networks

Business problem

• Use network topology information to identify root causes of problems on the network

• Simplify alarm handling by human operators

• Automate handling of certain types of alarms

• Help operators respond rapidly to network issues

• Filter/group/eliminate redundant Network Management System alarms by event correlation

Solution & Benefits

• Accelerated product development time

• Extremely fast querying of network topology

• Graph representation a perfect domain fit

• 24x7 carrier-grade reliability with Neo4j HA clustering

• Met objective in under 6 months

Practical Cypher Network Management - Create

CREATE
  (crm {name:"CRM"}),
  (dbvm {name:"Database VM"}),
  (www {name:"Public Website"}),
  (wwwvm {name:"Webserver VM"}),
  (srv1 {name:"Server 1"}),
  (san {name:"SAN"}),
  (srv2 {name:"Server 2"}),

  (crm)-[:DEPENDS_ON]->(dbvm),
  (dbvm)-[:DEPENDS_ON]->(srv2),
  (srv2)-[:DEPENDS_ON]->(san),
  (www)-[:DEPENDS_ON]->(dbvm),
  (www)-[:DEPENDS_ON]->(wwwvm),
  (wwwvm)-[:DEPENDS_ON]->(srv1),
  (srv1)-[:DEPENDS_ON]->(san)

Practical Cypher Network Management - Impact Analysis

// Components the Public Website depends on
MATCH (n)-[:DEPENDS_ON*]->(downstream)
WHERE n.name = "Public Website"
RETURN downstream

downstream
{name:"Database VM"}
{name:"Server 2"}
{name:"SAN"}
{name:"Webserver VM"}
{name:"Server 1"}
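A variable-length DEPENDS_ON traversal amounts to a transitive reachability search. The Python sketch below runs it over the same dependency data in both directions: everything the Public Website depends on, and everything affected if Server 1 goes down. The `reachable` helper and the edge-list representation are assumptions for illustration.

```python
# DEPENDS_ON relationships from the Create slide, as (dependent, dependency).
DEPENDS_ON = [
    ("CRM", "Database VM"),
    ("Database VM", "Server 2"),
    ("Server 2", "SAN"),
    ("Public Website", "Database VM"),
    ("Public Website", "Webserver VM"),
    ("Webserver VM", "Server 1"),
    ("Server 1", "SAN"),
]

def reachable(edges, start):
    """All nodes reachable from start by following edges one or more hops."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Everything the Public Website depends on, directly or indirectly.
downstream = reachable(DEPENDS_ON, "Public Website")

# Everything affected by a Server 1 outage: follow the relationships backwards.
reversed_edges = [(dst, src) for src, dst in DEPENDS_ON]
affected = reachable(reversed_edges, "Server 1")

print(sorted(downstream))
print(sorted(affected))
```

The first result matches the five components listed on the slide; the second shows the reverse question an outage analysis would ask.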

Practical Cypher Network Management - Statistics

// Most depended on component
MATCH (n)