Implementing Graph Transformations in the Bulk Synchronous Parallel Model

Christian Krause (SAP), Matthias Tichy (Göteborgs Uni.), Holger Giese (HPI)

Fundamental Approaches to Software Engineering (FASE) 2014

Public

Outline

1. Motivation & Background

2. Graph Transformations

3. Bulk Synchronous Parallel

4. Giraph Code Generator in Henshin

5. Benchmarks

6. Future Work

© 2014 SAP AG or an SAP affiliate company. All rights reserved.


Motivation: Analytics for Big Data

Proteomics

• Find patterns in the human proteome that indicate diseases, e.g. lung cancer.
• 160 million data points per sample (2.4 GB); large studies analyze up to 1K samples.
• Mass spectrum: structured data (XY-coordinates)

SpringerLink

• Full-text indexing and search of 3.8 million documents → extraction of 2.8 billion text entities; 460 GB in main memory
• Unstructured text data

Transforming and Analyzing Big Graphs

Internet Movie Database (IMDb)

• IMDb data has a natural graph representation
• Large (but not huge) graph: 3.2 million nodes
• Highly connected: movies with > 1000 actors!

Tasks:

1. Find all couples of actors and/or actresses that starred together in at least 3 movies.

2. Compute the average rank for each of these couples.

Simplified metamodel / typegraph for IMDb data.
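For orientation, the two tasks above can be written as a small single-machine Python sketch. This is not the Henshin rule from the talk. The input layout (`cast` mapping a movie to its performers, `rank` mapping a movie to its rank) and the function name `find_couples` are assumptions made for illustration, and "average rank of a couple" is read here as the mean rank of the movies the two share.

```python
from collections import defaultdict
from itertools import combinations


def find_couples(cast, rank, min_movies=3):
    """Return {(person_a, person_b): average rank of their shared movies}.

    cast: hypothetical mapping movie -> list of performers
    rank: hypothetical mapping movie -> numeric rank
    """
    shared = defaultdict(list)
    for movie, persons in cast.items():
        # Every unordered pair of performers in a movie is a candidate couple.
        for pair in combinations(sorted(set(persons)), 2):
            shared[pair].append(movie)
    # Task 1: keep pairs with enough shared movies.
    # Task 2: average the ranks of those movies.
    return {
        pair: sum(rank[m] for m in movies) / len(movies)
        for pair, movies in shared.items()
        if len(movies) >= min_movies
    }


cast = {
    "m1": ["A", "B"],
    "m2": ["A", "B", "C"],
    "m3": ["B", "A"],
    "m4": ["A", "C"],
}
rank = {"m1": 6.0, "m2": 8.0, "m3": 7.0, "m4": 5.0}
couples = find_couples(cast, rank)
```

The pair enumeration also shows why highly connected movies are expensive: a single movie with 1000 performers already contributes 499,500 candidate pairs.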


Finding Couples using Graph Transformations (Henshin)


Bulk Synchronous Parallel (BSP)

• Bridging model for designing parallel algorithms (published in 1990 by L. Valiant)
• Popular implementations: Pregel (Google), Apache Giraph (Facebook)
• Graph computations as a series of supersteps, each consisting of:

1. Master computation: Single computation executed centrally on a master node, mainly used for bookkeeping and orchestrating the vertex computations.

2. Vertex computation: Concurrent computation executed locally for every active vertex of the graph. This part can be highly parallelized.

3. Communication & Modification: During the vertex computation, vertices can send messages to other vertices and apply modifications. Effects are visible in the next superstep!

4. Barrier synchronization: Before the next superstep is started, the vertex computation and communication of the current one must be finished for all vertices.

Illustration of shortest path algorithm in BSP
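The superstep structure can be illustrated with a sequential Python simulation of single-source shortest paths, the example the slide refers to. This is only a sketch: it does not use the Giraph or Pregel API, and the function name `bsp_shortest_paths` and the adjacency-list input format are assumptions. The loop body stands in for the vertex computations that a real BSP system would run in parallel.

```python
import math
from collections import defaultdict


def bsp_shortest_paths(edges, source):
    """edges: hypothetical mapping vertex -> list of (neighbour, weight)."""
    vertices = set(edges) | {w for outs in edges.values() for w, _ in outs}
    dist = {v: math.inf for v in vertices}
    inbox = {source: [0]}  # messages to be read in the first superstep
    supersteps = 0
    # Master computation: keep going while any message is pending.
    while inbox:
        outbox = defaultdict(list)
        # Vertex computation: only vertices that received messages are active.
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                # Communication: propose new distances to the neighbours.
                for w, weight in edges.get(v, []):
                    outbox[w].append(best + weight)
        # Barrier synchronization: messages become visible only now.
        inbox = dict(outbox)
        supersteps += 1
    return dist, supersteps


edges = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
dist, supersteps = bsp_shortest_paths(edges, "a")
```

In this example vertex `c` first learns the distance 4 directly from `a` and improves it to 3 one superstep later, when the message routed through `b` arrives.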


From Graph Transformations to Bulk Synchronous Parallel

Transformation Specification → Code Generation → Deployment → Execution

Preparations: Search Plan Generation

[Figure: rule graph annotated with the search plan positions 1 to 5]

Generating Rule Code

Search plan


Generating Transformation Unit Code

Supported Transformation Units

• Sequential units, Independent units (nondeterministic choice), Priority units, Iterated units (for-loops)
• No roll-backs on unsuccessful execution
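One way to picture these unit kinds is as combinators over functions that modify a graph in place and report success. The sketch below is plain Python, not Henshin's API or the generated Giraph code. All names are hypothetical, and the semantics are a simplified reading of the bullet points above, in particular that a failing sequential unit leaves earlier changes in place.

```python
import random


def sequential(units):
    """Run sub-units in order; stop at the first failure."""
    def run(graph):
        for unit in units:
            if not unit(graph):
                return False  # no roll-back: earlier changes stay
        return True
    return run


def independent(units, rng=random):
    """Nondeterministic choice: try sub-units in random order until one succeeds."""
    def run(graph):
        order = list(units)
        rng.shuffle(order)
        return any(unit(graph) for unit in order)
    return run


def priority(units):
    """Try sub-units in the given order; the first applicable one wins."""
    def run(graph):
        return any(unit(graph) for unit in units)
    return run


def iterated(unit, n):
    """Apply one sub-unit n times (a for-loop)."""
    def run(graph):
        for _ in range(n):
            if not unit(graph):
                return False
        return True
    return run


def inc(graph):  # toy "rule" that always applies
    graph["count"] += 1
    return True


def fail(graph):  # toy "rule" that never applies
    return False


g = {"count": 0}
ok_seq = sequential([inc, fail, inc])(g)  # fails after the first inc
count_after_seq = g["count"]
ok_iter = iterated(inc, 3)(g)
ok_prio = priority([fail, inc])(g)
```

After the failed sequential unit the counter stays at 1 rather than being reset to 0, which is the behaviour "no roll-backs" describes.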


Benchmarks

Sierpinski Benchmark


Movies Benchmark (2-Movie Version)


Movies Benchmark (2-Movie Version) @ Glenn Cluster


Movies-Benchmark (3-Movies): Native Henshin Interpreter

Henshin Benchmark

At the time of paper writing: Henshin was not able to produce a single match even for a small subgraph!

Today: massive performance improvements achieved by:

• Path constraints in match finder
• Parallelization of multi-rules

Execution time in seconds, by number of threads (= number of partitions) and input graph size:

Threads      500K       1M        2M
0          12.606    58.24   399.686
2           7.175   31.863   210.832
4            4.57   19.812   126.758
6           3.313   13.938    90.717
8           2.757   12.198    71.525
10          2.581   11.208    63.109
12           2.32   10.726    56.481
14          2.512   10.119    54.758
16          2.368    9.473    54.413
18          2.284    9.207    52.075
20          2.216    8.744    50.885
22          2.207    8.534    50.426
24          2.215    8.552    48.128

[Figure: the same data plotted as execution time in seconds (logarithmic axis) over the number of threads, one curve per input graph size (500K, 1M, 2M)]

Speed-up: 8.3x
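The speed-up can be checked against the measurements on this slide. The Python sketch below copies the timings from the slide and divides the first thread column (labelled 0) by the 24-thread column. That the 8.3x figure was derived this way, and for the 2M graph, is an assumption, although the arithmetic matches. The names `THREADS`, `SECONDS` and `speedup` are illustrative.

```python
THREADS = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]

# Execution time in seconds per input graph size, one entry per thread count.
SECONDS = {
    "500K": [12.606, 7.175, 4.57, 3.313, 2.757, 2.581, 2.32,
             2.512, 2.368, 2.284, 2.216, 2.207, 2.215],
    "1M": [58.24, 31.863, 19.812, 13.938, 12.198, 11.208, 10.726,
           10.119, 9.473, 9.207, 8.744, 8.534, 8.552],
    "2M": [399.686, 210.832, 126.758, 90.717, 71.525, 63.109, 56.481,
           54.758, 54.413, 52.075, 50.885, 50.426, 48.128],
}


def speedup(size, threads=24):
    """Ratio of the baseline time (first column) to the time at `threads`."""
    row = SECONDS[size]
    return row[0] / row[THREADS.index(threads)]


for size in SECONDS:
    print(f"{size}: {speedup(size):.1f}x")
```

Under this reading the two smaller graphs gain less from 24 threads than the 2M graph does.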


Conclusions and Future Work

Conclusions

• GT → BSP mapping allows us to transform graphs that do not fit into the main memory of a single machine!
• Communication cost limits the performance of Giraph-based applications
• Approach is not suited for real-time transformations / analysis. Here we would rather use single-machine, in-memory techniques (e.g. Henshin interpreter)

Future Work

• Improve pattern matcher in generated Giraph code (reduce number of supersteps)
• Enable attribute calculations and constraints
• Define suitable graph file format (maybe JSON instead of XMI)


Thank you

Contact information:
Christian Krause
SAP Innovation Center
Konrad-Zuse-Ring 10
14469 Potsdam

© 2014 SAP AG or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG or an SAP affiliate company. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG (or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP AG or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP AG or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP AG or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. In particular, SAP AG or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP AG’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP AG or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. 
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
