Graph Algorithms Using Map-Reduce By Team-6
Introduction:
Breadth First Search
• Breadth-first search (BFS) is a general technique for traversing a graph.
• BFS on a graph with n vertices and m edges takes O(n + m) time.
• Algorithm:
  – Input: a simple, connected, directed graph with n vertices, and the node to search for.
  – Output: if the node is found, “Yes” is printed and the corresponding path is displayed; otherwise “No” is printed.
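The input/output contract above can be sketched on a single machine before moving to Map-Reduce. This is a minimal illustration, not the team's implementation: the function name `bfs_path` and the dictionary-based adjacency list are assumptions made for the example, and the graph is the five-node example used later in these slides.

```python
from collections import deque

def bfs_path(graph, source, target):
    """Return the nodes on a fewest-edge path from source to target, or None."""
    parent = {source: None}          # doubles as the "visited" set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

# Adjacency list of the example graph from the Map-Reduce slides.
graph = {1: [2, 5], 2: [1, 3, 4, 5], 3: [2, 4], 4: [2, 3, 5], 5: [1, 2, 4]}
path = bfs_path(graph, 1, 3)
print("Yes" if path is not None else "No", path)
```

Each vertex enters the queue at most once and each edge list is scanned once, which is where the O(n + m) bound comes from.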
BFS Using Map-Reduce Framework (Hadoop Impl):
• The graph is represented as an adjacency list.
• Key: Node ID
• Value: EDGES|DISTANCE_FROM_SOURCE|COLOR|
  – EDGES is a comma-delimited list of the IDs of the nodes that are connected to this node.
  – DISTANCE_FROM_SOURCE is not known in the beginning, so Integer.MAX_VALUE is used to mark "unknown".
  – COLOR tells us whether or not we have seen the node before, so it starts off as WHITE. The source node is the exception: it starts as GRAY with distance 0.
  – Eg:
    Key  Value
    1    2,5|0|GRAY|
    2    1,3,4,5|Integer.MAX_VALUE|WHITE|
    3    2,4|Integer.MAX_VALUE|WHITE|
    4    2,3,5|Integer.MAX_VALUE|WHITE|
    5    1,2,4|Integer.MAX_VALUE|WHITE|
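To make the record layout concrete, here is a small sketch that parses and re-serializes one key/value pair. The names `Node`, `parse_node`, `format_node` and `UNKNOWN` are hypothetical and chosen for this example; `UNKNOWN` simply mirrors the numeric value of Java's Integer.MAX_VALUE. Writing an empty edge list as NULL is also an assumption.

```python
from dataclasses import dataclass
from typing import List

UNKNOWN = 2**31 - 1  # same numeric value as Java's Integer.MAX_VALUE

@dataclass
class Node:
    node_id: int
    edges: List[int]
    distance: int
    color: str  # "WHITE", "GRAY" or "BLACK"

def parse_node(key: str, value: str) -> Node:
    """Parse 'EDGES|DISTANCE_FROM_SOURCE|COLOR|' for the given key."""
    edges_field, distance_field, color = value.split("|")[:3]
    # Skip empty pieces (trailing commas) and the NULL placeholder.
    edges = [int(e) for e in edges_field.split(",") if e and e != "NULL"]
    if distance_field == "Integer.MAX_VALUE":
        distance = UNKNOWN
    else:
        distance = int(distance_field)
    return Node(int(key), edges, distance, color)

def format_node(node: Node) -> str:
    """Serialize a node back into the value layout shown above."""
    edges = ",".join(str(e) for e in node.edges) if node.edges else "NULL"
    distance = "Integer.MAX_VALUE" if node.distance == UNKNOWN else str(node.distance)
    return f"{edges}|{distance}|{node.color}|"

source = parse_node("1", "2,5|0|GRAY|")
print(source)
print(format_node(source))
```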
BFS Continued…(2)
• Map Function:
  – For each gray node, the mappers emit a new gray node for every edge, with distance = distance + 1. They then also emit the input gray node, but colored black (once a node has been exploded, we are done with it).
  – Mappers also emit all non-gray nodes, with no change.
  – So, the output of the first map iteration would be:
    1    2,5|0|BLACK|
    2    NULL|1|GRAY|
    5    NULL|1|GRAY|
    2    1,3,4,5|Integer.MAX_VALUE|WHITE|
    3    2,4|Integer.MAX_VALUE|WHITE|
    4    2,3,5|Integer.MAX_VALUE|WHITE|
    5    1,2,4|Integer.MAX_VALUE|WHITE|
  – Note: When the mappers "explode" the gray nodes and create a new node for each edge, they do not know what to write for the edges of this new node, so they leave it blank (shown as NULL above).
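The map rule above can be sketched as a plain generator. This is an illustration, not the Hadoop mapper itself: records are passed as already-parsed tuples rather than text, `bfs_map` is a hypothetical name, and `None` stands in for the NULL edge list.

```python
UNKNOWN = 2**31 - 1  # mirrors Java's Integer.MAX_VALUE

def bfs_map(node_id, edges, distance, color):
    """Yield (key, (edges, distance, color)) pairs for one input node."""
    if color == "GRAY":
        # The exploded node is finished, so it is re-emitted as BLACK.
        yield node_id, (edges, distance, "BLACK")
        # One new GRAY record per neighbour; its own edge list is unknown here.
        for neighbour in edges:
            yield neighbour, (None, distance + 1, "GRAY")
    else:
        # WHITE and BLACK nodes pass through unchanged.
        yield node_id, (edges, distance, color)

for key, value in bfs_map(1, [2, 5], 0, "GRAY"):
    print(key, value)
```

Applied to node 1 of the example, this yields the first three records of the listing above; the WHITE nodes are passed through as they are.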
BFS Continued…(3)
• Reduce Function:
  – Reducers receive all the data for a given key; in this case, that means they receive the data for all "copies" of each node.
  – For example, the reducer that receives the data for key = 2 gets the following list of values:
    2    NULL|1|GRAY|
    2    1,3,4,5|Integer.MAX_VALUE|WHITE|
  – The reducer's job is to take all this data and construct a new node using:
    • the non-null list of edges
    • the minimum distance
    • the darkest color
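The three merge rules can be sketched as follows. As before, this is an illustrative stand-in for the Hadoop reducer: `bfs_reduce` and `DARKNESS` are hypothetical names, values are parsed tuples, and `None` represents a NULL edge list.

```python
UNKNOWN = 2**31 - 1  # mirrors Java's Integer.MAX_VALUE
DARKNESS = {"WHITE": 0, "GRAY": 1, "BLACK": 2}

def bfs_reduce(node_id, values):
    """Merge all copies of one node into a single (edges, distance, color) record."""
    edges, distance, color = None, UNKNOWN, "WHITE"
    for v_edges, v_distance, v_color in values:
        if v_edges is not None:            # keep the non-null list of edges
            edges = v_edges
        distance = min(distance, v_distance)   # keep the minimum distance
        if DARKNESS[v_color] > DARKNESS[color]:  # keep the darkest color
            color = v_color
    if edges is None:
        edges = []  # assumption: no copy carried an edge list
    return node_id, (edges, distance, color)

copies_of_2 = [(None, 1, "GRAY"), ([1, 3, 4, 5], UNKNOWN, "WHITE")]
print(bfs_reduce(2, copies_of_2))
```

For key = 2 this combines the edge list from the WHITE copy with the distance and color from the GRAY copy.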
BFS Continued…(4)
• Iterations
  – Using this logic, the output from our first iteration will be:
    1    2,5,|0|BLACK
    2    1,3,4,5,|1|GRAY
    3    2,4,|Integer.MAX_VALUE|WHITE
    4    2,3,5,|Integer.MAX_VALUE|WHITE
    5    1,2,4,|1|GRAY
  – The second iteration uses this as its input and outputs:
    1    2,5,|0|BLACK
    2    1,3,4,5,|1|BLACK
    3    2,4,|2|GRAY
    4    2,3,5,|2|GRAY
    5    1,2,4,|1|BLACK
  – And the third iteration outputs:
    1    2,5,|0|BLACK
    2    1,3,4,5,|1|BLACK
    3    2,4,|2|BLACK
    4    2,3,5,|2|BLACK
    5    1,2,4,|1|BLACK
  – Subsequent iterations will continue to print out the same output.
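The iteration scheme can be sketched as a small in-memory driver that chains map, shuffle and reduce and feeds each round's output into the next. This is a single-process approximation, not a Hadoop job: `run_round` and `run_bfs` are hypothetical names, the map and reduce functions are passed in as parameters, and stopping once no GRAY node remains (with `max_rounds` as a safety cap) is an assumption about when to stop.

```python
from collections import defaultdict

def run_round(nodes, map_fn, reduce_fn):
    """One map -> shuffle -> reduce pass over {node_id: (edges, distance, color)}."""
    grouped = defaultdict(list)  # shuffle: collect mapper output by key
    for node_id, (edges, distance, color) in nodes.items():
        for key, value in map_fn(node_id, edges, distance, color):
            grouped[key].append(value)
    # Each reducer call sees every copy of one node and returns (key, record).
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))

def run_bfs(nodes, map_fn, reduce_fn, max_rounds=100):
    """Repeat rounds while any node is still GRAY; return (final nodes, rounds run)."""
    rounds = 0
    while rounds < max_rounds and any(c == "GRAY" for _, _, c in nodes.values()):
        nodes = run_round(nodes, map_fn, reduce_fn)
        rounds += 1
    return nodes, rounds
```

`map_fn(node_id, edges, distance, color)` is expected to yield `(key, value)` pairs and `reduce_fn(key, values)` to return one `(key, value)` pair, following the map and reduce rules described above. With such functions, the five-node example settles after three rounds, in line with the listings.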
Issues Addressed..
• When should we terminate the search?
  – Case 1: all the vertices have been visited, i.e. colored black (here the output is “No”).
  – Case 2: a mapper finds the destination node colored gray, i.e. the mapper visits the destination node for the first time (here the output is “Yes”).
• In both cases, the crux is sharing variables between the mappers/reducers and the main program.
• Shared variables are not supported in Hadoop. We addressed this issue by serializing the mapper's objects to HDFS and deserializing those objects in the main program.
Further Enhancement:
– Implementing BFS for disconnected graphs.
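The serialization workaround for the two termination cases can be sketched as below. This only illustrates the idea and is not the team's code: a local temporary directory stands in for HDFS, JSON stands in for whatever serialization was actually used, and `report_found`, `read_reports` and `should_stop` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path

def report_found(shared_dir, task_id, node_id, distance):
    """Mapper side: persist a small record when the target is met colored GRAY."""
    flag = Path(shared_dir) / f"found-{task_id}.json"
    flag.write_text(json.dumps({"node": node_id, "distance": distance}))

def read_reports(shared_dir):
    """Main-program side: load whatever the mapper tasks wrote."""
    files = sorted(Path(shared_dir).glob("found-*.json"))
    return [json.loads(f.read_text()) for f in files]

def should_stop(shared_dir, colors):
    """Return "Yes" (case 2), "No" (case 1), or None to keep iterating."""
    if read_reports(shared_dir):
        return "Yes"
    if all(color == "BLACK" for color in colors):
        return "No"
    return None

with tempfile.TemporaryDirectory() as shared_dir:  # stand-in for an HDFS directory
    print(should_stop(shared_dir, ["BLACK", "GRAY", "WHITE"]))
    report_found(shared_dir, task_id=0, node_id=4, distance=2)
    print(should_stop(shared_dir, ["BLACK", "GRAY", "WHITE"]))
```

The main program would call `should_stop` after every iteration and launch another round only when it returns `None`.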
Depth First Search
• The DFS algorithm traverses the graph by starting at a root (some node selected as the root) and exploring as far as possible along each branch before backtracking.
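A minimal single-machine sketch of this traversal, using an explicit stack instead of recursion. The function name `dfs_order` is an assumption made for the example, and the graph is the five-node example from the BFS slides.

```python
def dfs_order(graph, root):
    """Return the nodes in the order an iterative depth-first search visits them."""
    visited, order = set(), []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbours in reverse so the first-listed neighbour is explored first.
        for neighbour in reversed(graph.get(node, [])):
            if neighbour not in visited:
                stack.append(neighbour)
    return order

graph = {1: [2, 5], 2: [1, 3, 4, 5], 3: [2, 4], 4: [2, 3, 5], 5: [1, 2, 4]}
print(dfs_order(graph, 1))
```

Popping from the end of the stack is what makes the search follow one branch as far as it goes; backtracking happens when the popped node has no unvisited neighbours left.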
Shortest Path
• BFS is guaranteed to find the shortest path (fewest edges) to the destination node if it exists in the graph.
• Idea:
  – Check whether the destination node exists in the graph by doing a search (either BFS or DFS).
  – If it exists, the path is printed. (The path is saved while doing the search; it is printed if the target node is found, otherwise the saved path is discarded.)
Any Queries??
Thank You