DAMMP: A Distributed Actor Model for Mobile Platforms

Arghya Chatterjee

Ph.D. Student, Georgia Tech., USA, Research Collaborator, Oak Ridge National Lab, USA

ManLang’17

September 27th, 2017

ACKNOWLEDGEMENTS

Srdjan Milaković

Bing Xue

Zoran Budimlić

Vivek Sarkar

INTRODUCTION

✤ Distributed applications:
 → difficult to achieve scalability and programmability
 → require complex coordination and synchronization patterns

✤ Need for exploiting multi-core and multi-node parallelism
 → conceptual gap between programming models for both
 → multi-node → multiple phones

✤ Use of Actor model → less overhead & deadlock freedom

CLUSTER

FACIAL RECOGNITION




Hurricane Harvey (August 27th)
 → As many as 80 cell towers down
 → ~16 emergency services (911) call centers affected

Hurricane Irma (September 12th)
 → As many as 90 cell towers down

HJDS: Habanero Java Distributed Selector (Cluster Model)

DAMMP: Selector System Design (Android Platform)

CONNECTIVITY: Communication Pattern

DYNAMIC: Dynamic Joining and Leaving of Devices

EVALUATION: Evaluation on Nexus 5’s and Nexus 4’s (Performance & Energy)

CONTRIBUTIONS

✤ Actor / Selector model
 → cross-platform runtime system (clusters / mobile devices)

✤ Changes in system topology in the network
 → dynamic joining and dynamic leaving of devices

✤ Standalone and seamless offload model
 → computation offloading to other, more powerful handheld devices

ACTOR / SELECTOR MODEL

ACTOR MODEL

Pros:
✤ Asynchronous message passing
✤ Data isolation
✤ Inherently concurrent

Cons:
✤ Ordering of messages is not guaranteed
✤ Message filtering: inefficient to implement

Source: S. Imam and V. Sarkar. Integrating Task Parallelism with Actors, OOPSLA ’12

SELECTOR MODEL

Cons (Solved):
✤ Ordering of messages is not guaranteed
✤ Message filtering: inefficient to implement

Source: S. Imam and V. Sarkar. Selectors: Actors with Multiple Guarded Mailboxes, AGERE ’14
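The selector idea (an actor with several mailboxes, each of which can be enabled or disabled) can be sketched as follows. This is a minimal, single-threaded illustration and not the Habanero Java API: the class `SelectorSketch`, its method names, and the rule of serving the lowest-numbered enabled mailbox first are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of a selector: an actor with multiple guarded mailboxes.
// A real runtime would deliver and process messages asynchronously.
class SelectorSketch {
    private final List<Queue<String>> mailboxes = new ArrayList<>();
    private final boolean[] enabled;
    final List<String> processed = new ArrayList<>();

    SelectorSketch(int numMailboxes) {
        enabled = new boolean[numMailboxes];
        for (int i = 0; i < numMailboxes; i++) {
            mailboxes.add(new ArrayDeque<>());
            enabled[i] = true; // every mailbox starts enabled
        }
    }

    // Enqueue a message into the chosen mailbox.
    void send(int mailbox, String message) {
        mailboxes.get(mailbox).add(message);
    }

    void enable(int mailbox)  { enabled[mailbox] = true; }
    void disable(int mailbox) { enabled[mailbox] = false; }

    // Process one message from the lowest-numbered enabled, non-empty mailbox
    // (assumed priority rule). Returns false if nothing can be processed.
    boolean processNext() {
        for (int i = 0; i < mailboxes.size(); i++) {
            if (enabled[i] && !mailboxes.get(i).isEmpty()) {
                processed.add(mailboxes.get(i).remove());
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SelectorSketch s = new SelectorSketch(2);
        s.send(1, "reply");
        s.send(0, "request");
        s.disable(1);                 // filter out mailbox 1 for now
        while (s.processNext()) { }   // only "request" is processed
        s.enable(1);
        while (s.processNext()) { }   // now "reply" is processed
        System.out.println(s.processed); // prints [request, reply]
    }
}
```

Disabling a mailbox gives cheap message filtering, and the fixed scan order gives a simple form of ordering between message classes, which are the two actor-model drawbacks listed above.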

BACKGROUND: CLUSTER MODEL

✤ High-level programming model
 ๏ Habanero Java Distributed Selectors (HJDS)
 ๏ Location transparency of the programming model
 ๏ Used Selectors: unified programming model for both shared-memory and distributed multi-node execution

✤ Runtime provides
 ๏ Automated system bootstrap (user agnostic)
 ๏ Distributed global termination (detects when the system is quiescent)

A. Chatterjee, B. Gvoka, B. Xue, S. Imam, Z. Budimlić, V. Sarkar, Distributed Selectors Runtime System for Java Based Applications, PPPJ’16

SYSTEM DESIGN: ANDROID PLATFORM

[Figure: system design on the Android platform, with places labeled Place 2, Place 3, and Place n]

COMMUNICATION LAYER: WI-FI DIRECT COMMUNICATION

✤ Devices with Wi-Fi capabilities can communicate by forming P2P groups

✤ Connect devices, even from different manufacturers

✤ Peer-to-peer group establishment (discovery phase):
 ๏ Group Owner (GO): acts as the soft AP
 ๏ Group Member (GM)


ADDRESSING: RECONFIGURATION CHALLENGES

✤ Extended cluster-based implementation
 → user-level control of network changes

✤ Allows: dynamic joining and leaving of devices

✤ Uses: publish-subscribe model to extract network-level changes
 → node (phone) joins or drops the ad-hoc network
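A publish-subscribe channel for join and drop events could look roughly like the sketch below. It only illustrates the pattern: `MembershipBus`, `Kind`, and `Event` are hypothetical names, not part of the DAMMP runtime, and delivery here is a plain synchronous callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: a tiny publish-subscribe bus for membership changes.
class MembershipBus {
    enum Kind { JOINED, LEFT }

    static final class Event {
        final Kind kind;
        final String deviceId;

        Event(Kind kind, String deviceId) {
            this.kind = kind;
            this.deviceId = deviceId;
        }
    }

    private final List<Consumer<Event>> subscribers = new ArrayList<>();

    // Register a callback that is invoked for every published event.
    void subscribe(Consumer<Event> subscriber) {
        subscribers.add(subscriber);
    }

    // Deliver the event to all current subscribers, in subscription order.
    void publish(Kind kind, String deviceId) {
        Event event = new Event(kind, deviceId);
        for (Consumer<Event> subscriber : subscribers) {
            subscriber.accept(event);
        }
    }

    public static void main(String[] args) {
        MembershipBus bus = new MembershipBus();
        bus.subscribe(ev -> System.out.println(ev.kind + " " + ev.deviceId));
        bus.publish(Kind.JOINED, "phone-2"); // a phone joins the ad-hoc network
        bus.publish(Kind.LEFT, "phone-2");   // the same phone drops out
    }
}
```

User code subscribes once and then reacts to topology changes, which is one way to give the application user-level control without polling the network layer.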

RECONFIGURATION: DYNAMIC JOINING OF DEVICES

[Figure: Group Owner, Group Members, and New Device]

✤ New device tries to join the network

✤ New device connects to the group owner (GO)

✤ Information about the network: all connected devices

✤ New device connects to all devices in the network
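The join steps above can be sketched as an in-memory model. This is an illustration only: `JoinSketch`, `Device`, and `Group` are hypothetical names, real connections would go over Wi-Fi Direct sockets, and the assumption that the group owner is the one handing over the list of connected devices is ours.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of a device joining a P2P group.
class JoinSketch {
    static final class Device {
        final String id;
        final Set<String> connections = new LinkedHashSet<>();

        Device(String id) { this.id = id; }

        // Connections are modeled as symmetric links between two devices.
        void connect(Device other) {
            connections.add(other.id);
            other.connections.add(id);
        }
    }

    static final class Group {
        final Device owner;
        final List<Device> members = new ArrayList<>(); // excludes the owner

        Group(Device owner) { this.owner = owner; }

        void join(Device newcomer) {
            newcomer.connect(owner);                        // connect to the GO
            List<Device> known = new ArrayList<>(members);  // GO shares the member list (assumed)
            for (Device member : known) {
                newcomer.connect(member);                   // connect to every other device
            }
            members.add(newcomer);
        }
    }

    public static void main(String[] args) {
        Group group = new Group(new Device("GO"));
        group.join(new Device("A"));
        Device b = new Device("B");
        group.join(b);
        System.out.println(b.connections); // prints [GO, A]
    }
}
```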

RECONFIGURATION: DYNAMIC LEAVING OF DEVICES


✤ Application level: might have to perform redundant computation

✤ Runtime level: device voluntarily leaves the network or drops out (battery / out of range)


✤ Worker device:
 ๏ Master resends work

✤ Master device:
 ๏ Backup periodically
 ๏ New master selected
 ๏ Computation resumes
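For the worker case, "master resends work" can be sketched as simple bookkeeping of outstanding tasks. This is a minimal sketch under our own assumptions: `ResendSketch` and its methods are hypothetical, tasks are plain integer ids, and the replacement worker is chosen by the caller. Master failure (backup and re-selection) is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the master tracks unfinished tasks per worker and
// reassigns them when a worker leaves or drops out.
class ResendSketch {
    private final Map<String, List<Integer>> outstanding = new HashMap<>();

    // Record that a task was sent to a worker.
    void assign(String worker, int taskId) {
        outstanding.computeIfAbsent(worker, k -> new ArrayList<>()).add(taskId);
    }

    // Record that a worker finished a task.
    void complete(String worker, int taskId) {
        List<Integer> tasks = outstanding.get(worker);
        if (tasks != null) {
            tasks.remove(Integer.valueOf(taskId)); // remove by value, not by index
        }
    }

    // Snapshot of the tasks a worker still owes.
    List<Integer> pending(String worker) {
        return new ArrayList<>(outstanding.getOrDefault(worker, new ArrayList<>()));
    }

    // Called when a worker leaves: its unfinished tasks go to the replacement.
    // Returns the list of task ids that were resent.
    List<Integer> workerLeft(String departed, String replacement) {
        List<Integer> lost = outstanding.remove(departed);
        if (lost == null) {
            return new ArrayList<>();
        }
        for (int taskId : lost) {
            assign(replacement, taskId);
        }
        return lost;
    }

    public static void main(String[] args) {
        ResendSketch master = new ResendSketch();
        master.assign("w1", 1);
        master.assign("w1", 2);
        master.assign("w2", 3);
        master.complete("w1", 1);
        System.out.println(master.workerLeft("w1", "w2")); // prints [2]
        System.out.println(master.pending("w2"));          // prints [3, 2]
    }
}
```

Resending may repeat work the departed device had partly done, which matches the note above that the application might have to perform redundant computation.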

SETUP: MOBILE PLATFORM

✤ Extension to distributed selectors on heterogeneous handheld devices
 ๏ Nexus 5’s: quad-core 2260 MHz Krait 400 processor and Qualcomm Snapdragon 800 MSM8974 system chip
 ๏ Nexus 4’s: quad-core 1500 MHz Krait processor and Qualcomm Snapdragon S4 Pro APQ8064 system chip
 ๏ We used up to 8 phones (5 Nexus 5’s and 3 Nexus 4’s) in our experiments

✤ SAVINA Actor Benchmark Suite
 ๏ 2 benchmarks from SAVINA:
   Trapezoidal (message throughput)
   PiPrecision (scaling)

EVALUATION: TRAPEZOIDAL APPROXIMATION


Approximates the integral of a function over an interval [a, b].




✤ Using 4 Nexus 5’s to compute the approximation with 100 million intervals

✤ Communication overhead does not affect the 4x speedup until 10^4 to 10^5 work messages
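How the interval can be split into work pieces is sketched below. This is not the SAVINA benchmark code: `TrapezoidSketch` is a hypothetical name, the integrand used in `main` (x squared on [0, 3]) is an arbitrary choice for illustration, and the "workers" are a sequential loop standing in for selectors on separate phones.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: trapezoidal rule with the interval split among workers.
class TrapezoidSketch {
    // Trapezoidal rule on [lo, hi] with n equal sub-intervals.
    static double integrate(DoubleUnaryOperator f, double lo, double hi, int n) {
        double h = (hi - lo) / n;
        double sum = 0.5 * (f.applyAsDouble(lo) + f.applyAsDouble(hi));
        for (int i = 1; i < n; i++) {
            sum += f.applyAsDouble(lo + i * h);
        }
        return sum * h;
    }

    // Split [a, b] into equal pieces, one per worker, and add up the partial
    // areas. Assumes n is divisible by the number of workers.
    static double integrateWithWorkers(DoubleUnaryOperator f, double a, double b,
                                       int n, int workers) {
        double width = (b - a) / workers;
        int perWorker = n / workers;
        double total = 0.0;
        for (int w = 0; w < workers; w++) {
            double lo = a + w * width;
            total += integrate(f, lo, lo + width, perWorker); // one work message
        }
        return total;
    }

    public static void main(String[] args) {
        // Exact value of the integral of x^2 over [0, 3] is 9.
        double area = integrateWithWorkers(x -> x * x, 0.0, 3.0, 1_000_000, 4);
        System.out.println(area);
    }
}
```

Each call to `integrate` corresponds to one work message; using more, smaller pieces increases the message count for the same total number of intervals, which is the trade-off the slide's speedup remark refers to.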


EVALUATION: PI PRECISION




✤ Strong scaling: computing Pi to 15,000 decimal places

✤ Number of messages increases as new devices are added

✤ Nexus 4 devices are half as powerful as the Nexus 5’s

ONE MORE THING …

ADAPTIVE OFFLOADING

✤ Offload computation to more powerful devices
 → tablets / laptops / desktops

✤ Primarily using message passing and publish-subscribe:
 → Battery Status Message: battery low / battery okay
 → Charging Status Message: device connected / disconnected
 → Battery Level Message: percentage of battery when it changes
 → Temperature Message: average temperature in °C
 → Wi-Fi Signal Strength: average Wi-Fi signal strength

MITIGATING OVERHEAD

✤ Offload partial computation (offloading is not always better) based on:
 ๏ Network bandwidth: user or runtime information; offload right away or wait for better bandwidth
 ๏ Application: depends on how much communication is needed (phone to offloading device)
 ๏ Offloading device:
   type of device (phone / tablet / laptop / desktop)
   battery level / temperature
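One way to combine these factors into a decision is sketched below. It is a simplified cost comparison, not the policy used by DAMMP: `OffloadPolicySketch`, its parameters, and the 20% battery cutoff are assumptions chosen for illustration, and the time estimates would have to come from the user or from runtime measurements.

```java
// Hypothetical sketch: offload only when remote compute time plus transfer
// time beats local compute time and the target device is in good shape.
class OffloadPolicySketch {
    static final int MIN_TARGET_BATTERY_PERCENT = 20; // assumed cutoff

    static boolean shouldOffload(double localSeconds, double remoteSeconds,
                                 double payloadMegabits, double bandwidthMbps,
                                 int targetBatteryPercent, boolean targetPluggedIn) {
        if (bandwidthMbps <= 0.0) {
            return false; // no usable link: wait for better bandwidth
        }
        if (!targetPluggedIn && targetBatteryPercent < MIN_TARGET_BATTERY_PERCENT) {
            return false; // do not drain a weak offloading device
        }
        double transferSeconds = payloadMegabits / bandwidthMbps;
        return remoteSeconds + transferSeconds < localSeconds;
    }

    public static void main(String[] args) {
        // Heavy computation, small payload: offloading pays off.
        System.out.println(shouldOffload(10.0, 1.0, 8.0, 8.0, 100, true));  // prints true
        // Light computation, large payload: communication dominates.
        System.out.println(shouldOffload(1.0, 0.5, 80.0, 8.0, 100, true));  // prints false
    }
}
```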

EVALUATION: OTHELLO / REVERSI



✤ Used a Nexus 5 and a MacBook (offload device) to play the game

✤ Mimic one player as human and the other as an AI

✤ AI can look ahead up to six steps to decide its next move


Execution time of the AI to find the best move with a lookahead depth of six.

✤ Blue bar: without offloading, on the Nexus 5

✤ Green bar: with offloading to a MacBook Pro (2010, Core i7, 8 GB DDR3 memory)

✤ Threshold of 3 & 2: offloading and communication overhead dominates

EVALUATION: TRAPEZOIDAL APPROXIMATION




[Figure: throughput and temperature]

✤ Execution on a single Nexus 5 device
 ๏ Throughput: 40,000 points per sec.
 ๏ Temperature: 11 on-device sensors

✤ Offloaded computation to a MacBook Pro
 ๏ Automatic offload: temperature reaches 55 °C
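The temperature-triggered offload can be sketched as a threshold check over the sensor readings. The 55 °C trigger and the 11 sensors come from the slide; averaging the readings (rather than, say, taking the maximum) and the names `ThermalTriggerSketch` and `shouldOffload` are assumptions made for this example.

```java
// Hypothetical sketch: trigger offloading once the average of the on-device
// temperature sensors reaches the threshold.
class ThermalTriggerSketch {
    static final double OFFLOAD_THRESHOLD_C = 55.0; // trigger value from the slide

    // Arithmetic mean of the sensor readings, in degrees Celsius.
    static double average(double[] readingsC) {
        if (readingsC.length == 0) {
            throw new IllegalArgumentException("no sensor readings");
        }
        double sum = 0.0;
        for (double reading : readingsC) {
            sum += reading;
        }
        return sum / readingsC.length;
    }

    // True when the averaged temperature has reached the offload threshold.
    static boolean shouldOffload(double[] readingsC) {
        return average(readingsC) >= OFFLOAD_THRESHOLD_C;
    }

    public static void main(String[] args) {
        double[] sensors = new double[11];          // 11 on-device sensors
        java.util.Arrays.fill(sensors, 56.0);
        System.out.println(shouldOffload(sensors)); // prints true
    }
}
```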

CONCLUSION & FUTURE WORK


✤ Novel and high-level programming model: addresses distribution and reconfiguration

✤ Android platform:
 ๏ Dynamic joining and leaving of devices
 ๏ Seamless offloading model (tablets, servers)

✤ Future / ongoing work:
 ๏ Real-world applications (partial computation offloading)
 ๏ Runtime analysis: when to offload an application

DAMMP: Distributed Actor Model for Mobile Platforms
Arghya “Ronnie” Chatterjee
Research Collaborator, CSMD, ORNL
Ph.D. Student, Georgia Tech
September 27th, 2017
