DAMMP: A Distributed Actor Model for Mobile Platforms
Arghya Chatterjee
Ph.D. Student, Georgia Tech, USA; Research Collaborator, Oak Ridge National Lab, USA
ManLang’17
September 27th, 2017
ACKNOWLEDGEMENTS
Srdjan Milaković
Bing Xue
Zoran Budimlić
Vivek Sarkar
INTRODUCTION
✤ Distributed applications:
→ difficult to achieve scalability and programmability
→ require complex coordination and synchronization patterns
✤ Need for exploiting multi-core and multi-node parallelism
→ conceptual gap between programming models for both
→ multi-node → multiple phones
✤ Use of Actor model → less overhead & deadlock freedom
CLUSTER
FACIAL RECOGNITION
✤ Hurricane Harvey (August 27th)
→ As many as 80 cell towers down
→ ~16 emergency services (911) call centers affected
✤ Hurricane Irma (September 12th)
→ As many as 90 cell towers down
HJDS: Habanero Java Distributed Selector (Cluster Model)
DAMMP: Selector System Design (Android Platform)
CONNECTIVITY: Communication Pattern
DYNAMIC: Dynamic Joining and Leaving of Devices
EVALUATION: Evaluation on Nexus 5’s and Nexus 4’s (Performance & Energy)
CONTRIBUTIONS
✤ Actor / Selector model
→ cross-platform runtime system (clusters / mobile devices)
✤ Changes in system topology in the network
→ dynamic joining and dynamic leaving of devices
✤ Standalone and seamless offload model
→ computation offloading to other, more powerful handheld devices
ACTOR / SELECTOR MODEL
ACTOR MODEL
Pros:
✤ Asynchronous message passing
✤ Data isolation
✤ Inherently concurrent
Cons:
✤ Ordering of messages is not guaranteed
✤ Message filtering is inefficient to implement
Source: S. Imam and V. Sarkar. Integrating Task Parallelism with Actors, OOPSLA ’12
ACTOR / SELECTOR MODEL
SELECTOR MODEL
Cons (Solved):
✤ Ordering of messages is not guaranteed
✤ Message filtering is inefficient to implement
Source: S. Imam and V. Sarkar. Selectors: Actors with Multiple Guarded Mailboxes, AGERE ’14
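A minimal sketch of the guarded-mailbox idea may help here. It assumes a hypothetical, single-threaded `MiniSelector` class written only for illustration; it is not the Habanero Java API. A disabled mailbox keeps its messages queued until it is enabled again, which is one way to impose an order on messages or to filter them without scanning a single queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical single-threaded selector: an actor with several guarded mailboxes. */
final class MiniSelector<T> {
    private final List<Queue<T>> mailboxes = new ArrayList<>();
    private final boolean[] enabled;
    private final Consumer<T> handler;

    MiniSelector(int mailboxCount, Consumer<T> handler) {
        this.enabled = new boolean[mailboxCount];
        this.handler = handler;
        for (int i = 0; i < mailboxCount; i++) {
            mailboxes.add(new ArrayDeque<>());
            enabled[i] = true; // every mailbox starts enabled
        }
    }

    /** Queues a message; it is only processed while its mailbox is enabled. */
    void send(int mailbox, T message) { mailboxes.get(mailbox).add(message); }

    void enable(int mailbox) { enabled[mailbox] = true; }

    void disable(int mailbox) { enabled[mailbox] = false; }

    /**
     * Processes messages from enabled mailboxes, lowest index first,
     * until no enabled mailbox has a message left. Returns the count.
     */
    int drain() {
        int processed = 0;
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < mailboxes.size(); i++) {
                if (enabled[i] && !mailboxes.get(i).isEmpty()) {
                    handler.accept(mailboxes.get(i).remove());
                    processed++;
                    progress = true;
                    break; // rescan, since the handler may have changed the guards
                }
            }
        }
        return processed;
    }
}
```

For example, disabling mailbox 1 before sending a reply to it keeps that reply queued until a request on mailbox 0 has been handled and mailbox 1 is enabled again.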
BACKGROUND: CLUSTER MODEL
✤ High-level programming model
๏ Habanero Java Distributed Selectors (HJDS)
๏ Location transparency of the programming model
✤ Used Selectors: unified programming model for both shared-memory and distributed (multi-node) execution
✤ Runtime provides
๏ Automated system bootstrap (user agnostic)
๏ Distributed global termination (detects when the system is quiescent)
Source: A. Chatterjee, B. Gvoka, B. Xue, S. Imam, Z. Budimlić, V. Sarkar. Distributed Selectors Runtime System for Java Based Applications, PPPJ ’16
SYSTEM DESIGN: ANDROID PLATFORM
[Figure: system design diagram; labels include Place 2, Place 3, ..., Place n]
COMMUNICATION LAYER: WI-FI DIRECT COMMUNICATION
✤ Devices with Wi-Fi capabilities can communicate by forming P2P groups
✤ Connects devices even from different manufacturers
✤ Peer-to-peer group establishment (discovery phase):
๏ Group Owner (GO): acts as the soft AP
๏ Group Member (GM)
ADDRESSING: RECONFIGURATION CHALLENGES
✤ Extended cluster-based implementation
→ user-level control of network changes
✤ Allows: dynamic joining and leaving of devices
✤ Uses: publish-subscribe model to extract network-level changes
→ node (phone) joins or drops out of the ad-hoc network
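The publish-subscribe idea can be sketched in plain Java as follows. The `TopologyBus` class and its method names are hypothetical and exist only for this illustration. On Android, Wi-Fi Direct changes typically arrive as broadcast intents from `WifiP2pManager`; that wiring is omitted here, and the sketch only shows how runtime components could subscribe to join and leave events.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

/** Hypothetical event bus that publishes network-level membership changes. */
final class TopologyBus {
    enum Change { JOINED, LEFT }

    private final Set<String> nodes = new LinkedHashSet<>();
    private final List<BiConsumer<Change, String>> subscribers = new ArrayList<>();

    /** Registers a callback that receives every later membership change. */
    void subscribe(BiConsumer<Change, String> subscriber) {
        subscribers.add(subscriber);
    }

    /** Publishes JOINED only if the node was not already a member. */
    void nodeJoined(String nodeId) {
        if (nodes.add(nodeId)) {
            publish(Change.JOINED, nodeId);
        }
    }

    /** Publishes LEFT only if the node was actually a member. */
    void nodeLeft(String nodeId) {
        if (nodes.remove(nodeId)) {
            publish(Change.LEFT, nodeId);
        }
    }

    /** Returns a snapshot of the current members. */
    Set<String> members() {
        return new LinkedHashSet<>(nodes);
    }

    private void publish(Change change, String nodeId) {
        for (BiConsumer<Change, String> subscriber : subscribers) {
            subscriber.accept(change, nodeId);
        }
    }
}
```

Duplicate notifications are suppressed, so a subscriber sees each join or leave once even if the lower layer reports it several times.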
RECONFIGURATION: DYNAMIC JOINING OF DEVICES
[Figure: Group Owner, Group Members, and a New Device]
✤ New device tries to join the network
✤ New device connects to the group owner (GO)
✤ New device receives information about the network: all connected devices
✤ New device connects to all devices in the network
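The join steps above can be sketched as an in-memory model. The `P2pGroup` and `Device` classes are hypothetical and perform no real networking; they only record which devices end up linked after a newcomer contacts the group owner, learns the member list, and connects to everyone.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory model of the join steps; no real networking is performed. */
final class P2pGroup {
    static final class Device {
        final String id;
        final Set<String> peers = new LinkedHashSet<>();

        Device(String id) { this.id = id; }

        /** Records a bidirectional link; repeated calls are harmless. */
        void connect(Device other) {
            peers.add(other.id);
            other.peers.add(id);
        }
    }

    private final Device owner;
    private final List<Device> members = new ArrayList<>(); // includes the owner

    P2pGroup(Device owner) {
        this.owner = owner;
        members.add(owner);
    }

    /** Runs the join sequence for a new device. */
    void join(Device newcomer) {
        newcomer.connect(owner);                        // connect to the group owner
        List<Device> known = new ArrayList<>(members);  // owner shares the member list
        for (Device device : known) {
            newcomer.connect(device);                   // connect to every known device
        }
        members.add(newcomer);
    }

    int size() { return members.size(); }
}
```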
RECONFIGURATION: DYNAMIC LEAVING OF DEVICES
✤ Application level: might have to perform redundant computation
✤ Runtime level: device voluntarily leaves the network or drops out (battery / out of range)
✤ Worker device leaves:
๏ Master resends work
✤ Master device leaves:
๏ Back up periodically
๏ New master selected
๏ Computation resumes
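One way a master could resend work after a worker drops out is sketched below. The `WorkTracker` class is hypothetical, and the round-robin reassignment is an assumption made for this example rather than the policy DAMMP uses. Master failure and backup are not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical master-side bookkeeping of work chunks that are still outstanding. */
final class WorkTracker {
    private final Map<String, Deque<Integer>> outstanding = new LinkedHashMap<>();

    void addWorker(String worker) {
        outstanding.putIfAbsent(worker, new ArrayDeque<>());
    }

    void assign(String worker, int chunk) {
        outstanding.get(worker).add(chunk);
    }

    void completed(String worker, int chunk) {
        outstanding.get(worker).remove(Integer.valueOf(chunk));
    }

    /**
     * Called when a worker leaves: its unfinished chunks are handed to the
     * remaining workers round-robin. Returns the chunks that were resent.
     */
    List<Integer> workerLeft(String worker) {
        Deque<Integer> lost = outstanding.remove(worker);
        List<Integer> resent = new ArrayList<>();
        if (lost == null || lost.isEmpty()) {
            return resent; // unknown worker, or nothing was pending
        }
        if (outstanding.isEmpty()) {
            throw new IllegalStateException("no remaining worker to take over");
        }
        List<String> survivors = new ArrayList<>(outstanding.keySet());
        int next = 0;
        for (int chunk : lost) {
            String target = survivors.get(next++ % survivors.size());
            outstanding.get(target).add(chunk);
            resent.add(chunk);
        }
        return resent;
    }

    /** Returns a copy of the chunks a worker still has to finish. */
    List<Integer> pending(String worker) {
        return new ArrayList<>(outstanding.get(worker));
    }
}
```

Because a chunk is only dropped from the table once it is reported complete, a chunk that was nearly finished on the lost worker is recomputed elsewhere, which matches the redundant computation mentioned at the application level.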
SETUP: MOBILE PLATFORM
✤ Extension to distributed selectors on heterogeneous handheld devices
✤ Nexus 5’s: quad-core 2260 MHz Krait 400 processor and Qualcomm Snapdragon 800 MSM8974 system chip
✤ Nexus 4’s: quad-core 1500 MHz Krait processor and Qualcomm Snapdragon S4 Pro APQ8064 system chip
๏ We used up to 8 phones (5 Nexus 5’s and 3 Nexus 4’s) in our experiments
✤ SAVINA Actor Benchmark Suite
✤ 2 benchmarks from SAVINA:
→ Trapezoidal (message throughput)
→ PiPrecision (scaling)
EVALUATION: TRAPEZOIDAL APPROXIMATION
Approximates the integral of a function over an interval [a, b]
➤ Using 4 Nexus 5’s to compute the approximation with 100 million intervals
➤ Communication overhead does not affect the 4x speedup until 10^4 to 10^5 work messages
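As a rough illustration of the benchmark's structure, the sketch below applies the trapezoid rule, h * (f(a)/2 + f(x1) + ... + f(x(n-1)) + f(b)/2), and then splits the interval into pieces the way a master might split it into work messages. The `Trapezoid` class, the sequential loop standing in for workers, and the integrand used in the usage note are assumptions for this example; they are not taken from the SAVINA source.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sequential stand-in for the master/worker trapezoid benchmark. */
final class Trapezoid {
    /** Trapezoid-rule estimate of the integral of f over [left, right] with n equal intervals. */
    static double integrate(DoubleUnaryOperator f, double left, double right, long n) {
        double h = (right - left) / n;
        double sum = 0.5 * (f.applyAsDouble(left) + f.applyAsDouble(right));
        for (long i = 1; i < n; i++) {
            sum += f.applyAsDouble(left + i * h);
        }
        return sum * h;
    }

    /**
     * Splits [a, b] into equal pieces, one per work message, and adds up the
     * partial results. Assumes n is divisible by pieces.
     */
    static double integrateInPieces(DoubleUnaryOperator f, double a, double b,
                                    long n, int pieces) {
        double width = (b - a) / pieces;
        long intervalsPerPiece = n / pieces;
        double total = 0.0;
        for (int p = 0; p < pieces; p++) {
            double left = a + p * width;
            total += integrate(f, left, left + width, intervalsPerPiece); // one worker's share
        }
        return total;
    }
}
```

Calling `Trapezoid.integrateInPieces(x -> x * x, 0.0, 3.0, 1_000_000L, 4)` approximates the exact value 9. Raising `pieces` while keeping `n` fixed shrinks the work per message, which is the regime where communication overhead starts to matter.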
EVALUATION: PI PRECISION
➤ Strong scaling: computing Pi to 15,000 decimal places
➤ Number of messages increases as new devices are added
➤ Nexus 4 devices are half as powerful as the Nexus 5’s
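A small sketch of how such a computation can be divided among workers follows. It assumes the Bailey-Borwein-Plouffe series and a round-robin assignment of terms to workers; both are choices made for this illustration, and the `PiSeries` class is hypothetical rather than the benchmark's code. The workers are simulated by a sequential loop.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

/** Hypothetical sequential stand-in for a distributed pi computation. */
final class PiSeries {
    /** k-th term of the Bailey-Borwein-Plouffe series for pi. */
    static BigDecimal term(int k, MathContext mc) {
        BigDecimal sum = fraction(4, k, 1, mc)
                .subtract(fraction(2, k, 4, mc))
                .subtract(fraction(1, k, 5, mc))
                .subtract(fraction(1, k, 6, mc));
        return sum.divide(BigDecimal.valueOf(16).pow(k), mc);
    }

    /** Computes numerator / (8k + offset) at the given precision. */
    private static BigDecimal fraction(int numerator, int k, int offset, MathContext mc) {
        return BigDecimal.valueOf(numerator)
                .divide(BigDecimal.valueOf(8L * k + offset), mc);
    }

    /** Sums the series with terms dealt round-robin to the workers, truncated to `digits` decimals. */
    static BigDecimal pi(int digits, int workers) {
        MathContext mc = new MathContext(digits + 10, RoundingMode.HALF_EVEN);
        int terms = digits; // each term adds about 1.2 decimal digits, so this is ample
        BigDecimal total = BigDecimal.ZERO;
        for (int w = 0; w < workers; w++) {
            BigDecimal partial = BigDecimal.ZERO; // what worker w would send back
            for (int k = w; k < terms; k += workers) {
                partial = partial.add(term(k, mc));
            }
            total = total.add(partial);
        }
        return total.setScale(digits, RoundingMode.DOWN);
    }
}
```

With a fixed number of terms, adding workers leaves the total work unchanged but increases the number of partial results exchanged, which is consistent with the message growth noted above.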
ONE MORE THING …
ADAPTIVE OFFLOADING
✤ Offload computation to more powerful devices
→ tablets / laptops / desktops
✤ Primarily using message passing and publish-subscribe:
→ Battery Status Message: battery low / battery okay
→ Charging Status Message: device connected / disconnected
→ Battery Level Message: percentage of battery when it changes
→ Temperature Message: average temperature in °C
→ Wi-Fi Signal Strength: average Wi-Fi signal strength
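The status messages listed above could be modeled along the following lines. The `DeviceStatus` class, the `Kind` enum, and the encoding of boolean states as 1.0 and 0.0 are all assumptions made for this sketch; the slides do not specify a message format or the unit of the Wi-Fi reading.

```java
/** Hypothetical view of a device's state, updated from published status messages. */
final class DeviceStatus {
    enum Kind { BATTERY_STATUS, CHARGING_STATUS, BATTERY_LEVEL, TEMPERATURE, WIFI_SIGNAL }

    /** One published status message; boolean kinds use 1.0 for true and 0.0 for false. */
    static final class StatusMessage {
        final Kind kind;
        final double value;

        StatusMessage(Kind kind, double value) {
            this.kind = kind;
            this.value = value;
        }
    }

    boolean batteryLow;
    boolean charging;
    double batteryPercent = 100.0;
    double temperatureCelsius;
    double wifiSignal; // unit left unspecified in this sketch

    /** Subscriber callback: folds one message into the current state. */
    void onMessage(StatusMessage message) {
        switch (message.kind) {
            case BATTERY_STATUS:
                batteryLow = message.value != 0.0;
                break;
            case CHARGING_STATUS:
                charging = message.value != 0.0;
                break;
            case BATTERY_LEVEL:
                batteryPercent = message.value;
                break;
            case TEMPERATURE:
                temperatureCelsius = message.value;
                break;
            case WIFI_SIGNAL:
                wifiSignal = message.value;
                break;
        }
    }
}
```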
MITIGATING OVERHEAD
✤ Offload partial computation (offloading is not always better), based on:
๏ Network bandwidth: user or runtime information; offload right away or wait for better bandwidth
๏ Application: depends on how much communication is needed (phone to offloading device)
๏ Offloading device:
→ type of device (phone / tablet / laptop / desktop)
→ battery level / temperature
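The three factors above can be combined into a simple decision function, sketched below. Everything in it is an assumption for illustration: the `OffloadPolicy` class, the cost model (transfer time plus remote time compared with local time), and the 20 % battery and 45 °C limits for battery-powered targets are not values from the slides.

```java
/** Hypothetical offload heuristic combining bandwidth, application, and target-device factors. */
final class OffloadPolicy {
    enum DeviceType { PHONE, TABLET, LAPTOP, DESKTOP }
    enum Decision { RUN_LOCALLY, WAIT_FOR_BANDWIDTH, OFFLOAD }

    static Decision decide(double bandwidthMbps, double megabytesToTransfer,
                           double localSeconds, double remoteSeconds,
                           DeviceType target, double targetBatteryPercent,
                           double targetTempCelsius) {
        // Offloading device: avoid battery-powered targets that are low or hot.
        boolean batteryPowered = target == DeviceType.PHONE || target == DeviceType.TABLET;
        if (batteryPowered && (targetBatteryPercent < 20.0 || targetTempCelsius >= 45.0)) {
            return Decision.RUN_LOCALLY;
        }
        // The target must be faster than the phone for offloading to ever pay off.
        if (remoteSeconds >= localSeconds) {
            return Decision.RUN_LOCALLY;
        }
        if (bandwidthMbps <= 0.0) {
            return Decision.WAIT_FOR_BANDWIDTH;
        }
        // Application: communication cost, in seconds, at the current bandwidth.
        double transferSeconds = megabytesToTransfer * 8.0 / bandwidthMbps;
        if (transferSeconds + remoteSeconds < localSeconds) {
            return Decision.OFFLOAD;
        }
        // Network bandwidth: offloading would help once the link improves.
        return Decision.WAIT_FOR_BANDWIDTH;
    }
}
```

A real runtime would likely measure these inputs rather than take them as arguments; passing them in keeps the sketch self-contained.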
EVALUATION: OTHELLO / REVERSI
✤ Used a Nexus 5 and a MacBook (offload device) to play the game
✤ One player mimics a human, the other is an AI
✤ AI can look ahead up to six steps to decide its next move
Execution time of the AI to find the best move with a lookahead depth of six.
✤ Blue bar: without offloading, on the Nexus 5
✤ Green bar: with offloading to a MacBook Pro (2010, Core i7, 8 GB DDR3 memory)
✤ Thresholds of 3 and 2: offloading and communication overhead dominate
EVALUATION: TRAPEZOIDAL APPROXIMATION
✤ Execution on a single Nexus 5 device
✤ Throughput: 40000 points per sec.
✤ Temperature: 11 on-device sensors
✤ Offloaded computation to a MacBook Pro
✤ Automatic offload when the temperature reaches 55 °C
[Figure legend: Throughput; Temperature]
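The temperature-triggered offload could look roughly like the check below. The 55 °C threshold comes from the slide; averaging the readings of all on-device sensors is an assumption here, as is the `ThermalOffloadTrigger` class itself.

```java
/** Hypothetical check that starts offloading once the device gets too hot. */
final class ThermalOffloadTrigger {
    static final double THRESHOLD_CELSIUS = 55.0; // threshold stated on the slide

    /** Arithmetic mean of the sensor readings, in degrees Celsius. */
    static double average(double[] readingsCelsius) {
        if (readingsCelsius.length == 0) {
            throw new IllegalArgumentException("at least one sensor reading is required");
        }
        double sum = 0.0;
        for (double reading : readingsCelsius) {
            sum += reading;
        }
        return sum / readingsCelsius.length;
    }

    /** True once the averaged temperature reaches the threshold. */
    static boolean shouldOffload(double[] readingsCelsius) {
        return average(readingsCelsius) >= THRESHOLD_CELSIUS;
    }
}
```

A production version would probably add hysteresis so that the device does not flip between local and offloaded execution around the threshold.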
CONCLUSION & FUTURE WORK
✤ Novel, high-level programming model that addresses distribution and reconfiguration
✤ Android platform:
๏ Dynamic joining and leaving of devices
๏ Seamless offloading model (tablets, servers)
✤ Future / ongoing work:
๏ Real-world applications (partial computation offloading)
๏ Runtime analysis: when to offload an application
DAMMP
Distributed Actor Model for Mobile Platforms
Arghya “Ronnie” Chatterjee
Research Collaborator, CSMD, ORNL
Ph.D. Student, Georgia Tech
[email protected]
September 27th, 2017