Solving the Scalability Challenge from the Ground Up
Justin Y. Shi | [email protected]
DataCloud 2015: 6th International Workshop on Data-Intensive Computing in the Clouds
Scalability Dilemma = Difficulty to Expand
Performance vs. Reliability
Scalability for HPC Applications -> An Oxymoron?
Lessons from the History of Networking
Circuit-Switching Network
• Fast
• Expensive to maintain
• Difficult to scale (a switch is dedicated to one data stream at a time)
Packet-Switching Network
• Slower
• Cheap to maintain
• Infinitely scalable (data is decoupled from the switches)
Today, all Internet traffic is packet switched.
The Scalability Dilemma Is ... the "Dedicated Resource Syndrome"
Fixed program/data-to-processor binding forces a trade between reliability and performance.
Decoupled -> unbounded growth
Data Clouds Are Harder to Protect: They Need a Solution to the CAP Theorem
Correctness of Distributed Program-Program Coordination: One Way to Look at This Mess
Requirement 1: Zero data loss
Requirement 2: Zero single point of failure
Requirement 3: Infinitely scalable
=> Program and data must be decoupled from hardware
Electronics Are Less Reliable
Impossibility Theories: It Is Hard to Wait Correctly
User's Perspective: Bounded Wait vs. Unbounded Wait -> Impossibility
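A toy illustration (my own, not from the slides) of why waiting correctly is hard: an unbounded wait can block forever on a message that never arrives, while a bounded wait always makes progress but can falsely declare a slow sender dead. In Python:

```python
import queue

channel = queue.Queue()  # a message may arrive late, or never

def wait_bounded(timeout):
    """Bounded wait: always returns, but may wrongly suspect a failure."""
    try:
        return channel.get(timeout=timeout)
    except queue.Empty:
        return "SUSPECT-FAILED"

# An unbounded wait, channel.get(), would block forever on a lost message.

print(wait_bounded(0.1))      # SUSPECT-FAILED: progress, but possibly wrong
channel.put("late message")   # the sender was merely slow, not dead
print(wait_bounded(0.1))      # late message
```

Neither choice is safe in general, which is the intuition behind the impossibility results (e.g., FLP) alluded to above.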
Decoupling Is Happening at SC15 ...
What About the Data-Intensive HPC Cloud?
Booth #299
100% Reliable Distributed Computing (when the redundancy R exceeds the minimal survival set Rs)
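A back-of-the-envelope reading of the R > Rs condition (under an assumed failure model; the slides do not give one): if the R redundant workers fail independently, each with probability p, redundancy drives the loss probability toward zero. For the simplest survival set, where a single surviving worker suffices:

```latex
% Assumed model: R redundant workers, independent failures,
% each failing during the run with probability p; Rs = 1 survivor suffices.
P(\mathrm{loss}) = p^{R}, \qquad
P(\mathrm{survive}) = 1 - p^{R} \longrightarrow 1 \quad \text{as } R \to \infty.
```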
Statistic Multiplexed Computing (SMC)
Application-Level Tuple Switching Network
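A minimal sketch of the tuple-switching idea in Python, assuming nothing about the actual Synergy or Anka APIs (TupleSpace, put, and get are illustrative names, not the real interface): workers pull anonymous work tuples from a shared space and put result tuples back, so no unit of work is bound to a particular processor and any worker can service, or redo, any tuple.

```python
import queue
import threading

class TupleSpace:
    """Illustrative Linda-style tuple space: put/get matched by tag.

    Work is addressed by content (a tag), not by processor, so programs
    and data stay decoupled from any particular piece of hardware.
    """

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def _queue_for(self, tag):
        with self._lock:
            return self._queues.setdefault(tag, queue.Queue())

    def put(self, tag, value):
        self._queue_for(tag).put(value)

    def get(self, tag):
        return self._queue_for(tag).get()  # blocks until a tuple arrives

def worker(space):
    """Any worker can service any work tuple: statistic multiplexing."""
    while True:
        x = space.get("work")
        if x is None:               # poison pill: shut this worker down
            break
        space.put("result", x * x)

if __name__ == "__main__":
    space = TupleSpace()
    workers = [threading.Thread(target=worker, args=(space,)) for _ in range(4)]
    for w in workers:
        w.start()
    for x in range(10):
        space.put("work", x)
    results = sorted(space.get("result") for _ in range(10))
    for _ in workers:
        space.put("work", None)
    for w in workers:
        w.join()
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because retrieval is by tag rather than by destination address, adding workers raises throughput and losing one merely delays the tuples it held: the packet-switching analogy applied at the application level.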
Implementations
• Synergy
• Anka
Tuning Can Make SMC Faster Than MPI
Synchronous In-Memory Replication in Real Time
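One plausible reading of this slide, sketched in Python (my interpretation, not PCTI's implementation): a write is acknowledged only after every live in-memory replica has applied it, so an acknowledged value survives any single replica failure as long as the survivor count stays above the minimal survival set.

```python
class Replica:
    """An in-memory copy of the application state."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.alive = True

    def apply(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.store[key] = value

class SynchronousReplicator:
    """Acknowledge a write only after every live replica has applied it."""

    def __init__(self, replicas, min_survivors=1):
        self.replicas = replicas
        self.min_survivors = min_survivors  # the minimal survival set 'Rs'

    def write(self, key, value):
        survivors = 0
        for replica in self.replicas:
            try:
                replica.apply(key, value)   # synchronous: applied before ack
                survivors += 1
            except ConnectionError:
                pass                        # a dead replica never blocks a write
        if survivors < self.min_survivors:
            raise RuntimeError("below the minimal survival set: data at risk")
        return True                         # acknowledged on all live replicas

if __name__ == "__main__":
    replicas = [Replica(f"r{i}") for i in range(3)]
    db = SynchronousReplicator(replicas, min_survivors=1)
    db.write("x", 42)
    replicas[0].alive = False   # a single replica fails...
    db.write("y", 7)            # ...yet no acknowledged data is lost
    print(replicas[1].store)    # {'x': 42, 'y': 7}
```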
Summary
• All three CAP properties should be satisfied
• Zero single point of failure: decoupling is the only solution
• Visit Booth #299 for more details
Acknowledgements
• Reported CI architecture research is supported in part by the National Science Foundation (MRI Grant #CNS0958854)
• DI architecture research is supported in part by Ben Franklin Technology Partners and private investors of Parallel Computers Technology Inc.
• NCAR for the Yellowstone benchmark effort
Questions?