Solving the Scalability Challenge from the Ground Up - Data-Intensive ...

Solving the Scalability Challenge from the Ground Up
Justin Y. Shi

Data Cloud 2015 6th International Workshop on Data Intensive Computing in the Clouds

Scalability Dilemma = Difficulty to Expand

Performance vs. Reliability

Scalability Dilemma for HPC Applications -> An Oxymoron?

Lessons from History

Packet-Switching Network
• Slower
• Cheap to Maintain
• Infinitely Scalable
• (Data decoupled from switches)

Today All Internet Traffic Is Packet Switched

Circuit-Switching Network
• Fast
• Expensive to Maintain
• Difficult to Scale
• (Dedicated switch to data at a time)

The Scalability Dilemma is … “the Dedicated Resource Syndrome”

Fixed program/data-processor binding

Reliability

Performance

Decoupled -> Unbounded Growth

Data Clouds Are Harder to Protect

Need a Solution to the CAP Theorem

Correctness of Distributed Program-Program Coordination – One Way to Look at This Mess…

Requirement 1: Zero Data Loss

Requirement 2: Zero Single Point Failure

Requirement 3: Infinitely Scalable

Program and Data Must be Decoupled from Hardware

Electronics are Less Reliable

Impossibility Theories

It is Hard to Wait Correctly

User’s Perspective

Bounded Wait

Unbounded Wait

Impossibility

Decoupling is Happening at SC15 …

What About Data Intensive HPC Cloud?

Booth #299

100% Reliable Distributed Computing (when R > minimal survival set Rs)

Statistic Multiplexed Computing (SMC)

Application-Level Tuple Switching Network
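To make the decoupling idea concrete, here is a minimal sketch of a tuple space in Python. It is an illustration only and does not reproduce the API of Synergy, Anka, or any other SMC implementation. All names (`TupleSpace`, `put`, `get`, `run_demo`) and the timeout values are hypothetical. Work is addressed by tuple name rather than by worker, and the master waits a bounded time for results, then re-sends any missing work, much as a packet network retransmits a lost packet.

```python
import threading
from collections import deque


class TupleSpace:
    """Hypothetical in-process tuple space: tuples are matched by name, never by worker."""

    def __init__(self):
        self._tuples = {}                      # name -> deque of values
        self._cond = threading.Condition()

    def put(self, name, value):
        with self._cond:
            self._tuples.setdefault(name, deque()).append(value)
            self._cond.notify_all()

    def get(self, name, timeout):
        """Remove and return one tuple called `name`, or None after `timeout` seconds."""
        with self._cond:
            found = self._cond.wait_for(lambda: bool(self._tuples.get(name)), timeout)
            return self._tuples[name].popleft() if found else None


def run_demo(n_tasks=5, lost_task=2):
    space = TupleSpace()
    for i in range(n_tasks):
        space.put("work", i)

    def worker(crash_on=None):
        while True:
            task = space.get("work", timeout=0.5)
            if task is None:
                return                         # idle: no work arrived in time
            if task == crash_on:
                return                         # simulated crash: task dies with the worker
            space.put("result", (task, task * task))

    threads = [threading.Thread(target=worker, args=(lost_task,), daemon=True),
               threading.Thread(target=worker, daemon=True)]
    for t in threads:
        t.start()

    results = {}                               # dict makes duplicate results harmless
    while len(results) < n_tasks:
        item = space.get("result", timeout=0.05)   # bounded wait
        if item is None:
            # Timeout: re-send every task that has no result yet.
            for i in range(n_tasks):
                if i not in results:
                    space.put("work", i)
        else:
            results[item[0]] = item[1]

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

The master never learns which worker failed, or whether one failed at all. It only observes that a result is missing and re-sends the work tuple, so any surviving worker can finish the job.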

Implementations

Synergy

Anka

Tuning Can Make SMC Faster Than MPI

Synchronous Replication in Realtime in Memory
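As a rough illustration of synchronous in-memory replication, the sketch below acknowledges a write only after every live replica has applied it. It is a toy model under stated assumptions, not the mechanism presented in the talk. `Replica`, `SyncReplicatedStore`, and the `min_copies` threshold are hypothetical names introduced here.

```python
class Replica:
    """Hypothetical in-memory replica that can be marked as failed."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

    def apply(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class SyncReplicatedStore:
    """Acknowledge a write only when at least `min_copies` replicas hold it."""

    def __init__(self, replicas, min_copies=2):
        self.replicas = list(replicas)
        self.min_copies = min_copies           # assumed threshold for this sketch

    def write(self, key, value):
        survivors = []
        for replica in self.replicas:
            try:
                replica.apply(key, value)
                survivors.append(replica)
            except ConnectionError:
                pass                           # failed replica leaves the active set
        self.replicas = survivors
        if len(survivors) < self.min_copies:
            # Some replicas may already hold the value; the caller is told
            # the write is NOT acknowledged and must treat it as failed.
            raise RuntimeError("too few live replicas; write not acknowledged")
        return len(survivors)


if __name__ == "__main__":
    a, b, c = Replica("a"), Replica("b"), Replica("c")
    store = SyncReplicatedStore([a, b, c])
    print(store.write("x", 1))                 # all three replicas apply the write
    b.alive = False
    print(store.write("y", 2))                 # b is dropped; a and c still agree
```

Because the acknowledgement waits for every live copy, an acknowledged write survives the loss of any single replica in this model. The cost is that each write is as slow as the slowest live replica.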

Summary

CAP Should All be Satisfied

Zero Single Point Failure Decoupling is the Only Solution

Visit Booth #299 for More Details

Acknowledgements
• Reported CI architecture research is supported in part by National Science Foundation (MRI Grant #CNS0958854)
• DI architecture research is supported in part by Ben Franklin Technology Partners and private investors of Parallel Computers Technology Inc.
• NCAR for Yellowstone benchmark effort

Questions?