
Lessons Learned from Running Serverless Workloads on Mesos Diana Arroyo & Alek Slominski IBM Research

Serverless: Quo Vadis? •  We hope our serverless workload generator is a useful tool for anybody who is looking at serverless services and wants to compare them in more depth. •  We plan to write a blog post about our experience with serverless workload generation and benchmarking, and to open source the benchmark.

Do you have questions or feedback, or want to tell me where I am wrong? Aleksander Slominski @aslom

Overview •  Serverless workload characteristics •  Generating serverless workloads •  Results •  Lessons learned

Serverless Workload Characteristics •  Serverless workloads can require thousands of concurrent short-lived containers to be created and destroyed in milliseconds: •  Container aka Action aka Function aka …. –  Depends on the serverless service, framework, ...

•  Required operations: –  Start a lot of actions (short-lived containers) –  Generate work: send request, generate response, and repeat –  Actions run for some time to allow for reuse (cold vs. hot)

Serverless Workload Benchmark Goals •  Simulate the lifecycle of a serverless action as it takes part in a serverless workload •  Minimal scenario: –  Test serverless action start time –  Send N requests and validate the responses –  Pause / resume action as needed –  Stop (kill) actions

•  Scenario parameters: how many actions are started, when, for how long, etc. •  A workload runs multiple scenarios (in sequence, in parallel, etc.) •  Gather statistics about workload execution –  Enough to learn how well test environments handle such high-load scenarios?
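The minimal scenario above can be sketched in-process. This is an illustration only, not the talk's benchmark code: `asyncio` queues stand in for the WebSocket connection, the action is a coroutine instead of a container, and the names `action`, `run_scenario`, and the `echo:` reply format are hypothetical. Pause / resume is left out.

```python
import asyncio
import time


async def action(requests: asyncio.Queue, responses: asyncio.Queue, sleep_ms: float) -> None:
    """Stand-in for an action container: announce readiness, then serve requests."""
    await responses.put("ready")
    while True:
        payload = await requests.get()
        await asyncio.sleep(sleep_ms / 1000)  # simulated work of M milliseconds
        await responses.put(f"echo:{payload}")


async def run_scenario(n_requests: int, sleep_ms: float) -> dict:
    """Start one action, send N requests, validate each response, then stop the action."""
    requests: asyncio.Queue = asyncio.Queue()
    responses: asyncio.Queue = asyncio.Queue()

    # 1. Test action start time: from launch until the action reports "ready".
    launched = time.perf_counter()
    task = asyncio.create_task(action(requests, responses, sleep_ms))
    if await responses.get() != "ready":
        raise RuntimeError("action did not report ready")
    start_time_s = time.perf_counter() - launched

    # 2. Send N requests and validate each response.
    validated = 0
    for i in range(n_requests):
        await requests.put(str(i))
        if await responses.get() == f"echo:{i}":
            validated += 1

    # 3. Stop (kill) the action.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    return {"start_time_s": start_time_s, "sent": n_requests, "validated": validated}


stats = asyncio.run(run_scenario(n_requests=5, sleep_ms=1))
print(stats)
```

The returned dictionary is the kind of per-scenario statistic a driver could aggregate across many scenario instances.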

Simple Scenario: WebSocket
[Diagram: the test driver (overall workload) is linked by WebSocket to Scenario Instance 1, Scenario Instance 2, … Scenario Instance S; each scenario instance is linked by WebSocket to its Actions.]

Extended Workload Scenario
[Diagram labels: Test driver (overall workload); Test driver (overall aggregate scenario 1); Aggregate Scenario 1-1; Scenario Instance 1-1-2 with its Actions; … Scenario Instance S.]

Simple Setup Scenario Setup: Docker •  Start the test driver container; when it starts running, it opens listening sockets and starts S scenario containers –  docker run driver -e setup_for_scenario_containers

•  Each scenario container connects to the driver over a websocket and starts A action containers –  docker run scenario -e setup_for_action_containers -e WS_CALLBACK=ws://test_driver:port

•  Each action container, once started, connects back to its scenario container over a websocket to ask for requests –  docker run hello-action WS_CALLBACK=ws://scenario:port
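The launch commands on this slide are shorthand. A minimal sketch of how a scenario container might assemble the command line for its action containers follows. The helper names, the `-d` (detached) flag, and port 8080 are assumptions for illustration; only the `hello-action` image name and the `WS_CALLBACK` variable come from the slide. Options are placed before the image name, which is where `docker run` expects them.

```python
from typing import Dict, List, Optional


def docker_run_command(image: str, env: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a `docker run` argv with all options placed before the image name."""
    argv = ["docker", "run", "-d"]  # -d: run detached (an assumption, not from the slide)
    for key, value in (env or {}).items():
        argv += ["-e", f"{key}={value}"]
    argv.append(image)
    return argv


def action_commands(scenario_host: str, port: int, count: int) -> List[List[str]]:
    """One command per action container, each pointing back at the scenario container."""
    callback = f"ws://{scenario_host}:{port}"
    return [
        docker_run_command("hello-action", {"WS_CALLBACK": callback})
        for _ in range(count)
    ]


# Hypothetical host name and port; the sketch only builds the commands, it does not run them.
commands = action_commands("scenario", 8080, count=3)
print(" ".join(commands[0]))
```

Each argv list could be handed to `subprocess.run` on a host with Docker installed.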



Simple Scenario Execution Execution: •  After starting S scenario containers, the test driver container waits on a websocket for results from the scenario containers •  After starting A action containers, each scenario container waits on a websocket for its action containers to connect, then sends N requests and waits for the responses •  After starting, each action container sends “ready” over the websocket, then waits for requests, processes each request (sleeps for M milliseconds), and sends the response back

End result: •  1 + S + S*A containers running (driver container + scenario containers + action containers) •  S * A * N requests processed •  Test duration: the ideal time (with zero startup time) is N * M milliseconds, since all actions process their N requests in parallel
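As a worked check of the end-result figures, here is a small sketch that evaluates the three formulas for one parameter set. The class and field names are illustrative and not taken from the talk's scripts; the example values for S, A, N, and M are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioParams:
    scenarios: int  # S: scenario containers started by the driver
    actions: int    # A: action containers started by each scenario
    requests: int   # N: requests sent to each action
    sleep_ms: int   # M: milliseconds each action sleeps per request

    def total_containers(self) -> int:
        # 1 driver + S scenario containers + S*A action containers
        return 1 + self.scenarios + self.scenarios * self.actions

    def total_requests(self) -> int:
        # every action processes N requests
        return self.scenarios * self.actions * self.requests

    def ideal_duration_ms(self) -> int:
        # actions run in parallel, each handling N requests of M ms, with zero startup time
        return self.requests * self.sleep_ms


params = ScenarioParams(scenarios=10, actions=10, requests=100, sleep_ms=50)
print(params.total_containers())   # 111
print(params.total_requests())     # 10000
print(params.ideal_duration_ms())  # 5000
```

Comparing a measured test duration against `ideal_duration_ms()` gives a rough view of how much time went to container startup and scheduling rather than to request processing.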

Environment Configuration •  Swarm –  Version 1.2.4** •  Mesos –  0.27** •  Docker –  1.10.2

[Diagram: a compute host running Swarm, Mesos Master, Mesos Agent, Mesos Docker Executor(s), and Docker Engine.]

Current Results •  Swarm Sync issues and deadlock –  PR2412: Fix double RLock in Mesos cluster

•  Tuning –  Mesos Master: decrease --allocation_interval –  Swarm Framework: decrease mesos.offerrefusetimeout

•  Custom Executor –  One executor per node vs. one executor per container to minimize startup costs

Current Results •  Results

–  Preliminary testing shows improved performance of the custom executor over the Mesos Executor.

[Figure: two charts, "Whisk Requests per Second (Swarm Only)" (series: Swarm Only) and "Whisk Requests per Second (Swarm + Mesos)" (series: Mesos Executor, Custom Executor). Both charts: x-axis Number of Containers (0 to 1000), y-axis labeled Time (seconds).]

Lessons learned •  Scaling becomes harder as size increases

–  We can easily run hundreds of containers but run into issues when running 1000 containers

•  Locking in Swarm

–  Only shows up with this workload (different timing of some operations in Swarm-Mesos leads to deadlocks ….)

•  Limitations in Docker engine

–  It seems we hit some limits on how many processes can be started per second –  The limits differ between Docker versions

Reproducing results and other workloads •  We are making the workload scripts available: –  https://github.com/aslom/serverless-workloadscripts –  Additional measurements are available to track individual action startup, along with scripts to visualize results with pyplot

•  The results are meaningful only in your environment and when you compare them to your own workloads –  The scripts are easy to modify and we will accept PRs

Future work •  Optimized Docker executor for Mesos •  Other changes to Mesos to better handle serverless workloads? •  Test and compare other serverless options: –  AWS Lambda, Azure Functions, Google Cloud Functions, IBM OpenWhisk, …

OpenWhisk

•  Overview: https://developer.ibm.com/openwhisk/ •  Source code: https://github.com/openwhisk