Predicting Web Service Levels During VM Live Migrations - DMTF


5th International DMTF Academic Alliance Workshop on Systems and Virtualization Management: Standards and the Cloud
Helmut Hlavacs, Thomas Treutner

24. 10. 2011


Table of Contents

1 Scenario
2 Experiment
3 Data overview
4 Model selection
5 Conclusions


Abstract

• VM live migration important for energy efficiency
• Establish energy efficient target distribution of VMs
• No perceivable downtime while live migrating?
• Live migration is resource intensive (iterative page copying)
• Experiments: Influence on service levels while migrating?
• Modelling: Predict service levels based on utilization?


Scenario

• Virtualized data center, static consolidation (P2V)
• Provisioning for peak load, but energy efficiency is still poor
  • E.g., 9-to-5 cycles, very low utilization at night
• Live migration enables dynamic consolidation
• But: seldom used, for fear of possible side effects
• ⇒ Identify and quantify effects on (web) service levels


Experiment

• 2 servers, 1 VM, migrating back and forth
• VM disk image on central node (Gbit, open-iscsi)
• qemu-kvm VM: Linux, Apache2, PHP5, MediaWiki
• SQL VM and load generation on extra nodes
• Logging utilization of servers and VM, >100 variables
• Raise load from 50 to 600 concurrent virtual users and back
• Migrate every 15 min, track response time of last 5 min (SLA = 1 s)
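
The service level tracked in the experiment can be read as the fraction of requests answered within the SLA limit during the tracking window. The following is a minimal sketch of that computation, assuming response times are available as (timestamp, response time) pairs. The function name, the sample data and the exact window boundaries are illustrative assumptions, not taken from the slides.

```python
SLA_SECONDS = 1.0      # maximum allowed response time (from the slides)
WINDOW_SECONDS = 300   # "last 5 min" tracking window (from the slides)


def service_level_ratio(samples, now, sla=SLA_SECONDS, window=WINDOW_SECONDS):
    """Fraction of requests in (now - window, now] answered within `sla` seconds.

    `samples` is an iterable of (timestamp_seconds, response_time_seconds).
    Returns None if no request falls inside the window.
    """
    recent = [rt for ts, rt in samples if now - window < ts <= now]
    if not recent:
        return None
    return sum(1 for rt in recent if rt <= sla) / len(recent)


# Hypothetical measurements: (timestamp in s, response time in s)
samples = [(10, 0.2), (100, 0.4), (200, 1.5), (290, 0.9), (400, 3.0)]
print(service_level_ratio(samples, now=300))
```

Three of the four requests inside the window ending at t = 300 s meet the 1 s limit, so the ratio for that window is 0.75.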


Data overview

CPU utilization

Figure: Effect of increasing/decreasing HTTP workload in equal steps on the VM's CPU utilization (x-axis: interval; y-axis: CPU utilization, VM normed).

Load average

Figure: Effect of increasing/decreasing HTTP workload in equal steps on the VM's UNIX load average (x-axis: interval; y-axis: load5 average).

Markov Queues

• UNIX load as approximation to Q, the average number of jobs in an M/M/1 queue; CPU utilization as ρ, the system utilization
• $Q_{M/M/1} = \frac{\rho^2}{1-\rho}$
• The UNIX load is exponentially averaged by definition, and service times are not necessarily exactly exponentially distributed, so a systematic deviation from the theoretical solution is to be expected.
• $Q_{M/G/1} = \frac{\rho^2}{1-\rho} \cdot \frac{1 + c_B^2}{2}$, where $c_B$ is the coefficient of variation of the service time
  • Exponentially distributed service times: $c_B^2 = 1$
  • Deterministic service times: $c_B^2 = 0$
• Simple linear regression: $c_B^2 = 0.42$
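
The queue-length formulas on this slide are easy to evaluate numerically. The sketch below computes Q for M/M/1, M/D/1 and the fitted M/G/1 case. It also shows one way a value such as 0.42 could be estimated: a least-squares fit through the origin of load against ρ²/(1−ρ). The slides only say "simple linear regression", so the exact fitting procedure here is an assumption, as are the function names and the synthetic data.

```python
def mean_queue_length(rho, cb2=1.0):
    """Mean queue length Q from the slide's M/G/1 formula.

    rho: utilization, 0 <= rho < 1
    cb2: squared coefficient of variation of the service time
         (1.0 gives M/M/1, 0.0 gives M/D/1)
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    return rho ** 2 / (1.0 - rho) * (1.0 + cb2) / 2.0


def fit_cb2(rhos, loads):
    """Assumed fitting procedure: load = a * rho^2 / (1 - rho), cb2 = 2a - 1."""
    xs = [r ** 2 / (1.0 - r) for r in rhos]
    a = sum(x * y for x, y in zip(xs, loads)) / sum(x * x for x in xs)
    return 2.0 * a - 1.0


for rho in (0.5, 0.7, 0.9):
    print(rho,
          mean_queue_length(rho),          # M/M/1
          mean_queue_length(rho, 0.0),     # M/D/1
          mean_queue_length(rho, 0.42))    # M/G/1 with the fitted value

# Synthetic, noise-free data generated with cb2 = 0.42 to check the fit
rhos = [0.1, 0.3, 0.5, 0.7, 0.9]
loads = [mean_queue_length(r, 0.42) for r in rhos]
estimated_cb2 = fit_cb2(rhos, loads)
```

Because the factor (1 + c_B²)/2 is constant, all three curves share the same shape and grow sharply as ρ approaches 1.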


Figure: Load5 average over CPU utilization (VM normed); measured data compared with M/M/1, M/D/1 and M/G/1 ($c_B^2 = 0.42$). Queuing issues increase rapidly at 60-70% CPU utilization. The measured data follow the trend of the theoretical solution very closely.

Response Time

Figure: HTTP response times increase massively during phases with higher workload (x-axis: time of day in hours; y-axis: response time [ms]).

Service Level

Figure: Service levels decrease significantly during phases of high workload. The maximum allowed response time is 1 s, which is extremely conservative (x-axis: interval; y-axis: service level ratio).

Load average vs Service Level

Figure: A higher load average indicates queuing issues, which cause decreased service level ratios. The measured data follow the theoretical solution closely for higher load values (x-axis: load5 average; y-axis: service level; series: measured, theoretic).

Response Time: Low Load

Figure: Influence of live migration on HTTP response time and service level during low load (HTTP response time [s] and SLA ratio over time, with markers for migration start, migration finish and the maximum allowed response time).

Response Time: Medium Load

Figure: Influence of live migration on HTTP response time and service level during medium load (HTTP response time [s] and SLA ratio over time, with markers for migration start, migration finish and the maximum allowed response time).

Response Time: High Load

Figure: Influence of live migration on HTTP response time and service level during high load (HTTP response time [s] and SLA ratio over time, with markers for migration start, migration finish and the maximum allowed response time).

Model selection

Stepwise: AIC
• Akaike Information Criterion (AIC), lower values are better
• AIC = 2k − 2 ln(L)
  • k: the number of independent parameters of a model
  • L: the maximum likelihood
• Stepwise model selection approach
  • Find trade-off between number of parameters (model size) and goodness of fit (model quality)
  • Start with empty model; in each step a variable can be added or removed, until the AIC cannot be decreased further.
  • Does not guarantee finding a global optimum, but typically gives a near-optimum result within reasonable time.
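
The stepwise procedure described above can be sketched in a few lines. The slides do not show the authors' implementation, so this is an illustrative version under stated assumptions: ordinary least squares with Gaussian errors, for which the AIC reduces to n·ln(RSS/n) + 2k up to an additive constant, and a greedy search that takes the single add or remove step with the lowest AIC. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def ols_aic(X, y, columns):
    """AIC (up to an additive constant) of an OLS fit with intercept.

    Uses AIC = n * ln(RSS / n) + 2k for Gaussian errors, where k counts
    the intercept plus the selected columns.
    """
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def stepwise_aic(X, y):
    """Greedy bidirectional selection starting from the empty model."""
    selected = []
    best = ols_aic(X, y, selected)
    while True:
        candidates = []
        for j in range(X.shape[1]):
            if j in selected:
                trial = [c for c in selected if c != j]   # try removing j
            else:
                trial = selected + [j]                    # try adding j
            candidates.append((ols_aic(X, y, trial), trial))
        aic, trial = min(candidates, key=lambda c: c[0])
        if aic >= best:          # no single step lowers the AIC further
            return selected, best
        selected, best = trial, aic


# Synthetic data: only columns 0 and 3 carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
selected, final_aic = stepwise_aic(X, y)
print(sorted(selected), final_aic)
```

Each accepted step strictly lowers the AIC, so the loop terminates. Like any greedy search, it can stop in a local optimum, and it may keep a noise column that happens to lower the AIC slightly.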


Exhaustive: LEAPS
• Comparison with the stepwise AIC approach
• Exhaustive all-subsets regression
• Find best of all possible models for given range of model size
• Computationally intensive even if number of variables is limited
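
To make the cost of all-subsets regression concrete, the sketch below enumerates every subset of a given size by brute force and keeps the one with the best adjusted R². This is an assumption-laden illustration, not the LEAPS implementation, which is generally described as using branch-and-bound to avoid visiting every subset. The function names and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y, columns):
    """Adjusted R^2 of an OLS fit with intercept on the given columns."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    p = len(columns)
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


def best_subsets(X, y, max_size):
    """For each model size 1..max_size, return (adjusted R^2, columns) of the best subset."""
    best = {}
    for size in range(1, max_size + 1):
        best[size] = max(
            ((adjusted_r2(X, y, cols), cols)
             for cols in combinations(range(X.shape[1]), size)),
            key=lambda t: t[0],
        )
    return best


# Synthetic data: only columns 0 and 3 carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
result = best_subsets(X, y, max_size=3)
for size, (score, cols) in result.items():
    print(size, cols, score)
```

With m candidate variables there are C(m, s) subsets of size s, so with more than 100 logged variables the brute-force count grows very quickly even for small model sizes.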


AIC

Figure: The AIC value quickly improves during the first steps and then slowly converges to the final value, thus allowing a trade-off between model complexity and precision (x-axis: step; y-axis: AIC value).

AIC

Figure: The residuals' standard deviation quickly improves during the first steps and then slowly converges to the final value when using stepwise AIC (x-axis: step; y-axis: standard deviation of residuals).

AIC

Figure: QQ plot of standardized residuals for the model produced by stepwise AIC.

AIC

Figure: There is no visible correlation between the residuals' variance and the time series.

LEAPS

Figure: Higher model complexity yields better model quality when using LEAPS, but the increase is rather small (x-axis: model size; y-axis: adjusted R²).

LEAPS

Figure: Higher model complexity yields a lower regression error when using LEAPS (x-axis: model size; y-axis: standard deviation of residuals).

AIC: Most influential variables of final model

Variable                    Meaning                                        Estimate     Std. Error   Pr(>|t|)
Intercept                                                                  2.395e+00    5.069e-01    3.00e-06
wp01 load5                  VM UNIX load5                                  -1.871e-02   2.627e-03    3.67e-12
wp01 swapUsed               VM swap used                                   -7.656e-07   8.809e-08    < 2e-16
wp01 residentSize SQ        The squared amount of resident memory          -4.652e-14   1.652e-14    0.00506
                            used by the qemu-kvm process.
src host cpu proc s         Tasks created/s, source host.                  -2.475e+00   9.166e-01    0.00716
src host cpu proc s SQ                                                     1.091e+00    4.133e-01    0.00856
wp01 cpu util vmnorm SQ                                                    -1.328e-01   3.187e-02    3.63e-05
wp01 cpu util vmnorm        CPU util measured inside VM                    9.517e-02    2.316e-02    4.64e-05
wp01 load5 SQ               The squared UNIX load5 of the VM.              -1.140e-03   1.918e-04    5.22e-09
wp01 freeMemRatio SQ        The squared ratio of free memory inside        1.976e-02    9.462e-03    0.03727
                            the VM.

Table: The most influential and significant variables of the final model produced by stepwise AIC.

Conclusions

Qualitative
1 Impact of live migration on SL depends on amount of workload
2 Tighter SLAs can be fulfilled for low workload situations
3 High workload: SL decreases massively
4 ⇒ Live migrate when the VM has only medium load
5 Avoid multiple live migrations within short time
6 ⇒ Inertia effect for recently migrated VMs
7 Avoid senseless flip-flop migrations and stabilize service level
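
The two recommendations above, migrating only below a load ceiling and leaving recently migrated VMs alone for a while, can be combined into a simple gate. The sketch below is hypothetical throughout: the class name, the load5 ceiling of 2.0 and the 30-minute cooldown are placeholder assumptions, not values from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationGate:
    max_load5: float = 2.0       # hypothetical "medium load" ceiling
    cooldown_s: float = 1800.0   # hypothetical inertia period after a migration
    last_migration: dict = field(default_factory=dict)

    def may_migrate(self, vm, load5, now):
        """Allow a migration only under the load ceiling and outside the cooldown."""
        if load5 > self.max_load5:
            return False
        last = self.last_migration.get(vm)
        return last is None or now - last >= self.cooldown_s

    def record_migration(self, vm, now):
        """Remember when `vm` was last migrated (timestamps in seconds)."""
        self.last_migration[vm] = now


gate = MigrationGate()
print(gate.may_migrate("wp01", load5=1.2, now=0.0))     # allowed: low load, never migrated
gate.record_migration("wp01", now=0.0)
print(gate.may_migrate("wp01", load5=1.2, now=600.0))   # blocked: still in cooldown
print(gate.may_migrate("wp01", load5=4.5, now=3600.0))  # blocked: load too high
```

The cooldown is what suppresses flip-flop migrations: a consolidation algorithm that changes its mind shortly after moving a VM is simply refused until the inertia period has passed.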


Quantitative
1 90% of the SL variance during a live migration is predictable using only a single variable, the UNIX load5 average.
2 Models with 12 variables can explain 95% of the variance
3 Models selected by AIC/LEAPS are of comparable quality and robust to statistical tests
4 Stepwise AIC is much faster, so LEAPS is not really necessary
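
"Explained variance" in the points above refers to the coefficient of determination of a regression model. As a minimal illustration, the sketch below fits a single-variable linear model of service level on load5 and reports R². The data points are made up for illustration and are not the measurements from the experiment.

```python
def linear_fit_r2(xs, ys):
    """Least-squares line y = intercept + slope * x and its R^2."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical (load5, service level) pairs, not measured data
load5 = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
service_level = [0.99, 0.985, 0.96, 0.93, 0.90, 0.86]
slope, intercept, r2 = linear_fit_r2(load5, service_level)
print(slope, intercept, r2)
```

An R² of 0.90 for such a model would mean that the load5 average alone accounts for 90% of the variance in the service level.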


Technical
1 Systems using live migration as a mechanism to realize a more energy-efficient target distribution need to consider the UNIX load average, at least for web servers and comparable services
2 Typical hypervisors do not collect and export this information
3 Needs to be done by VM introspection
4 Related efforts by qemu-kvm and libvirt developers to pass through the VMs' memory utilization (free mem)
5 Should be extended to export load information
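
Until hypervisors export load information, one workaround is a small agent inside the guest that reports the load averages itself. Note that this is a guest-side agent, not the VM introspection the slides call for, and the JSON field names and the way the report would reach the host are assumptions. The sketch uses Python's standard `os.getloadavg()`, which is available on Unix-like systems only.

```python
import json
import os
import time


def load_report():
    """Return the guest's 1, 5 and 15 minute load averages as a JSON string."""
    load1, load5, load15 = os.getloadavg()   # Unix only; raises OSError if unavailable
    return json.dumps({
        "timestamp": time.time(),
        "load1": load1,
        "load5": load5,     # the variable the model relies on most
        "load15": load15,
    })


print(load_report())
```

A management system could poll such a report before deciding whether a VM is a suitable migration candidate.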


Future Work
1 Influence of additional VMs
  • idle
  • utilized
  • and a mixed scenario
2 Linux and qemu-kvm: Kernel Samepage Merging (KSM)
3 Database VM migration
4 Bandwidth limits, maximum allowed downtime
5 Migration delay, energy consumption, service downtime


Q&A

Thank you for your attention!