Impact-Sensitive Framework for Dynamic Change ... - University of Kent

2 downloads 0 Views 357KB Size Report
Tudor Dumitraş, Priya Narasimhan (Carnegie Mellon University) ... Solution architecture and implementation ... Generic architecture that takes into account:.
Carnegie Mellon

Impact-Sensitive Framework for Dynamic Change Management Tudor Dumitraş, Priya Narasimhan (Carnegie Mellon University) Daniela Roşu, Asit Dan (IBM Research)

Carnegie Mellon

Research Problem Enterprise SLAs (e.g., response time, availability, recovery time)

System Management Events (e.g., faults) & expected workload changes

Requests for HW & SW Upgrade

Generate List of Change Operations

Generate Timed Change Schedule Change Schedule

Orchestrator Change Changemanagement managementininaalive livesystem: system: ƒƒMinimize Minimizeservice servicedisruption disruption&&meet meetchange changerequest requestobjectives objectives ƒƒOptimize Optimizethe theoverall overallbusiness businessvalue valueofofthe thelive livesystem systemover overthe thechange changetime timehorizon horizon

© 2006 Tudor Dumitraş

2

Carnegie Mellon

Outline „

Case study ^

„

Hardware crash

Solution architecture and implementation ^

Components ^ ^

^ ^

„

Orchestrator Goal advisors

Interaction protocol Scheduling

Conclusions and future work

© 2006 Tudor Dumitraş

3

Carnegie Mellon

Related Work „

V. Kharchenko et al., “On dependability of composite Web services with components upgraded online,” WADS 2004 ^

„

IBM Tivoli Intelligent Orchestrator. http://www-306.ibm.com/software/tivoli/products/intell-orch. ^ ^

„

Estimates the “confidence in correctness” of composite Web services undergoing online upgrades

Performs resource arbitration Accounts only for immediate impact of resource changes

A. Keller et al., “The CHAMPS system: change management with planning and scheduling”, NOMS 2004 ^ ^ ^

Scheduling of operations to satisfy external RFC time objectives Focused on application deployment Doesn’t trade-off performance of live systems

© 2006 Tudor Dumitraş

4

Carnegie Mellon

Solution Approach Generic architecture that takes into account: ^

Enterprise SLOs & change request deadlines ^

^

Variation of key performance indicators (KPIs) over a long time horizon, optimizing long-term business value ^ ^ ^

^

Assessment of the overall impact of change schedules through interaction with multiple goal advisors

Transient impact, during change execution Permanent impact, after change Monitoring both performance and dependability metrics

Heterogeneous types/sources of change operations: ^ ^

System management events (e.g., faults, workload surges) Requests for Change (RFCs)

© 2006 Tudor Dumitraş

5

Carnegie Mellon

Sample Configuration: 2-Tier System

Service 1 Primary Service 2 Backup

Service 1: Class 1 Service 2: Class 2

ODR

© 2006 Tudor Dumitraş

Service 2 Primary Service 1 Backup

WAS1 DB1

DB2

DB3

WAS2

SLOs • Response Time • Availability • Recovery Time

XD Group

Service 2 Primary

Database WAS3

DB Group

6

Carnegie Mellon

Case Study: Hardware Crash

Service 1 Primary Service 2 Backup

Service 2 Primary

Service 2 Primary Service 1 Backup

WAS1 DB1

ODR

WAS2

DB2

DB3

Change operations: ƒ Remove node from XD group ƒ Add node to DB group

Database

ƒ Hand-off Service 1 to new node

XD Group

© 2006 Tudor Dumitraş

WAS3

ƒ Checkpoint DB for Service2

DB Group

7

Carnegie Mellon

Scheduled Change Operations WAS1

Checkpoint operation is delayed until Service 1 has new primary

WAS2 Remove node from XD

WAS3 DB1 DB2 DB3

Add node to DB Group

H-off

Crash H-off Checkpoint t

Workload (Service1) Delay changes until Service 2 workload (AppServer bound) decreases

Workload (Service2)

t

Business Value

Resp. Time (Service1)

t

Resp. Time (Service2) Recov. Time (Service 1)

t Availability of Service 1 decreases due to loss of a backup node

Expected Recovery Time increases due to lack of checkpoints

t

Recov. Time (Service2)

t

Availability (Service1)

t

Availability (Service2)

© 2006 Tudor Dumitraş

After backup node transfer availability goes back to normal

8

t

Carnegie Mellon

Solution Architecture & Interaction Protocol