2010 D.A. Menasce. All Rights Reserved. 1 .... calls to Call Center. QoS metrics: ⢠response ... commerce sites, call centers) have workloads that can be widely ...
An Introduction to Autonomic Computing Daniel A. Menasce Department of Computer Science George Mason University 1 © 2010 D.A. Menasce. All Rights Reserved.
Based on the papers: “The Vision of Autonomic Computing,” Jeff Kephart and D. Chess, IEEE Computer, January 2003. and other papers by D.A. Menasce, his students, and colleagues.
2 © 2010 D.A. Menasce. All Rights Reserved.
Motivation for AC • “…main obstacle to further progress in IT is a looming software complexity crisis.” (from an IBM manifesto, Oct. 2001). – Tens of millions of lines of code – Skilled IT professionals required to install, configure, tune, and maintain. – Need to integrate many heterogeneous systems – Limit of human capacity being achieved 3 © 2010 D.A. Menasce. All Rights Reserved.
Motivation for AC (cont’d) • Harder to anticipate interactions between components at design time: – Need to defer decisions to run time
• Computer systems are becoming too massive, complex, to be managed even by the most skilled IT professionals • The workload and environment conditions tend to change very rapidly with time 4 © 2010 D.A. Menasce. All Rights Reserved.
Multi-scale time workload variation of a Web Server 3600 sec
60 sec
1 sec 5 © 2010 D.A. Menasce. All Rights Reserved.
Autonomic Computing • System that can manage themselves given high-level objectives. – High-level objectives can be expressed in term of servicelevel objectives or utility functions.
• Autonomic computing is inspired in the human autonomic nervous system: – “The autonomic nervous system consists of sensory neurons and motor neurons that run between the central nervous system and various internal organs such as the: heart, lungs, viscera, glands. It is responsible for monitoring conditions in the internal environment and bringing about appropriate changes in them. The contraction of both smooth muscle and cardiac muscle is controlled by motor neurons of the autonomic system.” http://users.rcn.com/jkimball.ma.ultranet/ BiologyPages/P/PNS.html
– The autonomic nervous system functions in an involuntary, reflexive manner. 6 © 2010 D.A. Menasce. All Rights Reserved.
Evolution of AC Systems • Automated data collection and aggregation in support of decisions by human administrators • Provide advise to humans suggesting possible courses of action • Take lower level actions automatically • Increase the scope and impact of actions taken automatically by systems in support of their AC behavior • Self-managing systems and devices will be completely natural
7 © 2010 D.A. Menasce. All Rights Reserved.
Autonomic Computing • Inspiration on self-governing systems such as social and economic systems in addition to purely biological ones. • Dimension of Self-Management: – Self-configuration – Self-optimization – Self-healing – Self-protection
8 © 2010 D.A. Menasce. All Rights Reserved.
Self-Configuring Systems • Automatic configuration of components within larger systems • Component registration – Need to advertise behavior – Need to advertise configuration options and mechanisms
• Automatic component discovery and integration
9 © 2010 D.A. Menasce. All Rights Reserved.
Self-Optimizing Systems • Complex middleware and database systems have a very large number of configurable parameters. Web Server (IIS 5.0) HTTP KeepAlive Application Protection Level Connection Timeout Number of Connections Logging Location Resource Indexing Performance Tuning Level Application Optimization MemCacheSize MaxCachedFileSize ListenBacklog MaxPoolThreads worker.ajp13.cachesize
Application Server
Database Server
(Tomcat 4.1) acceptCount
(SQL Server 7.0) Cursor Threshold
minProcessors maxProcessors
Fill Factor Locks Max Worker Threads Min Memory Per Query Network Packet Size Priority Boost Recovery Interval Set Working Set Size Max Server Memory Min Server Memory User Connections
10 © 2010 D.A. Menasce. All Rights Reserved.
Self-optimizing systems: motivation Software threads Arriving requests
. . .
queue for threads
cpu disk
Q: How does the response time vary with the number of software threads? © 2010 D.A. Menasce. All Rights Reserved.
11
2.5
Q: Why does the response time has this shape?
Response Time (sec)
2.0
Obs: For each workload level there is an optimal number of threads. 1.5
1.0
0.5
0.0 0
5
10
15
20
25
30
Number of Threads
L=1 © 2010 D.A. Menasce. All Rights Reserved.
L=2
L=3
L=3.5
SLA
12
Workload and QoS metrics Computer system Workload: • transactions • HTTP requests • video downloads • calls to Call Center
QoS metrics: • response time • throughput • availability • page download time • revenue throughput • abandonment rate 13
© 2010 D.A. Menasce. All Rights Reserved.
Self-optimizing System Computer system Workload: • transactions • HTTP requests • video downloads • calls to Call Center
Controller
QoS metrics: • response time • throughput • availability • page download time • revenue throughput • abandonment rate
Service Level Objectives © 2010 D.A. Menasce. All Rights Reserved.
14
Self-optimizing System Computer system Monitoring
Workload: • transactions • HTTP requests • video downloads • calls to Call Center
Monitoring
Controller
QoS metrics: • response time • throughput • availability • page download time • revenue throughput • abandonment rate
Service Level Objectives © 2010 D.A. Menasce. All Rights Reserved.
15
Self-optimizing System Computer system Workload: • transactions • HTTP requests • video downloads • calls to Call Center
Parameter Change
Controller
QoS metrics: • response time • throughput • availability • page download time • revenue throughput • abandonment rate
Service Level Objectives © 2010 D.A. Menasce. All Rights Reserved.
16
Engineering with QoS in Mind • Poor QoS may lead to loss of human life (e.g., homeland security and disaster relief support systems) and financial losses due to customer dissatisfaction.
17 © 2010 D.A. Menasce. All Rights Reserved.
Engineering with QoS in Mind • Poor QoS may lead to loss of human life (e.g., homeland security and disaster relief support systems) and financial losses due to customer dissatisfaction. • QoS has to be an integral part of the design of any computer system.
18 © 2010 D.A. Menasce. All Rights Reserved.
Engineering with QoS in Mind • Poor QoS may lead to loss of human life (e.g., homeland security and disaster relief support systems) and financial losses due to customer dissatisfaction. • QoS has to be an integral part of the design of any computer system. • Online computer systems (e.g., Web sites, ecommerce sites, call centers) have workloads that can be widely varying.
19 © 2010 D.A. Menasce. All Rights Reserved.
Engineering with QoS in Mind • Poor QoS may lead to loss of human life (e.g., homeland security and disaster relief support systems) and financial losses due to customer dissatisfaction. • QoS has to be an integral part of the design of any computer system. • Online computer systems (e.g., Web sites, ecommerce sites, call centers) have workloads that can be widely varying. • QoS autonomic control should be part of the design of complex online computer systems.
20 © 2010 D.A. Menasce. All Rights Reserved.
Results of a Self-Optimizing E-commerce Site Arrival rate
100 90 80
Arrival Rate (requests/sec)
70 60 50 40 30 20 10 0 0
5
10
15
20
25
30
Time (Controller Intervals)
"Preserving QoS of E-commerce Sites Through Self-Tuning: A Performance Model Approach," Menasce, Dodge and Barbara, Proc. 2001 ACM Conference on E-commerce, Tampa, FL, October 14-17, 2001. 21 © 2010 D.A. Menasce. All Rights Reserved.
35
Results of QoS Controller 0.8
0.6
0.4
QoS
0.2
.2 61 .5 63 .5 77 .3 80 .4 78 .5 83 .8 85 .7 85 .9 88 .2 88 .3 89 .2 90 .0 90 .3 89 .5 85 .7 81 .4 83 .7 79 .0 75 .9 73 .5 66 .2 64 .1 64 .4
.6
62
.1
36
.3
38
.4
41
.0
14
14
14
.4
0.0
-0.2
-0.4
-0.6
-0.8
Arrival Rate (req/sec)
Controlled QoS
Uncontrolled QoS
22 © 2010 D.A. Menasce. All Rights Reserved.
Experiment Results Arrival rate 100
QoS is not met!
90 80
Arrival Rate (requests/sec)
70 60 50 40 30 20 10 0 0
5
10
15
20
25
30
35
Time (Controller Intervals)
23 © 2010 D.A. Menasce. All Rights Reserved.
Dynamic Variation of Process Pool Size in Web Servers 45
Avg. No. Idle Processes
40 35 30 25 20 15 10 5 0 1
1.5
2
2.5
3
3.5
4
4.5
5
Avg. Arrival Rate (requests/sec) 20 Processes © 2010 D.A. Menasce. All Rights Reserved.
40 Processes
30 Processes
24
Dynamic Variation of Pool Size Scenario 1 2 3 4 5
Arrival Avg. No. Number of Rate Idle processes (req/sec) Processes 20 4.5 10.3 20 4.8 5.3 30 4.8 11.3 30 4.9 6.6 40 4.9 11.4
25 © 2010 D.A. Menasce. All Rights Reserved.
Self-Healing Systems • Automatic mechanisms to: – – – –
Identify failures Identify their root causes Determine how to repair the system Take into account complex dependencies between hardware and software failures
• Tools include: – Logs from various types of monitors – Data aggregation mechanisms – Statistical techniques to determine correlation between events and diagnose faults
26 © 2010 D.A. Menasce. All Rights Reserved.
Self-Protecting Systems • Automatic mechanisms to: – Defend the system against malicious security attacks – Prevent security compromises to occur due to component failures – Predict the onset of security attacks by analyzing events recorded in various types of logs and analyzing correlations that may lead to attacks.
27 © 2010 D.A. Menasce. All Rights Reserved.
Utility Functions U (X)
U (R) 1
1
0
0
Response Time (R)
U (S)
Throughput (X)
0
U (A)
1
0 Low
0
1
Medium
High
Security Level (S)
0
0.95
1
Availability (A)
U g (R, X, A,S) = f (U(R),U(X),U(A),U(S)) © 2010 D.A. Menasce. All Rights Reserved.
28
Architectural Considerations M A P E-K cycle
Autonomic Manager
Analyze
Plan Plan
Monitor KnowledgeExecute Execute Managed Element Autonomic Element 29 © 2010 D.A. Menasce. All Rights Reserved.
Architectural Considerations • Manage internal behavior and relationships with other autonomic elements guided by human-specified policies • Distributed service-oriented architecture is useful to support AC – SOA – Web Services – Grid Computing
• Autonomic Elements (AEs) may need resources from other autonomic elements – Need for robust and secure negotiation protocols for obtaining and releasing resources from other AEs – Need to monitor consumers for not overusing the AE’s resources – Behavior of one AE may depend on behavior of other looselycoupled AEs – Need formal languages (machine and human-readable) to express30 service contracts and SLAs with other AEs © 2010 D.A. Menasce. All Rights Reserved.
Engineering Challenges • How to program AEs? – Need tools to acquire and represent policies – Need tools to translate from high-level to low-level goals
• How to test AEs? – – – –
Repeatability issues Interconnected AEs AE interconnection and binding determined at run-time Difficult to put together controlled test environments
• Installation and configuration issues – Need directory services and brokers that locate AEs based on policies and SLAs
• Monitoring and problem determination – Continuously monitor suppliers to ensure compliance with SLAs and policies – Monitoring of consumers – Need statistical and aggregation techniques to be able to cope with huge amounts of data 31 © 2010 D.A. Menasce. All Rights Reserved.
Engineering Challenges • Relationship with other AEs – Interoperability can be achived through ontologies – Need to assess reliability and trustworthiness of other AEs – Negotiation with other AEs: • • • • •
• • • • •
Demand-for-service FCFS Posted-price Bi-lateral or multi-lateral negotiations over multiple attributes Third-party arbiter running auctions
Provisioning of resources after agreements are achieved Monitoring to check for compliance Security and privacy issues Robustness with respect to erroneous/not feasible policies. Mapping from high-level objectives to low-level ones.
32 © 2010 D.A. Menasce. All Rights Reserved.