ReServE Service: An Approach To Increase Reliability in Service Oriented Systems
A. D. Danilecki, M. Holenko, A. Kobusinska, M. Szychowiak, P. Zierhoffer
{adanilecki,akobusinska}@cs.put.poznan.pl
Poznan University of Technology
September 13, 2011
IT-SOA project
The IT-SOA project aims:
Self-healing
Manageable
Easy monitoring
Virtualization
Dependable
The research presented was partially supported by the European Union in the scope of the European Regional Development Fund program no. POIG.01.03.01-00-008/08.
Danilecki, Holenko, Kobusinska ...
ReServE: An Approach to Increase Reliability in SOA systems
[1/27]
The IT-SOA toolkits from Poznan University of Technology
DyMST: Dynamic Management SOA Toolkit: Obligations, Restrictions, Capabilities, Audit
M3: Metrics, Monitoring, Management
Failure Detection Service Module
ReSP: Reliable SOA Platform
Replication service
Reliable Service Environment (ReServE)
AtomicRMI
RESTgroups
Service-Oriented ad-hoc systems support
Transaction support
The Problem
In any distributed environment, failures are inevitable
Existing fault-tolerance techniques come at a high cost
  Replication: improves availability
  Transactions: highly restrictive; may be withdrawn (rolled back) even on transient failures
Different organizations have different fault-tolerance requirements
The Constraints: solution should...
Respect the administrative independence of the different organizations
Impose as few requirements on services as possible
Allow the use of WS-*, RESTful, ...
Not interfere with the application level
Be transparent to the applications
Be an automatic solution (allowing for separation of business logic and fault tolerance)
What we've done
Reliability is provided by external logging web services
An organization's web services must follow a few requirements and implement several methods available via a standard interface
Each organization may use the provided proxy services, or implement their functionality on its own
Each organization may implement its own, independent reliability policy, which is complemented by the possibilities offered by the ReServE environment
May be used standalone or with the rest of the ReSP toolkit
Support for RESTful services (support for WS-* possible)
The ReServE services may be offered by many different organizations
Assumptions
The services are piecewise deterministic:
  Starting from some service state Sx, repeating the same sequence of requests (in the same order) always results in the same state Sy.
  As a consequence, the response generated by a service depends only on its state and the request.
The clients are piecewise deterministic:
  Client execution is always the same, given the same responses
But the important thing is...
Clients and Services may obey "reasonable" restrictions and requirements
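The piecewise-determinism assumption can be illustrated with a toy example. This is a minimal sketch, not part of ReServE: `KeyValueService`, `replay`, and the request tuples are hypothetical names introduced only for illustration. Two service instances that start from the same state and receive the same requests in the same order end up in the same state and return the same responses.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueService:
    """Toy service (hypothetical): its state changes only through requests."""
    state: dict = field(default_factory=dict)

    def handle(self, method, key, value=None):
        # The response depends only on the current state and the request.
        if method == "PUT":
            self.state[key] = value
        return self.state.get(key)


def replay(service, requests):
    """Apply the requests in order and collect the responses."""
    return [service.handle(*request) for request in requests]


requests = [("PUT", "x", 2), ("GET", "x"), ("PUT", "x", 3), ("GET", "x")]

original = KeyValueService()
first_run = replay(original, requests)

recovered = KeyValueService()      # same starting state Sx (empty)
second_run = replay(recovered, requests)

print(first_run == second_run)             # same responses
print(original.state == recovered.state)   # same final state Sy
```

A service that read the clock or a random number inside `handle` would break this property, which is why the assumption has to be stated explicitly.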
Normal execution
[Figure: timeline (t) of messages exchanged between client Ci, ReServE, and service Si]
Service's fault
[Figure: timeline (t) of messages exchanged between client Ci, ReServE, and service Si; the service crashes (CRASH)]
Client's fault
[Figure: timeline (t) of messages exchanged between client Ci, ReServE, and service Si; the client crashes (CRASH)]
General Architecture
Problems to be solved
Message losses
Detecting duplicates
What to log?
When to start recovery?
Where to start the recovery from?
How to find out the request execution order?
What to do with service dependencies?
How to ensure that service state changes seen by the client are not lost?
The answers
Message losses
  Messages are retransmitted until a response or acknowledgement is received
Detecting duplicates
  A message identifier must be attached
What to log?
  Taking advantage of HTTP semantics: GET requests need not be logged
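The three answers above (retransmission, message identifiers, and skipping GET in the log) can be sketched together. This is a minimal illustration under stated assumptions, not the ReServE implementation: `LoggingProxy`, `send_with_retry`, the in-memory log, and the choice to cache responses by message identifier are all hypothetical.

```python
import uuid


class LoggingProxy:
    """Hypothetical stand-in for a logging service in front of a service."""

    def __init__(self, service):
        self.service = service   # callable: (method, path, body) -> response
        self.log = []            # logged state-changing requests
        self.responses = {}      # message id -> saved response (assumption)

    def handle(self, msg_id, method, path, body=None):
        # Duplicate: already executed, so return the saved response
        # instead of executing the request a second time.
        if msg_id in self.responses:
            return self.responses[msg_id]
        # GET is a safe method in HTTP, so it is not logged here.
        if method != "GET":
            self.log.append((msg_id, method, path, body))
        response = self.service(method, path, body)
        self.responses[msg_id] = response
        return response


def send_with_retry(channel, method, path, body=None, max_attempts=5):
    """Retransmit the same message (same identifier) until a response arrives."""
    msg_id = str(uuid.uuid4())
    for _ in range(max_attempts):
        try:
            return channel(msg_id, method, path, body)
        except TimeoutError:
            continue             # request or response lost: send again
    raise TimeoutError(f"no response after {max_attempts} attempts")


# Usage: a channel that loses the first response on its way back.
store, calls, dropped = {}, [], []


def service(method, path, body):
    calls.append((method, path))
    if method == "PUT":
        store[path] = body
    return store.get(path)


proxy = LoggingProxy(service)


def lossy_channel(msg_id, method, path, body):
    response = proxy.handle(msg_id, method, path, body)
    if not dropped:
        dropped.append(msg_id)
        raise TimeoutError
    return response


put_result = send_with_retry(lossy_channel, "PUT", "/item", "v1")
calls_after_put = list(calls)
get_result = send_with_retry(lossy_channel, "GET", "/item")
```

Because the retransmitted PUT carries the same identifier, the service executes it once even though the client sent it twice.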
The answers (cont)
When to start recovery?
  Currently, recovery is started by hand; we are testing cooperation with the FADE failure detector service, using its callback capabilities
Where to start the recovery from?
  The Service Provider must expose information about the last request processed
The answers (cont)
How to find out the request execution order?
  The Service Provider must attach ordering information to the responses, reflecting the execution order
What to do with service dependencies?
  Postpone some outgoing requests; we must be careful here to avoid deadlocks
How to ensure that service state changes seen by the client are not lost?
  Postpone some of the service's responses when necessary
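The two requirements placed on the Service Provider (attach ordering information to responses, and expose the last request processed) might look as follows. This is a sketch under assumptions: `OrderedService`, the `seq` field, and `last_request_id` are hypothetical names, and the slides do not specify how the ordering information is encoded.

```python
import itertools
import threading


class OrderedService:
    """Hypothetical wrapper that numbers requests in execution order."""

    def __init__(self, handler):
        self.handler = handler              # callable: request -> response body
        self._counter = itertools.count(1)
        self._lock = threading.Lock()
        self.last_request_id = None         # exposed for the recovery procedure

    def execute(self, request_id, request):
        # Holding the lock while executing makes the sequence number
        # reflect the real execution order.
        with self._lock:
            body = self.handler(request)
            seq = next(self._counter)
            self.last_request_id = request_id
        return {"request_id": request_id, "seq": seq, "body": body}


service = OrderedService(lambda request: request.upper())
r1 = service.execute("b", "put b")   # executed first
r2 = service.execute("a", "put a")   # executed second
```

Serializing execution with a single lock is the simplest way to make the numbering trustworthy; a service that executes requests concurrently would need a finer-grained scheme.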
Failure-Free run
[Figure: failure-free run. Clients Ci and Cj send requests a, b, and c through ReServE to service Sj; the responses carry ordering information (b:1, a:2). Labels on the service side: "checkpoint, last request b" and "last request a".]
The recovery after the failure 1
[Figure: ReServE holds the logged requests a, b, e:5, d:1, c:2, f:3 (labels: ordered queue, unordered queue) and asks service Sj: "List of your recovery points, pretty please". Sj answers C1: d, C2: e. Labels on the service side: "checkpoint, last request d" and "standby replica, last request e".]
The recovery after the failure 2
[Figure: ReServE holds the logged requests a, b, e:5, d:1, c:2, f:3. It tells Sj "Withdraw to C1" and Sj answers "Done". ReServE then resends c and f (Sj: "Done"), followed by a, b, e (Sj: "Done").]
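The replay order in the two recovery slides can be reproduced with a small sketch. The rule used here is an assumption made only to match the order shown on the slides (c, f first, then a, b, e); the source does not state it. Requests whose sequence numbers form an unbroken run after the recovery point are replayed in order, and every other request not covered by the recovery point is sent afterwards. `replay_plan` and its parameters are hypothetical names.

```python
def replay_plan(log, recovery_seq):
    """Split logged requests into an ordered batch and a remaining batch.

    log: list of (request_id, seq) pairs in arrival order; seq is None when
         the logger never learned the execution order of that request.
    recovery_seq: sequence number of the last request covered by the
         recovery point the service was withdrawn to.
    """
    by_seq = {seq: rid for rid, seq in log if seq is not None}
    ordered = []
    next_seq = recovery_seq + 1
    # Follow the unbroken run of sequence numbers after the recovery point.
    while next_seq in by_seq:
        ordered.append(by_seq[next_seq])
        next_seq += 1
    replayed = set(ordered)
    # Everything else that the recovery point does not already cover.
    remaining = [rid for rid, seq in log
                 if rid not in replayed and (seq is None or seq > recovery_seq)]
    return ordered, remaining


# Log contents from the slides; recovery point C1 covers request d (d:1).
log = [("a", None), ("b", None), ("e", 5), ("d", 1), ("c", 2), ("f", 3)]
ordered, remaining = replay_plan(log, recovery_seq=1)
print(ordered)     # ['c', 'f']
print(remaining)   # ['a', 'b', 'e']
```

Request d is not replayed because the recovery point already includes it.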
Distributed Architecture
[Figure: distributed rollback-recovery service. Components shown: Client A, Client B, Client C; Service A, Service B, Service C, Service D; RMU (three instances); Service Repository (two instances).]
The experiments
15 workstations connected by a Gigabit Ethernet network, with:
  64-bit OpenSuse 11.3 (Linux 2.6.34.8-0.2-desktop-x86_64)
  8GB RAM
  Gigabit 82567LM-3 card
  Core2 Quad Q9650 3.00GHz CPU
  Barracuda 7200.12 SATA 3Gb/s 500 GB HDD
Parameters (number of clients and requests) were tuned so as not to saturate the network
Read only requests
[Figure: ReServE overhead during failure-free runs (GET). Average response time [ms] (axis 0 to 4000) versus number of clients (20 to 200). Series: GET; GET ReServE.]
Small PUT requests
[Figure: ReServE overhead during failure-free runs (PUT). Average response time [ms] (axis 0 to 4000) versus number of clients (20 to 200). Series: PUT 4kB; PUT 4kB ReServE; PUT 32kB; PUT 32kB ReServE.]
PUT requests (cont)
[Figure: ReServE overhead during failure-free runs (PUT). Average response time [ms] (axis 0 to 8000) versus number of clients (20 to 200). Series: PUT 4kB; PUT 4kB ReServE; PUT 32kB; PUT 32kB ReServE; PUT 128kB; PUT 128kB ReServE.]
PUT requests (cont)
[Figure: ReServE overhead during failure-free runs (PUT). Average response time [ms] (axis 0 to 8000) versus number of clients (20 to 200). Series: PUT 32kB ReServE; PUT 32kB ReServE 6services.]
Recovery time
[Figure: Service recovery time. Total recovery time [ms] (axis 0 to 1000) versus number of requests in queue (2000 to 10000). Series: No resource groups; Low granularity resource groups; Low granularity resource groups+semantics; High granularity resource groups.]
Remaining problems and the future work
Performance is still not satisfactory
Better SOAP integration
Security problems
Browser clients
What to do when piecewise determinism is almost true
Integration with monitoring and audit tools from the DyMST package
Testing with real services and applications
Key points
ReServE: outsourcing reliability, using logging and external web services
Web services using ReServE must follow a few requirements and implement several methods available via a standard interface
Each organization may use the provided proxy services, or implement their functionality on its own
Each organization may implement its own, independent reliability policy, which is complemented by the possibilities offered by the ReServE environment
May be used as a standalone tool or with the rest of the ReSP toolkit
Support for RESTful services (WS-* support possible)
The ReServE services may be offered by many different organizations
Thank you!