An Approach To Increase Reliability in Service Oriented Systems

ReServE Service: An Approach To Increase Reliability in Service Oriented Systems
A. D. Danilecki, M. Holenko, A. Kobusinska, M. Szychowiak, P. Zierhoffer
{adanilecki,akobusinska}@cs.put.poznan.pl
Poznan University of Technology

September 13, 2011

IT-SOA project

The IT-SOA project aims:
- Self-healing
- Manageable
- Easy monitoring
- Virtualization
- Dependable

The research presented was partially supported by the European Union in the scope of the European Regional Development Fund program no. POIG.01.03.01-00-008/08.

Danilecki, Holenko, Kobusinska ...

ReServE: An Approach to Increase Reliability in SOA systems

[1/27]

The IT-SOA toolkits from Poznan University of Technology
- DyMST: Dynamic Management SOA Toolkit
  - Obligations, Restrictions, Capabilities, Audit
- M3: Metrics, Monitoring, Management
  - Failure Detection Service Module
- ReSP: Reliable SOA Platform
  - Replication service
  - Reliable Service Environment (ReServE)
  - AtomicRMI
  - RESTgroups
  - Service-Oriented ad-hoc systems support
  - Transaction support



[2/27]

The Problem

- In any distributed environment, failures are inevitable
- Existing fault-tolerance techniques come at a high cost
  - Replication: improves availability
  - Transactions: highly restrictive; may be withdrawn even on transient failures
- Different organizations have different fault-tolerance requirements



[3/27]

The Constraints: solution should...

- Respect the administrative independence of the different organizations
- Impose as few requirements on services as possible
- Allow the use of WS-*, RESTful, ... services
- Not interfere with the application level
- Be transparent to the applications
- Be automatic (allowing for separation of business logic and fault tolerance)



[4/27]

What we’ve done
- Reliability is provided by external logging web services
- An organization's web services must meet a few requirements and implement several methods available via a standard interface
- Each organization may use the provided proxy services or implement their functionality on its own
- Each organization may implement its own, independent reliability policy, complemented by the possibilities offered by the ReServE environment
- May be used standalone or with the rest of the ReSP toolkit
- Support for RESTful services (support for WS-* possible)
- The ReServE services may be offered by many different organizations


[5/27]

Assumptions
- The services are piecewise deterministic: starting from some service state Sx, repeating the same sequence of requests (in the same order) always results in the same state Sy. As a consequence, the response generated by a service depends only on its state and the request.
- The clients are piecewise deterministic: client execution is always the same, given the same responses.

But the important thing is...
Clients and services may obey "reasonable" restrictions and requirements
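The piecewise-determinism assumption can be illustrated with a minimal sketch. The `KeyValueService` class and the request tuples are hypothetical and not part of ReServE; the point is only that replaying the same request sequence from the same initial state yields the same final state and the same responses.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueService:
    """Hypothetical service whose state changes only through requests."""
    state: dict = field(default_factory=dict)

    def handle(self, request):
        method, key, value = request
        if method == "PUT":
            self.state[key] = value
            return ("OK", key)
        # GET reads the state without modifying it
        return ("VALUE", self.state.get(key))


def replay(requests, initial_state=None):
    """Start from initial_state (Sx), apply requests in order, return (Sy, responses)."""
    service = KeyValueService(dict(initial_state or {}))
    responses = [service.handle(r) for r in requests]
    return service.state, responses


log = [("PUT", "x", 1), ("GET", "x", None), ("PUT", "x", 2)]

# Two independent replays of the same log agree on state and responses.
assert replay(log) == replay(log)
```

A service that consulted the wall clock or a random source inside `handle` would break this property, which is why such behaviour falls outside the assumption.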


[6/27]

Normal execution

[Figure: time diagram of a normal execution involving client Ci, ReServE, and service Si]



[7/27]

Service’s fault

[Figure: time diagram of a service fault involving client Ci, ReServE, and service Si; the crash is marked CRASH]


[8/27]

Client’s fault

[Figure: time diagram of a client fault involving client Ci, ReServE, and service Si; the crash is marked CRASH]



[9/27]

General Architecture



[10/27]

Problems to be solved

- Message losses
- Detecting duplicates
- What to log?
- When to start recovery?
- From where to start the recovery?
- How to find out about request execution order?
- What to do with service dependencies?
- How to ensure that service state changes seen by the client are not lost?



[11/27]

The answers
- Message losses: messages are retransmitted until a response or acknowledgement is received
- Detecting duplicates: a message identifier must be attached
- What to log? Taking advantage of HTTP semantics: GET requests need not be logged
- When to start recovery?
- From where to start the recovery?
- How to find out about request execution order?
- What to do with service dependencies?
- How to ensure that service state changes seen by the client are not lost?
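The first three answers can be sketched together. Everything named here (`LossyChannel`, `DedupService`, `send_reliably`, the message fields) is a hypothetical illustration rather than the ReServE interface, and the retry loop is bounded only so that the sketch always terminates.

```python
import uuid


class DedupService:
    """Hypothetical service that detects duplicates by message identifier."""

    def __init__(self):
        self.responses = {}  # message id -> response already produced
        self.executions = 0

    def handle(self, message):
        msg_id = message["id"]
        if msg_id in self.responses:
            # Duplicate: return the saved response, do not execute again
            return self.responses[msg_id]
        self.executions += 1
        response = {"id": msg_id, "status": "done"}
        self.responses[msg_id] = response
        return response


class LossyChannel:
    """Hypothetical channel that loses the first `lost_responses` responses."""

    def __init__(self, service, lost_responses):
        self.service = service
        self.lost_responses = lost_responses

    def send(self, message):
        response = self.service.handle(message)
        if self.lost_responses > 0:
            self.lost_responses -= 1
            return None  # the response never reaches the sender
        return response


def send_reliably(channel, log, method, body, max_attempts=5):
    """Log the request (unless it is a GET) and retransmit until answered."""
    message = {"id": str(uuid.uuid4()), "method": method, "body": body}
    if method != "GET":
        log.append(message)  # GET requests are not logged
    for _ in range(max_attempts):
        response = channel.send(message)
        if response is not None:
            return response
    raise TimeoutError("no response received")
```

Because every retransmission carries the same identifier, the service executes the request once even when its earlier responses were lost.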


[12/27]

The answers
- Message losses
- Detecting duplicates
- What to log?
- When to start recovery? Currently, recovery is started by hand; we are testing cooperation with the FADE failure detector service using its callback capabilities
- From where to start the recovery? The service provider must expose information about the last request processed
- How to find out about request execution order?
- What to do with service dependencies?
- How to ensure that service state changes seen by the client are not lost?
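One way to picture how the "last request processed" information could be used is sketched below. The function name and the log layout are assumptions made for illustration: the log is taken to be a list of entries in execution order, each with an `id` field.

```python
def requests_to_replay(log, last_processed_id):
    """Return the logged requests that follow the last one the service processed.

    `log` is assumed to be in execution order; `last_processed_id` is the
    identifier the (hypothetical) service reports after it has recovered.
    """
    if last_processed_id is None:
        # The service remembers nothing: replay the whole log
        return list(log)
    ids = [entry["id"] for entry in log]
    position = ids.index(last_processed_id)  # raises ValueError if unknown
    return log[position + 1:]


log = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
pending = requests_to_replay(log, "b")
```

Here `pending` contains only request `c`, so work already reflected in the recovered state is not repeated.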


[13/27]

The answers
- Message losses
- Detecting duplicates
- What to log?
- When to start recovery?
- From where to start the recovery?
- How to find out about request execution order? The service provider must attach ordering information to the responses, reflecting the execution order
- What to do with service dependencies? Postpone some outgoing requests; we must be careful here to avoid deadlocks
- How to ensure that service state changes seen by the client are not lost? Postpone some of the service's responses when necessary
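The role of the ordering information can be sketched as follows. The `order` field and the `replay_plan` function are hypothetical: a logged request whose response was seen carries the number the service attached to that response, and a request with no known response has `order` set to `None`.

```python
def replay_plan(log):
    """Split logged requests into an ordered part and an unordered part.

    Requests with ordering information are sorted by it, so they can be
    replayed in the original execution order; the remaining requests have
    no known order.
    """
    ordered = sorted(
        (entry for entry in log if entry.get("order") is not None),
        key=lambda entry: entry["order"],
    )
    unordered = [entry for entry in log if entry.get("order") is None]
    return ordered, unordered


log = [
    {"id": "a", "order": None},
    {"id": "d", "order": 1},
    {"id": "f", "order": 3},
    {"id": "c", "order": 2},
]
ordered, unordered = replay_plan(log)
```

With this log, the ordered part is d, c, f and the unordered part is a. How the unordered requests are resubmitted is a policy decision that this sketch leaves open.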


[14/27]

Failure-Free run

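[Figure: failure-free run. Clients Ci and Cj exchange requests a, b, and c with service Sj through ReServE; responses carry ordering labels (b:1, a:2). Sj is annotated with "checkpoint, last request b" and "last request a".]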

[15/27]

The recovery after the failure 1

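[Figure: recovery, step 1. ReServE holds requests a, b, e:5, d:1, c:2, f:3 in an ordered queue and an unordered queue, and asks Sj: "List of your recovery points, pretty please". Sj answers with C1: d (checkpoint, last request d) and C2: e (standby replica, last request e).]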

[16/27]

The recovery after the failure 2

[Figure: recovery, step 2. ReServE (holding a, b, e:5, d:1, c:2, f:3) and Sj exchange the following messages: Withdraw to C1, Done, c, f, Done, a,b,e, Done.]


[17/27]

Distributed Architecture

[Figure: Clients A, B, and C and Services A, B, C, and D connected through a distributed rollback-recovery service; the diagram shows three RMUs and two Service Repositories.]



[18/27]

The experiments

- 15 workstations connected by a Gigabit Ethernet network
- 64-bit OpenSuse 11.3 (Linux 2.6.34.8-0.2-desktop-x86_64)
- 8 GB RAM, Gigabit 82567LM-3 card, Core2 Quad Q9650 3.00 GHz CPU, Barracuda 7200.12 SATA 3Gb/s 500 GB HDD
- Parameters (number of clients and requests) were tuned so as not to saturate the network



[19/27]

Read only requests

[Figure: ReServE overhead during failure-free runs (GET). Average response time [ms] (0 to 4000) versus number of clients (20 to 200); series: GET, GET ReServE.]



[20/27]

Small PUT requests

[Figure: ReServE overhead during failure-free runs (PUT). Vertical axis 0 to 4000; horizontal axis: number of clients (20 to 200); series: PUT 4kB, PUT 4kB ReServE, PUT 32kB, PUT 32kB ReServE.]



[21/27]

PUT requests (cont)

[Figure: ReServE overhead during failure-free runs (PUT). Vertical axis 0 to 8000; horizontal axis: number of clients (20 to 200); series: PUT 4kB, PUT 4kB ReServE, PUT 32kB, PUT 32kB ReServE, PUT 128kB, PUT 128kB ReServE.]



[22/27]

PUT requests (cont)

[Figure: ReServE overhead during failure-free runs (PUT). Vertical axis 0 to 8000; horizontal axis: number of clients (20 to 200); series: PUT 32kB ReServE, PUT 32kB ReServE 6services.]



[23/27]

Recovery time

[Figure: Service recovery time. Total recovery time [ms] (0 to 1000) versus number of requests in queue (2000 to 10000); series: No resource groups, Low granularity resource groups, Low granularity resource groups+semantics, High granularity resource groups.]



[24/27]

Remaining problems and the future work

- Performance is not yet satisfactory
- Better SOAP integration
- Security problems
- Browser clients
- What to do when piecewise determinism is only almost true
- Integration with monitoring and audit tools from the DyMST package
- Testing with real services and applications



[25/27]

Key points
- ReServE: outsourcing reliability, using logging and external web services
- Web services using ReServE must meet a few requirements and implement several methods available via a standard interface
- Each organization may use the provided proxy services or implement their functionality on its own
- Each organization may implement its own, independent reliability policy, complemented by the possibilities offered by the ReServE environment
- May be used as a standalone tool or with the rest of the ReSP toolkit
- Support for RESTful services (WS-* support possible)
- The ReServE services may be offered by many different organizations


[26/27]

Thank you!
