Deadlock Detection in Distributed Decentralized Systems

Avoiding Deadlocks in Large Scale Enterprise Applications

Ernest Stambouly

Author Note: Submitted as a standards specification proposal for the pervasive concurrency service of the Model Driven Architecture (MDA) specification at the Object Management Group (OMG), 1998.


Abstract

In a distributed objects universe, concurrency control features for distributed business objects (DBOs) must emerge from the architecture itself rather than from enterprise application programmers, who normally code for concurrency on a case-by-case basis. This paper proposes a platform-independent set of abstractions that supply concurrency control constructs such as locks, guards, and various other synchronization schemes. Most importantly, it proposes a deadlock detection solution that supports the multiple concurrency models pledged by DBO implementations, securing a built-into-the-architecture runtime concurrency service for any distributed and decentralized system. The concurrency service offers application programmers a set of features for synchronizing data as part of an implementation fulfilling the synchronization obligations it has pledged to assume. Using the concurrency service alleviates the programmer's effort in dealing with the pitfalls and complexities of concurrent programming in distributed and decentralized application spaces: the programmer only has to declare the required concurrency model for a given type of DBO to play in a distributed application space spanning multiple types, implementations, and system nodes, and is granted coherent, reliable, and intelligent locking schemes for the detection and avoidance of deadlock embrace.

Keywords: distributed architecture, concurrency, decentralized, deadlock, thread control, model driven architecture (MDA).


Introduction

Designing and programming for concurrency can be very challenging for first-time as well as experienced developers faced with multi-threaded environments. Add to that concurrency in distributed systems, and the problems to solve are amplified by orders of magnitude. In addition, one of our prime requirements for architecting large scale distributed systems is the absence of centralized control and governance (ownership of the policies governing execution and preemptive behavior). The analysis proposed in this paper fulfills the requirements of distributed and decentralized systems.

Our proposed solution is architectural, and this paper will define and qualify what this means. We propose an architecture of a distributed universe in which Distributed Business Objects (DBOs) live. Business object developers comply with this architecture to take advantage of the facilities at their disposal and so reduce the complexity of concurrent programming.

Every concurrency situation has to be analyzed from the perspective of a DBO's needs, not from a desire to "optimize" every possible DBO implementation for thread safety and concurrency. Concurrency does not yield optimization, nor does it imply faster processing. This myth stems from two facts that are often overlooked by concurrency-oriented developers: first, the use of locks and execution "traffic control" in a thread-safe implementation is expensive; second, concurrent processing does not mean parallel processing. The distinction between these two notions is subtle, yet important enough to explain here.

Two threads are said to execute in parallel when they are assigned two separate physical CPUs, and therefore are truly executing at the same time. With multi-processor

hardware, you can really reap the benefits of parallelism, but this brings its own set of challenges, such as synchronizing access to shared resources, that this paper will not delve into. Two threads are said to execute concurrently when they execute the same section of code (e.g., a method) while possibly sharing the same CPU. If they do share the same CPU, the two threads have to timeshare, and the resulting performance is generally worse than if they executed the same non-synchronized section of code sequentially, minus the overhead of timesharing and synchronization code. Of course, this is not always true; a mix of I/O-intensive and computation-intensive threads creates a balance that works in favor of timesharing, and in this case code written for concurrency produces better results than its sequential counterpart even with a single CPU.

When programming for concurrency, you also add a new variable to your execution environment. This variable induces the problems of liveness and safety. Improperly designed synchronization code can lead to thread starvation; one example is the all-too-well-known deadlock embrace problem, which we will focus on in this paper. This is the liveness problem. Safety pertains to consistent object state: arbitrary execution of concurrent threads against improperly protected state variables can leave an object in an inconsistent state. Safety and liveness sit at the two opposite ends of a scale: when you tip one end up, it drags the other down. These are some of the many considerations to weigh when embarking on concurrent design and programming; keeping them in mind helps you make better decisions as to the threading and concurrency fate of your DBOs.
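To make the safety problem concrete, here is a minimal Java sketch (our own illustration, not part of the proposed service) in which two threads update an unprotected counter; lost updates leave the shared state inconsistent:

```java
public class UnsafeCounterDemo {
    static int counter = 0; // shared state with no guard: a safety hazard

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // read-modify-write is not atomic; updates can be lost
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Expected 200000, but interleaved updates typically print less.
        System.out.println("counter = " + counter);
    }
}
```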

Decentralized Concurrency in Large Scale Distributed Infrastructures

Conceptual Description

In this conceptual description, we cover concurrency scenarios for DBOs exchanging requests at a global level, always assuming a synchronous mode of communication as the default. The intent is to explore a common threading behavior that applies to distributed and non-distributed settings alike. The discussion inspects how concurrency affects an architectural metaobject (AMO), with the understanding that metaobjects manage concurrency control on behalf of application-level objects. The particulars of the concurrency of each metaobject are covered in its architectural context—what role and function it plays within the formal architecture specification.

Non-distributed setting

In a typical scenario of sending a request, regardless of the way the collaborating metaobjects are distributed, we can conceptualize a thread starting its execution in the client implementation, making its way through the process space binding (PSB), and finally executing the targeted operation in the implementation of the server¹ object. When the operation execution terminates, the same thread carries the reply back to the client over the same route. This simplistic scenario is depicted in Figure 1.

Figure 1. Send Request Scenario: Client → Client Binding → Server Binding → Server, with a single thread carrying the request out and the reply back.
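As a minimal sketch of Figure 1 (our own illustration; the Operation and ProcessSpaceBinding types are assumptions, not part of the specification), the caller's thread flows through the binding into the server operation and returns carrying the reply:

```java
// Hypothetical types standing in for the architectural metaobjects.
interface Operation {
    Object invoke(Object[] args);
}

final class ProcessSpaceBinding {
    private final Operation serverOperation;

    ProcessSpaceBinding(Operation serverOperation) {
        this.serverOperation = serverOperation;
    }

    Object dispatch(Object[] args) {
        // No thread switch occurs: the client's own thread executes the
        // server operation and then "folds back" with the reply.
        return serverOperation.invoke(args);
    }
}
```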

Distributed setting

In a distributed setting, where the client and server objects are in different processes or on different machines, the client and server communication is distributed. We define a logical thread to be a thread whose execution can span processes or machines. A logical thread is composed of one or more physical threads that extend each other and whose execution is always synchronous with respect to each other. The physical components of a logical thread are 'connected' together by the architectural metaobjects that handle remote communications: PSBs handle synchronization of execution within a process space, and a request message broker (DRB) handles the same across process spaces. With a logical thread, the scenario of Figure 1 above looks exactly the same.

Simultaneous requests

Let's now consider the scenario where two clients make simultaneous requests to the same DBO. Figure 2 depicts this scenario (the thread path for the reply is omitted).
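A minimal sketch of how a logical-thread identity might be reified across the physical threads that compose it (the LogicalThread class and the request-header convention are our assumptions, not part of the specification):

```java
/** Hypothetical sketch: propagating a logical-thread id across the physical
 *  threads that compose it. Spawned threads inherit the id automatically;
 *  cross-process hops carry it in an explicit request header. */
final class LogicalThread {
    private static final InheritableThreadLocal<String> ID = new InheritableThreadLocal<>();

    static String currentId() {
        String id = ID.get();
        if (id == null) { // first hop: this physical thread starts a new logical thread
            id = java.util.UUID.randomUUID().toString();
            ID.set(id);
        }
        return id;
    }

    /** Called by a binding/broker when a request arrives from another process:
     *  the receiving physical thread becomes an extension of the logical thread. */
    static void adopt(String idFromRequestHeader) {
        ID.set(idFromRequestHeader);
    }
}
```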

Figure 2. Simultaneous Requests: two clients, each on its own logical thread, pass through their client bindings and converge on the same server binding and server.

From the illustration in Figure 2, we can make the following observations:

• The client binding always executes on the client thread.

• The server binding can always execute on the same client logical thread.

• Either the client binding or the server binding can impose concurrency control policies on the DBOs they reflect².

The architecture's reflection capability provides the basis for the separation between concurrency control and business-object-level processing: because all inter-DBO communication is handled by architectural metaobjects, the code that implements the business behavior of a DBO can potentially remain free of synchronization constructs, while the metaobjects involved in request delivery provide the appropriate mechanisms for concurrency control. In our simultaneous requests scenario, the binding on the server side can easily synchronize the execution of the simultaneous requests if the implementation that it binds is declared to be non-synchronized.

This form of synchronization can also be extended to the client-side binding: if a particular server implementation is expected to receive a high volume of requests, the client-side binding can limit the flow of request delivery on behalf of its client implementation by suspending the delivery of requests from its end as long as it has a predetermined number of pending messages. To do so, it needs to have available the information about the concurrency model of the server implementation, and it needs to know that the server-side binding it targets is a potential point of contention. This feature is handled with a request flow control scheme that is outside the scope of this paper; a brief sketch of the throttling idea follows below. The same logical thread that carried the request carries the reply back to the client; you can think of the thread as folding back to its point of origin.

Self Calls

An object sending a request to itself follows the same threading behavior as any other client: the thread emanates from the object, goes through the client-side binding to its server-side binding, and back into its implementation. Let's consider the implications of this statement with respect to synchronization.
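A minimal sketch of the client-side throttling idea, assuming a fixed pending-request threshold (the ThrottlingClientBinding, ServerBinding, and Request names are illustrative, not part of the specification):

```java
import java.util.concurrent.Semaphore;

interface Request {}
interface ServerBinding { Object dispatch(Request r); }

final class ThrottlingClientBinding {
    private final Semaphore pendingSlots; // one permit per allowed in-flight request

    ThrottlingClientBinding(int maxPending) {
        this.pendingSlots = new Semaphore(maxPending);
    }

    Object send(Request r, ServerBinding contendedTarget) throws InterruptedException {
        pendingSlots.acquire();                 // suspend delivery once maxPending requests are pending
        try {
            return contendedTarget.dispatch(r); // synchronous delivery on the caller's logical thread
        } finally {
            pendingSlots.release();             // the reply folding back frees a delivery slot
        }
    }
}
```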

Figure 3 illustrates the scenario where a DBO sends a request to itself while another client sends it a simultaneous request.

Figure 3. Self Call versus Client Call: thread T1 arrives from an external client while thread T2 is a self-call from DBO A; both pass through client bindings to DBO A's server binding.

Non-Synchronized Implementation

First, let's consider the situation where the implementation of DBO A is non-synchronized; in other words, it relies on its server-side binding to serialize the execution of incoming messages. If thread T1 reaches the server-side binding before thread T2, it will go through and execute first. If T2 arrives at the server-side binding right after T1 began execution, it will have to wait for T1 to complete (using localized guard mechanisms). But there is something wrong with this situation: if T2 is the thread for a self-call, then T2 must have been executing in the implementation of DBO A before T1 got a chance to execute. We said that the implementation of DBO A is non-synchronized, and therefore T2 could not possibly have gone through the server-side binding's synchronization guard. Therefore, the only way this can happen is if T2 was spawned as a result of T1. In other words, T2 is an extension of T1, and therefore, by the definition of a logical thread, T1 and T2 are the same logical thread. This means that even if the implementation of DBO A spawns a physical thread T2 in order to perform the self-call, T2 is considered an extension of T1. This has two implications:

• T1 has to wait for T2 to execute; T1 cannot proceed while T2 executes.

• Because T2 is the same logical thread as T1, the server-side binding will allow its execution: as far as its synchronization guard is concerned, this is the same logical thread re-entering the implementation of DBO A. The guard allows the re-entrance of a thread because the responsibility of the server-side binding guard is only to make sure that no two distinct logical threads execute concurrently past its point.

In order for the implementation of DBO A to reify the notion of a logical thread, it has to

use the threading facilities provided by the official concurrency service defined by the formal distributed architecture³. The diagram in Figure 4 depicts the correct view of the logical thread of execution in this situation, and a sketch of such a re-entrant guard follows the figure.

Figure 4. Self Call versus Client Call – one logical thread: T1 and T2 form a single logical thread flowing from the client DBO through the client and server bindings into the server DBO.
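A minimal sketch of a server-side binding guard that admits re-entry by the same logical thread (building on the hypothetical LogicalThread helper above; none of these names come from the specification):

```java
/** Guard used by a server-side binding for a non-synchronized implementation:
 *  at most one logical thread past this point, but the owning logical thread
 *  may re-enter (e.g., to perform a self-call). */
final class ReentrantLogicalGuard {
    private String ownerLogicalThread; // null when the guard is free
    private int depth;                 // re-entry depth of the owning logical thread

    synchronized void enter(String logicalThreadId) throws InterruptedException {
        while (ownerLogicalThread != null && !ownerLogicalThread.equals(logicalThreadId)) {
            wait(); // a different logical thread holds the guard: suspend here
        }
        ownerLogicalThread = logicalThreadId;
        depth++; // the same logical thread passes straight through
    }

    synchronized void exit(String logicalThreadId) {
        if (logicalThreadId.equals(ownerLogicalThread) && --depth == 0) {
            ownerLogicalThread = null;
            notifyAll(); // wake logical threads suspended at enter()
        }
    }
}
```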

The second scenario, where T2 reaches the server-side binding guard before T1 does, is trivial: in this case, T1 will simply wait for T2 to complete, regardless of whether T2 makes self-calls or not.

Synchronized Implementation

The second situation is when the implementation of DBO A is synchronized: the server-side binding does not control its concurrency, but allows the implementation itself to handle and synchronize its concurrent threads of execution. Let's imagine that multiple threads are concurrently executing within the implementation of DBO A. Any one of these threads can perform a self-call, and from the server-side binding's perspective, it will not matter

because it does not serialize or guard the execution of concurrent threads for a synchronized implementation. So self-calls in fully synchronized implementations do not present the problems that exist with non-synchronized implementations.

Deadlock Detection

A common situation in which two DBOs can starve in a deadlock embrace is when the following occurs:

• they both have references to each other (a two-way association);

• they both send a request to each other using their own separate threads;

• the implementation of both DBOs is non-synchronized, and therefore the thread used to send the request is the only one allowed to run. (This does not mean that a deadlock cannot occur if the implementation concurrency model is fully synchronized.)

The diagram in Figure 5 illustrates this situation; the sketch below shows the same embrace in plain Java monitors.
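A minimal, self-contained demonstration of the embrace (our own illustration, using each object's monitor in place of the execution guard of a non-synchronized implementation):

```java
/** Two DBOs with a two-way association send each other a request on
 *  separate threads; each object's monitor plays the role of its
 *  non-synchronized implementation's execution guard. */
class Dbo {
    private Dbo peer;

    void setPeer(Dbo peer) { this.peer = peer; }

    synchronized void request() { // acquire this DBO's guard
        try { Thread.sleep(50); } catch (InterruptedException ignored) {}
        peer.reply();             // now needs the peer's guard
    }

    synchronized void reply() { } // executes under this DBO's guard
}

public class DeadlockDemo {
    public static void main(String[] args) {
        Dbo a = new Dbo(), b = new Dbo();
        a.setPeer(b);
        b.setPeer(a);
        new Thread(a::request, "T1").start(); // T1 holds A's guard, waits for B's
        new Thread(b::request, "T2").start(); // T2 holds B's guard, waits for A's
        // Both threads block forever: a deadlock embrace.
    }
}
```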

Figure 5. Two-DBO Deadlock: thread T1 acquires DBO A's execution guard at G-time t and blocks on DBO B's server-side binding, while thread T2 acquires DBO B's execution guard at G-time t' and blocks on DBO A's server-side binding.

As in the previous thread diagrams, the blue wavy lines represent threads of execution, the diamond terminators represent a thread blocking on a guard, and the circles represent metaobjects (bindings in dashed outline). G-times t and t' are the timestamps at which threads T1 and T2 acquired their respective guards.

In Figure 5, thread T1 is executing in DBO A and sends a request to DBO B; T1 acquired DBO A's execution guard at time t. Around the same time, thread T2 is executing in DBO B and sends a request to DBO A; T2 acquired DBO B's execution guard at time t'. We said that the implementations of both DBOs are non-synchronized, so each guard allows only one thread at a time. When T1 reaches B's server binding, it suspends on B's execution guard because T2 is currently executing. The same happens to T2 when it reaches A's server binding. Both threads are deadlocked. This situation cannot be prevented, but it can be detected.

How can this deadlock situation be detected? We can start by asking: "Can one of the threads find out whether the other thread is suspended, and what target it is waiting for?" T1, for instance, could follow the logical thread T2 and find out that it is suspended on the very DBO binding that T1 is currently handling. This could be done using the data encoded in the metaobjects associated with the requests themselves. However, it cannot be done without incurring a remote call if the two DBOs are not co-located. The next best thing is for a suspended thread to find out: "What is the target of the thread that it is waiting on?"

We can envisage a dialogue between the two threads: "What is your target?" T1 asks T2; "DBO A," replies T2. T2 can easily answer this question because the information is available to it from its metaobject data. Now that T1 knows that T2 is targeting the very object that T1 itself is handling, it needs to figure out what T2 is currently doing. T2 can be in one of two situations:

1. T2 is currently suspended waiting for A's guard, so it is in the same situation that T1 is in.

2. T2 has finished its request execution on A and is on its way back. This means that T2 acquired A's guard, executed, and relinquished the guard even before T1 got to A.

T1 can detect the first condition, but it cannot detect the second one. For the first condition, T1 can rely on the guard timestamps to determine the condition of T2, as described in expression E1:

(E1) If t' > t, then T2 is blocked on A's guard for sure.

This means that if T2 acquired B's guard after T1 acquired A's guard, then T2 could not possibly have gone through A's guard. In this case T1 can throw an exception and retreat (recovery measures can be taken, but that is a different topic and will not be covered here), T2 proceeds, and the deadlock is resolved. If E1 does not hold, then its inverse expression E2 holds:

(E2) If t' < t, then T2 may already have gone through A's guard, and T1 cannot tell whether it is deadlocked.
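A minimal sketch of the E1 rule in code (our own illustration: a logical G-time clock, a guard that records its owner's acquisition time, and a deadlock check performed while waiting; all names are assumptions, not part of the specification):

```java
import java.util.concurrent.atomic.AtomicLong;

class DeadlockDetectedException extends Exception {
    DeadlockDetectedException(String message) { super(message); }
}

final class TimestampedGuard {
    private static final AtomicLong G_CLOCK = new AtomicLong(); // global logical "G-time"

    private String ownerLogicalThread; // logical thread currently holding this guard
    private long acquiredAt;           // G-time at which the owner acquired it

    /** heldGuardTime is the G-time (t) at which the caller acquired the guard it
     *  already holds, or null if it holds none. The caller is assumed to have
     *  learned, via the metaobject "dialogue", that this guard's owner is
     *  targeting the caller's own DBO. */
    synchronized long enter(String logicalThreadId, Long heldGuardTime)
            throws InterruptedException, DeadlockDetectedException {
        while (ownerLogicalThread != null && !ownerLogicalThread.equals(logicalThreadId)) {
            // E1: the owner acquired this guard (t') after we acquired ours (t),
            // so it cannot have passed through our guard: it is blocked on it.
            if (heldGuardTime != null && acquiredAt > heldGuardTime) {
                throw new DeadlockDetectedException("E1 holds (t' > t): retreat and recover");
            }
            wait(100); // E2 (t' < t): undecidable from here; keep waiting and re-check
        }
        ownerLogicalThread = logicalThreadId;
        acquiredAt = G_CLOCK.incrementAndGet();
        return acquiredAt; // the caller passes this as heldGuardTime downstream
    }

    synchronized void exit(String logicalThreadId) {
        if (logicalThreadId.equals(ownerLogicalThread)) {
            ownerLogicalThread = null;
            notifyAll();
        }
    }
}
```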
