Capability-Based Coordination for Open Distributed Systems


Nur Izura Udzir

Submitted for the degree of Doctor of Philosophy

Department of Computer Science

July 2006

Abstract

The tuple-space based model, also known as Linda, offers an alternative to the conventional point-to-point communication framework for coordinating and synchronising agents' activities. The shared data space provides a medium for communication and facilitates coordination among the interacting agents. The clear separation between coordination and computation concerns relieves the agents of the messy aspects of communication, leaving them free to devote their time and space to more crucial aspects of computation. Linda is also distinguished by its temporal and spatial separation properties, as well as its independence from any computation language or machine architecture: essential properties for coordination in open systems.

It is useful, even important, to have some form of control mechanism for coordinating agents in open distributed systems. As open systems need to be scalable, capabilities may provide the best-fit solution to the problems caused by the loosely controlled coordination of Linda-like systems. Acting as ‘tickets’, capabilities can be given to chosen agents, granting them different privileges over different kinds of data, and thus giving the system finer control over objects' visibility to agents. This thesis investigates a capability system based on the tuple-space coordination paradigm, in order to provide more finely controlled features whilst maintaining the flexibility of the tuple-space model.

One drawback of capabilities is that they can only be applied to named objects, which is not universally applicable in Linda since, unlike tuple-spaces, tuples are nameless. To overcome this problem, the thesis introduces the novel concept of multicapabilities, which generalise capabilities to collections of objects, demonstrating how the advantages of capabilities can be extended to tuples. Multicapabilities enable some applications that are not feasible (or even possible) in standard Linda to be implemented more efficiently: tuples can now be garbage collected; the deadlock breaking mechanism can be refined; and there is the potential for data to be efficiently cached, for more optimised resource management. As capabilities and multicapabilities provide finer control by controlling objects' visibility to agents, they also provide a means of facilitating private communication in the otherwise ‘public’ broadcast communication model. Capabilities can be combined to produce capability-valued expressions, another concept introduced in this thesis, for richer control in coordinating agents in open distributed systems.

Contents

1 Introduction
  1.1 Tuple-Space Based Coordination
  1.2 Capability-Based Coordination: A Motivation
  1.3 Contributions
  1.4 Thesis Organisation
2 Literature Review
  2.1 Coordination in Distributed Systems
  2.2 The Linda Model
    2.2.1 Linda-derived models
  2.3 Capability-Based Control
    2.3.1 Capabilities in TS-based models
  2.4 Summary
3 Why Capability-Based Systems?
  3.1 Loose Control in TS-Based Systems
  3.2 Security Issues
  3.3 Why Capabilities?
    3.3.1 Capabilities versus access control lists
  3.4 Why Capability-Based Coordination?
  3.5 Summary
4 Lindacap: The Capability-Based System
  4.1 The Capability Model
  4.2 Methods of Capabilities
  4.3 Capability-Valued Expressions
  4.4 Multicapabilities
    4.4.1 Multicapability vs. Tuple-Spaces vs. Scope
  4.5 Summary
5 Abstract Description of Lindacap
  5.1 Unicapabilities
    5.1.1 Basic structure
    5.1.2 Operations
    5.1.3 Derivations of unicapabilities
  5.2 Multicapabilities
    5.2.1 Basic structure
    5.2.2 Operations
    5.2.3 Expressions
    5.2.4 Derivations of multicapabilities
  5.3 Capability-Valued Expressions
    5.3.1 Sum
    5.3.2 Subtraction
    5.3.3 Product
  5.4 Summary
6 Implementation
  6.1 Implementation Overview
    6.1.1 PyLinda
    6.1.2 Lindacap
  6.2 Implementation Phases of Lindacap
  6.3 Phase 1: Implementing Unicapabilities
    6.3.1 Creating a tuple-space (and unicapability) object
    6.3.2 Descriptions of the primitives
    6.3.3 Using a TS capability
  6.4 Phase 2: Implementing Multicapabilities
    6.4.1 Creating a multicapability object
    6.4.2 Descriptions of the primitives
    6.4.3 Using a multicapability
    6.4.4 Storing and retrieving tuples
    6.4.5 Passing capabilities
  6.5 Input/Output Operations in Lindacap
    6.5.1 Implementation issues
    6.5.2 Descriptions of the input/output primitives
  6.6 Phase 3: Implementing Capability-Valued Expressions
    6.6.1 Descriptions of the primitives
  6.7 Phase 4: Implementing Resource Management Applications
  6.8 Summary
7 Applications
  7.1 Garbage Collection of Tuples
    7.1.1 Implementation
    7.1.2 Keeping track of capabilities
  7.2 Deadlock Breaking
    7.2.1 Implementation
    7.2.2 Example: Stable marriages
  7.3 Private Channel
    7.3.1 Setting up private channels using capabilities
    7.3.2 Example: Secure bidding
  7.4 Replication and Caching
  7.5 Summary
8 Evaluation
  8.1 Experimental Plan
  8.2 Input/Output Operations
    8.2.1 Writing tuples
    8.2.2 Reading a tuple
    8.2.3 Input/output using capability-valued expressions
  8.3 Storage
    8.3.1 Storage overhead
    8.3.2 Memory exhaustion: Garbage collection of tuples
  8.4 Scalability
    8.4.1 Multicapability creations
    8.4.2 Basic input/output operations
    8.4.3 Single tuple-space: Stable marriages
    8.4.4 Multiple tuple-spaces: Private channels
  8.5 Summary
9 Conclusions
  9.1 Future Work
  9.2 Contributions of the Work
  9.3 Closing Remark
Bibliography
Glossary

List of Figures

1.1 Generative communication in Linda
4.1 A capability object referring to a tuple-space object
4.2 A capability object
4.3 Tuple-space operations using multicapabilities
5.1 A multicapability referring to a group of objects
6.1 Layered implementation of Lindacap
6.2 Creating an object in Lindacap
6.3 Requesting a multicapability and creating the region
6.4 Storing tuples in PyLinda and Lindacap (region implementation)
6.5 Region and tag implementations of Lindacap
6.6 Storing tuples in the tag implementation of Lindacap
7.1 A graph representation of the references/capabilities in Lindacap
7.2 An example for a finer garbage collection
7.3 The stable marriages solution using inp in a non-capability system
7.4 An incorrect attempt at creating a private channel in the capability-based system
7.5 Creating a private channel in the capability-based system
7.6 Private channel protocol in the capability-based system
7.7 Secure bidding in the capability-based system
8.1 Writing tuples in PyLinda, Lindacap-region and Lindacap-tag
8.2 Reading a tuple in PyLinda, Lindacap-region and Lindacap-tag
8.3 Capability-valued expressions experiments: writing n tuples
8.4 Writing tuples using capability-valued expressions
8.5 Capability-valued expressions experiments: write n tuples, then read one tuple
8.6 Reading a tuple using capability-valued expressions
8.7 Memory exhaustion problem
8.8 Overhead added by tuple garbage collection scheme
8.9 Creating multicapabilities across n kernels
8.10 Writing and reading tuples across n kernels
8.11 Stable marriages experiments
8.12 Stable marriages: randomized lists of preferences
8.13 Stable marriages: best-case scenario
8.14 A chain of private channels on 1 kernel
8.15 A chain of private channels on n kernels
8.16 Setting up a chain of private channels on n kernels, and writing/reading data through the chain
8.17 Setting up a chain of private channels
8.18 Input/output operations in a chain of private channels
9.1 A capability for capability objects

Acknowledgements

In the course of my PhD, I had the great fortune of being supervised by a very supportive and helpful supervisor who always kept his door open for me. His integrity, dedication, insightful advice and encouragement have inspired me throughout my doctorate and the production of this thesis. Apart from his scientific support, he has also taught me valuable lessons, not only in doing research but also in various other things, such as humility, and life in general. My heartfelt gratitude to Dr. Alan Wood.

I would also like to thank (in no particular order):

• Andrew Wilkinson, for his PyLinda, without which my work would have taken longer to complete, and for his constant support in helping me with Python and PyLinda;
• Dr. Jeremy L. Jacob for the suggestions, discussions and knowledge imparted, and for ‘looking after’ me while Alan was away on sabbatical;
• my examiners, Dr. Daniel Kudenko and Dr. Simon Dobson, for all their constructive comments and suggestions, which surely helped in improving the thesis;
• members of the Plasma (Programming Languages and Systems) research group at York for the ideas and suggestions given during my presentations of the work in the weekly group meetings;
• the administration, support and technical staff in the Department of Computer Science in York for their constant efforts and willingness to help;
• my sponsors, the Public Services Department of Malaysia (i.e. the government of Malaysia) and Universiti Putra Malaysia (UPM), for the scholarship awarded to fund my study here in York;
• my father Hj. Udzir Abdul Hamid, and the memory of my late mother, Hjh. Rashidah Said, both of whom always believed in me; my family (especially my sisters) and in-laws in Malaysia for their prayers and support; and
• my friends (you know who you are) here in the UK and back home in Malaysia, for their encouragement and support.

Finally, but certainly not least, I would like to express my gratitude to my beloved husband Samsuddin for all the sacrifices, the patience, and for being understanding. And more importantly, for just being there. My thanks also to my children ‘Irfan and ‘Irdhina, who are my pride and joy, for their unconditional love and for understanding that sometimes “mummy has to work late”.

. . . and my humble gratitude to Him, who is the reason for my life . . . “Indeed, my prayer, my worship, my life and my death are for You, Allah, Lord of all the worlds.”


Declaration

This thesis is the result of my own work. It is not substantially the same as any report I have submitted for any other qualification at any other university. Parts of its contents have been published, in earlier forms, in the following papers:

1. Parts of Chapters 4 and 5 have been published in the Proceedings of the Seventh International Conference on Coordination Languages and Systems (Coordination 2005), 20–23 April 2005, Namur, Belgium. Lecture Notes in Computer Science 3454 [UWJ05].
2. An extended version of the above paper is to appear in the journal Science of Computer Programming, in the Special Issue on Coordination Languages and Systems (2006) [UWJ07]. This revised paper also includes updates on several application examples (from Chapter 7) and an overview of the implementation (part of Chapter 6).
3. Parts of Sections 7.1, 7.2 and 7.4 have been published in the Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2004), held on 9–11 November 2004 at MIT, Cambridge, MA, USA [UW04b].
4. Section 7.3 is published in the Proceedings of the Second IEEE International Conference on Information and Communication Technologies: From Theory to Applications (ICTTA’06), held on 24–28 April 2006 in Damascus, Syria [UW06].
5. Another paper, discussing a case study on the implementation of the Contract Net Protocol in variants of the tuple-space model [UW04a], has been published in the Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 2004), 17–19 Feb 2004, Innsbruck, Austria. ACTA Press.
   • Part of this paper was also presented in a poster entitled “Safe Contracting using Scope with Attributes” [UW04c] at the Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computer Science (PREP 2004), held on 5–7 April at the University of Hertfordshire, UK.

Chapter 1 Introduction

Coordination is essential in open systems, where agents and active objects are free to join and leave the system at any time, i.e. they need not be defined before the infrastructure is started. Unlike the earliest computer systems, which were isolated entities communicating only with their human operators, modern computer systems and information environments are interconnected and networked into large-scale distributed, open and heterogeneous systems, e.g. the Internet. Computing itself is a process of interaction [OZKT01], in which processes, or agents, need to communicate and cooperate with each other in order to obtain information about the other resources (hardware and software) they need. Their interactions must be coordinated and synchronised to maintain order in the system. Coordination means managing the interactions and dependencies between the entities of a system [OZKT01].

Several coordination models have been introduced over the past two decades, as discussed in Chapter 2. The discussion in this thesis is based on the tuple-space, or Linda, model [Gel85, CG89] as a model for open distributed systems. The following sections present an overview of the tuple-space coordination model, which forms the context for the work, followed by the general concept of capabilities and how it can be incorporated into the tuple-space model.


1.1 Tuple-Space Based Coordination

The tuple-space, or Linda, coordination model [Gel85], which is the focus of this thesis, promotes generative communication: agents interact by ‘generating’ data (an ordered collection of typed values called a tuple) into a shared data space known as a tuple-space (TS). A tuple-space is an associative memory that acts as a communication medium for the interacting agents. The coordination primitives enable agents to manipulate tuple-spaces, for example by writing and reading data. A tuple can be retrieved, destructively or otherwise, from a tuple-space by specifying a template whose pattern matches the tuple (associative matching; see Section 2.2). Retrieval is non-deterministic: a retrieving agent may get any tuple that matches its template, and a tuple may be given to any agent specifying a matching template.

In the conventional point-to-point communication model, agents interact directly with each other, which has the advantage of providing controlled and directed communication. This form of communication, however, does not exploit the flexibility inherent in open systems such as the Internet. For instance, there is no separation of concerns between computation and coordination: the code concerning communication between agents (the Send/Receive instructions) is interspersed with the computational instructions. Without a separate medium dealing exclusively with coordination, the agents must do all the ‘communication work’ themselves, on top of their other computational activities. The sender must address each message to a specific receiver or set of receivers, whether the message is to be unicast, multicast or broadcast. This requires the sender to know the receiver's identity in advance, either statically or as evaluated dynamically at run-time. The receiver, on the other hand, simply waits for a message from any anonymous source. The sender is always active, while the receiver and the communication system are passive with respect to the entire communication process.

As an alternative, the tuple-space can be a very effective medium for interaction and coordination, with a dedicated coordination layer actively performing all the communication-related work. In this data-driven model, the interacting agents need not be concerned with how communication is done: they simply drop data into, or pick up the data they require from, the tuple-space. In addition, the associative matching rules allow agents to specify only a pattern for the data they want, rather than the exact tuple. Furthermore, the name-, space- and time-uncoupling properties give a new dimension to communication: the agents involved need not know each other, nor exist in the same place at the same time, in order to communicate. All these features relieve the agents of the burden of handling coordination concerns, leaving them more time and space for other computational activities.

Tuple-space based coordination is now a mature technology, and research in the area is providing general-purpose data spaces for efficient large-scale implementations of open distributed multi-component systems. Evidence of successful implementations of the multiple tuple-spaces paradigm can be seen in the commercially developed Sun's JavaSpaces [FHA99] and IBM's TSpaces [WMLF98], as well as in projects such as PageSpace [CKTV96, CTV97], WCL [Row98, RW98b], Lime [PMR98] and TuCSoN [COZ00].

Linda's power for coordination in an open, heterogeneous environment is well known. However, to profit from the advantages of open and flexible coordination mechanisms, a number of challenging practical problems need to be addressed. These have been noted by many authors, and several solutions have been proposed, all imposing varying degrees of additional control by the system.
Unfortunately, getting the optimum balance between flexibility and tighter control is difficult, and many of the proposed solutions lose the principal advantages that Linda-like systems have over many other models, particularly the point-to-point communication model.

There are many implementations of Linda, but for the purpose of this research the system is assumed to support multiple tuple-spaces and arbitrary tuples, and to incorporate the (principled) inp and rdp [JW00] as well as the ‘bulk’ operations collect [BWA94] and copy-collect [RW98a]. The system is open, heterogeneous and persistent, which makes the incorporation of garbage collection [Men00] imperative.
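A small executable model helps fix the vocabulary of generative communication and associative matching used in the rest of the thesis. The Python below is a single-process toy written for illustration, not PyLinda's or Lindacap's API: the class TupleSpace, the helper matches, and the method name in_ (chosen because in is a Python keyword) are hypothetical. Template fields that are Python types act as formals and other values act as actuals. Note that inp and rdp here simply return None when nothing matches at the moment of the call; they are not the principled, deadlock-aware versions of [JW00].

```python
import threading
from typing import Optional


def matches(template: tuple, tup: tuple) -> bool:
    """Associative matching: equal arity, and each template field is either a
    formal (a type, matching any value of that type) or an actual (must be equal)."""
    if len(template) != len(tup):
        return False
    for field, value in zip(template, tup):
        if isinstance(field, type):
            if not isinstance(value, field):
                return False
        elif field != value:
            return False
    return True


class TupleSpace:
    """A single-process, in-memory tuple-space for illustration only."""

    def __init__(self) -> None:
        self._tuples: list[tuple] = []
        self._cond = threading.Condition()

    def _find(self, template: tuple) -> Optional[int]:
        # Linda lets the kernel return *any* match; this sketch returns the oldest.
        for i, tup in enumerate(self._tuples):
            if matches(template, tup):
                return i
        return None

    def out(self, tup: tuple) -> None:
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()  # wake agents blocked in rd/in_

    def rd(self, template: tuple) -> tuple:
        """Blocking, non-destructive read."""
        with self._cond:
            while (i := self._find(template)) is None:
                self._cond.wait()
            return self._tuples[i]

    def in_(self, template: tuple) -> tuple:
        """Blocking, destructive read."""
        with self._cond:
            while (i := self._find(template)) is None:
                self._cond.wait()
            return self._tuples.pop(i)

    def rdp(self, template: tuple) -> Optional[tuple]:
        """Naive non-blocking read: None if nothing matches right now."""
        with self._cond:
            i = self._find(template)
            return None if i is None else self._tuples[i]

    def inp(self, template: tuple) -> Optional[tuple]:
        """Naive non-blocking withdrawal: None if nothing matches right now."""
        with self._cond:
            i = self._find(template)
            return None if i is None else self._tuples.pop(i)

    def collect(self, dest: "TupleSpace", template: tuple) -> int:
        """Move every matching tuple into dest; return how many were moved."""
        with self._cond:
            moved = [t for t in self._tuples if matches(template, t)]
            self._tuples = [t for t in self._tuples if not matches(template, t)]
        for tup in moved:
            dest.out(tup)
        return len(moved)

    def copy_collect(self, dest: "TupleSpace", template: tuple) -> int:
        """Copy every matching tuple into dest; the source is left unchanged."""
        with self._cond:
            copied = [t for t in self._tuples if matches(template, t)]
        for tup in copied:
            dest.out(tup)
        return len(copied)
```

For example, agent A's out(("result", 42)) can later be matched by agent B's in_(("result", int)) without either agent knowing the other's identity. The bulk operations are not atomic across the two spaces in this sketch, which is acceptable only because it runs in one process.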

1.2 Capability-Based Coordination: A Motivation

In Linda, all communications are performed via a tuple-space: agents do not interact directly with each other. Consider a simple example: a producing agent A intends to send data to a consumer agent B. A generates a tuple t into the tuple-space TS, and t can be retrieved by B (Figure 1.1).

Figure 1.1: Generative communication in Linda

However, after t has been deposited into the tuple-space by A, it might not be retrieved immediately by B. In this case, the tuple exists independently in the tuple-space: it is equally accessible to all agents in the system that have access to the tuple-space, but bound to none [Gel85]. In contrast to the message-passing paradigm of point-to-point communication, where data are sent directly to a specified receiver, the tuple is available to any agent that has access to the data space, and can be manipulated either by an active data space or by any other agent. This is where some form of control becomes necessary.

In the context of open distributed systems, the ability to coordinate agents, coupled with the ability to control the operations they perform, is vital. This research focuses on incorporating capability-based control in tuple-space like coordination systems, with an emphasis on controlling agents' visibility of objects in the system, i.e. tuple-spaces and tuples (data). This is in line with the general goal of research in the field, which is to increase the systems' applicability. It is not practical for a system, particularly one designed for an open, heterogeneous environment, to exercise only loose control over the actions of agents within it. Providing finer control over the agents' interactions and coordination than is available in the ‘classic’ Linda model, while maintaining the flexibility inherent in open systems, is a challenge in a decentralised, distributed environment. (Here and throughout the thesis, the phrases ‘finer’ and ‘more refined’ are used relative to the corresponding mechanism, within the context of the discussion, available in the standard Linda model.) Incorporating capabilities results in a more practical and safer coordination system, ‘safer’ in the sense of being more controlled, not in the security sense (Section 3.2). Moreover, the traditional notion of a centralised coordinator that supervises all the activity of a system is no longer applicable or effective in an open, distributed environment. For systems to be scalable, it is essential to employ decentralised, locally managed components in charge of coordinating tasks and agents [COZ00].

One aspect of finer control is the ability to restrict which methods an agent is allowed to invoke on an object. Earlier work on coordination using object attributes [Woo99] demonstrated a simple solution for controlling agents' access to objects in the system without resorting to any complex cryptographic security approach. Together with access control lists (ACLs), that paper also discussed the advantages of capability-based control [Dv66, Lev84] in a distributed environment. A capability is a ‘ticket’ held by the agent (not by the kernel nor by the object it refers to), and it contains all the information the kernel needs to determine whether a requested operation is valid, i.e. allowed by the capability. (The term ‘kernel’ in this thesis refers to the underlying distributed mechanism that controls all operations in the system; it represents the totality of the Linda ‘middleware’.) Simply presenting the capability means that authorisation can be done ‘instantly’, without searching a list for verification. This is a key aspect of capability-based systems.

The concept of capabilities is not new. Although various capability systems have been developed over the years, and capabilities remain an active research topic in areas such as object-based systems, they do not enjoy the same popularity in tuple-space based systems; at least, none has been proposed as a ‘pure’ capability system, as capabilities are combined with other security techniques. Despite the work of Chung and McDonald [CM02b] and Gorla and Pugliese [GP03], for instance, it seems that capability-based coordination has yet to make a significant impact on Linda-like coordination in open distributed environments. Motivated by this, the research investigates how capability-based control can improve the practicality of tuple-space based coordination, by refining the control features of the tuple-space model in a novel coordination system based on capabilities.

Although capabilities have drawbacks, mainly related to their management, it has been established that they offer more dynamic control than ACLs [Woo99, CM02b, Sha99], as discussed in Chapter 3, which makes them more attractive for open systems. However, unlike ACLs, capabilities must refer to single named objects. In the Linda context, tuple-spaces are uniquely identifiable, and can therefore be referenced by capabilities, whereas tuples are anonymous: they can only be referred to through associative matching. To achieve the finer control mentioned above, it is not enough to ‘protect’ tuple-spaces with capabilities; it is also vital to ‘protect’ tuples by controlling their visibility to agents. This leads to the novel concept of multicapabilities, outlined in the next section. Multicapabilities enable nameless tuples to be referenced, making possible certain applications internal to tuple-space systems: finer garbage collection, where groups of tuples rather than a whole tuple-space can be reclaimed (Section 7.1), and a more refined deadlock detection and breaking mechanism (Section 7.2).
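Because a capability carries everything the kernel needs, checking it is a local computation rather than a search through a list. The Python sketch below illustrates this under stated assumptions: the Kernel and Capability classes and the mint, check and restrict methods are hypothetical, and signing each ticket with an HMAC from Python's standard hmac module is one common way of making tickets unforgeable, chosen here for illustration rather than taken from Lindacap.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """A ticket held by an agent: which object, which rights, and a kernel signature."""
    object_id: str
    rights: frozenset
    signature: str


class Kernel:
    """Hypothetical kernel that mints capabilities and checks them on presentation."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # secret known only to this kernel

    def _sign(self, object_id: str, rights: frozenset) -> str:
        # JSON gives an unambiguous encoding of (identifier, sorted rights).
        message = json.dumps([object_id, sorted(rights)]).encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def _genuine(self, cap: Capability) -> bool:
        return hmac.compare_digest(cap.signature, self._sign(cap.object_id, cap.rights))

    def mint(self, object_id: str, rights: set) -> Capability:
        frozen = frozenset(rights)
        return Capability(object_id, frozen, self._sign(object_id, frozen))

    def check(self, cap: Capability, operation: str) -> bool:
        """Authorise from the capability alone: no per-object list is searched."""
        return self._genuine(cap) and operation in cap.rights

    def restrict(self, cap: Capability, rights: set) -> Capability:
        """Derive a weaker capability to hand to another agent; rights never grow."""
        if not self._genuine(cap):
            raise PermissionError("forged capability")
        if not set(rights) <= cap.rights:
            raise PermissionError("cannot amplify rights")
        return self.mint(cap.object_id, rights)
```

An agent holding a full capability on a tuple-space could pass kernel.restrict(cap, {"rd"}) to a collaborator; editing the rights field of that read-only ticket invalidates its signature, so the kernel rejects it.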

Throughout this thesis, the term ‘capability’ is used to refer to capabilities in general, and the terms ‘unicapability’ and ‘multicapability’ when referring to a specific class. Whereas ‘capability’ in everyday English is often synonymous with ‘rights’, the term is used here in its technical sense: the rights (the ability to do something) are encapsulated in a capability object, which also contains the referent's identifier.

In this work, capabilities are viewed not merely as access control but, more generally, as visibility control, and visibility can represent security. Although this work does not address the technical issues of security, security is a crucial problem when dealing with intelligent, autonomous agents, particularly those involved in confidential and sensitive business transactions or other critical applications (see Section 3.2).

A more detailed discussion pertaining to the motivation of the work is presented in Chapter 3.


1.3 Contributions

The main goal of the thesis is to define a novel capability-based system in the tuple-space coordination model. It demonstrates how capabilities can be used in coordination systems, particularly to control objects' visibility to agents, as a means of discriminating the types of operation permissible on particular objects (tuples or tuple-spaces). A run-time system called Lindacap has been implemented, introducing several original ideas that contribute to the overall thesis:

• Multicapabilities, which extend capabilities to collections of objects (Section 4.4).
• Capability-valued expressions, which perform combinatorial operations on capabilities (Section 4.3).
• Garbage collection of tuples, which was not previously possible in tuple-space based systems; with the introduction of multicapabilities, tuples can now be garbage collected (Section 7.1).
• Refined deadlock detection and breaking: capabilities can be used to refine deadlock detection and breaking using the principled inp (Section 7.2).
• Capability-based private channels, to facilitate secure communication between agents (Section 7.3).
• Replication and caching of data, using the information provided by capabilities (Section 7.4).

Chapters 4, 5 and 7 elaborate on these concepts; brief overviews follow.


Multicapabilities Capabilities must refer to named objects, but tuples are nameless, being accessed associatively. Because tuples are fundamental elements of the tuple-space coordination model, capabilities for tuples are essential to achieving finer control. Providing them involves one of the new concepts introduced in this thesis, namely multicapabilities. Multicapabilities differ from the traditional notion of (uni)capabilities, which refer to a single named object, in that they refer to groups of objects. Whereas a permission in a unicapability allows its holder to operate (in a certain way) on the object it refers to, a permission in a multicapability allows the operation on an element within the group, not on the entire group referred to by the multicapability. More details are presented in Section 4.4.
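The difference between the two kinds of permission can be sketched in Python. Here a multicapability describes its group with a membership predicate, and an `in` permission lets its holder withdraw one element of the group, never the whole group at once. The names `MultiCap` and `take_one`, the predicate-based description of the group, and the plain list standing in for a tuple-space are all hypothetical simplifications; Section 4.4 gives the actual definition.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, List, Optional, Tuple

Tup = Tuple[Any, ...]


@dataclass(frozen=True)
class MultiCap:
    """Hypothetical multicapability: rights over a group of nameless tuples."""
    member: Callable[[Tup], bool]  # decides whether a tuple belongs to the group
    rights: FrozenSet[str]         # operations allowed on elements of the group


def take_one(space: List[Tup], cap: MultiCap) -> Optional[Tup]:
    """Remove and return ONE element of the group, as an 'in' permission allows.

    Returns None when the group currently has no elements in `space`.
    """
    if "in" not in cap.rights:
        raise PermissionError("capability does not grant 'in'")
    for i, tup in enumerate(space):
        if cap.member(tup):
            return space.pop(i)
    return None


space: List[Tup] = [("job", 1), ("job", 2), ("log", "started")]
jobs = MultiCap(lambda t: t[0] == "job", frozenset({"in"}))
first = take_one(space, jobs)   # removes ("job", 1) only
```

After the call, `("job", 2)` is still in the space: holding the multicapability does not hand the holder the entire group, only the right to withdraw its elements one at a time.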

Capability-valued expressions This research also investigates ‘combining’ operations on multicapabilities. For instance, the sum of two capabilities can be defined as representing a kind of disjoint union of its two constituents. Subtraction and product are also defined. These operations provide richer and more flexible control in the system, and are elaborated further in Section 4.3.
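One way to picture the disjoint-union reading of the sum is to treat the group a multicapability refers to as a multiset, so that an element referred to by both constituents appears twice in the sum rather than being merged. The Python sketch below does this with `collections.Counter`. The class name `GroupCap`, the use of multiset difference for subtraction, and the omission of rights and of the product operation are assumptions of the sketch, not the definitions given in Section 4.3.

```python
from collections import Counter
from typing import Hashable, Iterable


class GroupCap:
    """Hypothetical multicapability reduced to the multiset of items it covers."""

    def __init__(self, members: Iterable[Hashable] = ()):
        self.members = Counter(members)

    def __add__(self, other: "GroupCap") -> "GroupCap":
        # Multiset sum: a shared item keeps one copy per constituent,
        # which behaves like a disjoint union of the two groups.
        result = GroupCap()
        result.members = self.members + other.members
        return result

    def __sub__(self, other: "GroupCap") -> "GroupCap":
        # Multiset difference (assumed meaning of subtraction); counts
        # that fall to zero or below are dropped by Counter.
        result = GroupCap()
        result.members = self.members - other.members
        return result

    def size(self) -> int:
        """Total number of elements, counting duplicates."""
        return sum(self.members.values())


a = GroupCap([("task", 1), ("task", 2)])
b = GroupCap([("task", 2), ("task", 3)])
total = a + b          # ("task", 2) is counted twice
rest = total - b       # removes one copy of each item in b
```

Using a multiset rather than a set matters in a tuple-space setting, since a tuple-space may hold several identical tuples.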

Garbage collection on tuples Garbage collection is a fundamental aspect of resource management in persistent systems. It has been proposed for tuple-spaces [Men00], but not for tuples: because tuples are nameless, agents' references to them (the information on which any garbage collection mechanism depends) cannot be tracked. Multicapabilities enable tuples to be referenced, so unusable tuples (or a specific region of a tuple-space) can be garbage collected (see Section 7.1).
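The essential bookkeeping can be sketched as reference tracking over regions: once multicapabilities make a region of a tuple-space referenceable, the region becomes collectable when no agent holds a multicapability for it any more. The Python sketch below illustrates that idea only. The `RegionRefs` class, its method names, and the use of simple holder sets (a form of reference counting) are hypothetical and are not the mechanism described in Section 7.1.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Set


class RegionRefs:
    """Hypothetical record of which agents hold multicapabilities on which regions."""

    def __init__(self) -> None:
        self._holders: DefaultDict[Hashable, Set[str]] = defaultdict(set)

    def grant(self, region: Hashable, agent: str) -> None:
        """Record that `agent` now holds a multicapability for `region`."""
        self._holders[region].add(agent)

    def drop(self, region: Hashable, agent: str) -> None:
        """Record that `agent` has discarded its multicapability for `region`."""
        holders = self._holders.get(region)   # avoid creating unknown regions
        if holders is not None:
            holders.discard(agent)

    def agent_exited(self, agent: str) -> None:
        """An agent leaving the open system releases all its multicapabilities."""
        for holders in self._holders.values():
            holders.discard(agent)

    def collectable(self) -> Set[Hashable]:
        """Regions no agent can reach any more, so their tuples are garbage."""
        return {region for region, holders in self._holders.items() if not holders}


refs = RegionRefs()
refs.grant("jobs", "A")
refs.grant("jobs", "B")
refs.grant("logs", "A")
refs.drop("jobs", "A")
refs.agent_exited("B")
```

At this point `"jobs"` is collectable because its last holder, B, has left, while `"logs"` is still reachable through A. Only regions that were granted at least once are tracked.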


Finer deadlock breaking The semantic difficulties with the so-called ‘non-blocking’ primitives (inp and rdp) are well known. To overcome them, a version of inp and rdp based on a deadlock breaking mechanism was proposed [JW00]. Multicapabilities help to refine this deadlock detection and breaking mechanism, as discussed in Section 7.2.

Private channel One disadvantage of Linda is that it provides no facility for secure, private communication between agents. All communication has to go through the shared data space, where it is exposed to anyone with access to the tuple-space. Using capabilities, private channels can be established that allow agents to converse privately, free of eavesdropping and interference (see Section 7.3).

Data replication and caching The permission part of a capability records which agent can perform which operation on which object. Therefore, if it is known that the data in a collection can only be read (and not removed) by the agents accessing it, the data can be replicated and cached for optimization (see Section 7.4).
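The replication condition can be stated as a check over the rights in all outstanding capabilities for a collection: if none of them allows removal, every agent can only read, so no replica or cached copy can be invalidated by a withdrawal. The Python sketch below assumes that removal is granted by rights named `in` and `inp`; the function name and this naming are hypothetical. It covers only the removal condition stated above and ignores, for example, tuples added later by `out`.

```python
from typing import FrozenSet, Iterable

# Assumed names of the rights that let a holder remove tuples.
REMOVING_RIGHTS: FrozenSet[str] = frozenset({"in", "inp"})


def replicable(outstanding: Iterable[FrozenSet[str]]) -> bool:
    """True if no outstanding capability on the collection permits removal.

    `outstanding` holds the rights sets of every capability issued for the
    collection; with no removal right anywhere, the data can only be read.
    """
    return all(not (rights & REMOVING_RIGHTS) for rights in outstanding)


readers_only = [frozenset({"rd"}), frozenset({"rd", "rdp"})]
mixed = [frozenset({"rd"}), frozenset({"in", "rd"})]
print(replicable(readers_only), replicable(mixed))
```

An empty list of capabilities is vacuously replicable, since no agent can access the collection at all.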

1.4 Thesis Organisation

This chapter has provided an overview of the thesis, which is structured as follows.

Chapter 2: Literature Review. This chapter reviews the literature relevant to the thesis. It begins with a general overview of coordination in distributed systems, then focuses on the two areas directly related to the thesis: the tuple-space coordination paradigm and capability-based control. Several Linda variants are discussed, highlighting those that use capability-like mechanisms to enforce control.

Chapter 3: Why Capability-Based System? This chapter presents the motivation behind the research, with relevant background information, including a detailed discussion of the advantages of capabilities and their suitability for open systems.

Chapter 4: Lindacap: The Capability-Based System. This chapter introduces the capability-based coordination system and examines the methods of capabilities (such as restriction, transitivity and revocation). The newly introduced concepts of multicapabilities and capability-valued expressions are discussed in detail.

Chapter 5: Abstract Description of Lindacap. This chapter presents the abstract description of the capability system, including comprehensive descriptions of (uni)capabilities and multicapabilities. The concept of capability-valued expressions, which specify the operations that can be performed on (multi)capabilities, is also described.

Chapter 6: Implementation. This chapter presents a number of issues, alternative techniques and implementations, and discusses their relative strengths and weaknesses. It describes how the system was developed in stages, including the implementation of unicapabilities, multicapabilities, capability-valued expressions and the resource management applications.

Chapter 7: Applications. This chapter presents a number of application examples that demonstrate the usefulness of the capability-based system: a garbage collection mechanism for tuples, a more refined deadlock detection/breaking mechanism, and data replication and caching. It also shows how private channels can be established in the capability system to provide secure communication.

Chapter 8: Evaluation. This chapter presents experimental results obtained from the implementation and the application examples, and uses them to evaluate the system.

Chapter 9: Conclusions. The final chapter draws conclusions about the research described throughout the dissertation and recapitulates its contributions. It also presents proposals for future work.

Chapter 2

Literature Review

This chapter reviews the literature relevant to the research in two areas: the tuple-space coordination model and capability-based control. The tuple-space based coordination paradigm is reviewed in Section 2.1, which gives an overview of the Linda model, followed by its evolution over the past two decades. The second part of the chapter discusses capabilities as an approach to control, beginning with the concepts of capabilities and followed by reviews of several systems that incorporate them, including some tuple-space based ones.

2.1 Coordination in Distributed Systems

In their 1992 paper, Carriero and Gelernter [CG92] argued that computation and coordination are orthogonal: they are two independent aspects of programming. Computation is concerned with constructing a single computational activity, usually by a programmer, while coordination models support communication between these computational activities. Although the two can be considered separately, they must be integrated to form a complete programming model.


Chapter 2. Literature Review

The main concern of the work presented in this thesis is not the computation model but the coordination part. Coordination languages are not fully fledged, general-purpose programming languages. Rather, they are often defined as language extensions or scripting languages concerned exclusively with coordination issues, described by Carriero and Gelernter [CG92] as: “. . . the glue which allows us to build a unified program out of many separate activities, each specified using a computing language.” Ciancarini [Cia96] defined a coordination model as: “. . . a conceptual framework to model the space of interaction. Coordination models allow systems to be represented as multi-component assemblies, by defining which are the entities whose mutual interaction is ruled by the model, by providing the abstractions enabling the interaction between the entities, and by expressing the system’s governing rules.”

The main purpose of a coordination system is to provide effective and efficient communication between agents executing on different processors. Coordination languages (like Linda [Gel85]) have been shown to be sufficiently general for building parallel applications [CG94] and for designing distributed computing platforms [WMLF98, Wal99].

A number of coordination models have been developed over the past two decades. The earliest, and probably the most popular, is Gelernter’s Linda [Gel85], a model of concurrency developed at Yale University in 1985. Linda is based on generative communication: agents communicate with each other by generating data into a shared data space, known as a tuple-space. As our work is centred on Linda-like models, the following section elaborates on Linda.

In 1986, Agha introduced Actors [Agh86], a simple, rather low-level communication and synchronisation model that employs asynchronous message passing with mailbox buffering. An actor (an agent) needs to know the identity of the other actor it is to communicate with, which is not a desirable feature for communication in open systems. Linda, on the other hand, is more flexible: the interacting agents are decoupled in name, time and space, as mentioned in Section 1.1 (see also ‘uncoupling’ on page 18). ActorSpace [AC93] is a more flexible extension of the original Actor model, which allows an actor to send messages to several receivers, or to listen to messages from more than one source.

The IWIM (Idealised Worker, Idealised Manager) model [Arb96] uses ‘black boxes’ to represent processes with specified input/output ports for stream-based communication. IWIM has been implemented in Manifold [AHS93]. However, this model has no shared data spaces, nor any notion of temporal and spatial separation (the decoupling properties) as in Linda.

Another popular model is Mobile Ambients [CG98, Car99, CGG00]. Ambients, the central construct of the model, are bounded, named administrative domains where agents exist and where computation happens [CG98]. An ambient has a name and can be either static or mobile. Each ambient is a collection of locally running agents and can be nested within other ambients, forming a tree structure. An ambient moves as a whole, with all its subcomponents; since agents are confined to ambients, they move around the system indirectly when their ambient moves, while remaining enclosed within the same ambient.

2.2 The Linda Model

Prior to the emergence of the tuple-space based coordination paradigm, which supports generative communication [Gel85], there were three popular mechanisms and corresponding models of concurrent programming [And81]: monitors (shared variables) and semaphores; message passing; and remote procedure calls (RPC) [Nel81, BN84]. In Linda, the coordination model is based on open-system broadcast communication via shared data spaces, a more general environment than the point-to-point communication found in standard message-passing models. Without having to arrange a definite rendezvous, agents communicate indirectly through a common memory space for communication and synchronisation purposes. The use of a shared data space for agent communication was first investigated in the artificial intelligence field with blackboards [NA79, EM88], information spaces into which messages can be put and from which they can be retrieved.

Generally, the Linda model is composed of four basic components:
1. Agents (processes, or any active computing entities) to be coordinated in the system. They may have been programmed in different languages.
2. A coordination medium, i.e. a shared data space used as a medium for coordination.
3. Data, or messages, to be communicated among the agents via the coordination medium.
4. Coordination primitives, which govern the relationship between the agents and the coordination medium.

One of the advantages of tuple-space based coordination is that agents programmed in different languages can be coordinated without being reprogrammed, because the agents are wrapped by inter-agent coordination mechanisms. This is why these models are relevant in the context of open systems, where the agents and their overall software architectures are not predefined. The fundamental objects in Linda, adopted by many other tuple-space systems, are:


Tuple-space (TS). A tuple-space is the coordination medium, i.e. the working environment of the system: a logical shared associative memory used to store tuples. It may contain multiple identical tuples, and there is no ordering of tuples within a tuple-space. The original Linda had a single tuple-space, but most later variants of Linda allow a system to have multiple tuple-spaces. Each tuple-space has a name (reference), and tuple-spaces can be distributed around the system [RW98b].

Tuple. A tuple is an ordered collection of typed values (actuals). For instance, a tuple may hold two values, the first of type integer and the second a string.

Template. A template is similar to a tuple, but its fields may contain values (actuals) as well as value types (formals), i.e. an ‘unspecified’ value of a certain type, represented by the prefix ‘?’. A template is used only in input operations. Examples of templates are ,
