Emergent Search in Large Distributed Systems
Sergio Camorlinga¹ and Ken Barker²
¹ Computer Science Department, University of Manitoba, Winnipeg MB, R3T 2N2 Canada
² Computer Science Department, University of Calgary, Calgary AB, T2N 1N4 Canada
Abstract. Large Distributed Systems (LDS) with ad-hoc connectivity and operation are becoming common in organizations and on the Internet. For a system-wide search to scale and find data reliably, it must adapt by itself to changes in the system. Many algorithms have been proposed, including both blind and informed methods. However, they either make poor use of resources or carry an administrative burden to maintain tables and indices; other methods are so rigid that their applicability is questionable. This paper introduces a different approach based on Complex Adaptive Systems (CAS), called the Emergent Search. The Emergent Search fosters members’ independence and, consequently, self-adaptation and self-organization. Experimental results corroborate our hypothesis that a CAS-based algorithm provides a reliable system-wide search that scales and is simple to implement.
1 Introduction
Many Large Distributed System (LDS) environments are characterized by the dynamic, self-organizing, and ad-hoc connectivity and operation of their decentralized members¹. Examples of these environments are decentralized and unstructured Peer-to-Peer (P2P) systems [1][2], pervasive computing environments, and some grid systems². The only constant is the variability of the members’ connectivity to the LDS. One moment a member can be present and the next unavailable, because the member is off, has failed, or is not connected [3]. Implementing a search mechanism that is adaptable and scalable to these dynamic LDS environments is necessary to provide adequate data search and retrieval.
Dynamic LDSs are common in the Internet community and large organizations, in contrast to the static, structured LDS schemes commonly documented in the literature. The LDS members’ dynamism has raised questions about the applicability of previously published structured search algorithms [4-6]. Consequently, several search algorithms have been proposed that are more suitable for this kind of environment. A taxonomy based on object and peer information divides these algorithms into blind and informed search methods [7]. Blind search algorithms have or use little or no information about other peers or object locations, whereas informed search algorithms attempt to dynamically build tables and indices that are used to improve the search.
Blind search algorithms are greedy in that they search by brute force and consequently make poor use of resources (e.g. bandwidth, messages, processing). Informed search algorithms try to reduce this greediness by creating auxiliary management tables that improve the search. However, their main limitation, as with the structured search algorithms, is the administrative requirement to keep statistics and tables updated. It is not clear how well these informed search algorithms will work on extremely variable LDS member populations. A hypothesis is that, under extremely variable connectivity of the LDS members, algorithms should make LDS members as independent as possible (i.e. highly local) to avoid wasting resources (e.g. creating tables, sending table maintenance messages, computing statistics).
This paper introduces a different approach: one that is local, to foster member independence and consequently self-adaptation and self-organization, but also global, in that emergent outcomes result from the members’ activities. The proposed approach is called the Emergent Search scheme. Emergence is a global outcome that results from the quasi-independent activity of a large number of individuals. It is quasi-independent because there is minimal or no communication among individuals. These individuals work in a community (i.e. a swarm) to achieve emergent behavior. The swarm environment is a Complex Adaptive System (CAS) that is highly adaptable, self-organized, and dynamic. These characteristics make CAS models suitable for the dynamic LDSs described, and they overcome some of the limitations of blind algorithms (the brute force approach) and informed algorithms (the administrative costs).
¹ In this paper, we use the terms ‘location’, ‘peer’, ‘node’ and ‘member’ interchangeably.
² E.g. grids that have no common domain across the Internet.
Our basic hypothesis is that CAS algorithms based on squirrel behaviors provide the essential mechanisms to implement an emergent search service in an LDS. The emergent search is a system-wide search service provided by the continuous hoarding activities of squirrels. These squirrels are not aware of the global result. They only hoard their acorns³ across the system according to their behaviors. Each location uses those acorns to search locally. If the data is found, the original search location is notified; otherwise, the acorn continues to be disseminated by other squirrels to other locations.
Section 2 describes work related to the emergent search algorithm and its similarities to and differences from other search algorithms developed for dynamic LDSs. Section 3 gives an overview of the squirrel emergent search algorithm, and Section 4 presents experiments carried out to understand several emergent outcomes under different scenarios. Section 5 discusses results and future work, and Section 6 concludes.
³ An acorn could represent data (e.g. a file), a data identifier (e.g. file name) or a piece of data (e.g. file block) according to the context described in the paper. In this paragraph it represents a data identifier that is used to search for data.
2 Related Work
Complex adaptive system based models have provided emergent solutions for a variety of problems in different domains. Some examples where CAS has been applied are telecommunication routing [8], combinatorial optimization [9], resource allocation [10], and massive aggregation computing [11]. Solutions emerge from the simple activities of a swarm’s members, and swarms resemble other natural complex systems (e.g. ecologies, brains, social systems, immune systems). Both facts have created a growing interest in the research community in exploring CAS models as mechanisms for solving complex problems whose solutions would otherwise be difficult to obtain. Interest has arisen in areas as diverse as artificial intelligence, cognitive sciences, computational economics, mathematics, optimization, biology, psychology, neuroscience, and engineering. This research expands current CAS work and uses CAS models as an alternative approach to existing blind and informed search algorithms.
Blind search methods flood neighbors in different ways, including:
• All accessible neighbor nodes (e.g. the original Gnutella).
• A set of accessible neighbor nodes [12].
• Iteratively through a set of neighbor nodes by increasing depth in the breadth first search [2][13].
• Through some of the neighbor nodes by means of random walkers [2].
These methods inherently waste resources because of the flooding scheme, with the exception of random walkers. However, the performance of random walkers is highly variable, and their success depends on the topology and the data being searched for. In contrast to these flooding schemes, the Emergent Search algorithm proposed in this work is a living mechanism, because the peers, by means of “squirrels”, proactively and continuously move “acorns”. The search service emerges from this basic activity. No node initiates a search; the search is a continuous service within the LDS. Furthermore, the emergent search service minimizes messages by grouping several acorns within one message (i.e. a bag) and disseminates this message only to a random, relatively small group of peers.
This greatly reduces resource waste, and peers cooperate independently, which fosters the self-adaptation and self-organization of a dynamic LDS.
Fig. 1. Distributed System with Sharable Data Acorns
Other blind search methods have been developed that use the hierarchical structure of the network by means of proxy servers or super servers. However, they depend on the hybrid architecture (i.e. proxy servers) to provide services [7]. The Emergent Search scheme does not depend at all on the network architecture, and each LDS member is free to join and leave the network at any time. Members participate dynamically and no member is indispensable to the CAS scheme.
Informed search algorithms use a variety of methods to create administrative tables that help improve the data search. Query ranking [2], relative data probabilities [14], and indices [13][15] are commonly used mechanisms. However, these informed search algorithms incur table management costs that do not exist in the emergent search model. It is unclear how informed search algorithms will function on highly dynamic LDSs where self-adaptation and self-organization are required. Emergent search, by its CAS-based design, inherently fosters these two properties.
3 The Squirrel Emergent Search Algorithm
3.1 Squirrel Emergent Search Algorithm Overview
The search algorithms are based on squirrel complex adaptive behaviors. These behaviors are observed in squirrels when they hoard acorns in dispersed caches. The basic idea is that squirrels, through simple hoarding activities, disseminate acorns that contain identifiers. These acorn identifiers are used to dig out data acorns stored in the location storage caches where they were allocated [10]. Initially, each location has its own squirrels (up to a maximum) according to its capabilities (Figure 1). Each location shares resources (e.g. files in sharable folders). Squirrels from different locations are independent of each other. The squirrels are unaware that their hoarding activities are used to search for data. The search emerges as a global outcome of the independent members (i.e. squirrels), each performing simple activities that together generate a system-wide result. When a new member joins the LDS, there is no administrative activity to perform besides joining.
When a new search arrives at a location (Figure 2), the location’s squirrel puts the acorn id in a bag, together with any acorn ids already placed there by other squirrels, and hoards the bag in nearby locations. When a bag of acorn ids is placed in a location, the acorn ids are searched for within this location. For any acorn id that is found, the original search location is notified; otherwise, either the acorn id remains in the bag or the acorn identifier expires, terminating the data acorn search. Any new searches from this location are added to the bag, and together they continue to be disseminated by the local squirrel to other locations (Figure 3).
It is important to stress the difference between emergent search algorithms and the blind and informed search algorithms.
The whole LDS is a living organism in the sense that it has members joining and leaving continuously with no administrative burden. Once a member joins, its sharable resources are available and it actively participates in the emergent search scheme according to its capabilities (i.e. the number of sharable resources and the number of squirrels available at the location). Each squirrel works locally, so that if the peer disappears, the emergent search algorithm self-adapts to its new context without interruption. No administrative statistics are lost and no resources are wasted. The emergent search self-adapts by adding new peers to, and/or dropping failing peers from, the search scheme with no administrative costs or processing delays. Furthermore, by packing several acorn ids into bags, the squirrels reduce the number of messages they carry by a factor of ‘n’, where ‘n’ is the average number of acorn ids per bag. The more activity the LDS has, the greater the message reduction factor.
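The bag handling described in this overview can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Acorn`, `Location`, `receive_bag`, `hoard` and `message_reduction_factor` are hypothetical, and two details are assumptions that the text does not specify: acorn expiry is modelled as a hop budget (`ttl`), and a bag is hoarded at `k` randomly chosen nearby locations.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Acorn:
    acorn_id: str   # data identifier being searched for (e.g. a file name)
    origin: str     # location that created the search
    ttl: int = 5    # ASSUMPTION: expiry is modelled as a hop budget


@dataclass
class Location:
    name: str
    shared: set = field(default_factory=set)     # sharable resources at this location
    pending: list = field(default_factory=list)  # new local searches waiting for a bag

    def receive_bag(self, bag, notify):
        """Search each acorn id locally and return the bag to hoard onward."""
        onward = []
        for acorn in bag:
            if acorn.acorn_id in self.shared:
                # Found: notify the original search location.
                notify(acorn.origin, acorn.acorn_id, self.name)
            elif acorn.ttl > 1:
                # Not found: the acorn id remains in the bag with one hop less.
                onward.append(Acorn(acorn.acorn_id, acorn.origin, acorn.ttl - 1))
            # Otherwise the acorn id expires and its search terminates.
        onward.extend(self.pending)  # new searches from this location join the bag
        self.pending = []
        return onward


def hoard(bag, neighbours, k=1, rng=random):
    """ASSUMPTION: choose k random nearby locations to receive the bag."""
    if not bag or not neighbours:
        return []
    return rng.sample(neighbours, min(k, len(neighbours)))


def message_reduction_factor(bags):
    """Average number of acorn ids per bag, i.e. the factor 'n' in the text."""
    return sum(len(bag) for bag in bags) / len(bags) if bags else 0.0
```

Sending one message per bag rather than one per acorn id is what yields the factor ‘n’: four acorn ids carried in two bags give a factor of 2.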
Fig. 2. A New Search Executed by a Squirrel
3.2 Emergent Search Algorithm
There exist three acorn deposits per location, called the living, search, and dissemination caches. The living cache keeps all acorn searches that originated at the location and are still alive. The search cache holds acorns currently being searched for within the shareable resources at the location. The dissemination cache holds acorns ready to be placed into search caches at other locations. A new acorn to be searched for is assigned to the living cache. This new acorn is also either placed in the search cache (i.e. to be searched locally first) or placed directly into the dissemination cache (i.e. if the algorithm implementation does not search locally first). The algorithm requires the following definitions:
Ai = a new acorn id i created by a query (e.g. file identifier).
Ak = an existing acorn id k being searched at the location.
Locj = a location j within the LDS that participates independently in the emergent search and can create search queries.
LCj = the living cache at location j.
SCj = the search cache at location j.
DCj = the dissemination cache at location j.
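The three caches can be pictured with the following minimal Python sketch. It only mirrors the prose description above and is not the paper's algorithm segment. The class `LocationCaches` and its methods are hypothetical names, and moving an acorn id that was not found locally into the dissemination cache is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class LocationCaches:
    living: set = field(default_factory=set)           # LCj: live searches that originated here
    search: list = field(default_factory=list)         # SCj: acorn ids being searched locally
    dissemination: list = field(default_factory=list)  # DCj: acorn ids ready for other locations

    def new_query(self, acorn_id, search_locally_first=True):
        """Register a new acorn id Ai created by a query at this location."""
        self.living.add(acorn_id)
        if search_locally_first:
            self.search.append(acorn_id)
        else:
            self.dissemination.append(acorn_id)

    def search_locally(self, shared):
        """Search the search cache against the location's shared resources.

        ASSUMPTION: acorn ids not found locally move to the dissemination cache.
        """
        found = []
        for acorn_id in self.search:
            if acorn_id in shared:
                found.append(acorn_id)
            else:
                self.dissemination.append(acorn_id)
        self.search = []
        return found
```

In this sketch the `search_locally_first` flag corresponds to the choice, described above, between placing a new acorn in the search cache and placing it directly in the dissemination cache.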
Fig. 3. An Existing Search Executed by a Squirrel
When a new search query is created at a location, the following algorithm segment is asynchronously executed to insert the query into the CAS search mechanism: LCj