IRIS: Design and Implementation of an Intentional Resource Indicator Service
Mukundan Venkataraman, Puneet Gupta, Mainak Chatterjee, and Kartik Muralidharan
Abstract: This paper describes the design and implementation¹ of IRIS, an intentional resource indicator service. IRIS springs from the concept that end users should not be bogged down with network names when looking for a resource on the network, and should be free to describe what they want rather than where to find it. IRIS achieves the following in performing network searches for resources: (i) expressiveness in near-English queries, (ii) subjective searches leading to more relevant results, and (iii) complete transparency with increased usage. The IRIS design divides the nodes in a network into resolvers and end hosts. Resolvers and resources form a network overlay using soft-state communication. Network routing is coupled with user intent to dynamically map requests to available resources using this overlay, thereby allowing graceful join and leave procedures. We perform an algorithmic analysis of IRIS at an abstract level for the most computationally intensive steps, namely tree-based recursive searches. We also present implementation results in terms of memory, CPU usage, control signalling, and delay to demonstrate the performance of IRIS.
M. Venkataraman and M. Chatterjee are with the Department of Electrical and Computer Engineering at the University of Central Florida, Orlando, FL 32816. E-mail: {mukundan,mainak}@cpe.ucf.edu. P. Gupta and K. Muralidharan are with the Pervasive Computing Wing, Software Engg and Technology Labs, Infosys Technologies Ltd, Bangalore 560100, India. E-mail: {puneet gupta, kartik muralidharan}@infosys.com.
¹ For screen shots of IRIS working, please visit http://www.cs.ucf.edu/~mukundan

I. INTRODUCTION
Resource discovery in networks pertains to the ability to extract desired services directly from the network. With hardware costs diminishing and the ability to connect disparate devices to the network, the potential for resource discovery across a large ensemble of heterogeneous devices is a promise coming true. Traditional searches are rather objective in nature: a user submits a query that is interpreted literally, and a simple list of resources is presented to the user. This omits important criteria: the context and subjectivity of the person issuing a search for a resource. As far as hard-wired networks are concerned, resource discovery is both a manual and a static process (e.g., discovering something as basic as a printer on a new campus). The process is manual since the user has to explicitly configure his machine to identify available resources; it is static because, even if the user has configured a default printer that cannot handle color documents, calls requiring color prints will be issued to the default printer without prior warning. The only workaround is for the user to explicitly use his own understanding and judgement in selecting a resource. This process obviously kills transparency, and one cannot go on to do fancier things like delegate a document to the
least loaded printer or the nearest printer (while on the move), or connect only to, say, public or community printers. What is to be taken away is that these metrics are application specific, and there is only so much that a network-layer implementation of resource discovery can achieve.

A. Motivations for this work
The following factors motivate our design:
• Intentions: Computer networking is about two applications talking to each other, and it is the intention of the user sitting on top of the application that needs to be served. One of the foremost visions of pervasive computing is that devices around a user should serve him and his needs rather than him adapting to the idiosyncrasies of the devices. Taking into account the subjectivity and context of a query intuitively results in more meaningful search results.
• Extracting Resources: With an accepted trend that smart spaces and pervasive computing environments of the future will have deeply embedded devices that make computation transparent to the user, the ultimate challenge will be to extract relevant services efficiently from these environments.
• Low level naming paradigms: It is increasingly observed that in most distributed systems, naming of nodes for low-level communication leverages topological location and is in general independent of any application (see, e.g., [7]). In emerging classes of distributed networks, low-level communication does not rely upon network topological location. Rather, low-level communication is based on names that are external to the network topology and relevant to the application. Since data is now self-identifying, it enables activation of application-specific processing inside the network.
• Names should mean what, not where: IRIS springs from the concept that users should not be bogged down by network names when looking for a resource.
Rather, they should be able to express to the network what they want and not where to go about finding it (O’Toole and Gifford in [13]). Since IP addresses are synonymous with geographical locations, the first step is to break this coupling and introduce expressiveness, both from the users in what they want and from the resources in what they offer.

B. IRIS overview and contributions
We present IRIS, the Intentional Resource Indicator Service, for resource discovery in networked environments. The IRIS
topology consists of hosts, resolvers and resources. Hosts are conventional end users who interact with the network. “Resolvers” within the network can be either hosts or specialized devices that perform resource discovery with the aid of a database. We also devise a routing scheme tailored to IRIS for performing request-reply exchanges in such a network. Hosts generate queries that are subjectively parsed at the end hosts to generate an XML description of the user intention. This is sent to resolvers, which act in two steps: (i) identify, from a database, potential resources that can service this request; and (ii) communicate with the identified resources to get real-time information (such as level of loading, pricing, accessibility, proximity etc.) to further refine the solution space. The set of resources is then mapped into a reply XML and sent to the host. This is once again parsed at the host to display the resources in a meaningful order.
Apart from request-reply dialogues, IRIS goes a step further in establishing user guidance and complete transparency. When exact-match resources are not available, IRIS initiates a negotiation with the user to let him pick the best of what is available, and this is done by extensively using the history of searches to identify the most important criteria within a search. Also, with extensive usage, IRIS understands a user so completely that it itself disappears from the user’s view (in a sense, explicit queries need not be provided since past data is sufficient to construct intentions), and all that is left are user intentions that directly fetch the correct resource.
We have designed and implemented IRIS on two floors of the Infosys campus [1]; our code base runs to over 7000 lines of Java. In our quest to design IRIS, we achieve the following:
• We integrate user intention with message routing, and in effect fuse the application with network-level indirection, which traditional networks treat separately.
Since users do not specify a network address for a resource, they can connect to resources even though the mapping between end nodes and network addresses changes over time.
• IRIS uses a decentralized network of resolvers which perform the actual resource discovery in the network. A resolver overlay network is created entirely using soft-state beacons, both to discover and monitor resources and to maintain the resolver overlay. Entries not refreshed periodically are deleted, and this provides for seamless registration and de-registration of services. Resolvers further perform load balancing by spawning or terminating resolvers with growing or waning load, and in effect solve scalability problems.
• A non-intrusive design which requires minimal changes to the infrastructure already present. Instead of changing the fundamental resource, we create software overlays at the resolver that periodically monitor the status of the resource.
• We keep the design consistent with end-system intelligence (as is seen in the Internet [24]). In other words, intelligence is pushed to the network edges while keeping the network “core” dumb. This has specific advantages: (i) new applications can be seamlessly deployed by a simple host update; (ii) it allows for application-level service description rather than a network-layer implementation of it. Ferreting out resources based on hop count, available bandwidth or congestion can never match subjective requirements.
• We provide a basic blueprint for accommodating subjectivity in searches. IRIS keeps extensive track of past searches, allows users to express intent in free-form language, and evolves with and understands the user better in time. With extensive usage, IRIS itself disappears from the user’s perspective and all that is left is user intentions mapping straight to resources, providing complete transparency.
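The soft-state behaviour described in the list above (entries that are not refreshed are deleted, so joining and leaving need no explicit de-registration) can be illustrated with a minimal Python sketch. The class name `SoftStateTable`, its methods, and the 30-second lifetime are hypothetical choices for illustration only; the IRIS implementation itself is in Java and its internals are not given here.

```python
class SoftStateTable:
    """Registry whose entries vanish unless refreshed by periodic beacons."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds      # assumed lifetime of an unrefreshed entry
        self._last_seen = {}        # name -> time of the most recent beacon

    def beacon(self, name, now):
        """Register or refresh an entry; joining and refreshing are the same call."""
        self._last_seen[name] = now

    def expire(self, now):
        """Delete every entry whose last beacon is older than the TTL."""
        stale = [n for n, t in self._last_seen.items() if now - t > self.ttl]
        for n in stale:
            del self._last_seen[n]
        return stale

    def alive(self, now):
        """Return the names that are still considered present."""
        self.expire(now)
        return sorted(self._last_seen)


table = SoftStateTable(ttl_seconds=30)
table.beacon("printer-a", now=0)
table.beacon("printer-b", now=0)
table.beacon("printer-a", now=25)   # printer-a keeps refreshing
print(table.alive(now=40))          # printer-b left silently and has timed out
```

No message is needed when a resource leaves: it simply stops sending beacons and its entry ages out.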
Fig. 1. Schematic representation of what IRIS does.
II. RELATED RESEARCH
Resource discovery has been an issue since the early days of networking, and the earliest solutions were implemented in the network layer (in fact, as of today, this is the norm). Approaches that use this strategy are not directly comparable to IRIS since they have no provisions for capturing application-specific metrics for resource discovery. Fusing resource discovery with application-specific metrics is a very recent concept. One of the earliest proposals for mapping network names to IP addresses is the Domain Name System (DNS) [21] of the Internet.

A. Related Architectures
Sun’s Jini [16] provides a “federation of networked devices” over Java’s Remote Method Invocation (RMI) for spontaneous distributed resource discovery. Jini does not address dynamism or resource failure. Universal plug-and-play [20] uses a subset of XML to describe resources provided by devices, which is similar to our concept of using XML documents for “business cards”, but the design and philosophies differ greatly. The Service Location Protocol (SLP) [25], [23] achieves the discovery of heterogeneous network resources using centralized Directory Agents. The Berkeley Service Discovery Service (SDS) [14] extends this concept with
secure, authenticated communications. IBM’s “T” Spaces [17] provide a lightweight database over which nodes perform queries. However, this framework is highly optimized for static client-server applications rather than dynamic peer-to-peer interactions. Related literature with similar philosophies includes the Information Bus by Oki et al. [22], which allows applications to communicate with each other by providing only the subject of the data, as is the case with Salamander [18]. There has been work using such philosophies in entirely different domains, such as Jacobson’s multicast-based self-configuring Web caches [15], the diffusion-based approaches of Estrin et al. [6] for data aggregation in sensor networks, and the SPIN [10] protocol, also for sensor networks.
Numerous protocols use a resolver-like concept to aid network functionality. CARD [3] avoids global flooding and lets local (one-hop) resources be discovered easily, while making every node maintain distant “contacts” for discovering remote resources. Kozat et al. propose Network Layer Support [4] for resource discovery, and create virtual backbones for discovering and registering resources. Seada and Helmy propose Rendezvous Regions [5], dividing a network topology into geographic regions, with each region responsible for a set of keys representing services in that region, and subsequently mapping this information by electing a few nodes to map the entire network. Cheng and Marsic propose piecewise network awareness [8] to leverage service advertisement and discovery as well as network awareness techniques. This work, however, focuses on reducing the wireless bandwidth consumption and battery energy of mobile devices. Tilak et al. [9] propose dynamic resource discovery for sensor networks, allowing a modular implementation to discover dynamic and heterogeneous resources with a small tradeoff between inter-operability and energy efficiency.
Doval and O’Mahony propose NOM [11] for resource discovery in evolving networks, where nodes and resources may switch on and off or devices may move physically. Sailhan and Issarny propose scalable service discovery for ad hoc networks [12] based on dynamic and homogeneous deployment of cooperating directories within the network. Stann and Heidemann in BARD [19] identify resource discovery as the most expensive step in data dissemination in sensor networks, and counter global flooding by modeling searching and routing as a stochastic process. Note, however, that none of the foregoing protocols focuses on application-centered resource discovery, subjective interpretation, or an attempt to understand and evolve with the user. Also, we are not aware of protocols suited to or optimized for planned network deployments such as enterprise or home networks.

B. How IRIS differs
The design and goals of IRIS strongly overlap with those of INS [2], but the implementation and areas of focus differ greatly: (i) IRIS believes in incremental pervasive ability, and is friendlier to existing infrastructure. Little needs to be modified to set up IRIS (INS requires changes in resource drivers as well as compatibility with Chord, a peer-to-peer lookup service that complements INS); (ii) IRIS uses XML
for data exchange, and XML has emerged as a convincing standard for such purposes in the Internet; (iii) INS is highly tuned for self-collaborative networks (like ad hoc networks), and established infrastructures would gain little from it. IRIS, on the other hand, has been tuned more towards deployment in enterprise mobility solutions; (iv) IRIS lets the user intent take foremost focus, and allows the user to express himself freely in language. IRIS strives to understand the meaning of a user’s intent in a subjective fashion. INS does not provide any subjective interpretation; (v) IRIS understands the user more and more with time, to a point where IRIS itself disappears from the user’s view, and all that is left is the user’s “understood” intentions and his domain of resources; (vi) IRIS does not propagate changes through the entire network, and makes changes to local nodes within a hop only. This makes the design extremely scalable and avoids excess network traffic; (vii) IRIS proposes the use of a new routing framework that not only leverages an IRIS-like philosophy, but also suits planned deployments of IT infrastructure such as enterprises. Like INS, IRIS can complement such architectures. Unlike DNS, where resolvers form a static overlay, resolvers in IRIS collaborate with each other and prove resilient and dynamic.

III. SYSTEM DESIGN
A. Hosts: Understanding user Intentions
The IRIS host is internally partitioned into an Application layer and a Processing/Intelligence layer. IRIS lets users specify their intent using free-form strings. The text is primarily English to begin with. We present here a basic framework for capturing user intents which adapts itself to the user in time. IRIS comes with an initial built-in vocabulary of keywords; this collection is based on a survey in which we asked over a hundred people to list the things they think are appropriate when describing a printer resource.
Users are free to add keywords to the vocabulary to further customize IRIS with time. In general, we identify a user intent to be of the following format: (adjective, proximity, urgency, flexibility, attr_1, attr_2, ..., attr_n). A tabulation of initial possibilities is listed in Fig. 2. These are further elucidated:
• Adjectives: An adjective is a word that describes in a relative fashion how the resource should be, based entirely on a user’s subjectivity. Its interpretation differs from user to user. Take the example of a “good” printer. This might mean a color laser printer to one user and an ink-jet to another. To let IRIS interpret such intentions, we maintain a customization file in XML format for every user. The file is initially populated with definitions derived from “most probable” understandings. However, as a user continues to use IRIS and makes selections of a resource for a given query, IRIS records the various attributes of the resource chosen by the user against its initial interpretation. This lets the customization evolve according to the user. Adjectives are further classified as strong and weak adjectives. A strong adjective describes many things about a resource that need not be explicitly mentioned.
Keywords            Usage                                 Defaults
Strong Adjectives   “excellent, good”                     Color printer, high res.
Weaker Adjectives   no adjectives; “Med, low, any”        black-white, lower resolution
Proximity           “near, nearest, close, closest”       nearest printer
Urgency             “least-loaded, fast, urgent”          least loaded printer
Flexibility         Negotiable: “try”                     flexible
                    NotNegotiable: “hard”                 Inflexible
Other Attributes    Color: “color, bw”                    Color capabilities
                    Access: Public, private               Defaults
Fig. 2. Tabular representation of the default customization (stored as an XML) of adjective interpretations. Each field in the left hand column is a node in the XML tree that is parsed according to the inference presented. The definitions, however, change with continuous usage.
• Proximity: The user can explicitly declare whether the resource he wants should be close to him, or whether he is not bothered about its proximity as long as he gets what he wants. Words like “near, nearest, close, closest” etc. trigger this clause. This is an assumed metric by default².
• Urgency: This describes how fast the user expects the turnaround time for the resource to be. Words like “fast, urgent, hurry” etc. indicate such an intent. In the appropriate context of the resource, we understand this to mean something which is least loaded, or something which can process a request really fast. For the printer example, this might mean a “least-loaded” printer (one with a small or empty queue of pending jobs).
• Flexibility: The user might use several attributes to describe a resource, but not necessarily want all of them. In other words, there is an implicit relative importance amongst the attributes themselves. Based on past selections of resources for given intentions, a reverse priority list of attributes is created (from the least important attribute sought to the most important one). If an exact match cannot be found, IRIS keeps knocking off the least important parameter from the query and searching again, until all resources of that type (wildcard) are found.
• Other Attributes: These are other attributes that are specific to the resource in question. For example, attributes like black-white or color, a Xerox or an Epson printer and so on fall in this category.
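As an illustration of the intent format above, the following Python sketch maps a free-form query onto the dimensions (adjective, proximity, urgency, other attributes) by matching words against a keyword vocabulary. The `VOCABULARY` dictionary is modeled loosely on Fig. 2 and the function name `extract_intent` is hypothetical; the actual IRIS host keeps its definitions in a per-user customization XML file and is written in Java.

```python
# Hypothetical vocabulary modeled on Fig. 2; not the actual IRIS customization file.
VOCABULARY = {
    "adjective": {"excellent", "good", "med", "low", "any"},
    "proximity": {"near", "nearest", "close", "closest"},
    "urgency": {"least-loaded", "fast", "urgent", "hurry"},
    "flexibility": {"try", "hard"},
    "color": {"color", "bw", "black-white"},
    "access": {"public", "private"},
}


def extract_intent(query):
    """Return {dimension: keyword} for every recognized word in the query."""
    intent = {}
    for raw in query.lower().split():
        word = raw.strip(".,;!?\"'")          # drop surrounding punctuation
        for dimension, keywords in VOCABULARY.items():
            if word in keywords:
                intent.setdefault(dimension, word)  # first match per dimension wins
    # Proximity is treated as an assumed metric when the user does not mention it.
    intent.setdefault("proximity", "nearest")
    return intent


intent = extract_intent("I need a good black-white printer, and fast")
print(intent)
```

Words that match no keyword (such as "need" or "printer" in this toy vocabulary) are simply ignored, which is what lets the user phrase the query freely.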
B. Explicit Query to IRIS
As IRIS starts, a GUI pops up with a simple text box to accept user queries. When we look for a resource, we first form a “vague” description of what we are looking for in our heads. It is easy to recall the last time one wanted to go to a restaurant or a movie, wanted to buy something, or tried to look for a printer on a new campus. One intuitively decides on what one wants with a simple sentence of the form “I need a good black-white printer, and fast”. The motive of IRIS is to let you stop at that intent generation.
This query string is a bunch of unprocessed data. IRIS first extracts from it a set of “tokens”. A token is defined as a word which potentially means something to IRIS. Tokens are found by comparing individual words to known keywords, and the end result is a set of attributes for the resource in question. For the given query, the keywords recognized at this point are (good, black-white, public, printer). To generate the values associated with the attributes, the customization XML is parsed (in accordance with the current state of definitions, as in Fig. 2). This yields a set of attribute-value pairs used to define a user intent, which in effect is a request-XML. Going with the foregoing example, the request-XML thus constructed is shown in Fig. 4. Note that although the adjective “good” defaults to a color printer, the presence of the token “black-white” is reflected in the “Mode” field in the request.

² See Lemma 4 in Section IV for the rationale behind this.

Fig. 3. Input query processing modules at the IRIS host

[Request-XML listing; markup lost in extraction, surviving values: BlackWhite, 1024x680, Public, Laser, 19, 1]
Fig. 4. XML describing the user intent (request-XML) for the example case of “good black white public printer”

C. IRIS Guidance
It is possible that the user wanted something and no exact match could be made, for one of two reasons: (i) a resource with such parameters is simply not available in the network, or (ii) the user wanted proximity to be important, and such a resource is not available in the immediate vicinity. In such cases, we let IRIS guide the user to a resource.
In the former case, it is not possible to satisfy the user completely. What can be done, however, is to present the user with the nearest match, the one that most closely matches his set of parameters in the right order of priority. IRIS starts to build a reverse priority list with increased usage, which is a list of the most important parameters in a search in reverse order (the least important parameter first). This is built by recording the most frequently occurring attribute and its corresponding value in the choices of resources made by the user in the past for his given intents. This lets IRIS understand the parameters most important to the user for a given resource over a long period of time. For example, it is possible to have a list such as this: proximity, color, resolution, load. When guidance is looped, IRIS will knock off the first attribute (proximity) from the query (if present) and make an attempt, then the second attribute, and so on until only the most important parameter remains.
In the latter case, when the user wanted proximity and the resource was not available in the immediate vicinity, the resolver in the zone contacts neighboring resolvers with the user intent. The neighboring resolvers in turn perform a search, and the net result is communicated back to the user (with duplicate suppression). This process can go on, with more resolvers being incrementally contacted within a radius of two, three, or more hops, until either a match is found or the user selects an appropriate resource. Note that the user’s intent is still served best, because IRIS still tries to find a resource in as few hops as possible.

D. I “Trust” IRIS
In time, IRIS adapts to the user. It records the queries typed by the user and the state of the query as it was when a selection of a resource was made (this is more relevant when the user was flexible and negotiation continued until a resource was chosen).
The “state” of the query is defined in terms of the dimensions used to describe a resource: adjective, proximity, urgency, flexibility and other attributes. Note that history is recorded only if the user commits to a resource by selecting it. When extensive profiling has been performed, it is possible to say nothing to IRIS and let it do its best from what it understands of the user. By tracking history, IRIS itself constructs a query, parses it in the user’s context, and constructs a request-XML.
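The guidance loop of Section III-C can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the history is taken to be a list of attribute sets from past committed selections, importance is approximated by how often an attribute appears in that history, and the names `reverse_priority` and `guided_search` as well as the sample resources are hypothetical.

```python
from collections import Counter


def reverse_priority(history):
    """Order attributes from least to most frequent in past selections.

    Assumption: frequency in the recorded history stands in for importance.
    """
    counts = Counter(attr for selection in history for attr in selection)
    return [attr for attr, _ in sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))]


def guided_search(query, resources, drop_order):
    """Relax the query one attribute at a time until some resource matches."""
    query = dict(query)
    pending = [a for a in drop_order if a in query]
    while True:
        matches = [r["name"] for r in resources
                   if all(r.get(k) == v for k, v in query.items())]
        if matches or not pending:
            return matches, query
        del query[pending.pop(0)]   # knock off the least important attribute


history = [["color", "resolution", "proximity"], ["color", "resolution"], ["color"]]
order = reverse_priority(history)

resources = [
    {"name": "p1", "color": "bw", "proximity": "far"},
    {"name": "p2", "color": "color", "proximity": "near"},
]
matches, relaxed = guided_search({"proximity": "near", "color": "bw"}, resources, order)
print(order, matches, relaxed)
```

In this toy run no printer is both near and black-white, so the least important attribute (proximity) is dropped and the search succeeds on color alone. If every attribute is dropped, the empty query acts as the wildcard match.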
Fig. 5. Topology layout for IRIS. The given topology is logically partitioned into zones, with at least one resolver per zone.
E. Resolvers and Resource Overlays
Resolvers are network entities that aid in the performance of the network. They can be either conventional hosts or specialized devices. Resolvers are passive entities that wait on well-known ports both for resources to connect to the network and for hosts to submit queries. Resolvers also perform resource monitoring via resource overlays. The concept of a resource overlay springs from the requirement that we make few changes to the existing infrastructure and embed the entire logic in software. Probing for real-time data about a resource (like the queue length) becomes a lot easier if the resource co-operates with periodic broadcasts of real-time information about itself into the network. Taking the printer example, this might require changes to the device driver code, thus changing the fundamental “printer”. To counter this, we design a resource proxy. This is a piece of code that resides not in the resource but in the resolver. As soon as a resource is identified, a piece of code for that resource type is started. It performs the necessary monitoring of the resource and keeps the resolver updated.
Resolvers have two main functions: (i) parse incoming request-XMLs from hosts to identify user intention, query the database to find resources that match the request, and communicate with the resources to extract real-time information; and (ii) wait for resources to connect to the network. Once a new resource is identified, static information (immutable fields like name, type, capabilities etc.) from the resource is entered into the database. An overlay for that resource is additionally created that continues monitoring it. We allow resolvers to spawn additional resolvers if the load on a particular grid exceeds a certain limit, and allow spawned resolvers to terminate themselves if the load falls beneath a threshold. Such designs are popular, and work well as far as load balancing and congestion are concerned.

IV. NETWORK TOPOLOGY AND ROUTING
Conventional routing schemes for ad hoc networks or static infrastructures are in general unsuitable for an IRIS-like architecture for a variety of reasons. Since resources are to be discovered within a given radius of the user, we need a mix of controlled flooding and extensive interactions between resolvers and available resources. The IRIS design gains little from schemes like AODV [29] and DSR [30]. Assuming a transmission radius of 50 meters, we conducted a simple experiment with 10 hosts within a typical IRIS grid of 50 × 50 meters. The number of control messages used by AODV and DSR rises exponentially with the number of hosts. To counter this, we designed a topology control mechanism and routing scheme to better suit the needs of IRIS.
Fig. 6. Host-Resolver-Host interactions
Fig. 7. Abstract representation of the parsing file
A. Topology maintenance
The IRIS topology is logically partitioned into grids, and we place at least one resolver per grid. Assuming a transmission range of $\lambda$ meters, we choose grid sizes to be $\lambda/2 \times \lambda/2$. Resolvers periodically advertise themselves using beacons, which are used by hosts and other neighboring resolvers to maintain a list of proximate resolvers. Since the transmission range extends well into the adjacent grids, hosts end up having a strong list of potential resolvers to service requests. When hosts (and neighboring resolvers) fail to receive three consecutive beacons from a resolver, it is marked “down” and its entry is deleted from the neighbor table. In effect, our routing table is simple in that it only requires local state to be maintained. No global routing state is maintained. This has direct consequences: (i) resources may join and leave without registering or de-registering with any central repository, and the current list is assured to be stable and correct; (ii) users continue to be mapped to resources even though such a mapping may change in time with network dynamics; and (iii) when a resolver fails, the failure is gracefully handled with simple changes in the neighbor tables in its local vicinity.

B. Simple queries
Hosts generate a request-XML based on user intentions and send it to the nearest resolver over an HTTP connection for parsing. The choice of a resolver is based purely on proximity (link quality, or location information when available), although more sophisticated metrics may be applied (like the load on the resolver itself, for example). The resolver extracts from the database the available resources that match the query, and contacts the resources for real-time information about them.

C. Routing IRIS Guidance
When an exact match to a query cannot be made, or if the user is not satisfied with the results, IRIS guidance is looped. This usually means that resources available in one hop cannot satisfy the user. In such cases, the gateway resolver which could not service the request contacts its neighboring resolvers with a copy of the query. These resolvers in turn extract available resources in their vicinity. The resulting set is then collated at the gateway resolver to suppress duplicate resources, and the list is communicated back to the user. This is depicted in Fig. 5.

V. PERFORMANCE ANALYSIS
The most computationally intensive steps are the ones involving recursive parsing. The following lemmas establish an algorithmic analysis of IRIS at an abstract level of tree-based recursive searches. Refer to Fig. 7 for the abstract model chosen for XML representations. The following table defines the symbols used in our lemmas and corollaries:

$n_a$     Number of attributes present
$d$       One half of the depth of name tree
$r_a$     Range of attributes possible
$r_v$     Range of values possible
$T(d)$    Time to parse at depth $d$

Lemma 1: The number of non-conflicting parameters/attributes required to completely define a resource has a strict upper bound.
Proof: A non-conflicting attribute is one that does not contradict another attribute already present in the query (e.g., a black-and-white color printer is conflicting, in the sense that the user wants a color printer after explicitly asking for a black-and-white-only printer). This lemma states that there is a definite number of attributes, $k$, that can be used to completely define a resource type. When an end user tries to describe a resource to IRIS, he uses a limited set of attributes to describe what he wants. In normal circumstances, it takes a maximum of six attributes to precisely describe an intent to IRIS.

Lemma 2: Recursive tree parsing has a complexity of $\Theta(n_a^d)$ for a tree of depth $d$.
Proof: Parsing of the XML document precisely describing the attribute-value pairs, or av-pairs, is the most computationally
intensive task at the end host. An XML document may be abstracted to a $k$-ary tree, where $k$ is the number of attributes possible at a given node. Refer to Fig. 7 for references to tree parsing in this lemma. Denote by $T(d)$ the time required to parse a node at depth $d$. We have the following expression describing the recursion:

$$T(d) = n_a \left( t_a + t_v + T(d-1) \right)$$

For an XML document with strict ordering of attributes, values are only present at the leaves, and there is only one value for an attribute. Intermediate nodes are heads of other nodes. We hence have $t_v = c$, where $c$ is a constant. The expression $t_a + t_v$ can be rephrased as $t_a + c$ or simply $t$, where $t = t_a + c$. Rephrasing the previous recursion, we have:

$$T(d) = n_a \left( t + T(d-1) \right)$$

Assuming the anchor case $T(0) = b$, we have on expansion:

$$T(d) = n_a \left( t + T(d-1) \right) = n_a t + n_a \cdot n_a \left( t + T(d-2) \right) = \Theta\left( n_a^d (t + b) \right) \qquad (1)$$
A linear search results in $T(d) = \Theta\left( n_a^d \cdot (r_a + r_v + b) \right)$, where $r_a$ and $r_v$ are the ranges of the attribute and value fields. This is true because, for the search tree, $t_a \propto r_a$ and $t_v \propto r_v$ for linear searches. This complexity rises with the value of $n_a$, but as observed in Lemma 1, there is a finite upper bound to it. During implementation, we have found $n_a < 6$. XML parsing of this order happens in the following cases: (i) parsing of the customization XML at the end host to understand the semantics of the query, (ii) XML parsing at the resolver to decode the av-pairs of the query, and (iii) parsing of the result XML sent by the resolver to the host. Of the three cases, we have found the second (parsing the query at the resolver) to be the most extensive, since the file may contain all of the possible parameters.

Corollary: IRIS guidance is a linear cumulation of tree-based recursive searches.
Proof: This corollary analyzes IRIS guidance. Assume a user query with $n$ attributes. When IRIS returns a set of results based on this query, and the user wants to broaden the search within the existing query, IRIS knocks off the least important parameter (leaving $n - 1$ parameters). If the user is still not satisfied, it knocks off one more parameter, and so on until a wildcard “*” match of all resources of that type is made. The following expression captures the complexity of this step:
T (R) = T (n) + T (n − 1) + T (n − 2) + . . . + T (n − k) where k is the number of times the user lets the guidance loop (for k ≤ n). Upon expansion using Equation (1), we have:
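\[
\begin{aligned}
T(R) &= \sum_{k=0}^{k \le n} T(n-k) \\
     &= \sum_{k=0}^{k \le n} \left( \frac{(n-k)^d - 1}{(n-k) - 1} \cdot t + (n-k)^{d-1} b \right) \\
     &= \sum_{k=0}^{k \le n} \Theta\bigl( (n-k)^d (t + b) \bigr) \\
     &= \Theta\bigl( n^d (t + b) \bigr)
\end{aligned}
\tag{2}
\]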
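The cumulative cost of guidance can be illustrated numerically as well. This is a minimal sketch under stated assumptions: `parse_cost` restates the recursion of Equation (1), `guidance_cost` is a hypothetical helper that sums $T(n), T(n-1), \dots, T(n-k)$ while treating the number of remaining query attributes as the branching factor, and the numeric values are arbitrary.

```python
def parse_cost(depth, branching, t, b):
    """Recursion of Equation (1): T(d) = branching * (t + T(d - 1)), T(0) = b."""
    if depth == 0:
        return b
    return branching * (t + parse_cost(depth - 1, branching, t, b))


def guidance_cost(n, k, depth, t, b):
    """Sum the parse cost over k guidance rounds, dropping one attribute per round."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return sum(parse_cost(depth, n - j, t, b) for j in range(k + 1))


# Hypothetical query: n = 4 attributes, tree depth 2, t = b = 1 time unit.
single_query = guidance_cost(4, 0, 2, 1, 1)   # no guidance, k = 0
two_rounds = guidance_cost(4, 2, 2, 1, 1)     # user loops guidance twice
print(single_query, two_rounds)
```

Each additional guidance round contributes a term no larger than the first, so the first term, $T(n)$, dominates the sum.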
A single query (with no guidance) is the special case $k = 0$. As the value of $k$ increases, the values of $(n-k)^d$ diminish and converge to the case in Lemma 2.

Lemma 3: The more localized the traffic patterns, the greater the saving in bandwidth, memory and energy consumption of participating devices.
Proof: To operate successfully, IRIS needs a resolver for every networked area. If the area to be IRIS-enabled is large, it is logically broken down into smaller grids that exhaustively cover the entire area, with each grid having at least one resolver. In other words, every host should have at least one resolver in its proximity. For reasons of load balancing and robustness, and assuming the transmission range of a resolver to be $\lambda$ meters, we take the grid size (see footnote 3) to be $\lambda/2 \times \lambda/2$. Consider a topology of $p \times q$ square meters. With a resolver range of about $\lambda$ meters, the minimum number of resolvers needed is:

\[ R_n = 4 \times \left\lceil \frac{p \times q}{\lambda \times \lambda} \right\rceil \tag{3} \]

Consider an epoch with $n$ sessions, with session $i$ generating $x_i$ packets, such that the total number of packets generated in the ideal case is:

\[ P_{ideal} = \sum_{i=1}^{n} x_i \tag{4} \]
$P_{ideal}$ is the total number of packets generated at the host; for the most efficient usage of bandwidth and energy (a goodput of one), no more than this number should be transmitted. Resolvers generate duplicates that permeate the entire resolver network only for inter-group communication. In normal cases, they return the most imminent resource available for usage. Denote by $p$ the probability that a session is local (one hop) and by $\bar{p}$ the probability that it is not local (this $p$ is distinct from the topology dimension $p$ used above). It follows that:

\[ \bar{p} = 1 - p \tag{5} \]
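To make the quantities in this proof concrete, the following sketch computes the resolver count of Equation (3), the ideal packet count of Equation (4), and the expected total when every non-local session is duplicated across all $R_n$ resolvers, which is the form the proof arrives at in Equation (6). The function names, the 100 m by 50 m area, $\lambda = 25$ m and the session sizes are hypothetical values chosen only for illustration.

```python
import math


def min_resolvers(p_m, q_m, lam_m):
    """Equation (3): R_n = 4 * ceil((p * q) / (lambda * lambda))."""
    return 4 * math.ceil((p_m * q_m) / (lam_m * lam_m))


def ideal_packets(session_packets):
    """Equation (4): packets generated at the hosts, with no duplication."""
    return sum(session_packets)


def total_packets(session_packets, local_probs, r_n):
    """Equation (6): sum over sessions of (R_n + p_i * (1 - R_n)) * x_i."""
    return sum((r_n + p_i * (1 - r_n)) * x_i
               for x_i, p_i in zip(session_packets, local_probs))


# Assumed topology and workload (not from the IRIS deployment).
r_n = min_resolvers(100, 50, 25)
sessions = [10, 20, 30]
for p_local in (1.0, 0.5, 0.0):
    total = total_packets(sessions, [p_local] * len(sessions), r_n)
    print(p_local, total, total - ideal_packets(sessions))
```

With fully local traffic the total equals the ideal count, and the excess grows linearly as the probability of a local session falls.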
Using Equation (4), we have the total number of packets generated as:

\[
\begin{aligned}
P_{Total} &= \sum_{i=1}^{n} \left( p_i \cdot x_i + 4 \times \bar{p}_i \cdot \left\lceil \frac{p \times q}{\lambda^2} \right\rceil \cdot x_i \right) \\
          &= \sum_{i=1}^{n} \left( 4 \cdot \left\lceil \frac{p \times q}{\lambda^2} \right\rceil + p_i \left( 1 - 4 \cdot \left\lceil \frac{p \times q}{\lambda^2} \right\rceil \right) \right) x_i \\
          &= \sum_{i=1}^{n} \bigl( R_n + p_i (1 - R_n) \bigr) x_i
\end{aligned}
\tag{6}
\]

The number of excess packets generated is as follows:

\[ P_\delta = P_{Total} - P_{ideal} \tag{7} \]

For optimum protocol performance, $P_\delta$ should tend towards zero, that is, $P_\delta \to 0$. Restating this using the previous derivations:

\[ \sum_{i=1}^{n} \bigl( R_n + p_i (1 - R_n) \bigr) x_i - \sum_{i=1}^{n} x_i \to 0 \tag{8} \]

The above relation shows that the larger the value of $p$ (in other words, $p \gg \bar{p}$), the fewer duplicates are generated and the closer the system comes to ideal bandwidth and memory usage. This observation can be extended to general observations of traffic patterns.

Footnote 3: For such a configuration, a host is never left without a resolver even if its imminent resolver fails. This is because resolvers in adjacent grids can easily service the host, since their transmission range extends deep into adjacent grids [27]. Having multiple candidate resolvers also alleviates growing congestion and helps load balancing.

VI. IMPLEMENTATION PERFORMANCE

We have implemented IRIS in specific buildings at the Infosys Bangalore Campus [1]. IRIS was run on standard IBM PCs, each configured with an Intel P4 and 256 MB of memory. We use SQL-Server for the database and operate with Xerox DocuPrint printers, along with inkjets for direct implementation. We analyze IRIS performance for CPU utilization, bandwidth, memory and time. We profiled IRIS to analyze it in real time. IRIS was run in isolation from all other applications at the time of the tests. However, depending on utilization by other machine modules already running (particularly the operating system and background processes), the CPU and memory utilizations return different, albeit extremely close, values on different runs. Each test was run ten times, and we recorded the average over the runs.

[Fig. 8. Map of the floor where IRIS is implemented.]

[Fig. 9. Performance of name-lookup v/s number of names in name space. Axes: Number of lookups (per sec) versus Number of Names in Namespace.]

A. Name lookups
One factor governing scalability is the ability to handle a large number of requests. We inserted random entries into the customization.xml file, growing it from 1 to 10,000 entries in steps of 20, and recorded the time taken to perform a single name resolution (say $t_r$). The inverse of this ($1/t_r$) indicates the number of queries per second that can be handled at a host (see Fig. 9), since $t_r$ gives a precise estimate for an independent query. The number of name resolutions fell from a maximum of around 11000 per second (for a single entry in the customization file) to a minimum of around 948 per second for 10,000 entries. The slope of the graph is around -1.005 lookups per second per name. Since the magnitude of the slope is close to one, performance does not degrade too fast as the namespace grows.

B. Memory and CPU utilization

For the same experiment, we recorded the memory allocated by the Java interpreter. The memory allocated should exceed the size of the customization.xml file, which houses the name resolution information, by a constant amount. The allocation was more or less uniform and greater than the actual file size by a small constant amount, owing to system pointers and computational workspace. The allocation reached a maximum of 1.48 MB. We have re-plotted the INS result in the same graph (Fig. 11) for a direct comparison. The values differ because the name-resolution files in INS are a lot more informative and bulkier. We also recorded CPU and memory saturation (Fig. 10) for an increasing number of names in the customization file. Our recordings are consistent with those of INS, and we found that the CPU saturated faster than bandwidth. The CPU saturated at around 9800 entries, almost twice the value for INS, largely because we use an Intel P4 processor at a different clock speed from the INS experiments. Memory utilization was a mere 3.45% (of the available 256 MB) even at 20000+ entries.
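The throughput measure used in the name-lookup experiment can be sketched as follows. IRIS itself runs in a Java interpreter and the layout of customization.xml is not given here, so the element names, the linear scan in `resolve` and all helper names are assumptions for illustration. Only the endpoint figures (about 11000 lookups per second at 1 entry and about 948 at 10,000 entries) come from the text.

```python
import time
import xml.etree.ElementTree as ET


def build_namespace(num_entries):
    """Build a hypothetical customization document with num_entries names."""
    root = ET.Element("customization")
    for i in range(num_entries):
        ET.SubElement(root, "entry", {"name": f"name{i}"})
    return root


def resolve(root, name):
    """Linear scan for an entry by name; returns the element or None."""
    for entry in root.iter("entry"):
        if entry.get("name") == name:
            return entry
    return None


def time_single_resolution(root, name, repeats=10):
    """Average wall-clock time t_r of one resolution, over several runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        resolve(root, name)
    return (time.perf_counter() - start) / repeats


def lookups_per_second(t_r):
    """Throughput estimate 1 / t_r for independent queries."""
    return 1.0 / t_r


def slope(n1, rate1, n2, rate2):
    """Slope of lookups per second against namespace size."""
    return (rate2 - rate1) / (n2 - n1)


# Slope implied by the endpoints reported in the text.
print(round(slope(1, 11000, 10000, 948), 3))
```

Averaging over ten repetitions mirrors the ten runs per test described in this section; absolute timings from this sketch are not comparable to the reported figures.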
[Fig. 10. Percentage utilization of CPU, Memory and Bandwidth. Axes: Saturation (percentage) versus Number of Names in Namespace (x10000); series: CPU, Bandwidth, Memory.]
[Fig. 11. Size of the namespace in mega-bytes. Axes: Size of Namespace in interpreter (MB) versus Number of Names in Namespace; series: IRIS, INS.]

C. Name discovery

We handle network changes with a slightly different philosophy: we consider changes the norm rather than the exception. When a resource can no longer be monitored by a resolver through its overlay, or stops advertising itself with soft state information, we do not propagate this to the entire network. Instead, only the resolver(s) directly monitoring the resource make a silent note of it. A request to a resolver for a resource is resolved with the aim of returning the closest match to the user, unless the user explicitly asks otherwise. In effect, we try to keep network traffic as local as possible, in accordance with Lemma 3. We plot the time to discover a resource (not a new name) as a function of the number of hops between it and the host (Fig. 12). We found that the time taken is dominated by message routing more than by any other factor. Discovering resources in the vicinity takes the least time and is the most efficient as far as time, memory, CPU and bandwidth utilization are concerned. However, even as the number of hops from the host increases, the time taken is small. This is because inter-resolver routing is largely based on controlled flooding, which takes the least time compared to any routing protocol. This is evident since the time to route a packet along a cached route is very comparable to routing without a cache. We have reproduced the INS time to discover a new name alongside for a direct comparison.

The number of excess packets generated may grow as the number of hops from the host increases, though the time taken always remains a bare minimum since the method is controlled flooding. However, we found the average number of packets generated in a typical IRIS session to be much lower than the average generated with other routing protocols, for a combination of the following reasons: (i) IRIS tries its best to direct a user to imminent resources, since it naturally tries to avoid flooding. This works well in two ways: the user gets a resource for immediate consumption, and excessive inter-resolver traffic is avoided. (ii) IRIS contacts neighbor resolvers incrementally: if a resource is not found immediately and the user lets the guidance loop, IRIS contacts its immediate neighbors first and reports results; upon more calls for guidance, it slowly propagates deeper into the network. In effect, even flooding the resolver network is done slowly and with extreme caution. (iii) Triggered updates typically use a large number and variety of control packets, which have to be transmitted to the entire network. Since this is avoided altogether, a large number of overhead packets are avoided.

D. Quantifying user satisfaction
We collected opinions from 35 end users of IRIS to study user satisfaction. Subjects were asked to report a mean opinion score (MOS) in the range of 0 to 5, with 0 being the worst opinion and 5 the best. Particular attention is paid to the variation of this score with increased usage, to capture the effectiveness of IRIS in understanding its end user. For this experiment, we consider 18 printers scattered across two floors. The average MOS obtained at various usage levels is shown in Fig. 13. Users were able to add new words to the IRIS vocabulary with increased interactions, and we found that the MOS value shot up in particular once IRIS understood these new words. Satisfaction being a subjective metric, no two users reported a similar variation in MOS with increased usage. The plot nevertheless shows increased satisfaction with continued usage, establishing the effectiveness of subjectivity based searches.

[Fig. 12. Performance versus the number of hops between the host and the desired resource in question. Axes: Time (ms) versus Number of Hops; series: IRIS (No Cache), IRIS (Cached), INS.]

[Fig. 13. Mean Opinion Score as reported by 35 subjects using IRIS to discover “printers” in a campus with varying levels of usage. A score of 0 denotes the worst opinion while 5 denotes the best opinion. Axes: Mean Opinion Score (MOS) versus IRIS usage (number of times).]

VII. CONCLUSION

Resource discovery is an important problem to solve, and with resources becoming deeply embedded in dense networked environments, revenue generation will come from advertising and discovering resources. We believe that resource discovery with the user, his intention and his applications as the nucleus is an excellent way of approaching this. We fully support work like INS, and our design complements such approaches. We have successfully designed, implemented, tested and deployed IRIS, an intention based resource discovery mechanism that lets users freely sketch their imagination, understands them, and evolves to understand more and more of the user with increased usage, to a point where IRIS itself disappears and the user's understood intentions always find him his type of resource. We have deployed IRIS for “printers in a building”. We have achieved the following in our quest for such a design: precision, user satisfaction, context awareness, free form English representation of queries, scalability and a stable design. We conclude this paper with the hope that this work spurs more research towards ubiquitous computing.

REFERENCES

[1] Infosys Technologies Limited, http://www.infosys.com.
[2] W. Adjie-Winoto, E. Schwartz, H. Balakrishnan and J. Lilley, “The design and implementation of an intentional naming system”, Proc. 17th ACM Symposium on Operating Systems Principles, pp. 186-201, Kiawah Island, SC, Dec 1999.
[3] Ahmed Helmy, Saurabh Garg, Nitin Nahata, Priyatham Pamu, “CARD: A Contact-based Architecture for Resource Discovery in Wireless Ad Hoc Networks”, MONET 10(1-2): 99-113, 2005.
[4] Ulas Kozat, Leandros Tassiulas, “Network Layer Support for Service Discovery in Mobile Ad Hoc Networks”, IEEE INFOCOM 2003.
[5] Karim Seada, Ahmed Helmy, “Rendezvous Regions: A Scalable Architecture for Service Location and Data-Centric Storage in Large-Scale Wireless Networks”, IPDPS 2004.
[6] D. Estrin, R. Govindan, J. Heidemann and S. Kumar, “Next Century Challenges: Scalable Coordination in Sensor Networks”, In Proc. ACM/IEEE MOBICOM, pp. 263-270, August 1999.
[7] J. Heidemann, F. Silva, C. Intanagonwiwat, R. Govindan, D. Estrin and D. Ganesan, “Building Efficient Wireless Sensor Networks with Low Level Naming”, In ACM SOSP’01, Banff, Canada, 2001.
[8] L. Cheng and I. Marsic, “Piecewise Network Awareness Service for Wireless/Mobile Pervasive Computing”, Mobile Networks and Applications (7), Kluwer Publications, pp. 269-278, 2002.
[9] S. Tilak, K. Chiu, N. B. Abu-Ghazaleh and T. Fountain, “Dynamic Resource Discovery for Wireless Sensor Networks”, Network Centric Ubiquitous Systems, 2005.
[10] W. Heinzelman, J. Kulik, and H. Balakrishnan, “Adaptive Protocols for Information Dissemination in Wireless Sensor Networks”, Proc. 5th ACM/IEEE Mobicom Conference, Seattle, WA, Aug 1999.
[11] Diego Doval and Donal O’Mahony, “Nom: Resource Location and Discovery for Ad Hoc Mobile Networks”, Proc. 1st Annual Mediterranean Ad Hoc Networking Workshop, Medhoc-Net, 2002.
[12] F. Sailhan and V. Issarny, “Scalable Service Discovery for MANET”, Proc. 3rd Intl Conf. on Pervasive Computing and Communications (PerCom’05), 2005.
[13] J. O’Toole and D. Gifford, “Names should mean what, not where”, Proc. 5th ACM European Workshop on Distributed Systems, September 1992. Paper No. 20.
[14] S. Czerwinski, B. Zhao, T. Hodes, A. Joseph, and R. Katz, “An Architecture for a Secure Service Discovery Service”, In Proc. ACM/IEEE MOBICOM, pp. 24-35, August 1999.
[15] V. Jacobson, How to Kill the Internet, Talk at the SIGCOMM 95 Middleware Workshop, available from http://www-nrg.ee.lbl.gov/nrgtalks.html, August 1995.
[16] Jini (TM). http://java.sun.com/products/jini/, 1998.
[17] T. Lehman, S. McLaughry, and P. Wyckoff, T Spaces: The Next Wave, http://www.almaden.ibm.com/cs/TSpaces/, 1998.
[18] G. R. Malan, F. Jahanian, and S. Subramanian, “Salamander: A Push-based Distribution Substrate for Internet Applications”, In Proc. USENIX Symposium on Internet Technologies and Systems, pp. 171-181, December 1997.
[19] Fred Stann and John Heidemann, “BARD: Bayesian-Assisted Resource Discovery in Sensor Networks”, USC/ISI Technical Report ISI-TR-2004-593, 2004.
[20] Universal Plug and Play: Background. http://www.upnp.com/resources/UPnPbkgnd.htm. 1999.
[21] P. V. Mockapetris and K. Dunlap, “Development of the Domain Name System”, Proc. of SIGCOMM 88, Stanford, CA, pp. 123-133, August 1988.
[22] B. Oki, M. Pfluegl, A. Siegel, and D. Skeen, “The Information Bus (R) An Architecture for Extensible Distributed Systems”, In Proc. ACM SOSP, pp. 58-78, 1993.
[23] C. Perkins, Service Location Protocol White Paper, http://playground.sun.com/srvloc/slp white paper.html, May 1997.
[24] J. Saltzer, D. Reed, and D. Clark, “End-to-end Arguments in System Design”, ACM Transactions on Computer Systems, 2:277-288, Nov 1984.
[25] J. Veizades, E. Guttman, C. Perkins, and S. Kaplan, Service Location Protocol, June 1997. RFC 2165 (http://www.ietf.org/rfc/rfc2165.txt).
[26] Mukundan Venkataraman and Puneet Gupta, “Stack Aware Architectures for Mobile Ad Hoc Networks”, Internet Draft, Internet Engineering Task Force (IETF), May 2004.
[27] Mukundan Venkataraman and R. Bhakthavathsalam, “Proxmiate Runner Protocol for Routing in Mobile Ad hoc Networks”, ACM/IEEE Communication Networks and Distributed Systems (CNDS’04), San Diego, CA, January 2003.
[28] Puneet Gupta and Deependra Moitra, Evolving a Pervasive IT Infrastructure: A Technology Integration Approach, Personal and Ubiquitous Computing Journal, Vol 8(1), Feb 2004.
[29] C. Perkins, “Ad Hoc On Demand Distance Vector (AODV) Routing”, IETF, Internet Draft, draft-ietf-manet-aodv-00.txt, November 1997.
[30] D. Johnson, D. Maltz and J. Broch, “DSR: The Dynamic Source Routing Protocol for Multi-Hop Wireless Ad Hoc Networks”, in Ad Hoc Networking, edited by Charles E. Perkins, Chapter 5, pp. 139-172, Addison-Wesley 2001.