PASS: A Service for Efficient Large-Scale Dissemination of Time-Varying Data Using CORBA

John Zinky, Linsey O'Brien, Dave Bakken
BBN Technologies
Cambridge, MA 02138
{jzinky, linsey, [email protected]

Vijaykumar Krishnaswamy, Mustaque Ahamad
College of Computing, Georgia Institute of Technology
Atlanta, GA 30332
{kv, [email protected]

Abstract

A common class of wide-area distributed applications remotely collects time-varying data and sends it to consumers around the network. Examples include network management, stock ticker data, and event logs. The environment in which these applications must operate often dictates the schemes for disseminating the data between the writers and the readers. If the transport channel can be optimized to match the application's behavior patterns and network resource constraints, significant improvements in application-level quality of service (QoS) can be achieved. The PASS system addresses this problem by using a flexible system of interconnected servers. PASS servers are distributed geographically around the network and are connected to readers and writers using the CORBA protocol. The forwarding policies used by the servers, and the server interconnections, can be customized for each application. Thus, PASS acts like an application-level multicast service with variable forwarding policies. PASS has been used to disseminate the up/down status of a large number of devices to a network management system. The PASS forwarding policy used very little network bandwidth while responding to failures in half a network round-trip time.

Keywords: Distributed Objects, Remote Method Invocation, CORBA.

(This work was partially funded by DARPA contract N66001-96-C8529 and Rome Laboratory contract F30602-96-C-0049.)

1. Introduction

A classic problem in distributed systems is detecting the up/down status of a remote device. Since a device cannot communicate that it is down, another device must detect the up/down status. The basic detection scheme
is called pinging. It involves sending a message to the monitored device, which returns the message to the sender. When the sender receives this message, it knows that the device is up. The status information can be used in many different ways. Network management uses status to initiate repair of devices. Applications and middleware-level services use status to help select which resource to use in order to deliver adequate performance and reliability [9]. Finally, end-users are interested in the reason their application does not work, so they can direct their frustration at the appropriate support personnel or, more constructively, make alternative plans. Basically, these clients use status information to adapt to changes in resources and traffic. Status detection itself needs to adapt to the environment. Status should be detected locally and transferred quickly to all clients. Local detection can rapidly determine the status because the round-trip time between the detector and the device is relatively short. The transfer requirements for sending status from where it is collected to the clients are severe. For example, the population of clients can be large — often on the order of thousands — and the clients can be geographically distributed and may be clustered arbitrarily relative to the underlying network topology. Additionally, the transfer mechanism must adapt to the client workload and the communication resources. Device status dissemination is one of many classes of time-varying data applications that could benefit from support by the network infrastructure. Disseminating status to a large number of readers has application-level characteristics that can be exploited to improve the efficiency of the service. For example, consider a status display that shows the current value of a remote device's up/down status. The display does not care about past status values; it just needs the current value as soon as possible. So if the
status service is running behind in its updates, due to many status changes or limited resources, then old values of status should be dropped in favor of the latest value. This policy differs between applications. For example, a status log application would buffer events instead of dropping them. These application policies should be evaluated in the path between the writer and the readers. Thus, executing application-level policies at multiple locations inside the network is a key requirement for making an effective service. But changing network infrastructure to do this is a slow and expensive process. Active-nets research [11] addresses this problem by allowing packets to be processed as they flow across the network, but Active-net technology is still years away from being deployed. To this end, we have prototyped and fielded PASS (Piece-wise Asynchronous Sample Service), a policy-based dissemination service that can be deployed today.
PASS does not introduce a new network protocol, but is implemented on CORBA, a high-level distributed object standard. The PASS server object is the central element of the PASS system. It stores records and uses connections to efficiently transfer information from PASS writers to PASS readers. PASS writers translate raw information into method calls (defined in CORBA IDL) on the PASS server object. A PASS reader queries the PASS object for information. The PASS reader's connection is highly efficient and adapts to network limitations. The PASS structure can be extended into distribution networks that match the clients to the underlying communication resources.
PASS had a predecessor called MIST (Management Information Status Tracking). MIST was deployed as a remote poller for a network management system, where it acted as a status channel between heterogeneous network management systems. It has more recently evolved into PASS, a system that disseminates status to adaptive applications for the Quality Objects (QuO) project [9].
This paper presents the PASS system. Section 2 discusses the PASS system infrastructure in detail, providing examples of some practical situations in which the system can be beneficial. In Section 3 we discuss how the system can be further optimized by distributing the PASS servers over the network; we also describe the components that enable this distribution and present some interesting configurations of the system. Section 4 discusses a CORBA-based implementation of this system. Section 5 covers a few experiments we conducted with PASS and the demonstrations in which the system was fielded.
2. PASS System

We first present a brief overview of the PASS system. PASS assumes a distributed application that has many readers and writers distributed over the network. They interact in some specific pattern and use the publish-subscribe communication model to transfer data as in [10], i.e., the reader and writer are decoupled. The reader subscribes to a writer, which later publishes its data to all the members of its reader set. When this set becomes large, this method results in many messages and thus wastes network bandwidth. The PASS system was designed to transfer data between such readers and writers in a more efficient way, exploiting the characteristics of the application.

The essential components of the PASS system are PASS writers, PASS readers and PASS servers. The servers act as repositories and smartly transfer data between the writers and the readers. The application-level publishers and subscribers of data register for memberships with the PASS servers. On registration, the system provides the application a handle to a PASS reader or a PASS writer object. These objects communicate with the PASS server for data transfer. The data is transferred in the form of records. A record is a name-value pair. The value can be any predefined data format that is recognized by the system. Figure 1 shows the composition of a PASS system. PASS writers translate their client's representation of the data into records and write them to the PASS server. The PASS server processes this data using policies supplied by the application and writes it to all the PASS readers listening to that channel. PASS readers convert the records read from the server into a format suitable for use by the application.

PASS is analogous to the UNIX pipe. It can be viewed as a big distributed pipe which is composed of many smaller pipes. Each of these smaller pipes has its own policies for transferring data through it.
The writer-to-server connection is designed to run locally on the machine, or over a local-area network with adequate bandwidth. The reader-to-server connection is designed to work over the wide area. Since this connection has the constraint of limited bandwidth, it has to be associated with clever policies to transfer data efficiently. The UNIX pipe analogy is not strictly true because there can be many readers and many writers, and the PASS system itself can have different configurations.
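The record abstraction described above — a name-value pair whose value can be any format the system recognizes — can be sketched as a small Java class. The class and method names here are our own illustration, not the actual PASS record type:

```java
// Hypothetical sketch of a PASS record: an immutable name-value pair.
// The names Record/name/value are illustrative, not the real PASS API.
final class Record {
    private final String name;   // e.g. "router7.status"
    private final Object value;  // any format the system recognizes

    Record(String name, Object value) {
        this.name = name;
        this.value = value;
    }

    String name()  { return name; }
    Object value() { return value; }

    @Override
    public String toString() { return name + "=" + value; }
}
```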
2.1. PASS Writers

The PASS writer objects are used by producer applications to publish data to the consumer applications. A writer registers with a server that is co-located with it on the same machine or is in the same local-area network. Due to this proximity, frequent writer updates will not consume bandwidth on a wide-area network. The server with which a writer registers is its home server. The application gets a handle to the PASS writer when it registers for membership with the PASS server. The writers themselves are not intelligent; they forward all the information written into them to the server. The application also registers a policy (the merge policy) with the server. The server consults this policy to decide on the course of action whenever the writer sends a new record to it. The API for the writers is as follows.

WriteRecord( RecordValue )
RecordValue – An instance of a Record, whose format is recognizable at the server.
SetPolicy( PolicyName )
PolicyName – Name of the merge policy to be used at the server.

The WriteRecord call sends the record value to the writer's home server. It is a blocking call. The SetPolicy call sets the merge policy for the records written into the server. In our Java implementation of PASS, PolicyName is the name of a Java class that contains the definition of the policy.

Figure 1. PASS framework has Readers, Writers and Servers. Application-specific policies can also be hooked onto the system to optimize the data transfer. The dotted line shows the logical flow of data. (Wrtr: PASS writer; Rdr: PASS reader; MP: merge policy; SP: send policy.)

2.2. PASS Readers

The PASS reader objects are created at applications that want to access data produced by the writers. The application subscribes to an abstraction called a membership, which is provided by the server. The application can also provide a policy (the send policy) to the server for preprocessing the data being sent to it. The server evaluates this policy every time it gets a new request for a record. Based on the evaluation, it decides on the record value or the appropriate time to send an update to the client. The application reads all of its data through the reader. Since all of these reads result in messages traversing wide-area links, they can potentially overload the network. However, this traffic can be minimized by associating clever send policies with the readers, as shown in the examples. The following is the API provided by the PASS readers to the applications.

GetRecords()
GetAllRecords()
RegisterAtServer()
GetRecordsAs()
SetPolicy( PolicyName )
PolicyName – Name of the send policy to be associated with the reader.

The GetRecords call gets the next record. It is a synchronous call and hence blocks until the next record is available. If the reader registers as a callback object with the server using the RegisterAtServer call, then this call blocks locally; otherwise the call blocks at the server. The GetAllRecords call returns all available records. The GetRecordsAs call is the asynchronous version of the GetRecords call. It returns a handle to a future object, which serves as a placeholder for the record. The consumer application can reclaim the record from the future object at a later point, once it is made available. When the RegisterAtServer call is used, the server pushes records to the reader based on the reader's send policy. The SetPolicy call sets the send policy for that reader at the server. More details on the readers and writers can be found in Section 4.
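To make the call semantics concrete, here is a minimal, purely local mock of the write/read cycle. The real PASS readers and writers are CORBA objects, and GetRecords actually blocks at the server; this sketch models neither, and its class name is our own:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Local stand-in for the writer/server/reader path, for illustration only.
final class MockChannel {
    private final Queue<String> records = new ArrayDeque<>();

    // WriteRecord: the writer forwards the record to its home server.
    void writeRecord(String recordValue) { records.add(recordValue); }

    // GetRecords: returns the next record (non-blocking in this sketch;
    // the real call blocks until a record is available).
    String getRecords() { return records.poll(); }

    // GetAllRecords: drains and returns everything currently available.
    List<String> getAllRecords() {
        List<String> all = new ArrayList<>(records);
        records.clear();
        return all;
    }
}
```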
2.3. PASS Servers

PASS servers do most of the work in the PASS system. They act as data centers and manage the connections to the readers and writers. The servers provide abstractions called memberships. Readers and writers can register with the memberships and subsequently read from and write to them. They access these membership abstractions through the server's API; this way the server can synchronize all accesses to the membership object. Policies are associated with the memberships for data insertion (the merge policy) and for updating the value to all the readers of the membership (the send policy). Some common merge and send policies are discussed in detail in Section 2.3.1.
Figure 2. PASS Server. M1,...,Mn are Membership abstractions provided by the PASS server. The dashed lines indicate the direction of flow of information in the Channel. Channels can have multiple publishers and subscribers associated with them.

Figure 2 shows the composition of a PASS server. The following is the API for the server.

RegisterAsReader( ReaderName, MembershipID, Policy )
ReaderName – Name of the reader
MembershipID – Id of the membership to subscribe to
Policy – Send policy from the server to the reader.
RemoveReader( ReaderName )
ReaderName – Name of the reader to be deleted from the membership.
RegisterAsWriter( WriterName, MembershipID, RecordType, Policy )
WriterName – Name of the writer
MembershipID – Id of the membership
Policy – Merge policy for a new record.
RemoveWriter( WriterName )
WriterName – Name of the writer to be deleted from the membership.

The RegisterAsReader call adds the reader to the reader set of the membership it subscribes to. RemoveReader removes the reader from the membership. RegisterAsWriter adds the writer to the writer set of the membership; if the membership does not exist, a new membership is created. The RemoveWriter call removes the writer from the membership. Thus a PASS writer-membership-reader, writer-membership-readers, or writers-membership-reader combination can be considered a Channel, which can be used to deliver information in a user-specific manner. Figure 2 shows how a channel facilitates the flow of data from one or more writers to a collection of readers.

2.3.1. Policies
The PASS server is a convenient place to implement the policies that decide how to merge and send records. Policies use the server's own resources (CPU and storage) to reduce the network bandwidth consumed. Policies can also offer tradeoffs between the accuracy of the tracked data and the bandwidth consumed on the network. For example, if the data from the writer changes frequently but the variations are well bounded, the send policy might send one message that is the average of all the new values during that time period. Other policies are possible; here are a few examples.

Translation: Detailed data can be translated into coarser values that change less often. For example, when tracking the load average of a host, ranges with hysteresis could be stored instead of a number that is likely to be different with every update. Another example occurs when an up/down variable transitions several times between queries: the variable value could be set to some predefined FLAKY value so the client knows that the value is changing.

Inference: The state of many status variables can be inferred from the state of others. For example, a successful ping to a remote site implies that all the devices along that path are up. Inference policies need a dependency relation between the variables stored inside the server object.

Summary: New records can be created by the object to summarize the values of a group of records. For example, a status record for a whole network cloud could be created, whose value is the percentage of its routers that are up, or a bit vector with each element set or unset depending on the liveness of the corresponding router.

Correlation: Some variables can only be determined by correlating several other status variables. For example, all interfaces on a router must be down before the router can be declared down.

Filtering: Some variables can be filtered out. For example, only the summary record is exported, in order to hide the internal structure of the network.

Priority: Some variables can be given priority over others, for example the status variable for a critical router that connects a regional network with a backbone network.
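The "ranges with hysteresis" idea behind the Translation policy can be sketched as follows. The thresholds and the LOW/HIGH bands are illustrative choices of ours, not values from the paper:

```java
// Sketch of a Translation merge policy: map a noisy load average to
// coarse bands with hysteresis, so the stored record changes far less
// often than the raw samples do. Thresholds (0.8 up, 0.5 down) are
// illustrative assumptions.
final class LoadBandPolicy {
    private String band = "LOW";

    // Folds a raw sample into the coarse band and returns the band.
    String merge(double loadAvg) {
        if (band.equals("LOW") && loadAvg > 0.8) {
            band = "HIGH";            // rising edge crosses upper threshold
        } else if (band.equals("HIGH") && loadAvg < 0.5) {
            band = "LOW";             // falling edge crosses lower threshold
        }
        return band;                  // between 0.5 and 0.8 the band is sticky
    }
}
```

Because the band only changes on a threshold crossing, a send policy of "send on change" would forward far fewer updates than the raw sample rate.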
2.4. Examples

The first example is a Liveness Detector application, which detects the liveness of all the participants in a collaborative application with a large participant set. Normally, an application periodically pings the participants to check whether they are alive or not. If the participant set is of size N, there will be N round-trip messages every T time units, where T is the period with which the application pings the participants. Now consider building the same application using the PASS framework. It can be separated into two parts: a local detector component that is co-located with the participants, and a display component that collects data from all the detectors and presents it to the user. The detector periodically pings the participant for its liveness. It registers with a PASS server located in its proximity and, in the process, gets a writer. All the detectors register to the same membership. The detector sends its ping results to the PASS server through its writer. The display component registers itself as a reader with the server and gets a handle to a PASS reader. The merge policy at the server, for the writer, is to update the record in the membership only when the value of the new update is different. The send policy at the server, for the reader, is to send a record only when the membership gets updated. A request for data by the application invokes the reader, which makes a blocking call to the server for the next record. The server is located close to the detectors, and hence the perturbations due to the periodic messages from the detectors are not felt outside the local-area network. The causal timeline for this kind of interaction is shown in Figure 3. As can be seen from this figure, the number of wide-area messages generated for an N-node system is N, plus an occasional message whenever a participant flips state. Evidently there are far fewer messages between the server and the reader.

Figure 3. Causal time line for Client Detector. The writes are more frequent, but the send policy sends a record to the reader only when there is a change in the status. Until such an event occurs, the reader is blocked at the server. (Merge policy: update if different. Send policy: send all new updates. Reader call semantics: blocking.)

Figure 4. Causal time line for System Resource Mapper. Here the reader registers with the server asking for an update every T seconds. The merge policy averages all the records that are sent to the server by the writer during these T seconds and sends the new record to the reader. (Merge policy: aggregate record fields. Send policy: send updates every T seconds. Reader call semantics: register and callback.)

The second example explains how the PASS framework
can be used in building an application to collect information on system-wide resource usage and availability. The resources monitored may be the CPU, memory, etc. This application periodically collects information from the nodes on CPU and memory usage. It also checks for the liveness of each machine. If the application were to ping the nodes with a time period of T, and if there are N nodes in the system, this would result in N round-trip network messages every T seconds. If the same system were built using the PASS framework, there would be multiple PASS servers located close to the machines observed. The machines have detectors on them, which have local writers registered with one of the servers. The display component of the application has a local reader registered with one of the servers. The detectors write sysinfo records into the servers. A reader calls the RegisterAtServer function of the server and registers itself as a callback object at the server. The server merge policy is to update the existing record by averaging all the fields of the new sysinfo record with the sysinfo record that already exists. The send policy is to send the aggregated record to all the readers once every T seconds. This results in a causal time line as shown in Figure 4. The average number of messages sent across the wide area for a particular reader is K (K < N), where K is the number of servers present in the network. K is less than N because more than one machine registers with the same server.

It is evident from these examples that by clever association of policies with the servers, the bandwidth requirements of an application can be reduced without affecting its performance.
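The averaging merge policy of the second example, which folds each incoming sysinfo sample into a running aggregate until the send policy ships it every T seconds, might look roughly like this. Field names are our own:

```java
// Sketch of the averaging merge policy: incoming sysinfo samples are
// folded into a running average; flush() is what the periodic send
// policy would ship to readers, resetting the interval. Field names
// (cpu/mem) are illustrative assumptions.
final class AveragingMergePolicy {
    private double cpuSum = 0, memSum = 0;
    private int count = 0;

    // Merge one sysinfo sample from a detector's writer.
    void merge(double cpuUsage, double memUsage) {
        cpuSum += cpuUsage;
        memSum += memUsage;
        count++;
    }

    // Called once per period T by the send policy; returns {avgCpu, avgMem}
    // and resets the accumulator for the next interval.
    double[] flush() {
        double[] avg = { cpuSum / count, memSum / count };
        cpuSum = memSum = 0;
        count = 0;
        return avg;
    }
}
```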
3. Network of PASS Servers

The PASS system described above reduces network traffic when the readers, writers and the server are located near each other. If the readers are far away from the writers and the reader set is large, the network traffic deteriorates. One way to overcome this is to create multiple instances of the PASS server at different places in the network, in close proximity to the reader clients. These servers can be connected to each other in some configuration based on the application's access patterns. This way the readers need not go to the original server to gather data but can fetch it from a server close to them. A collection of these connections extends the channel abstraction to even bigger networks. The inter-server communication can be optimized based on the application behavior, which reduces the reader traffic on the wide-area network. PASS servers are connected to each other through PASS links. The tool with which we configure the network of PASS servers is called the PASS Channel Configurator (PASScc). The PASS server API has the function Connect, with which a server can connect to other servers.
Connect( PASSServerLocation, PASSServerName )
PASSServerLocation – Node id of the PASS server
PASSServerName – Name of the PASS server

3.1. PASS Links

The links are CORBA objects that connect two servers. The composition of a link is shown in Figure 5. A link has a reader/writer pair that is responsible for the transmission of messages from one server to another. Links have policies associated with them and evaluate the data that passes through them. The reader of the link is located in the address space of the server to which the connection was made. The link's policy object is also placed in the same server process as the reader. Figure 6 shows the location of the different components of the link. As we can see from the figure, the evaluation of a message by the link's policy takes place at the server that transmits the data. This way, if the link policy decides to discard the message for some reason, the message never reaches the network. In Figure 6, a unidirectional link is created between S1 and S2 as the result of a Connect call executed at S2 with S1 as the argument. The Connect call also takes a policy name and a direction flag as arguments. The link can be made bidirectional by setting the direction flag to bidirectional mode. If a link is made bidirectional, a reader/writer pair is created for each direction.

Figure 5. A link acts like a Reader from one server and as a Writer to another server. There is also a policy object to evaluate the data transferred on the link. A link can be bidirectional too: if a link is enabled in the bidirectional mode, then a reader/writer pair is created in both directions.

Figure 6. PASS Link Components and their Location. The Reader sits at the end that transmits data, and the policy object is with it. Records are evaluated every time before transmission on the link. The Writer is at the receiving end of the link.

The following is the API for the links.

SetLinkPolicy( Direction, PolicyName )
Direction – A flag indicating the direction for which the policy is to be set
PolicyName – Name of a policy for that direction of the link

3.2. PASS Channel Configurator

The PASS Channel Configurator (PASScc) is a program that creates the links between the server objects. PASScc can create virtual channels of any arbitrary configuration. It consists of a parser and a link generator. The connections are specified in the link description language (LDL). PASScc takes in a configuration file that specifies the shape of the virtual channel network in LDL. PASScc can create a point-to-point connection between servers, or it can connect them in well-known configurations like a ring, a star, a bus or a tree. These are the basic configurations that can be directly generated using the LDL. Any other configuration is possible and can be considered a collection of the basic types. Figure 7 shows the LDL script that configures a group of servers into a star-shaped network, and the corresponding configuration:

Star {
  Center  : ( Node1, R1 )
  Corners : {
    ( Node2, R2, P2 )
    ( Node3, R3, P3 )
    ( Node4, R4, P4 )
    ( Node5, R5, P5 )
    ( Node6, R6, P6 )
    ( Node7, R7, P7 )
  }
}

Figure 7. A Star Configuration and the Corresponding LDL Script. R1..R7 are the servers, Node1..Node7 are the nodes at which they are located, and P2..P7 are the policies of the links.

PASScc does not create a link as soon as it parses a connection; instead it waits until a basic component is completely parsed and then creates the connections for the component as a whole. This way, if it encounters any errors in the specification, the channel is not created at all, thus maintaining the network topology in a consistent state.
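A sketch of what PASScc might do after parsing the Star block above: only once the whole component has parsed cleanly does it issue one link per corner against the center. Here the links are merely recorded as strings; a real implementation would invoke the CORBA Connect operation on each corner server, and the class name is our own:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for PASScc's link generator for a Star component.
final class StarBuilder {
    // Records "corner -> center" links; a real implementation would call
    // the Connect(PASSServerLocation, PASSServerName) operation instead.
    static List<String> buildStar(String center, List<String> corners) {
        List<String> links = new ArrayList<>();
        for (String corner : corners) {
            links.add(corner + "->" + center); // one unidirectional link each
        }
        return links;
    }
}
```

Validating the whole component before emitting any link mirrors PASScc's all-or-nothing behavior: a malformed corner entry means no links are created at all.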
3.3. Examples

The PASS server interconnection can be configured in many different ways to match the application's requirements on the underlying network resources and its data-access semantics. A simple PASS interconnection is shown in Figure 8: two PASS servers interconnected by a unidirectional link. The link has its own flow-control policy. As we can see from the figure, the readers and writers register with the PASS servers close to them. This way the wide-area traffic is only between the servers, and it can be optimized using the policies specified by the application. It is possible to configure the system's topology to improve the performance of a particular application. Some interesting configurations are discussed below.
Figure 8. A Simple Channel. The readers and writers are connected to the servers that are close to them. Any data transfer between remote clients is through communication between the servers via the link.

3.3.1. Grouping

A PASS object typically holds all the data records for a site. For example, an agent might poll many devices at one site and publish their status in a PASS object. This grouping is the smallest partition for tracking status in the PASS system. In order for a reader to track the status of one of the devices stored in an object, all the devices must be tracked. Hence status variables that tend to be used together should be grouped together. For example, all the devices on a status display could be grouped together.
On the surface, using an object for this all-or-nothing group of status records seems inadequate. One would desire the ability to subscribe to only the status variables the client needs. But the interface for subscribing is complicated, and it becomes extremely difficult to get a right implementation. PASS's basic grouping mechanism is simple to implement and field, and it ends up being fairly powerful if configured intelligently. Over time we expect PASS to support subscribing to individual records.

Data records can be grouped along many dimensions besides geographic proximity. The grouping may be for administrative purposes, to allow only selected clients to access the records. A group can be used to partition types of records: in one of the demonstrations at the Joint Warrior Interoperability Demonstration (JWID), two objects were used, one to keep track of status information and another for QoS measurements. Finally, groups can be used strictly for naming purposes. A simple example of grouping is shown in Figure 9. This example shows a PASS framework for a collaborative application which is rooted at one of the writers and has three different groups. The groups have different access rights to the message from the writer (e.g., a shared whiteboard with privileges). The readers associate themselves with different groups. These access rights are incorporated into the framework as the policy objects P1, P2 and P3 for the links. These policy objects filter or transform the writer's message to suit the group capabilities.

Figure 9. A Channel supporting Grouping. The topology is a tree. Here, by associating different link policies P1, P2 and P3 with the links, data can be distributed differently to different groups.
Figure 10. A Shadow Channel.

3.3.2. Shadowing
This configuration technique has one PASS server act as a client of another PASS server. A link object tracks changes in the original object and writes them to the shadow object, as in Figure 10. The shadow object contains all the records of the original, but can be located at a different site on the network. Each shadow object can service requests from a subset of the readers and thus spread the load over multiple servers. This way the number of client connections that go over the WAN to the original server can be reduced. Load sharing is an important aspect of shadowing. An increase in the reader set size increases both the load on the network and the load on the original server. Shadowing helps to spread the load over the different nodes hosting the servers, so system performance can scale with respect to the number of users as well as their geographical distribution.

A more involved framework is a distribution tree, formed with its root at the detector's local PASS server object and with links carrying different policies, as shown in Figure 11. This distribution can have clients attached to servers depending on the connectivity they pay for. An application that pays for higher bandwidth can connect itself to trunk B of the tree, while clients that pay less can be associated with trunk A. The filtering policies on the links process the writer's messages based on the connectivity. If, for example, this framework is used to transfer video frames to clients, Client A located on trunk A will receive a low-resolution image (the I frames, P frames and a few B frames) while Client B located on trunk B will receive a high-resolution image (the I frames, P frames and all the B frames).

Shadowing can also be used to aggregate several objects into a single object. The aggregate object itself can be shadowed, either by making a distribution tree for the aggregate
Figure 11. A tree-based Shadow Channel.
itself, or by re-aggregating it at each site using the local shadows rather than the original objects. Availability is another feature provided by the shadow configuration. The PASS system can be configured with multiple detectors that poll the same set of devices. The detectors' updates are then all merged into a single record at a shadow object that is assumed to be fail-safe. The shadow object's policy creates the new record from the inputs of all the detectors, for example by ANDing or ORing them. The detectors are thus no longer a single point of failure, which increases the availability of the system.
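The ORing merge just described can be sketched in a few lines. This is an assumed illustration, not the actual PASS policy class: a shadow object holds the latest report from each redundant detector and considers a device up as long as any detector can still reach it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an ORing merge policy at a shadow object.
// Class and method names are assumptions for illustration.
class OrMergePolicy {
    // Latest report from each detector: true = device reachable.
    private final Map<String, Boolean> reports = new HashMap<>();

    // Record a detector's latest observation and return the merged status.
    boolean report(String detectorId, boolean deviceUp) {
        reports.put(detectorId, deviceUp);
        // ORing: the device counts as up if any detector sees it up,
        // so losing one detector does not falsely mark the device down.
        return reports.containsValue(true);
    }
}
```

An ANDing policy would be the symmetric case, declaring the device up only when every detector agrees, which trades availability of the "up" verdict for fewer false positives.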
4. Implementation
The PASS system has multiple implementations based on C and C++ and, more recently, Java. The Java version was implemented on top of Visigenic's Java ORB. Using Java eliminated heterogeneity problems with the underlying hardware and operating systems. We were also able to exploit features of the Java platform such as code shipping, to dynamically load the definitions of policy and record objects at remote nodes and instantiate them at runtime. The main components of this framework (the server, reader, and writer) are all CORBA objects. Applications must know the location of the servers a priori so that they can register with them. The PASS server hides all the CORBA interactions in the system behind its API, so application-level readers and writers do not have to deal with CORBA semantics to invoke these abstractions. As far as they are concerned, they link against a library that provides them handles to the required objects.
Figure 12. PASS Server Implementation. The server contains reader implementations, writer implementations, and membership objects. Readers and writers hold a reference to the membership object with which they are registered. The membership object has a queue of records, as well as metadata about the readers and writers. The PASS server also maintains a local cache of the policy and record definitions.
4.1. Implementation of PASS Readers and Writers
PASS readers are CORBA objects that the server provides when an application registers as a subscriber to a membership. Whenever the application invokes the call RegisterAsReader, an implementation of the PASS reader is instantiated at the server and a CORBA proxy for it is returned to the subscriber. PASS writers are CORBA objects created similarly whenever an application registers as a publisher of information to a membership: when the application invokes the call RegisterAsWriter, an implementation of the PASS writer is instantiated at the server and a CORBA proxy for it is returned to the publisher. Thus, PASS reader and writer objects manage the connection between the server and the application, and keep track of all the state and policies associated with this connection.
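The register-then-use flow above can be sketched as follows. In the real system RegisterAsReader and RegisterAsWriter return CORBA proxies; here plain Java objects stand in for those proxies, so the class and method names are assumptions rather than the actual PASS library API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the registration flow: the server instantiates
// the reader/writer implementations and hands back handles to them.
class Server {
    private final Deque<String> queue = new ArrayDeque<>();

    // Stand-in for RegisterAsWriter: the writer impl lives at the server.
    Writer registerAsWriter(String membership) {
        return rec -> queue.addLast(rec);
    }
    // Stand-in for RegisterAsReader: the reader impl lives at the server.
    Reader registerAsReader(String membership) {
        return () -> queue.pollFirst();
    }
}

interface Writer { void write(String record); }
interface Reader { String read(); }
```

The application holds only the Writer and Reader handles; all interaction with the server's internal state happens behind them, which is the property the paper attributes to the PASS library.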
4.2. Implementation of PASS Servers
Figure 12 shows some internals of the PASS server. The server hosts a local implementation for every reader and writer registered with it. Each writer implementation has a merge policy associated with it; this merge policy dictates how a new record is appended to the queue. The writer also holds a pointer to the membership object with which it is registered. In the same way, each reader has a send policy object as part of its state; this policy is consulted before a new record is sent to the reader. The reader likewise holds a reference to the membership object to which it subscribed. The membership object synchronizes reader-writer access to its internal queue of records, and it maintains metadata such as the reader list and writer list. The server also maintains a local cache of policy and record definitions that are loaded remotely from writers and readers.
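A minimal sketch of such a membership object follows. It is an assumed illustration, not the PASS implementation: the merge policy shown keeps only the latest record per device (matching the paper's observation that a status display only needs the current value), and the send policy is modeled as a per-reader predicate consulted before delivery.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical sketch of a membership object with a record queue,
// a merge policy on append, and a send policy checked on delivery.
class Membership {
    private final Deque<String> queue = new ArrayDeque<>();

    // Merge policy: replace any stale record for the same device,
    // since only the current status value matters.
    synchronized void append(String device, String status) {
        queue.removeIf(rec -> rec.startsWith(device + ":"));
        queue.addLast(device + ":" + status);
    }

    // Send policy: deliver the first queued record the reader's
    // policy accepts, or null if nothing qualifies.
    synchronized String nextFor(Predicate<String> sendPolicy) {
        for (String rec : queue) {
            if (sendPolicy.test(rec)) return rec;
        }
        return null;
    }
}
```

The synchronized methods model the reader-writer synchronization the membership object provides; a real server would also notify registered readers rather than wait to be polled.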
5. Experience
PASS has been fielded in two demonstrations at JWID. PASS's original purpose was to transfer status information between network management systems. A system like PASS needs to be part of the network infrastructure so that end systems can adapt to network resources. PASS was originally implemented on an ORB called CORBUS that was developed at BBN. Here are some of our experiences in fielding this prototype. In the first demonstration (JWID '94), PASS was used to transfer device status between two network management systems. Each system maintained a network map marking the up/down status of network devices and hosts, and PASS passed status information between the two systems, keeping the maps synchronized. PASS used much less than its allocated 2KB of bandwidth to maintain the status displays. The experiment demonstrated the benefits of the CORBA-based approach. PASS recovered flawlessly from the loss of satellite communication: the sole satellite link between the sites failed several times a day for hours at a time. When the link went down, PASS would detect the failure and mark the status map as not being updated; when the link came up, PASS would reestablish the reader-to-object connection and re-synchronize without any human intervention. This fault recovery was due to the excellent failure notification supplied by the CORBUS ORB. In a second demonstration (JWID '95), PASS was fielded for two purposes. The first was to share real-time status maps among 5 sites around the country. The configuration was similar to the first demonstration, except that the 5 sites were fully interconnected, resulting in 25 reader-to-object connections. The second use was status distribution to applications and end-users. PASS was used by the Comm-Server project, part of the ARPA-sponsored JTF-ATD program. The Comm-Server makes status information available to applications and servers via CORBA-based interfaces.
The applications use
this information to adapt to changing network conditions. Both device status and site-to-site QoS measurements were collected and transferred to end-user displays; each site had two PASS objects, one for status and one for QoS. A PASS management tool was developed to facilitate debugging and to provide a view of the overall status of the PASS system itself, i.e., a management system for the network management system. This tool displayed an iconic representation of the PASS programs involved and the flow of data between them, with the up/down status of programs, hosts, and the network indicated by changes in icon color. Additionally, CORBUS kernels and managers had menus containing CORBUS commands to start and stop processes and to display helpful information. As PASS configurations become more complex, we expect to spend more effort automating these tasks and improving the PASS management tools. One operational problem with using PASS is that a CORBA ORB must be installed at each site. If PASS were based purely on sockets, no additional infrastructure would be needed, since sockets are part of any commercially available operating system. In recent years, however, most operating system vendors have expressed interest in including an ORB as part of their systems, so the lack of a CORBA-based infrastructure should not be a problem in the future.
6. Conclusion and Future Work
The PASS system is a prototype of an important service that needs to be part of the network infrastructure. A growing trend is for middleware, applications, and users themselves to require more information about the status of the underlying resources they use, and many new applications use this knowledge to adapt their behavior to the availability of network resources. If no framework or mechanism is provided, users simply assume resources are available and program accordingly; when many such programs run simultaneously, they flood the network and degrade performance. We built a system that, through abstractions such as readers, writers, servers, memberships, and channels, can effectively address this problem, and this paper described the CORBA-based implementation of these abstractions. We hope that in the future more network infrastructure will be based on application-level protocols, reducing development time and curbing the spread of specialized protocols. We have a working version of this system and have started building workloads and applications for this framework. As more workloads and applications become available, we plan to do a detailed evaluation.
References
[1] A. D. Birrell and B. J. Nelson. "Implementing Remote Procedure Calls." ACM Transactions on Computer Systems, February 1984.
[2] D. R. Cheriton. "VMTP: A Transport Protocol for the Next Generation of Communication Systems." Proc. SIGCOMM, pp. 406-415, 1986.
[3] M. Condict, D. Milojicic, F. Reynolds, and D. Bolinger. "Towards a World-Wide Civilization of Objects." Proc. Seventh ACM SIGOPS European Workshop, pp. 25-32, Connemara, Ireland, September 1996.
[4] V. Jacobson. "Congestion Avoidance and Control." Proc. SIGCOMM, 1988.
[5] P. Karn and C. Partridge. "Improving Round-Trip Time Estimates in Reliable Transport Protocols." Proc. SIGCOMM, 1987.
[6] J. Nagle. "Congestion Control in IP/TCP Internetworks." RFC 896, January 1984.
[7] M. Rose. "A Convention for Defining Traps for Use with the SNMP." RFC 1215, March 1991.
[8] C. Partridge, T. Mendez, and W. Milliken. "Host Anycasting Service." RFC 1546, November 1993.
[9] J. Zinky, D. Bakken, and R. Schantz. "Architectural Support for Quality of Service for CORBA Objects." Theory and Practice of Object Systems, 3(1), April 1997.
[10] B. Oki, M. Pflugl, A. Siegel, and D. Skeen. "The Information Bus: An Architecture for Extensible Distributed Systems." Proc. SOSP, 1993.
[11] U. Legedza, D. J. Wetherall, and J. Guttag. "Improving the Performance of Distributed Applications Using Active Networks." Proc. IEEE INFOCOM, 1998.