Totally Ordered Reliable Multicast for Whiteboard Application*

Pei Yunzhang, Liu Yan, Shi Yuanchun and Xu Guangyou
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P. R. China
Email: [email protected]

Abstract
Although it is impractical to design a generic reliable multicast protocol to meet diverse requirements of various applications, it is possible to provide a service that offers reliability mechanisms which are useful to a sizeable category of applications. As total ordering is an application-independent mechanism for collaborative applications to maintain consistency, providing such a mechanism in transport layer will greatly simplify the design of application. In this paper, we will present Totally Ordered Reliable Multicast (TORM), a reliable multicast protocol designed to meet the requirements of collaborative applications, especially whiteboard. The TORM architecture provides not only a totally ordered data delivery service as the basis for maintaining consistency of whiteboard state, but also a flexible data model for collaborative applications. Selective reliability is also achieved in TORM without exposing application data to transport layer, making the interface between transport layer and application layer simple and clear with improved flexibility and performance in both layers.

Keywords: whiteboard, reliable multicast, collaboration, total ordering

* Submitted to CSCWD’99. Related area: collaboration technology

1 Introduction
IP multicast provides an efficient mechanism to disseminate data in a one-to-many or many-to-many manner, but without any guarantee of correct, ordered, or duplicate-free delivery [1]. Some types of applications, such as whiteboard and file distribution, cannot tolerate packet loss; they demand reliable multicast transport protocols for data dissemination in order to keep their persistent state consistent. It has been well recognized that multicast applications have a much wider range of requirements than unicast applications, and there is no TCP-like “one size fits all” solution for reliable multicast [2]. According to the Application Level Framing (ALF) protocol architecture [3], the best way to meet diverse application requirements is to leave as much functionality and flexibility as possible to the application. Mechanisms that traditionally fall within the “transport layer”, e.g., loss detection and recovery, should be tightly coupled with application semantics.

There are roughly two types of applications that utilize reliable multicast as their transport protocol: file distribution and multi-point conferencing tools [4]. The two types have different requirements (e.g., conferencing tools are sensitive to delay while file distribution applications only care about data integrity), so the underlying reliability mechanisms differ dramatically. In file distribution applications, missing packets can be repaired after a round of file dissemination, but in conferencing tools, lost packets should be repaired as quickly as possible.

This paper focuses on conferencing tools (especially whiteboard) and the reliable multicast protocols they use. Since the reliable multicast protocol becomes an integral part of the application, we have to discuss them as a whole. Our particular interest lies in how specific requirements of an application shape the design of a reliable multicast protocol, rather than in commonly discussed issues such as protocol implementation or performance evaluation. Section 2 discusses related work. Section 3 describes our solution, Totally Ordered Reliable Multicast (TORM), designed to meet the requirements of collaborative applications. Section 4 describes some implementation issues of SameView, a whiteboard application built on top of TORM. Summary and future work are covered in section 5.

2 Related Work
SRM (Scalable Reliable Multicast) is an ALF-based framework for reliable multicast. SRM was first implemented as an embedded component of wb, a widely used distributed whiteboard application developed at LBL. The successor of wb, MediaBoard, also uses SRM as its transport protocol [6]. SRM has attracted intensive research interest since it was first proposed, so in this section we discuss in some detail SRM and the application built on it, MediaBoard.

2.1 SRM Data Model
The core of the SRM framework is a hierarchical naming scheme, which is defined in the Scalable Naming and Announcement Protocol (SNAP) [7]. SNAP is superior to a traditional flat sequence-number name space in that it allows application layer objects to be mapped into the transport layer, so that receivers can tailor their reliability by requesting only a portion of the name space. Figure 1 shows how MediaBoard objects are represented in this naming scheme:

[Figure 1: name-space tree with Source S at the root, Pages A, B and C beneath it, and data items 1 to 4 beneath the pages]

Figure 1 MediaBoard Name space

Each member in a session is identified by a globally unique identifier, the Source-ID, and each page is identified by a Page-ID, which is a page number globally unique in the session. The sequence of drawing actions on a page is translated into a series of MediaBoard descriptors. Each descriptor has a globally unique name called UID, comprised of Source-ID, Page-ID, and sequence number within the page. All data in the name space is persistent, i.e., data once created is never destroyed or replaced. A persistent data model is a prerequisite for receiver-initiated protocols (e.g., SRM), for the receiver set can be kept anonymous to the source and the data necessary for retransmission can be easily retrieved from the application whenever a receiver requests it (e.g., a receiver can request a data item by its name “Source S/Page B/3”). However, the disadvantages of the persistent data model are also obvious:

- The source has to cache all the data it has generated, even if much of it is redundant. Suppose a circle on a page has been moved 10 times. 10 move descriptors are created and cached, but in subsequent screen updates after these moves, only the last descriptor is useful.
- The receiver has to cache all the data received from all sources to detect duplicated packets, also with a great deal of redundant information.
- A container is the unit of selective reliability [6], because the application doesn’t have enough knowledge to figure out which descriptors in the page are really crucial, and which are transient and have been made obsolete by subsequent descriptors. When a receiver requests a page, all data in the page has to be transmitted. If several large bitmaps have been added to and later removed from the drawing space, most of the bandwidth would be taken up by objects that will never be drawn on screen. Although one of the main purposes of receiver-tailored reliability is to minimize the data transmitted on the network, there is still substantial waste of bandwidth with the persistent data model.
- Receivers have to recover the name space before recovering actual data, so an extra delay is introduced and throughput decreases.
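The redundancy of the persistent data model can be illustrated with a minimal sketch. The `Descriptor` class, its fields, and the `final_positions` helper below are hypothetical names chosen for illustration; they are not MediaBoard's or SRM's actual data structures. The sketch only assumes what the text states: descriptors are named by source, page, and sequence number, and are never destroyed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """One persistent drawing action, named Source/Page/sequence (hypothetical layout)."""
    source: str
    page: str
    seq: int
    action: str        # e.g. "define" or "move"
    target: str        # hypothetical object identifier
    position: tuple    # (x, y) after the action

    @property
    def uid(self) -> str:
        return f"{self.source}/{self.page}/{self.seq}"


def final_positions(history):
    """Replay the full history; only the last position of each object survives."""
    state = {}
    for d in sorted(history, key=lambda d: d.seq):
        state[d.target] = d.position
    return state


# A circle is defined once on Page B and then moved 10 times.
history = [Descriptor("Source S", "Page B", 1, "define", "circle", (0, 0))]
history += [
    Descriptor("Source S", "Page B", 1 + i, "move", "circle", (i, i))
    for i in range(1, 11)
]

cached = len(history)                    # descriptors the source must keep
needed = len(final_positions(history))   # entries actually needed to draw the page
```

Under these assumptions the source keeps 11 descriptors although a single object position is enough to redraw the page, which is the waste the list above describes.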

2.2 Concurrency Control of MediaBoard
A collaborative application (e.g., whiteboard) must provide a means of concurrency control to ensure that the shared data held by each instance of the application is consistent throughout the session. SRM leaves the entire burden of concurrency control to the application, because SRM is designed to meet only the minimal definition of reliable multicast, i.e., eventual delivery of all data to all group members, without enforcing any particular delivery order [5] needed for maintaining consistency.

Unfortunately, the way MediaBoard processes descriptors may violate the consistency requirement. Each descriptor has a timestamp based on the local clock. SRM assumes that clocks in participating computers are synchronized using an external mechanism such as NTP [6]. In order to improve responsiveness and to reduce perceived latency, descriptors are handed to the application as soon as possible. A descriptor is considered to be dependent if it references the UID of another descriptor (e.g., a move descriptor always depends on its target define descriptor). Independent descriptors are evaluated upon reception. Evaluation of descriptors that depend on missing descriptors is deferred until repairs for the corresponding descriptors have been received. However, descriptors that do not reference another UID can still be inter-dependent (e.g., if two define descriptors are evaluated in different orders, the display order of the two objects will differ). As a result, MediaBoard gains responsiveness at the cost of consistency.

If undo and redo are applied, the lowest level of consistency, final result convergence, can be achieved; i.e., when a descriptor with an “older” timestamp arrives out of order, all actions with later timestamps are undone, the late descriptor is applied, and the undone actions are redone. However, this approach may still produce temporary inconsistency, and the user is likely to be confused by contradictory results from time to time. To achieve a higher level of consistency, a global causal order should be imposed on each descriptor and all descriptors must be executed in the same order at all sites throughout the session.
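The display-order problem with independent define descriptors can be sketched as follows. The global sequence numbers attached to each descriptor are an assumption made for illustration (MediaBoard itself uses local timestamps), and for brevity the ordered variant sorts a complete batch rather than holding back packets one by one.

```python
def evaluate(defines):
    """Apply define descriptors in the given order; later objects are drawn on top."""
    display = []
    for name in defines:
        display.append(name)
    return display


def evaluate_in_total_order(arrivals):
    """Sort by the (hypothetical) global sequence number before evaluating."""
    return evaluate([name for _, name in sorted(arrivals)])


# The same two descriptors reach two sites in different network orders.
site_a_arrivals = [(1, "red box"), (2, "blue box")]
site_b_arrivals = [(2, "blue box"), (1, "red box")]

# Evaluating upon reception: the two sites end up with different display orders.
on_arrival_a = evaluate([name for _, name in site_a_arrivals])
on_arrival_b = evaluate([name for _, name in site_b_arrivals])

# Evaluating in one agreed order: both sites agree.
ordered_a = evaluate_in_total_order(site_a_arrivals)
ordered_b = evaluate_in_total_order(site_b_arrivals)
```

Here `on_arrival_a` and `on_arrival_b` differ, while `ordered_a` and `ordered_b` are identical.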

2.3 Protocols Supporting Total Ordering
Reliable Broadcast Protocol (RBP) [8] provides totally ordered delivery of data to all group members. One of the members is designated the “token site”, which is responsible for acknowledging packets sent by sources. The role of token site rotates among all group members to avoid a bottleneck at any one site. The token site retransmits missing packets upon request from individual receivers. Each ACK carries a timestamp used by receivers to order the packets they receive. The order of packets is based on the order in which they arrive at the token site, which may not be the order in which they were sent. Reliable Multicast Protocol (RMP) [9] extends RBP with flow and congestion control mechanisms and better handling of membership changes.
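A rough sketch of token-site ordering, under stated assumptions, is given below. The `TokenRing` class and the rule of passing the token after every acknowledgement are simplifications chosen for illustration; they are not taken from the RBP specification, and retransmission is omitted.

```python
class TokenRing:
    """Toy model: the current token site timestamps packets in arrival order."""

    def __init__(self, members):
        self.members = list(members)
        self.holder = 0            # index of the current token site
        self.next_timestamp = 1

    def acknowledge(self, source, packet_id):
        ack = {
            "token_site": self.members[self.holder],
            "source": source,
            "packet_id": packet_id,
            "timestamp": self.next_timestamp,
        }
        self.next_timestamp += 1
        # Assumption: the token moves to the next member after each ACK.
        self.holder = (self.holder + 1) % len(self.members)
        return ack


ring = TokenRing(["A", "B", "C"])
# Packets are acknowledged in the order they reach the token site.
acks = [
    ring.acknowledge("B", "b1"),
    ring.acknowledge("A", "a1"),
    ring.acknowledge("B", "b2"),
    ring.acknowledge("C", "c1"),
]

# Every receiver sorts by ACK timestamp and therefore delivers the same sequence.
delivery_order = [a["packet_id"] for a in sorted(acks, key=lambda a: a["timestamp"])]
token_sites = [a["token_site"] for a in acks]
```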


3 Design Overview of TORM

3.1 TORM Architecture
One issue reliable multicast protocols have to face is the acknowledgement implosion problem. The Local Group Concept [10] splits the burden of acknowledgement among all members by dividing the multicast group into Local Groups, each represented by a Group Controller for local retransmission and local acknowledgement processing. RMTP [11] takes a similar approach, where a set of selected members are designated as Group Leaders and organized into a hierarchical structure for local recovery. One problem with these protocols is that Group Controllers (or Group Leaders) are selected among members; if one of them quits during the session, a new Controller has to be re-selected and all its descendants are subject to a long delay. For a whiteboard application, each group member is an instance of the application used by a human being who might leave the session at will, and other members should not be affected by the departure.

We therefore apply a client/server architecture to the design of TORM, in which Reliable Multicast Servers (RMS) function as pure controllers of local groups. Although accidental failure of a controller can still result in re-selection, members of a session become independent of each other, and robustness of the protocol is improved. Reliable Multicast Clients (RMC) find the nearest RMS and register their interest in the reliable multicast service. The metric for defining distance between RMS and RMC can be “number of hops” or other parameters, e.g., delay, bandwidth, or monetary cost. The architecture of TORM is depicted in figure 2:

[Figure 2: tree of RMSs rooted at the Top RMS, with RMCs attached beneath the RMSs]

Figure 2 Architecture of TORM

The RMS at the root of the RMS tree is called the Top RMS. In order to achieve total ordering, all RMCs that have data to send must first unicast the data to the Top RMS, and the Top RMS adds a globally unique sequence number to each packet and multicasts the packets to all RMCs. If multicast is not available for some RMCs, a multicast “tunnel” has to be established to relay multicast packets to their destination. The Top RMS also multicasts session messages containing the highest sequence number to let RMCs detect trailing losses. Once an RMC has detected a missing packet, it unicasts a NACK to the RMS in its local group, which replies with the missing packet. If the RMS itself has lost that packet, it will turn to its parent RMS for it. An RMS will not remove packets from its cache until all its direct descendants have correctly received and acknowledged them with an ACK.

In the TORM architecture, there is actually only one multicast source, the Top RMS, so that a tree structure can be applied to solve the acknowledgement implosion problem. Another advantage of a single-source architecture over a multi-source architecture like the SRM framework is that the bandwidth consumed by session messages is minimized. In SRM, all sources keep sending periodic session messages, consuming more bandwidth as the number of sources increases. If the frequency of session messages is lowered, recovery latency increases.
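The sequencing and loss-detection steps described above can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the TORM implementation: the class and method names are hypothetical, network transport is replaced by direct method calls, and the Top RMS also plays the role of the local RMS that answers NACKs.

```python
class TopRMS:
    """Assigns one global sequence number per submitted packet."""

    def __init__(self):
        self.next_seq = 1
        self.cache = {}                  # seq -> payload, kept for repairs

    def submit(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.cache[seq] = payload
        return seq, payload              # in TORM this would be multicast

    def session_message(self):
        return self.next_seq - 1         # highest sequence number assigned so far

    def repair(self, seq):
        return seq, self.cache[seq]      # reply to a NACK


class RMC:
    """Delivers packets strictly in sequence order and reports gaps."""

    def __init__(self):
        self.expected = 1
        self.pending = {}                # out-of-order packets held back
        self.delivered = []

    def receive(self, seq, payload):
        """Accept a packet and return the sequence numbers to NACK."""
        if seq >= self.expected:         # older packets are duplicates
            self.pending[seq] = payload
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        return self.missing(max(self.pending, default=self.expected - 1))

    def on_session_message(self, highest):
        """Session messages reveal losses at the tail of the stream."""
        return self.missing(highest)

    def missing(self, highest):
        return [s for s in range(self.expected, highest + 1)
                if s not in self.pending]


top = TopRMS()
packets = [top.submit(p) for p in
           ["add circle", "move circle", "add line", "remove line"]]

rmc = RMC()
rmc.receive(*packets[0])
gap_nacks = rmc.receive(*packets[2])     # packet 2 was lost: gap detected
trailing_nacks = rmc.on_session_message(top.session_message())  # packet 4 lost too
for seq in trailing_nacks:
    rmc.receive(*top.repair(seq))        # repairs answer the NACKs
```

In this run the gap after packet 1 is noticed when packet 3 arrives, while the loss of packet 4 is only noticed through the session message, which is why lowering the session message frequency increases recovery latency.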


3.2 Whiteboard State
In the TORM architecture, there is a distinct demarcation between application layer and transport layer. The transport layer provides a simple "generic message" interface that can be applied to a variety of application semantics. The application layer, on the other hand, deals only with application-specific data. Each instance of the application maintains an internal data structure for whiteboard state, which is a "final state" rather than the "operation history" maintained by MediaBoard. The final state is a list of pages, each of which contains a list of drawing objects whose attributes, e.g., position and color, are always up-to-date. When a user has made some changes to an object (e.g., moved it to a new position), the operation is packed into a message and sent to the Top RMS by a reliable unicast mechanism. A message can be arbitrarily large; fragmentation and reassembly are performed automatically at the transport layer. After the message is echoed back from the Top RMS by multicast, each RMC reads the content of the message, modifies the attributes of the corresponding object, and then discards the message. Each RMS also keeps a copy of the whiteboard state and updates it in the same manner.

A late-join RMC can get the whiteboard state from its local RMS without bothering the data source. Immediately after an RMC is permitted to join a session, the local RMS will unicast to this RMC a DOWNLOAD message containing the latest whiteboard state. The RMC sets up its initial state with this message, and then begins to work like other RMCs. During initialization, the late-join RMC receives other messages like a normal RMC but does not submit them to the application layer until initialization is done. In SRM, a late-join member sets up its initial state by requesting missing packets, but repairs are multicast from the source to all members even if the application knows that only one member is requesting them.
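A minimal sketch of the final-state model and of late-join initialization is shown below. The message layout `(op, page, obj, attrs)`, the class names, and the rule of skipping buffered messages already covered by the snapshot are assumptions made for illustration; the text only states that messages received during initialization are withheld from the application until the DOWNLOAD message has been processed.

```python
import copy


class Whiteboard:
    """Final-state model: each page maps object ids to their current attributes."""

    def __init__(self):
        self.pages = {}
        self.last_seq = 0

    def apply(self, seq, message):
        op, page, obj, attrs = message          # hypothetical message layout
        objects = self.pages.setdefault(page, {})
        if op == "remove":
            objects.pop(obj, None)
        else:                                   # "add" or "update"
            objects.setdefault(obj, {}).update(attrs)
        self.last_seq = seq                     # the message itself is discarded

    def download(self):
        """Snapshot sent to a late-join client in a DOWNLOAD message."""
        return copy.deepcopy((self.last_seq, self.pages))


class LateJoinClient:
    def __init__(self):
        self.board = None                       # None until DOWNLOAD arrives
        self.buffered = []

    def on_multicast(self, seq, message):
        if self.board is None:
            self.buffered.append((seq, message))    # hold back during initialization
        else:
            self.board.apply(seq, message)

    def on_download(self, snapshot):
        last_seq, pages = snapshot
        self.board = Whiteboard()
        self.board.pages, self.board.last_seq = pages, last_seq
        for seq, message in self.buffered:
            if seq > last_seq:                  # assumption: skip what the snapshot covers
                self.board.apply(seq, message)
        self.buffered.clear()


rms_board = Whiteboard()
msg1 = ("add", "p1", "circle", {"pos": (0, 0), "color": "red"})
msg2 = ("update", "p1", "circle", {"pos": (5, 5)})
msg3 = ("update", "p1", "circle", {"color": "blue"})

client = LateJoinClient()
rms_board.apply(1, msg1)
rms_board.apply(2, msg2)
client.on_multicast(2, msg2)        # arrives while the client is still initializing
snapshot = rms_board.download()     # state up to message 2
rms_board.apply(3, msg3)
client.on_multicast(3, msg3)        # also buffered
client.on_download(snapshot)        # initialize, then replay message 3 only
```

After `on_download`, the client's pages match the server's copy even though it never saw message 1.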
In the TORM architecture, messages are only cached at RMSs until they have been acknowledged; RMCs needn’t cache any packets they have generated or received. With this approach, no redundant information is either kept in memory or sent on the network. In the SRM framework, both the application layer and the transport layer use the same data structure because the application must be able to provide data for retransmission. This results in inefficiency because the application layer is working on a data structure optimized for another layer. Our solution is more flexible in that the application can choose whatever data structure best satisfies its needs without being concerned about how the data will be transmitted on the network, so both efficiency and performance are improved.
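The caching rule can be sketched as follows: a packet stays in an RMS cache only until every direct descendant has acknowledged it. The `RMSCache` class and its per-packet ACK bookkeeping are hypothetical; the text does not specify how ACKs are encoded or aggregated.

```python
class RMSCache:
    """Keeps each packet until all direct descendants have acknowledged it."""

    def __init__(self, children):
        self.children = set(children)
        self.packets = {}          # seq -> payload
        self.acked = {}            # seq -> set of children that sent an ACK

    def store(self, seq, payload):
        self.packets[seq] = payload
        self.acked[seq] = set()

    def on_ack(self, child, seq):
        if seq not in self.packets:
            return                 # already released
        self.acked[seq].add(child)
        if self.acked[seq] >= self.children:
            del self.packets[seq]  # every direct descendant has the packet
            del self.acked[seq]

    def on_nack(self, seq):
        """Return the cached payload, or None if the parent RMS must be asked."""
        return self.packets.get(seq)


cache = RMSCache(["rmc1", "rmc2"])
cache.store(1, "add circle")
cache.on_ack("rmc1", 1)
still_cached = cache.on_nack(1)    # rmc2 may still need a repair
cache.on_ack("rmc2", 1)
after_all_acks = cache.on_nack(1)  # released once everyone has acknowledged
```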

4 Implementation
We have developed SameView, a distributed whiteboard application based on the TORM architecture. The functionalities of RMS and RMC are implemented in SameView Server and SameView Client, respectively. A SameView Client can perform a number of operations, including add/remove pages, add/move/remove/copy/cut/paste objects, move tele-pointers, etc. A user list is provided for better awareness of other participants. Simultaneous operations on the same object are performed in the same order at all sites. Because the creation order of operations at each source is preserved, creating a large object, e.g., an image object, will block subsequent operations. Out-of-order execution may improve responsiveness, but it would violate the consistency requirement of the whiteboard.

One problem we experienced when implementing SameView is the initial delay for a late-join client. The whiteboard state can grow very large if the object list contains a number of image objects with embedded binary data. If the complete whiteboard state is downloaded when a late-join client requests it, the client has to wait a long time before the current page can be displayed. Fortunately, the embedded data of an image object is immutable; i.e., the content of an image is never modified during the session. As a result, the absence of embedded data will not violate consistency of the whiteboard state; the data is only needed when displaying the image. In SameView, the embedded data of image objects is stored in a separate list, which is not downloaded when a client requests the initial whiteboard state. Should a client find an image object whose content is not present, it will issue a DOWNLOAD_REQUEST message to the local SameView Server, which replies by unicast with a DOWNLOAD message containing the corresponding data. With this approach, selective reliability is achieved because embedded data is only transmitted “on demand”. Therefore, the initial download time is greatly reduced and the current page can be displayed without waiting for all the data to arrive. SRM takes a different approach to selective reliability, requesting data only for the current page, which may further reduce the initial delay for a late-join member under some circumstances (e.g., when no redundant operations have been generated). However, the time saved is often trivial because the data for object attributes (e.g., position, color) is usually much smaller than that for image content.
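The on-demand transfer of embedded image data can be sketched as below. The class names, the dictionary layout, and the `requests` counter are hypothetical and stand in for the DOWNLOAD_REQUEST / DOWNLOAD exchange; this is a sketch of the idea, not SameView's code. Because image content is immutable, the client may safely keep whatever it has fetched.

```python
class ServerSketch:
    """Keeps object attributes and embedded image data in separate lists."""

    def __init__(self):
        self.objects = {}       # object id -> attributes (part of the initial state)
        self.image_data = {}    # object id -> bytes (sent only on request)

    def add_image(self, obj_id, position, data):
        self.objects[obj_id] = {"type": "image", "position": position}
        self.image_data[obj_id] = data

    def initial_state(self):
        # Attributes only: embedded data is deliberately left out.
        return {obj_id: dict(attrs) for obj_id, attrs in self.objects.items()}

    def download_request(self, obj_id):
        return self.image_data[obj_id]      # stands in for a DOWNLOAD reply


class ClientSketch:
    def __init__(self, server):
        self.server = server
        self.objects = server.initial_state()
        self.image_cache = {}
        self.requests = 0

    def display(self, obj_id):
        if obj_id not in self.image_cache:  # content missing: fetch on demand
            self.image_cache[obj_id] = self.server.download_request(obj_id)
            self.requests += 1
        return self.image_cache[obj_id]


server = ServerSketch()
server.add_image("img1", (10, 10), b"\x89PNG-first")
server.add_image("img2", (50, 50), b"\x89PNG-second")

client = ClientSketch(server)
first = client.display("img1")
again = client.display("img1")              # served from the local cache
```

Only the image that is actually displayed is transferred; `img2` is never requested in this run.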

5 Summary and Future Work
In this paper, we have presented TORM, a reliable multicast protocol designed to meet the requirements of collaborative applications, especially whiteboard. The TORM architecture provides not only a totally ordered data delivery service as the basis for maintaining consistency of whiteboard state, but also a flexible data model for collaborative applications. Although it is impractical to design a generic reliable multicast protocol to meet all requirements of various applications, it is possible to provide a service that offers mechanisms for reliability which are useful to a sizeable category of applications [1]. As total ordering is an application-independent mechanism for collaborative applications to maintain consistency, providing such a mechanism in the transport layer will greatly simplify the design of applications. Selective reliability is also achieved in TORM without exposing application data to the transport layer, making the interface between transport layer and application layer simple and clear, with improved flexibility and performance in both layers.

In future work we will conduct more experiments to evaluate and enhance the performance and scalability of the protocol. Flow and congestion control is another issue that needs to be addressed, for multicast applications in general have the potential to do more congestion-related damage to the Internet than do unicast applications [12]. In the current version of SameView, servers are placed manually by a network administrator. In a future version, we wish to integrate a distributed network service management system to automatically run SameView Servers at the most appropriate places based on network topology, current traffic and the distribution of SameView Clients.

References
1. Gemmell J, Liebeherr J, Bassett D. In Search of an API for Scalable Reliable Multicast. Technical Report MSR-TR-97-17. Jun 23, 1997
2. Kuo F, Effelsberg W, Garcia-Luna-Aceves J. Multimedia Communications: Protocols and Applications. Prentice Hall PTR, 1998
3. Clark D, Tennenhouse D. Architectural Considerations for a New Generation of Protocols. In: Proceedings of SIGCOMM '90. ACM. 1990. 201-208
4. Obraczka K. Multicast Transport Protocols: A Survey and Taxonomy. ISI. URL: http://www.isi.edu/people/katia/transport-related.ps.gz
5. Floyd S, Jacobson V, Liu C, et al. A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing. IEEE/ACM Transactions on Networking, 1997, 5(6):784-803
6. Tung T. MediaBoard: A Shared Whiteboard Application for the MBone. [Master Thesis]. U.C. Berkeley, 1998. URL: http://www-mash.cs.berkeley.edu/dist/mash/papers/tecklee-masters.ps
7. Raman S, McCanne S. Scalable Data Naming for Application Level Framing in Reliable Multicast. In: Wolfgang Effelsberg, eds. ACM Multimedia 98 - Electronic Proceedings. UK, 1998. URL: http://www.acm.org/sigmm/MM98/electronic_proceedings/index.html
8. Chang J, Maxemchuk N. Reliable Broadcast Protocols. ACM Transactions on Computer Systems, 1984, 2(3):251-275
9. Whetten B, Montgomery T, Kaplan S. A High Performance Totally Ordered Multicast Protocol. Theory and Practice in Distributed Systems, Springer Verlag LNCS 938, July 1995
10. Hofmann M. A Generic Concept for Large-scale Multicast. In: Proceedings of International Zurich Seminar on Digital Communications (IZS’96), Zurich, Switzerland. Feb 1996
11. Lin J, Paul S. RMTP: A Reliable Multicast Transport Protocol. In: IEEE INFOCOM’96. 1996. 1414-1424
12. Mankin A, Romanow A, Bradner S, Paxson V. IETF Criteria for Evaluating Reliable Multicast Transport and Application Protocols. RFC 2357. Jun 1998