Beyond the Whiteboard: Synchronous Collaboration in Shared Object Spaces

Bernd J. Krämer
FernUniversität, Department of Electrical Engineering
58084 Hagen, Germany
[email protected]

Lutz M. Wegner
Universität Gh Kassel, FB 17 Mathematik/Informatik
34109 Kassel, Germany
[email protected]
Abstract

Synchronous collaboration via a network of distributed workstations requires concurrency awareness within a relaxed WYSIWIS model (What You See Is What I See). Many applications let users navigate within highly structured object spaces, such as documents, multimedia courses, graphs, or nested tables, which can be distributed asynchronously. To support their manipulation by distributed teams, we provide a novel interaction paradigm, called finger. A finger serves to highlight objects of interest within the shared object space. Locations of fingers, their movements, and changes applied to objects can be signaled by means of operation broadcasting to other collaborators who need to be aware of them. Unlike telepointers, fingers do not require window sharing and are independent of the actual object presentation. This paper describes the interaction paradigm and its basic set of operations, using a tele-teaching scenario to illustrate its features. Its collaboration principles, however, apply to many other CSCW areas, like shared authoring, trading, scheduling, crisis management, and remote maintenance.
1. Introduction

The Internet is becoming the universal exchange ground for information suppliers and customers. As activities on the electronic market grow more involved and stretch beyond the mere exchange of information, it becomes clear that networks must become more collaborative. They must also simulate some of the elements of the social processes, like room sharing, looking over someone's shoulder, or working hand in hand, which are at the heart of business transactions, educational activities, decision making, and problem solving. Electronic Enterprise Engineering and Web-based education and training systems are examples of rapidly growing application domains for which such capabilities are of utmost importance.
Collaborative learning and problem solving, for instance, involve negotiations, design alterations, debates over solution ideas and what-if scenarios, or last-minute changes before the homework must be submitted. These long and costly interactions must be supported by suitable tools for synchronous, distributed collaborative learning, which forms a sub-domain of Computer Supported Cooperative Work (CSCW). Other domains which require collaborative multimedia tools are team science and tele-medicine, as mentioned in the comprehensive overview presented in [3]. Current technology tends to focus on either asynchronous groupware, i.e., email exchange and placement of data into intranets, or conferencing tools with video transmission, shared whiteboards, and chat rooms. Commercial products supporting these modes include Lotus Notes, Microsoft NetMeeting, and CoolTalk.
1.1. Shared Views

Today's synchronous groupware systems are usually a combination of

- desktop conferencing systems with applications for collaborative work across workstations, as in Xerox PARC's Colab project,
- desktop conferencing systems with shared screens or shared windows, for which, e.g., Rendezvous is a toolkit,
- electronic meeting and decision rooms, like GMD's DOLPHIN, and
- media spaces, like Ishii's TeamWorkStation.

All of them adhere to a relaxed WYSIWIS model (What You See Is What I See) to establish a common ground for discourse. But they differ in what can be shown, whether objects can be manipulated by each participant, how communication between users occurs, and how much each participant is aware of the others' location, activities, or focus of attention.

It is generally agreed that the initial brainstorming phase profits most from video conferences and shared whiteboards. They are important for establishing social contact and for quick sketches, but they remain strangely detached from the actual objects of discourse. To support awareness, facilitate communication, foster creativity, and establish some form of social contact, two techniques occur almost universally within synchronous collaboration environments: video transmission (talking heads or stage views) on the one side and shared whiteboards for free drawing or annotating preexisting material on the other. In the case of TeamWorkStation's ClearBoard metaphor, the two are even closely combined: users talk through and draw on transparent glass windows. If applications are involved, like shared authoring of a document or shared synchronous editing of a spreadsheet, screen sharing or window sharing can be added. However, as with whiteboard sharing, problems arise in keeping views consistent in real time and in synchronizing changes. Furthermore, display broadcasts increase the already high bandwidth requirements stemming from the video distribution.
1.2. Structured Application Spaces

The main reason for the dominance of shared whiteboards, video conferencing, and chat tools in the domain of synchronous CSCW is the relative lack of collaboration-enabled application software. When existing applications are ported into this domain, they are usually collaboration transparent, which means they are either single-user applications wrapped into some distribution layer or multi-user transaction-based systems, which, by definition, implies isolation from concurrency. Both cases are poor platforms for collaboration awareness. The alternative, namely to write software particularly geared towards synchronous collaboration, seems prohibitively expensive. Therefore, considerable efforts exist to provide mediator software, also called middleware, which would offer suitable communication and participation modes. Results from a recent workshop on this topic, with many additional pointers, can be found at http://www.objs.com/workshops/ws9801/report.html. The same holds true for the subdomain of collaborative teaching tools, like CSILE, CoVis, CLL, and ADVLEARN, which compete with collaboration tools on the Internet, like Hyperwave, O'Reilly's WebBoard, GMD's BSCW, and TeamWave. Again, these approaches to synchronous learning networks are based mainly upon view sharing, which implies that collaborators remain detached from the objects of discourse, their structure, and their fine-grain interrelations.

In contrast, we aim at a database-managed, visualized, shared object space similar to what is called multiple shared data artifacts in [3]. Our approach makes use of the fact that courseware is based upon an abundance of highly structured objects: documents, maps, 2- and 3-D drawings, and graphs. Views on this space (views in the database as well as in the interface sense) need to be distributed, and simulation and animation tools must provide synchronized access to the model data.

1.3. Fingers

We suggest the concept of a finger for navigating amongst and manipulating a set of objects which, in most cases, have a complex structure of their own, making it necessary to descend into them and escape to their outer containers. To remain independent of particular representations and to be applicable across a wide range of collaboration domains, we propose some type of general complex-object data model as a suitable base for the collaboration network. More precisely, the model is a graph where nodes represent objects and edges express whole-part relations and links to shared subobjects. Often, collections of complex objects simply form a tree. This is, for example, the case for the course material depicted as a graph in Fig. 1.
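To make this model concrete, consider the following minimal sketch of such a complex-object graph. It is illustrative only: we use Python purely for exposition (the prototype itself builds on Tcl/Tk and TclDB), and all class and field names are our own, not the prototype's.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A node of the shared object space; 'children' edges express
    whole-part relations, 'links' point to shared subobjects."""
    oid: str                                       # invariant object identifier (OID)
    label: str                                     # e.g. "press" or "5.130"
    children: List["Node"] = field(default_factory=list)
    links: List["Node"] = field(default_factory=list)

    def is_atomic(self) -> bool:                   # leaves carry the actual values
        return not self.children

# A tiny fragment of such a space (cf. the production cell of Section 3):
cell = Node("o1", "production cell", children=[
    Node("o2", "feeder"),
    Node("o3", "press", children=[Node("o4", "cheek")]),
])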
Figure 1. Instance of a course shown as a tree in IFO-like notation

Fingers are similar in nature to telepointers but have a completely different implementation and potential. A telepointer assumes sharing of windows. A finger is independent of the concrete presentation of objects and can sit on a Gantt chart on the screen of one user and on a PERT diagram shown on the screen of a second collaborator. A finger may point to a complex object and then be used to navigate down to the level of subobjects, which holds true for telepointers only with restrictions. In addition, the focus of a telepointer can be fuzzy, i.e., circumscribe multiple unrelated objects, whereas a finger always points to an atomic node or a logically coherent set of nodes designated by a composite node.

In the next section we outline our finger concept and emphasize the difference between picture passing and operation passing. This difference is particularly acute when the available bandwidth is small, as finger operations are simple. Next we present the operation passing mechanism underlying the finger concept using a distributed learning scenario. Then we discuss minimal requirements collaboration objects must satisfy to enable collaboration. We conclude with a sketch of a technical solution that relies on TclDB, a navigational language that ties DBMS, visualization, and distribution over the Web together. Users are invited to connect to http://www.db.informatik.uni-kassel.de/escher/tcldbTclet/Welcome.de.html to get a first impression, even though some of the concurrency features are not operational yet.
2. Focus and Finger Operations

Unless users are involved in initial brainstorming sessions, most synchronous collaboration is built upon existing material. We may therefore assume that the object space has been distributed asynchronously, e.g., over the Internet, on CD-ROM, or by satellite or cable broadcast. Where parts are missing or need updates, they are fetched on demand from a server and inserted into the partial database at the client site. Demand is created when a user navigates into territory which is not available locally. This can happen in two ways:
- A user establishes a new residency (nimbus) by moving to an object.
- A user shifts his attention (focus) by looking at a new object.

The terms nimbus and focus were introduced by Rodden [4] in his formal model of spatial awareness, which includes people, information, and other computer artifacts as objects but does not consider nested objects.

The first type of move corresponds to the highlighting of objects on the screen using, e.g., a telepointer. This can be achieved by some picture-passing mechanism similar to the X-window display protocol. This is sometimes called the centralized approach because the actual objects remain at the central collaboration server. Examples of this form are NetMeeting, XTV, SharedX, and Jupiter. Alternatively, one can choose a replicated approach, which caches local copies of the collaboration elements. This requires elaborate synchronization protocols and is similar to distributed transaction systems [5]. Most CSCW text editors work according to this principle. Examples of the replicated mode are DistEdit, Grove, GroupKit, and MMConf, all referenced in [3]. A third approach, which we adhere to, is to work with a centralized object server that distributes visualization operations to be performed at the collaboration clients. A client, in turn, accepts navigational and editing commands from the participants. We call this operation passing, in contrast to view passing. We believe that, much like in display generation through command passing (e.g., in PostScript), it can save bandwidth and can easily be adapted to user preferences.

For the actual pointer to the object we introduce a new term, finger, with the convention that at any time a finger points to a single object. A user may have several fingers open, but only one of them is active and can be used to manipulate objects; moreover, each finger has an ownership, which is either private or public. Within a graph, a finger corresponds to a node. On the implementation level, a finger corresponds to the (invariant) address of the node, in OODB terminology the object identifier (OID) of the object. In the strict hierarchical model, it can also be seen as the path from the root of the object tree to the sub-object to which the finger currently points. In Fig. 1 we have depicted the path corresponding to the finger on the link of 5.130 with solid lines.

The second type of move occurs when a user slides his window to other parts of the object space, which usually is much larger than what fits onto a screen. This shift of focus is independent of where the user has placed a finger. Clearly, this approach supports synchronous collaboration very well: as a user slides his window to other parts of the object space, his finger disappears from the visible area, but those of others might appear. Conversely, his location becomes visible to others. Fortunately, the most common situation is that both subspaces coincide, i.e., a user has his window placed in such a way that his active finger, or a large part of what is visible of it, is centered in the window. Indeed, much of the display activity at the client side stems from the need to redraw a window as it follows the movements of the active finger.

The finger paradigm is part of the visual database editor ESCHER [6], which is based on a nested relational model. There, finger movements and visualization are closely tied to nested tables, but the paradigm fits any tree-structured data and can be adapted to multiple visualizations as well.
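Continuing the Python sketch of Section 1.3 (again ours for illustration, not ESCHER code), a finger in the strict hierarchical model can be kept as the list of child indices leading from the root to the current node; the method names anticipate the operation synonyms introduced in Section 3.

class Finger:
    """A finger as the path from the root to the current sub-object."""
    def __init__(self, root):                 # root is a Node (see Section 1.3)
        self.root = root
        self.path = []                        # empty path: pointing at the root

    def _node(self, path):
        node = self.root
        for i in path:
            node = node.children[i]
        return node

    def current(self):
        return self._node(self.path)

    def push(self):                           # In / Enter: first constituent subobject
        if self.current().children:
            self.path.append(0)

    def pop(self):                            # Out / Escape: enclosing object
        if self.path:
            self.path.pop()

    def next(self):                           # Next / Successor: following sibling
        if self.path:
            parent = self._node(self.path[:-1])
            if self.path[-1] + 1 < len(parent.children):
                self.path[-1] += 1

    def back(self):                           # Back / Predecessor: preceding sibling
        if self.path and self.path[-1] > 0:
            self.path[-1] -= 1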
3. Operation Broadcast

To illustrate our point, consider a scenario that involves two students, A and B, and tutor T. All three are engaged in an argument over some details in the design of a software-controlled production cell, which is used as a case study in a software engineering course and must obey strict safety requirements [2]. We assume the players are connected through a low-bandwidth network of workstations. The abstracted production cell has feed and delivery belts, a rotary table, a robot with two arms, and a press. Fig. 2 shows a VRML animation of the cell, which our students can access over the Internet. The argument is about the safety requirement which says that the press may not operate while Arm 2 of the robot is pointing towards the press and is also extended (into the press).
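For illustration only, this requirement reads as a simple predicate over the cell state; the parameter names below are ours, not taken from the case study [2].

def press_may_operate(arm2_points_at_press: bool, arm2_extended: bool) -> bool:
    # Operating the press is unsafe only when BOTH conditions hold:
    # Arm 2 points towards the press AND is extended into it.
    return not (arm2_points_at_press and arm2_extended)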
Figure 2. VRML representation of the production cell

We assume that A, B, and T possess (electronic) course material as indicated in Fig. 1 and additional formal and informal specifications of the safety requirements. Learner A asks about the cheek and places his active finger on it. He does so by querying the cell, which places the finger on its set of components: feeder, deposit belt, rotary table, robot, and press. Internally, these are stored as tuples with various subcomponents. By descending into the press object, he finally arrives at the cheek component. Done iteratively, this involves a sequence of steps drawn from a small set of basic operations, which have several synonymous names and are also associated directly with certain keystrokes:

- In, Enter, Push
- Out, Escape, Pop
- Next, Successor, ↓, →
- Back, Predecessor, ↑, ←

More precisely, going into a complex object means moving a finger to its first constituent subobject: in a tuple to the first attribute, in a set or list to the first element, in a paragraph to the first word, in a production cell to the first subcomponent (say, the feeder belt by agreement). Because it corresponds to putting an address on top of the stack which represents the path to the object, it is also known as the push operation. Equivalently, pop means moving out (escaping) to the enclosing object. Next and back move from one sibling to the next, provided one has not yet reached the end, respectively the start, of a collection. Readers might be surprised that both "arrow down" and "arrow right" denote the same operation: remember that we are talking about logical, tree-like representations of objects, which are independent of any visual (physical) representation. Thus, in Fig. 3 "next" implies moving the finger to the next component, which is the line below. Had the finger rested on, say, the "authored" date, next would be the "revised" date, which is to the right in Fig. 3. Thus both keys on the keyboard map to the same logical operation.

Figure 3. Page 5.130 of the course material with its authored and revised dates

Alternatively, fingers may be placed with cursor movements and mouse clicks, which is what Learner A would probably do. A mouse click always positions the finger on the atomic element whose representation (rectangular bounding box) currently encompasses the cursor coordinates. A mouse drag with the left button depressed makes the finger follow the drag (in real time, at least in our table representation) to the smallest complex object which has both the starting and the current end position within its bounding box. When the mouse button is released, the finger rests on this area. Note that the analysis of where the active finger should move as a result of certain mouse events is done on the client side (Learner A's machine). It employs some internal (invisible) fingers, which traverse the object to identify the proper target object.
Now the active finger can be moved from its previous location to the new position. This new position and any other navigational operations are transmitted to the collaboration server. If tutor T has set his client into a "follow A" mode, the server will slide T's window within the object space to the production cell and will transmit A's active finger operations. Within the existing instructional material on T's client, the operations are performed with a finger marked as belonging to A, and its new position appears as a highlighted artifact in T's representation of the cell.

In turn, the tutor might want to draw attention to the wording of page 5.130 in, say, the full HTML representation generated from Fig. 3. He would move his active finger onto the link, possibly leaving another finger as a bookmark on the VRML cell. If that wording was a revision of what had previously been distributed, all students moving their window onto the same subspace (signaling this change automatically to the server) would be notified that new data exists and would receive the relevant page to merge into their material. Those collaborators not looking at the revised locations would not receive the modifications until they issue a general update request some time later.

In our prototype implementation of finger support in distributed, Web-based environments, we have to cope with the fact that the Web mode of information supply is stateless. It follows the connect-get-close paradigm of information interchange and provides mostly pre-compiled pages whose contents are possibly fetched from a DBMS. Thus the Web is not directly geared towards synchronous collaboration. As Fielding et al. put it with regard to HTTP and the Web: "Although this simple model of communication scales well for simple retrieval tasks, it is not sufficient for the complex interactions in software engineering" (or in any collaborative work process) [1]. Luckily, through the use of applet technologies, i.e., small mobile code loaded from a server and executed at the client side in a so-called plug-in, we can establish a connection with a server database through which updates can flow bi-directionally. In our case we base our development on John Ousterhout's Tcl/Tk, where applets are called Tclets, but other solutions based on commonly available plug-ins are equally possible; the sketch below illustrates the resulting operation flow. The remaining architectural issue, namely making existing material accessible to fingers, is addressed in the next section.
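The following sketch shows how small the relayed messages can be. It is our Python rendering under assumed names, not the prototype's actual protocol; the prototype realizes the client side as a Tclet holding a connection open to the collaboration server.

import json

def encode_op(user: str, finger_id: int, op: str) -> bytes:
    """A finger operation travels as a message of a few dozen bytes,
    not as a picture; op is, e.g., 'push', 'pop', 'next', or 'back'."""
    return json.dumps({"user": user, "finger": finger_id, "op": op}).encode()

class CollaborationServer:
    """Relays each operation to the clients that need to see it, e.g.
    those that set themselves into 'follow' mode for the sender."""
    def __init__(self):
        self.followers = {}                    # sender -> set of followers

    def follow(self, follower: str, sender: str):
        self.followers.setdefault(sender, set()).add(follower)

    def broadcast(self, sender: str, message: bytes, send):
        # 'send' abstracts the transport, e.g. the applet's open connection.
        for client in self.followers.get(sender, set()):
            send(client, message)

# Tutor T follows learner A; A descends into the press object:
server = CollaborationServer()
server.follow("T", "A")
server.broadcast("A", encode_op("A", 1, "push"), lambda c, m: print(c, m))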
4. Minimal Requirements for Enabling Collaboration Awareness

Clearly, maintenance and training handbooks, case studies of existing installations, or documentations are created and maintained with a variety of tools and are stored in an abundance of formats. In many cases a grouping of elements was present at object creation time, but the stored result does not contain the grouping and ordering anymore. Fig. 4 is such an example. It shows a Petri net for the press-robot synchronization. The net was created with a Petri net tool that maintains the logical structure of the net. In the training material, the net would lose its grouping and ordering characteristics as it was converted to a bitmap object suitable for printing or for screen presentation with standard browsers. To make it accessible for collaboration, e.g., for a joint exploration of the net's dynamic behavior using token movements, it would need to be "opened" again to permit descending into its logical structure.
Figure 4. Petri net with finger indication

We thus must aim at minimal requirements and target "penetrable" as opposed to sealed documents:

- objects know their structure and can provide identities (addresses) of their nodes;
- objects don't know their structure, but a shadow object can be created and stored which recreates this structure;
- objects don't have a stored structure and don't warrant creating one; however, there is a filter that understands the structure and can create it, or relevant portions thereof, on the fly.

Hypermedia systems and CAD objects stored in object-oriented or object-relational databases would satisfy the first criterion. Addresses would be URLs, values of attributes which have the key property, record identifiers (RIDs), or object identifiers (OIDs). Some graph formats, including some PostScript tools, have the ability to recognize substructures like polygons and polylines, frames, etc. A structure can be extracted and stored as a graph in a shadow document with links into the actual graph. Clickable maps might also fit into this category. The shadow documents give object identities to the collaboration software, which wants to position fingers on certain objects but maintains an internal mapping to the recognized sub-objects. Currently we are exploring these issues by interfacing our editor with Mathematica.
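These three levels can be hidden behind one small interface, so that fingers need not care how the structure is obtained. The following sketch is our own illustration, with hypothetical names, in the Python notation used throughout this paper.

from abc import ABC, abstractmethod

class Penetrable(ABC):
    """What a document must offer so that fingers can descend into it:
    the whole-part edges below each node identity."""
    @abstractmethod
    def children(self, node_id: str) -> list:
        ...

class SelfDescribing(Penetrable):
    """Case 1: the object knows its structure (hypermedia, OODB/ORDB);
    node ids would be URLs, key values, RIDs, or OIDs."""
    def __init__(self, structure: dict):       # node_id -> list of child ids
        self.structure = structure
    def children(self, node_id):
        return self.structure.get(node_id, [])

class Shadowed(Penetrable):
    """Case 2: a filter runs once; the result is kept as a shadow document
    mapping shadow node ids to regions of the sealed object."""
    def __init__(self, sealed, extractor):
        self.shadow = extractor(sealed)
    def children(self, node_id):
        return self.shadow.get(node_id, [])

class Filtered(Penetrable):
    """Case 3: no stored structure; the filter recreates the relevant
    portion on the fly at every access."""
    def __init__(self, sealed, extractor):
        self.sealed, self.extractor = sealed, extractor
    def children(self, node_id):
        return self.extractor(self.sealed).get(node_id, [])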
5. Conclusions

We believe that collaboration awareness techniques within synchronous CSCW can simulate some of the social processes which underlie traditional activities. Thus it seems to be the most attractive yet most challenging form of networked interaction, in particular for computer-aided learning (CAL). Our approach concentrates on shared object spaces rather than on video conferencing and shared whiteboards, although we recognize that video conferencing helps in socializing and in coordinating audio contributions. For these shared object spaces we introduced fingers, which are like cursors pointing to complex or atomic objects. They are used for navigation, for drawing other collaborators' attention to a particular subspace, and for manipulating object structures, and they can serve as bookmarks. Hierarchical structures, like trees and nested tables, are especially suited for this type of structured interaction, and a small set of operations is sufficient to define finger movements and actions. We also stress the point that this complex-object model is largely representation independent, as there are many ways to highlight a substructure within a graph, table, hypertext, formula, or bitmap. If applications reveal the structure of their object spaces, fingers can navigate within these spaces and signal their presence. To achieve a relaxed "What You See Is What I See" model, it is sufficient to broadcast the finger operations to all those collaborators whose window presently intersects the subspace affected by the finger movement.

A working prototype based on our database editor ESCHER and the navigational language TclDB exists and produces, e.g., reactive visualizations like Gantt and PERT diagrams. Some aspects, such as concurrency awareness, are not fully operational yet. One reason is that ESCHER does not yet have an extended transaction concept suitable for CSCW applications: unlike in ACID transactions, isolation must give way to concurrency awareness. With multiple fingers we can simulate this, but this provides no synchronization.
References

[1] R. T. Fielding, E. J. Whitehead, K. M. Anderson, G. A. Bolcer, P. Oreizy, and R. N. Taylor. Web-based development of complex information products. Communications of the ACM, 41(8):84–92, 1998.
[2] B. Krämer. A case study in developing complex safety critical systems. In Proc. Hawaii International Conference on System Sciences, volume V, pages 135–143, 1997.
[3] A. Prakash, H. Shim, and J. Lee. Data management issues and trade-offs in CSCW systems. IEEE Transactions on Knowledge and Data Engineering, 11(1):213–227, 1999.
[4] T. Rodden. Populating the application: A model of awareness for cooperative applications. In Proc. of the ACM 1996 Conference on CSCW, pages 87–96, Boston, Mass., November 1996.
[5] R. Strom, G. Banavar, K. Miller, A. Prakash, and M. Ward. Concurrency control and view notification algorithms for collaborative replicated objects. IEEE Transactions on Computers, 47(4):458–471, 1998.
[6] L. Wegner, M. Paul, J. Thamm, and S. Thelemann. A visual interface for synchronous collaboration and negotiated transactions. In Proc. Advanced Visual Interfaces (AVI '96), pages 156–165, Gubbio, Italy, May 1996. ACM Press.