
Using Database Management Systems for Collaborative Text Editing

Thomas B. Hodel, Marco Dubacher, Klaus R. Dittrich
University of Zurich, Department of Information Technology, Database Technology Research Group
Winterthurerstr. 190, CH-8057 Zürich, Switzerland
+41 1 635 4576
{hodel, dittrich}@ifi.unizh.ch, [email protected]

ABSTRACT

Word processing applications usually contain their own facilities to store and manipulate data. If documents could be represented natively in a database management system, all available database services could be used automatically. In this paper we first discuss possible implications of a database based editor. Second, we introduce the architecture of such an editor, and third, we explain the most important transaction of such a system, the insertion of characters.

Keywords

Collaborative editor, native text database, distributed computing, CSCW

INTRODUCTION

A significant gap lies between the handling of business data (customer, product, finance, etc.) and text data (documents). Very often, word processing documents are stored somewhere within a confusing file structure with an inscrutable hierarchy and low security. On the other hand, the data that are crucial from an organization's point of view are stored in databases. There, the infrastructure and the data are highly secure, multi-user capable and available to several other tools for building reports, content and knowledge. Our idea is to use a similar philosophy for texts. Therefore, we built a native text database, described in [1], that stores text in a way that enables security and collaboration. By native, we mean that text is stored in the database in a structured way, reduced to ordered characters, so that database transactions can be applied to it. In this paper we focus on the real-time insert operation for one or several characters. The paper contains two main sections. The first explains the basic concepts: the text database, the collaborative editor and the architecture. The second discusses the most important database transaction within this solution.

BASIC CONCEPTS

We are convinced that word processing applications should store data natively in a database and thereby benefit from the advantages of a database management system, such as querying the content, restricting access, persistent storage, inference and rule-based actions, multiple user interfaces, representation of complex relationships among data, integrity constraints, backup and recovery, and much more. Databases are the best foundation for text mining and for content and knowledge management. Existing tools can access the native text database management system directly to obtain a complete history of the document's creation, its authors and readers, its access history, and its access roles and security.

A database based collaborative editor

A collaborative editor allows the same document to be opened and edited simultaneously, either on the same computer or on several computers over a network. For now, we take editing to mean opening and closing a document, inserting and deleting characters, and copying, cutting and pasting one or several characters. All concurrency issues have to be solved within this approach, as does message propagation while multiple instances of the same document are open. Each insert or delete is a database transaction and is immediately reflected in the database.

In principle, three different operations are conceivable when editing a text: characters can be inserted, deleted or changed. Changing means manipulating a character that already exists in the document. Replacing a character or changing the font of a character, for example, are change operations. This set of operations (insert, delete and change) is complete; all other operations are combinations of these three. The read operation is not included, because text editing is different from text reading. A database based collaborative editor is prepared for the following requirements:




• Text must always be stored persistently, even in editing mode. A database approach lends itself to the consistent storage of a document, even when several people are editing it at the same time.

• A security concept with users and roles, specific to text editing, has to be built.

• Read, write, change, delete and update rights are necessary for each individual character, depending on a user and role concept. This means that each instance has its own security setting.

• All text transactions need to be logged in an extended way.

• Automatic versioning has to be supported in different ways. The user can define how the system stores versions, for example every hour, or on opening or closing a document.

• Versioning requires a flexible rollback, that is, a rollback restricted by criteria such as 'undo all editing by user A between 10.15 pm and 11.05 pm'.

• A collaborative environment has to be built with a message propagating service. The database has to inform the accessing applications automatically whenever something changes in an open document.

• The dependencies of data within the database have to be captured. Full data lineage has to be supported.

• Based on one instance or a group of instances, a flow can be defined.

• Collaborative awareness (i.e. notification or visualization of simultaneous editing) has to be provided.

• A database schema for structure and layout has to be defined.

• A placeless and flexible document structure has to be defined for individually configurable users and roles.

• Based on the native storage of the text, multi-channel publishing has to be supported without conversion.
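To illustrate what a criteria-based rollback over logged text transactions might look like, the following Python sketch selects the log entries of one user within a time window and undoes them, newest first. It is only a sketch under stated assumptions: the `LogEntry` fields, the `rollback` function, the list-of-pairs document and the sample date are hypothetical and are not taken from the system described here. Characters are addressed by ID, so undoing an operation does not depend on text positions that other users may have shifted in the meantime.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical in-memory document: ordered (character ID, character value) pairs.
Document = List[Tuple[int, str]]


@dataclass
class LogEntry:
    """One logged text transaction (field names are assumptions)."""
    user: str
    time: datetime
    kind: str                 # "insert" or "delete"
    char_id: int
    char: str
    after_id: Optional[int]   # ID of the predecessor when the operation ran


def rollback(document: Document, log: List[LogEntry], user: str,
             start: datetime, end: datetime) -> Document:
    """Undo the logged operations of `user` within [start, end], newest first."""
    doc = list(document)
    selected = [e for e in log if e.user == user and start <= e.time <= end]
    for entry in sorted(selected, key=lambda e: e.time, reverse=True):
        if entry.kind == "insert":
            # Undo an insert: drop the character, addressed by its ID.
            doc = [(cid, ch) for cid, ch in doc if cid != entry.char_id]
        elif entry.kind == "delete":
            # Undo a delete: put the character back after its old predecessor,
            # or at the front if that predecessor no longer exists.
            ids = [cid for cid, _ in doc]
            index = ids.index(entry.after_id) + 1 if entry.after_id in ids else 0
            doc.insert(index, (entry.char_id, entry.char))
    return doc


day = datetime(2003, 1, 1)  # arbitrary date, chosen only for the example
doc = [(12, "T"), (32, "E"), (33, "N"), (34, "D"), (18, "A"), (19, "X")]
log = [
    LogEntry("B", day.replace(hour=21, minute=50), "insert", 12, "T", None),
    LogEntry("B", day.replace(hour=21, minute=50), "insert", 18, "A", 12),
    LogEntry("B", day.replace(hour=21, minute=51), "insert", 19, "X", 18),
    LogEntry("A", day.replace(hour=22, minute=20), "insert", 32, "E", 12),
    LogEntry("A", day.replace(hour=22, minute=20), "insert", 33, "N", 32),
    LogEntry("A", day.replace(hour=22, minute=21), "insert", 34, "D", 33),
]
restored = rollback(doc, log, "A",
                    day.replace(hour=22, minute=15),
                    day.replace(hour=23, minute=5))
print("".join(ch for _, ch in restored))  # prints TAX
```

The edits of user B fall outside the selection and are left untouched, which is the point of a rollback by criteria rather than by version.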

Client server architecture

As shown in Fig. 1, an important part of the system is the Exchange package. Among other things, it handles the communication with the native text database. To remain database independent, all communication runs across this abstract interface layer.

Fig. 1. Architecture of the database based collaborative editor

As described in [1], each document is stored natively in the database. Even if several authors edit the same text at the same time, they must work on one unique document at all times. The system has to guarantee this unique view. The question is how the document is presented to the editor. A first solution is that the editor fetches the necessary data on demand. Because of its poor performance, we will not dwell on this idea. The second solution is to establish some kind of replication of the document within the editor. Most existing collaborative editors, like REDUCE [4] and NetEdit [3], use this approach and replicate the text file locally. Our editor, in fact, does not hold a replica of part of the native text database in the sense of a database replica. Instead, it holds a kind of replica, a so-called image.

An image represents just a part of the data (see Fig. 2). Within the database, the properties mentioned, such as the object ID, the references After and Before, and the Character-Value, are represented. Within the editor, each character and its position are held in the Character-Array, and each object ID and its position in the ID-Array.

Fig. 2. Data representation

Keeping a system consistent is well researched for distributed systems. Several consistency models with different semantics have been developed [2]. Each replica should remain in a consistent state during text manipulations, and each replica must process the manipulations in the same order. The semantics of the sequential consistency model [2] match these requirements, and the remote write protocol is used to realize this model. Primary-based protocols, like the remote write protocol, assume a central instance for the data objects, which is provided by the master model in the database.
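The remote write protocol mentioned in this section can be sketched roughly as follows, assuming a single in-process master and simple per-editor queues in place of a real notification service. The class and method names (`MasterModel`, `Editor`, `process_updates`) are hypothetical. As a simplification, the originating editor also receives its own change through its queue instead of updating its image directly.

```python
from collections import deque


class MasterModel:
    """Central copy of the document; every write is applied here first."""

    def __init__(self, text):
        self.chars = list(text)
        self.editors = []

    def register(self, editor):
        self.editors.append(editor)
        return list(self.chars)  # initial image handed to the editor

    def write(self, position, text):
        # Apply the write in arrival order, then queue a notification for
        # every editor without waiting for any of them (asynchronous push).
        self.chars[position:position] = list(text)
        for editor in self.editors:
            editor.inbox.append((position, text))


class Editor:
    """Client holding a local image that is updated by pushed notifications."""

    def __init__(self, master):
        self.master = master
        self.inbox = deque()
        self.image = master.register(self)

    def insert(self, position, text):
        self.master.write(position, text)  # the write goes to the master

    def process_updates(self):
        # Drain queued notifications in the order the master produced them.
        while self.inbox:
            position, text = self.inbox.popleft()
            self.image[position:position] = list(text)

    def read(self):
        return "".join(self.image)  # reads are served from the local image


master = MasterModel("TAX")
editor_a, editor_b = Editor(master), Editor(master)
editor_a.insert(1, "END")
print(editor_b.read())      # prints TAX (notification not yet processed)
editor_b.process_updates()
print(editor_b.read())      # prints TENDAX
```

Because every editor drains its queue in the order in which the master applied the writes, all images pass through the same sequence of states, which is what sequential consistency asks for. The price is that an image can briefly lag behind the master.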

Fig. 3. Remote write protocol (Wx: operation such as write, delete or change; Rx: read operation)

Manipulating the image (W1) in an editor triggers a corresponding message (W2) that is sent to the database system (see Fig. 3). One solution would be to block the whole system until all editors have been updated via the messages W3 and W5. However, this results in poor performance, so an asynchronous notification service is needed. This approach weakens the fault tolerance of the collaborative editor, but fault tolerance does not have a high priority within this context. Our solution is to push updates to the editors; for reading, the local image can be used.

Hybrid approach versus centralized or decentralized approach

The decentralized approach has been implemented several times, for example in NetEdit [3]. Note that NetEdit nevertheless uses a central coordinator. The decentralized approach is based on so-called replicas. When a document is loaded, a replica is transferred from the server to the editor. Editing text means changing the local copy and then propagating the change to all other users who have the same document open. The main problem with this approach is determining the total causal order of all operations. Operations correspond to text manipulations that are invoked on the same document in different editors at the same time. Consequently, all operations must be handled in all editors in the same order in which they were issued.

In the centralized approach, all operations are reported to a central server, which then propagates them to each user. The great advantage of a central unit is that all operations are ordered automatically, so there is little need for ordering algorithms. The disadvantage is that, depending on the implementation, there is some latency during text manipulation, since all operations must first pass through the central server. The most advanced centralized approach has been implemented in REDUCE [4].

Even though we have a client server architecture and a centralized native text database management system, this does not automatically mean that the system is centralized. First, it is possible to use a distributed database environment; second, the message propagating system (see Fig. 1) used in the Exchange package ultimately determines the approach.
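Why the order of operations matters can be seen in a small Python sketch. It assumes plain position-based insert operations on a string, which is a deliberate simplification and not the ID-based representation used in the native text database. The helper names `apply_insert` and `replay` are hypothetical.

```python
def apply_insert(text, operation):
    """Apply one position-based insert, given as (position, characters)."""
    position, characters = operation
    return text[:position] + characters + text[position:]


def replay(text, operations):
    """Apply a sequence of operations in the given order."""
    for operation in operations:
        text = apply_insert(text, operation)
    return text


# Two editors insert at the same position of the same document concurrently.
op_a = (1, "END")
op_b = (1, "HE")

# Decentralized case without ordering: each replica applies its own operation
# first and the remote one second, so the replicas diverge.
replica_1 = replay("TAX", [op_a, op_b])
replica_2 = replay("TAX", [op_b, op_a])
print(replica_1, replica_2)   # prints THEENDAX TENDHEAX

# Centralized case: the server fixes one order and every replica replays it.
server_order = [op_a, op_b]
print(replay("TAX", server_order) == replay("TAX", server_order))  # prints True
```

A central server gets the common order for free from the arrival order of the messages. A decentralized system has to establish it with an ordering algorithm.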

Fig. 4. Classification of the approaches (figure labels: central approach; decentralized approach; synchronous approach with a central message propagating system; synchronous approach with a direct message propagating system; asynchronous approach with a direct message propagating system)

For performance reasons, a hybrid message propagating system was implemented. All time-critical data use a synchronous or asynchronous direct propagating system; all other data use a central propagating system (see Fig. 4).

INSERT AND DELETE CHARACTERS

Inserting characters can be handled in two different ways. One possibility is to divide the insert into 'one character' events; inserting 10 characters would then entail 10 database calls. The other possibility is not to divide such events but to transfer the event as a whole unit, as one transaction, into the database. The second solution obviously performs better. In the following, the insertion of several characters at one time is discussed; all statements also apply to the case where only one character is inserted.

Many aspects come together when characters are inserted. On the left side of Fig. 5, the collaborative editor is shown. For inserting characters, the Document-Model and an instance of Document_Object are used. Within the Document-Model, the status before the insert (1) and the status after the insert (2) are represented. In the example, the text TAX is stored first (1). The character T, for example, has the ID 12, as stored in the ID-Array.
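As a rough illustration of the Document-Model, the Python sketch below derives the Character-Array and the ID-Array from linked character records and looks up the neighbouring IDs for an insert position. The record layout (ID, After, Before, Character-Value) and the values follow status 1 of Fig. 5. The function names and the tuple representation are assumptions, and the handling of the document boundaries is a guess, since the text does not specify it.

```python
# Character records for the text TAX (status 1 in Fig. 5):
# (ID, After, Before, Character-Value). After is the ID of the preceding
# character and Before the ID of the following one.
ROWS_STATUS_1 = [
    (12, 10, 18, "T"),
    (18, 12, 19, "A"),
    (19, 18, 22, "X"),
]


def build_image(rows, first_id):
    """Follow the Before references to build Character-Array and ID-Array."""
    by_id = {row[0]: row for row in rows}
    character_array, id_array = [], []
    current = first_id
    while current in by_id:  # stops at an ID that is not part of this image
        _, _, before_id, value = by_id[current]
        character_array.append(value)
        id_array.append(current)
        current = before_id
    return character_array, id_array


def neighbours(id_array, position):
    """Return (ID before, ID after) for an insert at `position`.

    Assumption: None is returned at a document boundary.
    """
    id_previous = id_array[position - 1] if position > 0 else None
    id_next = id_array[position] if position < len(id_array) else None
    return id_previous, id_next


character_array, id_array = build_image(ROWS_STATUS_1, first_id=12)
print(character_array)          # prints ['T', 'A', 'X']
print(id_array)                 # prints [12, 18, 19]
print(neighbours(id_array, 1))  # prints (12, 18)
```

Keeping the two arrays parallel means that a text position can be translated into database IDs with a plain index lookup, without a round trip to the database.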

Fig. 5. Insert characters

On the right side of Fig. 5, the class character, in which the document is represented, is shown. The illustration shows that the Document-Model in the editor and the same document in the database represent identical content (1-1, 2-2). The Document_Object instance calls the corresponding insert method within the editor. The position in the text, the inserted characters (as a string) and the style attributes are passed as parameters. In our example this is the string END at position 1. Knowing the position, the IDs of the characters before and after it can easily be found by looking them up in the ID-Array. In the example, the ID 12 is at position 0 and the ID 18 at position 1. With this additional information, the database system is called with the parameters END, IDprevious 12, IDnext 18 and the position 1. The reason for these parameters lies in the collaborative functionality, where consistency has to be guaranteed and message propagation has to be offered.

The database system receives these parameters and calls the insert method of the class character. First, the algorithm starts a new transaction, so that all actions can be cancelled if an error arises. Second, the reference character with the ID 18 is locked. Third, it is checked whether the ID of the character before it (IDprevious) is 12. Fourth, as many sequential database IDs are reserved as there are characters to insert; in our example three characters are inserted, so three sequential database IDs must be reserved. Fifth, the characters are inserted into the database. Sixth, the insert transaction is closed if no errors occurred. Finally, as confirmation, the database system sends back the ID of the first inserted character. In the example shown, ID 32 is sent.
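The server-side steps can be sketched with Python's built-in sqlite3 module. This is an illustration under several assumptions and not the authors' implementation. The table and column names (`character`, `after_id`, `before_id`, `cv`, `id_sequence`) are made up. SQLite has no row-level locks, so `BEGIN IMMEDIATE`, which takes a write lock on the whole database, stands in for locking the reference character. The ID counter is seeded with 32 only so that the result matches the example.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database holding the text TAX (hypothetical schema)."""
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE character (id INTEGER PRIMARY KEY, "
                 "after_id INTEGER, before_id INTEGER, cv TEXT)")
    conn.execute("CREATE TABLE id_sequence (next_id INTEGER)")
    conn.executemany("INSERT INTO character VALUES (?, ?, ?, ?)",
                     [(12, 10, 18, "T"), (18, 12, 19, "A"), (19, 18, 22, "X")])
    conn.execute("INSERT INTO id_sequence VALUES (32)")
    return conn


def insert_characters(conn, text, id_previous, id_next):
    """Insert `text` between two characters; return the first new ID."""
    if not text:
        raise ValueError("nothing to insert")
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")  # steps 1 and 2: transaction plus write lock
    try:
        # Step 3: the reference character must still follow id_previous.
        row = cur.execute("SELECT after_id FROM character WHERE id = ?",
                          (id_next,)).fetchone()
        if row is None or row[0] != id_previous:
            raise ValueError("neighbouring characters have changed")
        # Step 4: reserve len(text) sequential IDs.
        (first_id,) = cur.execute("SELECT next_id FROM id_sequence").fetchone()
        cur.execute("UPDATE id_sequence SET next_id = next_id + ?", (len(text),))
        last_id = first_id + len(text) - 1
        # Step 5: insert the characters and relink both neighbours.
        for offset, value in enumerate(text):
            new_id = first_id + offset
            after_id = id_previous if offset == 0 else new_id - 1
            before_id = id_next if new_id == last_id else new_id + 1
            cur.execute("INSERT INTO character VALUES (?, ?, ?, ?)",
                        (new_id, after_id, before_id, value))
        cur.execute("UPDATE character SET before_id = ? WHERE id = ?",
                    (first_id, id_previous))
        cur.execute("UPDATE character SET after_id = ? WHERE id = ?",
                    (last_id, id_next))
        cur.execute("COMMIT")       # step 6: close the transaction
    except Exception:
        cur.execute("ROLLBACK")     # cancel all actions on any error
        raise
    return first_id                 # confirmation sent back to the editor


conn = open_demo_db()
print(insert_characters(conn, "END", id_previous=12, id_next=18))  # prints 32
```

The check in step 3 is what protects against concurrent edits: if another editor has already inserted something between the characters 12 and 18, the call fails and the transaction is rolled back instead of corrupting the character order.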

Then the Document_Object instance must update the Document-Model to the new status. This is simple, since the position is known. Next, the ID-Array is adjusted. Now the model of the client is again consistent with the model of the database, and the insert process is finished. As soon as the characters are inserted into the Character-Array, they appear on the screen. With these steps, the database system has been transferred from the consistent status 1 into the consistent status 2. During this insert process, all other clients that have the same document open are informed as well by the message propagating service.

Values shown in Fig. 5 (status 1 before the insert, status 2 after the insert):

Editor (Client), Image, Document-Model, status 1:

Position         0   1   2
Character-Array  T   A   X
ID-Array         12  18  19

Editor (Client), Image, Document-Model, status 2:

Position         0   1   2   3   4   5
Character-Array  T   E   N   D   A   X
ID-Array         12  32  33  34  18  19

Message label in the figure: IDE 32 + IDN 33 + IDD 34 + Pos 1

Database (Server), Master Model, Character-Instances, status 1:

ID   After  Before  CV
12   10     18      T
18   12     19      A
19   18     22      X

Database (Server), Master Model, Character-Instances, status 2:

ID   After  Before  CV
12   10     32      T
18   34     19      A
19   18     22      X
32   12     33      E
33   32     34      N
34   33     18      D

Deleting characters works very similarly; Fig. 6 is self-explanatory in light of the explanation of inserting characters.

Fig. 6. Delete characters

REFERENCES

1. Hodel T.B., Dubacher M., Dittrich K.R., A database based word processing concept and prototype. Technical Report, University of Zurich, 2003.

2. Tanenbaum A., Distributed Systems. Prentice-Hall, 2002.

3. Zaffer A.A., Shaffer C.A., Ehrich R.W., Perez M., NetEdit: A Collaborative Editor. Technical Report TR1-13, Computer Science, Virginia Tech, 2001.

4. Sun C., Jia X., Zhang Y., Yang Y., REDUCE: a prototypical cooperative editing system. In: Proceedings of the 7th International Conference on Human-Computer Interaction, San Francisco, CA, USA, 1997, 89-92.