
Supporting Collaborative Layouting in Word Processing Thomas B. Hodel, Dominik Businger, Klaus R. Dittrich University of Zurich, Department of Informatics Winterthurerstr. 190, CH-8057 Zürich, Switzerland {hodel, dittrich} @ifi.unizh.ch, [email protected], http://www.ifi.unizh.ch/

Abstract. Collaborative layouting occurs when a group of people simultaneously defines the layout of a document, shaping it into a coherent set of meaningful styles. This activity is characterized by emergence, where the participants’ shared understanding develops gradually as they interact with each other and the source material. Our goal is to support collaborative layouting in a distributed environment. To achieve this, we first observed how face-to-face groups perform collaborative layouting in a particular work context. We report on the design and evaluation of a system which provides a large workspace and several objects that encourage emergence in collaboration conflicts. People edit documents that contain the raw text, and they enhance the readability by layouting this content.

1 Introduction
A significant gap lies between the handling of business data (customer, product, finance, etc.) and text data (documents). Documents are not treated as a product even though a lot of companies’ knowledge is stored within this structure. For a large-scale document management environment, local copies of remote data sources are often made. However, it is often difficult to monitor the sources in order to check for changes and to download changed data items to the copies. Very often, text documents are stored somewhere within a confusing file structure with an inscrutable hierarchy and low security. On the other hand, for operational functional data the infrastructure and the data are highly secure, multi-user capable and available to several other tools for compiling reports, data provenance, content and knowledge. Collaborative processes can be defined and applied to such data.

In this paper, we focus on the database-based collaborative layouting problem within documents. Based on a database-based collaborative editor, collaborative layouting processes are developed. The presented algorithms enable collaborative structuring of text for layout, style, flow, note or security purposes, with a fast and constant transaction time, independent of the number of affected objects. In part two, we present approaches for multidimensional structuring of text. Then, in part three, we evaluate the chosen approach and describe the developed collaborative database-based algorithms. Part four discusses collaboration conflicts and concludes the paper.

1.1 Problem Description
Numerous word processing systems exist for documents, but no accurate collaborative layouting system is available (and no database-based text editor). To our knowledge (see also part 1.3), no standard word processing application provides this functionality. By a collaborative layouting system we mean the ability to define the layout or to apply styles and templates simultaneously. Implementing such functionality involves several aspects. The layouting system has to be designed in such a way that it is collaborative, i.e. that several people can define, add, delete and change the layout or apply templates simultaneously within the same document, and can immediately see actions carried out by other people. The defined layout, or part of it, should be dynamically changeable, as long as only a certain person has the permission to apply modifications to it and the consistency of the whole style can be guaranteed.

1.2 Underlying Concepts
The concept of dynamic, collaborative layouting requires an appropriate architectural foundation. The lowest level is a collaborative editing / document management system. Our concept and implementation are based on the TeNDaX [6] collaborative editing system, which we briefly introduce. TeNDaX is a Text Native Database eXtension and makes use of such a philosophy for texts. It enables the storage of text in databases in a native form, so that editing text is represented as real-time transactions. By ‘text editing’ we mean the following: writing and deleting text (characters), copying and pasting text, defining text layout and structure, inserting tables, pictures, and so on, i.e. all the actions regularly carried out by word processing users. By ‘real-time transaction’ we mean that editing text (e.g. writing a character/word, setting the font for a paragraph, or pasting a section of text) invokes one or several database transactions, so that everything which is typed appears within the editor as soon as these objects are stored persistently. Instead of creating files and storing them in a file system, the content of documents is stored in a special way in the database, which enables very fast real-time transactions for all editing processes [7]. The database schema and the above-mentioned transactions are created in such a way that everything can be done within a multi-user environment, as proven by database technology. As a consequence, many of the achievements of database technology (with respect to data organization and querying, recovery, integrity and security enforcement, multi-user operation, distribution management, uniform tool access, etc.) are now, by means of this approach, also available for word processing. TeNDaX proposes a radically different approach, centered on natively representing text in fully-fledged databases and incorporating all necessary collaboration support. By collaboration support we mean functions such as editing, awareness, fine-grained security, sophisticated document management, versioning, business processes, text structure, data lineage, metadata mining, and multi-channel publishing, all within a collaborative, real-time and multi-user environment.

TeNDaX creates an extension of a DBMS to manage text. This addition is carried out ‘cleanly’, and the corresponding data type represents a ‘first-class citizen’ [1] of a DBMS (like integers, character strings, etc.).

1.3 Related Work
Very little is available in the literature concerning the storage of layout and structure information of text documents in databases for collaborative applications. [8] discusses various mechanisms for storing multimedia content in databases. It focuses on the handling of object types, DTDs and automatic object type creation, whereas the main goal of this paper is not only to show ways of storing structure information about text documents, but also how to maintain the integrity of this information in a collaborative multi-user word processor application. The Reduce project [13] implements a word processor that enables users to edit any part of a text document at any time in a collaborative way. The prototype CoWord [12] works as a plug-in for Microsoft Word, enhancing it with those collaboration features. The documents that can be edited with CoWord are based on files, whereas this paper introduces data structures and algorithms for storing and editing text documents and their layout and structure information in a database. Unfortunately, hardly anything has been published about the internal mechanisms and data structures used by CoWord, which makes it impossible to compare it with the work done in TeNDaX. Apart from CoWord, no other collaborative word processor has been found which can handle layout information, and the editor from the TeNDaX project seems to be the only one to store all its data in a database. [2,3,9,10,14] describe approaches used in the well-known document formats MSWord-Doc, Adobe PDF and the Rich Text Format.
All those documents describe how complex layout and structure information can be stored in files, but the mechanisms described are neither applicable for storing such information in databases nor do they account for collaborative issues, which are the two main subjects in this paper. Nonetheless concepts from these papers helped in finding an efficient way of storing layout information in databases for collaborative applications, as described in this paper. Further important resources for ideas concerning how to maintain and synchronize layout and structuring information for text documents both in databases and other applications, were taken from the documentation of the javax.swing.text classes [11].

2 Approaches for Multidimensional Structuring of Text
We use the following terminology. Whenever the term ‘text document’ is used here, it refers to the digital representation of a text, either in memory or on storage. A ‘TextBlockElement’ is a logical entity that represents a section of a text document between a start and an end point in the document. Both the start and the end point of such a TextBlockElement are called ‘borders’ of the TextBlockElement. An arbitrary number of visible or invisible character, paragraph and page properties together define a ‘style’ with a style name. Such a style can be applied to a TextBlockElement. One or more styles together build a so-called ‘style sheet’ that can be assigned to a text document. Each style’s name in a style sheet is unique for that particular style sheet. Assuming that style sheet A and style sheet B have styles with the same defined names, the layout of the text document can be changed by changing the style sheet assigned to it.

2.1 Multidimensional Structuring of Text
A text document’s main purpose is to represent text. To increase the benefits and the readability of text documents, one can structure them in multiple dimensions. Most obviously, the text can be split into sentences, paragraphs, pages, chapters and so on. In addition, the readability can be further enhanced by using different styles to display the letters, e.g. bold, italic, underlined, different fonts and font sizes, and so forth.

[Figure 1 shows the sample text "This is an Example of multidimensional structuring of text." structured in several dimensions at once: Workflow (Büntzli: "Translate this to swiss-german"; BigBoss: "Verify this."), Security (owner = rw ; group = r ; others = -), Notes ("What does multidimensional mean?", "Is this comprehensive?"), Template (Marked word, Sentence) and Layout (Normal, Bold, Italic, Underline).]

Fig. 1. Example for multidimensional structuring of text

When working together on a text document, other features have proven to be very useful too, such as the possibility to add comments to a certain section in the text, to limit the read-write access to text, or to specify tasks that someone has to do with a certain part of the text. All of these applications depend on the fact that one can define a number of consecutive characters as an entity or element in the text and can link such an element with the data defining its properties. Such a TextBlockElement could then define a logical block of the text (line, paragraph, chapter or book) or contain any data assigned to that section of text, for example a comment or security information on it. In this way, a text document can be structured in multiple dimensions (see Fig. 1.). As the layout of a text is one of the most complicated dimensions of such a structured text, the rest of the paper mainly focuses on issues concerning the storage and handling of layout information in a collaborative database-driven application. All of the conclusions drawn from that can equally be applied to any of the other dimensions.

2.2 Categories of Layout Information
The main reason for applying layout to a text is to structure the text in order to enhance its readability and understandability. Such a structure most likely originates from a logical structure that the author sees behind the text. It can be, but does not have to be, visually expressed. There are different ways to structure a text. The simplest way is to use punctuation and line breaks. This can be accomplished by just adding the punctuation characters or invisible line break characters into the string that represents the text document. With these two tools the readability of a text can already be enhanced dramatically. Furthermore, one can apply different text attributes to any number of consecutive characters to mark them or to divide long texts into titles, subtitles, lists and normal text.
This is a bit more complicated than simply adding punctuation and line breaks, since a whole section of characters has to be defined as one logical entity, in this case represented by a TextBlockElement. This should be done in a memory-saving manner, and the TextBlockElement's integrity should be maintainable with a minimum number of operations when the text document is altered before, inside or after the section represented by the TextBlockElement. A TextBlockElement can either have an arbitrary set of properties or a predefined set of properties as defined in a logical style. Such a logical style is preferably defined in one central location, for example in a style sheet (e.g. a CSS file, Cascading Style Sheet), together with all the other styles available in the text document, in order to separate layout information from the text as far as possible. Each TextBlockElement represents a section of text in the document and has to move, shrink and grow as the text in the document is being edited. The combination of all the TextBlockElements comprises the logical structures of the document, and these have to be stored together with the text.
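To make the idea of several dimensions over one text concrete, here is a minimal Python sketch. It is only an illustration, not the TeNDaX data model: the class layout, the use of plain character offsets, and the assignment of the note and security data to particular words of the Fig. 1 sentence are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlockElement:
    start: int      # offset of the first character inside the block
    end: int        # offset of the first character after the block
    dimension: str  # e.g. "layout", "notes", "security"
    value: str      # data attached to this section of text


def elements_at(elements, offset, dimension=None):
    """Return the elements covering `offset`, optionally for one dimension only."""
    return [
        e for e in elements
        if e.start <= offset < e.end
        and (dimension is None or e.dimension == dimension)
    ]


text = "This is an Example of multidimensional structuring of text."

# Hypothetical placement of the Fig. 1 annotations on the sample sentence.
elements = [
    TextBlockElement(0, len(text), "layout", "Normal"),
    TextBlockElement(11, 18, "layout", "Bold"),
    TextBlockElement(22, 38, "notes", "What does multidimensional mean?"),
    TextBlockElement(0, len(text), "security", "owner = rw ; group = r ; others = -"),
]
```

Each dimension can be queried independently of the others, for example `elements_at(elements, 12, "layout")` for the layout blocks covering the word "Example".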

2.3 Common Practices for Storing Layout Information in Files
All word processors that can handle layout information have implemented a way of handling and storing the text and the layout information of a document together. There are several different approaches from which concepts can be adapted and enhanced in a database solution.
− Define a TextBlockElement as an object and assign layout and logical information to it. In this case, the text is internally represented not as one string of characters, but rather as a collection of objects containing parts of the text as strings of characters.
− Blend the definition of the TextBlockElements, as a sequence of normal characters, into the string of characters that builds up the actual text. When loading such a document, the encoded information has to be parsed out of the string of characters and then visualized accordingly. Any mark-up language such as HTML or XML works like that. Even compound documents in Rich Text Format (RTF) function this way.
− Define TextBlockElements as objects separate from the string of characters that builds up the text, and give TextBlockElements pointers to the first and last characters which they represent.
For supporting multidimensional structuring of text, the third option proved to be the most efficient one.

2.4 SGML & Markup
Markup languages like HTML have proven to be very powerful in their ability to lay out text, and those such as XML in representing machine-readable data. Both are rooted in the Standard Generalized Markup Language (SGML) and share the concept that a string stored in a text file is recursively divided into sections by tags, so that the structure of the data stored in the string is represented as a tree. The tree only emerges from the string when it is parsed for the corresponding tags.
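The point that the tree only emerges on parsing can be illustrated with a short Python sketch using the standard library's `html.parser` module. The `TreeBuilder` class, the nested-dict tree representation and the sample markup are assumptions of this example and are not part of TeNDaX.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a nested dict/list tree from a flat markup string."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "children": []}
        self._stack = [self.root]  # path from the root to the currently open element

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open element; the root is never popped.
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        self._stack[-1]["children"].append(data)


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


tree = parse("<p>This is <b>bold</b> text</p>")
```

Before `parse` runs, the document is just a flat string; the nesting of `b` inside `p` exists only in the tree produced by the parser.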
2.5 Storing of Text and TextBlockElements in TeNDaX
In TeNDaX, no text files are used to represent the text; instead, a chain of CChar objects is stored in the database on the server side, and an array of character objects is kept on the client side. The reasons why this structure is the best choice and offers a high level of performance are described in [4,5]. Many different methods are available for adding structuring information, like the SGML structure of an HTML document, to a file stored in TeNDaX. In the following section some of these are presented and discussed.

2.5.1 As Attributes of Each Character
When every character is stored as a character object, the simplest way of storing layout information on text might appear to be storing it as additional attributes on every character (see Fig. 2.). This sounds very straightforward but brings considerable disadvantages with it. First, there is a serious performance issue, both in terms of the memory used and of the operations necessary on changes. The space issue can be solved by using pointers to additional objects storing all the layout data for one or more sections of identically formatted text.

Fig. 2. Structure information on each character object

However, that still leaves us with the transactional performance issue. For every change in the formatting, each of the affected character objects has to be altered; in the worst case, this would mean that if someone wanted to change the font size of an entire document, every single character of the document would have to be altered, both in the client and in the database.

2.5.2 As Tags of One or Multiple Characters
As shown above, defining structural information on every single character is far too expensive. To decrease these costs, the text could be split up into sections, and the layout information could be assigned to a section instead of to every single character inside the section (see Fig. 3.). Such sections are also used in HTML, XML or any other SGML-style markup language, and are defined with so-called tags. The idea is to mark the beginning and the end of a section with a tag. In HTML and XML this is done with a series of predefined characters which are embedded into the text. Since in TeNDaX the characters are stored as objects that can have multiple properties, it would even be possible to use only one single character object to represent such a start or end tag.

Fig. 3. Structure information as tags embedded into chain of character objects

Either way, there are still serious limitations to that technique. The client and the database need to be equipped with mechanisms to efficiently insert, find, edit and delete the tags. Furthermore, multidimensional structuring of text becomes very complicated if tags inserted into the text are used.

2.5.3 As an Alternate Data Structure
The third option is to create an additional data structure representing the structure(s) of the document and only linking its elements to the chain of character objects.

Fig. 4. SGML tree structure in java

In the TeNDaX Java client, the Java classes from the package javax.swing.text can be used to implement this functionality (see Fig. 4.). The HTMLDocument (javax.swing.text.html.HTMLDocument) stores the text internally in an array of characters, and the SGML tree that represents the layout of the HTML document is stored as a tree consisting of BlockElements and RunElements. Each instance of BlockElement represents a subsection of the text which can in turn be divided into subsections. The leaf of such a branch is then represented by a RunElement, which has a pointer to the beginning and to the end of the section in the text. On the database side there is no need to follow the suggestions made by the Java implementation, which is why a simpler but similarly efficient implementation is possible. The question we had to ask ourselves was: is it really necessary to store the required information in the form of a tree structure, or would a collection of TextBlockElement objects be sufficient? It turned out that a non-hierarchically ordered collection of TextBlockElements on the database side is sufficient for reconstructing the complete tree structure on the client side, as long as certain precautions are taken when synchronizing the client and the database. In the following section of this paper, the newly constructed data structure on the database side is explained, together with its advantages and disadvantages.

3 Evaluation Corresponding to the RunElements in java, CTextBlockElements in the database represent a selection of text and contain data that applies to that section. To keep the position and the size of such a section efficiently up-to-date and synchronous with the clients, the start and end borders of the section must somehow be marked in the text. In Java, this is accomplished with instances of the class StickyPosition. A StickyPosition represents the offset of a character in the text and moves together with the character whenever text is inserted or deleted before the StickyPosition. This is done by increasing and decreasing a counter every time text is inserted, depending on the position of the insertion. In the database, with potentially thousands of positions in thousands of documents, this solution would not be efficient enough. A far more efficient way is to add a pointer from the character object after the desired position of the border to the TextBlockElement that starts or ends there. When text is then inserted or deleted before, inside or after the section, the borders are still always before the same character. It’s only when deleting the character which actually links the pointer to a border, that care must be taken that the pointer is moved to the next character on the right. This now enables the definition of sections of text which are unaffected by insert or delete actions. However if one would like to be able to have multiple sections start at the same position (for example, the sections "book 1", "chapter 1" and "paragraph 1" start at the same positions), another data structure is needed. Instead of having pointers which point directly from the character object to the TextBlockElement object, TextBlockElementBorder objects can be used as an intermediate to implement this 1:n relationship. To simplify things even more, these TextBlockElementBorders don't even need instantiated objects, but only virtual borders represented by a unique identifier. 
The first character object inside a TextBlockElement has a pointer to such a virtual TextBlockElementBorder, and the TextBlockElement object has as its start attribute a pointer to the same virtual TextBlockElementBorder. The same applies accordingly to the first character object appearing after the end of the TextBlockElement. A simple example is shown in Fig. 5.

Fig. 5. TextBlockElements on virtual borders
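The virtual-border mechanism can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the characters are held in a Python list rather than a persistent chain of CChar objects, border identifiers come from a simple counter, and the names `border_before` and `covered_text` are hypothetical helpers introduced here.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional


@dataclass
class CChar:
    value: str
    border: Optional[int] = None  # id of the virtual border just before this character


@dataclass
class TextBlockElement:
    start: int       # id of the virtual border where the block starts
    end: int         # id of the virtual border where the block ends
    block_type: str


_border_ids = count(1)  # assumption: border ids are drawn from a simple counter


def border_before(ch: CChar) -> int:
    """Return the virtual border before `ch`, creating one if none exists yet."""
    if ch.border is None:
        ch.border = next(_border_ids)
    return ch.border


def covered_text(chars: List[CChar], element: TextBlockElement) -> str:
    """Collect the characters between the element's start and end borders."""
    out, inside = [], False
    for ch in chars:
        if ch.border is not None and ch.border == element.end:
            break
        if ch.border is not None and ch.border == element.start:
            inside = True
        if inside:
            out.append(ch.value)
    return "".join(out)


chars = [CChar(c) for c in "TENDAX"]
# Two blocks that start at the same position share one virtual border (1:n).
title = TextBlockElement(border_before(chars[0]), border_before(chars[3]), "layout")
task = TextBlockElement(border_before(chars[0]), border_before(chars[5]), "workflow")

# Insertions before and inside the blocks touch neither borders nor elements.
chars.insert(0, CChar("*"))
chars.insert(2, CChar("-"))
```

After the two insertions the elements are unchanged, yet `covered_text` yields the grown sections, because the borders travel with the characters that carry them.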

With this data structure it is possible to structure a text not only in one dimension but in multiple dimensions, merely by using a different value for the BlockType attribute in the TextBlockElements in the database and a separate RootElement for the tree structure in the Java client.

3.1 Loading and Synchronization of Structure Information
As stated earlier, it is not necessary to store the complete SGML tree in the database in order to restore it in the clients or to synchronize it with them. As line breaks are already embedded into the chain of character objects in the database, the system does not have to take care of splitting other TextBlockElements when a line break is inserted into the text. All other changes in the TextBlockElement tree of the client are directly coupled to the layout and formatting actions taken by the user.

3.1.1 Loading of a Document
When a document is loaded from the database, first the complete set of characters, including all the line breaks, is loaded into the client. Then all the TextBlockElements of the document are loaded, and depending on the type of TextBlockElement, an action is taken. For layout TextBlockElements this action would be to apply the properties defined in the TextBlockElement object to the section of text it represents. Since all the TextBlockElements have a unique object identifier, and since it is always true that a TextBlockElement A with an identifier higher than that of TextBlockElement B is younger than TextBlockElement B, the TextBlockElements of a document can be loaded in chronological order. This in turn makes it possible for the Java class that manages the tree structure in the client (javax.swing.text.DefaultStyledDocument.ElementBuffer) to reconstruct the tree, so that it looks identical to any other instance in any other client that currently has the same document open.

3.1.2 Propagating Changes
Whenever a user initiates a change in the client's TextBlockElement tree, only the action that initiated this change has to be stored and propagated to the database and to the other clients. The insertion or deletion of one or more characters in the client does not affect the TextBlockElement structure, neither in the client nor in the database. The only action that has to be taken when deleting a character object from the database is to check whether or not it carries a pointer to a virtual TextBlockElementBorder A. If this is the case, the pointer to A has to be moved to the next character object on the right that is not being deleted. If this character object already carries a pointer to a virtual TextBlockElementBorder B, the pointer to A can be dismissed, and all references to A within the TextBlockElements of this document have to be replaced with references to B. Whenever the function is called to locally create a TextBlockElement in the client, either the TextBlockElement's OID is already known, which means that the same Element already exists in the database, or its OID is not yet known, which means that the action creating the TextBlockElement has been initiated in the local client and that the new element has to be created in the database as well. The creation of the new TextBlockElement in the database is then propagated to all but the initiating client. When the TextBlockElement has been created in the database, the OID returned from the database is assigned to the Element in the client.
If the OID for the TextBlockElement to be created is already given, the Element has already been created in the database, and the creation of the Element in the client is due to a propagation action from the database or from another client. If a TextBlockElement with the specified OID already exists locally in the client, this means that an already existing Element has been altered in the database and therefore must also be altered in the client. To delete a TextBlockElement, the initiating client only has to call the corresponding function in the database. If the deletion is successful, it is propagated to all clients, whereupon they also delete the TextBlockElement locally.

3.2 Database Schema
In the following section, we describe the data structure that implements the structures of a document on the database side; the algorithms are discussed afterwards. To define a TextBlockElement in a document, a pointer to a virtual border has to be set on the first CChar inside and on the first CChar after the TextBlock. Pointers to the same virtual borders then have to be set in the new CTextBlockElement. Depending on the type of TextBlockElement, the data for the TextBlock must then be set accordingly. Fig. 6 shows an example in which one TextBlockElement defines that the letters "TEN" have the style "Title 1", assigned from the style sheet with the name "Classic1", and a second TextBlockElement defines that the letters "TENDA" have the workflow task assigned to them that the user "theBoss" should complete the action "Sign this!".

Class definitions (Fig. 6):
CFile: oid, DefaultStyleSheet, ...
CTextBlockElement: oid, startBorderOid, endBorderOid, blockType, blockDataType, blockDataValue, fileId
CStyle: oid, StyleName, StyleSheetName, FontName, FontSize, Bold, ...
CWorkflow: oid, WorkflowDescr., owner, ...
CChar: oid, previousCChar, nextCChar, characterValue, virtualBorderOid

Example instances of the classes (Fig. 6):
CTextBlockElement: oid : 71, sBOid : 12, eBOid : 19, bT : layout, bDT : CStyle, bDV : Title 1, FID : 11
CTextBlockElement: oid : 76, sBOid : 12, eBOid : 22, bT : CWorkflow, bDT : oid, bDV : 234, FID : 11
CStyle: oid : 1584, SN : Title 1, SSN : Classic1, FN : Arial, FS : 22, B : true, ...
CWorkflow: oid : 234, WD : Sign this!, O : theBoss, ...
CFile: FID : 11, SSN : Classic1
CChars (character : virtual border): T : 12, E : null, N : null, D : 19, A : null, X : 22

Fig. 6. Database schema and samples
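The Fig. 6 example and the deletion rule from Section 3.1.2 can be replayed in a small Python sketch. It is a single-user illustration only: plain lists and dicts stand in for the database classes, the function names are hypothetical, concurrency is ignored, and the case of deleting the last character of a document is left out because the text does not describe it.

```python
# Characters as [characterValue, virtual border id or None], following Fig. 6.
chars = [["T", 12], ["E", None], ["N", None], ["D", 19], ["A", None], ["X", 22]]

# TextBlockElements keyed by oid, following Fig. 6.
elements = {
    71: {"start": 12, "end": 19, "blockType": "layout",
         "dataType": "CStyle", "dataValue": "Title 1"},
    76: {"start": 12, "end": 22, "blockType": "CWorkflow",
         "dataType": "oid", "dataValue": 234},
}


def merge_borders(b1, b2):
    """Replace every reference to border b1 in the elements with b2."""
    for e in elements.values():
        if e["start"] == b1:
            e["start"] = b2
        if e["end"] == b1:
            e["end"] = b2


def delete_char(index):
    """Delete chars[index]; move or merge the virtual border it carries."""
    border = chars[index][1]
    del chars[index]
    if border is None or index >= len(chars):
        return  # nothing to move (end-of-document case not modelled here)
    right = chars[index]  # next character on the right
    if right[1] is None:
        right[1] = border                # move the border pointer
    else:
        merge_borders(border, right[1])  # right neighbour already carries a border


def covered_text(oid):
    """Return the text between the start and end border of element `oid`."""
    e = elements[oid]
    out, inside = [], False
    for value, border in chars:
        if border is not None and border == e["end"]:
            break
        if border is not None and border == e["start"]:
            inside = True
        if inside:
            out.append(value)
    return "".join(out)


before = (covered_text(71), covered_text(76))  # sections as drawn in Fig. 6
delete_char(0)  # delete "T": border 12 moves to "E"
delete_char(2)  # delete "D": border 19 moves to "A"
delete_char(2)  # delete "A": "X" already carries border 22, so 19 is merged into 22
```

After the third deletion both elements end at border 22, which is the merge case described in the text.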

Splitting up the information contained in a TextBlockElement into three parts (blockType, blockDataType and blockDataValue) makes it possible to structure a document in multiple dimensions, to assign simple data type - value pairs to a TextBlock, or even to make references to complex database objects such as styles from a style sheet, simple tasks or complete workflows. To speed up the searches for CTextBlockElements with a reference to a given virtual border, a two-dimensional index is maintained on CTextBlockElement.FileId and CTextBlockElement.startBorder, and another one on CTextBlockElement.FileId and CTextBlockElement.endBorder. These indices are guaranteed to have almost linear performance no matter how many documents are stored in the database.
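As a rough illustration of the two composite indices, the following Python sketch keeps two dictionaries keyed by (file id, border id). A hash map is only a stand-in here; which index structure the DBMS actually uses is not stated in the text, and the `BorderIndex` class and its method names are assumptions of this example.

```python
from collections import defaultdict


class BorderIndex:
    """Two composite lookups: (fileId, startBorder) and (fileId, endBorder)."""

    def __init__(self):
        self._by_start = defaultdict(set)
        self._by_end = defaultdict(set)

    def add(self, oid, file_id, start_border, end_border):
        self._by_start[(file_id, start_border)].add(oid)
        self._by_end[(file_id, end_border)].add(oid)

    def referencing(self, file_id, border):
        """Return the oids of all elements that start or end at `border`."""
        key = (file_id, border)
        return self._by_start.get(key, set()) | self._by_end.get(key, set())


index = BorderIndex()
index.add(71, 11, 12, 19)  # the two elements of Fig. 6
index.add(76, 11, 12, 22)
```

A lookup such as `index.referencing(11, 12)` finds every element touching border 12 in file 11 without scanning the elements of other documents, which is the kind of query the border-merging step needs.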

3.3 Description of the Algorithms Used
In the following section, the algorithms for storing and manipulating layout and structure information in a database-driven collaborative word processing application are described.

o is the symbol for an object in the system.
  o = object
The elementary function delete(o) removes the object o from the system. (Elementary functions are assumed to be given; their implementation varies with the programming language used.)
  delete(o) = deletes o

3.3.1 TextBlockElements
The symbol c represents a character in the chain of characters of a text document stored in the database or the client.
  c = character in the text
The elementary function index(c) returns the offset of c in the text.
  index(c) = offset of c
The elementary function border(c) returns a reference to the (virtual) border at the position between index(c)-1 and index(c). If no reference to a virtual border is defined at this position, it returns the null value.
  border(c) = reference to border between index(c)-1 and index(c)
The symbol b represents a border of a TextBlock between two consecutive characters c1 and c2.
  b = border / index position between c1 and c2, whereas index(c1) + 1 = index(c2)
Any number of consecutive character objects in the text document can be defined as a logical entity or a TextBlockElement. The symbol e represents a TextBlockElement.

e = textBlockElement of the text

The elementary function newElement() creates a new object of the type TextBlockElement.

newElement() = the new element e

The elementary functions start(e) and end(e) return references to the virtual borders b1 and b2 at the beginning and at the end of the TextBlockElement e, respectively.

start(e) = starting border of e
end(e) = ending border of e

If a TextBlockElement starts and ends at the same position, it is an empty but valid TextBlockElement describing a text section of length zero. In this case start(e) equals end(e). To access the attribute values of a TextBlockElement e, the elementary functions blockType(e), dataType(e) and dataValue(e) can be used. These return, for example, "layout", "Integer" and "12".

blockType(e) = e's type of block
dataType(e) = e's type of data
dataValue(e) = the stored value

The function createTextBlock(c1, c2, blockType, dataType, value) inserts a new TextBlockElement, which represents the text section from character c1 to character c2. It first checks whether a TextBlockElement e with the given borders, blockType and dataType already exists and merely has to be given a new data value. If so, the new value is assigned to that TextBlockElement. If not, it checks whether a border is already defined on each of the character objects c1 and c2. If a border is defined, its identifier is fetched; otherwise a new border is created on the respective character object and its identifier is fetched. Then a new TextBlockElement e is created and its start, end, blockType, dataType and dataValue are set. Finally, the new or edited TextBlockElement e is returned.

createTextBlock(c1, c2, blockType, dataType, value) → e where
  if ∃e' ( start(e') = border(c1) ∧ end(e') = border(c2) ∧ blockType(e') = blockType ∧ dataType(e') = dataType )
    e ← e'
  else
    e ← newElement()
    start(e) ← createElementBorder(c1)
    end(e) ← createElementBorder(c2)
    blockType(e) ← blockType
    dataType(e) ← dataType
    dataValue(e) ← value
  fi
  dataValue(e) ← value

The function createElementBorder(c) is defined as:

createElementBorder(c) → b where
  if border(c) ≠ null
    b ← border(c)
  else
    b ← newBorder(c)
  fi

In createElementBorder(c) it might be necessary to define a new virtual border, which is done with the elementary function newBorder(c). It defines a virtual border b that represents the index position between index(c)-1 and index(c).

newBorder(c) = the new border, positioned between index(c)-1 and index(c)

The merging of two borders, as described in this paper, is defined as follows:

mergeBorders(b1, b2)
  while ∃e ( start(e) = b1 ∨ end(e) = b1 )
    ∀c ( border(c) = b1 ) border(c) ← null
    ∀e ( start(e) = b1 ) start(e) ← b2
    ∀e ( end(e) = b1 ) end(e) ← b2
  end while

All references to b1 in all TextBlockElements are replaced with references to b2. As this function may be called by multiple users at the same time, it has to be ensured that, when the function ends, all references to b1 really have been replaced with references to b2. The while-loop provides this assurance.

3.3.2 Styles and Style Sheets

To store styles and style sheets in the database and in the client, the symbol s is introduced for a style, e.g. "Title 1" with font Arial and font size 22.

s = style
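Before turning to styles in detail, the TextBlockElement functions of Section 3.3.1 (createElementBorder, createTextBlock and mergeBorders) can be illustrated with a minimal single-process sketch in Python. It is not the TeNDaX implementation: the class names Border, Char, Element and Document are hypothetical, everything is held in memory, and there is no database or concurrent access, so the while-loop in merge_borders only mirrors the structure of the definition.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(eq=False)
class Border:
    """Virtual border b; eq=False so borders are compared by identity."""
    ident: int


@dataclass(eq=False)
class Char:
    """Character object c; the attribute `border` plays the role of border(c)."""
    value: str
    border: Optional[Border] = None


@dataclass(eq=False)
class Element:
    """TextBlockElement e with the attributes named in the text."""
    start: Border
    end: Border
    block_type: str
    data_type: str
    data_value: str = ""


class Document:
    """Hypothetical in-memory stand-in for the character chain and its elements."""

    def __init__(self, text: str) -> None:
        self.chars = [Char(ch) for ch in text]
        self.elements = []
        self._ids = count(1)

    def create_element_border(self, c: Char) -> Border:
        # Reuse border(c) if it exists, otherwise create one (newBorder(c)).
        if c.border is None:
            c.border = Border(next(self._ids))
        return c.border

    def create_text_block(self, c1, c2, block_type, data_type, value) -> Element:
        # Look for an existing element with the same borders and types.
        for e in self.elements:
            if (c1.border is not None and e.start is c1.border
                    and c2.border is not None and e.end is c2.border
                    and e.block_type == block_type
                    and e.data_type == data_type):
                break
        else:
            e = Element(self.create_element_border(c1),
                        self.create_element_border(c2),
                        block_type, data_type)
            self.elements.append(e)
        e.data_value = value  # set for both the new and the existing element
        return e

    def merge_borders(self, b1: Border, b2: Border) -> None:
        # Repeat until no element refers to b1 any more.
        while any(e.start is b1 or e.end is b1 for e in self.elements):
            for c in self.chars:
                if c.border is b1:
                    c.border = None
            for e in self.elements:
                if e.start is b1:
                    e.start = b2
                if e.end is b1:
                    e.end = b2
```

Calling create_text_block twice with the same characters, blockType and dataType returns the same element with an updated value, which corresponds to the first branch of the definition.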

A style defines values for an arbitrary set of character, paragraph or page properties. Each property consists of an attribute name and an attribute value. The symbol a represents a collection of such attribute name-value pairs, e.g. "Font = Arial" or "Font size = 22". Such a collection can consist of an arbitrary number of pairs.

a = { attribute name-value pairs }

To access the attributeSet a of a style s, the elementary function data(s) can be used. It returns a reference to the collection of attribute-value pairs a of the style s.

data(s) = attributeSet a of s

The values of the style name and style sheet name attributes together form the unique identifier of a style in the database. The elementary function name(s) returns a reference to the value of the attribute "StyleName" of the style s, e.g. "Title 1".

name(s) = name of the style s

Assuming styleSheet s1 has styles defined with the same names as styleSheet s2, the two styleSheets s1 and s2 can define two different layout visualisations of the same document. The elementary function styleSheet(s) returns a reference to the value of the attribute "StyleSheetName" of the style s.

styleSheet(s) = name of the styleSheet that s belongs to

The elementary function newStyle() creates a new empty style object s.

newStyle() = the new empty style s

To create a new style s or replace an existing style s, the function editStyle(styleName, styleSheetName, a) is defined as follows:

editStyle(styleName, styleSheetName, a) → s where
  if ∃s' ( name(s') = styleName ∧ styleSheet(s') = styleSheetName )
    s ← s'
  else
    s ← newStyle()
    name(s) ← styleName
    styleSheet(s) ← styleSheetName
  fi
  data(s) ← a

As a combination of styleName and styleSheetName values is by definition unique within the database, editStyle either replaces the attributeSet of an existing style with the same styleName and styleSheetName or creates a new style with the given names and attributeSet. The function removeStyle(styleName, styleSheetName) is defined as:

removeStyle(styleName, styleSheetName) where
  ∀s ( name(s) = styleName ∧ styleSheet(s) = styleSheetName ) delete(s)

To delete a complete StyleSheet from the system, the function removeStyleSheet(styleSheetName) is defined as follows:

removeStyleSheet(styleSheetName) where
  ∀s ( styleSheet(s) = styleSheetName ) delete(s)
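The three style functions can be sketched in Python as operations on a dictionary keyed by the (styleName, styleSheetName) pair, which the text names as the unique identifier of a style. The names Style and StyleStore and the lookup helper get are assumptions made for this illustration; the actual system stores styles in a database rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Style:
    """Style s with name(s), styleSheet(s) and its attributeSet data(s)."""
    name: str
    style_sheet: str
    data: Dict[str, str] = field(default_factory=dict)


class StyleStore:
    """Hypothetical in-memory store; the key is (styleName, styleSheetName)."""

    def __init__(self) -> None:
        self._styles: Dict[Tuple[str, str], Style] = {}

    def edit_style(self, style_name: str, style_sheet_name: str,
                   attributes: Dict[str, str]) -> Style:
        key = (style_name, style_sheet_name)
        s = self._styles.get(key)
        if s is None:
            # No such style yet: create it and set both names (newStyle()).
            s = Style(style_name, style_sheet_name)
            self._styles[key] = s
        s.data = dict(attributes)  # replace the attributeSet in both cases
        return s

    def remove_style(self, style_name: str, style_sheet_name: str) -> None:
        # The key is unique, so at most one style is deleted.
        self._styles.pop((style_name, style_sheet_name), None)

    def remove_style_sheet(self, style_sheet_name: str) -> None:
        # Delete every style that belongs to the given style sheet.
        doomed = [k for k in self._styles if k[1] == style_sheet_name]
        for key in doomed:
            del self._styles[key]

    def get(self, style_name: str, style_sheet_name: str) -> Optional[Style]:
        return self._styles.get((style_name, style_sheet_name))


store = StyleStore()
store.edit_style("Title 1", "Print", {"Font": "Arial", "Font size": "22"})
store.edit_style("Title 1", "Screen", {"Font": "Arial", "Font size": "18"})
```

The two calls at the end define "Title 1" in two style sheets (the sheet names "Print" and "Screen" are made up), which corresponds to the case of two style sheets giving two layout visualisations of the same document.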

4 Conclusion, Collaboration Conflicts

Since TeNDaX was built to support multiple users editing the same text document simultaneously, it has to be possible not only to insert and delete characters, but also to define TextBlockElements at the same time. To define a TextBlockElement, a reference to the start and to the end of the TextBlock as well as the TextBlockElement data have to be available. As the data of a TextBlockElement is created in one client only and cannot be accessed by any other client, no collaboration conflicts can be expected here. However, in order to create a TextBlockElement in the database and then propagate it to the clients, the references to the start and to the end of the TextBlockElement have to remain valid until the creation of the TextBlockElement has been completed.

If, for example, client A tries to create a TextBlockElement "E" starting at character object "2e49" and ending at character object "6a02", and at exactly the same time client B deletes the character object "2e49" from its local character array and from the database, then by the time the TextBlockElement "E" is to be created in the database, one of its borders no longer exists there. As a consequence, the TextBlockElement cannot be created and the initiating user receives an error message asking him or her to try again. This is one of three possible collaboration conflicts: the start character object, the end character object, or both the start and the end character object of the TextBlockElement have been deleted.

Everything else that is initiated by two or more users and affects the same area of a text document does not represent a technical conflict, because operations in the database, and thus also in the clients, happen sequentially, albeit usually too fast for a user to notice the time shift between the actions. This might result in a situation where one user marks a word bold, for example, and another user assigns the style "Title 1" to the whole sentence; depending on whose transaction is executed first in the database, the appearance of the sentence will differ, but technically speaking this is not a conflict and therefore does not have to be handled by the system.
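The conflict described above can be sketched as a check that is performed when the TextBlockElement is created in the database. This is a simplified Python illustration, not the TeNDaX code: the class Database, the exception BorderCharacterDeleted and the character identifier "51c7" are hypothetical, and the sequential execution of operations is modelled simply by calling the methods one after another.

```python
class BorderCharacterDeleted(Exception):
    """Raised when the start and/or end character object no longer exists."""

    def __init__(self, missing):
        super().__init__("deleted character object(s): " + ", ".join(missing))
        self.missing = missing


class Database:
    """Hypothetical stand-in for the server-side store of character objects."""

    def __init__(self, char_ids):
        self.chars = set(char_ids)
        self.elements = []

    def delete_char(self, char_id):
        self.chars.discard(char_id)

    def create_text_block(self, name, start_id, end_id):
        # Both border characters must still exist when the element is created.
        wanted = dict.fromkeys((start_id, end_id))  # keeps order, drops duplicates
        missing = [cid for cid in wanted if cid not in self.chars]
        if missing:
            raise BorderCharacterDeleted(missing)
        self.elements.append((name, start_id, end_id))


db = Database(["2e49", "51c7", "6a02"])
db.delete_char("2e49")  # client B's deletion is executed first
try:
    db.create_text_block("E", "2e49", "6a02")  # client A's request arrives second
except BorderCharacterDeleted as err:
    # The initiating user would be asked to try again.
    print("Could not create the TextBlockElement, please try again:", err)
```

The list in the exception holds one or two identifiers, which covers the three cases named in the text: start deleted, end deleted, or both deleted.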

5 References

[1] S. Abiteboul, R. Agrawal, P. Bernstein, M. Carey, S. Ceri, B. Croft, D. DeWitt, M. Franklin, H. G. Molina, D. Gawlick, J. Gray, L. Haas, A. Halevy, J. Hellerstein, Y. Ioannidis, M. Kersten, M. Pazzani, M. Lesk, D. Maier, J. Naughton, H. Schek, T. Sellis, A. Silberschatz, M. Stonebraker, R. Snodgrass, J. Ullman, G. Weikum, J. Widom, and S. Zdonik, "The Lowell Database Research Self Assessment," Massachusetts, 2003.
[2] S. M. Burke, "The_RTF_Cookbook," vol. 2004, 2003. http://search.cpan.org/~sburke/RTF-Writer/lib/RTF/Cookbook.pod
[3] S. Haigh, "A Glossary of Digital Library Standards, Protocols and Formats," Information Technology Services, National Library of Canada, 1998. http://www.nlc-bnc.ca/9/1/p1-253-e.html
[4] T. B. Hodel and K. R. Dittrich, "A collaborative, real-time insert transaction for a native text database system," presented at Information Resources Management Association (IRMA 2004), New Orleans (USA), 2004.
[5] T. B. Hodel and K. R. Dittrich, "A native text database: What for?," presented at Information Resources Management Association (IRMA 2004), New Orleans (USA), 2004.
[6] T. B. Hodel and K. R. Dittrich, "Concept and prototype of a collaborative business process environment for document processing," Data & Knowledge Engineering, vol. Special Issue: Collaborative Business Process Technologies, 2004.
[7] T. B. Hodel, M. Dubacher, and K. R. Dittrich, "Using Database Management Systems for Collaborative Text Editing," presented at European Conference of Computer-supported Cooperative Work (ECSCW CEW 2003), Helsinki (Finland), 2003.
[8] P. Iglinski, "An Object-Oriented SGML/HyTime Compliant Multimedia Database Management System," 1997. http://www.acm.org/sigmm/MM97/papers/iglinski/ACMMM97.html
[9] Microsoft, "Microsoft Word 97 Binary File Format," Microsoft, 1998. http://www.google.ch/search?hl=en&ie=UTF-8&oe=UTF-8&q=%22Microsoft+Word+97+Binary+File+Format%22&btnG=Google+Search&meta=
[10] Microsoft, "Rich Text Format (RTF) Specification, version 1.6," Microsoft, 1999. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnrtfspec/html/rtfspec.asp
[11] Sun, "Java™ 2 Platform, Standard Edition, v 1.4.2 API Specification," Sun Microsystems, 2003. http://java.sun.com/j2se/1.4.2/docs/api/
[12] C. Sun, "CoWord Prototype," 2004. http://www.cit.gu.edu.au/~scz/projects/coword/
[13] C. Sun, X. Jia, Y. Zhang, and Y. Yang, "REDUCE: a prototypical cooperative editing system," ACM Transactions on Computer-Human Interaction, pp. 89-92, 1997.
[14] C. G. a. J. Warnock, "Adobe® Portable Document Format Reference 1.5," Adobe Systems Incorporated, 2003. http://partners.adobe.com/asn/acrobat/sdk/public/docs/PDFReference15_v6.pdf
