
A Model and Environment for Improving Multimedia Intensive Reading Practices

Thomas Bottini, Pierre Morizet-Mahoudeaux, and Bruno Bachimont

UMR CNRS 6599 HEUDIASYC, Université de Technologie de Compiègne, B.P. 20529, 60206 Compiègne-CEDEX, France

Abstract. The evolution of multimedia document production and diffusion technologies has led to a significant spread of knowledge in the form of pictures and recordings. However, scholarly reading tasks are still principally performed on textual contents. We argue that this is due to a lack of semantic and structured tools: (1) to handle the wide spectrum of interpretive operations involved in the polymorphous intensive reading process; (2) to perform these operations on a heterogeneous multimedia corpus. This first calls for identifying fundamental document requirements for such reading practices. We then present a flexible model and a software environment which enable readers to structure, annotate, link, fragment, compare, freely organize and spatially lay out documents, and to prepare the writing of their critical comment. Finally, we discuss experiments with humanities scholars, and explore new academic reading practices which take advantage of document engineering principles such as multimedia document structuring, publication or sharing.

1 Introduction

Among all the reconfigurations of human practices which have emerged with the spread of digital technologies, the critical appropriation of documents holds a particular place. Indeed, crucial aspects of our intellectual life rely on the ability to work efficiently with documents. In this perspective, digital document technologies have brought decisive innovations, which have forever changed the way we read and write and, consequently, think. We can especially mention some fundamental document engineering principles which daily feed our "digital document life": logical document structuring, semantic annotation, dynamic hypertext maps, and separation of content from presentation. These concepts are pivotal in the academic context, where scholarly practices (learning, critical reading and writing) constantly call for innovative document tools. The aforementioned operations have now reached maturity for textual contents, notably thanks to the development of the Web as a "global document environment", where writing, annotation and sharing operations are strongly articulated. Concomitantly, digital cognitive technologies have also dramatically changed reading practices by bringing readers face to face with non-textual contents.

Ever since Vannevar Bush formalized the basics of what would become one of the founding utopias of personal multimedia work environments, the Memex [1], scholars have lacked an efficient personal software environment to read, analyze and elaborate non-textual documents. Yet knowledge is nowadays frequently embodied in lecture recordings, pictures, music scores, etc. Such documents cannot be ignored merely because they do not "fit to print" [2] or match its technical tradition [3]. Due to their technical nature, the typical interpretive and thorough appropriation operations that intensive reading implies are not straightforward: annotation, (re)structuring, fragmentation so as to pick quotes for future writing, linking, and spatialization to clarify content. The problem becomes even more complex if we consider the necessity of gathering all of these operations within a single software environment that enables the reader, who is also a writer, to carry out an intensive reading of heterogeneous documents with the aim of producing a new critical document (memo, course material, dissertation, etc.).

In this paper we propose a document model which handles the aforementioned multimedia intensive reading operations, as well as an experimental software tool for its use in an academic context. In the next section, we present a quick overview of the state of the art of the "multimedia scholarly reading" problem. In section 3, we set out three requirements which are mandatory for an efficient model to handle multimedia intensive reading operations. The model is described in section 4, and the experimental tool that enables us to validate our theoretical approach is presented in section 5.

J. Liu et al. (Eds.): AMT 2009, LNCS 5820, pp. 226–237, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Related Work

The complex "multimedia scholarly reading" problem has been studied through many approaches, but rarely for its own sake and with a view to designing and implementing a complete personal environment. According to [4], scholars are poorly equipped when it comes to performing active reading tasks on hypermedia documents, although such documents are nowadays part of their daily work. The authors propose a scholarly audiovisual information system for the linking and personal annotation of video documents, which relies on an unconstrained metadata schema. Bulterman [5] tackles the multimedia annotation problem from a document enrichment perspective. However, he only focuses on the annotation of multimedia presentations. Concerning ancient digitized manuscripts, [6] propose a model and Computer-Human Interface principles to create and exploit annotations, while [7] emphasise the need to structure pictures so as to identify and articulate pertinent zones for active and scholarly reading tasks.

Research in the field of "spatial hypertext" has underlined the founding role of the spatial layout of documents and of their relations in complex reading and writing activities [8,9]. In [8], Marshall and Shipman introduce spatial hypertext as "most appropriate when there is no distinction between readers and writers", while "more prescriptive methods might hamper exploratory structuring". It so happens that scholarly reading operations first and foremost rely on the ability to annotate, comment, cut out and freely organize contents (the scholarly reader is a kind of writer, who "writes his/her reading"). Moreover, interpretation, like every creative process, is neither immediate nor univocal, and thus needs the support of a flexible reading space in order to emerge. Our theoretical analysis of scholarly reading practices as a creative activity is in keeping with Nakakoji and Yamamoto's approach [10,11]. The authors address crucial aspects of intensive reading and scholarly writing, especially space as a means to make structures emerge among elements. They view scholarly writing as "a process of articulation" [10]. Interviews aimed at making explicit the reading practices of scholars from different communities (mainly musicologists, and humanities and social sciences teachers and students) brought out the need for an environment that combines the aforementioned document operations [3].

3 Requirements for a Scholarly Reading Environment

We now set out three document requirements that formalize the need with which the previous section concluded.

3.1 A Semantic Approach to Intensive Reading

Requirement 1. The model must support a semantic and structured approach to the intensive reading process, from thorough appropriation to the elaboration of a new critical document.

These concepts, which are the very basis of What You See Is What You Mean (WYSIWYM) document elaboration tools, are exploited here to design more efficient reading tools. Interviews with researchers, students and musicologists who have to analyze recordings or pictures have shown that they generally use non-semantic tools, such as sound or graphics editing software. Such software appears ill-suited to conducting intensive reading and scholarly writing processes. Indeed, these are complex cognitive processes, which imply rich interpretive operations such as structuring, gloss, critical comparison, fragmentation and recombination. Handling these complex processes calls for a model which can articulate them in a single environment. As a result of this WYSIWYM-based approach, the model must be exploitable by a publishing tool to create the resulting critical documents (an example is given in section 5). Moreover, it will then satisfy the principle of separation of content from presentation, which is the keystone of multimedia publishing tools (for the benefits of such a paradigm, see [12]).

3.2 Performing Unified Interpretive Operations on Heterogeneous Documents

Requirement 2. The model must make possible, for pictures and recordings as well as for textual contents, the whole spectrum of interpretive operations that a thorough active reading requires.



The reader must be able to perform the following operations on documents or fragments, regardless of their nature:

1. localize, qualify and structure (the "three steps of indexing" [13]);
2. annotate/enrich and link (the very basis of gloss).

Interviews have shown that scholars mainly rely on a common word processor to perform most of these operations. They therefore often neglect pictures or recordings, because they are unable to "get a grip" on them.

3.3 Structured Reading and Writing Spaces

Requirement 3. The model must support a structured reading and writing space to organize and articulate the document parts, annotations and ideas that emerge while the corpus is browsed.

The reader has to be able to freely compare, categorize and reorganize document fragments to make new ideas or unexplored interpretive trails emerge. Active and intensive reading relies on a content appropriation process along with a content creation process. Since the shape of the final document is unstable throughout the reading, the model must provide a flexible and convenient way to organize, articulate and structure content fragments at will. Observations about the founding role of space in complex cognitive work [8,10] are strongly corroborated by our own observations of scholars' practices. For example, musicologists need to spatialize score fragments so as to bring to light thematic variations, which are undetectable in the linearity of the score.¹ The model must therefore provide constrained structures (charts, lists, trees) as well as freeform spaces to organize fragments.

A critical appropriation can be a long process involving a great number of documents and interpretive trails, the complexity of which cannot be entirely represented by a single resulting critical document. Thus, the model must keep track of the whole reading material (such as leftover annotations and fragments, or anything that could be summoned for a future reading) in order to share not only this final document, but also the reading environment of its genesis. The model therefore has to support a broader understanding of the notion of "document": from a monolithic object to a flexible, structured and semantically organized corpus made of fragments.

4 Designing a Conceptual Model

This section describes the model we have developed according to the requirements stated above. They can be met by combining the logical entities presented below. For that purpose we use a UML class diagram, whose visual expressiveness gives a clearer understanding of the nature of the elements and their relations (cf. figure 1).

¹ For an anthropological approach to the triad mind/space/writing, see [14].



Fig. 1. A flexible model to handle interpretive operations on a multimedia corpus

4.1 Managing Heterogeneous Raw Resources

The very first operation a scholarly reader may want to perform is to pick and recompose the document parts to study, and to reject useless ones. Working on digitized paintings or musical recordings may require, for example, "materially" extracting faces or concatenating motifs, respectively. Fragments of resource files (such as a selection in a JPEG file or a segment of an MP3 or ASCII file) are stored in what we call Samples. Samples are clustered into MaterialEntities so that selected fragments can be handled as a single entity. Subclasses of MaterialEntity and Sample are defined to handle the specific natures of picture, sound and textual contents.

4.2 Qualification, Annotation and Linking

Intensive reading relies on the ability to identify pertinent portions within documents. This is handled by the Selection class, which delineates a fragment of a MaterialEntity. The reader may then want to describe these portions with text content and metadata, to label them with a given category, or to link them to other documents. These "annotation" and "gloss" aspects are handled by SemanticEntities, which can be associated with Selections and interconnected by Links. Since a link may itself be glossed, Link is a subtype of SemanticEntity.

4.3 Document and Reading Space Structuring

To materialize, for example, the logical structure of a lesson, or to bring out the material structure (pages, systems, staves) of a printed score, we define what we call StructuralEntities. They are a subtype of SemanticEntity, and can be associated with a Selection to build the hierarchical document structure of an unstructured MaterialEntity. Every interpretive unit identified in this way automatically creates its own MaterialEntity (which is a part of the main MaterialEntity). The same MaterialEntity may be encompassed in several StructuralEntities, reflecting the need to process a given content in different interpretive contexts. StructuralEntities may also be created from scratch and freely combined, with no reference to a MaterialEntity, to store and organize gloss or notes. Moreover, StructuralEntities may be used to build different kinds of interpretive structures (lists, trees, charts, etc.), according to the users' needs. In addition, any StructuralEntity can be viewed as a two-dimensional space, within which sub-entities are freely laid out and scaled (the dataLayout field holds a list of StructuralEntity references associated with layout properties).

4.4 Implementation Issues

Data are stored in a single XML file and organized in a database-like way, so that they can be directly understood and edited by the user. Despite the multimedia facet of the reading practices we deal with, we do not rely on SMIL (contrary to [5]). Indeed, its presentation-oriented semantics is not well suited to work on a heterogeneous corpus made of fragments.
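To make the entities of sections 4.1 to 4.3 and the single-file storage more concrete, the following is a minimal sketch in Python, not the actual implementation. Only the class names come from the model; the attribute names (file, start, end, text), the XML element names and the flat record layout with references by identifier are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class Sample:
    """A fragment of a resource file (4.1); file/start/end are assumed fields."""
    file: str
    start: float
    end: float


@dataclass
class MaterialEntity:
    """Clusters Samples so that they can be handled as a single entity."""
    id: str
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Selection:
    """Delineates a fragment of a MaterialEntity (4.2)."""
    id: str
    material: MaterialEntity
    start: float
    end: float


@dataclass
class SemanticEntity:
    """Text or category attached to an optional Selection."""
    id: str
    text: str = ""
    selection: Optional[Selection] = None


@dataclass
class Link(SemanticEntity):
    """A Link is a SemanticEntity, so it can carry its own gloss."""
    source: Optional[SemanticEntity] = None
    target: Optional[SemanticEntity] = None


@dataclass
class StructuralEntity(SemanticEntity):
    """A structuring node (4.3); children give the hierarchy."""
    children: List["StructuralEntity"] = field(default_factory=list)


def corpus_to_xml(materials, selections, entities):
    """Serialize everything as flat records that reference each other by id."""
    root = ET.Element("corpus")
    for m in materials:
        m_el = ET.SubElement(root, "materialEntity", id=m.id)
        for s in m.samples:
            ET.SubElement(m_el, "sample", file=s.file,
                          start=str(s.start), end=str(s.end))
    for sel in selections:
        ET.SubElement(root, "selection", id=sel.id, material=sel.material.id,
                      start=str(sel.start), end=str(sel.end))
    for e in entities:
        attrs = {"id": e.id}
        if e.selection is not None:
            attrs["selection"] = e.selection.id
        if isinstance(e, Link) and e.source is not None and e.target is not None:
            attrs["source"], attrs["target"] = e.source.id, e.target.id
        if isinstance(e, StructuralEntity):
            attrs["children"] = " ".join(c.id for c in e.children)
        el = ET.SubElement(root, type(e).__name__, attrs)
        el.text = e.text
    return root


# Hypothetical corpus: one lesson recording, one selection, a note, a part, a link.
lesson = MaterialEntity("m1", [Sample("lesson.mp3", 0.0, 3600.0)])
sel = Selection("s1", lesson, 120.0, 300.0)
note = SemanticEntity("e1", "Definition of the concept", sel)
part = StructuralEntity("e2", "Part 1", sel)
gloss = Link("e3", "The note comments on this part", None, note, part)
root = corpus_to_xml([lesson], [sel], [note, part, gloss])
print(ET.tostring(root, encoding="unicode"))
```

Referencing entities by identifier rather than nesting them is one possible reading of "organized in a database-like way"; it lets the same MaterialEntity or Selection be reused in several interpretive contexts without duplication.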

5 Experiments with the Environment

Critical work presupposes the possibility of varying the visual presentation of documents in order to better understand and articulate them. Consequently, we strongly rely on the Model-View-Controller design meta-pattern to take advantage of the model. We especially focus on the visualization of synoptic structures. The "free" space illustrates the gain in flexibility when handling picture and text fragments (cf. figure 5). Structural relations (StructuralEntities) and gloss relations (Links) can be visualized together on a synthetic circular map (cf. figure 2). As a complement to these generic instruments, we developed dedicated tools for the needs of specific reading communities.

Fig. 2. Circular map: visualizing structural and hypertext relations

5.1 Lesson Appropriation and Publication

We have developed a specific audio tool for a group of humanities students who had to produce a critical analysis of recordings of philosophy lessons (cf. figure 3 A). It first makes it possible to build the hierarchical structure of the lessons (B). In addition, the students can freely set transversal thematic annotations (C). The StructuralEntities and Selections identified in this way can then be commented on (D).
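The combination of a hierarchical structure with transversal thematic annotations can be illustrated by a small sketch. It is only an assumption about how such data could be represented: the Segment class, its fields and the example lesson are hypothetical and do not describe the actual tool.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Segment:
    """A node of the lesson hierarchy covering [start, end) in seconds."""
    title: str
    start: float
    end: float
    themes: Set[str] = field(default_factory=set)  # transversal annotations
    comment: str = ""
    children: List["Segment"] = field(default_factory=list)


def walk(segment):
    """Yield the segment and all its descendants, depth first."""
    yield segment
    for child in segment.children:
        yield from walk(child)


def segments_for_theme(root, theme):
    """Collect the segments tagged with a theme, wherever they sit in the tree."""
    return [s for s in walk(root) if theme in s.themes]


# Hypothetical lesson: the hierarchy is the logical structure,
# the themes cut across it.
lesson = Segment("Lesson 1", 0, 3600, children=[
    Segment("Introduction", 0, 600, themes={"method"}),
    Segment("Debate", 600, 3600, children=[
        Segment("First objection", 600, 1500, themes={"perception"}),
        Segment("Reply", 1500, 3600, themes={"perception", "method"}),
    ]),
])

titles = [s.title for s in segments_for_theme(lesson, "method")]
print(titles)
```

The tree answers the question "where does this passage sit in the lesson?", while the theme query gathers passages that belong together regardless of their position, which is the sense in which the annotations are transversal.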

Fig. 3. Audio structuration and annotation tool

Critical appropriation of lecture recordings can be seen here as a kind of multimedia authoring activity. Indeed, the students structure and enrich the source material with text (summaries of debates that took place during the lessons, explanations of concepts, references) or picture contents so as to provide a future reader with a more complete reading and argumentative experience. In this perspective, we used the WebRadio publishing chain² to enable students to export pertinent material from their reading environment as an interactive Flash online presentation (cf. figure 4).

² http://scenari-platform.org/projects/webradio/fr/pres/

Fig. 4. An interactive Flash publication of an enriched and structured audio content

These innovative academic practices have raised some interesting methodological issues. Beyond the satisfaction of dealing directly with multimedia contents, students have to take on a double role. As students, they share their scholarly comments with teachers, who are also equipped with the environment and can then import, annotate and "review their readings". The synoptic circular representation (cf. figure 2) then offers an efficient way to perceive the organization of the gloss and of the corpus. But the students are also "publishers", who produce a multimedia online publication for a wider public, and consequently have to select and reword complex philosophical arguments. These approaches can be conducted together thanks to the flexibility of our model. Other academic experiments are in progress with semiotics students who analyze press pictures, and with computer music students who produce a multimedia comment on a piece, which articulates scores, recordings, screenshots of composition software and textual contents.

5.2 Multimedia Musicological Analysis

We also collaborate, in a participatory design approach, with musicologists who annotate, cut out and produce analytical charts of musical pieces (cf. figure 5). The generic entities of the model can be specialized to fit the specific scholarly operations of musicologists. For example, we use subtypes of Link to synchronize pertinent points between different scores and recordings which embody the same musical piece. The resulting "synchronized score" can then be fragmented and annotated as simply as a single-media document. SemanticEntity and StructuralEntity are used in conjunction with SynchronizedMaterialEntities, which articulate picture and sound samples.

Fig. 5. A bi-dimensional workspace: free spatial positioning of elements

Figure 5 depicts the workspace where fragments can be listened to, studied, qualified, compared and classified. The use of a free bi-dimensional space enables the musicologist to freely explore semantic relations and make them emerge (which is the main cognitive benefit of the spatial hypertext approach in creative work, see section 2). Furthermore, more formalized spatial structures such as charts (see figure 5) or lists can be used to build new analytical documents which highlight one analytical criterion or another [14]. These spatial structures not only offer a visual organization of the documents, but are also a kind of "sound map", since every synchronized score fragment can also be listened to. Our environment gives rise to an epistemological shift in the century-old musicological practice of "score chart-making". Manipulating synchronized scores enables the analyst to categorize fragments by ear, whereas traditional practices are based only on the visual appearance of the score. Moreover, the possibility of sharing the analysis environment (cf. section 3.3), and not only a static chart, enables analysts to interact with the whole material and to materially involve it in critical discussions between peers.
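The idea of a "synchronized score" can be sketched as a list of synchronization points pairing an instant of the recording with a position in the score. The representation below (time in seconds, bar number) and both helper functions are assumptions made for illustration; the paper does not specify how the synchronizing Links are stored or queried.

```python
from bisect import bisect_right

# Hypothetical synchronization points, sorted by time:
# (time in seconds in the recording, bar number in the score).
sync_points = [(0.0, 1), (12.5, 5), (24.0, 9), (40.0, 13)]


def bar_at(time_s, points):
    """Return the bar of the last synchronization point at or before time_s."""
    times = [t for t, _ in points]
    i = bisect_right(times, time_s) - 1
    if i < 0:
        raise ValueError("time precedes the first synchronization point")
    return points[i][1]


def audio_span(first_bar, last_bar, points):
    """Return (start, end) seconds covering bars first_bar..last_bar.

    The span is snapped to the surrounding synchronization points;
    end is None when the fragment runs to the end of the recording.
    """
    bars = [b for _, b in points]
    start_i = bisect_right(bars, first_bar) - 1
    if start_i < 0:
        raise ValueError("bar precedes the first synchronization point")
    end_i = bisect_right(bars, last_bar)
    end = points[end_i][0] if end_i < len(points) else None
    return points[start_i][0], end


print(bar_at(13.0, sync_points))         # bar reached 13 s into the recording
print(audio_span(5, 8, sync_points))     # audio segment for a score fragment
```

With such a mapping, a fragment cut out of the score image can be listened to, and a segment selected in the recording can be located in the score, which is what allows fragments to be categorized by ear as well as by sight.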


5.3 Benefits of the Model

Our experiments bring to light the constitutive role of cognitive technologies (here, computers) in intellectual practices. Indeed, providing readers with new scholarly operations can lead to significant evolutions in such practices. Since ideas and concepts are embodied in documents, knowledge appropriation and diffusion tasks have to rely on the concrete manipulation of documents. The model we have presented has been designed to allow the materialization of the specific structures and connections a scholar may need to build in order to organize documents during his/her reading, according to his/her interpretive methodology and to the "interpretive conventions" of his/her scholarly community. From this point of view, we can classify the benefits of our model into two main categories: (1) the development of new scholarly reading operations and their strong articulation with more traditional ones; (2) the elaboration of hypermedia documents, which are new semiotic forms that cannot be manipulated without computers.

Widening the scholarly operations spectrum. In the experiments with philosophy students (see section 5.1), we have set out the foundations of what we could call a "hypermedia reading chain".³ Our generic model acts as a "cement" in such a chain. Indeed, the generic objects it provides (Material, Semantic and Structural Entities, Selections, Links) build up a "milieu" inside which a wide spectrum of scholarly operations can be articulated, regardless of the material nature of the contents involved. "Traditional" scholarly operations (structuring, annotation, spatialization, description) can be finely articulated, which provides a complete interpretive experience. Moreover, the concepts of the model offer a coherent and convenient organization of contents, from which publication tools can transform the corpus (or part of it) into different document formats, according to needs: interactive presentations for future readers, LaTeX or OpenDocument files for the preparation of written articles, minimal software environments to share analytical results among peers. The experiments with musicologists have especially shown the scientific interest of our model, which makes it possible to augment traditional score analysis operations and to contribute to the elaboration of new ones. Thus, by emphasizing the need for an open and generic reference model, our approach favours the emergence of new knowledge manipulation and diffusion practices.

Beyond multimedia scholarly reading practices, our model aims to establish principles which can be useful for more generic active reading practices. We can remark that, after twenty years of personal graphical desktop environments, operating systems still lack unified concepts for accessing documents more efficiently than common file management systems allow. The interoperability of documents and software could greatly benefit from a deep reflection on the essential cognitive operations on documents (again: annotation, structuring, fragment delimitation and access, description, categorization, hyperconnection).

³ The word "chain" connotes the idea of a reading "path", from the first appropriation of the corpus to the publication of a structured comment.



Building and interpreting new semiotic forms. The experiments with musicologists presented in section 5.2 illustrate the basis of our approach to the problem of elaborating and appropriating hypermedia documents. Computers make possible the construction of new "species" of documents, such as hypermedia documents (multimedia documents which are hyper-linked or synchronized). Our model enables a reader to easily build such documents by connecting (with Links) relevant instants in audio documents (SoundSelections) and relevant zones in pictures (PictureSelections). The resulting documents are also semiotic forms carrying concepts and ideas, and are thereby open to interpretation. The generic scholarly concepts of the model (annotation, structuring, categorization, etc.) can then be put into practice, regardless of the nature of the contents (see section 3.2).

6 Conclusion and Future Work

In this paper, we have presented a model and a complete personal environment for supporting intensive reading operations on heterogeneous multimedia documents. They make it possible to better understand, comment on and contextualize such reading material, and to prepare its sharing and critical discussion within a reading community. Our approach consists in putting traditional document authoring concepts into practice. These concepts (publication, editing and structuring of resource files, free creation, articulation and spatial layout of elements) make it possible to design more efficient reading tools. New academic document practices can now be explored, thanks to the genericity and flexibility of our model. Moreover, it makes the critical analysis of hypermedia documents possible. The next step will be to provide users with a built-in multimedia publication tool, so that they can produce and parametrize by themselves structured and enriched presentations based on the SMIL⁴ or Adobe Flex formats. Such a device could give the "digital scholar" complete control over the whole reading process, from appropriation to the writing, publication and discussion of the critical comment.

Acknowledgments. This work has been developed within the scope of the Poliesc project, funded by the Region Picardie and the European Social Funding Program.

⁴ Synchronized Multimedia Integration Language, http://www.w3.org/AudioVideo/

References

1. Bush, V.: As we may think. The Atlantic Monthly 176(1), 101–108 (1945)
2. Ingraham, B.D.: Scholarly rhetoric in digital media. Journal of Interactive Media in Education, JIME (September 2000)
3. Bottini, T., Morizet-Mahoudeaux, P., Bachimont, B.: Instrumenter la lecture savante de documents multimédia temporels. In: Holzem, M., Trupin, E. (eds.) 11ème Colloque International sur le Document Electronique, Paris, Europia productions (September 2008)
4. Aubert, O., Prié, Y.: Advene: Active reading through hypervideo. In: HYPERTEXT 2005: Proceedings of the Sixteenth ACM Conference on Hypertext and Hypermedia, pp. 235–244. ACM Press, New York (2005)
5. Bulterman, D.C.A.: Using SMIL to encode interactive, peer-level multimedia annotations. In: DocEng 2003: Proceedings of the 2003 ACM Symposium on Document Engineering, pp. 32–41. ACM Press, New York (2003)
6. Doumat, R., Egyed-Zsigmond, E., Pinon, J.M., Csiszár, E.: Online ancient documents: Armarius. In: Bulterman, D.C.A., Soares, L.F.G., da Graça, C., Pimentel, M. (eds.) DocEng 2008: Proceedings of the 2008 ACM Symposium on Document Engineering, pp. 127–130. ACM Press, New York (2008)
7. Faure, C., Vincent, N.: Document image analysis for active reading. In: SADPI 2007: Proceedings of the 2007 International Workshop on Semantically Aware Document Processing and Indexing, pp. 7–14. ACM Press, New York (2007)
8. Marshall, C.C., Shipman III, F.M.: Spatial hypertext: designing for change. Communications of the ACM 38(8), 88–97 (1995)
9. Buchanan, G., Blandford, A.: Spatial hypertext as a reader tool in digital libraries. In: Visual Interfaces to Digital Libraries [JCDL 2002 Workshop], pp. 13–24. Springer, Heidelberg (2002)
10. Nakakoji, K., Yamamoto, Y., Akaishi, M., Hori, K.: Interaction design for scholarly writing: Hypertext representations as a means for creative knowledge work. New Review of Hypermedia and Multimedia 11(1), 39–67 (2005)
11. Yamamoto, Y., Nakakoji, K., Nishinaka, Y., Asada, M., Matsuda, R.: What is the space for?: the role of space in authoring hypertext representations. In: HYPERTEXT 2005: Proceedings of the Sixteenth ACM Conference on Hypertext and Hypermedia, pp. 117–125. ACM Press, New York (2005)
12. Mikáč, J., Roisin, C., Le Duc, B.: An export architecture for a multimedia authoring environment. In: Bulterman, D.C.A., Soares, L.F.G., da Graça, C., Pimentel, M. (eds.) DocEng 2008: Proceedings of the 2008 ACM Symposium on Document Engineering, pp. 28–31. ACM Press, New York (2008)
13. Morizet-Mahoudeaux, P., Bachimont, B.: Indexing and mining audiovisual data. In: Tsumoto, S., Yamaguchi, T., Numao, M., Motoda, H. (eds.) AM 2003. LNCS (LNAI), vol. 3430, pp. 34–58. Springer, Heidelberg (2005)
14. Goody, J.: The Domestication of the Savage Mind. In: Themes in the Social Sciences, November. Cambridge University Press, Cambridge (1977)