Annodex: A Simple Architecture to Enable Hyperlinking, Search & Retrieval of Time–Continuous Data on the Web

Silvia Pfeiffer, Conrad Parker, Claudia Schremmer
CSIRO–MIS, Locked Bag 17, North Ryde NSW 1670, Australia
[email protected], [email protected], [email protected]
ABSTRACT
Today, Web browsers can interpret an enormous variety of file types, including time–continuous data. When consuming an audio or video file, however, the hyperlinking functionality of the Web is "left behind": these files are typically unsearchable and thus not indexed by common text–based search engines. Our XML–based CMML annotation format and the Annodex file format presented in this paper are designed to solve this problem of "dark matter" on the Internet: continuous media files are annotated and indexed (i.e., Annodexed), enabling hyperlinks to and from the media. Furthermore, the hyperlinks typically do not point to an entire media file, but to and from arbitrary fragments or intervals. The standards proposed in the context of the Continuous Media Web have been submitted to the IETF for review.
Keywords Continuous Media Web, Annodex, CMML, Open Standard, Metadata
1. INTRODUCTION
Since the Web's very foundation by Tim Berners–Lee, academics, entrepreneurs, and users have dreamt of ways to integrate the Web's various multimedia contents into one system for storage and retrieval. While there is a wealth of digital audio, image, and video data on the Internet today, it is increasingly difficult to find the information relevant to a specific situation, despite the ever increasing performance of search engines. The gap between the (almost taken for granted) existence of the searched-for material and the difficulty of accessing it frustrates many a user. The idea and implementation presented in this article are simple but challenging: we propose to create a new standard for the way digital audio and visual contents are placed
on Web sites to make multimedia content just as "surfable" as ordinary text files. With this new standard, users can not only link from a text passage to, e.g., a video, but into the specific time interval containing the sought-after information. The video itself is annotated with meta information linking to other resources on the Web (i.e., text, audio, images, video), enabling the user to jump from one digital resource to another, just as if "browsing" a Web site. We have named this extended Web the Continuous Media Web (CMWeb) — it enables hyperlinking and annotation of time–continuous media data. The technology is based upon the observation that the data stream of existing time–continuous data can be broken up according to a semantic concept, and that this structure enables access to interesting subparts, known as fragments or clips of the stream. Annotations and hyperlinks can then be attached to the fragments under consideration. When consuming (i.e., listening to, viewing, reading) such an annotated and indexed — we will call it Annodexed — resource, the user experience is such that while, e.g., watching the video, the annotations and links change over time and enable browsing of collections simply by following a link to another resource's fragment. Figure 1 shows an example interface for the CMWeb, a browser implemented in Mac OS X. The various features of this browser include:
(a.) Media Player: Display and control of the transport and playback of the video or audio.
(b.) Browser: Browser history (i.e., back and forward), reload, and stop buttons as well as the URI of the file on display.
(c.) Table of Contents: List of the fragments of the media stream, including timestamps and short descriptions.
(d.) Annotation: Additional text information for the current fragment.
(e.) Hyperlink: Attached to the current fragment, a content–sensitive (i.e., time–dependent) hyperlink points to other multimedia resources — including fragments — on the Web (text, audio, image, video). In case the link targets another video/audio fragment, a keyframe may be displayed.

Figure 1: A screenshot of our media browser for the CMWeb technology, implemented in Mac OS X.

The rest of this document is organised as follows: Section 2 gives a brief architectural overview of the Continuous Media Web and introduces temporal URI addressing, the Continuous Media Markup Language (CMML), and the Annodex media file format. Sections 2.1 to 2.3 illustrate them in more detail, respectively. Switching the focus to the actual handling of the CMWeb, Sections 3 and 4 detail how the Annodex file format is produced from an existing media format such as MPEG, and how the file format is distributed over the Internet. Section 5 looks at information retrieval from the newly created Annodexed media files. In Section 6, we look at related work in the common endeavour to develop tools for universal multimedia access and detail common points as well as differences with regard to our approach.
2. TECHNOLOGY OVERVIEW
All our implementations in the context of the Continuous Media Web are open source and are distributed via our Web site http://www.annodex.net [10]. The design of the CMWeb is based on three Internet specifications proposed to the Internet Engineering Task Force (IETF) [6] for review: the Internet Draft on temporal URI fragments [13], which extends the URI standard [2] with temporal URI references; the Internet Draft Specification of the Continuous Media Markup Language (CMML) [12]; and its companion, the Internet Draft Specification of the ANNODEX Annotation Format for Time–Continuous Bitstreams [11]. These documents are work in progress and changes may be made as the technical discussions with standards bodies and other experts continue. For the latest versions see http://www.annodex.net.
2.1 Temporal URI References
Web–integrated access to clips or temporal offsets in media documents requires URIs [2] that point to such resource references. We envisage two ways to point to subparts of the media: URI fragment identifiers and URI query components. URI fragments are specified in a URI after the hash # character; URI queries are specified after the question mark ? character. Please note that when we talk about the URI fragment scheme, we use "URI fragment"; otherwise we mean the CMWeb use of the word fragment, where it denotes the subpart of the media that is defined by the anchor tag. By definition, URI fragments can be interpreted by the client application only [2]. However, media data is usually high-bandwidth, large-size data, so downloading the complete media file before performing the offset action is not desirable, as the user would have to wait an often unacceptable amount of time. Therefore, the same scheme that we use to access fragments locally is also proposed as a generic URI query scheme to tell the server to provide only the requested fragment(s) of the media.
The CMWeb format of fragment and query identifiers is specified in our Internet Draft [13] and conforms to the URI RFC 2396 [2]. Temporal references start with the reserved character @, representing the time–continuous resource "at" a certain temporal offset. Having the @ character at the start simplifies the parsing of a temporal reference, helping, e.g., to distinguish between a fragment given by name, such as #smpte-25, and a fragment given as a temporal offset, such as #@smpte-25=01:01:01:01. We use two types of temporal references: temporal offsets and temporal intervals.
2.1.1 Temporal Offsets
The specification of a temporal offset as a URI fragment or query is given as a name–value pair, where the name specifies the time scheme to use and the value is the time specification itself. The syntax is closely related to the specification of relative timestamps in the RTSP protocol parameters as given in RFC 2326 [14]. Examples of temporal fragment offsets are:
http://foo.com/matrix.au#@smpte-25=10:07:33:06
http://foo.com/matrix.au#@npt=10:7:33.25
http://foo.com/matrix.au#@10:7:33.25
http://foo.com/matrix.au#@npt=36453.25
(all four specify the same time point) and
rtp://foo.com/matrix.mpg#@clock=20020711T173045.25Z
(for Thu Jul 11 17:30:45 UTC 2002 and a quarter of a second)
2.1.2 Temporal Intervals
Temporal intervals can be specified as well. This is achieved by adding the reserved character minus - and another time specification that adheres to the time scheme used for the specification of the first time point. Examples of temporal intervals are:
http://[...].au#@smpte-25=10:07:33:05-10:07:37:21
http://[...].au#@npt=10:7:33.25-10:7:37.8
http://[...].au#@10:7:33.25-10:7:37.8
http://[...].au#@npt=36453.25-36457.8
(all four specify the same temporal interval) and
rtp://[...].mpg#@clock=20020711T173045.25Z-20020711T173049.80Z
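To make this addressing concrete, the following Python sketch resolves such temporal fragments on the client side. It is purely illustrative: the function names are ours and only the npt scheme is handled; the authoritative grammar is defined in the temporal URI Internet Draft [13].

from urllib.parse import urlparse

def npt_to_seconds(value):
    """Convert an npt time ('hh:mm:ss.frac' or plain seconds) to seconds."""
    parts = value.split(":")
    if len(parts) == 1:                       # e.g. '36453.25'
        return float(parts[0])
    hours, minutes, seconds = parts           # e.g. '10:7:33.25'
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def parse_temporal_fragment(uri):
    """Return (start, end) in seconds for a '#@[npt=]start[-end]' fragment."""
    fragment = urlparse(uri).fragment
    if not fragment.startswith("@"):
        return None                           # a named anchor such as '#speaker1'
    spec = fragment[1:]
    if "=" in spec:
        scheme, spec = spec.split("=", 1)
        if scheme != "npt":
            raise ValueError("time scheme %r not handled in this sketch" % scheme)
    start, _, end = spec.partition("-")
    return npt_to_seconds(start), npt_to_seconds(end) if end else None

# Both URIs below name the same interval (see the examples above):
print(parse_temporal_fragment("http://foo.com/matrix.au#@npt=10:7:33.25-10:7:37.8"))
print(parse_temporal_fragment("http://foo.com/matrix.au#@36453.25-36457.8"))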
2.2 CMML
The Continuous Media Markup Language (CMML) has been designed to cater for two different yet related uses:
• Authoring of annotations, anchor points, and hyperlinks for time–continuous media, as preparation for their integration with the media in an Annodex file.
• Indexing by search engines for automatic content retrieval, through extraction of the markup from an Annodex file.
CMML is an XML–based language that describes the content of a media file. It is an authoring language for annotating, indexing, and hyperlinking time–continuous media in the Annodex format. A CMML file contains structured XML markup; we have chosen the XML tags to be very similar to XHTML to enable a simple transfer of knowledge for HTML authors. CMML documents consist of three types of tags: at most one stream tag, exactly one head tag, and an arbitrary number of anchor tags. The stream tag is optional and describes the input media bitstreams necessary for the creation of an Annodexed media file. The head tag contains information related to the complete time–continuous media document. An anchor tag, in contrast, contains information on a fragment of the media. Figure 2 shows an example of a CMML document, demonstrating the tags of Annodexed media.
Figure 2: Extract of a CMML file with stream, head, and anchor tags, describing a CSIRO video titled "The Research Hunter" with annotations such as "Welcome to CSIRO, the Commonwealth Scientific and Industrial Research Organisation of Australia. Our astronomers track the Galileo space probing station in Tidbinbilla. ..."
The XML markup of a head tag in a CMML document contains information on the complete media document. Its essential information consists of:
• Structured textual annotations in meta tags, and
• Unstructured textual annotations in the title tag.
Structured annotations are name–value pairs which can follow an existing or new metadata annotation scheme (such as the Dublin Core [5]). The XML markup of an anchor tag contains information on a fragment of the media:
• Structured textual annotations in meta tags, in the same way as for the head tag.
• Unstructured textual annotations in desc tags. Unstructured annotations are free text and mainly relevant for search applications. They can be seen in Figure 1, location (d.).
• Anchor points (i.e., the id attribute) into the media document that a URI can refer to. Anchor points identify the start of a fragment. This enables URIs to refer to Annodexed media fragments via named anchors, specified with the hash symbol # analogously to HTML, for example foo/example.au#speaker1. A second type of URI pointing into an Annodexed media document can be given via a time specification, for example foo/example.au#@smpte=00:05:12 (see Section 2.1). These anchor points are responsible for the list of fragments in Figure 1, location (c.).
• URI links (i.e., the href attribute) out of a fragment of the media document to any other place a URI can point to, such as fragments in other Annodexed media or HTML pages. This URI linking is responsible for the content–sensitive links in Figure 1, location (e.). Furthermore, an anchor tag contains an optional textual annotation of the link (i.e., the hrefdesc attribute). While the desc tag describes the media fragment itself, the hrefdesc annotation describes why the fragment is linked to the given URI.
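To make the structure above tangible, here is a hand-written sketch of a small CMML document in the spirit of Figure 2. It is not taken from the CMML specification: the root element name, the child element of the stream tag, the start attribute used to place an anchor on the timeline, and all values are illustrative assumptions; the normative syntax is given in the CMML Internet Draft [12].

<?xml version="1.0" encoding="UTF-8"?>
<cmml>
  <stream>
    <!-- element and attribute names describing the input media are assumed -->
    <import src="research_hunter.mpg" contenttype="video/mpeg"/>
  </stream>
  <head>
    <title>The Research Hunter</title>
    <meta name="DC.creator" content="CSIRO"/>
  </head>
  <!-- anchor tags: id names the fragment, href/hrefdesc link out of it,
       desc and meta carry the searchable annotations;
       the start attribute is our assumption for placing anchors on the timeline -->
  <a id="intro" start="npt:0" href="http://www.csiro.au/"
     hrefdesc="CSIRO home page">
    <desc>Welcome to CSIRO, the Commonwealth Scientific and Industrial
      Research Organisation of Australia.</desc>
    <meta name="DC.subject" content="research overview"/>
  </a>
  <a id="astronomy" start="npt:95.0"
     href="http://foo.com/tidbinbilla.anx#@npt=0:01:30"
     hrefdesc="More about the Tidbinbilla tracking station">
    <desc>Our astronomers track the Galileo space probing station in
      Tidbinbilla.</desc>
  </a>
</cmml>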
2.3 Annodex Media File Format
The Annodex format enables encapsulation of any type of streamable time–continuous bitstream format (such as QuickTime, MPEG, or Windows Media) and is thus independent of current or future compression formats. It is basically a bitstream consisting of continuous media data interspersed with the structured XML markup of the CMML file. This is performed by merging the anchor tags time–synchronously with the time–continuous bitstreams when authoring an Annodex file. The anchor tags are regarded as state changes in this respect and are valid from the time at which they appear in the bitstream until another anchor tag replaces them. That tag may be empty. Thus, Annodex is designed to be used as both a persistent file format and a streaming format. Figure 3 shows an example of the bitstream of an Annodexed video file. Conceptually, the video bitstream, the audio bitstream, and the annotation bitstream share a common timeline. When encapsulated into one binary bitstream, however, this data has to be flattened. The figure shows roughly how this is performed.
Figure 3: The merging of the frames of the video and audio with a structured CMML file into an Annodexed bitstream .anx. (The figure shows the audio, video, and CMML bitstreams on a common timeline being flattened into a single bit stream sequence: audio header, video header, CMML header, then CMML anchors interleaved with audio and video frames.)

There are several advantages to having an integrated bitstream that includes the annotations in a time–synchronous manner with the media data. Firstly, all the information required is contained within one file that can be distributed more easily. Also, many synchronisation problems that occur with other media metadata formats such as SMIL [17] are inherently solved. Last but not least, having a flat integrated format already solves the problem of making Annodex media streamable.

To perform the encapsulation, a specific bitstream format was required. As stated, an Annodex format bitstream consists of XML markup in the annotation bitstream interleaved with the related media frames of the media bitstreams into a single bitstream. It is not possible to use straight XML as the encapsulation format because XML cannot enclose binary data unless it is encoded as Unicode, which would introduce too much overhead. Therefore, an encapsulation format that could handle binary bitstreams and textual frames was required. The following list summarises our requirements for the Annodex format bitstream:
• Framing for binary time–continuous data and XML.
• Temporal synchronisation between time–continuous media bitstreams and XML on interleaving.
• Temporal re–synchronisation after a parsing error.
• Detection of corruption.
• Seeking landmarks for direct random access.
• Streaming capability (i.e., the information required to parse and decode a bitstream part is available at the time at which the bitstream part is reached and does not arrive, e.g., at the end of the stream).
• Small overhead.
• A simple interleaving format with a track paradigm.
We selected the Ogg encapsulation format version 0 [9] as the encapsulation format for Annodex bitstreams as it provides for all of these requirements and has proven reliable and stable.

3. AUTHORING
To author Annodexed media, we must distinguish between files and live streams. The advantage of the former is that a file can be uploaded from the computer's file system and annotated in a conventional authoring application. In contrast, the markup of a live Internet stream by its very nature has to be done on the fly. Annodexed media files may be created in a traditional authoring application (e.g., iMovie or Adobe Premiere may easily support Annodex in the future) or through CMML generated from metadata collected in databases. The authoring application should support the creation of:
• Structured and unstructured annotations,
• Anchor points, and
• URI links for media fragments.
Live Annodexed media streams cannot be created in an authoring tool; they must be created by merging anchor tags with the live digital media stream. A merger application, similar to that described in Figure 4, may insert anchor tags into the live stream at any point in time under the control of a user (a conceptual sketch follows Figure 4). Our current implementation of such a merger application is the line–mode tool anxenc (short for Annodex encode).

Figure 4: Merging XML markup with media files.
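The sketch below illustrates, in Python, the merging idea behind Figure 4 and anxenc: anchor tags are treated as state changes and are interleaved with the media frames in timestamp order, headers first. The data model (tuples of timestamp and payload) and the function name are our own simplifications; this is not the anxenc implementation, nor does it show the actual Ogg page layout.

def merge_annodex(cmml_head, anchors, media_tracks):
    """Flatten CMML markup and media frames into one time-ordered sequence.

    anchors: list of (time, anchor_xml); media_tracks: dict mapping a track
    name to a list of (time, frame_bytes). Yields (kind, time, payload)."""
    yield ("head", 0.0, cmml_head)            # the head tag describes the whole document
    for name in media_tracks:                 # per-track headers precede all frames
        yield ("header", 0.0, name)
    items = [("anchor", t, a) for t, a in anchors]
    for name, frames in media_tracks.items():
        items += [(name, t, f) for t, f in frames]
    # Interleave by timestamp; an anchor stays in effect until the next one.
    for kind, t, payload in sorted(items, key=lambda item: item[1]):
        yield (kind, t, payload)

# Example: two anchors merged with tiny, made-up audio and video tracks.
flat = list(merge_annodex(
    "<head><title>The Research Hunter</title></head>",
    [(0.0, "<a id='intro'>...</a>"), (95.0, "<a id='astronomy'>...</a>")],
    {"audio": [(0.0, b"A0"), (50.0, b"A1")], "video": [(0.0, b"V0"), (60.0, b"V1")]},
))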
4. DISTRIBUTION
The distribution over the Internet of media documents in Annodex file format is based on URIs, similar to the distribution of HTML pages for the World Wide Web. Annodex media files are accessible via any of the protocols currently used to transport media formats, e.g., RTP/RTSP or HTTP. The basic process for the distribution and delivery of an Annodex media file is the following: A client dispatches a download or streaming request to the server with the specification of a certain URI. The server resolves the URI and starts packetising an Annodexed media document from the requested anchor or time, issuing a head tag at the start. Additionally, the fragment may be specified by a temporal offset or a named anchor tag.
As an alternative to streaming or downloading complete Annodexed media from a URI, we also envisage that some applications may prefer to retrieve either only the continuous media data or only the CMML transcription. Examples are browsers which cannot handle the XML markup, and information collection applications such as search engines which do not require the media data but just the textual annotations. This is made possible via a content–type flag in the client request (see Figure 5).
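As an illustration, the Python sketch below fetches only the CMML markup of a published Annodexed resource. The use of an HTTP Accept header and the MIME type name text/x-cmml are our assumptions about how such a content–type flag could be expressed; the temporal query in the URI follows Section 2.1.

import urllib.request

# Ask the server for only the annotations of the requested interval,
# not the media data. Header mechanism and MIME type are assumptions.
uri = "http://foo.com/matrix.anx?@npt=36453.25-36457.8"
req = urllib.request.Request(uri, headers={"Accept": "text/x-cmml"})
with urllib.request.urlopen(req) as response:
    cmml_markup = response.read().decode("utf-8")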
Figure 5: Network view of the Continuous Media Web.
5. INFORMATION RETRIEVAL
For viewing Annodexed media documents, a special player or browser plugin is still necessary. This application has to split an Annodexed media document into its constituent header and anchor tags and the media data (see Figure 4). A decoder for the given media encoding format is required to display the underlying media data. While playing back the media data, the application displays the hyperlinks and the annotations for the active fragment. If the displayed media data is a file and not a live stream, it is even possible to display a table of contents extracted from the annotations of the file and to browse through the file based on it. The hyperlinks allow the user to link back and forth freely between Annodexed media fragments, HTML pages, and other Web resources. This is transparent to the user, i.e., the user "surfs" Annodexed media documents in exactly the same way that he or she is used to browsing the Web, because Annodexed media seamlessly integrates with the existing Web. Search engines can include Annodexed media files in their search repertoire quite quickly, because they are able to find the annotations in the anchor tags in a standard way, independent of the encoding and packetising format of the media data. This allows any media format to be spidered. In addition, the protocol should allow downloading only the CMML markup of a published Annodexed media file. This stops spiders from creating excessive network loads. It also reduces the size of search archives, even for large amounts of published Annodexed media, because a CMML file contains all searchable annotations for the media fragments of its Annodexed media.
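The following Python sketch illustrates the player-side behaviour described above: anchor tags act as state changes (Section 2.3), so the annotation and hyperlink shown at any playback position belong to the most recent anchor at or before that time. The class and its data model are our own illustration, not part of the Annodex specification.

import bisect

class AnchorTrack:
    """Keep anchors sorted by start time and look up the active one."""

    def __init__(self, anchors):
        # anchors: iterable of (start_time_in_seconds, anchor_payload)
        self.anchors = sorted(anchors)
        self.starts = [t for t, _ in self.anchors]

    def active_at(self, playback_time):
        """Return the anchor in effect at playback_time, or None before the first."""
        i = bisect.bisect_right(self.starts, playback_time) - 1
        return self.anchors[i][1] if i >= 0 else None

track = AnchorTrack([(0.0, "intro"), (95.0, "astronomy"), (240.0, "credits")])
print(track.active_at(120.0))   # -> 'astronomy'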
6. RELATED WORK
Hardly any user of the World Wide Web today can get along without extensive use of search engines. These engines, however, are still focused on text–based information retrieval and struggle with audio and video content. Existing work towards enabling searching of audio and video can be subdivided into three major directions: research, standards, and products.

6.1 Related Research
Many research projects are investigating how to semantically search images, audio, and video. The general goal is to be able to retrieve the content of these media files by searching, e.g., for images showing a portrait of Franz Liszt or for audio files with songs by Monty Python. Automatic speech recognition is a promising but not yet mature technology for content retrieval of audio files. Feature extraction, structure analysis, clustering, indexing, and metadata annotation are being intensively investigated. Among many others, the publications [16] and [4] give an excellent overview of the ongoing efforts in this direction.

Research projects have investigated thoroughly how to extract syntactic and semantic content from digital audio and video through signal processing [15] [1] [7]. Yet, it is quite unclear how to make good use of the extracted information. The CMWeb offers a framework for storing the results of automated audio and video analyses as long as they can be described in textual form. The CMWeb work also differs from other research work by integrating the concepts of hyperlinking, textual annotations, and metadata to enable browsers to point from and to fragments of continuous media and thus to integrate their browsing into the normal "surfing" of the Web.
6.2 Related Standards In the following we briefly summarize the three standards MPEG–7, MPEG–21 and SMIL, and compare their aim and capabilities to the Continuous Media Web.
6.2.1 MPEG–7
MPEG–7 [8] is an open framework for describing multimedia content, providing a large set of description schemes to create markup in XML format. MPEG–7's markup is not restricted to textual information — in fact, it is tailored to allow for the description of audio–visual content with low–level image and audio features as extracted through signal processing methods. Since MPEG–7 was not built with a particular focus on Web applications, it bears little resemblance to HTML. Our CMWeb technology, in contrast, provides an HTML–like textual markup in the CMML file. It allows the inclusion of its markup into the time–continuous data stream (in the Annodex file format .anx) to enable synchronised markup and media streaming, a feature not provided by MPEG–7. Furthermore, the CMWeb extends URI addressing to clips of time–continuous data through temporal offset [13] and named fragment addressing. It is possible to reference annotations created in MPEG–7 from inside an Annodex format bitstream, or to include a subpart of them directly in the CMML of an Annodex format bitstream through the meta and desc tags.
6.2.2 MPEG–21
MPEG–21 [3] is building an open framework for multimedia delivery and consumption. It focuses on how to generically describe a set of content documents, i.e., digital items that belong together semantically, including the information necessary to provide services on these digital items. As an example, consider a music CD album. When it is turned into a digital item, the album is described in an XML document that might contain references to, e.g., the cover image, the text on the CD cover, the text of an accompanying brochure, references to a set of audio files that contain the songs on the CD, ratings of the album, rights associated with the album, information on the different encoding formats of the music, the different bitrates that can be supported when downloading, etc. This description supports everything that a user would want to do with a digital CD album: it allows the album to be managed as an entity, described with metadata, exchanged with other users, and collected as an entity. In comparison, our CMWeb focuses on a much smaller task. Its only aim is to integrate time–continuous data files into the existing World Wide Web by making clips accessible through URIs and searchable through textual search engines. Thus, the above music CD example would be represented in the CMWeb as one large audio file on a Web server that consists of a concatenation of the songs of that album, interspersed with XML markup at the relevant points where a new song starts. The Annodex bitstream contains textual meta information which allows the different songs to be found through a Web search engine. All of this is considered one Web resource. There may be hyperlinks to other Web resources that represent the cover image or the accompanying brochure, but they are not part of the album Web resource. Therefore, in the CMWeb, it is not possible to describe the kind of entity that is represented in an MPEG–21 digital item. But by focusing squarely on time–continuous data files only, by providing a markup language that is similar to HTML, by enabling the time–synchronous storage of that markup in the media bitstream, and by extending the URI linking scheme to address clips of time–continuous data files, we can leverage the existing Web infrastructure. We expect that Annodex media will become part of the formats that MPEG–21 digital items can hold.
6.2.3 SMIL
The Synchronized Multimedia Integration Language (SMIL) [17] is an XML–based language for authoring interactive multimedia presentations. SMIL 2.0 enables the description of the temporal behaviour of multimedia presentations, the association of hyperlinks with media objects, and the description of the layout of the presentation on screen. SMIL documents focus mainly on the composition of on–screen events in a sequential or parallel manner. In contrast to our CMWeb, SMIL does not enable the annotation of clips with metadata, but only the inclusion of metadata for the complete SMIL document. Thus, search engines cannot identify the subsegment of particular interest. Also in contrast to the CMWeb, SMIL documents do not include the media data itself. They only contain references, and the data is retrieved as required — some of it may never be needed because the user does not activate that part of the SMIL presentation. SMIL presentations are also not inherently time–linear — they may contain presentation loops and conditionally activated content. Thus, two different users viewing the same SMIL document may get completely different experiences. A recording of such an experience would create a media file that could be Annodexed.
6.3 Related Products
Several Internet search engines have made a major effort to incorporate audio and video into their search; the following multimedia search engines serve as good examples. The project Webseek at Columbia University (http://www.ctr.columbia.edu/webseek) is a content–based image and video search and catalog tool for the Web. The engines http://www.alltheweb.com and http://www.altavista.com each offer several search categories, including images, audio, and video. The search results are based on a match with the annotated descriptions of the respective files. In AltaVista, matches between the search term and the annotated metadata of an audio or video file display not only the textual information and the link; for video, a key frame of the video is also displayed. In any case, two differences are apparent with regard to the Annodex file format: firstly, the media content is classified according to its physical appearance, and secondly, a match embraces the whole media document, without the possibility of temporal offsets or intervals.
7. SUMMARY
The Continuous Media Web presented in this article integrates the concept of hyperlinking, searching, and retrieving clips of media with the concept of the World Wide Web. The following new ideas have been developed and implemented in the CMWeb:
• An integrated concept of hyperlinks, meta tags, and textual annotations for one fragment as an entity in itself.
• Integration of metadata in a time–synchronous manner with the media data itself, to solve synchronisation issues and to provide an integral entity for distribution.
• An HTML–like annotation format for media.
8. OUTLOOK
We are currently developing browsers for the Continuous Media Web for the Linux and Windows platforms and improving our Mac OS X browser, as well as implementing a Web server plugin for Apache. Furthermore, our development plan includes an RTP/RTSP implementation and a transport format definition, proxy support, support for search engines (first contacts are under way), and easier handling of authoring by means of a GUI authoring application. CSIRO's own proprietary search engine Panoptic already provides support for Annodex. We are in contact with the three main developers of streaming audio and video — namely Apple, Real, and Microsoft — to pursue the rollout of the CMWeb, and we are working with the Internet's standards bodies, such as the IETF and the W3C, to promote the technologies underlying the CMWeb as open, free standards.
Acknowledgments
The authors would like to thank the students Andrew Nesbit and Andre Pang, who joined the development team in the summer of 2002, along with Simon Lai, who became the first person to author meaningful content in CMML. During this time, the basics of the Annodex technology were designed, including the design of temporal URI fragments, the basic DTDs, the choice of the Ogg encapsulation format, and the initial design of the libraries. Andre Pang also implemented a browser for Mac OS X.
9. REFERENCES
[1] Movie Content Analysis Project (MoCA). http://www.informatik.uni-mannheim.de/informatik/pi4/projects/MoCA.
[2] T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax. http://www.ietf.org/rfc/rfc2396.txt, August 1998.
[3] J. Bormans and K. Hill. MPEG–21 Overview, Version 5. http://www.chiariglione.org/mpeg/standards/mpeg21/mpeg-21.htm, October 2002. ISO/IEC JTC1/SC29/WG11 N5231.
[4] N. Dimitrova, H.-J. Zhang, B. Shahraray, I. Sezan, T. Huang, and A. Zakhor. Applications of Video–Content Analysis and Retrieval. IEEE Multimedia, 9(3):42–55, July–September 2002.
[5] Dublin Core Metadata Initiative. Dublin Core Metadata Element Set, Version 1.1. http://dublincore.org/documents/2003/02/04/dces, February 2003.
[6] The Internet Engineering Task Force. http://www.ietf.org.
[7] R. Lienhart, S. Pfeiffer, and W. Effelsberg. Video Abstracting. Communications of the ACM, 40:55–62, 1997.
[8] J. M. Martinez. MPEG–7 Overview, Version 8. http://www.chiariglione.org/mpeg/standards/mpeg7/mpeg-7.htm, July 2002. ISO/IEC JTC1/SC29/WG11 N4980.
[9] S. Pfeiffer. The Ogg Encapsulation Format Version 0. http://www.ietf.org/rfc/rfc3533.txt, May 2003.
[10] S. Pfeiffer and C. Parker. Annodex: Open Standards for Annotating and Indexing Networked Media. http://www.annodex.net, 2003.
[11] S. Pfeiffer and C. Parker. Specification of the ANNODEX(TM) annotation format for time-continuous bitstreams, Version 1.0 (work in progress). http://www.ietf.org/internet-drafts/draft-pfeiffer-annodex-00.txt, 2003.
[12] S. Pfeiffer and C. Parker. Specification of the Continuous Media Markup Language (CMML), Version 1.0 (work in progress). http://www.ietf.org/internet-drafts/draft-pfeiffer-cmml-00.txt, 2003.
[13] S. Pfeiffer and C. Parker. Syntax of temporal URI fragment specifications (work in progress). http://www.ietf.org/internet-drafts/draft-pfeiffer-temporal-fragments-01.txt, 2003.
[14] H. Schulzrinne, A. Rao, and R. Lanphier. Real Time Streaming Protocol (RTSP). http://www.ietf.org/rfc/rfc2326.txt, April 1998.
[15] B. Shahraray and D. Gibbon. Pictorial transcripts: Multimedia processing applied to digital library creation. In Proc. IEEE 1st Multimedia Signal Processing Workshop, pages 581–586, Princeton, NJ, USA, 1997.
[16] Y. Wang, Z. Liu, and J.-C. Huang. Multimedia Content Analysis — Using Both Audio and Visual Clues. IEEE Signal Processing Magazine, 17:12–36, November 2000.
[17] World Wide Web Consortium (W3C). Synchronized Multimedia Integration Language (SMIL 2.0). http://www.w3.org/TR/smil20, August 2001.