
Networked Hyper QuickTime: Video-based Hypermedia Authoring and Delivery for Education-On-Demand

Wei-hsiu Ma, Yen-Jen Lee, and David H.C. Du¹

Distributed Multimedia Research Center, Department of Computer Science, University of Minnesota

Mark P. McCahill²

Distributed Computing Services, University of Minnesota

Abstract

It is generally recognized that the Internet can be used as a platform to deliver distance education services and that Internet information systems can be leveraged to enhance education. The Internet provides shared access to resources and media-rich material to augment traditional instruction. Much of the current educational use of Internet information systems is for publishing collections of documents and for providing hypermedia links to other information. However, menu-based hypermedia systems (Gopher) and document-based hypermedia (WWW) do not address the need for video-based hypermedia as a method for delivering annotated presentations and lectures. Networked Hyper QuickTime (NHQT) is a prototype education-on-demand (EOD) system designed to deliver hypermedia-annotated video over the Internet to low-cost desktop PC systems. It also provides easy-to-use editing functions to embed annotations and hypermedia links into a video. These handy operations make authoring viable, in terms of time and simplicity, for busy instructors without any experience in video editing. NHQT uses WWW, Gopher, and QuickTime technologies to deliver video streams with embedded Uniform Resource Locators (URLs) pointing to ancillary documents or video streams. By incorporating URLs in a video presentation, users can follow hypermedia links to documents and video segments that are pertinent to the section of the video being viewed. NHQT's video-based user interface supports a full range of user interaction, including VCR-style playback control, random positioning, content-based search, and automatic resolution of hypermedia links for Internet access. An ATM network testbed for Macintosh and workstation platforms has been built to study NHQT over a future high-speed network infrastructure.

Keywords: Video-based Hypermedia, Distance education, Internet, QuickTime, Gopher, World Wide Web.

¹ This work is supported in part by NSF Grant CDA-9502979 and a gift from IBM.

² E-mail addresses are fwma, ylee, [email protected] and [email protected]. Fax number in the US is (612) 625-0572.

1 Introduction

In recent years, the entire network computing paradigm has shifted from local-area or campus-wide networks to the Internet. The exponential growth of the Internet has resulted in a large and growing number of people with access to the information and services available on it. This convergence on the Internet as a standard vehicle for delivering information, together with its global nature, makes it very attractive for publishing. Instant access to globally distributed information has popularized hypermedia for on-line publishing, and it is likely that various forms of Internet-mediated hypermedia will become a dominant form of publishing. Taking advantage of these trends to deliver distance education is one of the most exciting challenges for existing educational institutions. In this work, we address the concept of video-based hypermedia and develop an authoring and delivery system based on an extension of QuickTime technology. Its primary feature, synchronized media streams over the Internet, can be applied to varied distance education purposes.

Objective

Many people have limited access to the traditional classroom setting due to various constraints, such as time, distance, physical disabilities, transportation limitations and expenses, or non-school commitments (e.g. children at home, a job) [1]. An education-on-demand (EOD) system provides distance learning opportunities for continuing education. Even for traditional full-time students, distance learning can provide attractive options with great flexibility. A core problem in providing distance learning services is the production and integration of the course materials. This process is very time-consuming. Even if publishers do the routine work, such as making web pages or encoding video, the instructors still need to make decisions regarding the arrangement and integration. However, busy instructors usually do not have the time, and are not interested in learning complicated authoring tools, to design and arrange a lecture. The publishers really need easy-to-use authoring tools that motivate instructors and help them produce digital courseware in a short time. In this work, we develop a simple tool to help instructors combine their class lectures and Internet resources so that these materials can be delivered to distance learners through an EOD service. Our objectives include:

- Create an EOD system over the Internet from which students can access courses and class materials by computer at remote sites on a flexible schedule.
- Provide authoring functionality so that instructors can easily create lectures and arrange different media into a lecture.
- Provide students with a video-based learning environment with synchronized slides so that they can have an in-class-like experience. This environment must make it easy for users to control the progress of a lecture.
- Integrate different media, such as video, audio, music, text, images, and animation, to satisfy different educational purposes.
- Design a video delivery subsystem that provides lecture or media delivery over the Internet with low bandwidth requirements, instead of retrieving large files onto the local machine.
- Utilize related materials and resources on the Internet in lectures, extending learners' view over the whole Internet.
- Implement an EOD application on PC-based platforms, utilizing freely available Internet resources with a component design methodology to reduce system cost and complexity.

Video-based Hypermedia

In the early days of the Internet, information was typically published as text documents. Two popular first-generation distributed Internet information systems (Internet Gopher and World Wide Web) changed this model by introducing hypermedia to casual users and by making it easy to use the Internet as a publishing tool. The Gopher system can be thought of as a hypermedia menuing system where items in Gopher menus are links to objects distributed across servers on the Internet [2]. WWW popularized a form of hypermedia in which documents contain links to other documents residing on distributed servers [3]. Other forms of hypermedia, such as 3D virtual reality with embedded links to Internet resources, are being developed [4]. The next step in this evolution will take advantage of advances in network and desktop computing capabilities to popularize video-based hypermedia. By incorporating hypermedia links to resources on the Internet into video-based media, it is possible to transform a linear medium (such as a video of a lecture) into hypermedia. A hypermedia video has embedded links to other relevant videos, documents on WWW and Gopher servers, and other objects on the Internet. A user viewing a hypermedia video has the option to get more information at any time, and the viewing software automatically resolves the hypermedia links to the relevant information. This is something like asking a question during a live presentation; the presentation pauses while the presenter tells you more. Rather than simply presenting a video stream, video-based hypermedia is annotated so that users who desire more information about a given section of the presentation can jump to this information as easily as they navigate WWW pages or Gopher directories. Such a hypermedia video presentation is also searchable based on keywords or the text of the narrative, so that users can quickly jump to a given section within the video. There is a well-known problem in the WWW: users easily get lost in current Internet browsers, especially when the pages are not organized well. It is possible to avoid this problem with video-based hypermedia if the user always keeps the primary video as an indication of the current browsing location. Given these features, video-based hypermedia is a natural and effective way to represent a lecture for distance education.

Hyper QuickTime and Networked Hyper QuickTime

It is clear that there is a need for a simple, elegant tool to deliver lecture- and presentation-based education over the Internet. To address this need, we have recently developed Hyper QuickTime (HQT) to validate the video-based hypermedia concept and then extended this system to develop Networked Hyper QuickTime (NHQT) as a prototype system for education-on-demand. HQT presentations are intended to be an enhanced form of video media which captures some of the nuance and expression that can make a good in-person presentation compelling and combines this with annotations pointing to more detailed information. Annotations and cross references are embedded in the video as URLs pointing to Gopher directories, WWW pages, or other video segments. By taking advantage of Apple's QuickTime software architecture, we are able to synchronize embedded text narration and URL tracks with an audio and video stream served either from a networked video server or from the user's local disk. Video-based media simplifies authoring, since it is easy to re-purpose a video of an in-person presentation into a digitized presentation that can be annotated and served to distant users.

HQT videos can be delivered either from local hard disks or CD-ROMs. In the case of low-bandwidth networks, reading video from a local disk while resolving hypermedia links to WWW and Gopher servers makes it possible to provide hypermedia annotations to video in the absence of a video server and a fast network. A more exciting scenario assumes enough network capacity and video servers to present networked hypermedia video to users on demand; in this case the hypermedia annotations can refer to other videos as well as documents. Realizing this vision requires video servers and clients capable of interacting with video servers as well as with more mundane Internet services. Whether the video is served off the network or from a local disk, the HQT software architecture uses existing Gopher/WWW clients to resolve references to Internet resources. To access networked video streams, the NHQT software has a video stream software module to retrieve data from a video server through the Internet or an ATM (Asynchronous Transfer Mode) [5] testbed network. The video server is designed to support continuous media stream delivery according to the requests of the client software module. One objective of this implementation is to support the delivery of long lectures without requiring excessive disk space on the client. Another objective is to develop a software architecture that can accommodate both IP and ATM networks so that the system can transition smoothly as the infrastructure migrates to high-speed networking. The organization of the paper is as follows. Section 2 describes related work. Section 3 illustrates the features of HQT, which is a video-based hypermedia authoring and presentation tool. Section 4 provides an overview of the environment upon which NHQT is developed and describes the whole system design in detail. Section 5 demonstrates various applications to which NHQT can be applied. Section 6 concludes the paper and discusses future work.

2 Related Work

Our work covers several related areas, including hypermedia systems, networked streaming video, and distance education systems, which have been studied by academic institutions or developed as commercial products.

Hypermedia System

Besides the two popular hypermedia-based systems, Internet Gopher and WWW, there are several different sophisticated hypermedia models, e.g., Hyper-G [6] (now named HyperWave as a commercial product), developed by Graz University of Technology. The main features of Hyper-G are bidirectional hyperlinks and referential integrity maintained by cooperating Hyper-G servers. A film player, audio player, text viewer, image viewer, etc. are separate media displayers, and anchor authoring is done with the authoring function associated with each individual media displayer. There is no notion of timeline-based synchronization between media objects as in NHQT. The Amsterdam hypermedia model tries to give a framework suitable for describing hypermedia in general [7]. Dartmouth College has published multimedia proceedings on CD-ROM [8]. They produced hypermedia materials and mainly used synchronized slides and audio with a looped short video clip to replay a conference presentation. All of the above have complicated authoring processes for producing synchronized media. NHQT was developed for video-based hypermedia rather than as a general hypermedia system, but it can still provide a mixed-media presentation environment. Its authoring is much easier and copes especially well with Internet objects.

There are some other hypermedia systems for specific purposes, such as navigation in movie-only hypermedia [9]. This work provided a tool to traverse linked movies like a tour. NHQT basically links static documents, but it is also able to use hyperlinks pointing to other movies. Hjelsvold [10] created video archive tools including a video annotator which can store video annotations as meta-data. The annotation is used to structure a video and to serve search queries. Besides using annotations for keyword search, NHQT can also reference Internet documents, and it is likewise possible to use annotations to structure a lecture movie in NHQT.

Networked Streaming Video

Networked streaming video is more and more popular over the Internet and has been explored a great deal. Many players for streaming video can be freely downloaded from companies such as InterVU, Xing Technology, VDOnet, and Vivo Software [11]. Some of these companies provide video servers running their own network protocols; others simply ask a Web server to deliver video using the Hypertext Transfer Protocol (HTTP). Vosaic [12], derived from NCSA Mosaic, incorporates real-time video and audio into standard Web pages. England et al. from Bellcore developed the RAVE system to provide real-time services for the Web [13]. RAVE servers reside on both the client and the server machines, and the real-time multimedia traffic is exchanged between the RAVE servers. This architecture can be used for one-way video delivery or for video conferencing in both directions. NHQT provides streaming of synchronized media objects over the network, not just video or audio. This feature embeds extra information in networked streaming video and is capable of providing a more interesting viewing experience.

Distance Education System

Many schools and research groups have been using the World Wide Web to construct distance education environments on the Internet based on Web browsers [14] [15] [16]. These environments are basically confined to the traditional presentation of static documents. The users may get lost after traversing many links if the materials are not structured carefully. NHQT provides video-based materials and a lecture-like presentation. It also transforms a linear medium, such as a video of a lecture, into hypermedia by linking the lecture to other media. North Dakota State University and Cornell University developed projects for creating virtual classrooms by integrating software on the Internet, such as news groups, audio software, and video conferencing [17] [18]. Their emphasis was on lectures in real time, whereas NHQT provides an EOD service, which means that learners can request stored lectures according to their own schedules. Schnepf et al. presented a medical education environment over ATM networks [19] which was used for a special educational purpose. NHQT is able to support various educational purposes, such as e-mail training, medical, language, or music education, in light of its user interaction functionalities and media-rich environment. Stanford University is currently carrying out an on-demand education system called the Asynchronous Distance Education ProjecT (ADEPT) [20]. The goal of ADEPT is to provide on-demand access to selected Stanford engineering classes to remote professionals and to students on campus. The video clips can either be downloaded through the Internet or transmitted in real time through a high-speed ATM network. Lecture notes and handouts are separately converted into either PostScript or PDF formats. NHQT takes advantage of the video-based hypermedia concept such that the video and the lecture materials are linked and synchronized together.

3 Video-based Hypermedia in Hyper QuickTime

HQT is developed based on the concept of video-based hypermedia to present hypermedia video, which enhances a stored lecture so that the viewing experience is not completely passive; users can pause the video and follow links embedded in the video for more information about topics that interest them. This section first introduces QuickTime, the technology base of HQT. Then, we discuss the concept of a hypermedia video in HQT and the QuickTime movie format tailored for HQT. Graphical user interfaces are illustrated to demonstrate the user's and the author's control over the movie. Some implementation issues are also discussed.

3.1 QuickTime Technology

QuickTime is a technology developed by Apple Computer for time-based media on Macintosh and PC-based platforms [21]. The QuickTime system extension provides a set of functions and data structures for applications to edit, save, display, and control QuickTime movies. For HQT, the most important feature of QuickTime is that a movie may contain several logical tracks which are synchronized. Each track refers to a medium such as text, image, audio, or video, so there is room in the QuickTime data architecture for application-defined data to be carried along with the audio/video stream. QuickTime supports a variety of video compression and decompression schemes, including MPEG-1, MPEG-2, and several software-based codecs. To support MPEG video, QuickTime requires a hardware codec in the user's machine and higher network bandwidth to meet stringent playback requirements. HQT uses the default software codecs in QuickTime for video playback. We chose QuickTime as our basic video architecture because: (1) the standard QuickTime software codecs are freely available on Macintosh and 80x86-based PC platforms; (2) its multiple-track data architecture can support many media types, including text, image, audio, MIDI (Musical Instrument Digital Interface), animation, and video; (3) internal synchronization mechanisms are provided between different media during playback; and (4) the software architecture is extensible, which is critical in the development phase and in preserving user investment.
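As a rough illustration of this multi-track architecture, the following sketch opens a movie and lists the media type of each of its tracks; an HQT lecture movie would show video, sound, and two text tracks here. It is based on our reading of the classic QuickTime C API described in Inside Macintosh: QuickTime [21]; exact signatures vary between QuickTime versions, and error handling is abbreviated.

```c
#include <Movies.h>
#include <stdio.h>

/* List the media type ('vide', 'soun', 'text', ...) of every track in a movie.
   A minimal sketch only; real code must check every error return value. */
static void ListTracks(const FSSpec *spec)
{
    short resRef = 0;
    Movie movie  = NULL;
    long  i, n;

    if (EnterMovies() != noErr)
        return;
    if (OpenMovieFile(spec, &resRef, fsRdPerm) != noErr)
        return;
    NewMovieFromFile(&movie, resRef, NULL, NULL, newMovieActive, NULL);
    CloseMovieFile(resRef);
    if (movie == NULL)
        return;

    n = GetMovieTrackCount(movie);
    for (i = 1; i <= n; i++) {               /* QuickTime track indexes are 1-based */
        Track  track = GetMovieIndTrack(movie, i);
        Media  media = GetTrackMedia(track);
        OSType mediaType;

        GetMediaHandlerDescription(media, &mediaType, NULL, NULL);
        printf("track %ld: media type '%.4s'\n", i, (char *)&mediaType);
    }
    DisposeMovie(movie);
}
```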

3.2 Hypermedia Video

A hypermedia video consists of at least four tracks: video, audio, and two text tracks. One text track, called the TEXT Track, can be used to store indexes, the text of the narration, or annotations. The other text track, called the URL Track, stores URLs. These URLs point to other related text, image, or movie documents on the Internet to provide users with more information about the corresponding sections of the video. The left-hand side of Figure 1 illustrates their synchronization based on the timeline, or the frame sequence, of the video. Each block in the two text tracks represents the duration of its narrative. For example, during the period between time (or frame number) X and Y, the content of the URL Track is URL2. The narration text changes as the video progresses. Based on this model, there are two basic authoring functions to be addressed: insertion of text content and deletion of a section of the video. Before adding a text block into the hypermedia video, the user must indicate the starting and ending points (e.g. X and Y). Then the content (URL2) can be inserted into the assigned position of the specific text or URL track.

[Figure 1: Synchronization of different tracks before and after the cutting operation. The video, audio, TEXT, and URL tracks are laid out along the time or video frame sequence; the shaded section between points P and Q is removed by the cut.]

The other scenario, cutting a period of the video, is also demonstrated in Figure 1. The shaded area between P and Q is the target of the deletion. The result on the right-hand side of the figure shows that the four tracks remain synchronized after the cutting operation. Annotations can be embedded in the TEXT Track so that the author can provide a short comment or note associated with a period of video. A hypermedia video should also be searchable based on the keywords in the annotations of the TEXT Track, so that the user can quickly jump to a given section of the video. Potentially, the TEXT Track has the following extended uses. If the author wants to organize the video, this track can store segmentation information for the video; for example, section or subsection numbers can be included in the track to structure the linear presentation in a hierarchical way. Another interesting idea is to embed polygon information, such as point coordinates, to achieve inline hyperlinks on the video display. Video-based hypermedia can be applied to present a lecture. Different instructors have their own teaching styles and preferences. However, consider the common situation where an in-class presentation mainly consists of the instructor's speech and a slide show. As the lecture progresses, the instructor changes the slide and gives more explanation of its content. This implies that each slide change is made at some specific point of the lecture. If the hypermedia video consists of the instructor's video/audio and slides which are pointed to and synchronized by the URL Track, the video-based hypermedia model is very well suited to this scenario. Hence, users can have an in-class-like experience when they view the hypermedia lecture video. Under this simple four-track model, hypermedia video becomes very powerful when the tracks are applied to different functionalities. However, it is not necessarily limited to four tracks. More TEXT Tracks or URL Tracks can be inserted to store different information or provide multiple hyperlinks. Audio tracks in different languages, or music tracks, can also be added. Too many tracks may cause performance problems, which should be considered during implementation.
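To make the two authoring operations concrete, the following toy model represents a TEXT or URL track as a list of blocks, each covering a half-open interval [start, end) on the movie timeline. The structs and function names are our own illustration, not the actual HQT data structures (the real tracks are QuickTime text media); this is a sketch only, with no bounds checking.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    long start;          /* starting time (or frame number), e.g. X */
    long end;            /* ending time (or frame number), e.g. Y   */
    char content[256];   /* narration text or a URL                 */
} Block;

typedef struct {
    Block blocks[64];
    int   count;
} TextTrack;

/* Authoring operation 1: add a block (e.g. URL2 covering [X, Y)). */
static void add_block(TextTrack *t, long start, long end, const char *content)
{
    Block *b = &t->blocks[t->count++];
    b->start = start;
    b->end   = end;
    strncpy(b->content, content, sizeof(b->content) - 1);
    b->content[sizeof(b->content) - 1] = '\0';
}

/* Authoring operation 2: cut the interval [p, q) out of the timeline and close
   the gap, so the track stays aligned with video and audio cut the same way. */
static void cut_interval(TextTrack *t, long p, long q)
{
    long removed = q - p;
    int  i, kept = 0;

    for (i = 0; i < t->count; i++) {
        Block b = t->blocks[i];
        if (b.end <= p) {
            /* entirely before the cut: unchanged */
        } else if (b.start >= q) {
            /* entirely after the cut: shift left by the removed duration */
            b.start -= removed;
            b.end   -= removed;
        } else {
            /* overlaps the cut: keep only the surviving portion */
            if (b.start > p)
                b.start = p;
            b.end = (b.end > q) ? (b.end - removed) : p;
            if (b.end <= b.start)
                continue;            /* block fell entirely inside the cut */
        }
        t->blocks[kept++] = b;
    }
    t->count = kept;
}

int main(void)
{
    int i;
    TextTrack urls;
    urls.count = 0;

    add_block(&urls, 0,  40, "URL1");
    add_block(&urls, 40, 75, "URL2");   /* the block covering [X, Y) in Figure 1 */
    add_block(&urls, 75, 90, "URL3");
    cut_interval(&urls, 50, 80);        /* remove the shaded region [P, Q) */

    for (i = 0; i < urls.count; i++)
        printf("[%ld, %ld) %s\n", urls.blocks[i].start, urls.blocks[i].end,
               urls.blocks[i].content);
    return 0;
}
```

Applying the same cut to every track with the same P and Q is what keeps the four tracks aligned, which is exactly the property Figure 1 illustrates.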

3.3 User Interfaces and Operations

HQT provides two user interfaces: an authoring tool used to add narrative text and URL tracks, and a browser-only version.

[Figure 2: User Interface]

The user interface with authoring capability is illustrated in Figure 2. There are five control buttons, a video window with a control bar, and two text areas at the bottom. A square indicator on the control bar indicates the current position in the lecture. The bar provides random positioning by moving the square indicator, as well as VCR-style functions such as play, pause, fast-forward, and fast-rewind. The two text areas display the corresponding text and URL when the video is played or repositioned. The video and audio could be encoded by hardware from a video tape, or could be an animation created using multimedia authoring tools. The text and URL tracks can be added on the fly by HQT in order to form a hypermedia lecture movie. The next step is to mark up the movie with hypermedia links and the text narration track. Adding annotations or URLs is easy: the instructor marks a period of the movie (e.g. the shaded area on the control bar in Figure 2) by dragging the indicator and then updates the text or URL track for that period. This functionality helps busy and inexperienced instructors manipulate video; they are able to augment and integrate the video with their own comments and Internet documents quickly. Note that once the narrative text and URL tracks have been added to the video, they remain synchronized with the video and audio tracks even if all or part of the movie is edited using other QuickTime-capable video editors. This means that it is possible to use many commercial editing tools in addition to the HQT authoring tool. In Figure 3, the interface of the browser-only version hides the authoring functions to prevent users from changing the track contents.

[Figure 3: Search a Keyword in the Browser Version]

The text box showing URLs is invisible, so that users can concentrate on the contents without knowing the address details of the references. The user can use the Find... button to search for keywords in the TEXT Track; the lecture is then moved to the point at which the contents of the TEXT Track contain the keywords. The dialog box of the search function is also illustrated in Figure 3. There are two modes for invoking a URL, an explicit mode and an implicit mode. In the explicit mode, clicking the More Detail button invokes the appropriate Internet browser application (typically TurboGopher, MacWeb, or Netscape) and passes it the current URL to resolve. The Internet browser then retrieves the item referred to by the URL and displays its content. If the Gopher or WWW browser is configured to use HQT to display QuickTime video, a URL pointing to another video segment effectively branches the video display to that segment. In the implicit mode, each hyperlink is fetched and displayed automatically by the WWW/Gopher browser instead of the user clicking on the More Detail button. Thus, the content of the browser is synchronized and updated automatically as the video progresses. It is similar to the instructor changing slides during an in-class lecture. If the instructor has already created slides on Web pages and their URLs are embedded in the video, the users can have a fairly complete presentation experience by viewing the video and the slide show at the same time.

3.4 Implementation Issues

We used CodeWarrior 5.0 (upgraded to 6.0 later) as the compiler under MacOS 7.5.1 when we developed HQT in the C language. HQT is built on top of QuickTime technology, which already provides many functionalities and APIs (Application Programming Interfaces) to manipulate the media and tracks in a QuickTime movie. The user interface for video and audio, including the control bar and the video display window, is supported by the movie controller component in QuickTime. The synchronization between different tracks is maintained internally by QuickTime. The two text tracks have different internal user-defined attributes so that HQT can distinguish them even though they have the same media type, i.e., text. Each text track is associated with a callback function. For example, HQT assigns a callback function to the URL Track when it opens a QuickTime lecture movie; thereafter QuickTime informs HQT through this function when necessary. Whenever the movie reaches a point at which the text changes, the function is called and the new text content (the URL) is returned to HQT. Thus, HQT can keep this URL and update the content of the bottom text area for the URL Track. If the user clicks the More Detail button, HQT fetches the current URL. HQT uses the same method to deal with the TEXT Track. Rather than building full-featured WWW/Gopher browsers into the HQT viewer, we take advantage of existing browsers to resolve URLs by sending external browsers a "get this URL" message when appropriate. This message is an Apple event on the Macintosh platform. For different protocols, such as HTTP, Gopher, FTP, and NNTP, HQT invokes the appropriate helper application to display the information at the URL. A preference file allows the user to change these helper applications.
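A rough sketch of the "get this URL" message is shown below. It assumes the de facto 'GURL' Apple event class and ID that Macintosh Internet helper applications of that era understood; the browser signature ('MOSS', Netscape) and the example URL are illustrative only, and the real HQT code may differ in its event handling.

```c
#include <AppleEvents.h>
#include <string.h>

/* Send a "get this URL" ('GURL') Apple event to a helper application, identified
   by its creator signature, asking it to resolve and display the given URL.
   Error handling is abbreviated in this sketch. */
static OSErr SendGetURL(OSType helperSignature, const char *url)
{
    AEAddressDesc target;
    AppleEvent    event = { typeNull, NULL };
    AppleEvent    reply = { typeNull, NULL };
    OSErr         err;

    err = AECreateDesc(typeApplSignature, &helperSignature,
                       sizeof(helperSignature), &target);
    if (err != noErr)
        return err;

    err = AECreateAppleEvent('GURL', 'GURL', &target,
                             kAutoGenerateReturnID, kAnyTransactionID, &event);
    if (err == noErr)
        err = AEPutParamPtr(&event, keyDirectObject, typeChar,
                            url, (Size)strlen(url));
    if (err == noErr)
        err = AESend(&event, &reply, kAENoReply,
                     kAENormalPriority, kAEDefaultTimeout, NULL, NULL);

    AEDisposeDesc(&target);
    AEDisposeDesc(&event);
    return err;
}

/* Example (hypothetical): when the viewer clicks More Detail,
   SendGetURL('MOSS', "http://www.example.edu/lecture/slide12.html");  */
```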

4 Synchronized Media Delivery in Networked Hyper QuickTime

HQT works well when the movie file is local to the viewer. However, it suffers from long fetch times if a large movie has to be downloaded from a remote site. To build a true EOD system, it is desirable to equip HQT with the capability to deliver lecture movies over the Internet. By extending QuickTime, NHQT is the enhanced version of HQT that deals with synchronized media streaming through the network. This section covers the rationale for the environment upon which NHQT is built. We then describe the software architecture of the system and the client-server mechanism that provides network delivery.

4.1 Environment Overview

Figure 4 illustrates the three parts of the conceptual NHQT system architecture: end users, network infrastructure, and servers. The end users send requests to the servers in order to retrieve information or resources through the network cloud. The servers respond to the requests and transmit documents or streaming media. The network infrastructure of NHQT is basically the Internet, which we expect to continue its evolution by embracing emerging high-speed networks. The infrastructure of the Internet will gradually migrate to and incorporate high-speed networks in order to handle a large number of users and applications that require high aggregate bandwidth.

[Figure 4: Environment of Education-on-Demand System. End users send requests through the Internet or ATM clouds to Gopher, Web, and lecture servers, which return documents, lecture streams, or other materials.]

Under an NSF research grant, we are developing an extensive campus-wide ATM network including high-end workstations and PCs. A small-scale ATM testbed within the campus-wide ATM network has a FORE Systems ASX-200BXE ATM switch connecting a heterogeneous workstation environment with one Power Macintosh 8100/100AV, one Power Macintosh 7100/66, one Sun SparcStation 10, and one high-end SGI Challenge machine. To explore how NHQT could work over future high-bandwidth networks, we built this ATM testbed using the Macintoshes as user workstations to study sender-receiver interaction within a high-speed network infrastructure.

4.2 Software Architecture

[Figure 5: Software Modules of QuickTime for Movie Display. The application calls the Movie Toolbox; the video media handler drives the Image Compression Manager, an image compression component, and QuickDraw for video output, the sound media handler drives the Sound Manager for sound output, and both media handlers obtain movie data through a data handler.]

Internally, QuickTime consists of software components (modules), each of which provides a defined set of services. These components are responsible for tasks such as the user interface, video decompression, manipulating media, and so on. Figure 5 illustrates the architecture of the components involved when QuickTime displays a movie with a video and an audio track. The Movie Toolbox receives function calls from the application and calls the related components at the next level to continue processing those function calls.

Media handlers send data requests to the data handler, which does the actual reading and writing of media data. This modular internal architecture makes it possible to replace the standard data handler with network-aware data handlers. In NHQT, we developed a Network Client Module, which contains a novel data handler and is able to deal with data in memory, on the local disk, or over the network. As Figure 5 shows, the data handler isolates the actual data retrieval from the application and the upper-layer modules, such as the media handlers; thus the application and the media handlers are not aware of where the data comes from. The media handlers perform the same actions no matter which data handler services them: they request data from the data handler by sending two parameters, the position in the movie file and the requested data length, according to the current playback position or the user's control. If the file is at a remote site, the Network Client Module transmits these parameters to the video server instead of reading from the local disk. The video server reads its local movie file and sends the requested data segment back.
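The isolation the data handler provides can be pictured with the following sketch. The names and structures (DataSource, ncm_read, net_request) are hypothetical stand-ins; the real Network Client Module is a QuickTime data handler component with a richer interface than shown here.

```c
#include <stdio.h>

/* Where the flattened movie file actually lives. From the media handlers'
   point of view it makes no difference: they always ask for (position, length). */
typedef enum { SOURCE_LOCAL_DISK, SOURCE_VIDEO_SERVER } SourceKind;

typedef struct {
    SourceKind kind;
    FILE      *local;        /* open movie file when kind == SOURCE_LOCAL_DISK    */
    int        server_sock;  /* connected socket when kind == SOURCE_VIDEO_SERVER */
} DataSource;

/* Hypothetical network primitive: send a (position, length) request to the video
   server and read the returned bytes into buf. Left as a stub here; the wire
   exchange is sketched in Section 4.3. */
static long net_request(int sock, long position, long length, void *buf)
{
    (void)sock; (void)position; (void)length; (void)buf;
    return -1;
}

/* The single entry point the media handlers see: read `length` bytes starting at
   `position` in the movie file, wherever that file happens to be. */
static long ncm_read(DataSource *src, long position, long length, void *buf)
{
    if (src->kind == SOURCE_LOCAL_DISK) {
        if (fseek(src->local, position, SEEK_SET) != 0)
            return -1;
        return (long)fread(buf, 1, (size_t)length, src->local);
    }
    /* Remote case: the request travels to the video server, which reads its own
       copy of the movie file and streams the segment back. */
    return net_request(src->server_sock, position, length, buf);
}
```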

The architecture of the software modules and the control paths of NHQT are depicted in Figure 6. The User Interface and Interactions module sends control signals to the different software modules according to the user's interaction. For example, when the user clicks on the Add URL Track button, the Add Text Tracks module calls QuickTime functions to add a URL track to the current movie.

[Figure 6: The Architecture of the NHQT Software Modules. On the client, the User Interface and Interactions module drives the Add Text Tracks, Update Text Tracks, Search Engine, VCR-like Functions, and URL modules on top of the QuickTime API and the Network Client Module; "normal" URLs are handed to a WWW or Gopher browser, while "VOD" URLs are resolved through the Network Client Module over the Internet or an ATM network to a Gopher, WWW, or video server, or read from movie data on the local disk.]

Whenever a user wants to get more information, two scenarios may occur. In the first scenario, the URL is a "normal" URL. The URL Module invokes a WWW or Gopher engine at the client site to retrieve the desired information to the local machine from a WWW, Gopher, or FTP server using the HTTP, Gopher, or FTP protocol. This approach works well when the information does not occupy too much storage space; the WWW or Gopher engine at the client site caches the information to the local disk, and the transfer typically completes within a fairly short period of time. In the second scenario, the URL is a "VOD" URL. This is suitable for a large movie (perhaps of a long lecture or a piece of music), to avoid consuming too much local disk space and an excessively long delay to download the entire movie before playback.

To handle references to a remote movie, the URL Module sends the "VOD" URL to the Network Client Module, which establishes a connection with a predetermined video server. The server then sends a synchronized, mixed media stream according to the requests from the client site.
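The two scenarios amount to a simple dispatch on the URL, as in the sketch below. The "vod://" scheme string and the function names are assumptions made for illustration; the paper does not specify the exact URL syntax used to mark a movie for streaming.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins for the two delivery paths described above; both names are
   hypothetical and simply report what the real URL Module would do. */
static void hand_to_web_or_gopher_browser(const char *url)
{
    printf("scenario 1 (normal URL): ask the WWW/Gopher helper to fetch %s\n", url);
}

static void open_stream_via_network_client_module(const char *url)
{
    printf("scenario 2 (VOD URL): stream %s from the video server\n", url);
}

/* Resolve a URL embedded in the lecture movie. */
static void resolve_url(const char *url)
{
    /* Assumed convention: VOD URLs carry their own scheme so the client can
       recognize a streamed movie without downloading the whole file first. */
    if (strncmp(url, "vod://", 6) == 0)
        open_stream_via_network_client_module(url);
    else
        hand_to_web_or_gopher_browser(url);  /* http:, gopher:, ftp:, news:, ... */
}

int main(void)
{
    resolve_url("http://www.example.edu/lecture/slide03.html");
    resolve_url("vod://videoserver.example.edu/lectures/week3.mov");
    return 0;
}
```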

4.3 Mechanism of Client and Server

The network subsystem in NHQT enhances the original HQT capability in order to handle continuous media streams. As shown in Figure 6, the network subsystem consists of a Network Client Module at the client site and a video server at a remote site. The module handles data retrieval for different media, such as video, audio, or music, from the local disk or from the video server. The client-server architecture of the network subsystem separates data access from data retrieval. If the data is stored on a local disk, the module serves this data to the application through standard memory or file system primitives. If the data is stored at a remote site, the module acts as a client to fetch data from the remote video server, which retrieves the data from its server disk or may recursively relay requests (or data) to (or from) other servers. This network subsystem design not only supports NHQT, but also provides a generic network service for applications built on top of QuickTime. Each hypermedia movie with multiple tracks can be flattened into one file, with the different types of media, such as video, audio, and text, stored in an interleaved way. Thus, the synchronized media can be delivered over a single network connection according to the pattern in which the media is saved. This feature reduces complexity in our network protocol and server design, since there are only one network connection and one file in operation during synchronized media delivery. NHQT works with the prefetching and caching mechanism in QuickTime: each request from the client actually asks for an extra amount of data for prefetching, beyond the current need. Prefetching on the client side can reduce jitter in video or audio and make playback smoother. The fetched data is also cached on the client machine; if the user goes back to some point of the movie which is already cached, the player can play the cached data instead of sending a request through the network. Stateless protocols, such as Gopher and HTTP, accomplish a single transaction per connection. These protocols are not well suited to supporting user-controllable video streams, since the client and the server interact many times while a user is viewing a lecture movie, and such protocols create considerable overhead to establish and tear down a connection for each request. We therefore designed a simple client-driven protocol in which the server responds only after receiving the client's requests. The state machines of the protocol used by the Network Client Module and the video server are shown in Figure 7 and Figure 8, respectively. On the client side, the module first tries to connect with the server when a user wants to view a remote video file. Then the states of connection holding, data request, and receiving data form a cycle that keeps retrieving the data. Whenever a request from the media handler comes in, the client sends a data request with two parameters: the amount of data (in bytes) and the starting request position in the target file. The client module puts the received data into memory for display. The connection may be closed due to server or client failure, or when the user exits the file. The server goes through similar state transitions while sending data. After the connection is established, the server accesses the disk file according to the two parameters in the request packet. Once the requested data has been transmitted, the server returns to the connection holding state and waits for the next request.

[Figure 7: The Protocol State Machine of the Network Client Module. States: Initialization (no connection), Connection Setup, Connection Holding, Data Request, and Receiving Data. Transitions: (1) remote file access request from the user, (2) response from the server accepting the connection, (3) data request from the media handler, (4) response from the server for sending data, (5) all data received, (6) closing the connection.]

The physical network can be any supported network medium, such as Ethernet or ATM. The network protocol can be any supported protocol that provides traditional datagram services, such as TCP/IP, or other low-latency protocols that support efficient transport of continuous media, such as ATM AAL5. The data transmitted through the network can be a stream of commands or a stream of media data.
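The server side of this client-driven protocol can be sketched as below. The paper does not give the wire format, so the fixed request packet (a 32-bit starting position and a 32-bit byte count in network byte order), the BSD socket calls, and the single-client loop are assumptions for illustration; the actual servers ran on SunOS, AIX, and IRIX, and the client used Open Transport.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request packet: starting position in the flattened movie file and
   the number of bytes wanted (which already includes the client's prefetch margin). */
struct vod_request {
    uint32_t offset;
    uint32_t length;
};

static void serve_client(int sock, int moviefd)
{
    struct vod_request req;
    char buf[64 * 1024];

    /* Connection holding state: block until the next request arrives. */
    while (read(sock, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
        off_t  offset = (off_t)ntohl(req.offset);
        size_t want   = ntohl(req.length);

        /* Reading-disk state: locate the requested segment of the movie file. */
        if (lseek(moviefd, offset, SEEK_SET) < 0)
            break;
        while (want > 0) {
            size_t  chunk = want < sizeof(buf) ? want : sizeof(buf);
            ssize_t got   = read(moviefd, buf, chunk);
            if (got <= 0)
                break;                           /* end of file or read error */
            /* Sending-data state: stream the segment back to the client. */
            if (write(sock, buf, (size_t)got) != got) {
                close(sock);
                return;
            }
            want -= (size_t)got;
        }
        /* Back to connection holding, waiting for the next client request. */
    }
    close(sock);
}

int main(int argc, char **argv)
{
    int lsock, csock, moviefd;
    struct sockaddr_in addr;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <movie-file> <port>\n", argv[0]);
        return 1;
    }
    moviefd = open(argv[1], O_RDONLY);
    lsock   = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons((uint16_t)atoi(argv[2]));
    bind(lsock, (struct sockaddr *)&addr, sizeof(addr));
    listen(lsock, 1);

    /* Waiting-for-connection state: this sketch serves one client at a time. */
    while ((csock = accept(lsock, NULL, NULL)) >= 0)
        serve_client(csock, moviefd);
    return 0;
}
```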

4.4 Implementation Issues

We would like to describe some details and results of our initial implementation of the Network Client Module. The development environment for the module is the same as the one for HQT. Open Transport 1.1 was used for the development of the network functions on the Macintosh. The network testbeds are Ethernet and an ATM network with OC-3 fibre links; the network protocol is TCP/IP. This module is a new data handler, which acts as a mediator between the media handlers and the video server. A data handler is a component in the Macintosh system, i.e., a piece of code that provides a defined set of services to one or more clients. An application accesses these services through the interfaces of the Component Manager: it registers the component and calls its functions at runtime. We initially had a hard time dealing with QuickTime movie delivery through the network, since the data handler is hidden behind the QuickTime APIs. After finding that the data handler is the key module in the QuickTime hierarchy for reading physical data, we first developed a data handler that reads data from the local disk and then replaced the read function with network primitives. After the application picks this module for playing a movie, the media handlers access data through the new data handler. The functions of VCR-style control, random positioning, search, and fetching the URLs work as if the data came from a local disk.

[Figure 8: The Protocol State Machine of the Video Server. States: Waiting for Connection, Connection Setup, Connection Holding, Reading Disk, and Sending Data. Transitions: (1) connection setup request from the client, (2) establishing the connection, (3) data request from the client, (4) finishing reading data from the disk, (5) all data sent out, (6) closing the connection.]

The different types of media are synchronized well on the presentation timeline. QuickTime has its own internal caching and prefetching mechanism to make playback smooth, so the Network Client Module does not need to implement extra buffering schemes. The bandwidth requirement of our test QuickTime movies ranges from about 20 KBytes/sec to several hundred KBytes/sec. A heavily loaded Ethernet may cause a few jitters, but ATM handles this low-bandwidth transmission without any trouble. Other kinds of media, such as MIDI and CD-quality music, are also handled well. A simple video server has been implemented on UNIX variants (SunOS 4.1.3 on a Sun SparcStation 10, AIX 4.1.4 on an IBM RS/6000, and IRIX 5.3 on an SGI Challenge) to respond to data retrieval commands from the client. Our initial tests involve only one server and one client; we are in the process of implementing a combined client-server entity for the server in a distributed, hierarchical setting.

5 Applications

NHQT provides a media-rich presentation environment including video, audio, animation, image, and text display. Users have full control of the playback of the video and can repeat sections, search the narrative track for sections containing keywords, and follow hypermedia annotations embedded in the video by the author. Because NHQT is built on top of QuickTime, presentations can consist of video, audio, and MIDI streams, and there is already a variety of authoring and editing tools which support QuickTime. Given these features, NHQT can be applied to a spectrum of different fields.

Email Training and Internet Introduction. The Distributed Computing Services (DCS) of the University of Minnesota provides hundreds of classes to train faculty, staff, and students in how to use the campus computer software and facilities. This training consumes a significant amount of resources, and course scheduling is difficult. As a pilot project, DCS has developed a video training presentation to introduce the basics of the Internet, e.g., telnet, FTP, email, Gopher, and WWW.

The hypermedia capabilities of NHQT are used to point to Internet WWW and Gopher servers containing items of interest such as computer sales information, computer lab hours, documentation, and free software.

Language Education. This tool has obvious applications for language education. Students can easily repeat any part of the lecture, which is useful for reviewing sections they find difficult. At the same time, the TEXT Track can act as a subtitle or caption so that students can see as well as hear what the instructor is saying. Since a QuickTime movie can embed multiple audio tracks associated with different languages, a lecture can be translated into several languages and saved into several audio tracks. Most importantly, the URL Track can be used to point either to further drills for the concepts or to a quiz that tests the student's knowledge of the concepts covered in a given section of the video.

Medical Education. Current medical technology produces a lot of diagnostic images and video from x-rays, magnetic resonance imaging (MRI), nuclear medicine (NM), etc. This data is an important reference and diagnostic tool for physicians. A medical video could present different views of a 3-D human brain or the movements of a patient's heart. Students in medical school need practice to distinguish between normal and abnormal situations from the video playback or the images. The video or image sequence can be encoded in QuickTime movie format for NHQT and used for educational purposes. Experienced physicians can embed annotations in the TEXT Track to describe what kind of problem the current playback indicates, e.g., "there is a tumor here" or "the pacemaker does not work normally", and provide links to more in-depth descriptions and related information for students to explore. Figure 9 is a screen capture of an NMR (Nuclear Magnetic Resonance) QuickTime movie.

Art Education. A sequence of paintings or pictures can be encoded in QuickTime movie format such that a lecture is like a slide show. The TEXT Track can introduce the artist, title, and background of each painting. The instructor can use the audio track for further explanation, such as the deeper meaning of a work. The URL Track can point to a full-screen image if the student wants to look at a high-resolution image of a painting and study it in detail.

Music Education. Reference URLs pointing to music clips can be added to the URL Track by a music instructor. When the instructor is teaching the history of Baroque music, a student can click on the More Detail button to retrieve a music clip of Bach's Brandenburg concertos or a Haydn string quartet as an example. The NHQT server sends a CD-quality music stream to the client so that the student can listen to high-quality music during the lecture. Another possible application would be an analysis of the first movement of Beethoven's Symphony No. 5. The instructor can encode this piece of music in QuickTime music format and then use the authoring functions to add annotations to the TEXT Track. The annotations can explain the current music playback, for example, the main theme or a variation of the main theme.

Applications beyond Education. While we have concentrated on educational applications of this technology, we can also easily envision commercial applications. There is every reason to expect that if video-based hypermedia becomes popular, some of the links will point to advertisements or commercial services.
It is also reasonable to expect that popular hypermedia videos will sell space in their annotations to the highest bidder, just as popular WWW sites today charge others for incorporating links into their pages. This sort of ad-supported media may well help make the technology and content more widely available (as was the case with broadcast television). It will also probably make it easier to find out what sort of shampoo, laundry detergent, and cat food is recommended by the person you are watching in a hypermedia video, and this may well be the price we pay for free access to video-based hypermedia.

[Figure 9: A Human Head NMR QuickTime movie]

6 Conclusion and Future Work

NHQT provides a video-based hypermedia presentation environment which complements the traditional menu- and document-based media found in Gopher and the WWW. To provide flexible navigation and viewing controls for video presentations, users have content-based search, VCR-style playback control, and automatic resolution of embedded URL references to Internet information services. The easy-to-use authoring functionalities allow authors to integrate annotations and embed URLs into a video; this helps busy instructors manipulate lecture video in a short time. The network subsystem in NHQT can deliver a hypermedia video stream from the video server to the client. Each stream needs only one network connection to transmit synchronized media using the simple protocol we designed. The NHQT design illustrates some desirable features of future distance education applications on the Internet. This paradigm shows how the delivery of distance education can extend current Internet services and how education-on-demand can help to shape a barrier-free learning environment. The NHQT project is in the process of further enhancement. First, the current NHQT uses video-driven synchronization, in which the Internet documents are synchronized by the video.

On the other hand, a slide-driven synchronization approach is under investigation, in which users mainly traverse the Web pages (slides) and reference the corresponding video segments whenever they like. Second, NHQT only allows the instructor to add or change the annotations. In a cooperative environment, it should allow learners to put their private, group, or public annotations into the lecture. Authentication is another important issue to address, since it enables charging for access to materials. Furthermore, incorporating learners' text, audio, or video feedback can make NHQT more interactive for learners and instructors; the feedback may be questions for the instructor, or answers to quizzes or exams. Lastly, assuming that video-based hypermedia becomes popular, a catalog or directory system should be provided to display the available on-line lectures.

ACKNOWLEDGEMENT

The authors wish to express their sincere gratitude to Alagu Periyannan, David Singer, and Mengjou Lin (now with Integrated Micro Solutions) of the Communications Products and Technologies Group, Apple Computer, for providing ATM middleware and loaning an ATM interface card for the Macintosh platform; to Thilaka S. Sumanaweera and Yi-Fen Yen of the Lucas MRS Imaging Center at Stanford University, who created and provided access to the human head NMR movie clip in QuickTime; to Thomas Erickson of Apple Computer for user interface advice; and to Jenwei Hsieh and Srihari Nelakuditi of the Distributed Multimedia Research Center at the University of Minnesota for their suggestions on this work. Special thanks also go to the anonymous reviewers for their valuable comments.

References

[1] J. Schnepf, V. Mashayekhi, J. Riedl, and D. Du. "Closing the Gap in Distance Learning: Computer-supported, Participate, Media-Rich Education". Educational Technology Review, No. 3, Autumn/Winter, 1994, pp. 19-25.

[2] B. Alberti, F. Anklesaria, P. Lindner, M. McCahill, and D. Torrey. "The Internet Gopher Protocol: a distributed document search and retrieval protocol". University of Minnesota. Anonymous ftp from boombox.micro.umn.edu, ftp://boombox.micro.umn.edu/pub/gopher/gopher protocol/

[3] T. Berners-Lee, R. Cailliau, J. Groff, and B. Pollermann. "World Wide Web: the information universe". Electronic Networking: Research, Applications and Policy, 1992.

[4] M. McCahill and T. Erickson. "Design for a 3D Spatial User Interface for Internet Gopher". Ed-Media '95 conference proceedings, 1995.

[5] R. Vetter. "ATM Concepts, Architectures, and Protocols". Communications of the ACM, Feb. 1995.

[6] H. Maurer. "Hyper-G is now HyperWave: The Next Generation Web Solution". Addison-Wesley Publishing Company, 1996.

[7] L. Hardman, D.C.A. Bulterman, and G.V. Rossum. "The Amsterdam Hypermedia Model: Adding Time and Context to the Dexter Model". Communications of the ACM, Vol. 37, No. 2, Feb. 1994, pp. 50-62.

[8] M. Cheyney, P. Gloor, D.B. Johnson, F. Makedon, J. Matthews, and P.T. Metaxas. "Toward Multimedia Conference Proceedings". Communications of the ACM, Vol. 39, No. 1, Jan. 1996, pp. 50-59.

[9] Jörg Geißler. "Surfing the Movie Space: Advanced Navigation in Movie-Only Hypermedia". Proceedings of ACM Multimedia '95, Nov. 1995, pp. 391-400.

[10] R. Hjelsvold, S. Langørgen, R. Midtstraum, and O. Sandstå. "Integrated Video Archive Tools". Proceedings of ACM Multimedia '95, Nov. 1995, pp. 283-293.

[11] G. Venditto. "Instant Video". Internet World, Nov. 1996, pp. 85-101.

[12] Z. Chen, S. Tan, R.H. Campbell, and Y. Li. "Real Time Video and Audio in the World Wide Web". Proceedings of the Fourth International World Wide Web Conference, Dec. 1995.

[13] P. England, R. Allen, and R. Underwood. "RAVE: Real-Time Services for the Web". Proceedings of the Fifth International World Wide Web Conference, May 1996, http://www5conf.inria.fr/fich_html/papers/P57/Overview.html.

[14] B. Ibrahim and S.D. Franklin. "Advanced Educational Uses of the World-Wide Web". Computer Networks and ISDN Systems, 1995, pp. 871-877.

[15] J.K. Campbell, S. Hurley, S.B. Jones, and N.M. Stephens. "Constructing Educational Courseware Using NCSA Mosaic and The World-Wide Web". Computer Networks and ISDN Systems, 1995, pp. 887-896.

[16] E. Bilotta, M. Fiorito, D. Iovane, and P. Pantano. "An Educational Environment Using WWW". Third International World-Wide Web Conference, April 1995.

[17] P. Juell, D. Brekke, R. Vetter, and J. Wasson. "The Design of a Remote, Low-Cost, Multimedia Classroom". Proceedings of the 27th Small College Computing Symposium, Apr. 1994, pp. 29-30.

[18] D. Dwyer, K. Barbieri, and H.M. Doerr. "Creating a Virtual Classroom for Interactive Education on the Web". Proceedings of the Third International World-Wide Web Conference, Apr. 1995.

[19] J. Schnepf, D. Du, E.R. Ritenour, and A.J. Fahrmann. "Building Future Medical Education Environments Over ATM Networks". Communications of the ACM, Vol. 38, No. 2, Feb. 1995, pp. 54-69.

[20] C. Cordero, D. Harris, and J. Hsieh. "High Speed Network for Delivery of Education-on-Demand". Proceedings of IS&T/SPIE International Symposium on Electronic Imaging: Multimedia Computing and Networking 1996, Jan. 1996.

[21] Apple Computer, Inc. "Inside Macintosh: QuickTime". Addison-Wesley, 1993.

