A Video Learning Contents Processing Framework ... - Semantic Scholar

A Video Learning Contents Processing Framework for Portable E-learning Apparatus
Razib Iqbal and Shervin Shirmohammadi
Distributed and Collaborative Virtual Environments Research Laboratory (DISCOVER Lab), School of Information Technology and Engineering (SITE), University of Ottawa, 800 King Edward Ave., Ottawa, ON, Canada K1N 6N5
[riqbal | shervin]@discover.uottawa.ca

Abstract Website interface design and content representation decisions made by organizations, such as the type and level of personalization, have a direct impact on the communication capability of e-learning web sites. Either clients can select individual learning content or the sites can provide the information dynamically based on the device requirements. In this context, pervasive e-learning technology refers to the scaling of personalized representation of information based on device capability, which plays a significant role in choosing or purchasing learning objects from an e-learning website. While researchers are aware of both the benefits and the limitations of personalization and customization, it is also essential to focus on the technical aspects of these systems. Prominent research focuses on how to design a good e-learning website, whereas the literature lacks a suitable architecture in which multimedia-based learning contents are processed to support heterogeneous devices accessing the website. This paper takes an initial step toward developing such an architecture for video learning contents processing. A new tier for multimedia content processing is proposed as an addition to the traditional product presentation, transaction and delivery framework.

1. Introduction In the world of distance education and e-learning, personalization and customization of the contents is becoming the presentation model of choice. Content providers can use personalization and customization with the goal of creating a strong working relationship with their clients/end-users. While researchers are aware of both the benefits and the limitations of personalization and customization, it is also essential to focus on the technical aspects of the systems, in terms of computational complexity and deployment. Personalized representation of information (e.g. text, image, audio, video) based on device capability plays a significant role in helping an end user choose a learning object from a renowned e-learning website. At the same time, it is important to ensure that graphics and multimedia do not spoil the user’s online surfing and learning experience. For example, the long download time associated with video has often been regarded as undesirable for heavily used commercial websites.

Figure 1. Universal Multimedia Access

Universal Multimedia Access (UMA) refers to the universal access of multimedia content. Fig. 1 illustrates the concept of UMA. To access multimedia contents seamlessly, UMA demands a generic adaptation technique to adapt the original content according to the device requirements. Another phenomenon is the growth of communication networks. Heterogeneous networks provide a very convenient tool for content transmission, but their foremost limitation is the variation in available bandwidth. In the last decade, multimedia communications have experienced rapid growth, and commercial success has stimulated research interest in digital techniques for adapting and transmitting multimedia information. Seamless adaptation and transcoding techniques for digital content have received significant attention as a way to serve consumers with the desired content in a feasible way. At the same time, security and Digital Rights Management (DRM) of multimedia data have become an emerging concern. Therefore, encryption and authentication operations should be taken care of not only for serving sensitive digital contents, but also for offering security and integrity as an embedded feature of the adaptation practice. Device-based personalized representation of the video learning content or product preview will help to increase the quality of experience of individual users. In our previous work [1], we used the MPEG-21 framework to perform temporal adaptation and encryption of the H.264 bitstream in the compressed domain. In this paper, we utilize our compressed domain video processing approach to propose a new tier for video learning content processing in addition to the traditional product presentation, transaction and content delivery framework [2]. Overall, effort was made to maintain a generic and widely accepted standard for the adaptation, and optional encryption and/or authentication, of the video learning contents, so that the proposed solution can be integrated into existing media delivery configurations. The rest of the paper is organized as follows: in Section 2, a brief background on the video processing operations is provided along with the use-case scenarios of the framework. Section 3 consists of the literature review. The design of the proposed framework is depicted in Section 4. Section 5 describes the video processing scheme, where the technical details have been kept short to increase readability of the overall framework. Application of the proposed framework is discussed in Section 6. Finally, we draw the conclusion in Section 7.

2. Background Multimedia capability of an e-learning website refers to the non-verbal cues or features about the products and services that enhance the user’s feeling of preference for a website. Graphics, audio/video clips, and animation used to demonstrate learning objects are examples of these features. These features can fulfill individual information needs, engender trust and facilitate a better learning experience. Recent momentum in online video streaming has demonstrated the effectiveness and marketing potential of downloadable video contents. Technological improvement in video storage allows an immense amount of video data to be stored and delivered to end users. Google video (video.google.com) and YouTube (www.youtube.com) are two examples of such online video providers. However, small handheld receiving devices (e.g. PDA, mobile phone) require video adaptation due to limited screen size, processing power and bandwidth. Before we detail the necessary background and the framework, three use-case scenarios are given below to envision the application of the proposed framework (see Fig. 2).

Figure 2. Use Case Scenario (diagram labels: Content Provider, Proxy Cache, Heterogeneous Network, Mobile User)

Use case 1: Assume that a mobile user requests a video learning object that is available from a learning object repository. In addition, the mobile user has a GPRS connection with a maximum bandwidth of only 28.8 kbps and accepts videos with a frame rate of 5 frames/second at QCIF (176×144) resolution and 10 frames/second at SQCIF (128×96) resolution. If the original video content is at 30 frames/second, then it is necessary to adapt the content in response to the mobile user’s requirement.

Use case 2: On its way to the content repository, the request for the content is forwarded to another user who has the same content and the functionality of a proxy cache. However, the bandwidth constraint on the receiver’s side requires the adaptation operations to be performed in the intermediary path.

Use case 3: The learning object provider delivers services to earn revenue. Thus, to restrict access to the contents, a provider needs techniques to secure the contents so that only legitimate users (e.g. those who have paid for the content) can access the repository. Encryption and authentication of the contents can provide adequate measures to manage DRM.
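The arithmetic behind use case 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `ACCEPTED_FPS` table and the `frames_to_keep` helper are hypothetical names, and the sketch picks frames uniformly in time while ignoring the prediction dependencies between coded frames that a real H.264 adaptation must respect.

```python
from fractions import Fraction

# Hypothetical device profile for use case 1: accepted frame rate
# (frames/second) per resolution, as stated in the scenario.
ACCEPTED_FPS = {"QCIF": 5, "SQCIF": 10}


def frames_to_keep(total_frames, source_fps, target_fps):
    """Return the indices of the frames kept when reducing the frame rate.

    Frames are selected uniformly in time. Coding dependencies between
    frames are deliberately ignored in this sketch.
    """
    if target_fps >= source_fps:
        return list(range(total_frames))
    step = Fraction(source_fps, target_fps)  # source frames per kept frame
    kept = []
    next_pick = Fraction(0)
    for index in range(total_frames):
        if index >= next_pick:
            kept.append(index)
            next_pick += step
    return kept


# One second of 30 frames/second video adapted for each resolution.
one_second_qcif = frames_to_keep(30, 30, ACCEPTED_FPS["QCIF"])
one_second_sqcif = frames_to_keep(30, 30, ACCEPTED_FPS["SQCIF"])
```

For a 30 frames/second source, the QCIF target keeps every sixth frame and the SQCIF target keeps every third frame.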

2.1. Video Adaptation Video adaptation is the process of transforming an input video into an output video by manipulating one or more bitstreams in order to meet diverse resource constraints and user preferences, while optimizing the overall utility of the video. Fig. 3 shows the role of video adaptation in pervasive media environments. Usually, adaptation tools take into account the content characteristics, device requirements, usage environments, and user preferences to adapt video content.

Figure 3. Adaptation to support heterogeneous terminals

Adaptation engines are typically deployed at intermediate locations, such as proxy servers, although they may be included in the servers or clients in some applications. Video adaptation can be regarded as homogeneous video transcoding, which aims to reduce the bit rate, frame rate and/or resolution of the pre-encoded video stream. It does not involve any kind of syntax modification to the coded video data. Therefore, the incoming compressed video stream preserves its format and compression characteristics after the adaptation operations.

2.2. Encryption and Authentication Encryption is used in many video transmission applications to ensure that only authorized receivers can access the media. Reasons for encryption include security, privacy, age restrictions, and others. Authentication is used to ensure the integrity of the content using a fragile or semi-fragile watermark. Fragile watermarking systems, or hard authentication, reject any modification made to the digital content. For both encryption and authentication, processing time is an issue, especially when the contents need to be delivered and played in real time.

2.3. MPEG-21 DIA To perform the video adaptation, encryption and authentication in real time, we utilize the MPEG-21 framework. Digital Item Adaptation (DIA), MPEG-21 Part 7 [3], specifies the syntax and semantics of tools to assist adaptation of Digital Items (DI). A DI consists of a bitstream together with its relevant metadata. A generic Bitstream Syntax Schema (gBS Schema) is specified in the MPEG-21 framework to perform adaptation in an intermediary node in a format-independent way. A description conforming to this schema is called a gBSD. The gBSD provides an abstract view of the structure of the bitstream that can be used in particular when the availability of a specific bitstream schema is not ensured. Using the gBSD, we can process the video bitstream in the compressed domain without any cascaded decoding and re-encoding operations.

3. Literature Review In [4], Yang et al. addressed a solution to enhance pervasive web accessibility and proposed to apply this technique to mobile commerce. The authors proposed an adaptation mechanism that applies one of a set of predefined adaptation rules to guide the transcoding process in order to produce customized content. Nevertheless, the authors did not illustrate the mechanism used to adapt the content. Moreover, it works only in a confined environment, whereas dynamic real-time adaptation is a necessity in today’s mobile commerce. In [5], the authors explained an end-to-end approach to content adaptation which takes advantage of MPEG-21 to facilitate the UMA concept in a media streaming environment. In [6], the authors discussed how the tools specified within MPEG-21 can be used for interoperable multimedia communication in terms of device independence and coding format independence. The authors of [7] suggest that modality conversion can provide a better solution where the scaling of video content is not sufficient to meet terminal or network constraints. However, this depends heavily on scalability and on the processing capability of the devices. Considering the emerging opportunity of the MPEG-21 framework towards UMA, this paper outlines a structure for video learning content processing for portable learning devices.

4. Proposed Framework If we consider use-case scenario 2, payment services for contents accessed through mobile devices differ from those for regular desktop/laptop internet connections. Most service providers initiate contracts with the mobile operators to distribute their products as a value-added service. As a consequence, the revenue earned is shared with the mobile operators. In [2], Chong et al. proposed a mobile web services concept framework to allow flexible payment mechanisms for mobile consumers and service providers. The objective was to keep service providers from relying solely on mobile operators to implement their service or sell their content to mobile consumers. The authors proposed a 3-tier model: Payment Service Gateway (PSG), Client Application and Payment Service Provider (PSP). In the framework, the Content/Service provider is categorized as an external component. We propose to extend this framework by adding a Video Processing Server (VPS), which is responsible for processing the video learning contents in real time. The new framework is illustrated in Fig. 4.

Figure 4. Proposed Framework

PSG connects mobile users to different types of content services without the constraint of the mobile operator’s limitations. It acts as a mobile web services payment facilitator that provides the mechanisms for managing order requests, payment authentication processes and consumer profiles/sessions. The Client Application resides on the mobile device and communicates with the servers using the web services invocation framework (a Java API for invoking web services). The client browser component enables messages in an XML-compliant format to be delivered to the client. In the system, the client component is designed to handle two types of XML messages: generic web service and RSS (Really Simple Syndication). The client intelligent component provides the core business logic, e.g. presentation of the GUI to the consumer, handling of the consumer’s requests, and the corresponding server responses. PSP is responsible for payment charging and transaction controls. It controls the payment charging between the mobile consumer and the content/service provider.

VPS is responsible for adapting the learning objects, especially the video bitstream, based on the client specification or device requirements. The adaptation server is not limited to adapting the video bitstream of the learning content itself; it can also adapt promotional video bitstreams, such as previews or advertisements, requested by a client through the content provider or mobile operator. The VPS can be optimized to encrypt and authenticate sensitive video data before delivery. Section 5 details the mechanism of the VPS.

5. VPS: Adaptation, Encryption and Authentication In our approach, a gBSD in the form of XML is required for any video processing operation. Fig. 5 shows a sample gBSD. We generate the gBSD while encoding the raw video bitstream. We have used the H.264 video encoder, H.264 being the latest video coding and compression standard by ITU-T and ISO/IEC [8]. Together, the gBSD and the compressed video form the Digital Item.

Figure 5. Sample gBSD

5.1. Content Adaptation Inside the adaptation module, the gBSD is first transformed via XSLT [9], and then, based on the transformed gBSD, the original bitstream is modified. For the gBSD transformation, an XSL style sheet defines the template rules and describes how to produce the resulting document. The adapted H.264 bitstream is finally generated by discarding the portions of the bitstream that the gBSD identifies as belonging to specific frames. Table I shows the temporal adaptation performance of the above technique.

Table I. Temporal Adaptation Performance

Test sequence: Frame Rate: 30 fps; Intra Period: 9; Total Frames: 600

Resolution        Speed
CIF (352×288)     2 fps
QCIF (176×144)    10 fps
SQCIF (128×96)    24 fps

*fps = frames per second
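The description-driven idea can be sketched in a few lines. The following is an illustration under stated assumptions, not the authors' code: the `<description>` and `<unit>` elements are a simplified, hypothetical stand-in for the real gBSD schema of ISO/IEC 21000-7 (which is namespaced and richer), the `adapt` function is a made-up name, and plain element filtering with Python's `xml.etree.ElementTree` is swapped in for the XSLT transformation used in the paper.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical description: one <unit> per frame, carrying the
# frame type plus the byte offset and length of the frame in the bitstream.
SAMPLE_DESCRIPTION = """
<description>
  <unit type="I" start="0" length="4"/>
  <unit type="P" start="4" length="3"/>
  <unit type="P" start="7" length="2"/>
  <unit type="I" start="9" length="4"/>
</description>
"""


def adapt(bitstream, description_xml, keep_types=("I",)):
    """Keep only the frames whose type is in keep_types.

    Returns the adapted bitstream and a transformed description whose
    offsets refer to the adapted bitstream. The payload is copied as-is,
    so nothing is decoded or re-encoded.
    """
    root = ET.fromstring(description_xml)
    adapted = bytearray()
    new_root = ET.Element(root.tag)
    for unit in root.findall("unit"):
        if unit.get("type") not in keep_types:
            continue  # this frame is discarded
        start = int(unit.get("start"))
        length = int(unit.get("length"))
        ET.SubElement(new_root, "unit", type=unit.get("type"),
                      start=str(len(adapted)), length=str(length))
        adapted += bitstream[start:start + length]
    return bytes(adapted), ET.tostring(new_root, encoding="unicode")


# Toy 13-byte "bitstream" laid out as described by SAMPLE_DESCRIPTION.
toy_stream = b"IIIIpppqqJJJJ"
adapted_stream, adapted_description = adapt(toy_stream, SAMPLE_DESCRIPTION)
```

Because the node only reads offsets and lengths from the description, it needs no knowledge of the codec syntax, which is the property the approach relies on.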

5.2. Content Encryption With the help of MPEG-21’s gBSD, either macroblocks containing the motion vectors or slice data partitions of selected frames can be encrypted. Macroblocks containing the motion vectors or the slices in each frame are used as logical units. The encryption unit determines the level of encryption based on the user’s preference. If the encryption preference is high, then all the frames in the adapted video are encrypted; otherwise, only the I-frames are encrypted. For encryption, the frame marker is first scanned from the gBSD. Then, each logical unit’s starting position and corresponding length are retrieved from the transformed gBSD for those frames which need to be encrypted. To encrypt the logical units, an encryption key is chosen randomly and an XOR operation is performed. After the XOR operation, the bitstream is processed as usual for H.264 format compliance. It is worth mentioning that each logical unit can be encrypted independently. Fig. 6 shows a sample macroblock encryption result.

Figure 6. Sample Macroblock Encryption (A. Original; B. Encrypted)
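The selection and XOR steps of Section 5.2 can be sketched as below. This is a hedged illustration only: `select_units` and `xor_units` are hypothetical helper names, the frame list stands in for the positions and lengths read from the transformed gBSD, the key length of 16 bytes is an arbitrary choice, and the H.264 format-compliance processing that follows the XOR in the paper is omitted.

```python
import secrets


def select_units(frames, preference):
    """Choose the logical units to encrypt.

    frames is a list of (frame_type, start, length) tuples. With a "high"
    preference every frame is selected; otherwise only the I-frames are.
    """
    if preference == "high":
        return [(start, length) for _, start, length in frames]
    return [(start, length) for kind, start, length in frames if kind == "I"]


def xor_units(bitstream, units, key):
    """XOR each (start, length) unit with a repeating key.

    The key index restarts at every unit, so units can be processed
    independently. Applying the function twice with the same key
    restores the original bytes.
    """
    out = bytearray(bitstream)
    for start, length in units:
        for offset in range(length):
            out[start + offset] ^= key[offset % len(key)]
    return bytes(out)


# Toy example: two I-frames and one P-frame in a 12-byte "bitstream".
frames = [("I", 0, 4), ("P", 4, 4), ("I", 8, 4)]
payload = b"abcdefghijkl"
random_key = secrets.token_bytes(16)  # randomly chosen key
encrypted = xor_units(payload, select_units(frames, "low"), random_key)
decrypted = xor_units(encrypted, select_units(frames, "low"), random_key)
```

With the "low" preference only the bytes of the two I-frames are altered, so the P-frame bytes in the middle pass through unchanged.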

5.3. Content Authentication From the gBSD, the marking space can be selected from the available alternatives, such as frame, slice, macroblock, and block. To embed a watermark for authentication, the size of the marking space is calculated first. A hash value is then computed and embedded in the marking space. For example, if the slice data partitions are selected as the marking space, then the authentication bits can be inserted in the byte-alignment bits of the slice header. The hash value can be computed for the entire frame or for specific slice partitions. From Fig. 7, we can see that the time required to embed the authentication bits is slightly higher than the adaptation time. The difference between the two depends on the hash function used. Robust hash functions require a higher execution time to compute and embed the authentication bits.

Figure 7. Performance of the Authentication Module
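The compute-then-embed flow can be sketched as follows. Several things here are assumptions made for illustration: SHA-256 is an arbitrary choice of hash function, the marking space is modelled as bytes appended after the unit rather than the slice-header alignment bits described above, and `embed_tag` and `verify_tag` are hypothetical names.

```python
import hashlib


def embed_tag(unit, marking_size):
    """Append a digest of the unit, truncated to the marking-space size.

    marking_size is the number of bytes available for authentication
    bits. The marking space is modelled here as trailing bytes.
    """
    if marking_size <= 0:
        raise ValueError("marking space must hold at least one byte")
    tag = hashlib.sha256(unit).digest()[:marking_size]
    return unit + tag


def verify_tag(marked, marking_size):
    """Recompute the digest and compare it with the embedded bits."""
    unit, tag = marked[:-marking_size], marked[-marking_size:]
    return hashlib.sha256(unit).digest()[:marking_size] == tag


# Toy slice payload with an 8-byte marking space.
marked_unit = embed_tag(b"slice payload", 8)
tampered_unit = bytes([marked_unit[0] ^ 0x01]) + marked_unit[1:]
```

Verification needs only the received data, which matches the observation that the original video is not required. A smaller marking space forces a shorter truncated digest and therefore weaker detection, while a slower hash function raises the embedding time.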

6. Discussion Non-text elements enhance communication by helping website visitors to find or interpret the information presented. On the other hand, multimedia-rich content takes more time to download. Therefore, designers must find a balance between content presentation and the user’s capability to handle the presentation style. It is probably not a good idea to go overboard with elaborate graphics and massive video contents that add no further information. Since people may use the same site frequently, websites may automatically attempt to improve their organization and presentation of the content based on the user profile, using the data stored in their databases and recent history. The design proposed here considers a media resource server where the H.264 video is generated along with its gBSD. In practice, the video processing steps described above can be performed anywhere along the delivery path. The benefit of the proposed approach is that it does not require an application-specific or codec-specific schema, because all the information necessary to regenerate the bitstream is included in the gBSD, and any intermediary node can adapt the video content based on the gBSD. For authentication and decryption, the original H.264 video is not required; rather, a separate authenticator/decryptor can verify the validity of the received video data. The authenticator/decryptor is independent of the decoder, so no lag is added while decoding the received video. An intermediary user can adapt the content to serve another user upon the content provider’s request. In this case, the serving node does not need to be aware of the bitstream format and need not implement the MPEG-21 framework in detail; only the adaptation operations need to be built into the application. For live broadcasting of a video learning object, a P2P overlay can be formed. In the overlay, stationary users with enough computation power and upload bandwidth can adapt the video stream for small handheld devices to support heterogeneity in the overlay.

7. Conclusion Online video streaming as a tool for distance learning is a tiny segment of e-learning methods. If we can address the rapid growth of device variations, video learning content distribution will increase at a fast pace. Our work is important in order to put forward a common platform for content providers, including creators and aggregators. During content delivery, adaptation of the bitstream (reducing the frame rate, changing the resolution, reducing color, etc.) is necessary to support heterogeneous clients and to provide context according to the user environment. Moreover, a content provider can earn extra revenue by providing the adaptation service independently of the mobile service provider. The technique presented here is scalable, and requires low computational effort and low processing time. Moreover, the end nodes are free of the high computation overhead of implementing potentially manifold transcoding operations. Rather, they will be able to play the adapted content directly as soon as it is received from the server or from an intermediary node. Beyond the developments presented above, video processing in pervasive networking environments requires a reasonable solution that prevents possibly untrusted intermediary adaptation engines in the delivery path from adapting content. Finally, a hardware-level implementation capable of generating the compressed bitstream and gBSD is required to speed up the overall process.

8. References
[1] R. Iqbal, S. Shirmohammadi, and A. El Saddik, “A Framework for MPEG-21 DIA Based Adaptation and Perceptual Encryption of H.264 Video,” Proc. of SPIE/ACM MMCN, Vol. 6504, pp. 650403-1 – 650403-12, 2007.
[2] C. Chong, H. Chua and C. Lee, “Towards Flexible Mobile Payment via Mediator-based Service Model,” Proc. of ICEC, Aug. 2006.
[3] ISO/IEC 21000-7:2004, Information Technology – Multimedia Framework – Part 7: Digital Item Adaptation.
[4] S.J.H. Yang, N.W.Y. Shao, A.Y.S. Sue and Chung JenYao, “Pervasive content access for mobile commerce,” Proc. of Intl. Conference on E-Commerce Technology, pp. 523–526, Jul. 2005.
[5] L. Rong and I. Burnett, “Dynamic multimedia adaptation and updating of media streams with MPEG-21,” Proc. of CCNC, pp. 436–441, 2004.
[6] C. Timmerer and H. Hellwagner, “Interoperable adaptive multimedia communication,” IEEE Multimedia, Vol. 12, Issue 1, pp. 74–79, 2005.
[7] A. Vetro and C. Timmerer, “Digital item adaptation: overview of standardization and research activities,” IEEE Trans. on Multimedia, Vol. 7, Issue 3, pp. 418–426, June 2005.
[8] http://ftp3.itu.ch/av-arch/jvt-site/reference_software/
[9] http://www.w3.org/TR/xslt
