AN ENVIRONMENT FOR EFFICIENT HANDLING OF DIGITAL ASSETS ∗
PAULO VILLEGAS, STEPHAN HERRMANN, EBROUL IZQUIERDO, JONATHAN TEH AND LI-QUN XU
IST BUSMAN Project, www.ist-basman.org
We present a system designed for the management of multimedia content through IP networks. It includes advanced cataloguing capabilities based on the MPEG-7 standard, and embedding of metadata and identifiers for content tracking through video watermarking. A prototype is being implemented as part of the BUSMAN IST project.
1. Introduction
The proliferation of digital media assets, and particularly their easy accessibility over open networks such as the Internet, has raised challenging issues regarding the management, delivery, search, retrieval and ownership protection of video content. The rapid expansion of e-commerce services and the increased distribution of digital assets over fixed and wireless networks have generated an urgent need for technologies able to deliver high-quality services to content creators, providers and end users. While end users expect easy access to digital content using query structures that are natural and close to human concepts, content creators and providers are looking for efficient distribution systems capable of protecting digital content and tracing illegal copies. The BUSMAN project is designing and implementing an efficient system for understanding, delivery and querying of video from large databases, fulfilling the needs of content creators, providers and end users linked by heterogeneous networks, while employing and enhancing multimedia standards such as MPEG-7 and MPEG-21.
This paper is organized as follows. Section 2 reviews related activities in the field of digital content, notably indexing, retrieval and copyright management. The BUSMAN approach to content management is then outlined in Section 3, which is followed by an overview of the system architecture in Section 4. The paper concludes in Section 5 with the promise of a functional demonstrator in due course.
∗ Paulo Villegas is with Telefónica I+D, Spain; Stephan Herrmann is with the Technische Universität München, Germany; Ebroul Izquierdo is with Queen Mary University of London, UK; Jonathan Teh is with Motorola Labs, UK; and Li-Qun Xu is with BTexact Technologies, UK.
2. Review of related activities
Content-based video indexing and retrieval has been the subject of active research in both industry and academia across the world. This reflects the commercial importance of the technology and the fact that the problems involved are still open and challenging. Among others, the COST211 working group has been focusing on content extraction and image segmentation [1]. The Multimedia Studio developed by the IST project SAMBITS provides functionality for handling selectable, arbitrarily shaped video objects using MPEG-7 metadata [2]. The project Content Based Recognition and Retrieval of Multimedia Information, funded by the UK’s Digital-VCE, focused on strategies for key-frame detection and the definition of basic video descriptors [3]. In the USA, the Vision project deployed a digital video library using detection and key-frame selection for indexing. Several other classical image and video retrieval systems on the web share common ground with BUSMAN: QBIC [4, 5], Photobook [6], Virage [7], Webseek [8], etc.
In the context of copyright protection, a number of EU IST projects have targeted, or are working on, technical issues similar to those addressed in BUSMAN. The CITED [9] (Copyright In Transmitted Electronic Document, 1990) and IMPRIMATUR [10] (Intellectual multimedia property rights model and terminology for universal reference, 1995) projects specified models of the relationships between the main players in the commercial media distribution chain. OCTALIS (Offer of Content through Trusted Access Links) has proposed secure solutions for transactions involving still images on the Internet or in open environments, and videos in a broadcasting chain [11].
MIGRATOR2000 (Migration of Image Generation and Registration Authoring Tools to the Open Resources of JPEG2000) studies problems ranging from archiving based on metadata added to JPEG2000 and the transfer of images over a network, to the use of IPR metadata for e-commerce and copyright protection in line with the WIPO (World Intellectual Property Organization) treaty [12].
The BUSMAN project is also dealing with several issues related to self-embedded metadata, persistent Digital Item Identifiers and locators. These issues are also addressed by international consortia and organisations such as the Content ID Forum (cIDf) and the DOI Foundation, which has specified the Digital Object Identifier (DOI), defined as a system for interoperably identifying and exchanging intellectual property in the digital environment. DOIs as defined are unique and persistent. The MPEG Committee is currently working on its MPEG-21 standard (dubbed Multimedia Framework), Part 3 of which, called Digital Item Identification (DII) [13], was finalised in July 2002. The DII specification contained in this standard addresses the issue of content identifiers, also in the context of URIs and URNs. A group within MPEG is developing further specifications in the field of Persistent Association of information with Digital Items. BUSMAN has contributed to this work.
All these technological developments and standards will have a significant impact on the usability and accessibility of digital content through a new generation of multimedia applications. Through its innovative approaches and solutions, the BUSMAN system is helping to speed up the development, deployment, use and commercialisation of these technologies.
3. The BUSMAN approach to content management
BUSMAN introduces a new schema for efficient and effective multimedia delivery and access, from content creators down to end users. It will improve our knowledge of, and our means for, protecting intellectual property in the pervasive digital media area, and show how to integrate advanced video processing technologies into a system in which video resources can be transcoded and delivered across a wide range of networks, in any required format, and accessed efficiently from diverse terminals. By robustly inserting pointers to metadata and signatures into the content itself, the BUSMAN system guarantees the availability of such information independently of the delivery channel. In this context, the major innovation of the envisaged system is its ability to handle metadata for description, retrieval or intellectual rights protection along heterogeneous delivery channels. The adopted approach enables content access through different channels while resisting the most important process performed on the video within the delivery chain: transcoding. Furthermore, the envisaged system will contribute to a better understanding of the effects of packet loss on the embedded information in different environments. It will also influence, and be influenced by, emerging multimedia standards.
In a BUSMAN environment, therefore, any piece of digital multimedia content always carries an associated context that makes it possible to establish its source, obtain descriptions and express relationships to other content samples. The result is content that is easier to locate, retrieve, manage and track, all of which produces an enhanced end-user experience.
4. Overview of the BUSMAN system
The BUSMAN architecture supports a generic, end-to-end, secure multimedia content delivery service. It covers content generation from unstructured video data, analysis, indexing and watermarking, through delivery over heterogeneous networks, to consumption (browsing and retrieval) by a user. A schematic diagram of the system architecture is shown in Figure 1.
Figure 1: The BUSMAN system
The design of BUSMAN has centered on the needs of two classes of users: content providers and consumers. At the provider side, advanced image and video processing techniques will be used to (semi-)automatically annotate video content. The resulting metadata are indexed and watermarked within the video essence. The watermarks can resist transcoding, compression and fraudulent attacks, as well as enforce copyright protection. At the consumer side, efficient browsing and retrieval strategies are being implemented to provide the user with low-latency access to large video databases across networks, and a choice of quality content services according to the user’s subscription level.
As depicted in Figure 1, the whole BUSMAN system comprises three main functional modules: Input Unit, Information Server and User Terminal. Although an integrated prototype system will ultimately be built to validate all the functionality furnished by the BUSMAN concepts, this objective will be achieved in stages by designing, building and testing several subsystems, followed by a final integration. Our planned work activities are as follows:
• The video analysis and indexing subsystem includes the modules “analysis and feature extraction”, “description creation”, “content ID generation” and “user annotation”. Its input can be an MPEG compressed or an uncompressed video stream.
• The two-level watermarking subsystem consists of the modules “content ID and IPR embedding” and “metadata embedding”. Its three data inputs are, respectively, the uncompressed video stream, the content ID and the desired metadata stream.
• The metadata server subsystem is a database used to store, maintain and retrieve annotated metadata, and to handshake with the video content databases.
• The search engine will provide the interface to parse queries effectively, and to browse and retrieve the desired information in a fixed network or an emulated mobile environment.
Additional system software architecture, system integration and testing tasks will provide the necessary glue between all the subsystems. The final demonstration will be performed using video delivery in any MPEG format through GPRS and UMTS packet data communications channels, fast fixed networks and the Internet. The rest of this section elaborates on the design of the server and client side system components.
4.1. Server side
As seen in Figure 1, there are two main logical blocks within a BUSMAN Server: the Input Unit and the Information Server.
• The Input Unit will be in charge of all data acquisition. It will accept the video content and perform all the data processing needed: analysis and creation of descriptions, assignment of the content ID, and insertion of the watermark information (content ID and supplementary metadata) into the video data. An important block in this unit will be an annotation user interface, which will allow an operator to insert and correct MPEG-7 metadata.
• The Information Server will store all system data: the video content itself, video summaries and MPEG-7 descriptions.
A query engine will resolve all metadata requests coming from the end user terminals; as a result of the queries, video and summary information will be streamed to decoders on demand. An additional fingerprinting module will insert watermarks into streamed content on the fly, introducing origin identification into the digital content for IPR auditing purposes, as stated in the use scenarios. A specialised database capable of efficiently storing MPEG-7 descriptions will be used for the Description Server, together with a suitable query interface for the query engine. Existing solutions will be employed to implement the Video Server, to which an interface from the Input Unit (to acquire content) and the Description Server (to relate content to metadata) will also be added.
4.2. Client side
For the consumer, BUSMAN will target fixed terminals and mobile terminals. Fixed terminals will use video download at bit-rates of approximately 512 kbps to 1 Mbps, while mobile terminals will use video streaming at 32 kbps to 384 kbps. Radio channel conditions on the wireless network, such as fading, shadowing and multipath distortion, ultimately lead to packet loss. As streaming video is a best-effort transmission, lost packets will degrade both video quality and watermark detection. A real-time network emulator will be used to emulate the channel characteristics of a GPRS or UMTS network.
The client software consists of an HTML web browser and a smart BUSMAN-enhanced video player. Video is delivered from the server in MPEG-4 format, targeting the Simple or Advanced Simple Visual Profiles. The search scenario is modelled on searching with web search engines. The web server delivers a search form in HTML to the client. The user enters search terms into the client and submits them back to the server using an HTTP POST command. The server runs the search terms against the database of metadata and returns a list of the videos that match the search terms as closely as possible.
The video contains additional information embedded as a watermark. This information includes a DII (Digital Item Identifier), MPEG-7 compliant metadata and digital rights usage information. The enhanced video player extracts the watermark and interprets the embedded data appropriately. On low bit-rate channels, only the DII is embedded. The DII is presented to the user as a selectable link that will query the server for additional metadata on the video. The user interface will be developed based on user requirements and using human factors guidelines.
This is an issue especially for the mobile client, in view of its limited screen size and choice of input devices.
4.3. Metadata Model
The varied functionality to be furnished by the BUSMAN system has been identified on the basis of extensive human factors studies [14] that highlight the user requirements in various practical use scenarios. An analysis of those use scenarios was performed, resulting in a set of processing steps that can be clustered by their inputs and outputs. The resulting functionality of the BUSMAN system supports search and retrieval on various high-level queries (e.g., genre, title and keywords), refinement of the query, browsing of media file lists, access to the scenes of a movie, report or program, and content-based search. It falls to the metadata model to enable all of this functionality.
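As an illustration only, the following Python sketch shows how a hierarchy of structural units could serve the high-level queries listed above: filtering by genre, title and keyword, refining a previous result, and locating the scenes or shots of a programme. The names StructuralUnit, MediaItem, units_matching and search, as well as the sample data, are hypothetical; they are not taken from the BUSMAN implementation or from the MPEG-7 schema, and a real system would query stored MPEG-7 descriptions rather than in-memory objects.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class StructuralUnit:
    """A programme, scene or shot (hypothetical, simplified structure)."""
    label: str
    keywords: List[str] = field(default_factory=list)
    key_frames: List[str] = field(default_factory=list)
    children: List["StructuralUnit"] = field(default_factory=list)

    def walk(self) -> Iterator["StructuralUnit"]:
        """Yield this unit and every nested unit, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class MediaItem:
    """Top-level unit carrying the fields used by high-level queries."""
    title: str
    genre: str
    root: StructuralUnit


def units_matching(item: MediaItem, keyword: str) -> List[StructuralUnit]:
    """Return the structural units of an item annotated with the keyword."""
    wanted = keyword.lower()
    return [unit for unit in item.root.walk()
            if wanted in (k.lower() for k in unit.keywords)]


def search(items: List[MediaItem], genre: Optional[str] = None,
           title: Optional[str] = None,
           keyword: Optional[str] = None) -> List[MediaItem]:
    """Return the items that satisfy every criterion that was supplied."""
    hits = []
    for item in items:
        if genre is not None and item.genre.lower() != genre.lower():
            continue
        if title is not None and title.lower() not in item.title.lower():
            continue
        if keyword is not None and not units_matching(item, keyword):
            continue
        hits.append(item)
    return hits


# Invented sample catalogue, for illustration only.
news = MediaItem(
    title="Evening News", genre="news",
    root=StructuralUnit("programme", children=[
        StructuralUnit("scene 1", keywords=["election"], children=[
            StructuralUnit("shot 1", keywords=["election", "interview"],
                           key_frames=["kf_0001.jpg"]),
            StructuralUnit("shot 2", keywords=["studio"]),
        ]),
        StructuralUnit("scene 2", keywords=["weather"]),
    ]))
match = MediaItem(title="Cup Final", genre="sport",
                  root=StructuralUnit("programme", keywords=["football"]))
catalogue = [news, match]

first = search(catalogue, genre="news")
refined = search(first, keyword="interview")  # refine the previous result
print([item.title for item in refined])       # ['Evening News']
print([unit.label for unit in units_matching(news, "election")])
# ['scene 1', 'shot 1']
```

Query refinement is modelled here simply by running a second search over the result of the first, and access to scenes by returning the matching units themselves, so that a player could jump to them. Content-based search on low-level visual descriptors is outside the scope of this sketch.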
[Figure 2 diagram: tree of structural units]
Top level structural unit: usage information, creation information, semantic information, key frame
    Structural unit Scene 1: usage-, creation-, semantic information, key frame
        Structural unit Shot 1: usage-, creation-, semantic information, key frame, visual Ds
        Structural unit Shot 2: usage-, creation-, semantic information, key frame, visual Ds
    Structural unit Scene 2
Figure 2: Scheme of the BUSMAN metadata model
The BUSMAN metadata model is shown in Figure 2. It is designed to be fully compliant with the MPEG-7 standard. The central element of the metadata model is the structural description (SegmentDS), which allows navigation across the content. Each structural unit (a VideoSegment for scenes and shots) is annotated with keywords (TextualAnnotation) and enriched with creation and usage information. Furthermore, the VideoSegments can have multiple key frames, from which low-level visual descriptors are extracted.
5. Conclusions
We have described in this paper the BUSMAN concept and approach to video content management and its various subsystems. This is a 30-month project started in May 2001, to be completed in October 2004. The final demonstrator, as well as the individual subsystems, will undergo extensive trials, and the feedback from the assessment and evaluation tasks will be incorporated into the system. Since user requirements and user-centred design are at the core of the project, we expect that the final deliverable will provide its intended users with a natural and exciting experience in consuming digital multimedia content.
Acknowledgements
This paper is based on the research and development activities conducted within the IST project BUSMAN, which is partially funded by the EU under grant Nr. IST-BUSMAN-2001-35152.
References
1. COST 211 quat, Redundancy Reduction Techniques and Content Analysis for Multimedia Services, http://www.iva.cs.tut.fi/COST211/
2. SAMBITS - System for Advanced Multimedia Broadcast and IT Services, http://www.ipsi.fhg.de/delite/Projects/SAMBITS/
3. P. Hill, “Review of Current Content Based Recognition and Retrieval Systems”, Technical report Theme T8, Nr. 99-05/1, Virtual Centre of Excellence in Digital Broadcasting and Multimedia Technology.
4. M. Flickner et al., “Query by image and video content: The QBIC system”, IEEE Computer 28, pp 23-32, September 1995.
5. A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching”, Proc. of the 1984 ACM SIGMOD Conf. on Management of Data, pp 47-57, June 1984.
6. A. Pentland, R.W. Picard and S. Sclaroff, “Photobook: Content-based manipulation of image databases”, Intern. J. Comput. Vision, 18(3), pp 233-254, 1996.
7. A. Hampapur et al., “Virage video engine”, Proc. of SPIE, vol. 3022, pp 188-200, 1997.
8. J.R. Smith and S.-F. Chang, “An image and video search engine for the world-wide web”, Proc. of SPIE, vol. 3022, pp 85-95, 1997.
9. CITED, http://www.newcastle.research.ec.org/esp-syn/text/5469.html
10. IMPRIMATUR, http://www.newcastle.research.ec.org/esp-syn/text/20676.html
11. OCTALIS, http://www.ina.fr/recherche/projets/finis/octalis.en.html
12. MIGRATOR2000, http://corporate.alinari.com/eventi/eng/migraEN.htm
13. ISO/MPEG, Information Technology - Multimedia Framework – Part 3: Digital Item Identification, ISO/IEC FDIS 21000-3, July 2002.
14. Alyson Evans, Goetz Schmidt-Bossert, Pedro Concejero-Cerezo, Carlos González, Roland Buss, Stephan Herrmann, Thomas Meiers, Initial User Requirements for BUSMAN, BUSMAN WP 2, D4.1, December 2002.