Multimedia Management and Query Processing

Multimedia Management and Query Processing Issues in Distributed Digital Libraries: A HERMES Perspective

Ulrich Thiel, Silvia Hollfelder, and Andre Everts
GMD - German National Research Center for Information Technology
IPSI (Integrated Publication and Information Systems Institute)
Dolivostr. 15, D-64293 Darmstadt, FRG
email: [email protected]

Abstract

Multimedia applications accessing large amounts of data require efficient browsing and searching mechanisms to extract relevant information. The system has to preselect items of interest from the database. Additionally, it has to support continuous presentation of time-dependent media. In this paper, we propose browsing system components for multimedia data management that were developed within the HERMES project. They offer conceptual, content-based access to videos. Our retrieval engine calculates relevance values for the results of a conceptual query by feature aggregation on video shot granularity. An intelligent client buffer strategy uses the relevance values from retrieval as presentation support: it prefers to keep those shots in the cache which have a high relevance value.

Keywords: browsing architecture, semantic browsing, video retrieval, content-based search

1 Introduction

In distributed information environments, called "Digital Libraries" (DLs), enormous amounts of multimedia data are provided. Therefore, content-based access is a cornerstone in the development of innovative applications. Most of these applications are highly interactive, hence users of DLs will have to browse through large amounts of data in order to find informative, i.e., relevant, parts of documents. Browsing means, in this context, that users want to scan quickly through data to inspect and compare information they specified only roughly by a previous query. A DL system has to provide appropriate information management and query processing facilities supporting this kind of information-intensive work. The access to data should be based on task-related, conceptual, and content-based criteria, especially for time-dependent data like video, whose presentation time is very long.

The main goal of users in video browsing applications is to find relevant information quickly, instead of viewing videos as in Video on Demand applications, since they will not have time to go through and view the whole data. The system has to efficiently support the browsing process by means of an automatic selection and fast presentation of the relevant data. Additionally, it has to offer interaction capabilities to the user.

To support a large number of users, the system has to optimize its throughput, i.e., the data that are sent to a user. Most contemporary approaches to high-performance multimedia (MM) systems do not consider that, from the users' perspective, some data are more important and some are less important. If this were reflected by appropriate management strategies, the system could provide relevant data to more users faster.

In this paper, we describe some multimedia management methods for video browsing addressing these issues. They were developed in the context of the HERMES (High Performance Multimedia Information Management Systems) project, an ESPRIT Long Term Research Action (No. 9141) aiming at a framework of methods for future high-performance MM applications [7].¹

The paper is organized as follows: After a short outline of the scope of the project, we discuss our approach to content-based access by feature extraction, feature aggregation, and classification for videos on shot granularity. The next section shows how content-based video browsing is supported by a client buffering strategy that considers the information about relevant hits of video shots to a previous query as well as presentation requirements in general.

This work has been funded by the ESPRIT joint project (Long Term Research No. 9141) HERMES (Foundations of High Performance Multimedia Information Management Systems).

¹ The project involves the following partners: MUSIC, Technical University of Chania, Greece; CNR-IEI, Pisa, Italy; the Database and Information Systems Group at the Saarland University, Saarbrücken, Germany; the DataBase Research Group at ETH Zurich, Switzerland; and GMD-IPSI.

2 The HERMES Project

The HERMES project aims at providing enhanced methods for functions at all levels of MM information systems. We assume that the components are organized in an architecture as shown in Figure 1 [13]. The HERMES reference architecture is divided into two areas. The boxes above the dashed line represent functions offered to the user, like querying by content, browsing through data, presentation of multimedia objects, and insert and update operations. The boxes below the dashed line represent the key functionalities targeted by the project.

The foundation of the system architecture, the Multimedia Storage Modules, features a three-level hierarchy, representing the core memory (at the first level), the secondary storage (e.g., magnetic disks), and the tertiary storage (e.g., optical jukeboxes, tape drive services). The boxes above the Multimedia Storage Modules represent key functions of the HERMES system architecture like data placement (cf. [25], [8], [9]), access methods (cf. [3], [10]), caching [20], delay-sensitive data scheduling [24], Quality of Service (QofS), and metadata management [2]. Although these functions may not be clearly separated from other functions inside the Multimedia Storage, they are represented as separate boxes to stress their importance within the project.

The work of GMD-IPSI concentrates on the components which are crucial for browsing applications in a distributed DL, visualized as grey boxes in Figure 1. Our browsing system is based on a client-server model where the server is responsible for the storage of continuous and discrete data. The client is responsible for requesting data units from the server, for presenting them, and for handling interactions with the user. The choice of a client-pull architecture gives adequate support for browsing applications since user interactions are reflected by modified client requests.

We offer generic access and delivery methods for multimedia objects by means of the concept of Continuous Long Fields, a generic data type for time-dependent media [17]. Furthermore, we developed a client-side caching mechanism [21] and adaptive stream control [16] and integrated these concepts within a commercial object-relational DBMS [18]. Admission control for interactive applications in client-pull architectures is provided to protect the limited resources on the server (cf. [15], [14]). Since browsing applications require conceptual access to multimedia objects, we developed a retrieval engine supporting content-based access to videos.

The main contributions of our browsing system architecture are (1) conceptual access to videos by means of specifying rules for previously extracted features using abductive logic, and (2) enhanced system throughput by avoiding the delivery of data that is less relevant to the user. To achieve the latter, we use the knowledge of the retrieval result, provided by the abductive retrieval engine, for the client-side buffering. In this paper we will focus on the retrieval engine and the client caching as the crucial parts of the browsing system architecture. For more information and related work of the HERMES consortium we refer to [1].

Figure 1. The HERMES Architecture

3 The Video Retrieval Engine

Many approaches in video retrieval propose an image retrieval process based on video stills (cf. [28], [4], [5]). The authors report that good results in content analysis were found using a low-level image analysis of video shots. Note that this result refers to retrieval tasks only, mostly based on similarity queries. For browsing purposes, additional functionality is needed, as we will describe in this section.

Most realistic browsing scenarios involve semantic information needs, hence manual indexing prevails in most contemporary applications [19]. In the context of this paper, indexing means the specification of content descriptors and not access structures for data. In our approach to automatic conceptual access for browsing, we attempt to employ indexing rules that capture the semantic content to a certain extent [22].

Our prototype, called HERMES/AVIA², indexes videos by analyzing selected frames with feature detection algorithms. The results of these algorithms, called the feature extraction values, are interpreted by rules mapping the values to classification terms. Rules and feature extraction values are stored as metadata in the database. The rules reflect the statistical distributions of feature values in a video collection and relate these values to conceptual indexing terms.

² Analysis of Video Information Approach

The rules can be derived as follows: The feature extraction values are aggregated to dynamically built constraints, e.g., ranges or linear combinations of the feature values, by analysing the feature extraction values of manually classified images. The starting point is a set of images (video stills) classified by different indexers which is used as a training set (much smaller than the whole collection). The set of index terms consists of a number of domain-independent categories, e.g., content of an image (animate, artefact), contour (sharp, blurry), source of light (natural, artificial), or type of image (photo, cartoon, close-up). The resulting rules describe the 'ofness'³ of pictures on a general level.

³ Ofness is the pre-iconographic level of an image. For example, an image of a road accident will be described to be of artificial objects, e.g., a red car and a yellow phone booth, and to be about a traffic scene at night. While ofness can be determined automatically, aboutness is a much harder problem.

Now, we make use of the observation that when some feature value $x_i$ is within a certain range, the corresponding picture will have been classified by some term $C_j$ in many cases. Hence, it seems useful to encode the corresponding value range and index terms as part of the indexing rules. A typical rule, e.g., image(contrast: 0..200) → contour(I, blur), states that values of the feature contrast ranging between 0 and 200 tend to be associated with contours perceived as blurry.

Rules of this kind lay the foundation for a conceptual way of accessing MM items, as is necessary for browsing. We can map conceptual query terms to certain constraints on feature representations that are stored in the DBMS. Executing the rules during the retrieval process automatically yields proof trees which can be used as appropriate access structures for conceptual retrieval. Hence, the main task is to define the logical retrieval engine that can exploit such rules.

We use abductive logic as inference mechanism [23]. The basic idea of abduction is to explain previously unproven (i.e., unpredictable) statements by minimally expanding a given theory $T$. An abductive proof needs to make a set of assumptions to match an observed reaction. The assumptions are referred to as hypotheses, because they might explain the observation. The theory $T$ is defined by the set of all rules. Discriminative rules are ranked according to their statistical degree of plausibility. A user request $\mathit{Query}$ is a description of a concept the user is looking for. Let $\mathit{Query}$ be an existentially quantified sentence combining elements of $T$. We specify abductive information retrieval as:

$$T \cup \mathit{Hypotheses} \vdash \mathit{Query}$$

Read: Expand the theory by adding some hypotheses such that it satisfies the query. The hypotheses are, in our case, sentences claiming the existence of database entries satisfying feature constraints. Proving the hypotheses by accessing the meta database with queries expressing these constraints finally yields answers to the original conceptual query.
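The retrieval scheme described above can be illustrated with a minimal Python sketch. It is not the HERMES/AVIA inference engine: it assumes that every rule is a single range constraint on one feature, and the names `Rule`, `RULES`, `METADATA`, `abduce`, and `retrieve`, as well as the sample values and plausibility scores, are hypothetical. Abduction is reduced to collecting the rules that would explain the queried concept; each such rule yields a hypothesis that is then checked against the stored feature values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Indexing rule: a feature value in [low, high] suggests a concept."""
    feature: str
    low: float
    high: float
    concept: str
    plausibility: float  # assumed statistical degree of plausibility in [0, 1]


# Hypothetical theory T (the rule base) and metadata store.
RULES = [
    Rule("contrast", 0, 200, "contour:blurry", 0.8),
    Rule("contrast", 201, 1000, "contour:sharp", 0.7),
]
METADATA = {
    "shot-1": {"contrast": 120},
    "shot-2": {"contrast": 640},
    "shot-3": {"contrast": 35},
}


def abduce(query_concept, rules):
    """Collect hypotheses: feature constraints that would explain the query."""
    return [rule for rule in rules if rule.concept == query_concept]


def retrieve(query_concept, rules, metadata):
    """Check each hypothesis against the metadata and rank the hits."""
    hits = {}
    for rule in abduce(query_concept, rules):
        for item, features in metadata.items():
            value = features.get(rule.feature)
            if value is not None and rule.low <= value <= rule.high:
                # Keep the most plausible explanation found for this item.
                hits[item] = max(hits.get(item, 0.0), rule.plausibility)
    # Highest plausibility first; ties are broken by item name.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


result = retrieve("contour:blurry", RULES, METADATA)
```

With the sample data, the query for blurry contours is answered by the two shots whose contrast lies between 0 and 200. A full abductive engine would additionally chain rules and build proof trees, which this sketch leaves out.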

In the next step, the rule system must be defined by providing a method for generating ranges of feature extraction values. [22] propose to define for atomic features the $\alpha$-quantile, i.e., the range containing $\alpha$% of the observed values, given that a certain concept $c_i$ was assigned as index term. When we take into account vectors $x$ of feature values, the conceptual terms $c_i$ are associated with aggregation functions $\phi(c_i, x)$, e.g., $\phi$ might identify a vector of weights $w$ which is multiplied with $x$. In this case, a plausible rule is defined as

$$\min_{y \in L} \phi(c_i, y) \;\le\; \phi(c_i, x) \;\le\; \max_{y \in L} \phi(c_i, y) \;\Longrightarrow\; \mathit{class}(x) = c_i,$$

where $L$ is the set of the feature extraction values of images manually classified as $c_i$ (the learning set), and $x$ are the feature extraction values of an image to be classified. The relevance value $rv$, estimating the confidence that an image $I$ represented by the feature vector $x$ belongs to a classification item $c_i$, is defined as the degree of plausibility of the rules.

A video can be divided not only into its individual frames, but also into shots. [11], [12], and [27] propose to use these shots as information units in their own right. In the indexing process, HERMES/AVIA starts a shot detection method [26]. The system then chooses the frame in the middle of each shot as a representative of the video, calculates the feature extraction values for these frames, and stores this metadata in the database.

If a user requests some video clips, the prototype calculates the required value range by analysing the request and determining the feature constraints from the meta database. At this point, HERMES/AVIA provides a set of relevance values for each video in the database. The user can decide to view the result list sorted by shot or video relevance. As [6] point out, users prefer to have a selection of keyframes rather than the video title as the result presentation. As a good compromise between bandwidth restrictions and user needs, they suggest showing one video still per shot. If the user wants to rank the result by shot relevance, the videos are sorted by the maximum shot relevance of each video. The ranked result list contains entries providing a link to a whole video in combination with a thumbnail link for each shot in the video. The user can now reformulate her request, if necessary, or view a shot or a whole video.
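The range rule and the ranking by maximum shot relevance can be sketched in a few lines of Python. The sketch assumes that the aggregation function is a weighted sum of the feature vector, which the text names only as one possibility; the function names, the weights, and the sample feature vectors are hypothetical and chosen for illustration only.

```python
def aggregate(weights, x):
    """Assumed aggregation function: weighted sum of the feature vector x."""
    return sum(w * v for w, v in zip(weights, x))


def learn_range(weights, learning_set):
    """Smallest and largest aggregated value over the learning set of a concept."""
    values = [aggregate(weights, y) for y in learning_set]
    return min(values), max(values)


def matches(weights, value_range, x):
    """Rule body: the aggregated value of x lies within the learned range."""
    low, high = value_range
    return low <= aggregate(weights, x) <= high


def rank_videos(shot_relevance):
    """Sort videos by the maximum relevance of their shots, best first."""
    best = {video: max(shots.values()) for video, shots in shot_relevance.items()}
    return sorted(best, key=lambda video: (-best[video], video))


# Hypothetical learning set: feature vectors of stills classified as one concept.
weights = [0.5, 0.5]
learning_set = [[100, 20], [180, 40], [60, 10]]
value_range = learn_range(weights, learning_set)

ranking = rank_videos({
    "video-a": {"shot-1": 0.4, "shot-2": 0.9},
    "video-b": {"shot-1": 0.7},
})
```

Here the learned range is 35 to 110, so a still with features `[120, 30]` (aggregated value 75) satisfies the rule, while `[400, 100]` does not. In the ranking, `video-a` precedes `video-b` because its best shot has the higher relevance.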

4 Retrieval-value triggered Client Caching

The task of our caching mechanism is to make media data available for presentation purposes at the client side. Since the user's goal is to find relevant information quickly, and a typical result list for a conceptual query consists of shots from many different videos, the user constantly switches between shots from different videos. The client caching mechanism has to consider these frequent jumps to reduce the start-up latency for video presentation. Because of the large size of videos, it is not possible to preload all video shots of the result list into the cache.

In our architecture, the client controls the data flow from the server to the client by requesting single shots from the server (client-pull).⁴ The main task of the caching algorithm is to decide which shots should be held in the cache and which should be replaced to support interactive browsing applications. The idea is to assign relevance values to video shots and to cache those with a high relevance value at the client, assuming that the user will select them.

For the assignment of relevance values to video shots we consider both the presentational aspects of the time-dependent media, to support continuous, jitter-free, and latency-free playout, and the content-based retrieval aspects. Presentation values consider the current presentation state and likely VCR interactions [21]. For example, the video shots following the presentation point (i.e., the current shot to be presented) in presentation direction have the highest value, decreasing with the distance from the presentation point. If, for example, a change of the presentation direction is likely in the application, the shots preceding the presentation point, which are typically already loaded in the cache, obtain relatively high relevance, too. In the following, the presentation value of a shot is represented by $pv(\text{video shot})$:

$$pv(\text{video shot}) \in [0,1] \quad \forall\ \text{video shots}$$
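One possible shape of such a presentation value is sketched below in Python. The text only states that the value is highest at the presentation point and decreases with distance; the linear decay, the `horizon` parameter, and the `reverse_weight` parameter (how likely a change of direction is assumed to be, in [0, 1]) are assumptions of this sketch and are not taken from the L/MRP strategy in [21].

```python
def presentation_value(shot_index, current_index, direction=1,
                       horizon=4, reverse_weight=0.0):
    """Return a value in [0, 1] for one shot of the currently presented video.

    direction is +1 for forward and -1 for backward presentation.
    """
    # Signed distance measured along the presentation direction.
    distance = (shot_index - current_index) * direction
    if distance >= 0:
        # Shots ahead of the presentation point get the full weight.
        weight = 1.0
    else:
        # Shots behind it only count if a direction change is likely.
        weight = reverse_weight
        distance = -distance
    # Assumed linear decay that reaches zero at the horizon.
    return max(0.0, weight * (1.0 - distance / horizon))


# Values for the current shot (index 3) and the four shots that follow it.
values = [presentation_value(i, 3) for i in range(3, 8)]
```

With `reverse_weight=0.5`, the shot directly before the presentation point receives half the value of the shot directly after it, which mirrors the remark that already presented shots remain relevant when a change of direction is likely.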

These presentation values refer to the currently presented video and do not consider jumps to shots of other videos, as needed for the browsing application, where switches to other videos occur frequently. Furthermore, they do not consider whether the shots have higher or lower relevance to the content of the query.

⁴ For simplification, we assume that the client requests video shots. Similarly, other units, e.g., GOPs (Groups of Pictures) or frames of an MPEG video, may be requested. In this case, the retrieval relevance values, calculated by AVIA on shot granularity, have to be mapped to the syntactical units like frames.

For all video shots of the hit list, the retrieval relevance value $rv$ is given by AVIA and stored as metadata in the DBMS. The retrieval relevance value of a video shot is in the following denoted as $rv(\text{video shot})$:

$$rv(\text{video shot}) \in [0,1] \quad \forall\ \text{video shots}$$

The main idea of our cache management strategy is to combine the retrieval relevance with the presentation relevance. The goal is to keep the most important shots, with respect to a previous query and the presentation state, in the client cache. With respect to the browsing activities of the user, this means that a continuous presentation of the most relevant shots is supported, and the system performance is thereby improved by reducing data requests to the server. Furthermore, explicit user wishes, represented by current user interactions, are supported by the cache management strategy, too.

In our approach, the cache manager preloads and replaces video shots based on their browsing relevance value. The browsing relevance value $bv$ is determined by the parameters $pv(\text{video shot})$ and $rv(\text{video shot})$:

$$bv(\text{video shot}) := \alpha \cdot rv(\text{video shot}) + \beta \cdot pv(\text{video shot})$$

where $\alpha, \beta \in \mathbb{R}^{+}$ are weighting factors. These weighting factors represent the importance of the $rv$ and $pv$ values and take Quality of Service (QofS) specifications into account. For example, a high $\alpha$ value represents high QofS parameters concerning the latency of a video request. A high $\beta$ value represents high QofS parameters concerning the continuous presentation, like jitter or skew.

After the user sends a query to the system, the cache manager preloads those shots which have the highest relevance to the query. The cache space is dynamically divided among all hits. When a user starts to present video shots, the presentation process influences the relevance of the video shots. This means that previously preloaded shots have to be replaced when the browsing value of video shots which are not in the cache increases.
In sum, this strategy considers the content-based relevance values, delivered by the retrieval algorithm, and the current presentation process.
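The combination of both relevance values and its effect on the cache content can be sketched as follows in Python. The weighted sum follows the definition of the browsing relevance value given above, with `alpha` weighting the retrieval relevance and `beta` the presentation value; the greedy selection by descending value, the `size` field, and all sample numbers are assumptions made for illustration and do not describe the actual HERMES cache manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    shot_id: str
    size: int   # hypothetical size in cache units
    rv: float   # retrieval relevance in [0, 1]
    pv: float   # presentation value in [0, 1]


def browsing_value(shot, alpha, beta):
    """Weighted combination of retrieval and presentation relevance."""
    return alpha * shot.rv + beta * shot.pv


def plan_cache(shots, capacity, alpha, beta):
    """Greedy sketch: keep the shots with the highest browsing value that fit."""
    ranked = sorted(
        shots, key=lambda s: (-browsing_value(s, alpha, beta), s.shot_id)
    )
    cached, used = [], 0
    for shot in ranked:
        if used + shot.size <= capacity:
            cached.append(shot.shot_id)
            used += shot.size
    return cached


shots = [
    Shot("a", size=2, rv=0.9, pv=0.0),
    Shot("b", size=2, rv=0.2, pv=1.0),
    Shot("c", size=2, rv=0.5, pv=0.1),
]

# Directly after the query only the retrieval relevance counts.
after_query = plan_cache(shots, capacity=4, alpha=1.0, beta=0.0)
# Once playout has started, the presentation value gains weight.
during_playout = plan_cache(shots, capacity=4, alpha=0.5, beta=1.0)
```

In this example the cache first holds shots `a` and `c`. After playout starts, shot `b` has the highest browsing value and displaces `c`, which illustrates how previously preloaded shots are replaced as the presentation proceeds.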

5 Conclusion

In this paper, we presented a framework for browsing support that is based on feature extraction, feature aggregation, and content-based client caching within the HERMES architecture. The main aspect is that the content of the media influences the caching mechanism by combining presentational as well as content-based relevance values for the preloading and replacement of the client cache objects.

Future work will go one step further: the knowledge about the relevance values to a query will be used for Quality of Service (QofS) support. The idea is to present scenes with lower relevance at lower quality and scenes with higher relevance at higher quality. This means that the QofS provided by the system varies within a media object depending on its content.

6 Acknowledgements

We gratefully express our thanks to our former colleague Adrian Müller for the development of the abductive inference engine, to our former student Pia Hofmann for setting up the statistical analysis framework, to our former student Florian Schmidt for implementing the database functionality, and to all colleagues at the DELITE division who contributed to building the indexed image collection.

References

[1] HERMES Esprit joint project (Long Term Research No. 9141). http://www.ced.tuc.gr.
[2] S. Blott, L. Relly, and H.-J. Schek. An open storage system for abstract objects. In H. V. Jagadish and I. S. Mumick, editors, Proc. of the 1996 ACM SIGMOD International Conference on Management of Data, pages 330-340, Montreal, Quebec, Canada, June 1996.
[3] S. Blott and R. Weber. A simple vector approximation file for similarity search in high-dimensional vector spaces. Technical report, Esprit Project HERMES, March 1997.
[4] S.-F. Chang, J. Smith, H. Meng, H. Wang, and D. Zhong. Finding images/video in large archives. D-Lib Magazine, February 1997.
[5] Y.-L. Chang, W. Zeng, I. Kamel, and R. Alonso. Integrated image and speech analysis for content-based video indexing. In Proceedings of IEEE Int. Conf. on Multimedia Computing and Systems, 1996.
[6] M. Christel, D. Winkler, and C. Taylor. Multimedia abstraction for a digital video library. In ACM Digital Libraries '97, pages 21-29, Philadelphia, PA, 1997.
[7] S. Christodoulakis and P. Triantafillou. Research and development issues for large-scale multimedia information systems. ACM Computing Surveys, December 1995.
[8] S. Christodoulakis, P. Triantafillou, and F. Zioga. Principles of optimally placing data in tertiary storage libraries. In Proc. of the Int. Conf. on Very Large Data Bases (VLDB), pages 234-245, September 1997.
[9] S. Christodoulakis and F. Zioga. Data base design principles for striping and placement of delay-sensitive data on disks. In Proc. of the 17th Symposium on Principles of Database Systems PODS'98, pages 69-78, 1998.
[10] P. Ciaccia, M. Patella, and P. Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Proc. of the Int. Conf. on Very Large Data Bases (VLDB), September 1997.
[11] S. Devadinga, D. Kosiba, U. Gargi, S. Oswald, and R. Kasturi. A semiautomatic video database system. In Proc. SPIE - Multimedia Computing and Networking, January 1995.
[12] U. Gargi, S. Oswald, D. Kosiba, S. Devadinga, and R. Kasturi. Evaluation of video sequence indexing and hierarchical video indexing. In SPIE Conference on Storage and Retrieval in Image and Video Databases, 1995.
[13] The HERMES Consortium. The HERMES Reference Architecture. Internal Project Report, 1997.
[14] S. Hollfelder. Admission control for multimedia applications in client-pull architectures. In Proc. of the 3rd Int. Workshop on Multimedia Information Systems (MIS), pages 121-128, September 1997.
[15] S. Hollfelder and K. Aberer. An admission control framework for applications with variable consumption rates in client-pull architectures. GMD Technical Report 8, GMD, Sankt Augustin, April 1998. Submitted for publication.
[16] S. Hollfelder, A. Kraiss, and T. C. Rakow. A client-controlled adaptation framework for multimedia database systems. In Proc. of European Workshop on Interactive Distributed Multimedia Systems and Telecommunication Services (IDMS'97), Darmstadt, Germany, September 1997.
[17] S. Hollfelder and H.-J. Lee. Data abstractions for multimedia database systems. Technical Report, Arbeitspapiere der GMD 1075, GMD, Sankt Augustin, May 1997.
[18] S. Hollfelder, F. Schmidt, M. Hemmje, and K. Aberer. Transparent integration of continuous media support into a multimedia DBMS. In Proceedings of IADT '98, Berlin, 1998. Enlarged version published as GMD Technical Report, Nr. 1104, Sankt Augustin, Germany, December 1997.
[19] B. Lutes, S. Kutschekmanesch, U. Thiel, C. Berrut, Y. Chiaramella, F. Fourel, H. Haddad, and P. Mulhem. Study on Non-Textbased Information Retrieval State of the Art. EU, Study ELPUB 106, 1996.
[20] D. Ma and G. Alonso. Distributed client caching for multimedia data. In Proc. of the 3rd Int. Workshop on Multimedia Information Systems (MIS), pages 115-120, September 1997.
[21] F. Moser, A. Kraiss, and W. Klas. L/MRP: a buffer management strategy for interactive continuous data flows in a Multimedia DBMS. In Proc. of the Int. Conf. on Very Large Data Bases (VLDB), pages 275-286, Sept 1995.
[22] A. Müller and A. Everts. Interactive image retrieval by means of abductive inference. In RIAO 97 Conference Proceedings - Computer-Assisted Information Searching on Internet, pages 450-466, June 1997.
[23] A. Müller and U. Thiel. Query expansion in an abductive information retrieval system. In RIAO 94 Conference Proceedings - Intelligent Multimedia Information Retrieval and Management, pages 461-480, October 1994.
[24] G. Nerjes, P. Muth, M. Paterakis, Y. Romboyannakis, P. Triantafillou, and G. Weikum. Scheduling strategies for mixed workloads in multimedia information servers. In Proc. of the 8th Int. Workshop on Research Issues in Data Engineering (RIDE'98), February 1998.
[25] P. Scheuermann, G. Weikum, and P. Zabback. Data partitioning and load balancing in parallel disk systems. VLDB Journal, 7(1):48-66, 1998.
[26] A. Steinmetz. DiVidEd - A Distributed Video Production System. In VISUAL'96 Information Systems, VISUAL'96 Conference Proceedings, 1996.
[27] M. Yeung, B.-L. Yeo, W. Wolf, and B. Liu. Clustering and scene transitions on compressed sequences. In Multimedia Computing and Networking, 1995.
[28] H. Zhang, C. Low, S. W. Smoliar, and J. Wu. Video parsing, retrieval and browsing: An integrated and content-based solution. In Proceedings of ACM MM, pages 15-24, 1995.