Search algorithms for sub-datatype-based multimedia retrieval

Punpiti Piamsa-nga ([email protected])
Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, 10900, Thailand
Nikitas A. Alexandridis ([email protected])
Department of Electrical Engineering and Computer Science, George Washington University, Washington D.C., 20052 USA
Abstract. Recently, researchers have mostly been interested in searching for data items whose content is globally similar to the query, not in searching inside data items. This paper presents an algorithm, called the "generalized virtual node (GVN)" algorithm, to search for data items in which parts (sub-datatypes) are similar to the incoming query. We call this "sub-datatype"-based multimedia retrieval. Each multimedia datatype, such as image and audio, is represented in this paper as a k-dimensional signal in the spatio-temporal domain. A k-dimensional signal is transformed into characteristic features, and these features are stored in a hierarchical multidimensional structure called the "k-tree." Each node of the k-tree contains partial content corresponding to spatial and/or temporal positions in the data. The k-tree structure allows us to build a unified retrieval model for any type of multimedia data. It also eliminates unnecessary comparisons in cross-media querying. Experimental results of the use of the new GVN algorithm for "sub-audio" and "sub-image" retrieval show that it requires much shorter retrieval times than earlier algorithms, such as the brute-force and partial-matching algorithms, while its accuracy remains acceptable.
Keywords: Sub-datatype, Content-based retrieval, Multimedia, Unified model, Search algorithms
1. Introduction

Today, more elaborate content-based multimedia retrieval systems are possible to implement because of rapid improvements in processor technologies, high-speed communications, and high-density storage devices. This has spurred the growth of multimedia databases spanning such diverse fields as art, medicine, science, and engineering. Most databases are required to handle various datatypes, including audio, still images, and video. Conventional database management systems using keyword-based schemes have been used efficiently on text data. However, applications for other multimedia datatypes require a search based on similarity rather than on exact matching [2, 3]. To provide good efficiency and adequate performance, one of the main interests in
© 1999 Kluwer Academic Publishers.
Printed in the Netherlands.
jirsta.tex; 15/03/1999; 12:37; p.1
multimedia research is to come up with efficient approaches to performing automatic content-based indexing and retrieval.

There are three main difficulties in automatic content-based multimedia retrieval. First, data content is hard to describe in words [3]. The recognition of data content (features) requires prior knowledge and special techniques from signal processing and pattern recognition, which usually require long computing times. Second, since several multimedia features can be used as indices (such as pitches and amplitudes in audio [4, 6, 13, 16], and colors and textures in images [2, 10, 14]), a method or processing technique designed and developed for one feature may not be appropriate for another. Third, the extremely large database size and the use of similarity search require extensive computation. Such a search algorithm is based on measuring the distance between the query and each of the multimedia data records and reporting the best-matching results: i.e., those that have the smallest distances to the query. To address these three problems, we developed a unified model for representing any multimedia datatype, together with search algorithms for that model. In the unified model, all subjective multimedia features, such as image colors and audio amplitudes, are defined and represented by histograms and stored uniformly in a multidimensional, hierarchical, multiresolution structure called the k-tree. The k-tree representation and processing allow both the accuracy and the retrieval time to be adapted dynamically to users' requirements, and allow algorithms designed for one datatype or feature to be reused, with only the dimension (k) of the datatype varied. Therefore, by using this unified construction, any processing on the k-tree is independent of datatypes and features. To reduce retrieval times, pipeline and parallel techniques have been proposed [8].
Nevertheless, the development of fast search algorithms for a single processor is still required. In addition, the retrieval time also depends on the nature of the querying method. Querying types can be classified roughly into two different approaches: "datatype-based" search and "sub-datatype-based" search. Datatype-based search looks for data items that are globally similar to the query. Sub-datatype-based search looks for "parts of data" that are similar to the query. For instance, for retrieving an image, we call these approaches specifically "image-based" and "sub-image-based" retrieval. The two approaches produce different types of querying results. However, researchers currently concentrate only on the datatype-based search approach; only a few are working on the sub-datatype-based approach [9, 12].
In this paper, a generalized sub-datatype-based search algorithm, called the "Generalized Virtual-Node (GVN)" algorithm, is proposed to search over a unified model representing the characteristic features of any multimedia datatype. Types of multimedia data and types of querying are classified, and the unified model is described. Performance measurements of the algorithms using audio and image databases are presented. Retrieval using the proposed search algorithm is faster than using other algorithms, such as brute-force and Partial-Matching (PM) [12] search, while the accuracy is acceptable.
2. Multimedia datatypes and features

2.1. Multidimensional, multi-channel datatypes

Multimedia data can be viewed in terms of either the datatypes or the features that categorize them. Multimedia datatypes consist of data structures with diverse characteristics, such as images, audio, motion pictures, and video [3]. All types of multimedia data can be regarded as multidimensional, multi-channel signals in the spatio-temporal domain [4]. Stereo audio is a one-dimensional, two-channel signal; it is usually a sequence of audio amplitudes in the temporal domain. A digital audio signal is sampled and encoded from an analog signal; the quality of a digital audio signal depends on the sampling rate and the resolution of the amplitude values. An image is a two-dimensional, multi-channel signal, where both dimensions are spatial and each channel can carry data from a different frequency band, such as red, green, blue, infrared, or ultraviolet. Each element of an image (pixel) represents colors (or amplitudes of data in the infrared or other frequency ranges of interest) at the corresponding position. The resolution of an image depends on the sampling rate and the amount of detail required for each pixel. A motion picture is a three-dimensional signal: two dimensions in the spatial domain and one in the temporal domain. The information of a motion picture is virtually stored as a time sequence of image frames. Video is a composite signal: a temporally synchronized signal of many channels, such as motion pictures, audio, and text captions. The processing of multimedia data involves a familiar trade-off for any datatype: one must rank the importance of data quality, storage, and computation speed. Since all datatypes can be represented as multi-channel, multidimensional signals, unified data structures and algorithms can be developed.
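This unified view can be sketched in a few lines of Python. The table below is only an illustration of the classification above (the entries and the helper name are ours, not part of the paper):

```python
# Illustrative sketch: each multimedia datatype viewed as a
# k-dimensional, multi-channel signal in the spatio-temporal domain.

DATATYPE_SIGNALS = {
    # datatype: (spatio-temporal dimensions k, example channels)
    "stereo audio":   (1, ["left", "right"]),
    "image":          (2, ["red", "green", "blue"]),
    "motion picture": (3, ["red", "green", "blue"]),
}

def dimensions(datatype):
    """Spatio-temporal dimensionality k of a datatype."""
    return DATATYPE_SIGNALS[datatype][0]

print(dimensions("image"))   # -> 2
```

A video record would combine several such channels (motion pictures, audio, captions) under one temporal synchronization.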
2.2. Characteristic features

The characteristic features of multimedia data are subjective information [3]. Features are used to distinguish one selection of multimedia data from others. Features are classified into two types: mathematics-based and knowledge-based features. Mathematics-based features can be divided roughly into two types: transform-based and statistics-based features. For transform-based features, transform techniques, such as Fourier, DCT, and wavelet transforms, are used to convert the data from the spatial domain to the frequency domain [5, 8, 10, 13]. The transformed data are then filtered by a special filter to extract a specific feature of interest. Transform-based features usually contain all or part of the spatial information. However, keeping all spatial information requires huge data storage and very long computation times. Statistics-based approaches generate features calculated from the statistics of the spatial information of the data. Examples of statistics-based features include average values and standard deviations (of audio pulse-code modulation or image gray levels). The size of a statistics-based feature is quite small. Many types of statistics-based features, especially those generated from averages and standard deviations, are not considered good features, since they cannot distinguish data efficiently. On the other hand, knowledge-based features, such as the characteristics of a human face, cannot be readily and efficiently extracted through the use of a mathematical model. It is believed that knowledge-based features can be built on top of mathematics-based features and combined with semantic rules that create relations among them. The processing of knowledge-based features is beyond the scope of this paper.

2.3. Features' histogram model

The advantages of both the transform-based and statistics-based types of mathematical features can be combined.
The solution for generating the index table of a database is to use histograms of discrete, transformed features. Using these "features' histograms" produces a smaller index table (compared to those using transform-based features) and provides more information than statistics-based features. A features' histogram can also be optimized between the size of the index and the amount of information by properly selecting the number of bins in the histogram. An example of a features' histogram is the "texture histogram" of an image: the images are transformed using, say, wavelet transforms, the transformed images are filtered for only some pre-selected textures, and
then the textures are counted to construct a histogram. A feature can be either linear or non-linear. A linear-features' histogram uses a feature whose values, corresponding to each bin from the first to the last bin of the histogram, grow up or down linearly. For example, a 256-level gray-scale histogram is a linear-feature histogram, because the feature has gray values ranging from level 0 to 255 in the corresponding bins numbered, say, 0 to 255, respectively. On the other hand, the bins of a non-linear features' histogram do not have a linear relation between consecutive pairs. A bin-distance function is defined to calculate the distance between every pair of bins of a histogram. Suppose B() is a bin-distance function and a_{m0,m1} = B(m0, m1) is the distance between histogram bins m0 and m1. A bin-distance matrix of M-dimensional histograms can then be defined by A = [a_{m0,m1}], where 0 ≤ m0, m1 < M and M is the dimension of the histogram. An example of a non-linear feature is the RGB color feature: because a color is composed of three linear signals (R, G, and B), the colors in different bins are not linearly related, but a bin-distance matrix of colors can still be generated [7]. However, a bin-distance matrix cannot be generated for some types of features, such as textures, because such a feature cannot be well defined in mathematical terms and a bin-distance function cannot be established.
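As a concrete sketch of the features'-histogram idea, the following hypothetical helper (not from the paper) builds a linear-features' histogram with a configurable number of bins, illustrating the trade-off above between index size and information:

```python
# Sketch: building a features' histogram for a linear feature
# (e.g. gray levels), with a configurable number of bins.
# Fewer bins -> smaller index but less information.

def feature_histogram(values, num_bins, max_value=256):
    """Quantize feature values in [0, max_value) into num_bins
    equal-width bins and count occurrences per bin."""
    hist = [0] * num_bins
    for v in values:
        bin_index = v * num_bins // max_value   # linear quantization
        hist[bin_index] += 1
    return hist

pixels = [0, 10, 100, 100, 200, 255]
print(feature_histogram(pixels, 4))   # 4 coarse bins over 0..255
```

With 256 bins this reduces to the ordinary gray-scale histogram; with 4 bins it is a much smaller, coarser index.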
3. Multimedia data retrieval approaches

3.1. Measuring the similarity of features' histograms

To measure the similarity between histograms of different features, three histogram distance functions have been used. All three distance functions are given below:
Euclidean distance function:

d_E(H_1, H_2) = Σ_{m=0}^{M−1} (h_1[m] − h_2[m])^2    (1)

Histogram intersection distance function [14]:

d_I(H_1, H_2) = 1 − [Σ_{m=0}^{M−1} min(h_1[m], h_2[m])] / min(Σ_{m_0=0}^{M−1} h_1[m_0], Σ_{m_1=0}^{M−1} h_2[m_1])    (2)
Histogram quadratic distance function [10]:

d_Q(H_1, H_2) = Σ_{m_0=0}^{M−1} Σ_{m_1=0}^{M−1} (h_1[m_0] − h_2[m_0]) a_{m_0,m_1} (h_1[m_1] − h_2[m_1])    (3)

Both H_1 and H_2 are M-dimensional histograms; h_1[m] and h_2[m] are the frequencies of elements in an arbitrary bin m of histograms H_1 and H_2, respectively, and a_{m_0,m_1} is an element of a bin-distance matrix. These three distance functions are used differently, depending on the type of application and the features used. The Euclidean distance function (Eq. 1) compares only the identical bins of the respective histograms; all bins contribute equally to the distance, and differences in the features between two bins of a histogram are ignored. This function provides fast computation and good results for the linear-features' histogram approach. The histogram quadratic distance function (Eq. 3) uses a bin-distance matrix to evaluate the similarity between two histograms. This function is suitable for the non-linear-features' histogram approach; note that if it is applied to the linear-feature approach, the result is equivalent to that of the Euclidean distance function. It also takes quadratic computation time. The histogram intersection distance function (Eq. 2) compares only the elements that exist in the query. This function is appropriate for searching with a query that is smaller than the data, because it can eliminate the effect of the background; therefore, this distance function is mainly used in this paper.

3.2. Search approaches

In this paper, all query types are "query-by-example (QBE)." The QBE approach searches for data that are similar to an input query; the query has to be of the same datatype as the data in the database. An example of a QBE query is: "Which audio clips in an audio database are similar to an audio query clip?" QBE can be further divided into three subtypes: 1) datatype-based search; 2) sub-datatype-based search; and 3) object-based search. The datatype-based search looks for data items that are globally similar to the query, regardless of the scale of the query.
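The three distance functions defined in Eqs. (1) through (3) can be sketched minimally as follows, assuming histograms are plain lists of bin frequencies and the bin-distance matrix A is a list of lists:

```python
# Minimal sketches of the three histogram distance functions.

def d_euclidean(h1, h2):
    """Eq. (1): compares identical bins only."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))

def d_intersection(h1, h2):
    """Eq. (2): compares only elements present in both histograms."""
    common = sum(min(a, b) for a, b in zip(h1, h2))
    return 1 - common / min(sum(h1), sum(h2))

def d_quadratic(h1, h2, A):
    """Eq. (3): weighs cross-bin differences by bin-distance matrix A."""
    diff = [a - b for a, b in zip(h1, h2)]
    return sum(diff[i] * A[i][j] * diff[j]
               for i in range(len(diff)) for j in range(len(diff)))
```

With the identity matrix as A, `d_quadratic` reduces to `d_euclidean`, matching the note above about the linear-feature case.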
In other words, all records in a database must be re-sampled into a unified format; for instance, all images in a database must be scaled to a unified size. An example of a query using datatype-based search is: "Which data items have a blue-sky background on the upper part and a green background on the lower part, regardless of the sizes of the pictures?" A
query using the sub-datatype-based approach searches for "parts of data" that are similar to the query. An example of a sub-datatype-based query is: "List all audio clips in which a portion of each contains sounds of gunshots, where all other parts of the clips are ignored." The object-based approach generalizes the datatype-based and sub-datatype-based approaches; a query using the object-based approach searches for a part of the data that is globally similar to the query, regardless of scale. An example of an object-based query is: "List all images which contain a red-cross sign, regardless of the sign sizes and their positions in the images." The object-based search approach is actually a major problem in the field of pattern recognition, which is beyond the scope of this paper.

3.3. Retrieval output specification

Because the similarity search is based on finding the minimal distances between the query and each of the records, the computation time grows with the size of the database. Usually, users do not require a ranking of the whole database, only some of the best matches. They can limit the number of outputs in two ways: by a distance threshold or by a given number of querying outputs [10]. Each approach affects performance differently. Given a distance threshold, users will get more promising results, but the searching time is unpredictable; moreover, a bad threshold value may return many useless results. Given a limit on the number of outputs, users may lose some good results if many results have the same distance and some are cut off by the limit. In this paper, all querying results are limited by a given number of requested results.
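Limiting the output by a requested number of results, as this paper does, can be sketched as a standard bounded selection over the computed distances (the record names below are invented for illustration):

```python
# Sketch: limiting query output by a requested number of results.
import heapq

def best_matches(distances, num_outputs):
    """Return the num_outputs (record_id, distance) pairs with the
    smallest distances. Unlike a threshold cutoff, the result size
    (and hence the reporting cost) is predictable."""
    return heapq.nsmallest(num_outputs, distances.items(),
                           key=lambda item: item[1])

dists = {"clip1": 0.40, "clip2": 0.05, "clip3": 0.20}
print(best_matches(dists, 2))   # -> [('clip2', 0.05), ('clip3', 0.2)]
```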
4. The k-tree representation of multimedia datatypes and features

4.1. The k-tree structure

The k-tree model for multimedia data is used to unify the various data characteristics. A k-tree is a directed graph with a balanced structure in which each node has 2^k incoming edges and one outgoing edge. A k-tree can be a binary tree, a quadtree, or an octree for one-, two-, or three-dimensional data, respectively. The use of the k-tree provides three main benefits. First, the spatio-temporal information of the data is embedded in the tree structure; this reduces the time spent processing JOIN operations in cross-media
querying.¹ Second, a k-tree can exploit multiresolution processing; this allows the user to optimize the retrieval time and search accuracy. Third, the complexity of the data structure affects only the degree k of the tree; consequently, an algorithm for a particular type of feature can be reused for a feature of other multimedia datatypes.

4.2. Feature-based histogram k-tree model

Since a histogram does not include any spatial information, the k-tree structure is used to retain location information, and the histogram is used to store the characteristics of each portion of the data that corresponds to that particular part of the tree. This generalized model is depicted in Figure 1. First, the feature of interest is extracted by either general mathematical models or special pattern recognition methods; this process is called feature extraction. Second, the universe of datatypes is reduced into a smaller finite set, and each data record in the database is mapped to fit into this finite set; this process is called feature quantization. For instance, for the color feature, the infinite number of colors in nature has to be reduced into a finite number, say 256 colors. Third, if the size of a data item is not a power of two, then dummy data are virtually added in each dimension to make the size of each dimension the lowest power-of-two number that is greater than the largest dimension of the data. Then a histogram is constructed, and the k-tree of that features' histogram is formed; this process is called feature k-tree generation.
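The generation steps above can be sketched for one-dimensional (k = 1) data as follows. The helper names are ours, quantization is assumed already done (feature values are small integers), and a dummy leaf contributes an empty histogram:

```python
# Sketch: feature k-tree generation for 1-D (k = 1) data.

def pad_to_power_of_two(seq, dummy=None):
    """Step 3: virtually add dummies until the length is a power of two."""
    n = 1
    while n < len(seq):
        n *= 2
    return seq + [dummy] * (n - len(seq))

def build_ktree(leaves, num_bins):
    """Build a binary tree bottom-up; each internal node stores the
    histogram of the (non-dummy) leaves beneath it, formed by summing
    its two children's histograms."""
    level = [[0] * num_bins if v is None else
             [1 if b == v else 0 for b in range(num_bins)]
             for v in leaves]
    tree = [level]
    while len(level) > 1:
        level = [[x + y for x, y in zip(level[i], level[i + 1])]
                 for i in range(0, len(level), 2)]
        tree.append(level)
    return tree   # tree[-1][0] is the root histogram

leaves = pad_to_power_of_two([0, 0, 1])      # pads to length 4
tree = build_ktree(leaves, num_bins=2)
print(tree[-1][0])   # root histogram: two 0-values, one 1-value -> [2, 1]
```

For k = 2 the same idea applies with quadtree nodes summing four children.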
5. Search algorithms on the k-tree structure

5.1. Datatype-based search algorithm

In the "datatype-based search" approach, both the data in the multimedia database and the query must be normalized to a unified format. Normalized data have a similar sampling rate and a similar number of samples. This is useful for applications that require a search which disregards the scale of the data. The cost of normalizing the data is high, but it can be done offline. Since the index is kept in the histogram k-tree structure, the histogram is normalized instead of the data itself. By exploiting the hierarchical structure of

¹ A "cross-media" query is a composite query composed of two or more subqueries of different datatypes. An example of a cross-media query is "find a video clip of a governor giving a speech about car tax." In this case, if motion pictures and audio data are processed separately, a database JOIN operation is needed to merge results from both subqueries.
Figure 1. Generalized indexing/retrieving model (original multimedia datatypes such as video, images, and audio pass through "feature" extraction, quantization against known features' tables, dummy addition, and per-feature k-tree construction for features 1 through N; the generated k-trees of multiple features' histograms form the database used for indexing/retrieval)
the trees, the search is improved by selecting the optimal level of the k-tree, which is the level furthest from the leaves that can still distinguish the data [7]. The nearer the level is to the leaves, the more accurate the search and the longer it takes. The pseudocode of the datatype-based search is given in Algorithm 4 (in the Appendix).

5.2. Sub-datatype-based search algorithms

A generalized algorithm for searching for content inside a piece of multimedia data is presented. This algorithm searches the data items in a multimedia database and finds those that contain the same content as the incoming query. The algorithm used for iterating the comparison between the query and each data item in the database is shown in Algorithm 5 (in the Appendix), and an example brute-force sub-datatype-based search algorithm is shown in Algorithm 6 (in the Appendix). The newly proposed algorithm is mainly intended to replace the slow brute-force and Partial-Matching algorithms [12]. Algorithm 1 details a restricted version of the proposed search algorithm, called the Virtual-Node (VN) algorithm, in which each dimension of the data and the query must have a power-of-two size (the data size and query size need not be equal).
Algorithm 1: The Virtual-Node (VN) sub-datatype-based search algorithm
VirtualNodeComparison (In FeatureOfQuery, In FeatureOfData, Out Distance, Out Location) BEGIN
Case A) The query's k-tree aligns within the k-tree structure of the data:
1. Using the histogram intersection distance function, find the feature distances between the features in the root of the query tree and the features in the nodes of the data at level Li−1 (nodes with solid-line links in Figure 2). If the distance at a child node is equal to the distance between the query and its parent (at Li), the query may be found within that child node.
2. Repeat Case A) recursively on this child node. If no distance at level Li−1 is close to the distance at the parent, the query is "not aligned"; follow Case B) below.

Case B) The query falls in between two (or more) nodes:
1. If no real node of the k-tree (darker nodes in Figure 2) can be a candidate for the search at a deeper level of the tree, virtual nodes (dotted-line nodes in Figure 2) between two nodes have to be created from parts of their child nodes. The features of a virtual node are generated by summarizing (summing) the features of its child nodes.
2. Repeat the whole algorithm on the new tree under the new virtual node; i.e., use the whole algorithm within the dashed box in Figure 2.

Case C) The height of the query is equal to that of a node:
1. Use Algorithm 4 to calculate the distance, disregarding any dummy information, and then
2. Return the distance and location.
END
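The virtual-node creation step of Case B can be sketched for k = 1 as follows. Here "summarizing" child features is taken to mean summing child histograms, and all names are illustrative, not the paper's own code:

```python
# Sketch of Case B's virtual-node creation (k = 1): when the query
# straddles two sibling nodes, the virtual node's histogram is formed
# from the right half of the left sibling plus the left half of the
# right sibling.

def make_virtual_node(left_children, right_children):
    """Each argument is the [left_half, right_half] pair of child
    histograms of one real sibling; the virtual node covers the span
    between the two siblings."""
    right_of_left = left_children[1]
    left_of_right = right_children[0]
    return [a + b for a, b in zip(right_of_left, left_of_right)]

# Siblings covering leaves (A, A) and (B, B), histograms over [A, B]:
left = [[1, 0], [1, 0]]    # child histograms of the left sibling
right = [[0, 1], [0, 1]]   # child histograms of the right sibling
print(make_virtual_node(left, right))   # -> [1, 1], i.e. covers (A, B)
```

For general k, a virtual node is assembled from the 2^k child fragments adjacent to the boundary in the same way.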
An example of a search using the virtual node algorithm is also illustrated in Figure 2. The root node of the query is compared with the root node of the data to determine whether the query is a part of the data in the tree under node (1) at level 3. Suppose the result shows
that the target is within the tree under node (1); the comparison between the root of the query and the lower-level nodes then continues using the nodes at level 2, nodes (2) and (3). If the result at level 2 shows that the query is under node (2) and not under node (3), then the subtree under node (3) is ignored and the process continues with nodes (4) and (5) at level 1. If the results from both nodes (4) and (5) do not indicate a match below, a virtual node (6) is generated dynamically and then compared with the query. Here, the heights of the data and query are equal; therefore, Algorithm 4 is applied to find the final distance. In this example, the result is found under node (6). The Virtual-Node algorithm is extended to the Generalized Virtual-Node (GVN) algorithm, described in Algorithm 2, which allows searching the data content where the sizes of the query and the data need not be equal or a power of two.
Figure 2. Illustration of searching using the virtual-node algorithm (the query is compared against the data tree from level 3 down to level 0: real nodes (1) through (5), a dynamically generated virtual node (6), and the search target under it; real and virtual nodes are distinguished in the figure)
Algorithm 2: The Generalized Virtual-Node (GVN) sub-datatype-based search algorithm

GeneralizedVirtualNodeComparison (In Query, In FeatureOfExtendedData, Out Distance, Out Location)
1.  BEGIN
2.  ExtendedQuery = AddDummies(Query)
3.  FeatureOfExtendedQuery = FeatureExtraction(ExtendedQuery)
4.  VirtualNodeComparison(FeatureOfExtendedQuery, FeatureOfExtendedData, ROOT, distance1, TentativeLocation1)
5.  IF (distance1 < threshold1) THEN BEGIN
6.    Find "QueryRepresentative," the largest node in the k-tree of FeatureOfQuery that includes no parts of dummies.
7.    VirtualNodeComparison(QueryRepresentative, FeatureOfExtendedData, TentativeLocation1, distance2, TentativeLocation2)
8.    IF (distance2 < threshold2) THEN BEGIN
9.      Find the final distance by calculating the distance between the query and the area of data beginning at TentativeLocation2.
10.     Distance = distance2
11.     Location = TentativeLocation2
12.     RETURN
13.   END
14. END
15. END
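The two-threshold control flow of Algorithm 2 can be sketched with the two VirtualNodeComparison passes stubbed out as callables; every name and value below is an illustrative stand-in, not the paper's implementation:

```python
# Sketch of Algorithm 2's two-stage filtering: a cheap pass over the
# dummy-padded query prunes candidates (threshold1) before the
# dummy-free QueryRepresentative refines the location (threshold2).

def gvn_search(query_compare, representative_compare,
               threshold1, threshold2):
    distance1, loc1 = query_compare()                # padded-query pass
    if distance1 >= threshold1:
        return None                                  # pruned early
    distance2, loc2 = representative_compare(loc1)   # refinement pass
    if distance2 >= threshold2:
        return None
    return distance2, loc2

# Stub comparisons standing in for VirtualNodeComparison:
result = gvn_search(lambda: (0.1, 4),
                    lambda loc: (0.05, loc + 1),
                    threshold1=0.3, threshold2=0.2)
print(result)   # -> (0.05, 5)
```

The early return on threshold1 is what keeps the expensive refinement from running on records that cannot contain the query.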
An example of searching one-dimensional data using the generalized virtual-node algorithm is illustrated in Figure 3. Suppose a query is a sequence of three consecutive feature nodes, where the feature has only two values (A and B) in its universe. A dummy node (D) is appended to the query's feature sequence, and the padded sequence is used to construct a binary tree (k = 1) of histograms. Similarly, a 6-node-long feature sequence of a search target is appended with two dummy nodes, and a binary tree of the search target is constructed, as shown in Figure 3 (B). All comparisons use the histogram intersection function as the distance function. The search begins at (1) by comparing the root of the query (Q1) with the root of the search target. Suppose the comparison at node (1) shows that the target is located within the tree under node (1); Q1 is then compared with nodes (2) and (3). The comparison at node (2) shows that the target should be under node (2). To obtain more precision, the VN algorithm is applied to the parts of the query's tree that contain no dummies. In Figure 3 (A), the search continues using the subquery Q2, the largest subtree of the query that has no dummy node. The search continues at nodes (4) and (5). Since node (5) is a possible result, the second-largest subtree (Q3) is used for comparison at the location next to node (5). The comparison at node (6) does not give a good result; thus a virtual node (7), shown in Figure 3 (E), is generated. When another possible result is found at node (7), the subquery Q3 is compared again at node (8). Finally, the position of the target is found.
Figure 3. Illustration of searching using the generalized virtual-node algorithm (panels A through F: the query trees Q1 (3A), Q2 (2A), and Q3 (A) are compared against the dummy-padded target tree at nodes (1) through (8); feature-A, feature-B, dummy, real, and virtual nodes are distinguished in the figure)

Time complexity analysis

Let k be the number of spatio-temporal dimensions of the data, S_d the average size of all records in the database, and S_q the size of the query. (The size is the number of leaf nodes in a query or data-record tree.) The total time complexity to compare two records is given in the following discussion. For the brute-force algorithm (Algorithm 6 in the Appendix), the time complexity is (S_d^{1/k} − S_q^{1/k} + 1)^k. For the partial-matching algorithm [12], the time complexity is (S_d/S_q) + 3^k. For the virtual-node algorithm, the time complexity is 3^k log_{2^k}(S_d/S_q), plus the time for generating the virtual nodes, which is (3^k − 2^k) log_{2^k}(S_d/S_q). The brute-force algorithm has linear time complexity: it grows with S_d. The new algorithm has time complexity O(log(S_d/S_q)). However, as k grows, the constant term may affect the processing time more.
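To illustrate the gap numerically, the following sketch evaluates the brute-force count and the combined virtual-node terms for k = 1, S_d = 2^20, and S_q = 2^10; the arithmetic is our own illustration, not a figure from the paper:

```python
# Comparing per-record comparison counts for k = 1:
# brute force: (S_d^(1/k) - S_q^(1/k) + 1)^k
# VN: 3^k * log_{2^k}(S_d/S_q) + (3^k - 2^k) * log_{2^k}(S_d/S_q)
import math

def brute_force_ops(s_d, s_q, k=1):
    return round((s_d ** (1 / k) - s_q ** (1 / k) + 1) ** k)

def vn_ops(s_d, s_q, k=1):
    levels = math.log(s_d / s_q, 2 ** k)
    return round(3 ** k * levels + (3 ** k - 2 ** k) * levels)

print(brute_force_ops(2 ** 20, 2 ** 10))   # about one million comparisons
print(vn_ops(2 ** 20, 2 ** 10))            # a few dozen: logarithmic
```

The roughly five-orders-of-magnitude difference is consistent with the retrieval-time gap reported in the experiments below.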
6. Experiments

The GVN algorithm has been tested on audio and image databases. Both databases were built from data downloaded from several sites, such as the Kodak homepage [1], the Sunsite archive at the University of North Carolina [15], and the Smithsonian Institute [11], as well as from commercial CD-ROMs. Index tables for both the audio and image databases are created off-line. The experiments are concerned primarily with two issues: the quality of the search results and the retrieval times. Two metrics for measuring the effectiveness of the retrieval are recall and precision.

6.1. Measurement of effectiveness

Recall and precision are measured as follows. A selected target clip is imposed onto a random location in every record in the database. The same clip is then used as the input query to search that database. Ideally, all records should be retrieved. If a record is not retrieved, it is counted as a miss. For the records in the result list, if the position of a retrieved result is within the length (or size) of the query from the imposed position, the record is counted as a hit; otherwise it is counted as a false alarm. Let H be the number of hits, M the number of misses, and F the number of false alarms. Recall and precision are originally defined as Recall = H / (H + M) and Precision = H / (H + F) [10]. Because the "hits" are determined from query-to-data distances that fall below a selected threshold, both recall and precision depend on that threshold value. In this paper, we introduce new definitions of recall and precision that are less sensitive to the threshold. Let d_i be the distance between the query and hit i. The new recall and precision are:

ModifiedRecall = [Sum_{i=0}^{H} (1 - d_i)] / [Sum_{i=0}^{H} (1 - d_i) + M], and
ModifiedPrecision = [Sum_{i=0}^{H} (1 - d_i)] / [Sum_{i=0}^{H} (1 - d_i) + F].
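A minimal Python sketch of the modified measures (the function name is ours; distances are assumed to be normalized to [0, 1]):

```python
def modified_recall_precision(hit_distances, misses, false_alarms):
    # Each hit i contributes (1 - d_i), so close matches weigh more
    # than borderline ones; misses and false alarms contribute nothing.
    s = sum(1.0 - d for d in hit_distances)
    recall = s / (s + misses)
    precision = s / (s + false_alarms)
    return recall, precision

# Three hits at distances 0.1, 0.2, 0.5, with one miss and one false alarm:
r, p = modified_recall_precision([0.1, 0.2, 0.5], misses=1, false_alarms=1)
```

With a zero threshold effect (all d_i = 0), the measures reduce to the classical H / (H + M) and H / (H + F).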
jirsta.tex; 15/03/1999; 12:37; p.14
6.2. An audio retrieval system

Experiments were performed to search audio data using the GVN, PM, and brute-force algorithms. The audio database used in this experiment consists of several hundred audio files with a total length of approximately 3.5 hours. Every audio clip has the same 8-kHz sampling rate, but the clips may differ in length. The results of each query are a list of audio clip identifications with their distances. To measure recall and precision, a one-second-long audio clip is selected as the query. The selected clip is imposed onto every audio clip in the database at a randomly chosen position, which is kept as a reference; the imposed clips form a new test database. The index of this database is generated using the "audio-amplitude histogram" feature. The audio-amplitude histogram is constructed by counting the number of pulse-code modulation (PCM) samples at each amplitude value in a particular part of an audio clip. The k-tree index, which in this case is a binary tree of amplitude histograms, is then constructed. The GVN, brute-force, and PM algorithms are used to determine the retrieval times, recall, and precision. This process is iterated several times with different query inputs, and the measured results are averaged over all runs for each of the GVN, brute-force, and PM algorithms. The experimental results are presented in Figure 4, where the horizontal axis is the number of outputs requested by the user. The results show that retrieval using the GVN algorithm is much faster than using the PM and brute-force algorithms, while the differences in recall and precision are small. (The retrieval time of the brute-force algorithm grows polynomially.)

6.3. An image retrieval system

The experiments performed on an image database demonstrate that the same model used in the audio retrieval system can be applied to other multimedia datatypes. The "color histogram" feature is used to generate the index for the image database. The color histogram is constructed by counting the number of pixels of each color in a particular area; the k-tree in this case is a quad tree of color histograms. The image database used in this experiment consists of over 30,000 photographic pictures, with a total size of over 1 gigabyte. Picture sizes in the database range from 320 x 200 pixels to 1200 x 1200 pixels. Recall, precision, and retrieval times are measured by the same method used in the audio retrieval experiment above. The results are shown in Figure 5; notice that they exhibit the same behavior as the experimental results from audio retrieval.
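Both indexes rest on the same histogram feature, differing only in what is counted (PCM amplitude values vs. pixel colors). A simple Python sketch of that shared machinery; the function name and the binning choices are our own illustration:

```python
def value_histogram(samples, num_bins, value_range):
    # Count how many samples fall into each of num_bins equal-width bins.
    # For audio, samples are PCM amplitude values; for images, quantized
    # pixel colors -- the same feature machinery serves both datatypes,
    # which is the point of the unified retrieval model.
    lo, hi = value_range
    width = (hi - lo) / num_bins
    bins = [0] * num_bins
    for s in samples:
        i = min(int((s - lo) / width), num_bins - 1)
        bins[i] += 1
    return bins

# An 8-bit PCM audio fragment reduced to a 16-bin amplitude histogram:
hist = value_histogram([0, 10, 10, 200, 255], num_bins=16, value_range=(0, 256))
```

In a k-tree index, one such histogram would be stored per node, each computed over that node's spatial or temporal region of the data.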
Figure 4. Experimental results on an audio database; (A) Response time, (B) Recall, and (C) Precision
Figure 5. Experimental results on an image database; (A) Response time, (B) Recall, and (C) Precision

The qualitative results are also illustrated by the first 100 pictures of the query results, shown in Figure 6. The examples do not all have the same size; they are scaled uniformly for display. The box in each example marks the best position in the retrieved picture, i.e., the region whose content is most similar to the query. Note that, because the k-tree embeds the spatial information in the tree, the retrieved pictures mostly have sky blue in the upper parts and dark green in the lower parts.
Figure 6. Querying results using the color feature. (The square box in each example marks the position in the picture whose content is similar to the query. The picture at the top-left corner is the query; the others are results, with accuracy decreasing left-to-right, top-to-bottom.)
7. Summary

The "generalized virtual node (GVN)" algorithms for sub-datatype-based multimedia data retrieval have been presented. These algorithms extend uniformly to other types of multimedia data with little effort, since they are developed on a unified multimedia retrieval model. The experimental search results on the audio and image databases show that the proposed algorithms are faster than earlier algorithms, such as the brute-force and Partial Matching algorithms, while the accuracy remains acceptable.
References

1. Eastman Kodak: 1999, 'Company homepage,' available URL: http://www.kodak.com.
2. Grosky, W. I. and Mehrotra, R.: 1989, 'Special issue on image database management,' IEEE Computer, Vol. 22, No. 12, December 1989.
3. Gudivada, V. and Raghavan, V.: 1995, 'Special issue on content-based image retrieval systems,' IEEE Computer, Vol. 28, No. 9, September 1995.
4. Kemp, Z.: 1995, 'Multimedia and spatial information systems,' IEEE Multimedia, Vol. 2, No. 4, Winter 1995.
5. MIT Media Lab: 1999, 'Vision Texture database,' available URL: http://wwwwhite.media.mit.edu/vismod/imagery/VisionTexture.
6. Pfeiffer, S., Fischer, S., and Effelsberg, W.: 1996, 'Automatic audio content analysis,' Proceedings of Multimedia '96, Boston, Massachusetts, 1996, pp. 21-30.
7. Piamsa-nga, P. and Alexandridis, N. A.: 1999, 'A universal model for content-based multimedia retrieval,' International Journal of Computers and Their Applications, March 1999.
8. Piamsa-nga, P., Srakaew, S., Blankenship, G., Papakonstantinou, G., Tsanakas, P., and Tzafestas, S.: 1998, 'A parallel algorithm for multi-feature content-based multimedia data retrieval,' Seventh International Conference on Intelligent Systems (ICIS 98), Paris, France, July 1-3, 1998.
9. Piamsa-nga, P., Subramanya, S. R., Alexandridis, N. A., Srakaew, S., Blankenship, G., Papakonstantinou, G., Tsanakas, P., and Tzafestas, S.: 1998, 'Content-based audio retrieval using a generalized algorithm,' Advances in Intelligent Systems: Concepts, Tools, and Applications, Kluwer Academic, 1998.
10. Smith, J. R.: 1997, 'Integrated spatial and feature image systems: retrieval, analysis, and compression,' Ph.D. thesis, Columbia University, 1997.
11. Smithsonian Institute: 1999, 'Online collection of pictures,' available URL: ftp://photo1.si.edu/.
12. Subramanya, S. R., Piamsa-nga, P., Alexandridis, N. A., and Youssef, A.: 1998, 'A scheme for content-based image retrievals for unrestricted query formats,' International Conference on Imaging Science, Systems and Technology (CISST '98), Las Vegas, Nevada, July 1998.
13. Subramanya, S. R., Simha, R., Narahari, B., and Youssef, A.: 1997, 'Transform-based indexing of audio data for multimedia databases,' International Conference on Multimedia Computing and Systems, Ottawa, Ontario, Canada, June 3-6, 1997.
14. Swain, M. J. and Ballard, D. H.: 1991, 'Color indexing,' International Journal of Computer Vision, Vol. 7, No. 1, 1991.
15. University of North Carolina: 1999, 'Sunsite FTP archive,' available URL: http://sunsite.unc.edu/pub/multimedia.
16. Wold, E., Blum, T., Keislar, D., and Wheaton, J.: 1996, 'Content-based classification, search and retrieval of audio data,' IEEE Multimedia, Vol. 3, No. 3, pp. 27-36, Fall 1996.
Appendix

Algorithm 3: Datatype-based search
DatatypeBasedSearch(In Query, In FeatureOfData[n], In kTreeLevel,
                    Out Record (Distance[n], Data[n]))
1.  Begin
2.    FeaturesOfQuery = FeatureExtraction(Query)
3.    For i = 0 to NumberOfNodesAt(kTreeLevel) do
4.      NormalizedFeaturesOfQuery = FeatureNormalization(FeaturesOfQuery)
5.    End for
6.    For i = 0 to n do
7.      Record.Distance[i] = DatatypeBasedDistance(NormalizedFeaturesOfQuery,
          FeatureOfData[i], kTreeLevel)
8.      Record.Data[i] = Data[i]
9.    End for
10.   Sort(Record(Distance[n], Data[n]))
11. End
Algorithm 4: Datatype-based distance function
Out DatatypeBasedDistance(In NormalizedFeaturesOfQuery, In FeatureOfData,
                          In kTreeLevel)
1. Begin
2.   Distance = 0
3.   For i = 0 to NumberOfNodesAt(kTreeLevel) do
4.     NormalizedFeaturesOfData = FeatureNormalization(FeatureOfData)
5.     X = HistogramDistance(FeatureAtNode(i, NormalizedFeaturesOfQuery),
           FeatureAtNode(i, NormalizedFeaturesOfData))
6.     Distance = Distance + (X * X)
7.   End for
8.   Return sqrt(Distance)
9. End
Algorithm 5: Sub-Datatype-based search
SubDatatypeBasedSearch(In Query, In kTreeLevelQuery, In FeatureOfData[n],
                       In kTreeLevelData[n], Out Record (Distance[n], Data[n]))
1. Begin
2.   FeatureOfQuery = FeatureExtraction(Query)
3.   For i = 0 to n do
4.     Record.Distance[i] = BruteforceSubDatatypeBasedDistance(FeatureOfQuery,
         kTreeLevelQuery, FeatureOfData[i], kTreeLevelData[i])
5.     Record.Data[i] = Data[i]
6.   End for
7.   Sort(Record(Distance[n], Data[n]))
8. End
Algorithm 6: Brute-force sub-datatype-based distance function
Out BruteforceSubDatatypeBasedDistance(In FeatureA, In kTreeLevelA,
                                       In FeatureB, In kTreeLevelB)
1.  Begin
2.    NumberOfNodesLevelA = NumberOfNodesAtLevel(kTreeLevelA)
3.    NumberOfNodesLevelB = NumberOfNodesAtLevel(kTreeLevelB)
4.    NumberOfComparisons = NumberOfNodesLevelB - NumberOfNodesLevelA + 1
5.    Distance = Infinity
6.    For i = 0 to NumberOfComparisons do
7.      For j = 0 to NumberOfNodesLevelA do
8.        Distance = min(Distance, HistogramDistance(FeatureAtNode(j, FeatureA),
            FeatureAtNode(i + j, FeatureB)))
9.      End For
10.   End For
11.   Return Distance
12. End
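For concreteness, here is one plausible reading of the brute-force sub-datatype distance in Python: slide the query's node sequence across the data's node sequence and keep the best window. We aggregate the node distances within each window before taking the minimum (a sketch under our own assumptions; all names are ours):

```python
import math

def histogram_distance(h1, h2):
    # Euclidean distance between two equally sized histograms.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def brute_force_sub_distance(query_nodes, data_nodes):
    # Try every offset of the (shorter) query node sequence against the
    # data node sequence; return the smallest aggregate window distance.
    n_q, n_d = len(query_nodes), len(data_nodes)
    best = math.inf
    for offset in range(n_d - n_q + 1):
        d = sum(histogram_distance(query_nodes[j], data_nodes[offset + j])
                for j in range(n_q))
        best = min(best, d)
    return best

# A one-node query matches the second data node exactly, so the distance is 0:
d = brute_force_sub_distance([[1, 0]], [[0, 1], [1, 0]])
```

Note the accumulator must start at infinity, not zero: a minimum initialized to zero could never register any positive distance.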
Authors' Vitae

Punpiti Piamsa-nga is a Lecturer in the Department of Computer Engineering, Kasetsart University, Thailand. He received his B.Eng. and M.Eng. in Electrical Engineering from Kasetsart University in 1989 and 1993, respectively, and the D.Sc. in Computer Science from the Department of Electrical Engineering and Computer Science, The George Washington University, Washington D.C., USA, in 1999. His current research interest is content-based multimedia data retrieval on parallel systems. He has also been involved in the fields of parallel computing, computer architecture, multimedia, natural language processing, and computer-aided collaborative work.

Nikitas A. Alexandridis is Professor of Computer Engineering in the Department of Electrical Engineering and Computer Science, The George Washington University, Washington D.C. He received his B.S. degree in Electrical Engineering (summa cum laude and with honors in Engineering) from Ohio University in 1966, and the M.S. and Ph.D. degrees in Computer Science from UCLA in 1967 and 1971, respectively. During the period 1974-84 he was a Professor in Greece: between 1974-78 he held the first Chair of Computer Science established in that country, in the School of Engineering at the University of Patras, while between 1978-84 he held the Chair of Digital Systems and Computers at the National Technical University of Athens. He has been a Professor at GWU since 1984. He has also taught as a visiting Professor at UCLA, Ohio University, and the Universitat Autonoma de Barcelona. He has extensive consulting experience both in the USA and abroad, has presented numerous seminars, invited talks, and short courses, has over 100 technical publications, and has published 12 textbooks in Greek and English. Professor Alexandridis has received a number of awards, is listed in several "who's who" directories, has held elected positions in technical societies, has served on the editorial boards of technical journals, and is the founder and first president of the Greek Computer Society.
His research interests include: parallel and distributed computing; high-performance heterogeneous systems; advanced microprocessors and computer architectures; adaptable parallel supercomputing systems; and parallel computer vision, image processing, and content-based multimedia retrieval.