ITE Trans. on MTA Vol. 4, No. 1, pp. 49-59 (2016)
Copyright © 2016 by ITE Transactions on Media Technology and Applications (MTA)
Paper

Accurate and Efficient Extraction of Hierarchical Structure of Web Communities for Web Video Retrieval

Ryosuke Harakawa (student member)†, Takahiro Ogawa (member)†, Miki Haseyama (member)†

Abstract
This paper presents an accurate and efficient method for extracting hierarchical structure of Web communities,
i.e., Web video sets with similar topics, for Web video retrieval. First, an efficient canonical correlation analysis (CCA), named sub-sampled CCA, is derived to obtain link relationships that represent similarities between latent features of Web videos. Moreover, the obtained link relationships enable application of an algorithm based on recursive modularity optimization to extract the hierarchical structure of Web communities. Unlike previously reported methods, our method can extract the hierarchical structure for the whole target dataset since the algorithm recursively reduces its processing targets. Screening of Web videos therefore becomes unnecessary, and we can avoid the performance degradation caused by discarding relevant Web videos in the screening, which occurred in previously reported methods. Consequently, our method enables extraction of the hierarchical structure with high accuracy as well as low computational cost.
Key words: hierarchical structure, Web community, Web video, video retrieval.
1. Introduction

With the proliferation of digital contents including Web videos∗1), more and more users are retrieving Web videos to access their desired contents. Usually, users retrieve Web videos by inputting queries that identify the desired contents into a retrieval system, which then returns retrieval results. However, if suitable queries cannot be input, the retrieval results may include contents irrelevant to the desired ones, and it consequently becomes difficult for users to access the desired contents from the retrieval results2). To overcome this difficulty, clustering-based retrieval methods3)-12) have been proposed. Since users can easily grasp an overview of retrieval results via the clustering results, easy access to the desired contents becomes feasible. In particular, hierarchical clustering-based methods7)-12) are useful for grasping the overview of retrieval results since relationships between the obtained clusters become clear. Some methods use visual features and enable retrieval of videos based on hierarchical exhibition of videos7)-9). Another method was proposed for retrieving Web videos based on exhibition of a hierarchical topic structure obtained via textual analysis and WordNet13)10). Also, to realize accurate Web video retrieval, we have proposed methods that extract hierarchical structure of Web communities by using not only a single modality but multiple modalities, i.e., visual, audio and textual features11)12). In this paper, we define Web communities as Web video sets with similar topics. Moreover, the hierarchical structure is defined as the property of Web communities being divided into their sub-communities. However, one of our previously proposed methods11) has a limitation in computational cost. The other method12) solves this problem by introducing a community graph that can handle many Web videos efficiently, but it needs to perform screening of Web videos to reduce computational cost. Thus, this method cannot realize both high accuracy of retrieval results and low computational cost simultaneously.

In this paper, we present a new method to extract hierarchical structure of Web communities for Web video retrieval, which solves the above problem. The proposed method enables extraction of the hierarchical structure with high accuracy as well as low computational cost by introducing a graph analysis algorithm based on recursive modularity optimization14). First, as motivated by our previous work12), we efficiently calculate canonical correlation analysis (CCA)15)-based link relationships that represent similarities between latent features of Web videos. Moreover, the obtained link relationships enable application of the graph analysis algorithm based on recursive modularity optimization14), which makes screening of Web videos unnecessary and enables accurate and efficient extraction of the hierarchical structure of Web communities. By exhibiting the obtained hierarchical structure to users, retrieval of the desired Web videos becomes feasible.

Finally, we explain the contributions of this paper. To the best of our knowledge, there has been no work that enables accurate and efficient hierarchical clustering via multimodal features for Web video retrieval. To realize effective retrieval of Web videos, which exist in large quantities, (1) efficient calculation of multimodal features and (2) fast and accurate hierarchical clustering are necessary. To realize (1), our previous method12), which enables efficient calculation of multimodal features and definition of a graph, is useful. To realize (2), it is undesirable to repeatedly calculate global features of the graph, whose computational cost is high, since screening of Web videos then becomes necessary and may cause performance degradation as in our previous method12). Rather, fast and accurate hierarchical clustering based only on the local structure of the graph is necessary, and the method14) is optimal for this purpose. However, since the method14) was proposed not for Web video retrieval but for general graph analysis, it cannot be directly introduced into Web video retrieval. Hence, our contributions are twofold. First, we enable application of the method14) to Web video retrieval by newly defining a graph as motivated by the method12). Note that we need to define an undirected graph to adopt the method14), whereas our previous work12) defines a directed graph. Thus, in this paper, we define an undirected graph that preserves information of edge directions (see Eq. (9)). Second, we show the effectiveness of introducing the method14) into Web video retrieval through experiments. In particular, although the method14) was not originally proposed for Web video retrieval, we show that its introduction successfully enables accurate and efficient extraction of the hierarchical structure for Web video retrieval.

This paper is organized as follows. An outline of our method is shown in Sec. 2. In Sec. 3, we show a method to efficiently derive CCA-based link relationships between Web videos. We explain a method to accurately and efficiently extract the hierarchical structure of Web communities for Web video retrieval in Sec. 4. In Sec. 5, we present results of experiments on actual Web videos to verify the accuracy and computational cost of our method. Concluding remarks are given in Sec. 6.

∗ In this paper, we call video materials on the Web "Web videos".

Received August 11, 2015; Revised September 11, 2015; Accepted September 28, 2015
† Graduate School of Information Science and Technology, Hokkaido University (Sapporo, Japan)
2. Outline of the Proposed Method

Our method consists of the following two phases.

Phase I: Derivation of CCA-based Link Relationships between Web Videos
We calculate CCA15)-based link relationships via visual, audio and textual features, which represent similarities between latent features of Web videos. Here, an efficient CCA, named sub-sampled CCA, is derived to obtain the canonical correlation for large sets of training pairs with low computational cost according to our previous work12).

Phase II: Accurate and Efficient Extraction of Hierarchical Structure of Web Communities
By constructing a graph whose nodes are Web videos based on the obtained link relationships, we enable application of the algorithm based on recursive modularity optimization14) to extract the hierarchical structure of Web communities. Here, screening of Web videos for reducing computational cost becomes unnecessary since the graph analysis algorithm enables recursive reduction of the target nodes. Thus, we can avoid the performance degradation caused by discarding relevant Web videos in the screening, which occurred in our previous work12). Finally, users can retrieve the desired Web videos by selecting Web communities associated with the desired contents according to the hierarchical structure.

The following sections show these details.
3. Phase I: Derivation of CCA-based Link Relationships between Web Videos

In Sec. 3.1, we define Web video features for deriving the link relationships. A scheme for obtaining the link relationships is shown in Sec. 3.2.

3.1 Definition of Web Video Features for Deriving Link Relationships

In this paper, we adopt visual, audio and textual features of Web videos according to our previous work12). First, we apply a shot segmentation method16) to each Web video $f_i$ ($i = 1, 2, \cdots, N$; $N$ being the number of Web videos) and obtain shots $s_i^{q_i}$ ($q_i = 1, 2, \cdots, M_i$; $M_i$ being the number of shots within $f_i$) from each Web video. Then, from each shot $s_i^{q_i}$, we calculate a visual feature vector $\mathbf{v}_i^{q_i}$, an audio feature vector $\mathbf{a}_i^{q_i}$ and a textual feature vector $\mathbf{t}_i^{q_i}$ as follows.

Visual Feature Vector ($p$ dimensions): The HSV color histogram with $p$ bins is calculated every $P_v$ frames of Web video $f_i$, and its vector is obtained∗∗. Then the vector median17) is calculated for the frames in each shot $s_i^{q_i}$, and the calculated vector is defined as the visual feature vector $\mathbf{v}_i^{q_i}$ $(= [v_i^{q_i}(1), v_i^{q_i}(2), \cdots, v_i^{q_i}(p)]^T)$. Thus, we obtain $M_i$ visual feature vectors $\mathbf{v}_i^{q_i}$ ($q_i = 1, 2, \cdots, M_i$) from each Web video $f_i$.

Audio Feature Vector (22 dimensions): Each shot $s_i^{q_i}$ in Web video $f_i$ is divided into clips∗∗∗, and we classify each clip into four audio classes, i.e., silence, speech, music and noise, based on a previously reported method18). Then the audio class that is most frequent among the clips within each shot $s_i^{q_i}$ is selected. For all clips classified into the selected audio class in each shot $s_i^{q_i}$, we calculate the averages and the standard deviations of the following 11 features:
• volume, zero-crossing rate, pitch, frequency centroid, frequency bandwidth, subband energy ratio (0-630 Hz, 630-1720 Hz, 1720-4400 Hz and 4400-11025 Hz), non-silence ratio and zero-ratio.
For each shot $s_i^{q_i}$, the audio feature vector $\mathbf{a}_i^{q_i}$ $(= [a_i^{q_i}(1), a_i^{q_i}(2), \cdots, a_i^{q_i}(22)]^T)$ is obtained by aligning the above obtained features. Thereby, $M_i$ audio feature vectors $\mathbf{a}_i^{q_i}$ ($q_i = 1, 2, \cdots, M_i$) are calculated from each Web video $f_i$.

Textual Feature Vector ($U$ dimensions): We obtain the keywords that appear in text attached to Web videos $f_i$ ($i = 1, 2, \cdots, N$) such as the title and description. Moreover, we apply TF-IDF19) to the keywords and obtain their weights. By aligning the obtained weights for each Web video, we obtain the vector $\boldsymbol{\eta}_i$ $(= [\eta_i(1), \eta_i(2), \cdots, \eta_i(K)]^T)$. Furthermore, by applying random projection20) to $\boldsymbol{\eta}_i$ ($i = 1, 2, \cdots, N$), their dimensions are reduced to $U$ ($< K$) by the following procedure. First, a $K \times U$ matrix $R = (r_{ij})$ is calculated as follows:

$$r_{ij} = \begin{cases} \sqrt{3} & \text{with probability } 1/6 \\ 0 & \text{with probability } 2/3 \\ -\sqrt{3} & \text{with probability } 1/6. \end{cases} \qquad (1)$$

Then a new $U$-dimensional vector $\mathbf{t}_i$ is calculated as follows:

$$\mathbf{t}_i = R^T \boldsymbol{\eta}_i, \quad i = 1, 2, \cdots, N. \qquad (2)$$

Thus, $M_i$ textual feature vectors $\mathbf{t}_i^{q_i}$ $(= \mathbf{t}_i)$ ($q_i = 1, 2, \cdots, M_i$) are obtained for each shot $s_i^{q_i}$ in a Web video $f_i$. In this way, the visual feature vector $\mathbf{v}_i^{q_i}$, audio feature vector $\mathbf{a}_i^{q_i}$ and textual feature vector $\mathbf{t}_i^{q_i}$ are calculated from each shot $s_i^{q_i}$ in the Web video $f_i$.

∗∗ In the experiments, we set $P_v$ so that the vectors were obtained once per second.
∗∗∗ For more details on the division into clips, refer to the paper12).
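To make the textual dimensionality reduction concrete, the following is a minimal sketch of the random projection20) of Eqs. (1) and (2); it assumes the TF-IDF weight vectors $\boldsymbol{\eta}_i$ are already stacked row-wise in a matrix, and all variable names are illustrative rather than from the paper.

```python
import numpy as np

def random_projection_matrix(K, U, seed=0):
    """K x U matrix R of Eq. (1): entries are +sqrt(3), 0, -sqrt(3)
    with probabilities 1/6, 2/3 and 1/6, respectively."""
    rng = np.random.default_rng(seed)
    return rng.choice([np.sqrt(3), 0.0, -np.sqrt(3)],
                      size=(K, U), p=[1/6, 2/3, 1/6])

# eta: N x K matrix whose i-th row is the TF-IDF weight vector eta_i.
N, K, U = 3000, 11894, 1000          # sizes of dataset 1 in Table 1
eta = np.abs(np.random.randn(N, K))  # placeholder for real TF-IDF weights
R = random_projection_matrix(K, U)
T = eta @ R                          # i-th row is t_i = R^T eta_i (Eq. (2))
print(T.shape)                       # (3000, 1000)
```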
3.2 Derivation of CCA-based Link Relationships

To obtain link relationships between Web videos, we calculate similarities between Web videos that enable comparison between different modalities via an efficient CCA15), named sub-sampled CCA, according to the reference12). Here, similarity calculation with low computational cost can be realized since we introduce a clustering-based sub-sampling scheme of training pairs for CCA. Specifically, we first perform k-means clustering21) on $\mathbf{v}_i^{q_i}$, $\mathbf{a}_i^{q_i}$ and $\mathbf{t}_i^{q_i}$ ($i = 1, 2, \cdots, N$, $q_i = 1, 2, \cdots, M_i$) for each modality. Then we obtain the cluster centers $\mathbf{v}_j^{\mathrm{cent}}$, $\mathbf{a}_j^{\mathrm{cent}}$ and $\mathbf{t}_j^{\mathrm{cent}}$ ($j = 1, 2, \cdots, N_{\mathrm{clus}}$; $N_{\mathrm{clus}}$ being the number of cluster centers) corresponding to visual, audio and textual feature vectors, respectively. Furthermore, from $\mathbf{v}_i^{q_i}$, $\mathbf{a}_i^{q_i}$ and $\mathbf{t}_i^{q_i}$ ($i = 1, 2, \cdots, N$, $q_i = 1, 2, \cdots, M_i$), we select the vectors with the shortest Euclidean distance to the obtained cluster centers $\mathbf{v}_j^{\mathrm{cent}}$, $\mathbf{a}_j^{\mathrm{cent}}$ and $\mathbf{t}_j^{\mathrm{cent}}$ ($j = 1, 2, \cdots, N_{\mathrm{clus}}$) for each modality. In this paper, the index set of the visual feature vectors selected from $\mathbf{v}_i^{q_i}$ ($i = 1, 2, \cdots, N$, $q_i = 1, 2, \cdots, M_i$) as most similar to the cluster centers $\mathbf{v}_j^{\mathrm{cent}}$ ($j = 1, 2, \cdots, N_{\mathrm{clus}}$) is denoted by $I_v$, and $|I_v|$ $(= N_{\mathrm{clus}})$ denotes its number of elements. In the same manner, we define $I_a$ and $I_t$ for the audio and textual feature vectors, respectively, where $|I_a| = |I_t| = N_{\mathrm{clus}}$. Then we denote the index set of vectors that become calculation targets for CCA by $I$ $(= I_v \cup I_a \cup I_t)$, and its number of elements is denoted by $N_{\mathrm{sub}}$ $(= |I|)$. Here, the selected vectors are denoted by $\mathbf{x}_i^v$, $\mathbf{x}_i^a$ and $\mathbf{x}_i^t$ ($i = 1, 2, \cdots, N_{\mathrm{sub}}$)∗4.

∗4 For a detailed algorithm of k-means clustering for CCA, refer to the paper12).
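As an illustration of the sub-sampling scheme, the sketch below selects, for a single modality, the feature vectors nearest to the k-means cluster centers; it is only a sketch under our own naming, using scikit-learn's KMeans for brevity (the detailed algorithm is in the reference12)).

```python
import numpy as np
from sklearn.cluster import KMeans

def subsample_by_kmeans(X, n_clus, seed=0):
    """Return the index set of rows of X that are closest in Euclidean
    distance to each of the n_clus k-means cluster centers."""
    km = KMeans(n_clusters=n_clus, random_state=seed, n_init=10).fit(X)
    idx = set()
    for center in km.cluster_centers_:
        d = np.linalg.norm(X - center, axis=1)
        idx.add(int(np.argmin(d)))
    return sorted(idx)   # plays the role of I_v (or I_a, I_t)

# X: all shot-level feature vectors of one modality stacked row-wise,
# e.g., the 9179 visual feature vectors of dataset 1.
X = np.random.randn(9179, 48)
I_v = subsample_by_kmeans(X, n_clus=1000)
```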
Next, we apply CCA to these selected vectors and obtain similarities between Web videos that enable comparison between different modalities. First, three kinds of matrices $X_v$, $X_a$ and $X_t$ are calculated by using $\mathbf{x}_i^v$, $\mathbf{x}_i^a$ and $\mathbf{x}_i^t$ as follows:

$$X_v = \left[ \mathbf{x}_1^v - \bar{\mathbf{x}}^v, \mathbf{x}_2^v - \bar{\mathbf{x}}^v, \cdots, \mathbf{x}_{N_{\mathrm{sub}}}^v - \bar{\mathbf{x}}^v \right]^T, \qquad (3)$$
$$X_a = \left[ \mathbf{x}_1^a - \bar{\mathbf{x}}^a, \mathbf{x}_2^a - \bar{\mathbf{x}}^a, \cdots, \mathbf{x}_{N_{\mathrm{sub}}}^a - \bar{\mathbf{x}}^a \right]^T, \qquad (4)$$
$$X_t = \left[ \mathbf{x}_1^t - \bar{\mathbf{x}}^t, \mathbf{x}_2^t - \bar{\mathbf{x}}^t, \cdots, \mathbf{x}_{N_{\mathrm{sub}}}^t - \bar{\mathbf{x}}^t \right]^T, \qquad (5)$$

where $\bar{\mathbf{x}}^v$, $\bar{\mathbf{x}}^a$ and $\bar{\mathbf{x}}^t$ are the average vectors of $\mathbf{x}_i^v$, $\mathbf{x}_i^a$ and $\mathbf{x}_i^t$, respectively. By using $X_k$ ($k \in \{v, a, t\}$), we calculate linear transformations that maximize correlations among visual, audio and textual features. Specifically, we calculate the matrices that project $X_k$ into the latent space of $X_l$ ($l \in \{v, a, t \mid k \neq l\}$).
As a result of CCA, we obtain coefficient matrices $W_k$ and correlation matrices $\Lambda_{k,l}$ in which each diagonal element is a canonical correlation coefficient∗5. Next, by using $W_k$, $\Lambda_{k,l}$ and $\mathbf{k}_i^{q_i}$ $(\in \{\mathbf{v}_i^{q_i} - \bar{\mathbf{x}}^v, \mathbf{a}_i^{q_i} - \bar{\mathbf{x}}^a, \mathbf{t}_i^{q_i} - \bar{\mathbf{x}}^t\})$, the vectors $\boldsymbol{\zeta}_{k,i}^{l,q_i}$, i.e., the projection results of $\mathbf{k}_i^{q_i}$ into the latent space of $\mathbf{l}_i^{q_i}$ $(\in \{\mathbf{v}_i^{q_i} - \bar{\mathbf{x}}^v, \mathbf{a}_i^{q_i} - \bar{\mathbf{x}}^a, \mathbf{t}_i^{q_i} - \bar{\mathbf{x}}^t\})$, are calculated as follows:

$$\boldsymbol{\zeta}_{k,i}^{l,q_i} = \begin{cases} W_k^T \mathbf{k}_i^{q_i} & \text{if } k = l \\ \Lambda_{l,k} W_k^T \mathbf{k}_i^{q_i} & \text{otherwise.} \end{cases} \qquad (6)$$

Hence, from $\mathbf{v}_i^{q_i}$, $\mathbf{a}_i^{q_i}$ and $\mathbf{t}_i^{q_i}$, we obtain the vectors $\boldsymbol{\zeta}_{v,i}^{l,q_i}$, $\boldsymbol{\zeta}_{a,i}^{l,q_i}$ and $\boldsymbol{\zeta}_{t,i}^{l,q_i}$ that can be compared between different kinds of features. In the experiments shown later, visual, audio and textual feature vectors are projected into the latent space of textual features by setting $l = t$ in Eq. (6). Note that the projection results of all feature vectors $\mathbf{v}_i^{q_i}$, $\mathbf{a}_i^{q_i}$ and $\mathbf{t}_i^{q_i}$ ($i = 1, 2, \cdots, N$, $q_i = 1, 2, \cdots, M_i$) can be obtained by applying CCA to a small number of vectors $\mathbf{x}_i^v$, $\mathbf{x}_i^a$ and $\mathbf{x}_i^t$ ($i = 1, 2, \cdots, N_{\mathrm{sub}}$) selected via k-means clustering. Thus, we enable efficient calculation of CCA to obtain latent features of Web videos. Next, we compute similarities $s_{ij}$ between Web videos $f_i$ and $f_j$ via the obtained latent features as follows:

$$s_{ij} = \max_{q_i, q_j} \frac{(\boldsymbol{\xi}_i^{q_i})^T \boldsymbol{\xi}_j^{q_j}}{\|\boldsymbol{\xi}_i^{q_i}\| \, \|\boldsymbol{\xi}_j^{q_j}\|}, \qquad (7)$$
$$\boldsymbol{\xi}_i^{q_i} = \left[ (\boldsymbol{\zeta}_{v,i}^{l,q_i})^T, (\boldsymbol{\zeta}_{a,i}^{l,q_i})^T, (\boldsymbol{\zeta}_{t,i}^{l,q_i})^T \right]^T. \qquad (8)$$

∗5 For derivation of $W_k$ and $\Lambda_{k,l}$, refer to the paper12).
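The similarity of Eqs. (7) and (8) reduces to a maximum cosine similarity over shot pairs once the latent vectors are concatenated. Below is a minimal sketch under that assumption; the projections of Eq. (6) are taken as given, and the array names are hypothetical.

```python
import numpy as np

def video_similarity(Z_i, Z_j):
    """s_ij of Eq. (7): maximum cosine similarity over all shot pairs.
    Z_i, Z_j: arrays of shape (M_i, d) and (M_j, d) whose rows are the
    concatenated latent vectors of Eq. (8)."""
    A = Z_i / np.linalg.norm(Z_i, axis=1, keepdims=True)
    B = Z_j / np.linalg.norm(Z_j, axis=1, keepdims=True)
    return float((A @ B.T).max())

# Rows stand for Eq. (8): projections of the visual, audio and textual
# shot features into the textual latent space (l = t), concatenated.
Z_1 = np.random.randn(5, 3 * 60)   # a video with 5 shots
Z_2 = np.random.randn(8, 3 * 60)   # a video with 8 shots
print(video_similarity(Z_1, Z_2))
```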
Moreover, we build link relationships between Web videos based on the obtained similarities and the metadata of Web videos, namely "related videos"∗6. In particular, we consider that a Web video $f_i$ links to a Web video $f_j$ if the "related videos" of $f_i$ include $f_j$. Although the use of only the low-level features extracted from Web videos may cause the semantic gap22), i.e., the difference between the low-level features and the high-level interpretation of humans, the metadata "related videos" is useful for obtaining Web videos that are similar to each other23). Therefore, we introduce this metadata into our method. Then we construct a graph $G = (V, E)$ whose nodes and edges are respectively Web videos $f_i$ ($i = 1, 2, \cdots, N$) and weighted links. The edge weight $e_{ij}$ between Web videos $f_i$ and $f_j$ is defined as follows:

$$e_{ij} = \begin{cases} 2s_{ij} & \text{if } f_i \text{ links to } f_j \text{ and } f_j \text{ links to } f_i \\ s_{ij} & \text{if } f_i \text{ links to } f_j \text{ exclusive or } f_j \text{ links to } f_i, \end{cases} \quad i, j = 1, 2, \cdots, N. \qquad (9)$$

If $f_i$ and $f_j$ do not link to each other, we do not build an edge between $f_i$ and $f_j$. Note that we need to define an undirected graph to adopt the method14) in the next section, whereas our previous work12) defines a directed graph. Thus, by Eq. (9), we define an undirected graph that preserves information of edge directions. By constructing the graph $G$, we enable application of the method14) to extract the hierarchical structure for Web video retrieval, as shown later. This is one of the contributions of this paper.

∗6 On video sharing services, Web videos related to each other are linked as "related videos".
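A minimal sketch of the edge definition of Eq. (9), assuming the similarities $s_{ij}$ and the directed "related videos" links have already been collected; the dictionary-based representation is our own choice, not the paper's.

```python
from itertools import combinations

def build_undirected_graph(n_videos, links, sim):
    """Edge weights of Eq. (9).
    links: set of directed pairs (i, j), meaning f_i lists f_j in "related videos".
    sim:   function sim(i, j) returning the similarity s_ij of Eq. (7)."""
    edges = {}
    for i, j in combinations(range(n_videos), 2):
        fwd, bwd = (i, j) in links, (j, i) in links
        if fwd and bwd:
            edges[(i, j)] = 2 * sim(i, j)   # mutual links: weight 2 * s_ij
        elif fwd or bwd:
            edges[(i, j)] = sim(i, j)       # one-way link: weight s_ij
        # no link in either direction: no edge is built
    return edges

# toy usage with three videos and a constant similarity of 0.5
edges = build_undirected_graph(3, {(0, 1), (1, 0), (1, 2)}, lambda i, j: 0.5)
print(edges)   # {(0, 1): 1.0, (1, 2): 0.5}
```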
4. Phase II: Accurate and Efficient Extraction of Hierarchical Structure of Web Communities

In Sec. 4.1, we present a method for accurately and efficiently extracting the hierarchical structure. A Web video retrieval method that uses the hierarchical structure is presented in Sec. 4.2.

4.1 Extraction of Hierarchical Structure of Web Communities

After deriving the link relationships between Web videos, i.e., the graph $G = (V, E)$, we extract the hierarchical structure of Web communities, i.e., Web video sets with similar topics. Specifically, by using $G$, the hierarchical structure of Web communities is extracted via an algorithm based on recursive modularity optimization14). The algorithm14) is a very fast method for graph analysis by which 118 million nodes can be processed in 152 minutes. In addition, our proposed similarities can be easily introduced into this method. Since this method is therefore well suited to extracting the hierarchical structure efficiently and accurately, we adopt it. Extraction of the hierarchical structure consists of the following two phases. In the first phase, each node $f_i$ ($i = 1, 2, \cdots, N$) is assigned to its own Web community. For each node $f_i$, the gain of modularity $Q$ is evaluated when $f_i$ is moved to a Web community containing a neighbouring node $f_j$. Then $f_i$ is re-assigned to the Web community for which the positive gain is maximum. Note that modularity $Q$ is an evaluation measure for the division of the graph into communities, defined by the following equation14):

$$Q = \frac{1}{2m} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( e_{ij} - \frac{k_i k_j}{2m} \right) \delta_{ij}, \qquad (10)$$

where $2m = \sum_{i=1}^{N} \sum_{j=1}^{N} e_{ij}$, $k_i = \sum_{j=1}^{N} e_{ij}$ and $\delta_{ij}$ is 1 if $f_i$ and $f_j$ belong to the same Web community and 0 otherwise. Thus, the higher the modularity is, the better the division into communities is.
This process is applied to all nodes iteratively and sequentially until no more improvement of the modularity can be obtained. In the second phase, we construct a new graph whose nodes are the Web communities obtained in the first phase. Here, the edge weight between two new nodes is the sum of the edge weights of the original graph between the corresponding communities found during the first phase. Also, each new node has a self-loop whose weight is derived from the weighted edges within the corresponding original nodes obtained in the first phase. In this paper, we call a pair of the first and second phases "a pass", and the iteration number is denoted by $q$ ($= 1, 2, \cdots, Q_h$; $Q_h$ being the number of all passes). Moreover, the passes, i.e., the first phase to find the Web communities from the new graph and the second phase to construct the newer graph, are iterated until no more improvement of the positive gain of modularity can be obtained. Then we denote the obtained Web communities by $\mathrm{Com}_{n_q}^q$ ($n_q = 1, 2, \cdots, T_q$; $T_q$ being the number of Web communities at pass $q$). Consequently, since we can attain Web communities with different levels of resolution according to the number of passes, it becomes feasible to extract the hierarchical structure of Web communities.

Finally, we note the following about the above recursive processes. As $q$ increases, the number of Web videos within the new nodes becomes larger and the number of new nodes decreases in the second phase. Thereby, unlike our previously reported method12) that needs to perform screening of Web videos for handling many Web videos, our method can extract the hierarchical structure for the whole set of target Web videos by the above efficient recursive phases. Therefore, our method can avoid the performance degradation caused by discarding relevant Web videos in the screening, unlike our previous method12). For detailed procedures of extraction of the hierarchical structure, refer to Algorithm 1.
Algorithm 1: Extraction of Hierarchical Structure of Web Communities

Input: A graph $G = (V, E)$ whose nodes and edges are Web videos $f_i$ ($i = 1, 2, \cdots, N$) and weighted links, respectively
Output: Web communities with the hierarchy $\mathrm{Com}_{n_q}^q$ ($q = 1, 2, \cdots, Q_h$, $n_q = 1, 2, \cdots, T_q$)
1: Assign each node $f_i$ ($i = 1, 2, \cdots, N$) to its own Web community.
2: Set the index of pass as $q \leftarrow 1$.
3: while True do
4:   while improvement of modularity $Q$ of the graph $G$ is obtained do
5:     for each node do
6:       Evaluate the gain of $Q$ when the node is moved to each Web community containing a neighbouring node.
7:       Re-assign the node to the Web community for which the positive gain of $Q$ is maximum.
8:     end for
9:     Calculate $Q$ of $G$.
10:  end while
11:  Denote the obtained Web communities by $\mathrm{Com}_{n_q}^q$ ($n_q = 1, 2, \cdots, T_q$).
12:  if improvement of $Q$ of $G$ is not obtained then
13:    Break the while loop.
14:  end if
15:  Build the new graph $G$ with self-loops whose nodes are the obtained Web communities, where each edge weight is defined by the sum of the corresponding edge weights of the original graph.
16:  $q \leftarrow q + 1$
17: end while
18: Return the Web communities with the hierarchy $\mathrm{Com}_{n_q}^q$ ($q = 1, 2, \cdots, Q_h$, $n_q = 1, 2, \cdots, T_q$).
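For illustration, the sketch below implements the first phase of Algorithm 1 (the local-moving step of the Louvain method14)) directly from Eq. (10) on a small weighted adjacency matrix. It recomputes modularity from scratch at every candidate move, so it is a readable sketch rather than the fast incremental-gain formulation of the reference14).

```python
import numpy as np

def local_moving_pass(W):
    """Greedily move each node to the neighbouring community with the
    largest positive modularity gain until no move improves Q."""
    n = W.shape[0]
    comm = list(range(n))        # each node starts in its own community
    two_m = W.sum()
    k = W.sum(axis=1)            # weighted degrees k_i

    def modularity():            # Eq. (10)
        q = sum(W[i, j] - k[i] * k[j] / two_m
                for i in range(n) for j in range(n) if comm[i] == comm[j])
        return q / two_m

    improved = True
    while improved:
        improved = False
        for i in range(n):
            best_q, best_c = modularity(), comm[i]
            for c in {comm[j] for j in range(n) if W[i, j] > 0}:
                old, comm[i] = comm[i], c
                if (q := modularity()) > best_q:
                    best_q, best_c = q, c
                comm[i] = old
            if best_c != comm[i]:
                comm[i], improved = best_c, True
    return comm

# two triangles joined by a single weak edge; the pass recovers them
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0
print(local_moving_pass(W))   # e.g., [0, 0, 0, 3, 3, 3] (two communities)
```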
4.2 Web Video Retrieval Using Hierarchical Structure of Web Communities

In the proposed method, it becomes feasible to perform Web video retrieval by using the obtained hierarchical structure of Web communities. First, we obtain the hierarchically extracted Web communities $\mathrm{Com}_{n_q}^q$ ($n_q = 1, 2, \cdots, T_q$, $q = 1, 2, \cdots, Q_h$). Then we rank each Web video $f_i$ ($i \in \{1, 2, \cdots, N\}$) that belongs to the Web community $\mathrm{Com}_{n_q}^q$ in the descending order of the following measure $R_{n_q}^q(i)$:

$$R_{n_q}^q(i) = \sum_{j=1}^{N} e_{ji} \delta_{ij}, \qquad (11)$$

where $\delta_{ij}$ is 1 if $f_i$ and $f_j$ belong to the same Web community and 0 otherwise. Thus, Web videos in each Web community are ranked on the basis of the number of weighted links in the graph of each Web community, i.e., the degree centrality of the graph. Moreover, we exhibit the Web communities in the order of $\mathrm{Com}_{n_{Q_h}}^{Q_h}, \mathrm{Com}_{n_{Q_h - 1}}^{Q_h - 1}, \cdots, \mathrm{Com}_{n_1}^1$, that is, from larger Web communities to smaller Web communities. Note that the larger Web communities include Web videos with various topics, while the smaller Web communities contain Web videos with similar topics. Then users select the Web communities associated with the desired contents according to the exhibited hierarchical structure and retrieve the desired Web videos based on the ranking $R_{n_q}^q(i)$ of each Web video. Hence, our method enables users to easily grasp an overview of many Web videos via the hierarchical structure of Web communities. As a consequence, users can retrieve the desired Web videos even if varied topics are contained in the retrieval results because suitable queries that identify the desired contents cannot be input.
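A small sketch of the ranking of Eq. (11), assuming the edge weights and a community assignment are available; names are our own.

```python
import numpy as np

def rank_within_community(W, comm, c):
    """Rank the members of community c by R(i) = sum_j e_ji * delta_ij,
    i.e., by weighted degree restricted to the community (Eq. (11))."""
    members = [i for i in range(len(comm)) if comm[i] == c]
    scores = {i: sum(W[j, i] for j in members) for i in members}
    return sorted(members, key=lambda i: scores[i], reverse=True)

W = np.array([[0, 1.0, 0.5],
              [1.0, 0, 0],
              [0.5, 0, 0]])
print(rank_within_community(W, comm=[0, 0, 0], c=0))   # [0, 1, 2]
```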
5. Experimental Results

In this section, we present experimental results for actual Web videos to verify the accuracy and computational cost of our method.

5.1 Datasets

By using the YouTube Data API (v3)∗7, we built datasets as follows. First, we gave a keyword as a query and obtained the top 50 Web videos. Then we repeatedly obtained 10 Web videos contained in the links, i.e., the metadata "related videos" of the selected Web videos, until about 3000 Web videos were collected. We collected only Web videos with lengths of less than 1800 seconds and ones to which Freebase topics24) were attached. Note that Freebase is a knowledge database maintained by a community supported by Google, and YouTube videos are associated with their relevant topics, e.g., "Samsung Galaxy S5" and "Google Nexus". The detailed conditions of each dataset are shown in Table 1. Here, we note that these datasets were collected at a different time and under different conditions from the datasets shown in our previous paper12). Therefore, even if a query used to collect a dataset was the same, the Web videos contained in each dataset were different from those used in the experiments shown in the paper12).

Table 1: Details of datasets: p, K, U and Nclus are the numbers of dimensions of visual feature vectors, keywords used for computing textual feature vectors, dimensions after random projection, and cluster centers in k-means clustering, respectively. In this experiment, we used keywords that appeared in two or more Web videos for computing textual feature vectors.

          | Query keyword | Num. of Web videos | Num. of shots | p                 | K     | U    | Nclus
Dataset 1 | galaxy        | 3000               | 9179          | 48 (H=12,S=2,V=2) | 11894 | 1000 | 1000
Dataset 2 | sightseeing   | 3011               | 10486         | 48 (H=12,S=2,V=2) | 11695 | 1000 | 1000
Dataset 3 | apple         | 3001               | 9440          | 48 (H=12,S=2,V=2) | 10076 | 1000 | 1000
Dataset 4 | jaguar        | 3002               | 10033         | 48 (H=12,S=2,V=2) | 8598  | 1000 | 1000

5.2 Evaluations

In Fig. 1, we show the hierarchical structure of Web communities obtained by our method. From this figure, we can see that the hierarchical structure enables users to easily grasp an overview of many Web videos since Web videos containing varied topics are clustered into sets with similar topics. Next, we quantitatively evaluate the accuracy of our method. Note that we denote the proposed method by (Ours) in this evaluation. We compared (Ours) with the following methods.

(C1): This is a method to extract the hierarchical structure of Web communities in the same manner as our method, but it does not use similarities between Web videos. Specifically, we defined the edge weights $e_{ij}$ between Web videos $f_i$ and $f_j$ as follows:

$$e_{ij} = \begin{cases} 1 & \text{if } f_i \text{ links to } f_j \text{ or } f_j \text{ links to } f_i \\ 0 & \text{otherwise,} \end{cases} \quad i, j = 1, 2, \cdots, N.$$

(C2-1): This is a method based on the conventional method11) to extract the hierarchical structure of Web communities. Although the method11) does not use modularity originally, we introduced modularity into this method for comparison in this experiment. This method performs screening of Web videos to select about 1000 representative Web videos in a dataset.

(C2-2): This is the same method as (C2-1), but it performs screening of Web videos to select about 2000 representative Web videos in a dataset.

(C3-1): This is a conventional method12) to efficiently extract the hierarchical structure of Web communities. This method calculates sub-sampled CCA-based link relationships between Web videos as in our method, but screening of Web videos is performed to select about 1000 representative Web videos in a dataset.

(C3-2): This is the same method as (C3-1), but it performs screening of Web videos to select about 2000 representative Web videos in a dataset.

(C4): This is a conventional method6) that extracts Web communities via Web video features and the metadata "related videos"; however, this method does not extract the hierarchical structure.

(C5): This is a method based on a work5) with a Web community extraction scheme by affinity propagation25). This method uses only Web video features without the metadata "related videos", and the hierarchical structure is not explored. Note that we do not use the original similarities in the paper5) but use our proposed similarities in this experiment.

∗7 https://developers.google.com/youtube/v3/
[Fig. 1 appears here; image omitted.]

Fig. 1: (a) and (c): hierarchical structure of the three largest Web communities for datasets 1 and 2, respectively. We show only division points at which all of the Web communities contain more than 30 Web videos. Web communities shown in the second, third and fourth columns correspond to ones where q = 3, q = 2 and q = 1, respectively. (b) and (d): thumbnails of Web videos in each Web community for datasets 1 and 2, respectively. We located the top five ranked videos in the order of left to right.
Table 2: Number of Web videos after screening of Web videos by (C2-1), (C2-2), (C3-1) and (C3-2).

          | (C2-1) | (C2-2) | (C3-1) | (C3-2)
Dataset 1 | 987    | 2027   | 1196   | 2130
Dataset 2 | 920    | 2013   | 842    | 1986
Dataset 3 | 981    | 1995   | 1157   | 2066
Dataset 4 | 1025   | 2003   | 879    | 2009

Table 3: Ground truths for the quantitative evaluation. Web videos with the following Freebase topics or titles were defined as "relevant Web videos".

(A) Dataset 1.
Target 1: Freebase topics related to "Apple iPhone" and "Samsung Galaxy S5".
Target 2: Freebase topics related to "Samsung Galaxy Note".
Target 3: Freebase topics related to "Universe".
Target 4: Freebase topics related to "Unidentified Flying Object".
Target 5: Freebase topics related to "Google Nexus" and "Products of Motorola, Inc.".
Target 6: Titles related to "Samsung Galaxy Note Edge".
(B) Dataset 2.
Target 1: Freebase topics related to "Tokyo".
Target 2: Freebase topics related to "Japanese".
Target 3: Freebase topics related to "Norway".
Target 4: Freebase topics related to "Travels with Rick Steves".
Target 5: Freebase topics related to "Portugal".
(C) Dataset 3.
Target 1: Freebase topics related to "Apple iPhone" and "Samsung Galaxy S5".
Target 2: Freebase topics related to "Google Nexus" and "Samsung Galaxy Note".
Target 3: Freebase topics related to "Apple iPad".
Target 4: Freebase topics related to "Apple, Inc.".
Target 5: Freebase topics related to "Steve Jobs".
(D) Dataset 4.
Target 1: Freebase topics related to "Automobile of Ferrari".
Target 2: Titles related to "Chris Harris on Cars".
Target 3: Freebase topics related to "Automobile of Porsche".
Target 4: Freebase topics related to "Automobile of Subaru".
Target 5: Freebase topics related to "Automobile Cadillac".
Target 6: Freebase topics related to "Auto Show".
Target 7: Freebase topics related to "Automobile Mercedes-Benz".

Table 2 shows the number of Web videos after screening of Web videos by (C2-1), (C2-2), (C3-1) and (C3-2). For the evaluation, we used the following F-measure and average precision (AP@k):

$$\text{F-measure} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}, \qquad (12)$$
$$\text{AP@}k = \frac{1}{R_k} \sum_{i=1}^{k} x_i \, \text{prec}_i, \qquad (13)$$
$$\text{Recall} = \frac{\text{Num. of correctly retrieved Web videos}}{\text{Num. of relevant Web videos}}, \qquad (14)$$
$$\text{Precision} = \frac{\text{Num. of correctly retrieved Web videos}}{\text{Num. of retrieved Web videos}}, \qquad (15)$$

where $k$ is the number of Web videos provided as the retrieval results, $R_k$ is the number of "relevant Web videos" within the $k$ Web videos of the retrieval results, $x_i$ is 1 if the $i$-th retrieved Web video is a "relevant Web video" and 0 otherwise, and $\text{prec}_i$ is the precision when $i$ Web videos are retrieved. Here, Freebase topics related to each Web community were selected by human assessors, and then we defined each ground truth for the evaluation. Note that a few ground truths were defined on the basis of titles of Web videos since the number of their Freebase topics was too small to perform the evaluation. Table 3 shows the defined ground truths. Since it was difficult to evaluate retrieval results when all Web communities were selected, we performed the evaluation as follows. First, we selected the three largest Web communities at the highest hierarchy. Then we retrieved Web videos by using the Web communities in the lowest hierarchy corresponding to the selected ones to perform the evaluation.

Tables 4 and 5 show F-measures and average precision when Web videos were retrieved. When comparing our method with (C1), we can see the effectiveness of our CCA-based link relationships via visual, audio and textual features. Also, since our method does not need to perform screening of Web videos owing to our efficient recursive processing, unlike (C2-1), (C2-2), (C3-1) and (C3-2), the evaluation results obtained by our method outperform those obtained by these comparative methods. When comparing our method with (C4) and (C5), we can confirm that the use of the hierarchical structure enables accurate retrieval. Next, we show computational times for obtaining the hierarchical structure of Web communities in Table 6. From this table, the computational efficiency of our method can be confirmed. In particular, it is significant that our method realizes these results although it does not perform screening of Web videos, unlike (C2-1), (C2-2), (C3-1) and (C3-2). Consequently, we can show the effectiveness of introducing the method14) into Web video retrieval. In particular, although the method14) was not originally proposed for Web video retrieval, we can confirm that introduction of the method14) successfully enables accurate and efficient extraction of the hierarchical structure for Web video retrieval.
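For reference, a minimal sketch of the evaluation measures of Eqs. (12)-(15), written directly from their definitions; variable names are our own.

```python
def f_measure(n_correct, n_relevant, n_retrieved):
    """F-measure of Eq. (12) from recall (Eq. (14)) and precision (Eq. (15))."""
    recall = n_correct / n_relevant
    precision = n_correct / n_retrieved
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

def average_precision_at_k(relevance):
    """AP@k of Eq. (13): relevance holds the 0/1 flags x_i of the
    top-k retrieved videos; prec_i is the precision over the first i."""
    r_k = sum(relevance)
    if r_k == 0:
        return 0.0
    hits, ap = 0, 0.0
    for i, x in enumerate(relevance, start=1):
        hits += x
        ap += x * hits / i          # x_i * prec_i
    return ap / r_k

print(f_measure(3, 5, 4))                     # 0.666...
print(average_precision_at_k([1, 0, 1, 1]))   # (1/1 + 2/3 + 3/4) / 3
```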
Table 4: Quantitative evaluation for Web video retrieval by F-measures.

(A) Average values of F-measures.
Dataset   Target    (Ours)  (C1)    (C2-1)  (C2-2)  (C3-1)    (C3-2)  (C4)    (C5)
Dataset 1 Target 1  0.506   0.543   0.175   0.277   0.277     0.352   0.345   0.0537
Dataset 1 Target 2  0.274   0.526   0.186   0.249   0.173     0.263   0.257   0.0540
Dataset 1 Target 3  0.447   0.355   0.209   0.311   0.264     0.359   0.302   0.0626
Dataset 1 Target 4  0.574   0.0679  0.108   0.0827  0.294     0.128   0.0596  0.0644
Dataset 1 Target 5  0.417   0.510   0.224   0.226   0.180     0.229   0.215   0.0429
Dataset 1 Target 6  0.544   0.284   0.136   0.107   0.0670    0.0931  0.0734  0.143
Dataset 2 Target 1  0.384   0.330   0.192   0.300   0.169     0.287   0.132   0.0737
Dataset 2 Target 2  0.317   0.230   0.210   0.237   0.180     0.216   0.0873  0.0634
Dataset 2 Target 3  0.471   0.460   0.251   0.502   0.229     0.265   0.170   0.0590
Dataset 2 Target 4  0.629   0.403   0.582   0.242   0.0275    0.190   0.0880  0.0947
Dataset 2 Target 5  0.400   0.381   0.155   0.395   0.158     0.334   0.248   0.0390
Dataset 3 Target 1  0.501   0.486   0.244   0.348   0.278     0.410   0.414   0.0270
Dataset 3 Target 2  0.494   0.382   0.202   0.322   0.214     0.185   0.204   0.0671
Dataset 3 Target 3  0.285   0.225   0.0443  0.220   0.218     0.0777  0.0809  0.0746
Dataset 3 Target 4  0.319   0.427   0.0673  0.191   0.0705    0.172   0.203   0.0285
Dataset 3 Target 5  0.592   0.178   0.457   0.556   0.504     0.0757  0.0878  0.0799
Dataset 4 Target 1  0.415   0.351   0.145   0.156   0.0966    0.136   0.126   0.0441
Dataset 4 Target 2  0.611   0.332   0.152   0.119   0.0156    0.0474  0.0891  0.0604
Dataset 4 Target 3  0.341   0.297   0.0834  0.0901  0.0127    0.120   0.119   0.0491
Dataset 4 Target 4  0.364   0.134   0.268   0.213   0.189     0.221   0.0677  0.0340
Dataset 4 Target 5  0.362   0.414   0.130   0.317   0.000117  0.0145  0.0396  0.110
Dataset 4 Target 6  0.307   0.274   0.216   0.216   0.0581    0.304   0.0560  0.104
Dataset 4 Target 7  0.552   0.563   0.120   0.202   0.200     0.227   0.207   0.0244

(B) Maximum values of F-measures.
Dataset   Target    (Ours)  (C1)    (C2-1)  (C2-2)  (C3-1)   (C3-2)  (C4)   (C5)
Dataset 1 Target 1  0.757   0.776   0.284   0.386   0.417    0.445   0.511  0.0973
Dataset 1 Target 2  0.412   0.742   0.249   0.316   0.250    0.312   0.483  0.0762
Dataset 1 Target 3  0.693   0.548   0.456   0.541   0.439    0.581   0.693  0.0973
Dataset 1 Target 4  0.786   0.119   0.198   0.137   0.405    0.171   0.271  0.130
Dataset 1 Target 5  0.567   0.745   0.286   0.288   0.234    0.273   0.436  0.0702
Dataset 1 Target 6  0.844   0.354   0.206   0.148   0.0913   0.127   0.351  0.186
Dataset 2 Target 1  0.486   0.407   0.316   0.394   0.267    0.354   0.486  0.128
Dataset 2 Target 2  0.481   0.271   0.269   0.308   0.299    0.255   0.337  0.107
Dataset 2 Target 3  0.765   0.638   0.448   0.784   0.410    0.308   0.670  0.109
Dataset 2 Target 4  0.829   0.568   0.923   0.479   0.0476   0.291   0.838  0.114
Dataset 2 Target 5  0.646   0.646   0.261   0.664   0.277    0.476   0.512  0.0660
Dataset 3 Target 1  0.763   0.774   0.384   0.539   0.486    0.508   0.579  0.0414
Dataset 3 Target 2  0.651   0.543   0.291   0.423   0.297    0.219   0.511  0.100
Dataset 3 Target 3  0.500   0.276   0.0929  0.313   0.277    0.109   0.181  0.0909
Dataset 3 Target 4  0.534   0.531   0.112   0.348   0.198    0.236   0.374  0.0455
Dataset 3 Target 5  0.750   0.247   0.724   0.841   0.720    0.120   0.677  0.120
Dataset 4 Target 1  0.580   0.458   0.184   0.186   0.166    0.167   0.245  0.0561
Dataset 4 Target 2  0.838   0.451   0.236   0.168   0.0333   0.0614  0.528  0.0889
Dataset 4 Target 3  0.564   0.440   0.149   0.128   0.0886   0.158   0.200  0.0561
Dataset 4 Target 4  0.500   0.196   0.364   0.300   0.256    0.316   0.337  0.0741
Dataset 4 Target 5  0.516   0.565   0.174   0.406   0.00995  0.0290  0.100  0.140
Dataset 4 Target 6  0.464   0.473   0.392   0.392   0.0811   0.456   0.464  0.175
Dataset 4 Target 7  0.811   0.750   0.169   0.249   0.350    0.314   0.303  0.0381
Table 5: Quantitative evaluation for Web video retrieval by average precision, i.e., AP@k. We set k to the number of Web videos within the Web communities used for retrieval.

Dataset   Target    (Ours)  (C1)    (C2-1)  (C2-2)  (C3-1)   (C3-2)  (C4)    (C5)
Dataset 1 Target 1  0.947   0.857   0.368   0.357   0.486    0.382   0.441   0.534
Dataset 1 Target 2  0.604   0.890   0.329   0.264   0.225    0.239   0.389   0.857
Dataset 1 Target 3  0.929   0.371   0.507   0.367   0.721    0.467   0.758   0.808
Dataset 1 Target 4  0.867   0.0648  0.0989  0.0597  0.312    0.124   0.195   0.174
Dataset 1 Target 5  0.783   0.919   0.308   0.202   0.209    0.185   0.360   0.636
Dataset 1 Target 6  0.999   0.253   0.121   0.0824  0.0487   0.0762  0.197   0.710
Dataset 2 Target 1  0.526   0.317   0.266   0.274   0.397    0.304   0.479   0.382
Dataset 2 Target 2  0.518   0.191   0.269   0.230   0.419    0.183   0.268   0.471
Dataset 2 Target 3  0.989   0.546   1.00    0.931   1.00     0.293   0.530   0.539
Dataset 2 Target 4  0.912   0.490   0.946   0.235   0.111    0.216   0.856   1.00
Dataset 2 Target 5  0.983   0.993   0.985   0.995   0.475    0.434   0.501   0.445
Dataset 3 Target 1  0.949   0.996   0.704   0.831   0.863    0.509   0.543   0.848
Dataset 3 Target 2  0.744   0.813   0.246   0.612   0.341    0.158   0.440   1.00
Dataset 3 Target 3  0.449   0.189   0.0458  0.226   0.217    0.0501  0.0763  0.917
Dataset 3 Target 4  0.963   0.537   0.141   0.984   0.376    0.131   0.253   0.616
Dataset 3 Target 5  0.802   0.123   1.00    0.986   0.872    0.0483  0.632   0.484
Dataset 4 Target 1  0.521   0.421   0.157   0.126   0.362    0.103   0.188   0.767
Dataset 4 Target 2  0.926   0.336   0.112   0.0762  0.0455   0.0315  0.372   0.393
Dataset 4 Target 3  0.927   0.320   0.0851  0.0673  0.0798   0.0874  0.122   1.00
Dataset 4 Target 4  0.704   0.106   0.426   0.215   0.172    0.377   0.168   0.211
Dataset 4 Target 5  0.721   0.626   0.521   0.624   0.00595  0.0116  0.0403  0.767
Dataset 4 Target 6  0.630   0.701   0.757   0.757   0.131    0.675   0.298   0.312
Dataset 4 Target 7  0.916   0.816   0.225   0.255   0.452    0.242   0.280   0.635
Table 6: Computational time for obtaining the hierarchical structure of Web communities. We used a computer with an Intel® CPU (3.07 GHz) and 16 GB RAM. All times are in seconds.

(A) Extraction of Web video features. (The image size was 320 × 180 pixels and the frame rate was 24 fps. The audio signal was sampled at 22.05 kHz.)
                 (Ours)  (C1)  (C2-1)  (C2-2)  (C3-1)  (C3-2)  (C4)   (C5)
Visual feature   21.1    —     21.1    21.1    21.1    21.1    21.1   21.1
Audio feature    6.5     —     6.5     6.5     6.5     6.5     6.5    6.5
Textual feature  0.01    —     0.01    0.01    0.01    0.01    0.01   0.01

(B) Derivation of CCA-based link relationships between Web videos. (Ours), (C3-1) and (C3-2) include k-means clustering, but (C2-1), (C2-2), (C4) and (C5) do not.
           (Ours)   (C1)  (C2-1)   (C2-2)   (C3-1)   (C3-2)   (C4)     (C5)
Dataset 1  961.8    —     1849.6   1849.6   961.6    961.6    2413.4   961.6
Dataset 2  1442.4   —     2345.0   2345.0   1442.2   1442.2   3090.1   1442.2
Dataset 3  1307.9   —     2205.1   2205.1   1307.8   1307.8   2852.9   1307.8
Dataset 4  949.4    —     1787.0   1787.0   949.3    949.3    2426.6   949.3

(C) Screening of Web videos for extracting the hierarchical structure of Web communities. Note that only (C2-1), (C2-2), (C3-1) and (C3-2) need to perform this processing.
           (Ours)  (C1)  (C2-1)  (C2-2)  (C3-1)   (C3-2)   (C4)  (C5)
Dataset 1  —       —     4.4     4.4     6704.8   6705.0   —     —
Dataset 2  —       —     4.3     4.3     6730.7   6730.9   —     —
Dataset 3  —       —     4.3     4.3     6578.0   6578.2   —     —
Dataset 4  —       —     4.4     4.4     6767.2   6767.4   —     —

(D) Extraction of the hierarchical structure of Web communities. As for (C4) and (C5), the computation time for obtaining not the hierarchical structure but Web communities is shown since these methods do not explore the hierarchical structure.
           (Ours)  (C1)  (C2-1)   (C2-2)    (C3-1)  (C3-2)  (C4)   (C5)
Dataset 1  2.0     0.9   103.3    5266.2    5.7     22.6    13.1   185.7
Dataset 2  1.0     0.5   1282.3   17386.1   0.0     38.0    12.8   211.6
Dataset 3  1.4     0.8   3712.3   66911.0   12.4    70.6    13.1   157.0
Dataset 4  1.9     0.8   7721.9   75572.7   1.8     238.3   13.4   146.0

(E) Total time for obtaining the hierarchical structure except for extraction of Web video features, i.e., the sum of (B), (C) and (D).
           (Ours)   (C1)  (C2-1)   (C2-2)    (C3-1)   (C3-2)   (C4)     (C5)
Dataset 1  963.8    0.9   1957.3   7120.2    7672.1   7689.2   2426.5   1147.3
Dataset 2  1443.4   0.5   3631.6   19735.4   8172.9   8211.1   3102.9   1653.8
Dataset 3  1309.3   0.8   5921.7   69120.4   7898.2   7956.6   2866.0   1464.8
Dataset 4  951.3    0.8   9513.3   77364.1   7718.3   7955.0   2440.0   1095.3
6. Conclusions

In this paper, we have proposed an accurate and efficient method to extract the hierarchical structure of Web communities for Web video retrieval. First, we derived CCA-based link relationships that represent similarities between latent features of Web videos by using visual, audio and textual features. Here, similarity calculation with low computational cost was realized on the basis of an efficient CCA, named sub-sampled CCA. Moreover, we enabled application of the algorithm based on recursive modularity optimization by constructing a graph whose nodes are Web videos via the obtained link relationships. Then it became feasible to extract the hierarchical structure of Web communities. Here, unlike the previously reported methods that needed to perform screening of Web videos, this can be realized for the whole set of target Web videos since the algorithm recursively reduces the target nodes. Experimental results have verified that our method enables extraction of the hierarchical structure with high accuracy as well as low computational cost.

Acknowledgement

This work was partly supported by Grant-in-Aid for Scientific Research (B) 25280036, Japan Society for the Promotion of Science (JSPS), and Grant-in-Aid for Scientific Research on Innovative Areas 24120002 from MEXT.

References
1) V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: Rich data and the increasing value of the internet of things," IDC White Paper, URL http://idcdocserv.com/1678 (2014)
2) M. Haseyama, T. Ogawa, and N. Yagi, "A review of video retrieval based on image and video semantic understanding," ITE Trans. MTA, 1, 1, pp. 2–9 (2013)
3) Y. Wang, M. Belkhatir, and B. Tahayna, "Near-duplicate video retrieval based on clustering by multiple sequence alignment," in Proc. ACM Int. Conf. Multimedia, pp. 941–944 (2012)
4) U. Gargi, L. Wenjun, M. Vahab, and Y. Sangho, "Large-scale community detection on youtube for topic discovery and exploration," in Proc. Int. AAAI Conf. Weblogs and Social Media, pp. 486–489 (2011)
5) A. Hindle, J. Shao, D. Lin, J. Lu, and R. Zhang, "Clustering web video search results based on integration of multiple features," World Wide Web, 14, 1, pp. 53–73 (2011)
6) Y. Hatakeyama, T. Ogawa, S. Asamizu, and M. Haseyama, "A novel video retrieval method based on web community extraction using features of video materials," IEICE Trans. Fundamentals, E92-A, 8, pp. 1961–1969 (2009)
7) C-W Ngo, T-C Pong, and H-J Zhang, "On clustering and retrieval of video shots," in Proc. ACM Int. Conf. Multimedia, pp. 51–60 (2001)
8) C-W Ngo, T-C Pong, and H-J Zhang, "On clustering and retrieval of video shots through temporal slices analysis," IEEE Trans. Multimedia, 4, 4, pp. 446–458 (2002)
9) C. Taskiran, J.-Y. Chen, A. Albiol, L. Torres, C. A. Bouman, and E. J. Delp, "Vibe: A compressed video database structured for active browsing and search," IEEE Trans. Multimedia, 6, 1, pp. 103–118 (2004)
10) J. Sang and C. Xu, "Browse by chunks: Topic mining and organizing on web-scale social media," ACM Trans. Multimedia Comput. Commun. Appl., 7S, 1, pp. 30:1–30:18 (2011)
11) R. Harakawa, Y. Hatakeyama, T. Ogawa, and M. Haseyama, "An extraction method of hierarchical web communities for web video retrieval," in Proc. IEEE Int. Conf. Image Processing, pp. 4397–4401 (2013)
12) R. Harakawa, T. Ogawa, and M. Haseyama, "An efficient extraction method of hierarchical structure of web communities for web video retrieval," ITE Trans. MTA, 2, 3, pp. 287–297 (2014)
13) G. A. Miller, "Wordnet: A lexical database for english," ACM Commun., 38, 11, pp. 39–41 (1995)
14) V. D. Blondel, J-L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," J. Stat. Mech., P10008 (2008)
15) A. Nielsen, "Multiset canonical correlations analysis and multispectral, truly multitemporal remote sensing data," IEEE Trans. Image Processing, 11, 3, pp. 293–305 (2002)
16) A. Nagasaka and Y. Tanaka, "Automatic video indexing and full-video search for object appearances," in Proc. IFIP 2nd Working Conf. Visual Database Systems, pp. 113–127 (1991)
17) J. Astola, P. Haavisto, and Y. Neuvo, "Vector median filters," Proc. IEEE, pp. 678–689 (Apr. 1990)
18) N. Nitanda and M. Haseyama, "Audio-based shot classification for audiovisual indexing using PCA, MGD and Fuzzy algorithm," IEICE Trans. Fundamentals, E90-A, 8, pp. 1542–1548 (2007)
19) F. Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys, 34, 1, pp. 1–47 (2002)
20) D. Achlioptas, "Database-friendly random projections," in Proc. ACM Symp. Principles of Database Systems, pp. 274–281 (2001)
21) J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Mathematical Statistics and Probability, pp. 281–297 (1967)
22) A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Trans. Pattern Analysis and Machine Intelligence, 22, 12, pp. 1349–1380 (2000)
23) X. Cheng, C. Dale, and J. Liu, "Statistics and social network of youtube videos," in Proc. IEEE Int. Workshop on Quality of Service, pp. 229–238 (2008)
24) K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, "Freebase: A collaboratively created graph database for structuring human knowledge," in Proc. ACM SIGMOD Int. Conf. Management of Data, pp. 1247–1250 (2008)
25) B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, 315, 5814, pp. 972–976 (2007)
Ryosuke Harakawa received his B.S. and M.S. degrees in Electronics and Information Engineering from Hokkaido University, Japan in 2013 and 2015, respectively. He is currently pursuing a Ph.D. degree at the Graduate School of Information Science and Technology, Hokkaido University. His research interests include audiovisual processing and Web mining. He is a student member of the IEEE, IEICE, and Institute of Image Information and Television Engineers (ITE).

Takahiro Ogawa received his B.S., M.S. and Ph.D. degrees in Electronics and Information Engineering from Hokkaido University, Japan in 2003, 2005 and 2007, respectively. He is currently an assistant professor in the Graduate School of Information Science and Technology, Hokkaido University. His research interests are multimedia signal processing and its applications. He has been an Associate Editor of ITE Transactions on Media Technology and Applications. He is a member of the IEEE, EURASIP, IEICE, and Institute of Image Information and Television Engineers (ITE).
Miki Haseyama received her B.S., M.S. and Ph.D. degrees in Electronics from Hokkaido University, Japan in 1986, 1988 and 1993, respectively. She joined the Graduate School of Information Science and Technology, Hokkaido University as an associate professor in 1994. She was a visiting associate professor of Washington University, USA from 1995 to 1996. She is currently a professor in the Graduate School of Information Science and Technology, Hokkaido University. Her research interests include image and video processing and its development into semantic analysis. She has been a Vice-President of the Institute of Image Information and Television Engineers, Japan (ITE), an Editor-in-Chief of ITE Transactions on Media Technology and Applications, a Director, International Coordination and Publicity of The Institute of Electronics, Information and Communication Engineers (IEICE). She is a member of the IEEE, IEICE, Institute of Image Information and Television Engineers (ITE) and Acoustical Society of Japan (ASJ).