FA-MCADF: Feature Affinity based Multiple Correspondence Analysis and Decision Fusion Framework for Disaster Information Management Haiman Tian, Shu-Ching Chen School of Computing and Information Sciences Florida International University Miami, FL 33199, USA {htian005, chens}@cs.fiu.edu

Abstract—Multimedia semantic concept detection is one of the major research topics in multimedia data analysis in recent years. Disaster information management needs the assistance of multimedia data analysis to better utilize the disaster-related information that is widely shared by people through the Internet. In this paper, a Feature Affinity based Multiple Correspondence Analysis and Decision Fusion (FA-MCADF) framework is proposed to extract useful semantics from a disaster dataset. By utilizing the selected features and their affinities/ranks in each of the feature groups, the proposed framework is able to improve the concept detection results. Moreover, the decision fusion scheme further improves the detection accuracy. The experimental results demonstrate the effectiveness of the proposed framework and show that fusing the decisions of the basic classifiers allows the framework to outperform several existing approaches in the comparison.
Keywords-Multimedia Semantic Concept Detection; Multiple Correspondence Analysis (MCA); Feature Affinity based MCA (FA-MCA); Disaster Information Management

Stuart H. Rubin, William K. Grefe
Space and Naval Warfare Systems Center Pacific
San Diego, CA 92152-5001, USA
{stuart.rubin, william}@navy.mil

I. INTRODUCTION

Disasters, in most cases, seriously disrupt a community with human, economic, and environmental losses [1][2]. A natural catastrophe or accident produces a great amount of time-sensitive information [3]. The abilities to collect, analyze, and manage such information can benefit society in decision-making and rehabilitation, since not only the hazard status but also the preparation and recovery processes are critical to the populace and the community [4][5]. With the recent exponential growth of multimedia data (including video, audio, image, and text), using visual data that carries a variety of rich semantic information has become a popular way to address the challenges in disaster information management. During a disaster event, the advances and popularity of electronic and mobile devices enable the capturing of a large amount of disaster-related multimedia data [6]. How to effectively and efficiently extract useful information from such disaster-related multimedia data to provide situational awareness to the general public and the personnel in the Emergency Operations Center (EOC) has become
more and more important. Video semantic concept detection, which aims to explore the rich information in videos, uses various machine learning and data mining approaches to address this challenge [7][8][9][10][11][12][13]. In addition, there have been efforts in the literature to better bridge the semantic gap between the low-level visual features and the high-level concepts [14][15][16][17][18][19][20]. Not being restricted to disaster classification tasks, which attempt to separate disaster scenes from non-disaster scenes, a variety of information relevant to a specified disaster can be utilized, including the hazard situation, recovery progress, disaster effects, and disaster prevention, to name a few. The difficulty increases since all those concepts revolve around one major premise, which immensely increases the similarity between the concepts. Benefiting from the enhanced quality and increased resolution of multimedia data, a large number of features can be extracted and utilized to improve the accuracy of semantic concept detection. Though feeding all these features to a powerful classifier could improve the results, it may not be optimal, and the computational complexity will increase significantly as well. In the literature, various classifiers have been used to identify the inherent concepts in videos, including Artificial Neural Networks (ANNs) [21], Logistic Regression (LR) [22], Decision Trees (DTs) [23], Support Vector Machines (SVMs) [24], etc. Besides performing as individual classifiers, SVMs are also considered good candidates as the basic classifiers in multiple decision fusion tasks. However, there is still much room for improvement. To address these challenges, a Feature Affinity based Multiple Correspondence Analysis and Decision Fusion (FA-MCADF) framework is proposed in this paper.
In the proposed framework, the Feature Affinity based MCA (FA-MCA) algorithm is first introduced as an individual classifier that outperforms other machine learning algorithms in the disaster-related concept detection tasks. The low-level features are fused into one group after the feature extraction phase, and a feature selection method is applied in the FA-MCA model to deal with the high dimensional feature sets.

After building a tree-like structure that demonstrates the feature affinities, a weighting function that considers the affinity relationship among the ranks of the features and the number of features at the same rank is developed to improve the MCA algorithm. Furthermore, it is adopted as a basic classifier that can be simultaneously applied to separated feature groups, which reduces the complexity and the computational time. In addition, it has an automatic process to moderate how the weight of a feature dominates the other features. The important relationships among features within each feature group are preserved in FA-MCADF without being affected or counteracted by other representations of features that are less correlated globally.

The rest of this paper is organized as follows. Section II discusses the related work in multimedia data analysis. Section III introduces the proposed framework and discusses each component in detail. In Section IV, the experimental results and observations are presented. Finally, the last section presents the conclusion and future work.

II. RELATED WORK

Multimedia data analysis has been widely used in a variety of application domains that need to process and manage huge amounts of raw multimedia data, typically represented by a group of low-level features [25][26][27][28][29][30]. The low-level features are image descriptors of the visual properties that are extracted directly from the images without any object description [31][32]. The features are converged into a single form for the sake of storage with diversified representations and can assist the content analysis afterward. On the other hand, high-level features or concepts that contain the semantic information can be acquired from the low-level features using data analytic approaches.
In order to utilize these low-level features to characterize high-level semantic concepts, various approaches have been developed, including feature selection [7][33][34], classifier selection [35][36][37][38], and decision fusion [39][8]. Due to the advances in technologies that greatly improve the quality of the recorded multimedia data, higher resolution data is widely used to further improve the analysis results. However, the more features learned from the data, the more computational time is needed, which slows down the analysis process. For most multimedia applications, especially in the current big data era, the dimension of the features is very high, and thus feature selection is commonly applied to reduce the feature dimension and make the learning more efficient [40][41]. After feature selection, many classification algorithms can be used to detect the high-level semantic concepts, such as ANNs, LR, DTs, SVMs, and Multiple Correspondence Analysis (MCA) [24][42][23][43]. The first two algorithms determine the parameters by maximum-likelihood estimation and calculate the probability of each class. MCA has been used as a classifier by calculating the correlations between the

features and the classes. DTs, which use the information gain values to generate the tree structure, are another commonly used classifier. However, while building up each decision direction of the decision tree, the features are considered independent. SVMs can bound the generalization error and build consistent estimators from data. Decision fusion is the last step before producing the merged classification results. It commonly uses non-linearly weighted summation methodologies to explore the interdependencies among multiple classifiers. Decision fusion frameworks are widely employed for multi-modality, multi-temporal, and/or multi-spatial feature classification problems. In our proposed FA-MCADF framework, FA-MCA is used as the basic classifier, and the affinity relationship between the tied features is considered to enhance the classification effectiveness by using conditional weighting functions. It is a scalable framework that accepts a flexible number of feature groups and evaluates the reliabilities of the basic classifiers based on the evaluation of every learning process. The features are separated into different groups based on their representation levels (e.g., color space, object space, etc.).

III. THE PROPOSED FRAMEWORK

The overall framework is illustrated in Figure 1. It includes four major steps: pre-processing (the upper right panel), the training phase (the middle left panel), the testing phase (the lower right panel), and the decision fusion scheme for the final classification. The pre-processing phase includes key frame extraction and feature extraction, which clean and structure the data. In the training phase, the model is trained using the FA-MCA algorithm (details are depicted in the upper right corner) for each structural feature group individually. The feature affinities are calculated and applied to the final weight as a factor, which will be used in the testing phase to classify the testing instances.
The proposed framework considers the feature selection procedure as well as the relationship between the features. This will further affect the final weight of each feature and moderate the bias of the classification results. Furthermore, by distributing the feature set (with an enormously high dimension) into several feature groups (with smaller dimensions) based on the different representation levels (e.g., color space, object space, etc.), a closer dependency analysis of the relationships among the features within each group and between groups can be conducted. For example, from the color space to the object space, the feature groups form a flat structure, indicating that each group is self-structured and relatively independent.

A. Pre-processing

The pre-processing phase is typically domain-specific. In video analysis, each video is processed independently

[Figure 1 appears here. It shows the overall FA-MCADF framework: raw input videos pass through pre-processing (key frame extraction and feature extraction); the extracted features are organized into groups (the color space group with HSV and YCbCr, plus the HOG, Haar-like, and CEDD features); each group trains an FA-MCA model (feature selection by hierarchical information gain analysis, followed by feature affinity calculation); and the testing classifications are combined by the decision fusion scheme.]

Figure 1. Illustration of the FA-MCADF framework

to extract several frame-based low-level features. To reduce the redundancy of the frames in each video, the raw videos are segmented into different video shots [44][45]. A key frame from each video shot is selected to represent that shot, and all the selected key frames are then used to cover the general idea of the video. This can reduce the computation time significantly. In this paper, several different types of low-level visual features are extracted from the raw data, including the Histogram of Oriented Gradients (HOG) [46], the Color and Edge Directivity Descriptor (CEDD) [47], Haar-like features [48], and color space information [49][50]. Specifically, the HOG feature is used for the purpose of object detection; it is computed on a dense grid of uniformly spaced cells and uses overlapping normalization for accuracy improvement. The CEDD feature, as its name indicates, captures both color information and texture information. The Haar-like feature is widely used in object recognition with Haar wavelets, and is especially useful in face detection. The color space representations are the Hue, Saturation, and Value (HSV)

with YCbCr as the supplemental information. As a result, one video is represented by several key frames, and each key frame is composed of several feature values. Hence, the dataset consists of data instances at the frame level with binary class information. The finalized dataset is then split into training and testing sets using three-fold cross-validation [51] based on the count of videos. In other words, a set of key frame instances that belong to the same video is assigned to either the training dataset or the testing dataset during the separation.

B. Training Phase

In the training phase, there are two key components: feature selection and feature affinity calculation. The FA-MCA model includes the chi-squared test [52] to evaluate and select the most representative feature values if the dataset has a very high dimension. By building up a decision tree structure that uses the reduced features, the useful information and positions are stored and utilized in the

feature affinity calculation component. The proposed feature affinity calculation component assigns the weight of each feature based on the position of the feature (depth_i) in the tree structure. Furthermore, the number of features at the same depth in the tree is also considered as useful information. By considering the number of features at the same depth, the weight assigned to each feature at the same rank will be reduced. It is obvious that a feature which holds its rank by itself should be more valuable than those features sharing the same depth. Unlike the information gain in the original decision tree algorithm, the relationships between features in the structure are preserved after the feature selection component to make the final MCA weight generation multivariate. As a result of feature selection, the total number of useful features in the training phase decreases. This number is also considered while calculating the feature affinity (FA_i) for feature index i. However, instead of considering only the ratio of feature reduction, the proposed feature affinity calculation component utilizes the position of the feature to eliminate this effect, as shown in Equation (1). It is directly applied to a feature that is responsible for the decision at a certain level by itself only. In other words, there is no other feature competing with the current one while making the decision. Let I_orig and I be the total number of features before and after feature selection, respectively. The natural logarithm is used to obtain a simpler derivative under the curve y = 1/x.

FA_i = 1 / log_e(depth_i + 1) + I_orig / I.   (1)

Share_FA_i = FA_i / (# of features in depth_i).   (2)
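As a concreteness check, Equations (1) and (2) can be sketched in a few lines of Python; the depth convention (root-level features at depth 1) and the example numbers are illustrative assumptions, not values taken from the paper.

```python
from math import log

def feature_affinity(depth_i, I_orig, I):
    # Equation (1): FA_i = 1 / ln(depth_i + 1) + I_orig / I.
    # depth_i is assumed to start at 1 so the logarithm never hits zero.
    return 1.0 / log(depth_i + 1) + I_orig / I

def shared_feature_affinity(fa_i, n_at_depth):
    # Equation (2): features tied at the same depth split the affinity.
    return fa_i / n_at_depth

# Illustrative numbers: 100 original features reduced to 20 after selection,
# feature i at depth 2, sharing that depth with 3 other features.
fa = feature_affinity(depth_i=2, I_orig=100, I=20)   # 1/ln(3) + 100/20
share = shared_feature_affinity(fa, n_at_depth=4)
```

A deeper feature (larger depth_i) and a more crowded rank (larger n_at_depth) both shrink the final affinity, matching the intuition described above.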

For each selected feature, the feature index (i) and the feature level (depth_i) are recorded. They are reused here for the feature weight calculation. The number of features holding the same rank is counted to evaluate how the features in that rank dominate the other features. In brief, dividing by this count decreases the respective affinity. Such a modification is shown in Equation (2). The feature affinity is expected to improve the final classification results thanks to this deeper observation of the correlations between features. That is, the relationship between the features plays an important role in keeping the feature domination consistent. Without such information, each feature would be treated as independent, which enlarges the weighting effect. The most direct influence is that more instances would be classified as either positive or negative during the testing phase since some features are overweighted. By integrating the feature affinity with the MCA algorithm, the final weighting function of the MCA algorithm is thus modified. For the details of how to generate the original MCA weighting matrix, please refer to [42]. After selecting each feature to calculate the MCA weight, a 3-D matrix (MW) is generated as a form of feature-value pairs. For each pair of the feature and class, the final weight is multiplied by the feature affinity. The function is shown in Equation (3).

MW_{i'}^{(c, φ)} = MW_{i'}^{(c, φ)} * Share_FA_{i'},   (3)

where c represents the class of the instance, and φ represents the feature value. Similar to Equation (1), i' indicates the feature index after feature selection.

C. Testing Phase

The final weighting matrix generated during the training phase is used in the testing phase in order to get the final ranking scores for the testing instances. Those ranking scores are responsible for predicting the concept class. The ranking procedure starts with adding all feature weights for instance t, and then calculates their average [42]. For classification, all the ranking scores of the testing instances are sorted in descending order, and the top instances are selected with the best selection threshold [53]. Since the testing phase sums up the feature weights learned from the training phase, the proposed feature affinity calculation makes the final weight of each feature more stable and improves the testing results.

D. Decision Fusion

In the decision fusion stage, the classification results from several FA-MCA classifiers are fused. The criterion that decides the best threshold in the testing phase is used here to evaluate the reliability of each training model. The F1 scores (s_f) [54] calculated during the training phases are optimized by the best threshold selection. The average F1 score (s̄) and the standard deviation (std_F) accumulated over the F feature groups decide whether a specific group of features is a good representative of the concept. Bessel's correction [55] is applied to the standard deviation calculation as shown in Equation (4). F is also the number of the basic FA-MCA classifiers, which correspond to the F feature groups.

std_F = sqrt( (1 / (F - 1)) * Σ_{f=1..F} (s_f - s̄)^2 ).   (4)
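Equation (4) is the ordinary sample standard deviation; in Python, `statistics.stdev` already applies Bessel's correction (dividing by F - 1). A small sketch with made-up F1 scores:

```python
import statistics

# Hypothetical training F1 scores of F = 4 basic FA-MCA classifiers.
f1_scores = [0.62, 0.55, 0.70, 0.58]

s_bar = statistics.mean(f1_scores)    # average F1 score
std_F = statistics.stdev(f1_scores)   # Equation (4), with Bessel's correction

# z-score of each classifier (used later in Algorithm 1, line 4);
# a smaller |z-score| means the classifier sits closer to the group average.
z_scores = [(s - s_bar) / std_F for s in f1_scores]
```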

In the proposed framework, each feature group is considered as an equally contributing input to the final decision. Meanwhile, the uncertainty of the contribution of each representation space makes the fusion scheme flexible, as illustrated in Equation (5) and Equation (6). Pn' is the final label prediction set over the testing data instances (te). If the instance is predicted as negative in any feature group, the prediction value, together with a z-score based term (γ_f), is accumulated into the output set Pn' over all the basic classifiers. For every testing instance, each FA-MCA classifier produces one prediction result. The prediction result is a binary value, either 1 representing negative (not belonging to the concept of interest) or 0 representing positive (belonging to the concept of interest). For example, if there are four basic classifiers that classify one instance as negative, the final score will sum up to at least four, since a smaller absolute z-score value of a specific classifier represents a higher reliability. In addition, since the 99.7% confidence interval is represented between the z-score values of -3 and 3 [56], the α value is set to 3.5 empirically to eliminate the effect of abnormal values and keep as much information as possible.

Pn'_te = Σ_{f=1..F} (Pn_f^te + γ_f)   (5)

γ_f = Pn_f^te / (F + |zscore_f|)   if |zscore_f| ≤ α,
γ_f = Pn_f^te                      otherwise.   (6)

The decision fusion scheme mainly focuses on better predicting the negative instances, as we would like to keep as many positive instances as possible. Algorithm 1 illustrates how to utilize the prediction results based on the normal distribution among all basic classifiers. The prediction set Pn is a Te × F matrix which includes F prediction results for Te testing instances. The z-scores of the basic classifiers are calculated in line 4 to decide the reliabilities. After accumulating the prediction results in line 6, the final decisions are made in line 9 through a threshold β calculated using Equation (7). When s̄ is close to 1, the final classification result is considered trustworthy with a lower accumulated value. By setting the z-score value (z) to -2 in Equation (7), the mean value is shifted to the left by 2 standard deviations. That is the smallest value between 0 and 1 in the 95% confidence interval and is considered as a fault tolerance number. This number is applied to half of the classifiers in order to decide the threshold of the summation, which indicates the negative label. Namely, at least half of the classifiers should classify an instance as negative with a better z-score when the instance is a negative instance.

β = F - ((s̄ + z * std_F) / F) * (F / 2) = F - (s̄ + z * std_F) / 2   (7)

IV. EXPERIMENTAL ANALYSIS

A. Dataset Description

Although the MCA-based framework can be used as a general framework for various multimedia application domains, in this paper, the specific task of detecting disaster-related semantic concepts is selected, using a dataset obtained from the Federal Emergency Management Agency (FEMA) website. Since the semantic concepts obtained from this website are different from the normal disaster event

Algorithm 1 Decision Fusion Scheme
Input: the negative label prediction set Pn of each feature group, {Pn_f^te | f = 1, ..., F; te = 1, ..., Te}; the training set F1 score set s = {s_f | f = 1, ..., F}; the average F1 score s̄; and the F1 scores' standard deviation std_F.
Output: the combined negative label prediction set Pn' = {Pn'^te | te = 1, ..., Te}
 1: procedure DF_CAL(Pn, s, s̄, std_F)
 2:   for all Pn_f^te (f = 1, ..., F) do
 3:     // Calculate each F1 score's z-score
 4:     zscore_f = (s_f - s̄) / std_F;
 5:   for all Pn_f^te (te = 1, ..., Te) do
 6:     Calculate Pn'^te using Equations (5) and (6);
 7:   Calculate β using Equation (7);
 8:   for all Pn'^te (te = 1, ..., Te) do
 9:     if Pn'^te ≥ β then
10:       Pn'^te is negative;
11:     else
12:       Pn'^te is positive;
13:   return Pn'
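Algorithm 1 and Equations (4)-(7) can be sketched end-to-end in Python; the function name `fuse_decisions` and the toy F1 scores below are our own illustrative choices, not values from the paper.

```python
def fuse_decisions(pn, f1_scores, alpha=3.5, z=-2.0):
    """pn[te][f] is the prediction of basic classifier f on testing instance te
    (1 = negative, 0 = positive); f1_scores holds the training F1 score of
    each of the F classifiers. Returns a 'negative'/'positive' label per instance."""
    F = len(f1_scores)
    s_bar = sum(f1_scores) / F
    # Equation (4): standard deviation with Bessel's correction.
    std_F = (sum((s - s_bar) ** 2 for s in f1_scores) / (F - 1)) ** 0.5
    zsc = [(s - s_bar) / std_F for s in f1_scores]
    # Equation (7): threshold beta = F - (s_bar + z * std_F) / 2.
    beta = F - (s_bar + z * std_F) / 2
    labels = []
    for row in pn:
        total = 0.0
        for f, p in enumerate(row):
            # Equation (6): reliable classifiers (small |z-score|) add a larger
            # bonus; for outliers (|z-score| > alpha) the raw prediction is used.
            gamma = p / (F + abs(zsc[f])) if abs(zsc[f]) <= alpha else p
            total += p + gamma  # Equation (5)
        labels.append("negative" if total >= beta else "positive")
    return labels

# Toy example: one instance flagged negative by all four classifiers,
# one flagged positive by all four.
print(fuse_decisions([[1, 1, 1, 1], [0, 0, 0, 0]], [0.62, 0.55, 0.70, 0.58]))
# → ['negative', 'positive']
```

An all-negative instance accumulates at least F plus the reliability bonuses, comfortably exceeding β; an all-positive instance accumulates zero.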

concepts, it is more useful to examine the effectiveness of the proposed FA-MCADF framework.

Table I. Dataset statistics

No.  Concept            Positive Instances  Videos
1    Flood              258                 21
2    Human Relief       92                  4
3    Damage             281                 21
4    Training Program   148                 7
5    Disaster Recovery  369                 16
6    Speak              1230                145
7    Interview          117                 23
     Total              2495                237

The dataset contains over 200 videos and thousands of key frames that are related to seven different concepts. However, there are still many similarities between some of the concepts. The statistics are shown in Table I, which lists the name, the number of positive instances, and the number of videos of each concept. When the similarity between concepts increases, the task of concept detection becomes more challenging. Meanwhile, the weight generation of each feature needs a higher accuracy to improve the training and testing performance. This is the motivation for proposing the FA-MCADF framework. As mentioned in Section III-A, the dataset is split using three-fold cross-validation based on the number of videos. In other words, the entire dataset is divided into 3 different folds, with approximately 1/3 of the videos (one fold) used for testing and 2/3 of the videos (two folds) used for training.
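The video-level split described above can be sketched as follows; the function and variable names are illustrative, and a library routine such as scikit-learn's GroupKFold would serve the same purpose.

```python
import random

def video_level_folds(frame_videos, n_folds=3, seed=0):
    """Assign each key-frame instance to a fold by its video id, so all
    frames of one video land in the same fold (never split across the
    training and testing sets). Returns one fold index per frame."""
    videos = sorted(set(frame_videos))
    rng = random.Random(seed)
    rng.shuffle(videos)
    # Deal the shuffled video ids round-robin over the folds, giving each
    # fold roughly 1/n_folds of the videos.
    video_fold = {v: i % n_folds for i, v in enumerate(videos)}
    return [video_fold[v] for v in frame_videos]

# Seven frames from four videos; frames of the same video share a fold.
frames = ["vidA", "vidA", "vidB", "vidC", "vidC", "vidC", "vidD"]
folds = video_level_folds(frames)
```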

[Figure 2 appears here.]

Figure 2. Number of True Positives obtained from each classifier

Table II. Performance evaluation results on a disaster dataset (Pre = precision, Rec = recall, F1 = F1-score)

Classifier           Metric  Flood    Human Relief  Damage   Training Program  Disaster Recovery  Speak    Interview  Average
RBF Network          Pre     70.07%   1.10%         71.77%   35.20%            36.97%             78.23%   35.50%     46.98%
                     Rec     51.30%   33.33%        78.60%   34.87%            47.60%             99.93%   40.27%     55.13%
                     F1      36.83%   2.17%         62.47%   6.47%             26.53%             86.07%   15.53%     33.72%
SVM                  Pre     70.17%   1.67%         71.77%   1.87%             70.33%             82.93%   68.83%     52.51%
                     Rec     46.20%   32.33%        65.97%   33.33%            66.87%             88.57%   40.70%     53.42%
                     F1      29.47%   3.17%         52.57%   3.53%             50.87%             82.27%   16.83%     34.10%
Decision Tree        Pre     70.13%   1.43%         72.10%   68.60%            70.37%             82.93%   68.77%     62.05%
                     Rec     45.23%   32.33%        61.40%   40.87%            61.67%             81.37%   40.53%     51.91%
                     F1      29.77%   2.73%         49.13%   17.20%            46.47%             77.67%   18.90%     34.55%
Logistic Regression  Pre     70.23%   67.87%        71.77%   68.60%            71.17%             82.97%   68.83%     71.63%
                     Rec     58.23%   38.07%        63.73%   49.23%            58.17%             81.00%   44.33%     56.11%
                     F1      42.97%   11.13%        50.60%   32.97%            47.93%             77.47%   22.80%     40.84%
FA-MCA               Pre     70.25%   34.18%        71.91%   68.61%            70.32%             82.91%   68.79%     66.71%
                     Rec     61.67%   30.52%        72.92%   65.48%            68.90%             97.78%   48.95%     63.74%
                     F1      46.62%   24.45%        64.80%   45.07%            52.69%             87.42%   29.54%     50.08%
FA-MCADF             Pre     45.26%   34.53%        71.77%   68.53%            70.30%             82.93%   68.82%     63.16%
                     Rec     85.84%   49.29%        96.06%   61.11%            97.48%             99.85%   78.63%     81.18%
                     F1      50.90%   28.08%        73.49%   42.43%            71.91%             88.45%   57.98%     59.03%
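The precision, recall, and F1-score values reported in Table II follow the standard binary-classification definitions, which can be sketched as (helper name ours):

```python
def precision_recall_f1(y_true, y_pred):
    # Labels: 1 = positive instance, 0 = negative instance.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```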

B. Evaluation Results

The performance evaluation takes the precision, recall, and F1-score values as the criteria [54]. Table II presents the experimental results in detail; the proposed FA-MCA algorithm shows the best performance on average in comparison with the ANN, SVM, DT, and LR classifiers (available in WEKA [57]). All the classifiers are tuned to achieve their best performance during the experiment, and the results are ordered by the average F1-scores (the last column in Table II). SVMs and ANNs are two examples of black-box models that can only be verified externally. They are widely used in different domains where good classification performance is preferred. However, from the evaluation results, it can be seen that their discriminating power is not significantly better than that of the other models, which means that for this specific dataset, a more accurate model is needed to differentiate the concepts. The Radial Basis Function (RBF) network is selected as a representative of ANNs since it performs better than the other ANN classifiers on this specific dataset. As can be inferred from the table, the improvement of the average F1-score of FA-MCA is around 10% when compared to LR, which itself achieves promising results in comparison with the other classifiers. LR is a statistical method that is often compared to ANN models in many classification tasks. It shows its capability of handling a dataset with a small number of positive instances (i.e., imbalanced data). On the contrary, the RBF network reaches 86.07% for the concept "Speak", which is the most balanced concept, but that is still 1% worse than FA-MCA. Compared with the other machine learning methods mentioned here, DTs take the information gain values as the common criterion and have the advantage that each tree can easily be expressed as rules. FA-MCA also takes the information gain values as one of the feature selection criteria and avoids the disadvantage of DTs, which is losing information along the splitting process. In addition, the FA-MCA algorithm shows significant improvements (13% and 12%, respectively) on the complicated semantic concepts "Human Relief" and "Training Program". In the last three rows of Table II, the FA-MCADF framework is used to boost the performance by separating more than 700 dimensions of features into four feature groups (illustrated in Figure 1) that run FA-MCA models independently. Each FA-MCA model handles approximately 1/4 of the features, thus speeding up the learning process. A decision fusion scheme (Algorithm 1) is proposed as the final step to generate the final classification decision. The FA-MCADF framework achieves the best results in all the evaluation metrics in comparison to the other classifiers in the experiments. The average recall and F1-score values are 17.44% and 8.95% higher, respectively, than those of the single FA-MCA model.

V. CONCLUSION AND FUTURE WORK

Disaster-related concept detection is not limited to disaster events. It also includes various concepts that carry critical disaster information, such as disaster preparation training, disaster recovery, and the disaster damage situation. Since the correlations between those concepts are higher than those between diverse disaster events, the classification task becomes more challenging.
To tackle this challenge, the FA-MCADF framework is proposed to consider the relationship between features within each feature group and to eliminate the situation where some features dominate during the feature weight generation process. As a result, critical features are selected and weighted based on their ranks. The decision fusion scheme allows a scalable number of feature groups to run the classifiers separately, which reduces the negative effect among features that belong to different representation levels. Compared with the decision tree and SVM classifiers, the experimental results show significant improvements for all the evaluation criteria, which means that the proposed framework truly preserves the importance of the features when detecting the interrelated concepts. However, some improvements can still be carried out. In the future, this framework will be further extended and tested for more concept detection applications. Multi-modality features, including high-level information such as audio,

spatio-temporal, and textual information, can also be included to improve the concept detection performance [58]. In addition, the latest cluster computing techniques (e.g., Apache Spark) can be adopted to build a parallel framework and reduce the computation time, which is worth considering when processing large datasets [59][60].

REFERENCES

[1] D. Zhang, L. Zhou, and J. F. Nunamaker Jr, "A knowledge management framework for the support of decision making in humanitarian assistance/disaster relief," Knowledge and Information Systems, vol. 4, no. 3, pp. 370–385, 2002.

[2] L. Zheng, C. Shen, L. Tang, C. Zeng, T. Li, S. Luis, and S.-C. Chen, "Data mining meets the needs of disaster information management," IEEE Transactions on Human-Machine Systems, vol. 43, pp. 451–464, 2013.

[3] S.-C. Chen, M. Chen, N. Zhao, S. Hamid, K. Chatterjee, and M. Armella, "Florida public hurricane loss model: Research in multi-disciplinary system integration assisting government policy making," Government Information Quarterly, vol. 26, no. 2, pp. 285–294, 2009.

[4] H. Tian and S.-C. Chen, "A video-aided semantic analytics system for disaster information integration," in Proceedings of the Third IEEE International Conference on Multimedia Big Data. IEEE, 2017, pp. 242–243.

[5] L. Zheng, C. Shen, L. Tang, T. Li, S. Luis, S.-C. Chen, and V. Hristidis, "Using data mining techniques to address critical information exchange needs in disaster affected public-private networks," in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2010, pp. 125–134.

[6] L. Zheng, C. Shen, L. Tang, T. Li, S. Luis, and S.-C. Chen, "Applying data mining techniques to address disaster information management challenges on mobile devices," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011, pp. 283–291.

[7] J. Fan, H. Luo, J. Xiao, and L.
Wu, "Semantic video classification and feature subset selection under context and concept uncertainty," in Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, 2004, pp. 192–201.

[8] Y. Wu, E. Y. Chang, K. C.-C. Chang, and J. R. Smith, "Optimal multimodal fusion for multimedia data analysis," in Proceedings of the 12th Annual ACM International Conference on Multimedia. ACM, 2004, pp. 572–579.

[9] M.-L. Shyu, S.-C. Chen, and C. Haruechaiyasak, "Mining user access behavior on the www," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3. IEEE, 2001, pp. 1717–1722.

[10] T. Meng and M.-L. Shyu, "Leveraging concept association network for multimedia rare concept mining and retrieval," in Proceedings of the IEEE International Conference on Multimedia and Expo, Melbourne, Australia, July 2012.

[11] C. Chen, Q. Zhu, L. Lin, and M.-L. Shyu, "Web media semantic concept retrieval via tag removal and model fusion," ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, pp. 61:1–61:22, October 2013.
[12] X. Huang, S.-C. Chen, M.-L. Shyu, and C. Zhang, "User concept pattern discovery using relevance feedback and multiple instance learning for content-based image retrieval," in Proceedings of the Third International Workshop on Multimedia Data Mining, in conjunction with the 8th ACM International Conference on Knowledge Discovery and Data Mining, July 2002, pp. 100–108.
[13] X. Chen, C. Zhang, S.-C. Chen, and S. Rubin, "A human-centered multiple instance learning framework for semantic video retrieval," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 39, no. 2, pp. 228–233, 2009.
[14] S.-C. Chen, "Multimedia databases and data management: a survey," International Journal of Multimedia Data Engineering and Management, vol. 1, no. 1, pp. 1–11, January-March 2010.
[15] M.-L. Shyu, S.-C. Chen, and R. L. Kashyap, "Generalized affinity-based association rule mining for multimedia database queries," Knowledge and Information Systems, vol. 3, no. 3, pp. 319–337, 2001.
[16] Q. Zhu, L. Lin, M.-L. Shyu, and S.-C. Chen, "Effective supervised discretization for classification based on correlation maximization," in Proceedings of the IEEE International Conference on Information Reuse and Integration, 2011, pp. 390–395.
[17] L. Lin, M.-L. Shyu, G. Ravitz, and S.-C. Chen, "Video semantic concept detection via associative classification," in Proceedings of the IEEE International Conference on Multimedia and Expo. IEEE, 2009, pp. 418–421.
[18] X. Chen, C. Zhang, S.-C. Chen, and M. Chen, "A latent semantic indexing based method for solving multiple instance learning problem in region-based image retrieval," in Proceedings of the Seventh IEEE International Symposium on Multimedia, Dec 2005, pp. 37–44.

[19] S.-C. Chen and R. Kashyap, "Temporal and spatial semantic models for multimedia presentations," in Proceedings of the International Symposium on Multimedia Information Processing, 1997, pp. 441–446.
[20] S.-C. Chen, M.-L. Shyu, and C. Zhang, "An intelligent framework for spatio-temporal vehicle tracking," in Proceedings of the 4th IEEE International Conference on Intelligent Transportation Systems, August 2001, pp. 213–218.
[21] L. Bruzzone and D. F. Prieto, "A technique for the selection of kernel-function parameters in rbf neural networks for classification of remote-sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 2, pp. 1179–1184, 1999.
[22] X. Liao, Y. Xue, and L. Carin, "Logistic regression with an auxiliary data source," in Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005, pp. 505–512.
[23] S. R. Safavian and D. Landgrebe, "A survey of decision tree classifier methodology," IEEE Transactions on Systems, Man, and Cybernetics, pp. 660–674, 1990.
[24] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[25] S.-C. Chen, M.-L. Shyu, C. Zhang, and R. L. Kashyap, "Identifying overlapped objects for video indexing and modeling in multimedia database systems," International Journal on Artificial Intelligence Tools, vol. 10, no. 4, pp. 715–734, 2001.
[26] X. Huang, S.-C. Chen, M.-L. Shyu, and C. Zhang, "User concept pattern discovery using relevance feedback and multiple instance learning for content-based image retrieval," in Proceedings of the MDM/KDD. Citeseer, 2002, pp. 100–108.
[27] Z. Ma, F. Nie, Y. Yang, J. R. Uijlings, N. Sebe, and A. G. Hauptmann, "Discriminating joint feature analysis for multimedia data understanding," IEEE Transactions on Multimedia, vol. 14, no. 6, pp. 1662–1672, 2012.
[28] P. O. Nunally and D. R. MacCormack, "Multimedia data analysis in intelligent video information management system," Mar. 7, 2000, US Patent 6,035,341.
[29] S.-C. Chen, M.-L. Shyu, C. Zhang, and R. L. Kashyap, "Identifying overlapped objects for video indexing and modeling in multimedia database systems," International Journal on Artificial Intelligence Tools, vol. 10, no. 4, pp. 715–734, 2001.
[30] S.-C. Chen, S. Sista, M.-L. Shyu, and R. Kashyap, "Augmented transition networks as video browsing models for multimedia databases and multimedia information systems," in Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence, 1999, pp. 175–182.
[31] H. A. Elnemr, N. M. Zayed, and M. A. Fakhreldein, "Feature extraction techniques: Fundamental concepts and survey," Handbook of Research on Emerging Perspectives in Intelligent Pattern Recognition, Analysis, and Image Processing, 2015.
[32] M.-L. Shyu, K. Sarinnapakorn, I. Kuruppu-Appuhamilage, S.-C. Chen, L. Chang, and T. Goldring, "Handling nominal features in anomaly intrusion detection problems," in Proceedings of the 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications, 2005, pp. 55–62.
[33] Y. Yang, Z. Ma, A. G. Hauptmann, and N. Sebe, "Feature selection for multimedia analysis by sharing information among multiple tasks," IEEE Transactions on Multimedia, vol. 15, no. 3, pp. 661–669, 2013.
[34] L. Lin, G. Ravitz, M.-L. Shyu, and S.-C. Chen, "Effective feature space reduction with imbalanced data for semantic concept detection," in Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing, June 2008, pp. 262–269.
[35] S.-C. Chen, M.-L. Shyu, C. Zhang, and M. Chen, "A multimodal data mining framework for soccer goal detection based on decision tree logic," International Journal of Computer Applications in Technology, vol. 27, pp. 312–323, 2006.

[36] D. Liu, Y. Yan, M.-L. Shyu, G. Zhao, and M. Chen, "Spatio-temporal analysis for human action detection and recognition in uncontrolled environments," International Journal of Multimedia Data Engineering and Management, vol. 6, no. 1, pp. 1–18, 2015.
[37] Q. Zhu, M.-L. Shyu, and S.-C. Chen, "Discriminative learning-assisted video semantic concept classification," Multimedia Security: Watermarking, Steganography, and Forensics, p. 31, 2012.
[38] L. Lin, G. Ravitz, M.-L. Shyu, and S.-C. Chen, "Video semantic concept discovery using multimodal-based association classification," in Proceedings of the IEEE International Conference on Multimedia and Expo, July 2007, pp. 859–862.
[39] B. Heisele, T. Serre, S. Prentice, and T. Poggio, "Hierarchical classification and feature reduction for fast face detection with support vector machines," Pattern Recognition, vol. 36, no. 9, pp. 2007–2017, 2003.
[40] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain, "Content-based multimedia information retrieval: State of the art and challenges," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 2, no. 1, pp. 1–19, 2006.
[41] Y. Lu, I. Cohen, X. S. Zhou, and Q. Tian, "Feature selection using principal feature analysis," in Proceedings of the 15th ACM International Conference on Multimedia. ACM, 2007, pp. 301–304.
[42] L. Lin, G. Ravitz, M.-L. Shyu, and S.-C. Chen, "Correlation-based video semantic concept detection using multiple correspondence analysis," in Proceedings of the Tenth IEEE International Symposium on Multimedia. IEEE, 2008, pp. 316–321.
[43] L. Lin and M.-L. Shyu, "Weighted association rule mining for video semantic detection," International Journal of Multimedia Data Engineering and Management, vol. 1, no. 1, pp. 37–54, Jan. 2010. [Online]. Available: http://dx.doi.org/10.4018/jmdem.2010111203
[44] J. S. Boreczky and L. A. Rowe, "Comparison of video shot boundary detection techniques," Journal of Electronic Imaging, vol. 5, no. 2, pp. 122–128, 1996.
[45] S.-C. Chen, M.-L. Shyu, and C. Zhang, "Innovative shot boundary detection for video indexing," in Video Data Management and Information Retrieval, S. Deb, Ed. Idea Group Publishing, 2005, pp. 217–236.
[46] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1. IEEE, 2005, pp. 886–893.
[47] S. A. Chatzichristofis and Y. S. Boutalis, "CEDD: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval," in Proceedings of the International Conference on Computer Vision Systems. Springer, 2008, pp. 312–322.

[48] R. Lienhart and J. Maydt, "An extended set of haar-like features for rapid object detection," in Proceedings of the International Conference on Image Processing, vol. 1. IEEE, 2002, pp. I–900.
[49] S. Sural, G. Qian, and S. Pramanik, "Segmentation and histogram generation using the hsv color space for image retrieval," in Proceedings of the 2002 International Conference on Image Processing, vol. 2. IEEE, 2002, pp. II–589.
[50] X. Li, S.-C. Chen, M.-L. Shyu, and B. Furht, "Image retrieval by color, texture, and spatial information," in Proceedings of the 8th International Conference on Distributed Multimedia Systems, September 2002, pp. 152–159.
[51] R. Kohavi et al., "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), vol. 2, 1995, pp. 1137–1143.
[52] H. Vafaie and I. F. Imam, "Feature selection methods: genetic algorithms vs. greedy-like search," in Proceedings of the International Conference on Fuzzy and Intelligent Control Systems, vol. 51, 1994.
[53] Y. Yang, H.-Y. Ha, F. Fleites, S.-C. Chen, and S. Luis, "Hierarchical disaster image classification for situation report enhancement," in Proceedings of the IEEE International Conference on Information Reuse and Integration. IEEE, 2011, pp. 181–186.
[54] D. M. Powers, "Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation," International Journal of Machine Learning Technology, 2011.
[55] W. J. Reichmann, Use and abuse of statistics. Penguin Books, 1964.
[56] F. Pukelsheim, "The three sigma rule," The American Statistician, vol. 48, no. 2, pp. 88–91, 1994.
[57] G. Holmes, A. Donkin, and I. H. Witten, "Weka: A machine learning workbench," in Proceedings of the Second Australian and New Zealand Conference on Intelligent Information Systems. IEEE, 1994, pp. 357–361.
[58] M.-L. Shyu, C. Haruechaiyasak, S.-C. Chen, and N. Zhao, "Collaborative filtering by mining association rules from user access sequences," in Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration, April 2005, pp. 128–135.
[59] M.-L. Shyu, T. Quirino, Z. Xie, S.-C. Chen, and L. Chang, "Network intrusion detection through adaptive sub-eigenspace modeling in multiagent systems," ACM Trans. Auton. Adapt. Syst., vol. 2, no. 3, Sep. 2007. [Online]. Available: http://doi.acm.org/10.1145/1278460.1278463
[60] M.-L. Shyu, C. Haruechaiyasak, and S.-C. Chen, "Category cluster discovery from distributed www directories," Journal of Information Sciences, vol. 155, pp. 181–197, 2003.