International Journal of Knowledge Engineering and Research
Vol 2 Issue 4 April 2013 ISSN 2319 – 832X
Knowledge Discovery from National Flag through Data Mining Approach
M. A. H. Akhand#, Al-Mahmud#, Iqbal Hossain#, K. Murase*
# Dept. of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
* Dept. of Human and Artificial Intelligent Systems, University of Fukui, Fukui, Japan
Abstract— A national flag (NF) is the unique pictogram of a country's identity as an independent nation. Every country has a distinct national flag that symbolizes it, so the colors and symbols of an NF might carry meanings that indicate the socio-cultural, historical, ideological and other features of the country. The aim of this study is to identify such features of a country in its flag through a data mining approach. The study first identifies the features of NFs and builds a data set covering all 191 countries. Decision trees are then constructed from the data using Weka, a popular data mining tool for decision tree construction. The constructed decision trees show how similarities in the religion, government system and regional status of countries appear in the features of their NFs.
Keywords- National Flag, Decision Tree, C4.5, Weka.
I. INTRODUCTION
A National Flag (NF) is a unique symbol of a country. Differences in proportions, color combinations and symbols make each NF uniquely identifiable [1]. Since the NF symbolizes a country, its colors and symbols might carry meanings indicating the socio-cultural, historical, ideological and other features of the country. For example, the red disc in the NF of Bangladesh represents the sun rising over Bengal and also the blood of those who died for the independence of the country, while the green field stands for the lushness of the land of Bangladesh [2-3]. There are 191 countries in the world, and each has a uniquely identifiable national flag [1]. With one exception (the NF of Nepal), NFs are rectangular or square in shape. Although colors and symbols make each NF distinct, three color combinations recur on several flags in certain regions. Blue, white, and red is a common combination in Slavic countries such as the Czech Republic, Slovakia, Russia, Serbia, Slovenia, and Croatia, as well as among Western nations including Australia, France, Iceland, Norway, New Zealand, the United Kingdom, the Netherlands and the United States of America. Many African nations, including Ghana, Cameroon, Mali and Senegal, use red, yellow, and green. Flags
containing red, white, and black can be found particularly among Arab nations such as Egypt, Iraq and Yemen [4]. While some similarities among NFs are regional, others are rooted in the shared histories, socio-cultures and ideals of countries. For example, the flags of Venezuela, Colombia, and Ecuador use variants of the flag of Gran Colombia, the state they formed upon their independence from Spain, whose flag was created by the Venezuelan independence hero Francisco de Miranda; and the flags of Egypt, Iraq, Syria, and Yemen are all highly similar variants of the flag of the Arab Revolt of 1916–1918 [5]. The flags of Romania and Moldova are virtually identical because of their common history and heritage. Moldova adopted the Romanian flag during its declaration of independence from the USSR in 1991 (the flag had been used in various demonstrations and revolts by the population), and the Moldovan coat of arms (which is part of the Romanian coat of arms) was later placed in the centre of the flag. The Nordic countries (Iceland, Denmark, Norway, Sweden and Finland, in addition to the autonomous regions of the Faroe Islands and Åland) all share the same design: a cross, with its vertical bar shifted toward the hoist, on a single-colored background [4]. Moreover, the religious status of a country seems to influence its NF: a half moon sign appears on the flags of a number of Muslim-majority countries, and a cross on those of several Christian-majority countries.
The aim of this study is to identify the socio-cultural, historical and ideological features of a country in its flag through a data mining approach. The study first investigates the features of NFs and builds a data set covering all 191 countries. Decision trees are then constructed from the data to analyze how similar socio-cultural, historical and regional features of countries correlate with similar colors and signs in their NFs.
The rest of the paper is organized as follows. Section II describes the features of NFs that have been considered to build the data set for the study.
Section III describes decision trees and the C4.5 decision tree construction method, and reports the knowledge discovered from the decision trees constructed with the NF data set. Finally, Section IV gives a short conclusion of the study with some future directions that emerge from it.
www.ijker.org
II. FEATURE EXTRACTION FROM NATIONAL FLAG
Selecting features from NFs is a crucial task for a fruitful analysis. Observation of the NFs of 191 countries shows that an NF can be characterized by three major properties: (i) the major colors it contains, (ii) any special sign in it, and (iii) the presence of stripes. A special sign may be a half moon, a sun or a cross, and stripes may be horizontal, vertical, or both. Eleven different colors, including the three major colors red, green and blue, appear across NFs, and an NF uses two or three of them. Table I lists the name, description and possible values of every feature used to build the NF data set from the 191 NFs. The next section presents the experimental studies with the NF data set.
TABLE I. FEATURES OF NATIONAL FLAGS INCLUDING COLORS, SPECIAL SIGNS AND STRIPES
Sl. | Feature Name | Feature Description | Feature Values
1 | Green | Is green color available or not? | y, n
2 | Red | Is red color available or not? | y, n
3 | White | Is white color available or not? | y, n
4 | Yellow | Is yellow color available or not? | y, n
5 | Orange | Is orange color available or not? | y, n
6 | Maroon | Is maroon color available or not? | y, n
7 | Blue | Is blue color available or not? | y, n
8 | Saffron | Is saffron color available or not? | y, n
9 | Black | Is black color available or not? | y, n
10 | Light_Blue | Is light blue color available or not? | y, n
11 | Gold | Is gold color available or not? | y, n
12 | Half_Moon | Is half moon sign available or not? | y, n
13 | Sun | Is sun sign available or not? | y, n
14 | Cross | Is cross sign available or not? | y, n
15 | Stripe | Is the stripe horizontal, vertical, horizontal-vertical, or absent? | h, v, hv, n
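Because Weka reads data sets in its ARFF format, the fifteen features of Table I map directly onto nominal ARFF attributes. The sketch below is illustrative only: the relation name `national_flags`, the helper names and the sample flag are hypothetical, and the class labels are simply the religion codes used later in Table II. It shows one way to turn a flag description into a Table I row; it is not the authors' own preprocessing code.

```python
# Feature order follows Table I: 11 colors, 3 special signs, then the stripe code.
COLORS = ["Green", "Red", "White", "Yellow", "Orange", "Maroon",
          "Blue", "Saffron", "Black", "Light_Blue", "Gold"]
SIGNS = ["Half_Moon", "Sun", "Cross"]
STRIPES = ["h", "v", "hv", "n"]  # horizontal, vertical, both, none


def encode_flag(colors, signs, stripe):
    """Return the 15 Table I feature values (y/n flags plus the stripe code)."""
    unknown = (set(colors) - set(COLORS)) | (set(signs) - set(SIGNS))
    if unknown:
        raise ValueError(f"unknown colors/signs: {sorted(unknown)}")
    if stripe not in STRIPES:
        raise ValueError(f"stripe must be one of {STRIPES}")
    row = ["y" if c in colors else "n" for c in COLORS]
    row += ["y" if s in signs else "n" for s in SIGNS]
    row.append(stripe)
    return row


def arff_header(class_name, class_values, relation="national_flags"):
    """Build a Weka ARFF header with every Table I feature as a nominal attribute."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} {{y,n}}" for name in COLORS + SIGNS]
    lines.append("@attribute Stripe {" + ",".join(STRIPES) + "}")
    lines.append(f"@attribute {class_name} {{" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    return "\n".join(lines)


if __name__ == "__main__":
    religion = ["muslim", "christian", "buddhu", "hindu",
                "no_religion", "jews", "m_c", "indigenous"]
    print(arff_header("religion", religion))
    # Hypothetical red-and-white flag with a cross and no stripes.
    print(",".join(encode_flag({"Red", "White"}, {"Cross"}, "n") + ["christian"]))
    # -> n,y,y,n,n,n,n,n,n,n,n,n,n,y,n,christian
```

Each data line is the fifteen feature values followed by the class label of the condition being studied (religion, government or region), so the same feature rows can be reused for all three trees by swapping the last attribute.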
III. EXPERIMENT STUDIES
This section presents the experimental findings obtained by constructing decision trees from the NF data set. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. Decision trees are commonly used in decision analysis to help identify the strategy most likely to reach a goal; another use of decision trees is as a descriptive means for calculating conditional probabilities [8]. In this study, we used Weka to construct decision trees with the standard and popular C4.5 algorithm. Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. Weka is free software available under the GNU General Public License [6]. It contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this functionality. Weka supports several standard data mining tasks, more specifically data pre-processing, clustering, classification, regression, visualization, and feature selection. Several classification algorithms are implemented in Weka; for example, J48 is Weka's implementation of C4.5 [7]. C4.5 is a popular, standard decision tree construction algorithm that is widely used in practical machine learning. A powerful feature of Weka is its visual representation of a decision tree; this tree-like view is easy to understand, so the analysis in this study is presented from it. In addition, a confusion matrix is a table layout that summarizes the performance of a decision tree: each row gives the actual class of the instances and each column the class assigned by the tree. Three decision trees are constructed to correlate the flag feature data set with three different conditions of countries: (i) religious status, (ii) continental region, and (iii) government system.
On the basis of religious condition, countries may be divided into eight groups: a. Muslim-majority, b. Christian-majority, c. Muslim- and Christian-majority, d. Hindu-majority, e. Buddhist-majority, f. Jewish-majority, g. indigenous-majority, and h. no-religion countries. Fig. 1 is the Weka-constructed pruned decision tree from the NF data set with religion condition, and Table II is the corresponding confusion matrix. The decision tree (DT) indicates that the half moon and cross signs are the most prominent features in NFs: only some Muslim countries' flags carry a half moon sign, whereas a cross sign definitely indicates that the respective country is Christian-majority. Apart from the half moon and cross signs, blue is common to a large number of Christian-majority countries. Yellow and white are also
Fig. 1. Pruned decision tree from NF data on the basis of Religion Condition.
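TABLE II. CONFUSION MATRIX OF THE DECISION TREE OF FIG. 1
Religion status | No. of countries | Pruned DT classification: a | b | c | d | e | f | g | h
a = muslim | 47 | 29 | 18 | 0 | 0 | 0 | 0 | 0 | 0
b = christian | 122 | 7 | 115 | 0 | 0 | 0 | 0 | 0 | 0
c = buddhu | 12 | 3 | 7 | 2 | 0 | 0 | 0 | 0 | 0
d = hindu | 3 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0
e = no_religion | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0
f = jews | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
g = muslim christian (m_c) | 3 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0
h = indigenous | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
Total | 191 | 41 | 148 | 2 | 0 | 0 | 0 | 0 | 0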
a common indication of Christian-majority countries. The confusion matrix in Table II shows that of the 122 Christian-majority countries, 115 are correctly classified as Christian and only seven are misclassified as Muslim. On the other hand, of the 47 Muslim-majority countries, 29 are classified as Muslim and 18 as Christian. Such misclassification is reasonable for a pruned tree: pruning reduces the tree size to improve its ability to generalize, so some errors are expected. The misclassifications between Muslim and Christian countries indicate that their NFs share some feature values. After Muslim- and Christian-majority countries, Buddhist-majority countries form the next group with 12 members; only two of them are classified as Buddhist, and the others are classified as Muslim (3) or Christian (7). Since the countries outside the Muslim and Christian groups are very few
in number, most of them are also classified as Muslim or Christian: due to pruning, all 10 countries of the remaining five categories are classified as Muslim or Christian.
Fig. 2 is the Weka-constructed pruned decision tree from the NF data set with government condition. The countries fall into twelve different government conditions: a. Military, b. Parliamentary democracy (par_d), c. Parliamentary republic (par_r), d. Presidential democracy (p_d), e. Presidential republic (p_r), f. Republic, g. Absolute monarchy (a_m), h. Constitutional monarchy (c_m), i. Communist, j. Federal republic (f_r), k. Parliamentary constitutional monarchy (p_c_m) and l. Islamic republic (i_r). Although the tree of Fig. 2 is difficult to analyze because of the large number of categories, the confusion matrix in Table III clearly shows the correlation between NF features and the government systems of countries. The table shows that of the 59 presidential republic (p_r) countries, 49 are correctly classified as p_r and eight are classified as parliamentary republic (par_r). On the other hand, of the 40 par_r countries, 30 are correctly classified as par_r and nine are classified as p_r. These results indicate that countries with similar government conditions share some similarities in their NFs. It is also noticeable from the decision tree of Fig. 2 that cross and half_moon signs are important
Fig. 2. Pruned decision tree from NF data on the basis of Government Condition.
TABLE III. CONFUSION MATRIX OF THE DECISION TREE OF FIG. 2
Government status (No. of countries): a = military (1), b = par_d (30), c = par_r (40), d = p_d (1), e = p_r (59), f = republic (13), g = a_m (5), h = c_m (22), i = communist (5), j = f_r (9), k = p_c_m (3), l = i_r (3); Total: 191
a 1
b
15 9 0 1 8 49 4 5 1 2 3 3 10 3 2 2 3 1 1 2
0
9 65 0 93 1
0
6
1 1
3
Pruned DT Classification c d e f g h i j 8 30
k
l
1 1 1 1
1 1
5 3 1
2
1 2
2
5
0 8
5
0
features for building the tree, as in the decision tree for religion condition (Fig. 1).
Fig. 3 is the Weka-constructed pruned decision tree from the NF data set with regional condition. From the regional point of view, countries are categorized into the following 16 groups: a. South Asia (sasia), b. East Asia (easia), c. South East Asia (seasia), d. West Asia (wasia), e. Australia (australia), f. North America (namerica), g. South America (samerica), h. Central Africa (africa), i. West Africa (wafrica), j. Southern Africa (safrica), k. Middle East Asia (me), l. Central America (camerica), m. Central Europe (eu), n. West Europe (weu), o. North Europe (neu), and p. Euro Asia (euasian). In the decision tree for regional condition, the cross sign is found at the root node, which indicates that countries of a region may share religious similarity. Since Fig. 3 is rather complex to analyze, its confusion matrix in Table IV is easier to interpret. Table IV shows that of the 38 countries in Central Europe (eu), 36 are correctly classified as eu and the remaining two are classified as Central Africa (africa). On the other hand, of the 33 African countries, 22 are correctly classified and five are classified as eu. In addition, the misclassification as eu of several countries from various continental regions recalls the history of European colonies around the globe, suggesting that their NFs took on some features of European flags. Despite this, the decision tree of Fig. 3 reflects that countries of a region share some similarity in their NFs.
The decision trees of Figs. 1-3 indicate that NFs have similar feature values for countries in similar conditions such as religion, government system and regional status. Religion-related signs in NFs are so discriminative that the decision tree based on religion condition is small and simple compared with the trees for government or regional condition: the half_moon and cross signs in NFs point to Muslim- and Christian-majority countries, respectively. Figs. 2-3 also show that countries with similar government conditions, or in the same continental region, share somewhat similar feature values in their NFs.
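The cross sign ending up at the root node follows from how C4.5 chooses tests: it ranks candidate attributes by gain ratio, that is, information gain divided by the split information (entropy) of the attribute's own values. The sketch below computes gain ratio for a binary sign feature on a small, hypothetical set of flags. It omits C4.5's other heuristics (for example, considering only attributes with at least average information gain, and handling missing values) and is not J48's source code.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (bits) of a non-empty list of nominal values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of a nominal attribute with respect to class labels."""
    n = len(labels)
    groups = {}  # class labels grouped by attribute value
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalizes attributes with many distinct values
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: Cross feature (y/n) and religion class of six flags.
    cross = ["y", "y", "n", "n", "n", "n"]
    religion = ["christian", "christian", "muslim", "muslim", "christian", "muslim"]
    print(round(gain_ratio(cross, religion), 3))  # -> 0.5
```

In this toy case every flag with a cross is Christian, so the "y" branch is pure; the remaining uncertainty sits entirely in the "n" branch, which is why the ratio stays below 1.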
Fig. 3. Pruned decision tree from NF data on the basis of Regional Condition.
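All three trees are pruned. J48 follows C4.5 in pruning with a pessimistic, confidence-based error estimate (default confidence factor 0.25): a subtree is replaced by a leaf when the leaf's estimated error is no worse than the combined estimate of the subtree's leaves. The sketch below uses the normal-approximation form of that upper confidence bound found in data mining textbooks; Quinlan's C4.5 computes the binomial bound directly, so its numbers may differ slightly. The leaf counts in the usage example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pessimistic_error(errors, n, confidence=0.25):
    """Upper confidence bound on a leaf's error rate (normal approximation)."""
    if n <= 0:
        raise ValueError("n must be positive")
    f = errors / n                               # observed error rate at the leaf
    z = NormalDist().inv_cdf(1 - confidence)     # about 0.674 for confidence = 0.25
    num = f + z * z / (2 * n) + z * sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return num / (1 + z * z / n)


if __name__ == "__main__":
    # Hypothetical subtree with two leaves: (6 flags, 2 errors) and (2 flags, 1 error).
    subtree = 6 * pessimistic_error(2, 6) + 2 * pessimistic_error(1, 2)
    # The same 8 flags collapsed into a single leaf with 3 errors.
    leaf = 8 * pessimistic_error(3, 8)
    print("prune" if leaf <= subtree else "keep")
```

The estimate is always above the observed error rate and shrinks toward it as a leaf covers more instances, which is why small, highly specific branches are the first to be pruned away.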
TABLE IV. CONFUSION MATRIX OF THE DECISION TREE OF FIG. 3
Regional status | No. of countries | Pruned DT classification (columns a–p)
a = sasia | 9 | 2 2 1 4
b = easia | 5 | 1 2 1 1
c = seasia | 10 | 1 2 4 3
d = wasia | 11 | 1 6 1 3
e = australia | 12 | 3 1 8
f = namerica | 10 | 1 5 3 1
g = samerica | 12 | 1 6 3 1 1
h = africa | 33 | 3 1 1 5 22 1
i = wafrica | 13 | 1 1 1 1 3 6
j = safrica | 7 | 1 4 2 0
k = me | 5 | 1 4 0
l = camerica | 11 | 2 1 2 0 6
m = eu | 38 | 2 36
n = weu | 5 | 1 4 0
o = neu | 4 | 1 3 0
p = euasian | 6 | 1 1 1 1 2 0
Total | 191 | a 7, b 1, c 9, d 12, e 11, f 4, g 7, h 41, i 12, j 0, k 0, l 0, m 87, n 0, o 0, p 0
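The confusion matrices can be condensed into overall accuracy and per-class recall. The sketch below does this for the religion tree, with cells transcribed from Table II (rows: actual class; columns: class assigned by the pruned tree, as in Weka's output). The paper does not state whether the matrix comes from the training data or from cross-validation, so the accuracy should be read with that in mind. The majority-class baseline (always answering "christian") is added only as a reference point.

```python
# Table II cells: rows = actual religion class, columns = class predicted by the pruned tree.
LABELS = ["muslim", "christian", "buddhu", "hindu",
          "no_religion", "jews", "m_c", "indigenous"]
TABLE_II = [
    [29, 18, 0, 0, 0, 0, 0, 0],   # muslim (47)
    [7, 115, 0, 0, 0, 0, 0, 0],   # christian (122)
    [3, 7, 2, 0, 0, 0, 0, 0],     # buddhu (12)
    [1, 2, 0, 0, 0, 0, 0, 0],     # hindu (3)
    [0, 2, 0, 0, 0, 0, 0, 0],     # no_religion (2)
    [0, 1, 0, 0, 0, 0, 0, 0],     # jews (1)
    [1, 2, 0, 0, 0, 0, 0, 0],     # m_c (3)
    [0, 1, 0, 0, 0, 0, 0, 0],     # indigenous (1)
]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def recall(matrix, labels):
    """Per-class recall: correct predictions divided by the class size (row sum)."""
    return {label: (row[i] / sum(row) if sum(row) else 0.0)
            for i, (label, row) in enumerate(zip(labels, matrix))}


def majority_baseline(matrix):
    """Accuracy of always predicting the largest class."""
    total = sum(sum(row) for row in matrix)
    return max(sum(row) for row in matrix) / total


if __name__ == "__main__":
    print(f"accuracy = {accuracy(TABLE_II):.3f}")                   # 146/191 -> 0.764
    print(f"majority baseline = {majority_baseline(TABLE_II):.3f}")  # 122/191 -> 0.639
    for label, r in recall(TABLE_II, LABELS).items():
        print(f"{label:12s} recall = {r:.2f}")
```

Per-class recall makes the imbalance visible: the tree recovers most Christian-majority countries (115 of 122) but fewer than two thirds of the Muslim-majority ones (29 of 47), and none of the five smallest groups.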
IV. CONCLUSION
A national flag is not only a unique symbol of a country; its colors and signs also carry the histories, socio-cultures and ideals of the country. This study investigates the correlation between NF features and the religion, government and regional status of countries. To the best of our knowledge, this study is the first to look at NFs through the lens of
machine learning or classification tools. An NF feature data set has been prepared from the NFs of 191 countries, and decision trees have been constructed with the data set. The visual representations of the decision trees and the corresponding confusion matrices demonstrate that NFs have similar feature values for countries with similar conditions, such as religion, government system and regional status. Only three decision trees were constructed for this analysis; a more rigorous study with more decision trees may be very effective in discovering further knowledge from NFs.
REFERENCES
[1] The World Atlas. Available: http://www.worldatlas.com/nations.htm
[2] Flag of Bangladesh. Available: http://en.wikipedia.org/wiki/Flag_of_Bangladesh
[3] Country Profile of Bangladesh. Available: http://www.bdgateway.org/country_profile.php
[4] National Flag. Available: http://en.wikipedia.org/wiki/National_flag
[5] Amavilah and V. Heinrich, "National flags, national flag colors, and the well-being of countries", Munich Personal RePEc Archive, Unpublished, 2008. Available: http://mpra.ub.uni-muenchen.de/11304/
[6] Weka 3: Data Mining Software in Java. Available: http://www.cs.waikato.ac.nz/ml/weka/
[7] J. R. Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers, 1993.
[8] M. Bramer, "Principles of Data Mining", Springer, 2007.