Integrating Spatial Decision Support System with Graph Mining Technique
Abdul Halim Omar and Mohd Najib Mohd Salleh
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia
[email protected]
Abstract. Large-scale crop production requires current and accurate information. The choice of a remotely sensed spatial data source entails trade-offs among cost, accuracy and timeliness. A primary challenge in large-scale data integration is reconciling data of heterogeneous reliability from different sources that refer to the same real-world entity. A Spatial Decision Support System (SDSS) for the agricultural sector can play an important role in supporting users who need a large amount of easily accessible information. A fast-developing trend in agropedagogical scenario analysis is the combination of multiple association data into coherent interaction networks to enable integrated scenario analysis. This paper explores data integration for spatial data sets using a graph mining approach during the development of a prototype spatial DSS as a support tool for the farm manager. The resulting spatial association rules could be used to explore alternative states of the environment and policy options, to correlate key parameters, and to improve knowledge discovery.
Keywords: Spatial Decision Support System, scenario analysis, graph mining.
1 Introduction
In the agricultural sector, farm management and the planning of large-scale crop production require current and accurate information. Land managers and decision makers face daily problems related to growing development under constantly diminishing resources; they frequently encounter planning issues characterized by complex interactions between environmental and socio-economic systems. This complexity stems from the diversity of decision alternatives and their variability in space, the diverse nature of the criteria, and the fact that decisions are often surrounded by uncertainty. Over the past two decades, remote sensing and geographic information systems (GIS) have been widely used as spatial information technologies to assist land managers in their daily work [7,8,18]. However, more technical expertise is required to improve the decision-making process. Indeed, in spite of their huge capacities for acquiring and storing spatial data, GIS have limits when solving most real-world spatial decision problems. Therefore, improving spatial information in decision-making support is becoming a necessity given the multiple challenges facing decision makers, who are expected to present efficient solutions.
An effective decision-making process is becoming increasingly indispensable for dynamically meeting business and customer needs. In this setting, the integration of existing business processes in information systems aims to combine selected systems [3] so that they present users with a unified view, as if they were interacting with one single information system. For these reasons, powerful tools should help to provide a flexible problem-solving environment in which problems can be explored, understood and solved under multiple conflicting objectives. This is the principal aim of the present work, in which we explore the effects of interfacing spatial analysis tools within the decision-making environment [4]. An integrated view can be created to facilitate information access and reuse through a single information access point. In addition, data from different complementary information systems are combined to provide a more comprehensive basis for satisfying user needs.
We review an application of graph mining that can find deficiencies in the interaction between user and system and help to improve business process alignment. In this paper, we focus on the integration of information and highlight integration solutions provided by the database community. The relational information between entities and their attributes is important because it supports the discovery of new knowledge. Mining relational knowledge in a business process requires not only that relational data be represented but also that mining tasks be supported effectively and efficiently [15,16]. We provide the semantic discovery that is needed in all the integration examples given above and that will be a key factor for future integration solutions [5].
This paper is structured as follows. Sec. 2 describes the importance of data integration in agri-business domain applications, and Sec. 3 reviews the related literature. Sec. 4 presents our conceptual framework to address the integration using graph mining. In Sec. 5, we evaluate the performance of the framework and show the results of the experiment using real-world data sets. Sec. 6 presents the discussion and concludes the paper.
V. Khachidze et al. (Eds.): iCETS 2012, CCIS 332, pp. 15–24, 2012. © Springer-Verlag Berlin Heidelberg 2012
2 Research Background
The contribution of agri-business to economic development, growth and exports has been important, and most recent literature has focused on the challenges to maintaining this role and on possible solutions. Some earlier studies on agriculture have provided a somewhat broad understanding of these challenges [10, 13, 14]. Most of these studies highlight key challenges to resource management, such as a lack of proper coordination among the country’s development agencies, the inability of farmers to participate in mainstream agricultural development, under-utilization of available technical assistance and other incentives, and a lack of skilled and talented workers. Large-scale crop planting decisions involve multiple objectives, large heterogeneous data sets [4,5], and many unknowns and uncertainties. The numerous constraints in land management give rise to complex decision problems involving many actors and points of view. Therefore, more research and development efforts should
be made to explore new tools for evaluating theoretical alternative scenarios for territory management projects and for improving interaction among project members in agri-business. Spatial information technologies offer broad possibilities for formulating evaluation models inside a spatial decision support system (SDSS), and the strengths of GIS make it a principal component of an SDSS. Integration of multiple information systems generally aims at combining selected systems so that they form a unified entity and give users the illusion of interacting with one single information system. In this study, a crop management business process and its attributes are represented as a graph showing the relationships between data objects and activities, which is important for discovering relational knowledge. Users are provided with a homogeneous logical view of data that is physically distributed over heterogeneous data sources. In general, information systems are not designed for integration. Thus, whenever integrated access to different source systems is desired, implicit spatial relationships may lead to more interesting patterns and rules [6,7]. Because of these relationships, data about real-world entities can be represented using the same abstraction principles, and each entity can affect the behavior of other entities in its neighborhood. This task can help resolve schema and data conflicts regarding structure and semantics in the integration problem. The process of extracting spatial relationships provides interesting rules to the users. Well-known spatial relationships will generate high-confidence rules; however, not all strong rules necessarily carry considerable information. Thousands of interesting and uninteresting rules can discourage users from interpreting them in order to find novel and unexpected knowledge [1].
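The observation that a high-confidence rule may still carry little information can be illustrated with a small calculation. The following is a minimal sketch, not the implementation used in this work: the spatial predicates and transactions are hypothetical, and lift is a standard interestingness measure introduced here only for illustration.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]

# Hypothetical transactions: each is the set of spatial predicates that hold
# for one reference object (for example, one land parcel).
TRANSACTIONS: List[Transaction] = [
    frozenset({"contains(paddy_field)", "close_to(river)", "within(district)"}),
    frozenset({"contains(paddy_field)", "within(district)"}),
    frozenset({"close_to(river)", "within(district)"}),
    frozenset({"contains(paddy_field)", "close_to(river)", "within(district)"}),
]


def support(itemset: Transaction, transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every predicate in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent: Transaction, consequent: Transaction,
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


def lift(antecedent: Transaction, consequent: Transaction,
         transactions: List[Transaction]) -> float:
    """Confidence divided by the consequent's support; 1.0 means independence."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


paddy = frozenset({"contains(paddy_field)"})
district = frozenset({"within(district)"})

# The rule paddy -> district holds in every transaction containing paddy,
# but district holds everywhere, so the rule tells the user nothing new.
rule_confidence = confidence(paddy, district, TRANSACTIONS)
rule_lift = lift(paddy, district, TRANSACTIONS)
print(rule_confidence, rule_lift)
```

In this toy data the rule has confidence 1.0 and lift 1.0: it is a strong rule that only restates a dependence the user already knows.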
3 Literature Review
To support effective decision making in business processes, evaluation tools are needed to make informed long-term regional resource decisions and to recognize research needs. These tools can help authorities involved in ecological restoration by identifying decision variables, developing problem-solving heuristics, and evaluating the consequences of alternative policy actions. Spatial decision support systems (SDSS) for natural resource management are computer-based tools that tightly integrate decision theory models with ecological models and Geographic Information System analyses and mapping [12]. The information provided by an SDSS gives decision makers an increased ability to follow the outcomes of interacting variables, improves the reproducibility of decisions, and documents why a particular choice was made.
3.1 Spatial Decision Support System
A Spatial Decision Support System has the main characteristics of a DSS. In addition, it should be adapted to the specific nature of spatial data. Densham et al. defined an SDSS as a geo-processing system designed to support the decision research process for
complex spatial problems [9]. SDSS are also defined as a conceptual framework that assists decision makers in solving complex spatial problems. Hence, an SDSS has to accept spatial data as input and allow the storage of the complex structures common in spatial data. Such a system should also include analytical techniques that are unique to spatial analysis and produce outputs in the form of maps, reports, charts and other spatial forms. One of the most important characteristics of an SDSS is that it supports users in solving semi-structured or ill-structured spatial decision problems. According to Simon (1960), decision problems fall on a continuum ranging from completely structured to unstructured decisions: the former occur when the decision-making problem can be structured either by the decision maker or on the basis of relevant theory, whereas the latter must be solved by the decision maker without any assistance from a computer.
3.2 Semantic Integration
The integration of heterogeneous, distributed information from the Web is a complicated task, especially schema matching and integration. During the matching and integration process, the syntactic, semantic and structural heterogeneity between multiple information sources is investigated. In this paper, our main objective is to resolve semantic conflicts. The data, ontology and information integration communities face similar types of problems [12], and we leverage techniques developed by these communities. In general, early integration approaches were based on a relational or functional data model and realized rather tightly coupled solutions by providing one single global schema. To overcome their limitations concerning abstraction, classification and taxonomies, object-oriented integration approaches were adopted to perform structural homogenization and integration of data. With the advent of the Internet and web technologies, the focus shifted from integrating purely well-structured data to also incorporating semi-structured and unstructured data, while architecturally loosely coupled mediator and agent systems became popular.
Frequent pattern and spatial association rule mining algorithms generate candidates and frequent sets [11,16]. Candidate generation is not a problem in spatial data mining because the number of predicates is much smaller than the number of items in transactional databases [15]; instead, the computational cost depends mainly on the spatial join computation. Approaches that generate closed frequent sets first compute the frequent sets and then verify whether they are closed. Although they reduce the number of frequent sets, they do not guarantee the elimination of well-known patterns. In spatial association rule mining, guaranteeing that the resulting frequent sets are free of well-known dependences, and thus generating more interesting patterns, is more important than reducing the number of frequent sets. Apriori [16] has been the basis for dozens of algorithms for mining spatial and non-spatial frequent sets and association rules.
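To make the idea of frequent sets that are free of well-known dependences more concrete, the following is a minimal Apriori-style sketch. It is an illustration under stated assumptions rather than the algorithm evaluated in this paper: the feature types, the transactions and the known_dependences argument are hypothetical, and candidates containing a known dependence are simply discarded during level-wise generation.

```python
from itertools import combinations


def apriori(transactions, min_support, known_dependences=()):
    """Level-wise frequent set mining that drops candidates containing
    any pair listed in known_dependences (a hypothetical input)."""
    n = len(transactions)
    known = [frozenset(pair) for pair in known_dependences]

    def is_frequent(itemset):
        return sum(1 for t in transactions if itemset <= t) / n >= min_support

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if is_frequent(frozenset([i]))]
    frequent = list(level)
    k = 2
    while level:
        previous = set(level)
        # Join step: unions of two (k-1)-sets that yield a k-set.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        next_level = []
        for cand in sorted(candidates, key=sorted):
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if any(frozenset(s) not in previous for s in combinations(cand, k - 1)):
                continue
            # Semantic pruning: skip sets that contain a well-known dependence.
            if any(dep <= cand for dep in known):
                continue
            if is_frequent(cand):
                next_level.append(cand)
        frequent.extend(next_level)
        level = next_level
        k += 1
    return frequent


# Hypothetical feature types observed around four reference parcels.
PARCELS = [
    frozenset({"paddy", "irrigation_canal", "river"}),
    frozenset({"paddy", "irrigation_canal", "road"}),
    frozenset({"paddy", "irrigation_canal", "river", "road"}),
    frozenset({"oil_palm", "road"}),
]
KNOWN = [("paddy", "irrigation_canal")]  # assumed to be common knowledge

all_sets = apriori(PARCELS, 0.5)
filtered_sets = apriori(PARCELS, 0.5, KNOWN)
print(len(all_sets), len(filtered_sets))
```

Because a discarded set is never used in the subset check of the next level, every superset of a known dependence is eliminated as well, which is the behavior described above.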
4 Research Methodology
The decision support framework for crop planting selection relates to land management and evaluation. Its first component is the analysis of the proposed scenarios and their respective effects on the physical environment. These analyses are supported by inputs from models that simulate each scenario, such as ecological and meteorological models, urban growth models, and water-quality models. Tools provided in the second component evaluate the effects on wildlife habitat and ecological communities caused by changes in the physical environment. The impact of the research project could be advantageously integrated into a GIS by means of appropriate integration models. The multiple actors involved in such projects use their expertise and knowledge to assign priorities, also defined as scores, to spatial entities that are subject to alteration by the project or that represent a potential area for improving the project. For instance, when choosing the best site for a crop, terrain slope is an important factor to consider; thus, the agricultural engineer uses previous knowledge to determine a mathematical function relating slope to a suitability factor. This process corresponds to the elaboration of evaluation models whose formulation is necessary in the SDSS. In this section, we study the ecological models that provide essential output for evaluating land suitability and management changes, allowing decisions to be made from multiple evaluations. Decision analysis provides tools for systematically formulating and evaluating multiple criteria and explaining why a particular decision was made [9,13]. Next, we apply an architectural perspective to create an overview of the different ways to address the integration problem. The presented classification is based on Anyanwu and Shiva’s analysis [2] and distinguishes integration approaches according to the level of abstraction at which integration is performed.
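The slope example in the paragraph above can be sketched as a small evaluation model. This is only an illustrative sketch: the piecewise-linear shape, the thresholds (5% and 25% slope), the second factor (soil) and the weights are all hypothetical assumptions, not values taken from this study.

```python
def slope_suitability(slope_percent, ideal_max=5.0, limit=25.0):
    """Map terrain slope (in percent) to a suitability factor in [0, 1].

    Assumed shape: fully suitable up to ideal_max, unsuitable from limit
    upwards, and linearly decreasing in between.
    """
    if slope_percent <= ideal_max:
        return 1.0
    if slope_percent >= limit:
        return 0.0
    return (limit - slope_percent) / (limit - ideal_max)


def weighted_score(factors, weights):
    """Combine per-criterion suitability factors using expert weights."""
    total_weight = sum(weights.values())
    return sum(weights[name] * factors[name] for name in weights) / total_weight


# Hypothetical parcel: 15% slope and a soil factor of 0.8 assigned by an expert.
parcel_factors = {"slope": slope_suitability(15.0), "soil": 0.8}
expert_weights = {"slope": 0.6, "soil": 0.4}
parcel_score = weighted_score(parcel_factors, expert_weights)
print(parcel_factors["slope"], round(parcel_score, 2))
```

Keeping the per-criterion function separate from the weighting step mirrors the description above, in which experts first score individual spatial entities and the scores are then combined across multiple criteria.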
Graph mining methods can be used to predict the information requirements of a user during the execution of activities in an enterprise application.
4.1 Proposed Framework
In recent research, the idea of using semantic knowledge was introduced. One scholar proposed the idea of eliminating well-known patterns among target feature types and relevant feature types for data pre-processing.
Fig. 1. Ecological and land management in crop planting selection
Figure 1 presents the procedural relationship between weather forecasting and land management in the conceptual framework. Conceptual models are an effective initial tool for group identification of resources and their linkages to attributes in the agricultural environment. Criteria, or performance measures, are selected as measurable values of the identified attributes and are used to evaluate the success of implemented plans. When frequent patterns are reduced by pruning the input space, note that input space pruning reduces the frequent patterns independently of the minimum support. Decision makers then determine the importance of each of these criteria and use this information to evaluate different alternatives. Decision models aid in weighing and evaluating alternatives and may also help decision makers pinpoint conflicts between objectives and conceptualize new alternatives that minimize these conflicts (Ozernoy, 1984). Once an alternative is selected and implemented, the expected environmental change is compared with actual conditions through monitoring and directed experimentation, which may in turn lead to re-evaluation of criteria and implemented plans.
4.2 Data Integration
Integration is most straightforward when all information sources provide nearly the same view on a domain [17]. When the domain views of the sources differ, finding a common view becomes difficult. To overcome this problem, multi-ontology approaches as in Fig. 2 describe each data source with its own ontology; the local ontologies must then be mapped, either to a global ontology or to one another, to establish a common understanding. Finding relevant documents that satisfy user preferences is not easy. In information retrieval, many aspects must be considered to make sure that retrieval works efficiently. Figure 2 shows the conceptual framework of information extraction, that is, how information can be retrieved and clustered. It is only a general framework, and we focus on the clustering method in order to structure spatial data sets and cluster documents into relevant groups based on term frequency and concept. There are four main steps in data pre-processing for spatial association rule mining.
Fig. 2. Conceptual Framework of Information Extraction
Step 1: Internet Cloud. From the Internet cloud, where arbitrary data are published in standards such as HTML and XML, we retrieve all relevant information from both mark-up languages, each of which has its own rules for carrying data, such as the XML serialization of the Resource Description Framework (RDF). In this step, the mark-up documents are located and can be visited by URL.
Step 2: Interconnection of Nodes. Tag data in HTML and XML can be retrieved using the Document Object Model (DOM). According to the DOM, everything inside an HTML document is a node: the entire document is a document node, every HTML element is an element node, and the text in an HTML element is a text node. These nodes constitute a graph that can be used for the deployment of web crawlers; an XML document is likewise organized as a tree, which allows the web crawler to traverse all of its nodes.
Step 3: Scraping Spatial Data Sets on the Web. A crawler is deployed as an agent to scrape the data held in the tags of the web document and return a collection of data to be structured.
Step 4: Deploy Clustering Technique. Once the spatial data sets have been scraped, a clustering algorithm is applied to the data set to make it structured and unified for all users. The structured data are then stored in a database referred to as the ontology database. This step will be further expounded in the discussion of clustering with frequent terms and frequent concepts.
Mapping all data to one single domain model forces users to adapt to one single conceptualization of the world. This stands in contrast to the fact that receivers of integrated data differ widely in their conceptual interpretation of and preference for data. They are generally situated in different real-world contexts and have different conceptual models of the world in mind [10]. COIN [10] was one of the first research projects to consider the different contexts in which data providers and data receivers are situated.
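Steps 2 and 3 of the pre-processing pipeline can be sketched with Python's standard html.parser module. This is a minimal sketch under several assumptions: the page content is a hypothetical inline string rather than a document fetched by URL, the mark-up is assumed to be well nested, and the class and function names (DomGraphBuilder, scrape) are illustrative only.

```python
from html.parser import HTMLParser


class DomGraphBuilder(HTMLParser):
    """Builds a node graph (parent -> child edges) from well-nested mark-up."""

    def __init__(self):
        super().__init__()
        self.nodes = ["#document"]  # node id -> label; id 0 is the document node
        self.edges = []             # (parent id, child id)
        self.text = {}              # text node id -> stripped text content
        self._stack = [0]           # ids of the currently open nodes

    def _add(self, label):
        self.nodes.append(label)
        node_id = len(self.nodes) - 1
        self.edges.append((self._stack[-1], node_id))
        return node_id

    def handle_starttag(self, tag, attrs):
        # Every element becomes an element node under the current open node.
        self._stack.append(self._add(tag))

    def handle_endtag(self, tag):
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        # Non-blank text becomes a text node; whitespace between tags is skipped.
        if data.strip():
            self.text[self._add("#text")] = data.strip()


def scrape(builder, tag):
    """Return the text of all text nodes whose parent element is `tag`."""
    parent_of = {child: parent for parent, child in builder.edges}
    return [content for node_id, content in builder.text.items()
            if builder.nodes[parent_of[node_id]] == tag]


# Hypothetical page content standing in for a document retrieved in Step 1.
PAGE = """<html><body><table>
<tr><td>Batu Pahat</td><td>paddy</td></tr>
<tr><td>Kluang</td><td>oil palm</td></tr>
</table></body></html>"""

builder = DomGraphBuilder()
builder.feed(PAGE)
builder.close()
cells = scrape(builder, "td")
print(len(builder.nodes), cells)
```

The edge list forms a tree rooted at the document node, so a crawler can reach every element and text node by following edges, and the scraped cell values are what Step 4 would subsequently structure.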
In our own research, we continue the trend of taking user-specific aspects into account in the process of semantic integration. We address the problem of how user-specific mental domain models and user-specific semantics of concepts can be reflected in the data integration process. We investigate how data equipped with explicit semantics can be effectively pre-integrated on a conceptual level. In this way, we aim to enable users to perform declarative data integration by conceptually modeling their individual ways of perceiving a domain of interest.
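Returning to Step 4 of the pre-processing pipeline, grouping scraped documents by term frequency might look as follows. The clustering algorithm is not specified in this paper, so the sketch substitutes a simple leader clustering over term-frequency vectors with cosine similarity; the documents and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lower-cased, whitespace-separated term occurs."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(documents, threshold=0.3):
    """Leader clustering: a document joins the first cluster whose leader is
    similar enough; otherwise it starts a new cluster and becomes its leader."""
    clusters = []  # list of (leader term frequencies, member indices)
    for index, text in enumerate(documents):
        tf = term_frequencies(text)
        for leader, members in clusters:
            if cosine(tf, leader) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((tf, [index]))
    return [members for _, members in clusters]


# Hypothetical scraped snippets.
DOCUMENTS = [
    "paddy field rainfall paddy yield",
    "rainfall paddy irrigation yield",
    "oil palm estate road access",
    "oil palm road transport",
]

groups = cluster(DOCUMENTS)
print(groups)
```

Leader clustering was chosen here only because it needs a single pass and no preset number of clusters; a frequent-term or frequent-concept method could replace cosine similarity without changing the surrounding pipeline.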
5 Experimental Results
We obtained crop management data sets from the proceedings of the National Conference on Agro-Environment [14], which contain agricultural and agro-environment
input. We converted the data into graph transactions. Figure 3 displays the number of frequent sub-graphs discovered at different values of the support threshold.
Fig. 3. Minimum support values and the number of discovered frequent sub-graphs
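The relationship between the support threshold and the number of frequent sub-graphs can be illustrated on a toy example. The sketch below is a deliberate simplification and not the miner used in the experiment: it assumes that vertex labels are unique within each graph transaction, so that a sub-graph is identified by its edge set and no isomorphism test is needed, and the four transactions are hypothetical.

```python
from itertools import combinations


def is_connected(edges):
    """True if the given edges form a single connected sub-graph."""
    remaining = set(edges)
    reached = set(remaining.pop())  # vertices of an arbitrary first edge
    grew = True
    while remaining and grew:
        grew = False
        for edge in list(remaining):
            if reached & set(edge):
                reached |= set(edge)
                remaining.remove(edge)
                grew = True
    return not remaining


def frequent_subgraphs(transactions, min_support, max_edges=3):
    """Enumerate connected edge sets contained in enough graph transactions."""
    n = len(transactions)
    all_edges = sorted(set().union(*transactions))
    found = []
    for size in range(1, max_edges + 1):
        for pattern in combinations(all_edges, size):
            if not is_connected(pattern):
                continue
            count = sum(1 for graph in transactions if set(pattern) <= graph)
            if count / n >= min_support:
                found.append(pattern)
    return found


# Hypothetical graph transactions; each edge links two uniquely labelled vertices.
GRAPHS = [
    {("crop", "soil"), ("crop", "rainfall"), ("soil", "fertilizer")},
    {("crop", "soil"), ("crop", "rainfall")},
    {("crop", "soil"), ("soil", "fertilizer")},
    {("crop", "rainfall"), ("rainfall", "pest")},
]

counts = {s: len(frequent_subgraphs(GRAPHS, s)) for s in (0.75, 0.5, 0.25)}
print(counts)
```

Even in this small example the number of frequent sub-graphs grows as the threshold is lowered, and larger patterns only appear at low support. General-purpose miners must additionally handle repeated labels through sub-graph isomorphism, which this sketch avoids by assumption.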
The number of frequent sub-graphs increases exponentially as the minimum support value decreases. At low minimum support values (0% to 8%), larger frequent sub-graphs appear, in which the edge and vertex labels have a generally uniform distribution. In the next experiment, the knowledge extracted from the data sets is applied to the class label in classification. Figure 4 shows the ROC hulls for the splitting criteria on the given data sets. We compare spatial association rule mining (SAR) with some common classification algorithms [2], namely a neural network (NN) and different decision tree approaches such as fuzzy DT and C4.5. Our experimental analysis of the performance of graph mining shows that there is a direct relationship between building the model and the attribute size of the data sets. The experimental analysis also shows that SAR has good classification accuracy compared with the selected classifiers.
Fig. 4. ROC curve for crop management data sets
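For readers unfamiliar with ROC hulls, the following sketch shows how a hull can be computed from classifier operating points. The points are hypothetical placeholders, not the results reported in Fig. 4, and the function names are illustrative.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of (false positive rate, true positive rate) points,
    always including the trivial classifiers (0, 0) and (1, 1)."""
    ordered = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for point in ordered:
        # Keep only right turns; a left turn or a collinear point means the
        # middle point lies on or below the hull and is removed.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], point) >= 0:
            hull.pop()
        hull.append(point)
    return hull


def area_under(hull):
    """Area under the hull by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(hull, hull[1:]))


# Hypothetical operating points of three classifiers.
OPERATING_POINTS = {
    "classifier_a": (0.1, 0.6),
    "classifier_b": (0.3, 0.5),  # dominated: lies below the hull
    "classifier_c": (0.4, 0.9),
}

hull = roc_convex_hull(OPERATING_POINTS.values())
print(hull, round(area_under(hull), 3))
```

A classifier whose point falls below the hull, such as classifier_b here, is never the best choice for any class distribution or cost ratio, which is why the hull is a convenient summary when comparing several classifiers.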
6 Discussion and Conclusion
The designed framework reduces the difficulties in linking and integrating spatial data drawn from separate sources. Analysis of the knowledge in various agricultural relations enables researchers to integrate graph mining with ontology reasoning in network data analysis. There is a growing need to provide unified access to these heterogeneous data sources. As the amount and complexity of structured data increase in the agricultural sector, large-scale collaboration on sensor deployment and an increasingly structured web are required. Furthermore, we investigate the incorporation of ancillary spatial data to improve the accuracy and specificity of land suitability classification, with crop zoning and meteorological data used to modify the initial classification. Through this case study, we demonstrate how semantic graph mining can be applied to the analysis of land suitability and of interactions between entities.
The principal scope of the present work was the development of a methodological approach to improve the spatial decision-making process in land planning and natural resources management. We integrated tools developed in the field of advanced spatial analysis into a spatial decision support system and showed how these tools can be interfaced within the different stages of the SDSS process. The developed methodology was applied to the problem of suitability assessment because many planning and resource management problems use this information to support negotiations between multiple actors and to represent feasible solutions. The practical case study was based on an impact study in which multiple environmental, technical and economic constraints are present. Combined with multi-thematic spatial data, these constraints were transformed into a spatial evaluation database used as input for suitability computation models. These models are the principal component in the improvement of the SDSS design and choice phases.
Two major results have been obtained by the present research. First, we have addressed the problem of modeling structural relationships from relational databases for the purposes of graph mining. Second, we explored alternative graph models and evaluated their space and time efficiency as well as their suitability and database independence; the architecture has been modularized. The significance of this work lies in applying these results to large real-world activity data sets to mine patterns that could not be found otherwise.
Acknowledgement. The author would like to thank Universiti Tun Hussein Onn Malaysia for providing grant Vot0824 to support the research project.
References
1. Appice, A., Berardi, M., Ceci, M., Malerba, D.: Mining and Filtering Multi-level Spatial Association Rules with ARES. In: Hacid, M., Murray, N.V., Ras, Z.W., Tsumoto, L.S. (eds.) ISMIS 2005. LNCS (LNAI), vol. 3488, pp. 342–353. Springer, Heidelberg (2005)
2. Anyanwu, M.N., Shiva, S.G.: Comparative Analysis of Serial Decision Tree Classification Algorithms. International Journal of Computer Science and Security 3(3)
3. Lodhi, A., Kassem, G., Koeppen, V., Saake, G.: Investigation of Graph Mining for Business Processes. In: ICIIT. IEEE Computer Society, Lahore (2010)
4. Bayardo, R.J., et al.: Agent-Based Semantic Integration of Information in Open and Dynamic Environments. In: 1997 ACM SIGMOD International Conference on Management of Data (SIGMOD 1997), pp. 195–206. ACM, Tucson (1997)
5. Bogorny, V., Engel, P.M., Alvares, L.O.: Enhancing the Process of Knowledge Discovery in Geographic Database using Geo-Ontologies. In: Nigro, H.O., Cisaro, S.E.G., Xodo, D.H. (eds.), IGI Global, New York, USA (2008)
6. Borges, J., Levene, M.: Data Mining of User Navigation Patterns. In: Masand, B., Spiliopoulou, M. (eds.) WebKDD 1999. LNCS (LNAI), vol. 1836, pp. 92–112. Springer, Heidelberg (2000)
7. Carver, S.J.: Integrating multicriteria evaluation with geographical information systems. International Journal of Geographical Information Systems 5, 321–339 (1991)
8. Carey, M.J., et al.: Towards Heterogeneous Multimedia Information Systems: The Garlic Approach. In: 5th International Workshop on Research Issues in Data Engineering-Distributed Object Management (RIDE-DOM 1995), Taipei, Taiwan, pp. 124–131 (1995)
9. Densham, P.J., Armstrong, M.P.: A heterogeneous processing approach to spatial decision support systems. In: Waugh, T.C., Healey, R.G. (eds.) Proceedings of the Sixth International Symposium on Spatial Data Handling Advances in GIS Research, pp. 29–45 (1994)
10. Goh, C.H., Madnick, S.E., Siegel, M.: Context Interchange: Overcoming the Challenges of Large-Scale Interoperable Database Systems in a Dynamic Environment. In: Third International Conference on Information and Knowledge Management (CIKM 1994), pp. 337–346. ACM, Gaithersburg (1994)
11. Lakshmi, K., Meyyappan, T.: A Comparative Study of Frequent Subgraph Mining Algorithms. International Journal of Information Technology Convergence and Services 2(2) (2012)
12. Mena, E., et al.: OBSERVER: An Approach for Query Processing in Global Information Systems based on Interoperation across Pre-existing Ontologies. In: First IFCIS International Conference on Cooperative Information Systems (CoopIS 1996), June 19–21, pp. 14–25. IEEE Computer Society, Brussels (1996)
13. Pereira, J.M.C., Dickstein, L.: A multiple criteria decision-making approach to GIS-based land suitability evaluation. International Journal of Geographical Information Systems 7, 407–424 (1993)
14. Roslina, A., Abu Kasim, A.: Eliminating the Perspective Impacts of Global Warming on Malaysia Agricultural. In: 2nd Proceedings in National Conference on Agro-Environment, Malaysia, pp. 98–104 (2009)
15. Shekhar, S., Chawla, S.: Spatial Databases: A Tour. Upper Saddle, New York (2003)
16. Srikant, R., Agrawal, R.: Mining Generalized Association Rules. In: Dayal, V., Gray, P.M.D., Nishio, S. (eds.) Proceedings of the 21st International Conference on Very Large Databases, pp. 403–419. Morgan Kaufmann, Zurich (1995)
17. Wache, H., et al.: Ontology-Based Integration of Information - A Survey of Existing Approaches. In: Stuckenschmidt, H. (ed.) Workshop on Ontologies and Information Sharing, IJCAI 2001, Seattle, USA, April 4-5, pp. 108–117 (2001)
18. Jankowski, P.: Integrating geographical information systems and multiple criteria decision-making methods. International Journal of Geographical Information Systems 9, 251–273 (1995)