World Applied Sciences Journal 21 (8): 1207-1212, 2013
ISSN 1818-4952
© IDOSI Publications, 2013
DOI: 10.5829/idosi.wasj.2013.21.8.2913
Decision Making Procedure: Applications of IBM SPSS Cluster Analysis and Decision Tree

1 U.B. Baizyldayeva, 2 R.K. Uskenbayeva and 3 S.T. Amanzholova

1 KIMEP, Almaty, Kazakhstan
2 International IT University, Almaty, Kazakhstan
3 K. Satpayev Kaz NTU, Almaty, Kazakhstan
Abstract: The decision-making process is one of the most complex tasks faced by all types of businesses. When a decision depends on multi-criteria problem conditions with huge volumes of data, and the data are represented in different formats, it becomes impossible to process the data without the aid of computer software. Today's market offers a variety of statistical software for solving complex tasks, along with useful introductory guidance on the diversity of packages classified as general statistics software, data mining software, mathematical computing packages, spatial statistics, econometrics packages, etc. Many free statistical software packages are also available today. In this article IBM SPSS Statistics 19, with its cluster analysis and decision tree procedures, is taken as a tool for considering decision-making problems. These types of analysis can be applied in turn, sequentially, to the data of a given problem.

Key words: Cluster analysis, Decision tree, CHAID, Exhaustive CHAID, Classification and regression trees (C&RT), QUEST

INTRODUCTION

Modern business cannot be imagined without huge volumes of data, in different formats and gathered for different purposes, waiting to be analyzed and acted upon. Problems of decision making are researched and addressed with the aid of special statistical software [1, 2] and even spreadsheet facilities [3, 4]. In developing countries most companies and organizations use existing office spreadsheets for simple analysis, which is sometimes incomplete and does not provide highly visualized, reliable results. It is also well known that many different mathematical and statistical approaches are applicable to tasks of analysis and decision making [5, 6]. Moreover, there are software packages that solve various analytical and decision-making problems at a rather high level [7-9]. In this article the decision-making stage of a problem is set out, with a possible solution using IBM SPSS Statistics 19.
Cluster Analysis for Effective Decision Making: Let's consider the example of the direct-marketing division of a company that wants to identify demographic groupings in its customer database, to help determine marketing campaign strategies and develop new product offerings [10]. The data and variable views are shown in Pictures 1 and 2, respectively. The "Direct Marketing" > "Segment my contacts into clusters" technique is the most applicable for this example dataset (Picture 3). After specifying the variables for dividing the data into segments, we obtain the "Model Summary" (Picture 4). Activating the "Cluster view" option displays the view in Picture 5. The Cluster view displays information on the attributes of each cluster. For continuous (scale) fields, the mean (average) value is displayed.
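SPSS performs this segmentation with its TwoStep clustering algorithm. As an illustrative stand-in (not the paper's dataset or SPSS's algorithm), the same idea can be sketched in Python with k-means on synthetic age/income data:

```python
# Hedged sketch: k-means clustering of synthetic customer demographics.
# Field names and data are illustrative, not from the paper's dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic segments: younger/lower-income and older/higher-income customers.
young = np.column_stack([rng.normal(30, 4, 100), rng.normal(35_000, 5_000, 100)])
older = np.column_stack([rng.normal(55, 5, 100), rng.normal(70_000, 8_000, 100)])
X = np.vstack([young, older])

# Standardize so age and income contribute comparably to the distance metric.
Xs = StandardScaler().fit_transform(X)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Xs)

# Per-cluster mean age, analogous to the Cluster view's mean for scale fields.
for k in range(2):
    members = X[model.labels_ == k]
    print(f"cluster {k}: n={len(members)}, mean age={members[:, 0].mean():.1f}")
```

Standardizing before clustering matters here: without it, the income field (thousands of units) would dominate the age field entirely.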
Corresponding Author: U.B. Baizyldayeva, KIMEP, Polevaya 8 st., 050007 Almaty, Kazakhstan.
Picture 1: IBM SPSS Data view
Picture 2: IBM SPSS Variable view
Picture 3: Cluster Analysis
Picture 4: Model summary
Picture 5: Cluster view

For categorical (nominal, ordinal) fields, the mode is displayed. The mode is the category with the largest number of records. In this example, each record is a customer. By default, fields are displayed in order of their overall importance to the model; in this example, Age has the highest overall importance. You can also sort fields by within-cluster importance or in alphabetical order. If you select (click) any cell in the Cluster view, a chart summarizing the values of that field for that cluster is shown. For example, select the Age cell for cluster 1. For continuous fields, a histogram is displayed, showing both the distribution of values within that cluster and the overall distribution of values for the field. The histogram indicates that the customers in cluster 1 tend to be somewhat older. Now select the Income category cell for cluster 1 in the Cluster view.
For categorical fields, a bar chart is displayed. The most notable feature of the income category bar chart for this cluster is the complete absence of customers in the lowest income category. The Cluster comparison view allows you to visualize the differences between clusters. The resulting output, in tables and diagrams, makes it easier to take a decision on the problem.

Decision Tree for Effective Decision Making: IBM SPSS Decision Trees helps you better identify groups, discover relationships between them and predict future events. This module features highly visual classification and decision trees. These trees present categorical results in an intuitive manner, so that categorical analysis can be clearly explained to non-technical audiences. IBM SPSS Decision Trees lets you explore results and visually determine how the model flows, which helps to find specific subgroups and relationships that would be hard to uncover with more traditional statistics. Decision Trees is applicable to:
Picture 6: Cluster histogram
Picture 7: Cluster bar type chart
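The per-cluster summaries that the Cluster view reports, the mean for scale fields and the mode for categorical fields, can be reproduced outside SPSS as well. A minimal pandas sketch with illustrative field names and values (not the paper's data):

```python
# Hedged sketch: per-cluster mean (scale field) and mode (categorical field),
# mirroring what the Cluster view displays. Data is illustrative.
import pandas as pd

df = pd.DataFrame({
    "cluster":         [1, 1, 1, 2, 2, 2],
    "age":             [62, 58, 65, 31, 29, 34],
    "income_category": ["High", "High", "Medium", "Low", "Medium", "Low"],
})

# Mean of the continuous (scale) field per cluster.
mean_age = df.groupby("cluster")["age"].mean()

# Mode of the categorical field per cluster: the category with most records.
mode_income = df.groupby("cluster")["income_category"].agg(lambda s: s.mode().iloc[0])

print(mean_age)     # cluster 1 skews older, as the histogram suggested
print(mode_income)
```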
Picture 8: Cluster comparison view

- Database marketing
- Market research
- Credit risk scoring
- Program targeting
- Marketing in the public sector
- Transportation problems

IBM SPSS Decision Trees provides specialized tree-building techniques for classification, entirely within the IBM SPSS Statistics environment. It includes four established tree-growing algorithms:

CHAID: A fast, statistical, multi-way tree algorithm that explores data quickly and efficiently and builds segments and profiles with respect to the desired outcome. The CHAID algorithm accepts only nominal or ordinal categorical predictors; continuous predictors are transformed into ordinal predictors before the algorithm is applied. Both the CHAID and Exhaustive CHAID algorithms consist of three steps: merging, splitting and stopping. A tree is grown by repeatedly applying these three steps to each node, starting from the root node.
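At the heart of CHAID's splitting step is a chi-square test of independence between a categorical predictor and the outcome. A hedged sketch of that criterion with scipy, using synthetic counts (the merging and stopping steps are omitted):

```python
# Hedged sketch of CHAID's split criterion: a chi-square test on the
# contingency table of a categorical predictor vs. the outcome.
# Counts are synthetic, for illustration only.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: predictor categories (e.g. income level); columns: outcome (bad, good).
table = np.array([
    [40, 10],   # low income: mostly "bad" outcomes
    [25, 25],   # medium income: mixed
    [10, 40],   # high income: mostly "good" outcomes
])

chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.2g}")
# A small p-value means this predictor separates the outcome well,
# so CHAID would favor splitting the node on it.
```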
Exhaustive CHAID: A modification of CHAID that examines all possible splits for each predictor.

Classification and regression trees (C&RT): A complete binary tree algorithm that partitions data and produces accurate, homogeneous subsets.

QUEST: A statistical algorithm that selects variables without bias and builds accurate binary trees quickly and efficiently.

With four algorithms, it is possible to try different types of tree-growing algorithms and find the one that best fits your data. Because classification trees are created directly within IBM SPSS Statistics, it is convenient to use the results to segment and group cases directly within the data. Additionally, it is possible to generate selection or classification/prediction rules in the form of IBM SPSS Statistics syntax, SQL statements or simple text (through syntax). Decision tree output in the Viewer may be saved to an external file, or saved as an XML model for later use in making predictions about individual new cases.
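As an analogue of exporting classification rules as simple text, scikit-learn (a stand-in for SPSS here, with illustrative data) offers export_text for a fitted binary tree of the C&RT family:

```python
# Hedged sketch: render a fitted tree's rules as plain text, analogous to
# SPSS's text rule export. Data is a tiny illustrative loan table.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [age, has_collateral]; target: credit rating.
X = [[25, 0], [30, 0], [45, 1], [50, 1], [28, 0], [60, 1]]
y = ["Bad", "Bad", "Good", "Good", "Bad", "Good"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Plain-text rules for each branch and leaf of the tree.
rules = export_text(tree, feature_names=["age", "has_collateral"])
print(rules)
```

The printed rules show the threshold chosen at each split and the class predicted at each leaf, which is what makes such trees easy to explain to a non-technical audience.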
Let's consider an example: a hypothetical data file containing demographic and bank loan history data (see the sample table in Picture 9 below).
Picture 9: IBM SPSS Variable view

To run a decision tree analysis, from the menus choose: Analyze > Classify > Tree... Then set the dependent, independent and influence variables. For our case it will be as follows:
Picture 10: Decision Tree Variables Specification
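The dialog's variable specification maps naturally onto the dependent/independent split used by any tree library. A hedged scikit-learn sketch with an illustrative credit-rating table (not the paper's bank loan file, and C&RT-style rather than CHAID):

```python
# Hedged sketch: dependent variable = credit rating, independent variables =
# demographic fields. Field names and data are illustrative.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "age":           [23, 35, 47, 52, 29, 61, 44, 33],
    "income":        [20, 45, 80, 95, 30, 110, 70, 40],   # thousands
    "credit_rating": ["Bad", "Bad", "Good", "Good", "Bad", "Good", "Good", "Bad"],
})

X = df[["age", "income"]]    # independent variables
y = df["credit_rating"]      # dependent variable
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Predict the rating for a new applicant.
print(clf.predict(pd.DataFrame({"age": [58], "income": [100]})))
```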
The resulting output view will look as in the schema below:

Picture 11: Decision Tree view
Picture 12: Risk estimation view

Risk
Estimate: .218    Std. Error: .008
Growing Method: CHAID; Dependent Variable: Credit rating

Picture 13: Classification Tree view

Classification
Observed              Predicted Bad    Predicted Good    Percent Correct
Bad                   808              212               79.2%
Good                  325              1119              77.5%
Overall Percentage    46.0%            54.0%             78.2%
Growing Method: CHAID; Dependent Variable: Credit rating
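The risk estimate SPSS reports is the overall misclassification rate, and it can be recomputed directly from the crosstab counts in Picture 13:

```python
# Recompute the risk estimate (misclassification rate) from the paper's
# classification table (Picture 13).
import numpy as np

# Rows: observed (Bad, Good); columns: predicted (Bad, Good).
crosstab = np.array([
    [808, 212],
    [325, 1119],
])

total = crosstab.sum()            # 2464 cases in total
correct = np.trace(crosstab)      # 808 + 1119 = 1927 correctly classified
risk = (total - correct) / total  # misclassification rate

print(f"risk estimate = {risk:.3f}")               # 0.218, matching Picture 12
print(f"percent correct = {correct / total:.1%}")  # 78.2%
```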
The risk estimate is reported as 0.218, or 21.8%. Finally, the classification of the observed cases is represented in the crosstab above (Picture 13). These types of analysis and output views ease the decision-making process for many kinds of business and scientific problems.

CONCLUSION

In this article we have considered one possible approach to applying IBM SPSS Statistics 19, its cluster analysis and decision tree procedures, to the decision-making process for a concrete problem.
REFERENCES

1.  Everitt, B.S., S. Landau, M. Leese and D. Stahl, 2010. Cluster Analysis (Wiley Series in Probability and Statistics), 5th Edition. Wiley.
2.  SPSS Inc., 2010. IBM SPSS Statistics Student Version 18.0 for Windows and Mac OS X, 1st Edition.
3.  Barlow, J.F., 2005. Excel Models for Business and Operations Management. Chichester, West Sussex; Hoboken, NJ: Wiley.
4.  Albright, S.C., W.L. Winston and C. Zappe, 2006. Data Analysis & Decision Making with Microsoft Excel; with cases by M. Broadie et al. Australia; United Kingdom: Thomson South-Western.
5.  Lewandowski, A. and A.P. Wierzbicki (Eds.), 1989. Aspiration Based Decision Support Systems: Theory, Software and Applications. Berlin; New York: Springer-Verlag.
6.  Laudon, K.C. and J.P. Laudon, 1991. Business Information Systems: A Problem Solving Approach. Chicago: Dryden Press.
7.  American Statistical Association. URL: http://www.amstat.org/
8.  Stata: Data Analysis and Statistical Software. URL: http://www.stata.com/
9.  SAS: Business Analytics and Business Intelligence Software. URL: http://www.sas.com/
10. IBM SPSS Software. URL: http://www142.ibm.com/software/products/us/en/spss-decisionmanagement