World Applied Sciences Journal 21 (8): 1207-1212, 2013
ISSN 1818-4952
© IDOSI Publications, 2013
DOI: 10.5829/idosi.wasj.2013.21.8.2913
Decision Making Procedure: Applications of IBM SPSS Cluster Analysis and Decision Tree

1 U.B. Baizyldayeva, 2 R.K. Uskenbayeva and 3 S.T. Amanzholova

1 KIMEP, Almaty, Kazakhstan
2 International IT University, Almaty, Kazakhstan
3 K. Satpayev Kaz NTU, Almaty, Kazakhstan
Abstract: The decision-making process is one of the most complex tasks faced by all types of businesses. When a decision depends on multi-criteria problem conditions with huge volumes of data, and the data are represented in different formats, it becomes impossible to process the data without the aid of computer software. Today's market offers a variety of statistical software for solving complex tasks, along with useful introductory guidance on the diversity of packages classified as general statistics software, data mining software, mathematical computing packages, spatial statistics, econometrics packages, etc. Many free statistical software packages are also available today. In this article IBM SPSS Statistics 19, with its cluster analysis and decision tree procedures, is taken as a tool for considering decision-making problems. These types of analysis can be applied in turn, sequentially, to the data of a given problem.

Key words: Cluster analysis, Decision tree, CHAID, Exhaustive CHAID, Classification and regression trees (C&RT), QUEST

INTRODUCTION

Modern business cannot be imagined without huge volumes of data, in different formats and gathered for different purposes, waiting to be analyzed and acted upon. Problems of decision making are researched and addressed with the aid of special statistical software [1, 2] and even spreadsheet facilities [3, 4]. In developing countries most companies and organizations use existing office spreadsheets for simple analysis, which is sometimes incomplete and does not provide highly visualized, reliable results. It is also well known that many different mathematical and statistical approaches are applicable to tasks of analysis and decision making [5, 6]. Moreover, there are software packages that solve various analytical and decision-making problems at a rather high level [7-9]. In this article the decision-making stage of a problem is set out, with a possible solution using IBM SPSS Statistics 19.
Cluster Analysis for Effective Decision Making: Let's consider the example of the direct-marketing division of a company that wants to identify demographic groupings in its customer database, to help determine marketing campaign strategies and develop new product offerings [10]. The data and variable views are shown in Pictures 1 and 2, respectively. The "Direct Marketing" > "Segment my contacts into clusters" technique is the most applicable for this example dataset (Picture 3). After specifying the variables for dividing the data into segments, we obtain the "Model Summary" (Picture 4). Activating the "Cluster view" option displays the view in Picture 5. The Cluster view displays information on the attributes of each cluster. For continuous (scale) fields, the mean (average) value is displayed.
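SPSS performs this segmentation with its TwoStep clustering algorithm. As an illustrative stand-in (not the paper's dataset or SPSS's algorithm), the same idea can be sketched in Python with k-means on synthetic age/income data:

```python
# Hedged sketch: k-means clustering of synthetic customer demographics.
# Field names and data are illustrative, not from the paper's dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic segments: younger/lower-income and older/higher-income customers.
young = np.column_stack([rng.normal(30, 4, 100), rng.normal(35_000, 5_000, 100)])
older = np.column_stack([rng.normal(55, 5, 100), rng.normal(70_000, 8_000, 100)])
X = np.vstack([young, older])

# Standardize so age and income contribute comparably to the distance metric.
Xs = StandardScaler().fit_transform(X)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Xs)

# Per-cluster mean age, analogous to the Cluster view's mean for scale fields.
for k in range(2):
    members = X[model.labels_ == k]
    print(f"cluster {k}: n={len(members)}, mean age={members[:, 0].mean():.1f}")
```

Standardizing before clustering matters here: without it, the income field (thousands of units) would dominate the age field entirely.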
Corresponding Author: U.B. Baizyldayeva, KIMEP, Polevaya 8 st., 050007 Almaty, Kazakhstan.
Picture 1: IBM SPSS Data view
Picture 2: IBM SPSS Variable view
Picture 3: Cluster Analysis
Picture 4: Model summary
Picture 5: Cluster view

For categorical (nominal, ordinal) fields, the mode is displayed. The mode is the category with the largest number of records. In this example, each record is a customer. By default, fields are displayed in order of their overall importance to the model; in this example, Age has the highest overall importance. You can also sort fields by within-cluster importance or in alphabetical order. If you select (click) any cell in the Cluster view, a chart summarizing the values of that field for that cluster is shown. For example, select the Age cell for cluster 1. For continuous fields, a histogram is displayed, showing both the distribution of values within that cluster and the overall distribution of values for the field. The histogram indicates that the customers in cluster 1 tend to be somewhat older. Now select the Income category cell for cluster 1 in the Cluster view.
For categorical fields, a bar chart is displayed. The most notable feature of the income category bar chart for this cluster is the complete absence of customers in the lowest income category. The Cluster comparison view allows you to visualize the differences between clusters. The resulting output, in tables and diagrams, makes it easier to take a decision on the problem.

Decision Tree for Effective Decision Making: IBM SPSS Decision Trees helps you better identify groups, discover relationships between them and predict future events. This module features highly visual classification and decision trees. These trees present categorical results in an intuitive manner, so that categorical analysis can be clearly explained to non-technical audiences. IBM SPSS Decision Trees lets you explore results and visually determine how the model flows, which helps to find specific subgroups and relationships that would be hard to uncover with more traditional statistics. Decision Trees is applicable to:
Picture 6: Cluster histogram
Picture 7: Cluster bar type chart
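The per-cluster summaries that the Cluster view reports, the mean for scale fields and the mode for categorical fields, can be reproduced outside SPSS as well. A minimal pandas sketch with illustrative field names and values (not the paper's data):

```python
# Hedged sketch: per-cluster mean (scale field) and mode (categorical field),
# mirroring what the Cluster view displays. Data is illustrative.
import pandas as pd

df = pd.DataFrame({
    "cluster":         [1, 1, 1, 2, 2, 2],
    "age":             [62, 58, 65, 31, 29, 34],
    "income_category": ["High", "High", "Medium", "Low", "Medium", "Low"],
})

# Mean of the continuous (scale) field per cluster.
mean_age = df.groupby("cluster")["age"].mean()

# Mode of the categorical field per cluster: the category with most records.
mode_income = df.groupby("cluster")["income_category"].agg(lambda s: s.mode().iloc[0])

print(mean_age)     # cluster 1 skews older, as the histogram suggested
print(mode_income)
```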
Picture 8: Cluster comparison view

- Database marketing
- Market research
- Credit risk scoring
- Program targeting
- Marketing in the public sector
- Transportation problems

IBM SPSS Decision Trees provides specialized tree-building techniques for classification, entirely within the IBM SPSS Statistics environment. It includes four established tree-growing algorithms:

CHAID: A fast, statistical, multi-way tree algorithm that explores data quickly and efficiently and builds segments and profiles with respect to the desired outcome. The CHAID algorithm accepts only nominal or ordinal categorical predictors; continuous predictors are transformed into ordinal predictors before the algorithm is applied. Both the CHAID and Exhaustive CHAID algorithms consist of three steps: merging, splitting and stopping. A tree is grown by repeatedly applying these three steps to each node, starting from the root node.
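At the heart of CHAID's splitting step is a chi-square test of independence between a categorical predictor and the outcome. A hedged sketch of that criterion with scipy, using synthetic counts (the merging and stopping steps are omitted):

```python
# Hedged sketch of CHAID's split criterion: a chi-square test on the
# contingency table of a categorical predictor vs. the outcome.
# Counts are synthetic, for illustration only.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: predictor categories (e.g. income level); columns: outcome (bad, good).
table = np.array([
    [40, 10],   # low income: mostly "bad" outcomes
    [25, 25],   # medium income: mixed
    [10, 40],   # high income: mostly "good" outcomes
])

chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.2g}")
# A small p-value means this predictor separates the outcome well,
# so CHAID would favor splitting the node on it.
```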
Exhaustive CHAID: A modification of CHAID that examines all possible splits for each predictor.

Classification and regression trees (C&RT): A complete binary tree algorithm that partitions data and produces accurate, homogeneous subsets.

QUEST: A statistical algorithm that selects variables without bias and builds accurate binary trees quickly and efficiently.

With four algorithms, it is possible to try different types of tree-growing algorithms and find the one that best fits your data. Because classification trees are created directly within IBM SPSS Statistics, it is convenient to use the results to segment and group cases directly within the data. Additionally, it is possible to generate selection or classification/prediction rules in the form of IBM SPSS Statistics syntax, SQL statements or simple text (through syntax). Decision tree output in the Viewer may be saved to an external file, or saved as an XML model for later use in making predictions about individual new cases.
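As an analogue of exporting classification rules as simple text, scikit-learn (a stand-in for SPSS here, with illustrative data) offers export_text for a fitted binary tree of the C&RT family:

```python
# Hedged sketch: render a fitted tree's rules as plain text, analogous to
# SPSS's text rule export. Data is a tiny illustrative loan table.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [age, has_collateral]; target: credit rating.
X = [[25, 0], [30, 0], [45, 1], [50, 1], [28, 0], [60, 1]]
y = ["Bad", "Bad", "Good", "Good", "Bad", "Good"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Plain-text rules for each branch and leaf of the tree.
rules = export_text(tree, feature_names=["age", "has_collateral"])
print(rules)
```

The printed rules show the threshold chosen at each split and the class predicted at each leaf, which is what makes such trees easy to explain to a non-technical audience.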
Let's consider an example: a hypothetical data file containing demographic and bank loan history data (see the sample table in Picture 9 below).
Picture 9: IBM SPSS Variable view

To run a decision tree analysis, from the menus choose: Analyze > Classify > Tree... Then set the dependent, independent and influence variables. For our case it will be as follows:
Picture 10: Decision Tree Variables Specification
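The dialog's variable specification maps naturally onto the dependent/independent split used by any tree library. A hedged scikit-learn sketch with an illustrative credit-rating table (not the paper's bank loan file, and C&RT-style rather than CHAID):

```python
# Hedged sketch: dependent variable = credit rating, independent variables =
# demographic fields. Field names and data are illustrative.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "age":           [23, 35, 47, 52, 29, 61, 44, 33],
    "income":        [20, 45, 80, 95, 30, 110, 70, 40],   # thousands
    "credit_rating": ["Bad", "Bad", "Good", "Good", "Bad", "Good", "Good", "Bad"],
})

X = df[["age", "income"]]    # independent variables
y = df["credit_rating"]      # dependent variable
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Predict the rating for a new applicant.
print(clf.predict(pd.DataFrame({"age": [58], "income": [100]})))
```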
The resulting output view will look as in the schema below:

Picture 11: Decision Tree view
Picture 12: Risk estimation view

Risk
Estimate: .218    Std. Error: .008
Growing Method: CHAID; Dependent Variable: Credit rating

Picture 13: Classification Tree view

Classification
Observed              Predicted Bad    Predicted Good    Percent Correct
Bad                   808              212               79.2%
Good                  325              1119              77.5%
Overall Percentage    46.0%            54.0%             78.2%
Growing Method: CHAID; Dependent Variable: Credit rating
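The risk estimate SPSS reports is the overall misclassification rate, and it can be recomputed directly from the crosstab counts in Picture 13:

```python
# Recompute the risk estimate (misclassification rate) from the paper's
# classification table (Picture 13).
import numpy as np

# Rows: observed (Bad, Good); columns: predicted (Bad, Good).
crosstab = np.array([
    [808, 212],
    [325, 1119],
])

total = crosstab.sum()            # 2464 cases in total
correct = np.trace(crosstab)      # 808 + 1119 = 1927 correctly classified
risk = (total - correct) / total  # misclassification rate

print(f"risk estimate = {risk:.3f}")               # 0.218, matching Picture 12
print(f"percent correct = {correct / total:.1%}")  # 78.2%
```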
The risk estimate is reported as 0.218, or 21.8%. Finally, the classification of the observed cases is represented in the crosstab above (Picture 13). These types of analysis and output views ease the decision-making process for many kinds of business and scientific problems.

CONCLUSION

In this article we have considered one possible approach to applying IBM SPSS Statistics 19, its cluster analysis and decision tree procedures, to the decision-making process for a concrete problem.
REFERENCES

1.  Everitt, B.S., S. Landau, M. Leese and D. Stahl, 2010. Cluster Analysis (Wiley Series in Probability and Statistics), 5th Edition. Wiley.
2.  SPSS Inc., 2010. IBM SPSS Statistics Student Version 18.0 for Windows and Mac OS X, 1st Edition.
3.  Barlow, J.F., 2005. Excel Models for Business and Operations Management. Chichester, West Sussex; Hoboken, NJ: Wiley.
4.  Albright, S.C., W.L. Winston and C. Zappe, 2006. Data Analysis & Decision Making with Microsoft Excel; with cases by M. Broadie et al. Australia; United Kingdom: Thomson South-Western.
5.  Lewandowski, A. and A.P. Wierzbicki (Eds.), 1989. Aspiration Based Decision Support Systems: Theory, Software and Applications. Berlin; New York: Springer-Verlag.
6.  Laudon, K.C. and J.P. Laudon, 1991. Business Information Systems: A Problem Solving Approach. Chicago: Dryden Press.
7.  American Statistical Association. URL: http://www.amstat.org/
8.  Stata: Data Analysis and Statistical Software. URL: http://www.stata.com/
9.  SAS: Business Analytics and Business Intelligence Software. URL: http://www.sas.com/
10. IBM SPSS Software. URL: http://www142.ibm.com/software/products/us/en/spss-decisionmanagement