A Goal-Oriented Big Data Analytics Framework for Aligning with Business
Grace Park, Lawrence Chung
University of Texas at Dallas, Texas, USA
Liping Zhao
University of Manchester, Manchester, UK
Sam Supakkul
Sabre Corp., Texas, USA
Abstract—Big data analytics is a technology that helps turn hidden insights in big data into business value to support better decision-making. However, current big data analytics faces many challenges in doing so, since there is a large gap between big data analytics and business. This is mainly due to a lack of business context around the data, a lack of expertise to connect the dots, and implicit business objectives. In this paper, we present IRIS, a big data analytics framework for aligning with business in a goal-oriented approach. It is composed of an ontology for a business context model, analytics methods for connecting big data with business, an action process for collaborative work, and an assistant tool utilizing Spark. In this framework, problems of the current process and solutions for the future process are hypothesized in an explicit business context model and validated using diverse analytics methods implemented on top of Spark libraries. In addition, the goal-oriented approach makes it possible to explore and select among alternative potential problems and solutions. A business process for clearance pricing decisions is used to show how big data analytics can be turned into business value by using our framework, which aligns big data with business goals, and to give an initial understanding of the applicability of IRIS.
Keywords—Big Data Analytics, Big Data, Goal-Orientation, Business Alignment, Business Process
I. INTRODUCTION
As the quantity of data grows tremendously, not only from business transactions but also from system logs, social media, and smart sensors, business organizations need to find ways to harness it for competitive advantage. Big data analytics is one such way: a technology that supports decision-making by applying advanced analytics, such as machine learning or text mining, to big data in order to turn hidden insights into business value, such as creating new revenue, enhancing operational efficiency, or reducing cost [1]. According to an industrial survey [2] of executives representing leading Fortune 1,000 companies and large Federal agencies, 85% of the respondents have a Big Data initiative planned or in progress. However, another survey [3] reports that one of the biggest challenges in transforming hidden insights into business value is that big data analytics is not aligned with business goals. This is mainly because of 1) a lack of business context around the data, 2) a lack of expertise to connect the dots, and 3) implicit business objectives.
There is previous work on business analytics that includes goal concepts (e.g., BIM [15], GOMA [16], FBCM [17]), but it does not provide big data processing. Additionally, although business process analytics technologies (e.g., process mining [18], [19]) can handle big data, it is hard for them to express alignment relationships among business goals, business processes, and big data analytics. Neither provides explicit insights into problems and solutions for business process improvement. To address these problems, we propose IRIS¹, a novel big data analytics framework for aligning big data with business in a goal-oriented approach. It consists of an ontology (i.e., important concepts and relationships), diverse analytics methods, a process describing how to use the framework, and an assistant tool. The ontology has three layers, a business layer, a business process layer, and a big data layer, which bridge the gap between big data and business goals. Lower layers serve to qualitatively satisfy their higher layers, and the concepts in the three layers together constitute a Business Context Model (BCM), which helps model alignment relationships explicitly. Diverse analytics methods are also provided, such as Big Analytics utilizing Spark MLlib, Big Query using Spark SQL, the Claim-Based Reasoning Method, and others; these are techniques not only for analyzing big data itself, but also for connecting the results to business goals. While analyzing the alignment, problems within the current business process (i.e., AS-IS) and solutions for the future process (i.e., TO-BE) can be hypothesized and validated collaboratively through big queries or big analytics; problems and solutions are explored, and significant ones are selected after trade-off analysis, towards a rational decision-making process for business process improvement. For the assistant tool, the Eclipse Modeling Framework (EMF) and Spark are used for modeling business elements and analyzing big data, respectively.
Big queries or big analytics connect big data and business context. For the purpose of illustration, a business process for clearance pricing is used, which also gives an initial understanding of the applicability of IRIS. In this example, we show how big data analytics can be turned into business value by using our framework, which aligns analytics results with business goals.
¹ IRIS, in Greek mythology, is a goddess who symbolizes a bridge between heaven and earth – in our adaptation, “connecting the dots”.
Section II describes the adopted concepts, and Section III introduces a clearance pricing process as a running example. In Section IV, we present the IRIS framework. Section V provides an empirical study to show the applicability of IRIS, and Section VI describes related work. A summary of the paper with future work is given in Section VII.
II. ADOPTED CONCEPTS FOR IRIS
In this section, we introduce several concepts adopted for aligning big data analytics with business in IRIS. For each category, we give a more detailed explanation and state why we selected specific ones among the alternatives.
A. Goal-Oriented Requirements Engineering (GORE)
For representing both functional and non-functional business goals, there are several goal-oriented frameworks, including KAOS [4], i* [5], and the NFR (Non-Functional Requirements) Framework [6], each with its own emphasis and characteristics. We adopt the NFR Framework, since it can represent both functional and non-functional goals and makes traceability easy to show. The NFR Framework considers how non-functional requirements such as security or usability can be dealt with using the notion of a Softgoal, which has no clear-cut criteria for whether it is achieved or not. Softgoals are expressed in the form Type [Topic], in which Type is the non-functional part and Topic is the functional part. There are three kinds of Softgoals: NFR Softgoal, Operationalizing Softgoal, and Claim Softgoal. Softgoals are further refined using Satisficing relationships (i.e., good-enough satisfaction) toward another Softgoal, which can be fully or partially positive (MAKE or HELP) or fully or partially negative (BREAK or HURT), and using decomposition relationships such as AND or OR. Additionally, there are several root-cause analysis techniques for problems, such as fishbone analysis, fault tree analysis, and the Problem Interdependency Graph (PIG) [7].
While fishbone analysis is good at representing uncertain relationships between root causes and fault tree analysis is suited to logical AND/OR relationships, PIG can represent both kinds of relationship, so we adopt PIG. PIG is similar to the SIG (Softgoal Interdependency Graph) of the NFR Framework and is also represented in the form Type [Topic].
B. Business Process Models
A business process plays an important role in bridging the gap between big data analytics and business because it describes not only business activities, but also other important information such as data objects or participants. There are many kinds of business process models, such as BPMN (Business Process Model and Notation), the UML Activity diagram, and Petri nets. Among them, BPMN is a standard notation for representing a business process as well as data objects such as the inputs or outputs of an activity, so we adopt it [8]. BPMN's basic element categories are flow objects, connecting objects, swim lanes, and artifacts. Flow objects comprise events, activities, which describe the kind of work, and gateways, which represent conditions. Connecting objects comprise sequence flows between activities and message flows across organizational boundaries. Swim lanes, such as Pool and Lane, are used to organize and categorize activities. Artifacts give more information, such as data objects, which represent the data required in an activity, and groups, which are used to group different activities. We explain more in Section III.
C. Big Data Analytics Computing Frameworks
Big data analytics computing frameworks process large-scale data with parallel and distributed algorithms on commodity clusters. MapReduce and Spark are the mainstream frameworks in industry. MapReduce is a batch processing system, and intermediate (shuffle) files for sorting are stored on disk during processing. Spark is a real-time processing system centered on the RDD (Resilient Distributed Dataset). Shuffle files are stored in memory; thus, it is faster than MapReduce. We selected Spark, which provides diverse libraries such as Spark SQL and Spark MLlib with better performance [9].
III. A RUNNING EXAMPLE: A CLEARANCE PRICING DECISION
As a running example, Fig. 1 shows the demand and pricing prediction process model in BPMN for the clearance pricing of a company, Z [20]. The decision process consists of two sub-processes: one for Determining Initial Markdown (discount) Category before a clearance starts, and the other for Updating the Markdown Category during the clearance.
Determining the initial markdown starts with the task Predict Demand Manually, which involves reviewing unsold inventory and sales performance during a regular season; the participant Pricing Committee then comes in to reach the initial Final Markdown. This first sub-process takes about one month. During clearance sales, each Country Manager Estimates Time to Sell (ETS), using the average sales of the previous three weeks from the Weekly Clearance Sales Reports and their own personal experience. If sales are slower than predicted (i.e., the ETS is greater than the actual time remaining), the initial Final Markdown may be considered risky, which can lead to Further Markdown Manually.
Fig. 1. AS-IS Business Process of Clearance Pricing Decision in BPMN.
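The manual ETS check in the running example can be sketched in a few lines of Python. This is only an illustration: the source says that ETS uses the three-week sales average, so dividing the unsold inventory by that average is an assumption, and the class and field names (ClearanceStatus, unsold_units, weekly_sales, weeks_remaining) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class ClearanceStatus:
    """Snapshot of one product line during clearance (hypothetical fields)."""
    unsold_units: int
    weekly_sales: List[float]  # units sold per week, oldest first
    weeks_remaining: float     # actual time left in the clearance


def estimate_time_to_sell(status: ClearanceStatus, window: int = 3) -> float:
    """ETS in weeks, based on the average sales of the last `window` weeks.

    Assumption: ETS = unsold units / average weekly sales. The source only
    states that the three-week average is used, not the exact formula.
    """
    average = mean(status.weekly_sales[-window:])
    if average <= 0:
        return float("inf")  # nothing is selling, so the stock never clears
    return status.unsold_units / average


def markdown_is_risky(status: ClearanceStatus) -> bool:
    """The markdown is considered risky when ETS exceeds the time remaining."""
    return estimate_time_to_sell(status) > status.weeks_remaining
```

With 600 unsold units, recent weekly sales of 100 units, and four weeks left, the sketch yields an ETS of six weeks and flags the markdown as risky, which is the situation that leads to Further Markdown Manually.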
IV. IRIS: A GOAL-ORIENTED BIG DATA ANALYTICS FRAMEWORK FOR ALIGNING WITH BUSINESS
A. Overview
The IRIS framework consists of four parts: the Business Context Model (BCM), which integrates important concepts for aligning big data analytics and business; Analytics Methods, which make it possible to analyze big data itself and connect the results to business goals; an Action Process for collaborative analytics; and the IRIS Assistant, which supports all of these.
B. Ontology of Business Context Model (BCM)
The BCM's ontology is described in Fig. 2 and TABLE I and consists of three layers: the Business Goal Layer, the Business Process Layer, and the Big Data Analytics Layer. The layers are connected by Satisficing Contributions [6], which denote satisfaction in a good-enough manner rather than logical truth or falsity, and which can be positive (Make, Help, Some Plus) or negative (Break, Hurt, Some Minus) for alignment.
Business Goal Layer: this layer indicates the business goals of a business organization and the important problems and solutions for achieving them. It includes the concepts Business Goal, Business Problem, Business Solution, Performance Goal, and KPI.
Business Process Layer: this layer is an operationalization of the Business Goal Layer, because business goals are achieved by business activities. That is why its relationship toward the Business Goal Layer is Satisficing. This layer plays an important role in bridging the gap between business goals and big data, since it expresses the sequence of activities and the collections of data involved in those activities. It contains Business Process Goal, activities including Sub-Process and Task, Data Object, and Participant such as Pool and Node.
Big Data Analytics Layer: this layer is in turn an operationalization of the Business Process Layer, because business activities are supported by information systems to achieve the business process goals. The role of this layer is to validate the hypothesized business problems and solutions defined in the upper layers by using diverse analytics methods. Big Analytics, Big Query, and Big Data belong to this layer.
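The three-layer structure described above can be pictured as a small typed graph. The following Python sketch is not the EMF-based model of the IRIS Assistant; the class names (Layer, Element, Link, BusinessContextModel) and the rule that a contribution never points from a higher layer to a lower one are assumptions made for illustration, and the goal and query names in the example are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, IntEnum
from typing import List


class Layer(IntEnum):
    """The three BCM layers, ordered from lowest to highest."""
    BIG_DATA_ANALYTICS = 1
    BUSINESS_PROCESS = 2
    BUSINESS_GOAL = 3


class Contribution(Enum):
    """Satisficing contribution types named in the text."""
    MAKE = "Make"
    HELP = "Help"
    SOME_PLUS = "Some Plus"
    SOME_MINUS = "Some Minus"
    HURT = "Hurt"
    BREAK = "Break"


@dataclass(frozen=True)
class Element:
    name: str
    kind: str    # e.g. "Business Goal", "Task", "Big Query"
    layer: Layer


@dataclass(frozen=True)
class Link:
    source: Element
    target: Element
    contribution: Contribution


@dataclass
class BusinessContextModel:
    links: List[Link] = field(default_factory=list)

    def connect(self, source: Element, target: Element,
                contribution: Contribution) -> Link:
        # Assumption: contributions go upward or stay within one layer.
        if source.layer > target.layer:
            raise ValueError(
                f"{source.name!r} is in a higher layer than {target.name!r}")
        link = Link(source, target, contribution)
        self.links.append(link)
        return link

    def contributors(self, target: Element) -> List[Element]:
        """Elements that contribute directly to `target`."""
        return [link.source for link in self.links if link.target == target]


# Example with one element per layer (goal and query names are hypothetical).
goal = Element("Increase clearance revenue", "Business Goal", Layer.BUSINESS_GOAL)
task = Element("Predict Demand Manually", "Task", Layer.BUSINESS_PROCESS)
query = Element("Weekly clearance sales", "Big Query", Layer.BIG_DATA_ANALYTICS)

bcm = BusinessContextModel()
bcm.connect(task, goal, Contribution.HELP)
bcm.connect(query, task, Contribution.HELP)
```

Keeping the layer on each element makes the direction of alignment checkable: a link from a Business Goal down to a Big Query is rejected, whereas links within a layer (for example, a Problem hurting a Business Goal) remain possible.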
Fig. 2. Ontology of Business Context Model (BCM) in IRIS Framework.
TABLE I. DEFINITION OF ONTOLOGY OF BCM.
Name: Definition
Business Goal: A statement of what a business wishes to accomplish
KPI: Key Performance Indicator
Performance Goal: A measurable goal to achieve a business or a business process goal
Satisficing Contribution: Positive (Make, Help, Some Plus) towards a parent goal; Negative (Break, Hurt, Some Minus) towards a parent goal
Satisficing Label: Satisficed, Weakly Satisficed, Weakly Denied, Denied, Conflict, Undecided
Phenomenon: Any event (or events) that is (are) observable
Problem: A Phenomenon which makes some negative contribution towards achieving a Business (Process) Goal
Solution: A Phenomenon which makes some positive contribution towards achieving a Business (Process) Goal
Business Process Goal: A statement of what a business process is intended to accomplish
Business Process: A collection of inter-related business activities or tasks. The Business process-specific ontology is adopted from BPMN [8] and simplified.
Big Data: Data which has characteristics of high volume, high velocity, high variety, and high veracity
Big Query: Query for Big Data which can execute in the form of SQL regardless of the underlying big data platform
Big Analytics: Analytics results from Big Data using a machine learning library
[Notation column: graphical symbols; Business Process uses the BPMN notation in Fig. 1]
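The Satisficing Contribution and Satisficing Label entries of TABLE I suggest how a label could travel from a child element to its parent. The Python sketch below is a deliberately simplified reading in the spirit of the NFR Framework, not the propagation procedure of IRIS: the mapping tables, the choice to propagate only from a Satisficed child, and the functions propagate and combine are all assumptions made for illustration.

```python
from enum import Enum
from typing import Iterable


class Contribution(Enum):
    MAKE = "Make"
    HELP = "Help"
    SOME_PLUS = "Some Plus"
    SOME_MINUS = "Some Minus"
    HURT = "Hurt"
    BREAK = "Break"


class Label(Enum):
    SATISFICED = "Satisficed"
    WEAKLY_SATISFICED = "Weakly Satisficed"
    WEAKLY_DENIED = "Weakly Denied"
    DENIED = "Denied"
    CONFLICT = "Conflict"
    UNDECIDED = "Undecided"


# Assumed effect of each contribution type when the child is Satisficed.
_EFFECT = {
    Contribution.MAKE: Label.SATISFICED,
    Contribution.HELP: Label.WEAKLY_SATISFICED,
    Contribution.SOME_PLUS: Label.WEAKLY_SATISFICED,
    Contribution.SOME_MINUS: Label.WEAKLY_DENIED,
    Contribution.HURT: Label.WEAKLY_DENIED,
    Contribution.BREAK: Label.DENIED,
}

_POSITIVE = {Label.SATISFICED, Label.WEAKLY_SATISFICED}
_NEGATIVE = {Label.DENIED, Label.WEAKLY_DENIED}


def propagate(child: Label, contribution: Contribution) -> Label:
    """Label that a single link passes to the parent.

    Simplification: only a Satisficed child propagates; anything else
    leaves the parent Undecided along this link.
    """
    if child is not Label.SATISFICED:
        return Label.UNDECIDED
    return _EFFECT[contribution]


def combine(labels: Iterable[Label]) -> Label:
    """Merge the labels arriving at one parent from several links."""
    decided = [label for label in labels if label is not Label.UNDECIDED]
    if not decided:
        return Label.UNDECIDED
    has_positive = any(label in _POSITIVE for label in decided)
    has_negative = any(label in _NEGATIVE for label in decided)
    if Label.CONFLICT in decided or (has_positive and has_negative):
        return Label.CONFLICT
    if has_positive:
        return (Label.SATISFICED if Label.SATISFICED in decided
                else Label.WEAKLY_SATISFICED)
    return Label.DENIED if Label.DENIED in decided else Label.WEAKLY_DENIED
```

In this reading, a validated Problem that hurts a goal and a validated Solution that helps the same goal produce a Conflict label, which is the point at which a trade-off analysis would be needed.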
C. Analytics Methods
In our framework, we provide diverse analytics methods: Big Analytics, Big Query, Phenomenon Hypothesizing/Validation, the Claim-Based Reasoning Method, and the Solution Selection Method. These methods enable big data analysis to find the most critical business problems and the best solutions for achieving business goals by validating hypothesized problems and solutions in a BCM. How the methods are used in our framework is described in detail in Section V with an example.
1) Big Analytics Methods
Big Analytics Methods return analytics results that predict or describe data, using the Spark Machine Learning Library (Spark MLlib). The analytics results can be saved back to the Big Data Platform or retrieved by Big Query Methods. Among the methods that Spark MLlib provides, we have implemented Decision Tree and Support Vector Machine (SVM) in our tool.
2) Big Query Methods
Big Query Methods can also be used to get results from big data by using Spark SQL. While the Big Analytics Method is for processing big data, Big Query is for retrieving data, whether processed or unprocessed.
3) Phenomenon Hypothesizing Method
A business process has a whole-part structure similar to the rings of an onion, so there are several ways of hypothesizing Phenomena, including Problems or Solutions, using this structure. Top-down hypothesizing starts from the outermost elements and moves to the innermost elements, Bottom-up moves from the innermost to the outermost, and Hybrid mixes Top-down and Bottom-up.
4) Phenomenon Validation Method
This method validates problems or solutions by using the results from the Big Analytics Method or the Big Query Method. For this method, a Performance Goal is used in the form “Achieve (KPI, Operator, Target Value)”, where Operator = {}, to check whether a goal is achieved or not; a Performance Goal also has a Threshold Offset, which allows a flexible evaluation range.
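As a minimal sketch of how a Performance Goal of the form Achieve (KPI, Operator, Target Value) with a Threshold Offset might be evaluated, consider the Python class below. The ">" rule follows the paper's validation algorithm (the analytics value must reach the target minus the offset); the mirrored "<" rule, the class name PerformanceGoal, and the KPI names in the usage note are assumptions, since the operator set is not legible in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceGoal:
    """Achieve(KPI, Operator, Target Value) with a Threshold Offset."""
    kpi: str
    operator: str
    target_value: float
    threshold_offset: float = 0.0

    def is_achieved(self, analytics_value: float) -> bool:
        """Decide the goal-achievement status for one analytics value."""
        if self.operator == ">":
            # From the source: the offset relaxes the lower bound.
            return analytics_value >= self.target_value - self.threshold_offset
        if self.operator == "<":
            # Assumption: the mirrored rule, relaxing the upper bound.
            return analytics_value <= self.target_value + self.threshold_offset
        raise ValueError(f"unsupported operator: {self.operator!r}")
```

For example, a hypothetical goal PerformanceGoal("sell_through_pct", ">", 80, 5) counts an analytics value of 76 as achieved and a value of 74 as not achieved, so the offset widens the accepted range without changing the stated target.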
The Problem/Solution Validation Algorithm consists of two stages: deciding the goal-achievement status using analytics values, and deciding the validation label according to the goal-achievement status. The algorithm is as follows.
/* 1. Decide Goal-Achievement Status by Analytics_Value */
If Operator = “>” then
    If Analytics_Value >= Target_Value - Threshold_Offset then
        Goal_Achievement = True
    Else
        Goal_Achievement = False
    End If
Else If Operator = “= 7 AND sales_month