Building Use Cases With Activity Reference Framework for Big Data Analytics

Simon Lau Boung Yew
Senior Lecturer, School of Computing
University College of Technology Sarawak
Sibu, Sarawak, Malaysia
[email protected]

Abstract—The application of Big Data analytics to gain insight and assist decision making has seen phenomenal growth among organizations in recent years. Although many organizations have invested heavily in Big Data technologies, there still seems to be a lack of a common reference framework for organizations to derive business value and answer critical business questions from Big Data Analytics (BDA). In this paper, a vendor-neutral and technology-agnostic Big Data Analytics Activity Reference Framework is presented to close the gap. The framework describes the high-level Big Data Analytics activities and serves as a reference for practitioners deciding on the selection of the infrastructure, platform and computing layers. This reference framework is intended as a launching pad for researchers and implementers to customize their respective Big Data Analytics projects before implementing them with the relevant technology and tools.

Keywords—Big Data; analytics; reference framework
I. INTRODUCTION
Data tells us what has happened and improves strategic planning moving forward. Success and sound decision making are predicated on a proper understanding of data. Realizing the strengths and weaknesses of an organization, and the way forward, depends not only on having access to all data within the organization but also on knowing how to discover the hidden information in that data for enhanced insight and decision making. Conventional data analytics approaches appear inadequate to meet the demands of the ever-increasing volume and variety of data sets under consideration and the velocity at which predictive and prescriptive analytics are needed. In this regard, many organizations have in recent years invested heavily in Big Data tools and assets in order to derive business value and answer critical business questions about their customers, products and operations [1]. However, much of the emphasis of the existing literature has been on methodologies for managing high-volume, high-velocity and high-variety (the 3Vs of Big Data) data assets for more cost-effective access [2]. Many organizations at this stage still lack the advanced capabilities and understanding needed to derive value from Big Data [2][3].
II. CONTRIBUTION
This paper proposes a reference framework for Big Data Analytics (BDA), using BDA use case modeling as the point of reference. The paper serves as a reference for domain experts, Big Data users and the general audience on how to define an implementation strategy and align it with business goals when piloting Big Data projects. The rest of this paper is organized as follows. First, an overview of current related work on Big Data architecture and frameworks is presented. It is followed by an overview of BDA and of what value can be derived from BDA and how. Next, the features of the novel use case driven BDA reference framework are presented. The usage of the framework is explained with two real-life use case scenarios. Finally, the paper concludes with a summary of the work done and potential future work.

III. RELATED WORK
Big Data is a new and emerging field that continues to evolve as newer technologies and techniques slowly mature. In recent times, effort has been spent on standardizing Big Data architecture and frameworks in research, industry and commerce. In this section, an overview of the relevant current efforts is presented. Early industry initiatives to define a Big Data architecture, as enumerated in [2], include:
• Open Data Center Alliance (ODCA) Information as a Service (INFOaaS) [5]
• Research Data Alliance (RDA), which focuses on data-related aspects but not on infrastructure and tools [6]
• NIST Big Data Working Group (NBD-WG) Big Data Reference Architecture [7]
One notable early work is the NBD-WG Big Data Reference Architecture, which distinguishes the roles of BDA providers from those of other Big Data framework providers for the physical, infrastructure, platform, and processing services [7]. NBD-WG's other contribution is the definition of the general requirements and use cases of BDA [8]. Apart from that, another Big Data Architecture Framework (BDAF) is defined in [4] as a technology stack of components such as Data Models and Structures, Big Data Lifecycle (Management) Model, Big Data Infrastructure (BDI), BDA Tools etc. Similarly, various Big Data and Data Analytics architectures have been formulated by major commercial data analytics providers such as IBM and LexisNexis, and by major Cloud Service Providers including AWS Big Data Services and Microsoft Azure [2]. Microsoft has formulated the Big Data Ecosystem Reference Architecture [9], while IBM devised the Business Analytics and Optimization Reference Framework [10]. In short, the early work shows extensive standardization effort on the Big Data infrastructure and technology aspects, with the aim of enabling cross-platform implementation of Big Data for both distributed data (collection, storage, processing) and metadata/discovery services [11]. To the best of our knowledge, there has been relatively less reported work on a methodological framework to derive business value from BDA implementation and to measure its effectiveness. This issue is addressed in the following sections.

IV. VALUE CREATION IN BIG DATA ANALYTICS
Decision making in any organization is closely linked to the business goals of the organization. In the traditional business model shown in Figure 1, business goals and strategic directions are defined a priori. These then define the rest of the business activities, including how the data and information available in the organization are utilized.
Fig. 1. Traditional data analysis model
Today, with Big Data analytics, organizations endeavor to exploit the available data and information, together with advanced analytics technology, to answer business questions about the organization’s customers, products and operations, as shown in Figure 2 below.
Fig. 2. Big Data Analytics model

From the technology viewpoint, tools, technologies and techniques in Big Data can generally be categorized into the infrastructure layer, computing layer and application layer. The infrastructure and computing layers normally address the optimization of ICT resources, including various data integration, data management, and programming models to meet Big Data demand [12]. Assets in these layers help to manage, prepare and pre-process high-volume, high-variety and high-velocity data before the data can be of any value. They are deemed to be the support layers. In contrast, the application layer implements the actual data analytics functions, leveraging methods such as statistical analysis, machine learning and data mining to derive value for users [12]. The application layer presents the opportunity to derive the intended impact from Big Data based on domain-specific data analytical techniques. Different application domains with different application requirements and data characteristics may leverage similar underlying technologies and analytical techniques, provided there is a common framework to facilitate the higher-level BDA activities. Such a framework, however, appears to be some distance from completion, if it exists at all. Therefore, this paper focuses on modeling a BDA activity reference framework to facilitate the assimilation of BDA technologies into the business process. In order to formulate such a framework, the concept of BDA first needs to be formalized. As defined by [13], BDA refers to the process of collecting, organizing and analyzing large sets of data (so-called "Big Data") to discover patterns and other useful information. Though much has been said about the benefits of BDA to organizations, how to derive such benefits or “values” from BDA remains vague at best.

To understand the value of BDA, we need to consider the different levels of “knowledge” that can be derived from BDA, as outlined in Gartner’s Analytic Capabilities [14][15]:
• Descriptive analytics exploits historical data (information obtained through observation, measurement, or experiments about a phenomenon of interest) to answer the question of “what has happened?” It is mainly applied for reporting purposes, with hindsight.
• Diagnostic analytics addresses the question of “why did something happen?” through root cause analysis and data discovery and exploration.
• Predictive analytics strives to make known in advance “what will happen?” by predicting future probabilities and trends to provide insight and forecasts.
• Prescriptive analytics recommends “what should be done to make it happen?” for decision making, using quantitative analysis of real-time data to find optimal solutions under given constraints.
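The four analytics levels listed above, each with its guiding question, can be captured in a small lookup structure. The following is a minimal sketch only: the `AnalyticsLevel` enum, the keyword hints and the `suggest_level` helper are hypothetical names introduced for illustration, and the keyword matching is a deliberately crude stand-in for the judgment of a domain expert.

```python
from enum import Enum


class AnalyticsLevel(Enum):
    """The four analytics levels, each paired with its guiding question."""
    DESCRIPTIVE = "What has happened?"
    DIAGNOSTIC = "Why did something happen?"
    PREDICTIVE = "What will happen?"
    PRESCRIPTIVE = "What should be done to make it happen?"


# Hypothetical keyword hints, assumed only for this illustration.
_HINTS = {
    AnalyticsLevel.PRESCRIPTIVE: ("should", "recommend", "optimal"),
    AnalyticsLevel.PREDICTIVE: ("will", "forecast", "predict"),
    AnalyticsLevel.DIAGNOSTIC: ("why",),
}


def suggest_level(business_question: str) -> AnalyticsLevel:
    """Suggest an analytics level for a business question by keyword matching."""
    words = business_question.lower().replace("?", " ").split()
    # Dicts preserve insertion order, so the most action-oriented level is checked first.
    for level, hints in _HINTS.items():
        if any(hint in words for hint in hints):
            return level
    # Questions with no hint are treated as plain reporting on past data.
    return AnalyticsLevel.DESCRIPTIVE
```

For example, `suggest_level("Why did student intake drop?")` returns `AnalyticsLevel.DIAGNOSTIC`, whose value is the corresponding guiding question.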
The scope of work in focus is highlighted in the Big Data value chain in Figure 3 below.

Fig. 3. Big Data value chain

V. BIG DATA ANALYTICS ACTIVITY REFERENCE FRAMEWORK

BDA is the process of examining Big Data to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information [16]. The proposed BDA Activity Reference Framework is shown in Table I as a common guiding framework for BDA.

TABLE I. BIG DATA ANALYTICS ACTIVITY REFERENCE FRAMEWORK

| Step | Question | Value Creation Activity | Goals / Assets / Tools / Methodology |
| --- | --- | --- | --- |
| 1 | What is the business domain of concern? | Select business domain | Customer; Product; Operation / Organization |
| 2 | What are the business goals? | Determine business goals | Defined business goals |
| 3 | Is there a use case for Big Data Analytics? | Analyze Big Data needs | Problem solving (pain); Explore for better ways (lust) |
| 4 | How can Big Data analytics solve a problem or enhance the business? | Design use cases | Big Data Analytics Business Goal-Use Case Reference Framework (refer to Table II). Examples: Government Operation; Commercial; Defense; Healthcare and Life Sciences; Deep Learning and Social Media; The Ecosystem for Research; Astronomy and Physics; Earth, Environmental and Polar Science; Energy |
| 5 | What can be the most important knowledge / conclusions that can help the organization? | Understand data and derive value (useful knowledge / conclusion) | Descriptive; Diagnostic; Predictive; Prescriptive |
| 6 | What data is being collected or what data needs to be newly procured? | Determine data sources | Examples: Real time systems; ERP; CRM; RSS feeds; Social networks; Web logs; Click streams |
| 7 | What to do with big data? | Determine Big Data analytics application domain | Examples: Business Intelligence; System management; Fault prevention; Fraud detection; Unsupervised learning; Visualization; Simulation; Pattern discovery; Network analysis; Cyber threat analysis; Social network analytics; Sentiment analysis; Predictive analytics; Predictive modelling and risk analysis; Tendency analysis; Financial risk analysis; Customer segmentation; Product performance analysis; Campaign analysis; Customer relationship management; etc. |
| 8 | How to derive knowledge from big data? | Determine analytics technique (Data Science) | Examples: Generic data methods; Machine learning methods; Data mining methods; Parallel data analytics; Query processing approaches; Correlation; Classification; Clustering; etc. |
| 9 | How to provide data in a fast and timely manner? | Select and implement data management technologies (IT department) | Examples: Hadoop HDFS; Hadoop MapReduce; NoSQL; ETL; ECL; etc. |
| 10 | What is the pattern / trend / conclusion? | Data analytics: Visualize for deep insight for decision making (Data scientist) | Examples: Revolution R; Tableau; Matlab; SAS; Apache Mahout; Apache Hive; SPSS; Scipy; Pivotal; etc. |
| 11 | What are the benefits of Big Data analytics? | Assess/Evaluate for benefit(s) | Big Data benefits assessment as feedback to refinement of business goals (Steps 1 & 2) |
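The eleven steps of Table I form an ordered checklist that ends with an assessment feeding back into Steps 1 and 2. As a rough sketch of how a project team might track progress against the framework, the steps can be held in a simple data structure. The names `FrameworkStep`, `STEPS`, `FEEDBACK_TARGETS` and `next_step` are hypothetical and introduced only for illustration; the step questions and activities are taken from Table I.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class FrameworkStep:
    number: int
    question: str
    activity: str


# Steps transcribed from Table I of the framework.
STEPS = (
    FrameworkStep(1, "What is the business domain of concern?", "Select business domain"),
    FrameworkStep(2, "What are the business goals?", "Determine business goals"),
    FrameworkStep(3, "Is there a use case for Big Data Analytics?", "Analyze Big Data needs"),
    FrameworkStep(4, "How can Big Data analytics solve a problem or enhance the business?", "Design use cases"),
    FrameworkStep(5, "What can be the most important knowledge / conclusions?", "Understand data and derive value"),
    FrameworkStep(6, "What data is being collected or needs to be newly procured?", "Determine data sources"),
    FrameworkStep(7, "What to do with big data?", "Determine Big Data analytics application domain"),
    FrameworkStep(8, "How to derive knowledge from big data?", "Determine analytics technique"),
    FrameworkStep(9, "How to provide data in a fast and timely manner?", "Select and implement data management technologies"),
    FrameworkStep(10, "What is the pattern / trend / conclusion?", "Data analytics: visualize for decision making"),
    FrameworkStep(11, "What are the benefits of Big Data analytics?", "Assess/evaluate for benefit(s)"),
)

# Step 11 feeds back into the refinement of Steps 1 and 2.
FEEDBACK_TARGETS = (1, 2)


def next_step(completed: Set[int]) -> Optional[FrameworkStep]:
    """Return the first step not yet completed, or None when all steps are done."""
    for step in STEPS:
        if step.number not in completed:
            return step
    return None
```

Calling `next_step({1, 2, 3})` returns the "Design use cases" step; once `next_step` returns `None`, the team would revisit the steps in `FEEDBACK_TARGETS` for the next iteration.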
Fig. 4. Big Data Analytics Activity Model

According to Figure 4, the first step towards BDA is to select the business domain (customer, product or operation) and determine the business goals. Next, a business need, whether an existing business problem (termed “pain” in [17]) or a value enhancement to customer, product or operation (termed “lust” in [17]), is analyzed and identified. The value of Big Data can only be derived when the data serves a useful purpose [18]. In this regard, use case modeling is used to capture that purpose. A use case describes how a Big Data activity or task can be performed. It is used in system analysis to identify, clarify, and organize system requirements, and it helps to outline a system’s behaviour from a user’s point of view [16][19].

Fig. 5. Big Data value creation by use case modeling

Need analysis and use case construction are of foremost importance to ensure that the technique employed for BDA is aligned with the business goals. They influence the overall outcome of the implementation. Where the organization has only a vague idea of how to carry out the exploration of Big Data, it may want to start with more exploratory questions as inputs, such as those shown in Table I. The use cases are designed to fulfill the business need identified. A use case description may be selected from sample use cases such as those shown in Table II. There is a growing body of work in progress on defining use cases for Big Data. One notable effort in this area is the set of 51 use cases drafted by the NIST Big Data Public Working Group Use Cases and Requirements Subgroup [8]. For use case description, the NBD (NIST Big Data) Requirements WG Use Case template [8] is adopted. Its application is discussed in Section VI.

TABLE II. SAMPLE USE CASES ON VARIOUS BUSINESS DOMAINS

| Business domain | Sample use cases |
| --- | --- |
| Customer | 360 degree customer view; Market share expansion; Real time sales optimization; Improve customer relation; Effective marketing; Quick response to market trend |
| Product | Production cost reduction; Market driven production; Real-time cost optimization |
| Operation/Organization | Improve employees loyalty; Improved profit margin; Improve operational efficiency; Higher ROI on R&D; Autonomic supply chain management |

A well-described use case from the previous step is essential for other variables, such as the sources of data, the targeted application domain and the analytics techniques, to be determined early in the project. Lastly, data are analyzed for deep insight into trends and for answers to the business questions formulated earlier. For the selection of the technical data analysis technique, the business analysis goal vs. data pattern model proposed in [20] is referred to. This work can possibly be done with the help of:
• Data management technologies such as the Hadoop framework, Extract-Transform-Load (ETL), Extract-Cleanse-Load (ECL) etc.
• Data analytics and visualization technologies such as Revolution R, SPSS, Matlab, Hive, Mahout etc.
It is recommended that an organization start the initiative small, with data assets, tools/technology and technical know-how that are already available, and iteratively add further capabilities as the implementation progresses. At every pre-determined interim period, after one iteration of the implementation cycle of the devised methodology, a benefits assessment can be done to further refine the business goals and subsequent Big Data analytics activities.

VI. DISCUSSION

In this section, the application of the proposed reference framework to two real-life scenarios is demonstrated. In use case 1, a local university marketing department faces a challenge in recruiting new undergraduate students and increasing student intake due to intense competition from other educational institutions. It is looking for clues on the pattern of student profiles and the potential target market segment for its future marketing strategy. In use case 2, faculty teaching performance in a university is analyzed based on student teaching evaluation results. Its correlation with students’ academic progress is determined in order to find better ways to teach so that students’ academic performance can be improved. The studies, modeled after the proposed BDA Activity Reference Framework, are shown in Tables III and IV below.
TABLE III. USE CASE 1: VALUE CREATION ACTIVITIES FOR UNDERGRADUATE STUDENT PROFILING FOR MARKETING USING BIG DATA ANALYTICS

| Step | Value Creation Activity | Goals / Assets / Tools / Methodology |
| --- | --- | --- |
| 1 | Select business domain | Customer |
| 2 | Determine business goals | Defined goals: Understand pattern of existing student profile (origin of students, family background, courses selected etc.); Study the profile of students from other successful and more established educational institutions to devise marketing strategy |
| 3 | Determine Big Data needs | Problem solving: identify target market segment. Explore better ways: identify marketing strategy |
| 4 | Design use cases | Use case description based on NBD (NIST Big Data) Requirements WG Use Case Template |
| 5 | Understand data and derive value (useful knowledge / conclusion) | Predictive; Prescriptive |
| 6 | Determine data sources | Structured/unstructured data. Examples: University student registry; Marketing activities feedback form; Website visitor logs; Student profile statistics of the ministry of education; Social network data of existing students; Feedback from existing students; Online survey data |
| 7 | Determine Big Data analytics application domain | Examples: Visualization; Pattern discovery; Sentiment analysis; Predictive modelling; Tendency analysis; Customer segmentation; Campaign analysis |
| 8 | Determine analytics technique (Data Science) | Examples: Generic data methods; Machine learning methods; Data mining methods; Correlation; Classification; Clustering |
| 9 | Select and implement data management technologies | With the help of the Information Technology department for setting up the system. |
| 10 | Data analytics: Visualize for deep insight for decision making | Performed by data scientist or trained marketing staff. |
| 11 | Assess for benefit(s) | Does the analysis improve student intake number in subsequent academic years? |

TABLE IV. USE CASE 2: VALUE CREATION ACTIVITIES FOR FACULTY TEACHING PERFORMANCE AND STUDENT ACADEMIC PROGRESS EVALUATION

| Step | Value Creation Activity | Goals / Assets / Tools / Methodology |
| --- | --- | --- |
| 1 | Select business domain | Operation / Organization |
| 2 | Determine business goals | Defined goals: Understand faculty teaching performance and its relation to students’ academic performance; Devise optimal teaching approach to optimize students’ academic performance |
| 3 | Determine Big Data needs | Problem solving: identify teaching performance issues. Explore better ways to teach in order to improve students’ academic performance |
| 4 | Design use cases | Use case description based on NBD (NIST Big Data) Requirements WG Use Case Template |
| 5 | Understand data and derive value (useful knowledge / conclusion) | Predictive; Prescriptive |
| 6 | Determine data sources | Structured/unstructured data. Examples: Teaching evaluation results; Student academic results |
| 7 | Determine Big Data analytics application domain | Examples: Visualization; Pattern discovery; Predictive modelling |
| 8 | Determine analytics technique (Data Science) | Examples: Generic data methods; Machine learning methods; Data mining methods; Correlation |
| 9 | Select and implement data management technologies | With the help of the Information Technology department for setting up the system. |
| 10 | Data analytics: Visualize for deep insight for decision making | Performed by data scientist or trained academic staff. |
| 11 | Assess for benefit(s) | Does the analysis improve academic staff teaching performance in subsequent evaluation? Does the analysis improve student academic results in subsequent academic sessions? |
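For use case 1, customer segmentation is one of the application domains named for Step 7, and Step 11 asks whether student intake improves. A very small sketch of such a segmentation is to compute the share of prospects who eventually enrolled (the closing rate) per market segment. The function name `closing_rate_by_segment`, the segment labels and the sample records below are all hypothetical and serve only to illustrate the idea; a real study would draw on the data sources listed for Step 6.

```python
from collections import defaultdict


def closing_rate_by_segment(leads):
    """Return {segment: enrolled / total} for an iterable of (segment, enrolled) pairs."""
    totals = defaultdict(int)
    enrolled = defaultdict(int)
    for segment, did_enroll in leads:
        totals[segment] += 1
        if did_enroll:
            enrolled[segment] += 1
    return {segment: enrolled[segment] / totals[segment] for segment in totals}


# Hypothetical toy records: (region of origin, whether the prospect enrolled).
sample_leads = [
    ("Region A", True), ("Region A", True), ("Region A", False),
    ("Region B", True), ("Region B", False),
    ("Region C", False),
]

rates = closing_rate_by_segment(sample_leads)
# The segment with the highest closing rate is a candidate for focused marketing.
best_segment = max(rates, key=rates.get)
```

With the toy records above, Region A has the highest closing rate (two of three prospects enrolled), so it would be the first candidate for focused marketing resources.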
The use case descriptions are illustrated in Table V and Table VI based on the NBD (NIST Big Data) Requirements WG Use Case Template [8].

TABLE V. USE CASE 1: UNDERGRADUATE STUDENT PROFILING FOR MARKETING USING BIG DATA ANALYTICS

| Field | Item | Description |
| --- | --- | --- |
| Use case title | | Undergraduate students profiling for marketing using Big Data Analytics |
| Vertical (area) | | Student market study |
| Author/Company/Email | | Marketing department |
| Actors/Stakeholder and their roles and responsibilities | | Marketers, educators, data scientists and students. |
| Goals | | This use case aims to achieve the following: Understand pattern of existing student profile (origin of students, family background, courses selected etc.); Study the profile of students from other successful and more established educational institutions to devise future marketing strategy |
| Use Case Description | | Marketers can make use of the analysis to focus marketing activities and resources on the market segment with the higher closing rate. |
| Customer Solutions | Compute (System) | A high performance computing cluster |
| | Storage | High end storage device |
| | Networking | High speed broadband |
| | Software | Hadoop framework, Revolution R |
| Big Data Characteristics | Data Source (distributed/centralized) | Structured/unstructured data. Examples: University student registry; Marketing activities feedback form; Website visitor logs; Student profile statistics of the ministry of education; Social network data of existing students; Feedback from existing students; Online survey data |
| | Volume | File sizes can be hundreds of GB; Hundreds of thousands of relational database records |
| | Velocity | Data loaded in batches |
| | Variety | Data sets are varied. |
| | Variability | Existing student profile information is rather static, while website logs, social network data etc. grow at high speed. |
| Big Data Science (collection, curation, analysis, action) | Veracity | Challenging, especially for data whose volume grows at high speed. |
| | Visualization | Visualization of results and diagnostics in charts and graphs. |
| | Data Quality | From defined sources: data quality is stable. From social network / internet: data is to be randomly validated and verified by human experts. |
| | Data Types | Text, relational database records |
| | Data Analytics | Information extraction, filtering, search, and summarization, machine learning methods, data mining methods, correlation, classification, clustering |

TABLE VI. USE CASE 2: FACULTY TEACHING PERFORMANCE AND STUDENT ACADEMIC PROGRESS EVALUATION

| Field | Item | Description |
| --- | --- | --- |
| Use case title | | Faculty Teaching Performance and Student Academic Progress Evaluation |
| Vertical (area) | | Teaching and academic performance |
| Author/Company/Email | | University registrar and QA department |
| Actors/Stakeholder and their roles and responsibilities | | Educators, data scientists and students. |
| Goals | | This use case aims to achieve the following: Understand the correlation between faculty teaching performance and students’ academic performance; Enhance teaching performance; Devise teaching approach to optimize students’ academic performance |
| Use Case Description | | Faculty can make use of the analysis to enhance or personalize teaching approaches for a targeted cohort of students. University management can make use of the results to optimize teaching workload based on faculty expertise and student profile. |
| Customer Solutions | Compute (System) | A high performance computing cluster |
| | Storage | High end storage device |
| | Networking | High speed broadband |
| | Software | Hadoop framework, Revolution R |
| Big Data Characteristics | Data Source (distributed/centralized) | Structured data. Examples: Faculty teaching evaluation; Student academic results |
| | Volume | Tens to hundreds of thousands of relational database records |
| | Velocity | Data loaded in batches |
| | Variety | Defined data sets |
| | Variability | Datasets are rather static. |
| Big Data Science (collection, curation, analysis, action) | Veracity | Data does not grow exponentially |
| | Visualization | Visualization of results and diagnostics in charts and graphs |
| | Data Quality | From defined sources: data quality is stable. |
| | Data Types | Relational database records |
| | Data Analytics | Information extraction, filtering, search, summarization, machine learning methods, data mining methods, correlation |
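Table VI lists correlation among the data analytics techniques for use case 2, where the goal is to understand the relation between faculty teaching performance and students’ academic performance. As a minimal sketch, the Pearson correlation coefficient can be computed as below. The function name `pearson` and the two sample series are hypothetical; the paper does not state which correlation measure was used, so Pearson is an assumption made here for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one value per course.
evaluation_scores = [3.1, 3.6, 4.0, 4.4, 4.8]   # mean teaching evaluation score
average_grades = [2.4, 2.9, 3.0, 3.4, 3.7]      # mean student grade point

r = pearson(evaluation_scores, average_grades)
```

A coefficient close to 1 on real data would indicate that higher teaching evaluation scores tend to go together with better student results; it would not by itself show that one causes the other.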
VII. CONCLUSION

Even though Big Data Analytics is emerging as an important and useful tool for organizations to validate and answer business questions, it is essential to note that Big Data Analytics can at best be an effective supporting tool if it is aligned with the overall business strategy of the organization. The primary goal will always be oriented towards solving business problems or improving business strategy. There needs to be a reference framework to help organizations that invest heavily in Big Data tools and technologies to better utilize their Big Data assets and derive business value. As a contribution to that, this paper has illustrated the value creation process with a use case driven Big Data Analytics Activity Reference Framework. The framework is illustrated with real-life use cases. At present, a benchmarking tool is being devised to measure the effectiveness of deploying the abovementioned use cases. It is a work in progress and is beyond the scope of this paper. The tool is to be tested with more real-life use cases.

ACKNOWLEDGMENT

I would like to thank University College of Technology Sarawak for the financial and facility support for the successful completion of this work.

REFERENCES

[1] B. Schmarzo. (2011, September 22). Do it right - proven techniques for exploiting big data analytics [Online]. Available: http://cdn.oreillystatic.com/en/assets/1/event/63/Do%20it%20Right%20%E2%80%93%20Proven%20Techniques%20for%20Exploiting%20Big%20Data%20Analytics%20Presentation.pdf
[2] Y. Demchenko, “Big data standardisation in industry and research”, EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud, Luxembourg, Oct. 2013.
[3] Ventana Research (2012). The Challenge of Big Data Benchmarking Large-Scale Data Management [Online]. Available: http://www.ventanaresearch.com/uploadedFiles/Content/Landing_Pages/Ventana_Research_Big_Data_Benchmark_Research_Presentation.pdf
[4] Y. Demchenko, C. Ngo, and P. Membrey. (2015, February). Architecture Framework and Components for the Big Data Ecosystem Draft Version 0.2 [Online]. Available: http://www.uazone.org/demch/worksinprogress/sne-2013-02-techreportbdaf-draft02.pdf
[5] M. Estes, M. Fania, J, L.-Y. Lin, I. L., S. Ramsey, B. Rasaratnam, and M. Symonds. (2013). “Open Data Center Alliance Master Usage model: Information as a Service, Rev 1.0” [Online]. Available: http://www.opendatacenteralliance.org/docs/Information_as_a_Service_Master_Usage_Model_Rev1.0.pdf
[6] Research Data Alliance (RDA) (2015). “Research data sharing without barriers” [Online]. Available: https://rd-alliance.org/
[7] NIST Big Data Public Working Group Use Cases and Requirements Subgroup (2014, April 23). DRAFT NIST Big Data Interoperability Framework: Volume 6, Reference Architecture [Online]. Available: http://bigdatawg.nist.gov/_uploadfiles/BD_Vol6RefArchitecture_V1Draft_Pre-release.pdf
[8] NIST Big Data Public Working Group Use Cases and Requirements Subgroup (2014, April 23). DRAFT NIST Big Data Interoperability Framework: Volume 3, Use Cases and General Requirements [Online]. Available: http://bigdatawg.nist.gov/_uploadfiles/BD_Vol3UseCaseGenReqs_V1Draft_Pre-release.pdf
[9] O. Levin. (2013, July 1). Big Data Ecosystem Reference Architecture (Microsoft) [Online]. Available: http://bigdatawg.nist.gov/_uploadfiles/M0015_v1_1596737703.docx
[10] J. Bloom, M. Rallapalli, B. Rosen, and H. Schlenker (2012). Smarter analytics: Making better decisions faster with IBM business analytics and optimization solutions [Online]. Available: http://www.redbooks.ibm.com/redpapers/pdfs/redp4886.pdf
[11] Y. Demchenko. (2014). Big Data and Data Intensive (Science) Technologies, System and Network Engineering, UvA [Online]. Available: https://www.os3.nl/_media/20142015/courses/es/uva2014sne-bigdata-overview-v01.pdf
[12] H. Hu, Y.-G. Wen, T.-S. Chua, and X.-L. Li, “Toward scalable systems for big data analytics: A technology tutorial”, IEEE Access, vol. 2, pp. 652-687, 2014.
[13] V. Beal (2014). Big data analytics [Online]. Available: http://www.webopedia.com/TERM/B/big_data_analytics.html
[14] D. J. Power, “Using ‘big data’ for analytics and decision support”, Journal of Decision Systems, vol. 23, issue 2, pp. 222-228, March 2014.
[15] K. Lisa (2012). Advancing Analytics [Online]. Available: http://meetings2.informs.org/analytics2013/Advancing%20Analytics_LKart_INFORMS%20Exec%20Forum_April%202013_final.pdf
[16] M. Rouse (2014). Big data analytics [Online]. Available: http://searchbusinessanalytics.techtarget.com/definition/big-dataanalytics
[17] G. Polzer. (2012, September 25). Big Data Use-Cases Across Industry [Online]. Available: http://www.slideshare.net/SwissHUG/big-datausecases-across-industries-georg-polzer-teralytics
[18] P. Baumann. (2014, July 15). A Big Picture for Big Data [Online]. Available: http://europe.foss4g.org/2014/slides/2014-07-15_FOSS4GE_big-picture.pdf
[19] Usability.gov. (2015). Use cases [Online]. Available: http://www.usability.gov/how-to-and-tools/methods/use-cases.html
[20] P. Kulkarni (2013, March 27). Determining big data strategy: analyzing use cases and data pattern [Online]. Available: http://blog.harbingersystems.com/2013/03/determining-big-data-strategy-analyzing-usecases-and-data-pattern/