Turning information and data quality into sustainable business value
White Paper
Boris Otto, Dimitrios Gizanis, Hubert Österle, Gerd Danner
© 2013 – Business Engineering Institute St. Gallen AG. All rights reserved. Reproductions in whole or part prohibited except by written permission. E-mail requests or feedback to
[email protected]. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
Preface
Driven by a number of developments in society, the economy, and technology, corporate data management is steadily gaining importance. Examples of these developments are the management and use of large data volumes, the growing business relevance of consumer data due to the success of social networks, and the proliferation of mobile devices. In parallel, powerful technologies such as in-memory computing are emerging, which make it possible to exploit these developments. It is clear that business opportunities resulting from these innovations can only be leveraged with data and information of high quality. Companies across all industries must treat their data as an intangible asset, i.e. maintain its value and ensure its quality, instead of just “firefighting” data defects which lead to poor business process performance.
This white paper aims at giving guidance to companies undertaking this endeavor. It brings together the expertise of more than six years of work in the Competence Center Corporate Data Quality at the University of St. Gallen and SAP’s comprehensive solution portfolio in the field of Information Management as well as SAP’s breakthrough in-memory technology SAP HANA.
The white paper provides tangible benefits for companies. First, it gives a sound overview of corporate data quality management capabilities and SAP solutions available in this field. Second, it indicates which SAP solutions are available for supporting individual data management functions, such as data governance (e.g. SAP Master Data Governance), data architecture design (e.g. SAP Sybase PowerDesigner), or data quality measurement (e.g. SAP Information Steward). Third, it forms a reference of best corporate data management practices as it channels the knowledge of more than twenty partner companies of the Competence Center Corporate Data Quality and hundreds of SAP customers. Fourth, it provides an outlook on how SAP’s in-memory technology SAP HANA can drive significant business value regarding data management.
I would like to take the opportunity to thank the mixed team of authors from the University of St. Gallen, Business Engineering Institute St. Gallen (BEI), and SAP for their commitment and effort to make this happen.
Walldorf, February 2013
Gerhard Oswald
Management Summary
Data and information [1] of high quality is not just a “hygiene factor” for business, but has turned into an asset for competitive advantage. In line with this, data must be carefully managed, thoughtfully governed, strategically used, and sensibly controlled. Excellent organizations recognize the importance of timely, accurate, and reliable data and accordingly treat data as an asset the same way they treat all other corporate assets (such as employees, patents, or manufacturing equipment). The opposite, however, is also true: enterprises using only ad hoc data management practices find that important information gets locked in silos, reports are untrustworthy or practically useless, and vital processes depending on data often run incorrectly. [2]
Today’s companies are establishing enterprise-wide data quality management as a corporate function in order to ensure smooth business operations provisioned with the right data at the right time at a sufficient quality level. To support enterprises in their efforts, the Framework for Corporate Data Quality Management (CDQM) describes structures and activities that need to be built up and implemented for efficient and effective management of enterprise-wide data. The Framework has been published as a standard for master data and data quality management by the Competence Center Corporate Data Quality (CC CDQ) of the University of St. Gallen and the European Foundation for Quality Management (EFQM, see http://www.efqm.org). The Framework focuses on raising awareness of the topic and on giving guidance for establishing CDQM in organizations. What the Framework does not do, however, is provide guidelines or recommendations as to how corporate data quality management is supposed to be implemented from a technical point of view.
[1] When data is used within a certain context or when data is processed, it turns into information. Although the terms ‘data’ and ‘information’ are clearly distinguished in theory, the white paper uses the terms as synonyms.
[2] This white paper focuses on the quality management of data which is used across the enterprise. SAP Solutions for Information Management is a broader approach and covers the management of both structured data (such as master data) and unstructured data (such as text documents or drawings). The white paper therefore considers only a subset of SAP’s Information Management portfolio. The entire set of SAP’s solutions for Information Management is described in the book Enterprise Information Management with SAP (Gatling et al., Galileo Press, 2012).
This white paper aims at filling this gap, as it describes how the Framework for CDQM can be implemented using solutions and products which are part of SAP Solutions for Information Management. The white paper addresses both experienced practitioners (who need to expand their skills regarding SAP’s Information Management domain) and practitioners who are new to managing, governing, and optimizing the use of data that has an impact on enterprise operations. This white paper can be used in several ways:
as a reference regarding practices and methods for establishing corporate-wide data quality management,
as a guide to quickly identify specific products of SAP’s Information Management portfolio and how these products support the implementation of the Framework for CDQM,
as a reference regarding a common terminology to be used by business and IT professionals.
Table of Contents
1. The Business Perspective on Data Quality
1.1 Data Quality as a Success Factor for Business
1.2 Challenges
1.3 Typical Situation in Companies
2. The Framework for Corporate Data Quality Management (CDQM)
2.1 Strategy
2.2 Controlling
2.3 Organization & People
2.4 Processes & Methods
2.5 Data Architecture
2.6 Applications
3. Implementing the Framework for CDQM with SAP Solutions for Information Management
3.1 Overview of SAP Solutions for Information Management
3.2 Designing Corporate Data Quality Management
3.2.1 CDQM Assessment & Strategy Workshop
3.2.2 CDQM Benchmarking
3.3 Monitoring Corporate Data
3.4 Supporting and Implementing Data Governance and Data Workflows
3.5 Implementing and Controlling the Data Lifecycle
3.6 Designing and Modelling the Data Architecture
3.6.1 SAP Sybase PowerDesigner
3.6.2 SAP NetWeaver MDM
3.6.3 SAP Data Services
3.7 Managing the Application Landscape
3.7.1 SAP HANA
3.7.2 SAP System Landscape Optimization
4. Summary
4.1 Toolbox for Establishing Corporate Data Quality Management
4.2 Resources
4.3 Contact Persons
5. List of Abbreviations
1. The Business Perspective on Data Quality
1.1 Data Quality as a Success Factor for Business
Enterprises use data in their information systems to support business processes. Although this is not new, data quality management is currently receiving a lot of attention because organizations need to respond to a number of business drivers for which high-quality data is a critical prerequisite. Several business drivers that require high-quality corporate data [1] are depicted in Figure 1 and described below.
Risk Management & Compliance. Many industries are increasingly affected by legal regulation. This development is very likely to become even stronger, as companies need to meet more and more legal and regulatory provisions. Regulatory requirements result in demands for enterprise-wide, standardized management of business data.
Integrated Customer Management. Organizations whose added value is characterized by a high proportion of services and short product life cycles need to be able to have all information related to their customers available by the push of a button (e.g. contracts, pricing conditions, service inquiries, product information). Usually this set of data has to be retrieved from various business units, divisions, departments, branches, and information systems within the organization, making it difficult to get a consistent picture of each individual customer.
Business Process Integration, Automation & Standardization. Business process integration, standardization, and automation allow organizations to benefit from economies of scale and at the same time reduce the complexity of their business processes. To be able to do so, there has to be a common understanding of business information entities used in all business areas, since business process standardization cannot be achieved if, for example, corporate data about suppliers and materials is defined, produced, and used in different ways.
Reporting. When corporate data needs to be consolidated from different business units, divisions, departments, or branches of a company, it must be used consistently and it needs to be up to date. As corporate data is used in all company areas, these requirements must be met whenever and wherever corporate-wide reporting takes place (in procurement, sales and distribution, or financial accounting and controlling, for example).
[1] By definition, corporate data is data that is used by more than one business division, unit, or department.
IT Consolidation. Regarding their IT departments, many companies try to live up to the maxim “Do more with less”. They want to reduce IT expenses, either in absolute numbers or in relation to turnover. The biggest cost drivers are operating costs, which is why companies tend to consolidate their applications and infrastructure systems. However, because application system landscapes have grown over several decades, many companies do not know which systems are responsible for which master data, and they do not know how master data flows between the systems. Without sufficient transparency of the master data architecture, companies cannot make sound decisions as to which systems accommodating master data can be eliminated or consolidated. Data migration in particular is often underestimated and not adequately considered in projects, leading to increased costs and delays.
All these strategic business drivers have in common that their requirements on data quality affect the enterprise as a whole and cannot be met by each business division, unit, or department alone (see Figure 1).
Figure 1 – Strategic (cross-divisional) requirements demanding high-quality data
1.2 Challenges
Enterprises are facing a number of challenges that make it difficult to establish corporate-wide data quality management. The following list provides examples of these challenges:
Enterprise Size. Large enterprises, particularly if they operate on a global level, have complex organizational structures. Stakeholders in such enterprises need to develop a common understanding of data objects and agree on common data quality goals. But the size of such enterprises and the complexity of their structures often impede transparency regarding the creation and use of data.
“Big Data”. Data volumes are steadily growing. “Big data” usually includes large and complex data sets with sizes beyond the capability of commonly used software tools to capture, consolidate, manage and process data within a tolerable time frame.
Division of Labor. Employees who are responsible for creating corporate data typically are not users of this data at the same time. Data operators often do not know what happens with "their" data and for what purpose it is needed. They can hardly estimate the economic significance of the data for subsequent process steps. This segregation between data creation and data usage increases complexity and bears the risk of insufficient awareness regarding the quality of data at the point of data creation.
Constant Change. Many companies are in a process of constant transformation. Especially mergers and acquisitions as well as restructuring activities (such as divestments, for example) hinder the development of harmonized master data models and definitions and of harmonized enterprise-wide master data creation and maintenance processes.
1.3 Typical Situation in Companies
Despite the importance of data quality with regard to responding to the business drivers and reacting to the challenges listed above, many companies manage data quality entirely in a reactive mode. If this is the case, the importance of data quality often is not recognized until process errors occur, data migration projects fail, or reports turn out to be wrong. In such situations, “firefighting” initiatives are often launched to fix urgent data quality issues (these situations are illustrated by the stars in Figure 2). This ad hoc approach prevents companies from achieving high and sustainable levels of data quality. Reliable risk management is hardly possible in such cases.
Figure 2 – Reactive mode of data quality management (chart of data quality over time across Project 1, Project 2, and Project 3)
Therefore, many companies are reorganizing their data quality management toward a more preventive approach. It is widely accepted today that CDQM is not merely a set of software systems, but a permanent corporate function. In order to ensure close cross-functional and cross-regional cooperation, enterprises need an overarching CDQM strategy and appropriate solutions. The next section structures and explains the organizational transformation that is required to move the enterprise from an "as-is" to a "to-be" state with respect to excellent data quality management.
Key Takeaways
Organizations need to respond to business drivers for which high-quality data is a critical prerequisite. In line with this, data must be carefully managed, thoughtfully governed, strategically used, and sensibly controlled.
Many companies have recognized the need to reorganize their data management practices in order to improve data quality in a sustainable fashion.
To support enterprises in their efforts, the Framework for CDQM provides structures and activities that need to be built up and implemented for efficient and effective management of corporate data.
CDQM is not just about software systems. It is a corporate function that requires cross-functional and cross-regional cooperation as well as appropriate approaches to manage the organizational transformation.
2. The Framework for Corporate Data Quality Management (CDQM)
The Framework for CDQM [1] encompasses the whole set of activities required to improve corporate data (both through reactive and preventive measures). The value of CDQM for organizations lies not only in supporting narrow aspects of the business operations, but in guiding decision makers and experts in establishing CDQM as a corporate function. The main driver of CDQM is the business relevance of high-quality corporate data with regard to, for example, growth, improving organizational efficiency, or improving customer service (as discussed in Section 1.1). The reference model depicted in Figure 3 groups the activities of quality-oriented data management into six design areas.
Figure 3 – Framework for Corporate Data Quality Management (CDQM) [2]
(Figure: the six design areas arranged on the layers Strategy, Organization, and System, with scope ranging from local to global. One exemplary success factor is shown for each design area:)
1 Strategy: Alignment of data quality management strategy with corporate strategy
2 Controlling: Monitoring of business-critical data defects by data quality metrics
3 Organization and People: Defined and established roles and responsibilities for CDQM
4 Processes and Methods: Transparency on usage and maintenance of master data in business processes
5 Data Architecture: Unambiguous understanding of data and business objects for the whole company
6 Applications: Systematic analysis of business requirements on CDQM applications
[1] The Framework is a standard published by the Competence Center Corporate Data Quality (CC CDQ) of the University of St. Gallen and the European Foundation for Quality Management (EFQM, see http://www.efqm.org). Further results of the CC CDQ are available for download from http://cdq.iwi.unisg.ch.
[2] Figure 3 illustrates (on the right side) one exemplary success factor for each design area. All key success factors for implementing CDQM are described in more detail in the following paragraphs.
1. Strategy. Aims at evaluating a set of strategic choices around data management in order to be able to make decisions with regard to the way corporate data is to be managed and used. It includes a vision, business benefits of data management, objectives of data management, and a strategic action plan.
2. Controlling. Aims at planning, implementing, and controlling all activities for measuring, assessing, improving, and ensuring data quality as well as the performance of CDQM as an organizational capability.
3. Organization & People. Aims at designing organizational structures and defining organizational accountabilities to ensure effective and efficient management and use of data. Data stewardship must be distinguished from data ownership: Data owners specify business requirements for data quality, while data stewards do not own the data, but make sure these requirements are met along the data lifecycle.
4. Processes & Methods. Aims at managing all processes regarding the acquisition, creation, storage, maintenance, use, and deletion of data (“from cradle to grave”).
5. Data Architecture. Aims at defining and maintaining specifications that provide a common business vocabulary, express strategic data requirements, and outline high-level integrated application system landscape designs (for both storing and distributing data of enterprise-wide validity).
6. Applications. Aims at planning, implementing, and maintaining software systems designed to create, maintain, use, and archive data and ensure data quality.
The following sections provide more details about the six design areas.
2.1 Strategy
What is it about?
Excellent leaders contribute to achieving the organization’s mission and vision. They recognize the importance of high-quality corporate data as a prerequisite for responding to business drivers such as compliance with regulatory and legal directives, integrated customer management, and reporting (as discussed in Section 1.1). They encourage a culture of preventive CDQM.
Main activities
Define CDQM strategy based on the organization’s vision and mission. Meet stakeholders’ needs and expectations and align them with the business and IT strategy.
Determine, analyze, document and communicate the impact of corporate data quality on business objectives and operational excellence.
Identify and ensure a clear ownership for CDQM. Define the organizational range and functional scope. Design the organization’s structure to support the development, ownership and delivery of CDQM as a corporate function.
Define a roadmap and update priorities for CDQM projects or activities that are aligned with the business strategy and are based on cost-benefit analyses.
Business example
Unclear ownership of data, heterogeneous system interfaces, and project-specific data models were examples of governance deficiencies resulting in duplication of work, firefighting activities, and duplicated data at a leading IT service provider in Switzerland. As a response to these deficiencies, the company decided to specify company-wide decision rights and roles to facilitate consistent behavior in the use of corporate data. To achieve this goal for a corporate-wide scope, a strategic direction was required. For that reason the company developed a CDQM strategy that comprises scope, value propositions, and strategic guidelines for the design and implementation of corporate data governance. Implications of the guidelines for processes and lines of business were explained in the context of a transformation approach and a vision outlining the transition from the current situation to efficient CDQM that provides sustainable and reliable corporate data.
Key questions:
Has your organization developed a strategy for CDQM that is reviewed and updated based on the organization’s business and IT strategy?
Do you have an executive sponsor? Are leaders personally involved in ensuring that CDQM is developed, shared, implemented, continuously improved, and integrated with the overall organizational management system?
2.2 Controlling
What is it about?
Corporate data quality controlling quantitatively assesses the quality of corporate data. Interrelations between corporate data quality and business process performance are identified and monitored.
Main activities
Identify and define corporate data quality dimensions for corporate data classes according to business needs and priorities.
Specify corporate data quality metrics (e.g. scales, points of measurement, methods of measurement) based on cause-and-effect relationships between corporate data defects and business performance indicators.
Identify thresholds and targets for corporate data quality. Develop and improve methods of measurement for corporate data quality metrics. Define maintenance processes and responsibilities for corporate data quality measures.
Business example
Smooth cross-company supply chain processes at a German consumer goods manufacturer are attributable to good master data quality. Nevertheless, some distribution centers complained about insufficient accuracy of data on weights of newly launched products. These data defects resulted in recurring problems in business operations and additional costs due to repackaging and restocking when the tolerance range for pallet weights was exceeded. The company initiated a project targeted at specifying and implementing data quality metrics for monitoring these defects. Meanwhile, seven data quality metrics with a total of 32 validation rules have been specified and are continuously being monitored in order to implement an “early-warning system” for upcoming business operations problems.
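To make the idea of a data quality metric built from validation rules more concrete, the following is a minimal sketch in Python. It is an illustration only and not the manufacturer's actual rule set: the field names (`gross_weight_kg`, `net_weight_kg`), the three rules, and the 95 % target threshold are all assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ValidationRule:
    """One validation rule; `passes` returns True when a record is free of the defect."""
    name: str
    passes: Callable[[dict], bool]


def has_gross_weight(rec: dict) -> bool:
    return rec.get("gross_weight_kg") is not None


def gross_weight_positive(rec: dict) -> bool:
    weight = rec.get("gross_weight_kg")
    return weight is not None and weight > 0


def net_not_above_gross(rec: dict) -> bool:
    net, gross = rec.get("net_weight_kg"), rec.get("gross_weight_kg")
    return net is not None and gross is not None and net <= gross


# Hypothetical rules for a "product weight" metric (assumed, not from the source).
WEIGHT_RULES = [
    ValidationRule("gross weight maintained", has_gross_weight),
    ValidationRule("gross weight positive", gross_weight_positive),
    ValidationRule("net weight not above gross weight", net_not_above_gross),
]


def metric_score(records: list, rules: list) -> float:
    """Share of records that pass every rule (1.0 means no defects found)."""
    if not records:
        return 1.0
    defect_free = sum(all(rule.passes(rec) for rule in rules) for rec in records)
    return defect_free / len(records)


def failed_rules(rec: dict, rules: list) -> list:
    """Names of the rules a single record violates, for root-cause analysis."""
    return [rule.name for rule in rules if not rule.passes(rec)]


# Illustrative sample data (invented for this sketch).
products = [
    {"id": "P1", "gross_weight_kg": 12.5, "net_weight_kg": 11.0},
    {"id": "P2", "gross_weight_kg": None, "net_weight_kg": 4.0},
    {"id": "P3", "gross_weight_kg": 3.0, "net_weight_kg": 3.4},
    {"id": "P4", "gross_weight_kg": 8.0, "net_weight_kg": 7.5},
]

TARGET = 0.95  # assumed threshold; in practice agreed with the business
score = metric_score(products, WEIGHT_RULES)
early_warning = score < TARGET
```

In this sketch the metric is simply the share of defect-free records, and the early warning fires when the score drops below the agreed target. Other scales (for example, defects weighted by business impact) would follow the same pattern.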
Key questions
Are corporate data quality measures defined and managed by business and IT users?
Do you permanently monitor the quality of your data? Does your organization measure the business impact of data quality?
2.3 Organization & People
What is it about?
Excellent organizations manage, develop and release the full potential of their employees at an individual, team, and organizational level. They ensure that clearly defined roles, which are specified by clearly defined tasks and decision-making rights, are assigned to the right people. Appropriate assignment of CDQM responsibilities allows for efficient and effective performance of related projects and activities.
Main activities
Identify and update roles and responsibilities required for CDQM.
Manage recruitment, career development and succession planning for people working in the management of corporate data quality.
Ensure that people have the necessary knowledge, skills and information to establish, operate and monitor CDQM.
Communicate to people the CDQM strategy and objectives to anchor them in the organizational culture, leverage active participation of stakeholders and eliminate resistance to changes needed (by means of road shows, success stories, newsletters, for example).
Identify and continuously update decisions and activities for excellent CDQM. Define and establish reporting lines and managerial authority in order to coordinate different roles, e.g. Corporate Data Quality Committee.
Business example
Due to an internal reorganization at an engineering company that provides medical equipment, the two major business segments (medical and safety equipment) needed to consolidate their individual ERP systems into one single ERP system. Originally, the medical division had managed their product master data within the R&D department, whereas the safety division had assigned product master data management to the operations department. The management board issued a strategic directive that both departments unify their activities and service portfolios. As part of these activities, roles and responsibilities for the management of product data were defined, people were allocated to the revised or newly introduced roles and were trained as required, and implications of the organizational change were communicated throughout the company in order to support the transformation process.
Key questions
Does your organization define, manage, and improve people resources for managing and supporting CDQM?
Has your organization established people’s awareness regarding CDQM? Does your organization measure the performance of your CDQM organization?
2.4 Processes & Methods
What is it about?
By using CDQM-related processes and services, excellent organizations meet the expectations of internal customers and other stakeholders. Furthermore, the use and maintenance of corporate data in core business processes is actively managed to ensure high data quality throughout the whole data lifecycle (with increased use of “first time right” principles).
Main activities
Develop and maintain rules, structures, and standards for handling corporate data (data migration guidelines, for example).
Identify corporate data users and other stakeholders who deal with corporate data and need to manage it to meet their needs.
Model and document the life cycle of corporate data (as-is and to-be) for a better understanding of the use of corporate data within the organization.
Design, implement, monitor and improve data creation, use and maintenance processes.
Define and offer training to internal customers (business units, for example) to deepen their knowledge of how to maintain corporate data and conduct processes according to the guidelines.
Provide the technical infrastructure to ensure optimal support of the process portfolio from a technical perspective.
Business example
A global telecommunications provider was confronted with data quality problems concerning the supplier on-boarding process (incorrect data maintenance, creation of duplicates, and long lead times for initial data creation, among other things). These problems were caused by a heterogeneous application landscape and numerous business process variants on different organizational levels (both local and global), resulting in low quality of supplier master data. The company initiated a project to analyze the existing lifecycle of supplier master data, derive process improvements, and conduct a cost-benefit calculation of the new solution. Once an understanding of the data lifecycle was created and all involved stakeholders, applications, and databases were identified, guidelines for the creation and maintenance of supplier data were developed and implemented, leading to a massive improvement of the quality of the company’s supplier master data.
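One of the defects named in this example, the creation of duplicate supplier records, lends itself to a simple automated check at the point of data creation. The following Python sketch illustrates one possible approach under stated assumptions: records are compared on a normalized name plus country, and the list of legal-form suffixes, the field names, and the sample records are all invented for this illustration. It is not the method used by the telecommunications provider.

```python
import re
from collections import defaultdict

# Assumed, non-exhaustive list of legal-form suffixes ignored during matching.
LEGAL_FORMS = {"ag", "gmbh", "inc", "ltd", "llc", "sa"}


def match_key(supplier: dict) -> tuple:
    """Build a normalized comparison key from supplier name and country."""
    name = supplier["name"].lower()
    name = name.replace(".", "")                # "G.m.b.H." -> "gmbh"
    name = re.sub(r"[^a-z0-9 ]", " ", name)     # other punctuation -> space
    tokens = [t for t in name.split() if t not in LEGAL_FORMS]
    return (" ".join(tokens), supplier["country"].upper())


def find_duplicates(suppliers: list) -> list:
    """Return groups of supplier IDs that share the same match key."""
    groups = defaultdict(list)
    for supplier in suppliers:
        groups[match_key(supplier)].append(supplier["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Illustrative records as they might arrive from different local systems.
suppliers = [
    {"id": "S1", "name": "Acme Tools GmbH", "country": "de"},
    {"id": "S2", "name": "ACME Tools", "country": "DE"},
    {"id": "S3", "name": "Acme-Tools G.m.b.H.", "country": "DE"},
    {"id": "S4", "name": "Borealis Ltd.", "country": "GB"},
]

duplicate_groups = find_duplicates(suppliers)
```

Exact matching on a normalized key is deliberately conservative: it catches spelling and formatting variants but not typing errors. A real implementation would typically add fuzzy matching and a manual review step before records are merged.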
Key questions
Does your organization continuously define, manage and improve its CDQM processes?
Does your organization identify, model and actively manage the use of corporate data in conjunction with core business processes?
Does your organization measure the performance of your CDQM processes?
2.5 Data Architecture
What is it about?
The corporate data architecture is planned and managed in order to ensure data quality with regard to corporate data storage and distribution.
Main activities
Identify core business objects that are used on a company-wide level and lie within the scope of the CDQM strategy.
Develop a common business glossary by agreeing on unambiguous definitions of terms and business objects (including metadata such as attribute definitions, synonyms, homonyms, and values allowed).
Develop, maintain and publish a business data model to establish a common understanding of core business entities.
Formalize the business object model (business rules, for example) and its respective metadata using common standards (SBVR for business rules modeling, for example) and communicate it throughout the organization in order to plan and develop company-wide, consistent data models.
Close the gap between the “as-is” and the “to-be” (storage and distribution) architecture for each corporate data class (customers, vendors, etc.).
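The business glossary activity listed above (unambiguous definitions plus metadata such as attribute definitions, synonyms, homonyms, and allowed values) can be pictured as a small data structure. The Python sketch below is purely illustrative: the class names, the "Vendor" entry, its definition, and its attribute values are assumptions made for this example and do not describe any particular product or the Framework itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """An attribute definition; an empty `allowed_values` set means unrestricted."""
    name: str
    definition: str
    allowed_values: frozenset = frozenset()

    def accepts(self, value) -> bool:
        return not self.allowed_values or value in self.allowed_values


@dataclass
class BusinessObject:
    """A glossary entry for one core business object."""
    term: str
    definition: str
    synonyms: tuple = ()
    attributes: dict = field(default_factory=dict)


class Glossary:
    """Resolves terms and synonyms to exactly one definition."""

    def __init__(self):
        self._by_name = {}

    def register(self, obj: BusinessObject) -> None:
        names = [n.lower() for n in (obj.term, *obj.synonyms)]
        # Reject homonyms: a name may point to only one business object.
        for name in names:
            if name in self._by_name:
                raise ValueError(f"'{name}' is already defined (possible homonym)")
        for name in names:
            self._by_name[name] = obj

    def lookup(self, name: str) -> BusinessObject:
        return self._by_name[name.lower()]


# Hypothetical glossary entry (definition and values invented for this sketch).
glossary = Glossary()
vendor = BusinessObject(
    term="Vendor",
    definition="A legal entity from which the company purchases goods or services.",
    synonyms=("Supplier",),
    attributes={
        "status": Attribute(
            name="status",
            definition="Lifecycle state of the vendor record.",
            allowed_values=frozenset({"active", "blocked", "archived"}),
        )
    },
)
glossary.register(vendor)
```

A lookup such as `glossary.lookup("supplier")` then returns the same entry as `glossary.lookup("Vendor")`, which is the practical meaning of a shared vocabulary: different departmental terms resolve to one agreed definition.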
Business example
A national European railway network operator was faced with the challenge of ensuring that its entire infrastructure is accurately kept in an inventory database in order to be able to report to national authorities. A consistent view on infrastructure data was a major challenge because of the traditional line and staff organization and the heterogeneous IT system landscape. The majority of information creation processes require involvement of virtually all business functions, such as construction planning, schedule planning, asset management, and maintenance. As infrastructure inventory constitutes the basis for the government to decide on co-financing of the infrastructure, insufficient or poor reporting by the company can have substantial negative effects. The company initiated a project to develop and agree upon unambiguous definitions of data objects (e.g. track, tunnel) and attributes (e.g. geo location, length). The definitions are stored and maintained in a central repository, where they are accessible to all relevant stakeholders (business and IT roles).
Key questions
Has your organization developed a common understanding of a data model for the main business entities?
Does your organization keep this knowledge up to date and make it available to employees?
Are data storage, distribution, and use systematically designed, implemented, and managed?
The Framework for Corporate Data Quality Management (CDQM)
2.6 Applications
What is it about? Applications for CDQM are meant to provide functionality that supports data quality management tasks. Their requirements have to be derived from defined standards, data quality measures, business rules, and data quality rules, which are finally implemented in operational systems and applications.
Main activities
Identify areas that need to be supported by CDQM (for example, governance of data creation, proactive or reactive data cleansing, data profiling, change request tracking, data quality dashboards).
Classify, evaluate and select software applications from the vendor base.
Document and continuously maintain a time and milestone plan (roadmap).
Document and continuously close the gap between the as-is and the to-be application landscape to support CDQM activities.
Prepare, implement and continuously monitor the deployment of the application landscape.
Business example
A consumer goods manufacturer uses a central product lifecycle management (PLM) system to manage its global product data. The PLM system provides new or modified product data at regular intervals (i.e. every three hours) to five regional ERP systems and a number of other global information systems (a decision support system (BW), a planning system (APO), and a procurement system (EBP), among others). Planning, maintenance, and implementation of the system landscape are done by a shared service department. The majority of systems (PLM, ERP, BW, or APO systems) have been implemented with products from SAP. A major benefit of this single-source strategy is that complexity regarding interfaces and license contracts could be reduced. But the consolidation process in the market for master data management (MDM) systems in general, together with changes in SAP’s portfolio in particular, has raised a number of questions: Which of the MDM functions available are required by customers of the shared service (i.e. the business users)? Does the shared service offer all required functions in the quality desired? The company uses the functional reference model for CDQM software systems to answer these questions.
Key questions:
Does your organization plan, manage, and improve the application landscape to support CDQM activities?
Does your organization plan, manage and continuously monitor the gap between the “as-is” and “to-be” application landscape?
Is a rollout plan being managed to support CDQM activities?
Key Takeaways
Companies moving from ad hoc approaches toward a preventive approach for managing data quality can focus on solving strategic problems instead of fighting daily data incidents.
The Framework for CDQM provides a catalog of activities and guidelines to improve and sustain corporate data quality in organizations.
The Framework for CDQM is based on the EFQM Excellence Model, which is a well-established standard used by over 30,000 organizations.
3. Implementing the Framework for CDQM with SAP Solutions for Information Management
3.1 Overview of SAP Solutions for Information Management
SAP is best known for business processing applications such as the SAP Business Suite. Over the last several years, however, SAP has made significant investments in extending its portfolio beyond the Business Suite. One important extension is the management of data and information, which is covered by SAP Solutions for Information Management. SAP Solutions for Information Management span the full range of capabilities for managing both structured and unstructured data along the entire data lifecycle, i.e. from acquisition to use to retirement and deletion. The following sections describe SAP’s main Information Management solutions and outline typical challenges data professionals encounter, key features of the Information Management solutions with regard to these challenges, and their business benefits. They show readers how these solutions support the implementation of the six design areas of the Framework for CDQM (as discussed in Section 2).¹
3.2 Designing Corporate Data Quality Management
As a starting point to implement CDQM, a data management strategy is required to outline the objectives, define the scope, and determine the business value of enterprise-wide data management, and to develop a strategic action plan (see Section 2.1). Based on this strategy, a CDQM program is set up. BEI and SAP jointly offer two approaches for effective establishment and improvement of a CDQM program,² which are detailed in the subsequent sections:
CDQM Assessment & Strategy Workshop (see Section 3.2.1)
CDQM Benchmarking (see Section 3.2.2)
¹ SAP Solutions for Information Management covers the management of both structured data (such as master data) and unstructured data (such as text documents and drawings). The white paper focuses on the quality management of structured data and therefore considers only a subset of SAP Solutions for Information Management. For more information on SAP’s solutions for Information Management see Enterprise Information Management with SAP (Gatling et al., Galileo Press, 2012).
² Neither approach requires purchasing SAP Information Management software.
3.2.1 CDQM Assessment & Strategy Workshop
Companies start their CDQM journey at various levels of maturity. Regardless of the respective level, the CDQM Assessment and Strategy Workshop is a good starting point. It aims at collecting the baseline information to evaluate a company’s CDQM maturity, identifying major challenges, presenting best practices, and defining an individual roadmap (for more details see Table 1).
CDQM Assessment & Strategy Workshop
Typical challenges CDQM professionals encounter
“I am supposed to set up an enterprise-wide data quality management initiative and to transform the company from a reactive ‘problem solving’ organization into a structured and efficient one”.
“I need recommendations on how to achieve CDQM-related goals and how to build a business case for CDQM”.
“I need to understand the ‘big picture’ of CDQM from a strategic, organizational, and systems perspective”.
“I would like to understand the toolbox for establishing CDQM available from SAP and the Business Engineering Institute St. Gallen (BEI)”.
Key Features
One-day workshop – To give first answers, SAP and BEI offer a workshop in order to convey the “big picture” of CDQM, present best practices, and define a roadmap for CDQM based on a quick assessment.
Business Benefits
Review of the current situation and challenges.
Mapping the organization’s requirements to the Framework for CDQM.
Definition of a company-specific CDQM roadmap.
Identification of tools that help to establish CDQM.
Table 1: CDQM Assessment & Strategy Workshop at a glance
Key Takeaways: CDQM Assessment & Strategy Workshop
[Figure: design areas of the Framework for CDQM]
The CDQM Assessment & Strategy Workshop is a service offered by BEI and SAP. It covers all design areas of the Framework for CDQM.
The one-day workshop helps structure existing requirements and develops a roadmap toward company-wide CDQM.
3.2.2 CDQM Benchmarking
If companies have already established CDQM practices and want to learn how they compare to others, CDQM Benchmarking offered by BEI and SAP is the right approach to learn from peers and to improve existing practices (see Table 2).
CDQM Benchmarking
Typical challenges CDQM professionals encounter
“I want to compare my organization with external peers or with other divisions”.
“I need to establish a performance baseline prior to a business transformation or implementation project”.
“I’m supposed to review the progress after the completion of a business transformation or implementation project”.
“I want to set up a dashboard for continuous improvement”.
“I want to learn how other companies approach corporate data quality management”.
Key Features
Benchmarking studies are tailored for an executive audience – Studies focus on the most relevant metrics as well as on actionable recommendations.
Reports provide comprehensive benchmarking comparison – Comparisons focus on both best-in-industry and best-in-class.
Business Benefits
Learn how the KPIs of your organization compare with best-in-class KPIs.
Receive a detailed, organization-specific benchmarking report.
Receive recommendations for organizational change by taking benchmarking insights and best practices back to your business.
Table 2: CDQM Benchmarking at a glance
Key Takeaways: CDQM Benchmarking
[Figure: design areas of the Framework for CDQM]
CDQM Benchmarking is a service jointly offered by BEI and SAP. It covers all design areas of the Framework for CDQM.
CDQM Benchmarking helps to assess and improve the progress of CDQM initiatives.
It can be used to set up an enterprise-wide CDQM program, but also to compare the company’s CDQM situation with others and learn from them.
3.3 Monitoring Corporate Data
In accordance with the Framework for CDQM, a management system is required for planning, implementing, and controlling all activities for measuring, assessing, and improving data quality. SAP Information Steward, which is part of SAP Solutions for Information Management, enables companies to continuously measure, monitor and control corporate data quality. SAP Information Steward allows gathering measurements based on predefined metrics and demonstrating the success of CDQM programs. “Drillable” scorecards for each data domain (customers or suppliers, for example) guide users from a high-level score of “supplier is red” to the specific data records that are not passing the threshold (data defects).¹ In doing so, it provides a starting point for business and IT to improve data quality. Integrated “data lineage” and “impact analysis” capabilities allow users to find the origin of any piece of data and to analyze the impact of data defects and quality issues on downstream data flows or processes. Continuous data quality monitoring and early detection of data defects enable data managers to proactively set up tailored cleansing activities and to optimize data-related processes before data quality issues have a negative impact on core business processes. Table 3 provides detailed information about key features and benefits of SAP Information Steward.
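The scorecard idea described above (a high-level status per data domain that can be drilled down to the failing records) can be sketched in a few lines. This is an illustration only and does not use or represent the SAP Information Steward interface: `ValidationRule`, `score_domain`, the two sample rules, and the 90% threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]


@dataclass(frozen=True)
class ValidationRule:
    name: str
    check: Callable[[Record], bool]  # returns True when the record passes


def score_domain(records, rules, threshold=0.9):
    """Return the share of defect-free records, a traffic-light status,
    and the failing records with the names of the rules they violate."""
    failures = []
    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        if failed:
            failures.append((record, failed))
    score = 1.0 if not records else (len(records) - len(failures)) / len(records)
    status = "green" if score >= threshold else "red"
    return {"score": score, "status": status, "failures": failures}


# Hypothetical validation rules for the supplier domain.
SUPPLIER_RULES = [
    ValidationRule("name_present", lambda r: bool(r.get("name", "").strip())),
    ValidationRule(
        "country_is_two_letter_code",
        lambda r: len(r.get("country", "")) == 2 and r["country"].isalpha(),
    ),
]

suppliers = [
    {"id": "S1", "name": "Acme AG", "country": "CH"},
    {"id": "S2", "name": "", "country": "DE"},
    {"id": "S3", "name": "Globex", "country": "Germany"},
    {"id": "S4", "name": "Initech", "country": "US"},
]

card = score_domain(suppliers, SUPPLIER_RULES, threshold=0.9)
```

Here two of four supplier records fail a rule, so the domain scores 0.5 and is reported as "red"; `card["failures"]` is the drill-down list that tells a data steward which records to correct and why.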
¹ By using complementary Information Management products such as SAP Data Services, metrics-related information can also be moved to other reporting systems. SAP Data Services can then be used to provide a basis for cleansed data with a reduced number of duplicates (compare Section 3.6.3).
Typical challenges CDQM professionals encounter
“I want to show the benefits of enterprise-wide data management to business leaders and to establish a ‘before/after view’ of data quality metrics”.
“I need to provide continuous insight into the quality of our data”.
“I want to analyze the trustworthiness of our existing corporate data”.
“I have to visualize data quality metrics for various audiences”.
SAP Information Steward
Key Features
Data quality dashboard and monitoring – Key user interface with the ability to monitor and visualize data quality.
Data quality metrics – Ability to define validation rules to assess the quality of specific data domains (supplier data, for example) and apply them against reference data sources to continuously monitor data quality.¹
Data profiling – Identification of data defects based on specified validation rules.
Metadata business glossary – Provides a central glossary to store business terms or definitions that have been approved by business experts.³
Root cause and impact analysis – Determines the origin of data quality problems and how they impact critical business processes; a drill-down to the data sources that have failed to meet defined validation rules is possible.²
Business Benefits
Enhanced data quality via increased transparency of data quality, origins, and lineage.
Reduced manual effort for aggregating information and building one-off dashboards.
Improved data management processes using tools to support data governance.
Reduced complexity of the IT landscape by deploying a single solution for data profiling, data quality monitoring, and metadata management.
Common understanding and acceptance of business terms, which are centrally defined and managed for the whole organization.
Increased efficiency and reduced costs of data quality projects via a collaborative environment for IT and business users.
Table 3: SAP Information Steward at a glance
¹ Validation rules intended to meet requirements posed by policies, industry standards or external regulation can be reused for validation of master data while it is being created or to cleanse and standardize data.
² SAP Information Steward provides a means to define ownership and accountability of critical business data assets (compare with Section 2.3).
³ According to the Framework for CDQM this feature is linked with the “Data Architecture” (compare with Section 2.5).
Key Takeaways: SAP Information Steward
[Figure: design areas of the Framework for CDQM]
SAP Information Steward is associated with the measurement of data quality (“Controlling”). It supports root cause analysis and impact analysis, for example.
SAP Information Steward also supports metadata management and provides a central glossary to store business terms or definitions (“Data Architecture”).
3.4 Supporting and Implementing Data Governance and Data Workflows
Clearly defined roles, which are specified by clearly defined tasks and decision-making rights, need to be assigned to competent people in order to enable efficient and effective performance of data creation and maintenance activities (as discussed in Section 2.3). SAP Master Data Governance (SAP MDG) is a solution that centralizes the creation and management of corporate data for the SAP Business Suite as well as for non-SAP environments. SAP MDG enables companies to define and enforce CDQM responsibilities centrally, i.e. to determine who is allowed to maintain, validate and distribute corporate data, in order to facilitate compliance with the company’s standards, rules, and policies. SAP MDG provides centralized and role-based governance for selected master data domains (financial, supplier, customer, and material master data, for example) as well as for custom domains. It is based on core SAP business technology and consistently uses SAP’s standard data models. It provides master data maintenance capabilities allowing companies to guide and maximize the efficiency of their business processes. SAP MDG uses SAP Business Workflow to establish and execute governance processes within the application. The solution comes with prepackaged workflows for the central master data creation and maintenance process for key data domains. The workflows typically involve multiple people adding their expertise regarding master data (one person edits a material’s classification information and another adds units of measurement, for example). The workflows also include steps for approval of changed data. SAP MDG tracks all changes to and approvals of data records for subsequent audits.
Typical challenges CDQM professionals encounter
“I want to establish data governance (processes) for specific master data domains”.
“I need to implement defined data governance roles and responsibilities”.
“I want to govern the creation of master data before it enters my operational systems”.
SAP Master Data Governance
Key Features
Integrated and centralized approach – Involves multiple users who contribute their knowledge to the creation or change of master data.
Prebuilt governance scenarios – Selected master data domains are supported based on SAP standard data models; governance processes can be extended.
Rule based workflows with integrated rules management functionality – Ensure the quality of master data at the point of creation (supports “first time right” principle).
Prebuilt validation – Provides prebuilt validation rules against SAP business logic and the customer’s configuration settings as well as a generic business rule engine and integration with SAP Data Services.
Data distribution framework – Distribution of master data to SAP and non-SAP applications.
Mass maintenance – Supports the processing of multiple selected objects of the same entity type in one step (mass creation and mass changes).
Native integration – Integration with SAP Data Services and SAP Information Steward ensures data remediation, duplicate prevention, and data enrichment.
Interactivity and usability – Provides an intuitive interface to give business users role-based access to the information they need when they need it (for creating, changing, and approving master data, for example); different user interfaces can be designed with a user interface generator tool.
Tracking – All changes to and approvals of master data are documented.
Business Benefits
Ready for use in SAP applications through ready-to-run governance processes for specific master data domains.
Enables compliance with enterprise standards, rules, and policies; prebuilt validation against SAP business logic and customers’ configuration settings.
Facilitates compliance through collaborative and role-based information management.
Table 4: SAP Master Data Governance at a glance
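The governance pattern summarized in Table 4 (validation at the point of creation, several contributors, a separate approval step, and a tracked history) can be sketched generically. The example below is not SAP MDG or SAP Business Workflow code: `ChangeRequest`, `MATERIAL_RULES`, the status names, and the rule that a requester may not approve their own request are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical creation rules: (check function, error message) pairs.
MATERIAL_RULES = [
    (lambda d: bool(d.get("description")), "description is required"),
    (lambda d: d.get("base_unit") in {"EA", "KG", "L"}, "unknown unit of measure"),
]


@dataclass
class ChangeRequest:
    record_id: str
    requester: str
    data: dict
    status: str = "draft"
    audit_trail: list = field(default_factory=list)

    def submit(self, rules):
        """Validate before the data can move on ("first time right")."""
        errors = [message for check, message in rules if not check(self.data)]
        if errors:
            self.audit_trail.append((self.requester, "validation failed", errors))
            return errors
        self.status = "pending approval"
        self.audit_trail.append((self.requester, "submitted", []))
        return []

    def approve(self, approver):
        if self.status != "pending approval":
            raise ValueError("only submitted requests can be approved")
        if approver == self.requester:
            raise PermissionError("requester must not approve their own change")
        self.status = "approved"
        self.audit_trail.append((approver, "approved", []))


request = ChangeRequest("MAT-100", "alice", {"description": "Steel bolt M8"})
first_attempt = request.submit(MATERIAL_RULES)   # unit of measure still missing
request.data["base_unit"] = "EA"                 # a second contributor adds it
second_attempt = request.submit(MATERIAL_RULES)
request.approve("bob")
```

Only an approved request would be distributed to operational systems in such a design; the `audit_trail` list plays the role of the change and approval documentation needed for later audits.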
SAP MDG can also be triggered by SAP Information Steward using the workflow capability. This is beneficial when a user drills into failed data in SAP Information Steward (see Section 3.3) and wants to trigger a prompt request for data correction. In addition, SAP MDG integrates with SAP Data Services to inform the user about duplicate data sets (see Section 3.6.3) and uses the enrichment features of SAP Data Services (to support users when entering correct addresses of business partners, for example). This, together with the ability to distribute master data to the relevant business systems within the landscape, can replace the often error-prone process of manually maintaining master data in multiple systems.
Key Takeaways: SAP Master Data Governance
[Figure: design areas of the Framework for CDQM]
SAP Master Data Governance (SAP MDG) enables efficient and effective data creation and maintenance activities based on clearly defined roles, before data enters operational systems.
It supports the “Organization and People” and the “Processes and Methods” design areas within the Framework for CDQM.
SAP MDG provides standard data models for selected master data domains (“Data Architecture”).
3.5 Implementing and Controlling the Data Lifecycle
Data, like any other asset of an enterprise, has a lifecycle that needs to be managed: Data is created and updated in databases, repositories, or software solutions, archived, and finally destroyed. An effective data lifecycle management approach is an essential part of a CDQM strategy (see Section 2.1). It specifies steps to define, document and introduce ways to better manage the data that exists across the organization. SAP NetWeaver Information Lifecycle Management (SAP NW ILM) comprises the policies, processes, practices, and tools to align the business value of information with the most appropriate and cost-effective IT infrastructure from the time information is created to its final destruction. It includes knowing and categorizing data, defining policies that govern what to do with the data, and setting up a system in such a way that these policies can be applied to corporate data.
SAP NW ILM has evolved from a simple data archiving solution to a comprehensive solution for determining how long and in what ways data should be kept and used. It supports the decommissioning of both SAP and non-SAP solutions (more details are listed in Table 5).
SAP NetWeaver Information Lifecycle Management
Typical challenges CDQM professionals encounter
“I need to support the management of the entire data lifecycle across the organization and automated destruction of data after its expiration date”.
“I need to support legal requirements and policies that are automatically and systematically applied to certain data domains”.
“I need to place legal holds on data to ensure it is not deleted, in case of a pending lawsuit, for example”.
“I need to shut down a legacy system and keep the related data stored”.
Key Features
SAP Data Archiving¹ – Moves transactional and master data that is no longer required in everyday business from the database into long-term, less expensive storage.
Retention Management – Provides retention policy management functions that support the complete information lifecycle from creation to retention to destruction; allows entering different rules and policies reflecting various criteria, including where data is stored, the duration of data retention, or when data can be destroyed (based on an expiration date); policies can be applied to structured and unstructured data.
System decommissioning – Provides a complete approach to shutting down legacy systems and bringing the data from both SAP and non-SAP applications into a central SAP NetWeaver ILM retention warehouse, where it is securely stored with the relevant expiration dates and can be viewed with report options.
Business Benefits
Enables automation – At the end of the data lifecycle an automated destruction function permanently destroys the data – in the productive SAP applications as well as in the archive – when the expiration date has been reached.
Table 5: SAP NetWeaver Information Lifecycle Management at a glance
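The retention logic in Table 5 (policies per data domain, an expiration date, legal holds that block deletion, and automated destruction once the date is reached) can be expressed as a simple selection rule. The following sketch is not SAP NW ILM code: `RetentionPolicy`, `ArchivedRecord`, the ten-year period, and the decision to skip records without a policy are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    data_domain: str
    retention_years: int


@dataclass
class ArchivedRecord:
    record_id: str
    data_domain: str
    created_on: date
    legal_hold: bool = False


def expiration_date(record, policy):
    target_year = record.created_on.year + policy.retention_years
    try:
        return record.created_on.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return record.created_on.replace(year=target_year, day=28)


def records_due_for_destruction(records, policies, today):
    """Select expired records; legal holds and missing policies block deletion."""
    by_domain = {policy.data_domain: policy for policy in policies}
    due = []
    for record in records:
        policy = by_domain.get(record.data_domain)
        if policy is None or record.legal_hold:
            continue
        if expiration_date(record, policy) <= today:
            due.append(record.record_id)
    return due


policies = [RetentionPolicy("invoice", retention_years=10)]
records = [
    ArchivedRecord("INV-1", "invoice", date(2000, 1, 15)),
    ArchivedRecord("INV-2", "invoice", date(2000, 3, 1), legal_hold=True),
    ArchivedRecord("INV-3", "invoice", date(2012, 6, 30)),
    ArchivedRecord("HR-1", "personnel_file", date(1995, 5, 5)),
]
due = records_due_for_destruction(records, policies, today=date(2013, 1, 1))
```

In this example only `INV-1` is selected: `INV-2` is under a legal hold, `INV-3` has not expired, and `HR-1` belongs to a domain without a defined policy, which the sketch deliberately treats as "do not delete".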
¹ Not to be confused with SAP Archiving by OpenText, which provides secure long-term storage of archived data (including unstructured data, such as scanned invoices).
Key Takeaways: SAP NetWeaver Information Lifecycle Management
[Figure: design areas of the Framework for CDQM]
SAP NW Information Lifecycle Management supports the complete data lifecycle from creation to retention to destruction.
It supports the “Processes and Methods” design area within the Framework for CDQM.
3.6 Designing and Modelling the Data Architecture
Following the Framework for CDQM, it is crucial to plan and manage the data architecture in order to ensure enterprise-wide data quality with regard to data usage, data modeling, data integration, data storage, and data distribution. Moreover, a common understanding of a data model for the critical business entities is a fundamental prerequisite for all other aspects of reliable enterprise-wide data quality management (defining data governance, data workflows, business rules etc.). The following main solutions from SAP support the implementation of the data architecture¹ (according to Section 2.5):
SAP Sybase PowerDesigner (see Section 3.6.1)
SAP NetWeaver MDM (see Section 3.6.2)
SAP Data Services (see Section 3.6.3)
3.6.1 SAP Sybase PowerDesigner
SAP Sybase PowerDesigner is an industry-leading data modeling and architecture tool and a core component of SAP Solutions for Information Management. It supports the inclusion of the data architecture into broader architecture environments, like “Information Architecture” and “Enterprise Architecture”. Using SAP Sybase PowerDesigner’s integrated models, enterprises are able to build a blueprint of their enterprise that ensures connections between business goals, applications, processes, data, and systems. The tool’s models are fully integrated and remain synchronized using the “Link & Sync” technology. SAP Sybase PowerDesigner’s scalable “Enterprise Repository” offers role-based security capabilities and version control to support metadata management and teamwork environments. It provides robust reverse-engineering and forward-engineering for the leading database systems, and it allows cross-functional teams to easily visualize, analyze and manipulate metadata for effective database design and data architecture implementation. Several modeling techniques, such as logical data modeling, data warehouse modeling, physical data modeling, or XML modeling, are supported. More details are listed in Table 6.
¹ SAP Information Steward also covers some specific aspects of the data architecture. For example, Metapedia, a component of SAP Information Steward, helps to bridge the gap between IT and business users by providing a common understanding of which information is available where in the company (as discussed in Section 3.3).
Typical challenges CDQM professionals encounter
“There is no clear understanding of core business terms and entities in my organization. My company needs a single definition of data and how that data is being used across the organization”.
“I need to understand how data is created, maintained, distributed and consumed across my enterprise for both packaged and custom built environments”.
“We need to build a strong foundation for our Business Intelligence platform by integrating all data sources into a common architecture”.
“I need a tool to support data model analysis and design activities for both IT and business users”.
“The impact of changes made to the corporate data model is not transparent in my organization”.
SAP Sybase PowerDesigner
Key Features
Support of several modeling techniques – Supports, for example, conceptual, logical, physical, and data warehouse data modeling as well as application and business process modeling.
Business and Data Alignment – Allows the organization to align processes and information for a holistic view of the enterprise.
Data Consistency – Provides the ability to understand where and how data is being used throughout the enterprise from concept to implementation.
Enterprise Glossary – Defines business terms and categories, complete with aliases, to fully align data models with the business language; enforces management of naming conventions throughout all models; supports consistent naming standards, business language alignment, and data stewardship as well as governance efforts.
Impact Analysis – Models are fully integrated using the “Link & Sync” technology; this integration ties all model types together for complete enterprise-wide or project-wide impact analysis; impact analysis streamlines communication and collaboration to increase the entire organization’s responsiveness to change.
Business Benefits
Reduces database creation, maintenance, and re-engineering efforts due to the model driven approach (brings impact analysis and design time change management together with formal database design techniques).
Improves team productivity with a single metadata repository for all modeling types; all model types are integrated providing complete enterprise-wide or project-wide impact analysis.
Provides a customizable interface to make common tasks easier while empowering advanced users with rapid access to all features.
Impact analysis streamlines communication and collaboration to increase the entire organization’s responsiveness to change.
Table 6: SAP Sybase PowerDesigner at a glance
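Impact analysis across linked models, as listed in Table 6, boils down to following dependency links from a changed object to everything that consumes it, directly or indirectly. The sketch below shows that idea as a breadth-first traversal. It is not PowerDesigner functionality or its “Link & Sync” mechanism: `impacted_objects`, the `MODEL_LINKS` mapping, and the object names are hypothetical.

```python
from collections import deque


def impacted_objects(links, changed):
    """Return every object that depends, directly or indirectly, on `changed`.

    `links` maps an object to the objects that directly consume it.
    Results are listed in breadth-first order (closest dependents first).
    """
    seen = {changed}
    queue = deque([changed])
    impacted = []
    while queue:
        current = queue.popleft()
        for consumer in links.get(current, []):
            if consumer not in seen:  # also guards against cyclic links
                seen.add(consumer)
                impacted.append(consumer)
                queue.append(consumer)
    return impacted


# Hypothetical links between conceptual, physical, and reporting objects.
MODEL_LINKS = {
    "entity:Customer": ["table:CUSTOMER"],
    "table:CUSTOMER": ["view:CUSTOMER_REVENUE", "interface:CRM_EXPORT"],
    "view:CUSTOMER_REVENUE": ["report:SalesDashboard"],
}

affected = impacted_objects(MODEL_LINKS, "entity:Customer")
```

A change to the conceptual `Customer` entity therefore surfaces the table, the view, the interface, and the report that would need to be reviewed before the change is rolled out.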
Key Takeaways: SAP Sybase PowerDesigner
[Figure: design areas of the Framework for CDQM]
SAP Sybase PowerDesigner supports companies in establishing a common understanding of core business terms and entities.
Its integrated models allow organizations to make a blueprint of the entire enterprise and ensure connections between business goals, applications, data, and systems. The connections remain synchronized throughout all levels of iterations and changes.
SAP Sybase PowerDesigner supports the “Data Architecture” design area within the Framework for CDQM.
3.6.2 SAP NetWeaver MDM
SAP NetWeaver Master Data Management (SAP NW MDM) is a key component alongside SAP MDG¹ for managing and governing master data. The focus of SAP NW MDM is consolidating and harmonizing master data in application- and system-agnostic contexts, for example to ensure trustworthy data for enterprise reporting in analytical scenarios. Hence, SAP NW MDM is used as a platform to consolidate, cleanse and synchronize a single version of the truth for master data (“golden records”) within a heterogeneous application landscape (see Table 7). SAP NW MDM supports all domains for master data consolidation, harmonization, and stewardship, covering several business initiatives such as mergers and acquisitions.
¹ SAP MDG provides pre-built scenarios for central master data creation, maintenance, and governance for selected master data domains running in the SAP Business Suite (see Section 3.4).
Typical challenges CDQM professionals encounter
“My company requires a consolidated single source of truth for all globally relevant master data”.
“We need to consolidate master data for specific domains across multiple systems, harmonize attributes on a global level, and provide them back to local units”.
SAP NetWeaver MDM
Key Features
Master data consolidation – SAP NW MDM supports data consolidation from disparate SAP and non-SAP systems; it can be used as a hub for integration and consolidation of globally relevant master data; once data is consolidated, it can be searched across linked systems in order to identify identical or similar objects across systems and provide key mapping for reliable company-wide analytics and reporting.
Master data harmonization – Following master data consolidation, SAP NW MDM harmonizes the data to meet global quality standards; it ensures high-quality master data by distributing harmonized data that is globally relevant using distribution mechanisms; subscribing applications can enrich master data with locally relevant information.
“Import wizard” for importing master data – Loads structures and data from other data sources while resolving and maintaining all dependencies during the process; a built-in scheduler allows batch execution of imports and deployments at specified times.
Workflow support – Ensures consistent approval processes with the right people involved at each step; notifications and escalations keep the process moving through to completion.
Stewardship control – Provides mechanisms for rules management, change management, user and role administration as well as security.
Tracking – Every change made to the data model and to master data is logged; this makes it easy to audit and report on changes.
Authoring area – Data creation and changes are performed in an authoring area; after workflow approvals are met, the new or changed data is moved to the release area; this protects “production data” from being changed until the change is fully approved.
Business Benefits
Single source of truth for business critical corporate data.
Systems can access master data using, for example, web services; this makes the data easily available to subscribing systems.
Supports multiple data domains on a single platform. Easily browse, maintain, search, filter, sort, print, email, edit, copy, delete and restore master data (master data resides outside of SAP Business Suite).
Table 7: SAP NetWeaver MDM at a glance
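The consolidation and key mapping idea listed in Table 7 can be illustrated with a minimal Python sketch. It is not SAP NW MDM code: the record layout, the `normalize` match rule, and the `G0001`-style global identifiers are assumptions made only for this illustration. Records from several systems that share a normalized name are grouped under one global key, which then points back to the local keys in each system.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Reduce a name to a simple match key (lower case, alphanumerics only)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def consolidate(records):
    """Group records from several systems and build a key mapping.

    Each record is a (system, local_id, name) tuple. Records whose
    normalized names are equal are treated as the same business object
    (a deliberately naive, hypothetical match rule).
    """
    groups = defaultdict(list)
    for system, local_id, name in records:
        groups[normalize(name)].append((system, local_id))

    key_mapping = {}
    for number, match_key in enumerate(sorted(groups), start=1):
        global_id = f"G{number:04d}"  # hypothetical global identifier format
        key_mapping[global_id] = sorted(groups[match_key])
    return key_mapping


records = [
    ("ERP", "100234", "ACME Corp."),
    ("CRM", "C-77", "Acme Corp"),
    ("LEGACY", "9", "Globex AG"),
]
mapping = consolidate(records)
```

With such a mapping, a report can aggregate figures for one business partner across all source systems by looking up the local keys behind a single global key.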
Key Takeaways: SAP NetWeaver Master Data Management
SAP NW MDM consolidates and harmonizes master data across applications and systems.
It supports several design areas of the Framework for CDQM.
[Figure: Framework for CDQM with the layers Strategy, Organization, and System and the design areas Strategy, Controlling, Organization and People, Processes and Methods, Data Architecture, and Applications]
3.6.3 SAP Data Services
SAP Data Services provides core data management capabilities that form SAP’s data foundation. The product combines data integration and data quality capabilities with text data processing and text analysis. All capabilities are available for batch processing of large data sets or can be used in real time via web service calls. SAP Data Services extracts, transforms and loads data from one or more source systems into one or more target systems. Hence, SAP Data Services allows integrating, transforming, improving, and delivering trusted data to critical business processes for both SAP and non-SAP systems. SAP Data Services forms the foundation for many data-related solution scenarios that require moving, enriching, transforming or cleansing data (for example, data migration, data synchronization, loading data into data warehouses, data dashboard provisioning, or query reporting). It can handle every type of data: structured (such as master data), semi-structured (such as e-mails, blogs, or posts), and unstructured (such as pictures, scanned documents, or videos). More details are listed in Table 8.
Typical challenges CDQM professionals encounter
“I need an all-in-one solution to transform, improve, monitor and deliver corporate data”.
“I need to develop a common understanding of a data model for the main business entities and make this knowledge accessible for all relevant stakeholders”.
“I want to load data of high quality into SAP HANA”.
SAP Data Services
Key Features
Support of many connectivity options – Accesses and integrates SAP and non-SAP sources and targets for both structured and unstructured data.
Integration capabilities – Provides seamless integration with SAP and non-SAP applications and can be used as a service (for example, to check for duplicate addresses in SAP CRM); also supports the loading of non-SAP data into SAP HANA or SAP NetWeaver BW.
Data Validation – Supports the definition of extraction, validation, and cleansing rules for data loads (in SAP and non-SAP).
Data Cleansing – Cleansing of records related to a variety of domains, such as business partners, materials, or services, that need parsing and standardization; correcting of data can be performed via reference data and cleansing rules; duplicates identified can be merged into one consolidated, best record.
Data Profiling – Determines the overall quality of data and finds data anomalies in order to discover problems (see footnote 1).
Native text data processing – Unlocks the meaning from unstructured text data for increased business insight.
Intuitive business user interfaces – Guides through the process of standardizing, correcting, and matching data to reduce duplicates and identify relationships.
Business Benefits
Increased productivity and elimination of rework costs by using a single data quality and data integration application.
Enhanced business process efficiency with access to one version of the truth across the entire enterprise.
Table 8: SAP Data Services at a glance
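To make the data profiling feature in Table 8 more concrete, the following Python sketch counts missing and distinct values per column. It is a generic illustration rather than SAP Data Services functionality; the function name `profile`, the sample rows, and the rule that `None` and the empty string both count as missing are assumptions.

```python
def profile(rows, columns):
    """Return per-column counts of missing and distinct values.

    A value counts as missing if it is None or an empty string
    (an assumption made for this sketch).
    """
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


rows = [
    {"id": 1, "country": "CH", "city": "St. Gallen"},
    {"id": 2, "country": "DE", "city": ""},
    {"id": 3, "country": "CH", "city": None},
]
report = profile(rows, ["id", "country", "city"])
```

A column with many missing values, or with fewer distinct values than expected for a key field, is a typical starting point for defining cleansing and validation rules.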
Footnote 1: While SAP Information Steward is the primary Information Management tool for business users doing data profiling, technical data profiling can be done directly in SAP Data Services (for data integration or data movement jobs in order to gain quick insight into embedded data sources, for example).
SAP Data Services enables connectivity to many applications and provides the capability to drive data migration projects. SAP’s solution for Data Migration (which is mainly built on SAP Data Services) is based on predefined migration content (i.e. formalized understanding of the required fields, relationships between objects, etc.) and rules. It covers six major steps:
Analyze. This step includes data profiling and getting to know the data (understanding data structures, for example) prior to migrating it to an SAP application.
Extract. Once the source data is understood, it can be extracted and placed in a staging area in SAP Data Services.
Clean. Data cleansing can range from applying simple rules (such as checking for null values) to complex business rules supporting address cleansing, duplicate checking, and so on.
Validate. The validation phase begins with value mapping, which is necessary to map and convert specific values for the new applications (based on the content provided for data migration).
Load. Once value mapping has been performed and the data quality is considered “good”, the data is loaded into the SAP system based on the application’s requirements (via SAP NetWeaver loading mechanisms such as ALE (Application Link Enabling), for example).
Reconcile. Once the data has been loaded, reconciliation is performed to check what was loaded against what was planned to be loaded.
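The six steps above can be sketched as a small Python pipeline. This is a simplified illustration under stated assumptions, not the SAP solution for Data Migration: the function names, the in-memory lists standing in for the staging area and the target system, and the `COUNTRY_MAP` value mapping are all hypothetical.

```python
COUNTRY_MAP = {"Switzerland": "CH", "Germany": "DE"}  # hypothetical value mapping


def analyze(source):
    """Analyze: count rows and rows lacking a name."""
    missing = sum(1 for row in source if not (row.get("name") or "").strip())
    return {"rows": len(source), "missing_name": missing}


def extract(source):
    """Extract: copy the source rows into a staging area."""
    return [dict(row) for row in source]


def clean(staging):
    """Clean: trim names, drop rows without a name, remove duplicates."""
    seen, cleaned = set(), []
    for row in staging:
        name = (row.get("name") or "").strip()
        key = (name.lower(), row.get("country"))
        if name and key not in seen:
            seen.add(key)
            cleaned.append({**row, "name": name})
    return cleaned


def validate(rows, value_map):
    """Validate: map source values to target values; reject unmapped ones."""
    valid, rejected = [], []
    for row in rows:
        code = value_map.get(row.get("country"))
        if code is None:
            rejected.append(row)
        else:
            valid.append({**row, "country": code})
    return valid, rejected


def load(rows, target):
    """Load: write the validated rows to the target."""
    target.extend(rows)
    return len(rows)


def reconcile(planned, target):
    """Reconcile: compare what was planned with what was loaded."""
    return {"planned": len(planned), "loaded": len(target),
            "match": len(planned) == len(target)}


source = [
    {"name": " Acme Corp ", "country": "Switzerland"},
    {"name": "acme corp", "country": "Switzerland"},
    {"name": None, "country": "Germany"},
    {"name": "Globex AG", "country": "Atlantis"},
    {"name": "Initech GmbH", "country": "Germany"},
]
profile = analyze(source)
staging = extract(source)
valid, rejected = validate(clean(staging), COUNTRY_MAP)
target = []
load(valid, target)
result = reconcile(valid, target)
```

Keeping the rejected rows separate from the valid ones, instead of silently dropping them, gives the reconciliation step a clear statement of what was planned for loading and what was held back for correction.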
Key Takeaways: SAP Data Services
SAP Data Services is a tool for extracting, transforming and loading data from one or more source systems into one or more target systems.
It combines data integration and data quality capabilities and supports several scenarios, such as data migrations.
It supports several design areas of the Framework for CDQM.
[Figure: Framework for CDQM with the layers Strategy, Organization, and System and the design areas Strategy, Controlling, Organization and People, Processes and Methods, Data Architecture, and Applications]
3.7 Managing the Application Landscape
In order to operate CDQM efficiently, applications are required that support and automate the various CDQM tasks and activities (according to Section 2.6). Moreover, requirements for operational systems and Information Management applications have to be derived (for the management of data quality requirements, for example), and suitable SAP Information Management applications have to be evaluated, selected and, finally, implemented and rolled out. In the following, SAP’s new platform (SAP HANA) as well as SAP’s System Landscape Optimization are described:
SAP HANA (see Section 3.7.1)
SAP System Landscape Optimization (see Section 3.7.2)
3.7.1 SAP HANA
SAP HANA is SAP’s new platform for real-time analytics and business applications. The platform leverages state-of-the-art in-memory computing technology that enables real-time computing by bringing together online transaction processing applications and online analytical processing applications. Combining the advances in hardware technology with in-memory computing empowers the entire business, from shop floor to boardroom, by giving business processes instantaneous access to data. SAP HANA was developed in response to SAP customers’ need to deal with high data volumes and to analyze large amounts of data quickly. It gives companies immediate insight into large volumes of operational data by placing real-time decision making in the hands of the business user. The SAP Business Suite now runs on SAP HANA, allowing customers to integrate analytics and transactions into a single in-memory platform. This enables significant performance gains (for Material Requirements Planning (MRP) runs or profitability reporting, for example) and true insight-to-action with embedded transactions, reporting, analytics, and prediction. Further, it is a foundation for a new class of applications that respond to the challenges of “big data”. SAP HANA also becomes the foundation of SAP’s Information Management portfolio, delivering enhanced value-adding capabilities (such as high-performance duplicate checks, real-time consolidation of master data, and avoiding dialog processes in batch). SAP Master Data Governance (SAP MDG) is already available on SAP HANA. With its performance and scalability, SAP HANA enables companies to run complex system landscapes, consisting of several operational systems as well as data warehouses and master data management solutions, on a single, shared in-memory platform.
This leads to a new architecture approach that – from a data management and data quality perspective – shifts the focus from “after the fact data quality corrections in batch” to “real-time data quality processing at source”.
Table 9 summarizes the key features and business benefits of SAP HANA.
Typical challenges CDQM professionals encounter
“I need to support the analysis of large amounts of data as business is happening and ensure data quality at the same time”.
“I need to get actionable insights more quickly and to decide and act more simply”.
“I need to manage the quality of data at the source of creation and in real-time”.
“I need to reduce time and effort for data aggregation”.
SAP HANA
Key Features
Multipurpose, in-memory technology – Instantly explores and analyzes all corporate data in real-time from virtually any data source (including the ability to run all enterprise applications on a single platform).
Real-time analytics – Analyzes corporate data using huge volumes of detailed information as business is happening.
Adaptable, powerful analytic models – Pre-built and adaptable analytical models on SAP’s Business Suite that expose analytic information at the speed of thought.
Extensive, source agnostic data access – Adds external data to analytic models to incorporate data from across the entire organization.
Built-in application logic – Significantly speeds up core ERP processes, such as Material Requirements Planning (MRP).
Embedded data management capabilities – Allows, for example, fuzzy search or duplicate checks against large data sets in real-time.
Business Benefits
Supports organizations in becoming a data-driven business by allowing them to collect, consolidate and consume real-time data faster.
Helps organizations become an innovation-driven business by allowing them to rethink business processes as and when needed and invent new, smarter business models.
Provides organizations with the means to become a people-driven business by giving their people actionable insights – on any device – to decide and act more simply.
Real-time data quality insight by analyzing business operations as they are happening and running operational reports inside the SAP Business Suite in real-time.
Better and quicker decisions by gaining immediate access to all relevant information.
Reduced effort for data cleansing and error detection.
Dramatically reduced hardware and maintenance costs through a flexible, cost-effective real-time approach for managing large data volumes.
Table 9: SAP HANA at a glance
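The fuzzy duplicate check mentioned among the key features in Table 9 can be illustrated in plain Python. The sketch below does not use any SAP HANA interface; it relies on `difflib.SequenceMatcher` from the Python standard library, and the function names, the sample names, and the threshold of 0.85 are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(names, threshold=0.85):
    """Return pairs of names whose similarity reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if similarity(a, b) >= threshold
    ]


names = ["Mueller AG", "Muller AG", "Globex Corporation", "Initech GmbH"]
pairs = find_duplicates(names)
```

Comparing every pair of names grows quadratically with the number of records, which is why duplicate checks against large data sets benefit from a platform designed for such volumes.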
Key Takeaways: SAP HANA
SAP HANA is a multi-purpose in-memory platform for integrated business applications, analytics, and data management.
SAP HANA delivers enhanced, volume-oriented data management capabilities to the Information Management solutions (e.g. for real-time consolidation of master data, avoiding user dialog processes in batch).
It gives companies immediate insight into large volumes of operational data, and it significantly speeds up existing business processes as it serves as a foundation for innovative business applications.
[Figure: Framework for CDQM with the layers Strategy, Organization, and System and the design areas Strategy, Controlling, Organization and People, Processes and Methods, Data Architecture, and Applications]
3.7.2 SAP System Landscape Optimization
SAP’s System Landscape Optimization services help analyze and transform the IT landscape by consolidating systems and migrating corporate data (see Table 10). This rationalization requires that a company examine its existing IT structures and invest in a lean and powerful IT foundation to support future transparent management of corporate data. Hence, SAP’s System Landscape Optimization services support the analysis, planning, and roll-out of the Information Management application landscape and its integration with the existing transactional and analytical application landscape. Depending on the project goal, SAP’s System Landscape Optimization group offers various approaches for transforming and migrating corporate data. These approaches include transferring one or more clients to a target system, merging several clients into a central solution, and transferring only a selected amount of data from a source to a defined target system. The scenarios vary in the amount of data selected for final transfer, the time and effort required, and the extent to which processes are adapted.
Typical challenges CDQM professionals encounter
“I am expected to identify and close the gap between the “as-is” and the “to-be” Information Management application landscape”.
SAP System Landscape Optimization
Key Features
Client transfer service – Build up a multi-client environment.
System client merge service – Set up a single-client environment.
Process data migration for SAP applications – Transfer selected data into a defined target environment.
Business Benefits
More transparency and insight into the consolidated Information Management application landscape.
Ability to leverage data knowledge and data consistency.
Hybrid consolidation approach – and optional upgrades and Unicode conversion approaches.
High-quality data through harmonization and consolidation efforts.
Lower administration costs thanks to a reduced number of data centers and systems.
Table 10: SAP System Landscape Optimization at a glance
Key Takeaways: SAP System Landscape Optimization
SAP’s System Landscape Optimization service supports the analysis, planning and roll-out of the Information Management application landscape.
It supports the “Applications” design area within the Framework for CDQM.
[Figure: Framework for CDQM with the layers Strategy, Organization, and System and the design areas Strategy, Controlling, Organization and People, Processes and Methods, Data Architecture, and Applications]
4. Summary
4.1 Toolbox for Establishing Corporate Data Quality Management
Establishing CDQM as a corporate function within the enterprise takes time and requires a variety of resources. The Framework for CDQM (as introduced in Section 2) supports enterprises in developing the structures and practices required for CDQM. Successful implementation of the Framework and, consequently, of efficient and effective management of corporate data requires appropriate reference models, methods, and approaches as well as “out-of-the-box” solutions and software applications (technical solutions and products which are part of SAP’s Information Management portfolio; see Section 3) and services. The Competence Center Corporate Data Quality (CC CDQ) – which SAP is a member of – works in the field of CDQM and has developed a variety of instruments for sustainable implementation of CDQM. SAP, BEI St. Gallen, and the CC CDQ jointly offer a toolbox that integrates all strategic, organizational and technical aspects of CDQM in one comprehensive framework, which enables professionals to plan, implement and operate CDQM in an efficient and sustainable way. Table 11 summarizes the key features of this toolbox.
Tools provided, by design area within the Framework for CDQM:

Strategy
Methods and reference models: Method for implementing a corporate data quality strategy
Solutions and services: CDQM Assessment & Strategy Workshop; CDQM Benchmarking

Controlling
Methods and reference models: EFQM Excellence Model for CDQM (maturity assessment); Method for specifying business-relevant data quality metrics
Solutions and services: SAP Information Steward

Organization & People
Methods and reference models: Reference model for data governance; Method for establishing data governance
Solutions and services: SAP Master Data Governance

Processes & Methods
Methods and reference models: Analysis and modeling method for integrating data quality in business process management
Solutions and services: SAP Master Data Governance; SAP NW Master Data Management; SAP Data Services; SAP NetWeaver Information Lifecycle Management

Data Architecture
Methods and reference models: Method for master data integration; Design patterns for data architecture
Solutions and services: SAP Sybase PowerDesigner; SAP NW Master Data Management; SAP Data Services (incl. SAP Solutions for Data Migration); SAP Information Steward; SAP Master Data Governance

Applications
Methods and reference models: Reference model for master data quality management software
Solutions and services: SAP HANA; SAP System Landscape Optimization; SAP Information Steward; SAP Master Data Governance; SAP NetWeaver Information Lifecycle Management; SAP Data Services; SAP NW Master Data Management

Table 11: Toolbox for establishing Corporate Data Quality Management
SAP also offers Rapid Deployment Solutions for many of these products (SAP MDG, SAP Information Steward, SAP NetWeaver MDM, SAP Business Objects, and more). A Rapid Deployment Solution includes pre-configured software and implementation services (with a defined scope and at predictable costs) addressing the most pressing business needs and laying the foundation for future expansion.
4.2 Resources
Several publications, tools, and techniques exist that can be used to support the implementation of the Framework for CDQM with SAP Solutions for Information Management. Please visit the following resources to access supplementary material (see Table 12).
Resource
Link
Business Engineering Institute St. Gallen
http://www.bei-sg.ch
CC CDQ Benchmarking Platform
https://www.benchmarking.com
CC CDQ Community at XING
http://www.xing.com/net/cdqm
Competence Center Corporate Data Quality
http://cdq.iwi.unisg.ch
SAP Business Rule Framework plus
http://scn.sap.com/docs/DOC-8824
SAP Community Network for SAP Enterprise Master Data Management
http://scn.sap.com/community/mdm
SAP Community Network for SAP Master Data Governance
http://scn.sap.com/community/mdm/master-datagovernance
SAP Community Network for SAP NetWeaver Master Data Management
http://scn.sap.com/community/mdm/netweavermdm
SAP Solutions for Information Management
http://www54.sap.com/solutions/tech/enterpriseinformation-management/software/masterdata/index.html
SAP Master Data Governance
http://www54.sap.com/solutions/tech/enterpriseinformation-management/software/master-datagovernance/index.html
SAP NetWeaver Decision Service Management
http://scn.sap.com/docs/DOC-29158
SAP Rapid Deployment Solutions
http://www.sap.com/solutions/rapiddeployment/index.epx
Table 12: SAP, BEI and CC CDQ resources
4.3 Contact Persons
Please feel free to contact us in order to receive further information on how to implement the Framework for CDQM with SAP Solutions for Information Management.
Prof. Dr. Hubert Österle University of St. Gallen Competence Center Corporate Data Quality
[email protected]
Prof. Dr. Boris Otto, Assistant Professor University of St. Gallen Competence Center Corporate Data Quality
[email protected]
Dr. Dimitrios Gizanis Business Engineering Institute St. Gallen AG Competence Center Corporate Data Quality
[email protected]
Gerd Danner SAP AG Solution Management Information Management
[email protected]
5. List of Abbreviations
ALE
Application Link Enabling
ALM
Application Lifecycle Management
APO
Advanced Planner and Optimizer
BEI
Business Engineering Institute
BPM
Business Process Management
BW
Business Warehouse
CC CDQ
Competence Center Corporate Data Quality
CRM
Customer Relationship Management
CDQM
Corporate Data Quality Management
DQM
Data Quality Management
ECM
Enterprise Content Management
ERP
Enterprise Resource Planning
ETL
Extract, Transform, Load
HANA
High Performance Analytic Appliance
ILM
Information Lifecycle Management
KPI
Key Performance Indicator
IT
Information Technology
ITIL
Information Technology Infrastructure Library
IWI-HSG
Institute of Information Management of the University of St. Gallen
MDG
Master Data Governance
MDM
Master Data Management
NW
NetWeaver
PLM
Product Lifecycle Management
SBVR
Semantics of Business Vocabulary and Rules
SLA
Service Level Agreement
Information about SAP, BEI, and IWI-HSG
SAP, Business Engineering Institute St. Gallen (BEI), and the Institute of Information Management at the University of St. Gallen (IWI-HSG) are working together in the Competence Center Corporate Data Quality (CC CDQ), in the course of which this white paper has been jointly developed by the three partners.
SAP AG
SAP is the world’s leading provider of business management software, offering applications and services that enable companies of all sizes and in more than 25 industries to become best-run businesses. SAP has more than 183,000 customers in over 120 countries. SAP Solutions for Information Management transform the way the world works by connecting people, information and businesses. They enable organizations to close the gap between business strategy and execution. For more information, visit http://www.sap.com.
Business Engineering Institute St. Gallen AG
The Business Engineering Institute St. Gallen AG is a spin-off of the University of St. Gallen and was founded in 2003 by Prof. Hubert Österle. The CDQ division within BEI combines applied research with reliable management consulting services. In close cooperation with the Institute of Information Management at the University of St. Gallen (IWI-HSG) as well as together with subject matter experts from industry and software vendors, BEI advances applied research solutions and delivers tangible results in the areas of strategies for CDQM, corporate data governance, corporate data architectures, and data quality scorecards, among others.
University of St. Gallen
The Institute of Information Management at the University of St. Gallen (IWI-HSG) is the coordinator and scientific director of the Competence Center Corporate Data Quality, which is a collaborative research program comprising renowned enterprises from various industries. The program focuses on Corporate Data Quality Management. Results are based on scientific state-of-the-art knowledge and are discussed, improved and tested in cooperation with large companies acting on a global level. The core objective and guiding principle of the program is to transfer theoretical, preliminary work and scientific research results into everyday organizational and business practices of enterprises. For more information, visit http://cdq.iwi.unisg.ch.