Total Data Quality Management and Total Information Quality Management Applied to Customer Relationship Management
Maritza M. C. Francisco (1), Solange N. Alves-Souza (2), Edit G. L. Campos (3), Luiz S. De Souza (4)
(1, 2) Escola Politécnica da Universidade de São Paulo (USP); (3) Instituto de Pesquisa Tecnológica (IPT); (4) Faculdade de Tecnologia (FATEC). São Paulo, Brasil
[email protected], [email protected], [email protected], [email protected]
ICIME 2017, October 9–11, 2017, Barcelona, Spain. © 2017 Association for Computing Machinery. ACM ISBN 978-1-4503-5337-3/17/10. https://doi.org/10.1145/3149572.3149575
ABSTRACT
Data quality (DQ) is an important issue for modern organizations, mainly for decision-making based on information, using solutions such as CRM, Business Analytics, and Big Data. In order to obtain quality data, it is necessary to implement specific methods, processes, and techniques that handle information as a product, with well-established, controlled, and managed production processes. The literature provides several data quality management methodologies that treat structured data, and few that treat semi- and non-structured data. Choosing the methodology to be adopted is one of the major issues faced by organizations when challenged to treat data quality in a systematic manner. This paper makes a comparative analysis between the TDQM (Total Data Quality Management) and TIQM (Total Information Quality Management) approaches, focusing on data quality problems in the context of a CRM (Customer Relationship Management) application. The analysis identifies the strengths and weaknesses of each methodology and suggests the most suitable one for the CRM scenario.
CCS Concepts
• Applied computing ➝ Enterprise information systems ➝ Enterprise applications • Applied computing ➝ Enterprise data management • Information systems ➝ Data management systems.
Keywords
Data quality; data quality dimensions; data quality management; data quality problems; data quality methodology.
1. INTRODUCTION
Information Technology is evolving, offering technological and computational resources that process large volumes of varied data in a quick and reliable manner, generating opportunities for managers to improve organizational strategies, promoting innovation, service improvement, market behavior predictability, risk prevention, and increasing business effectiveness [1].
The importance of data quality (DQ) projects has increased substantially, mainly in companies in which CRM (Customer Relationship Management) is used for client retention and profit increase. Such systems depend on reliable data to establish assertive contacts with customers. In this scenario, DQ becomes a precondition to guarantee the value of the data, so that its use may be effective and promote the company's competitiveness. Data quality management is gaining importance in the current corporate scenario, stimulating the creation of a data management maturity model [2] and of reference guides for strategic data management, such as DMBOK (Data Management Body of Knowledge) [3]. In both references, data is treated as a strategic resource and its quality is regarded as a key success factor for its effective use.
The specialized literature proposes different DQ methodologies [4], such as TDQM (Total Data Quality Management), TIQM (Total Information Quality Management), DWQ (Data Warehouse Quality Methodology), AIMQ (a methodology for information quality assessment), and HDQM (Heterogeneous Data Quality Management). This paper analyzes the TDQM and TIQM methodologies, which were selected for the following reasons: they are the two seminal methodologies, from which others arise; they are the two methodologies supported by most data quality tools, facilitating their application; they support the entire DQ cycle rather than a specific phase, unlike AIMQ, which is restricted to the data quality assessment phase; and they are generic, i.e., they can be implemented in any environment, unlike DWQ, which is focused on the Data Warehouse environment. HDQM [5] is further discussed in the related work section.
This paper shows the results of applying TDQM and TIQM to a CRM application, emphasizing both methods' strengths and weaknesses. In the applicability assessment of each methodology, the concern was to measure the level of difficulty, efficacy, and completeness in solving the DQ problems identified in the CRM studied. The comparative analysis was performed considering the consistent aspects of the methodologies and the needs of the CRM studied. Thus, by analyzing the methodology guidelines and the environment concerned, five aspects were evaluated:
(1) organizational structure, (2) data and metadata structure, (3) data quality assessment, (4) data re-engineering, and (5) continuous improvement. In order to evaluate each aspect, questions and metrics adapted from [6] were used.
Few papers show the applicability of the different methodologies in a real environment. In addition, instantiating a methodology is not a simple task; in general, there is a lack of information, or the available descriptions do not present enough detail to facilitate application. Therefore, this study contributes by showing the steps followed to perform each methodology, as well as the facilities and difficulties encountered in this application. Although TDQM and TIQM were conceived some time ago, they have a structure that can be used or adapted to current needs. In addition, the description of the methodologies allows using them to structure environments in which data is essential to the business.
The paper is organized as follows: Section 2 presents the main aspects of the methodologies used, the data quality dimensions, and related work; Section 3 details the application of the methodologies; Section 4 comparatively analyzes the TDQM and TIQM applications; Section 5 presents the conclusion.
2. DATA QUALITY METHODOLOGIES
2.1 TDQM
TDQM [7] adopts the perspective of information as a product, which has a defined production cycle. In this cycle, four roles are identified: Collectors (those who create or collect data and information), Custodians (those who design, develop, and maintain information systems and infrastructure), Information Quality Managers (those responsible for information quality management throughout the lifecycle), and Customers (those who use the information in their work). According to TDQM, analyzing the causes of existing problems is required to define the improvement plan. This analysis is possible only when a measure of such problems is obtained under the perspective of the quality requirements, which need to be defined, measured, analyzed, and improved. TDQM can be summarized by four basic processes, as follows:
(i) Define: aligns data quality purposes with the company's strategic purposes, identifies and analyzes the information lifecycle that characterizes its production process, identifies the quality perceived and desired by information consumers, and evaluates user satisfaction with data quality.
(ii) Measure: defines objective metrics and formulas for data quality and applies them to different data sources and points of the information lifecycle.
(iii) Analyze: identifies the causes of quality problems in order to support quality improvement planning, which may involve both data cleansing and process redesign.
(iv) Improve: prioritizes key areas and elaborates an improvement process plan.
Thus, TDQM is a methodology that focuses both on the information production process, by identifying its lifecycle, and on the assessment of data content. However, it does not thoroughly specify organizational issues that directly influence DQ, and it does not define a route that clarifies the management levels to be achieved, since it is not based on a management maturity model [2].
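The four TDQM steps form a continuous cycle. Purely as an illustration of that cycle (the dimension names, thresholds, sample records, and metric function below are hypothetical; TDQM itself leaves the concrete formulas open), a minimal sketch in Python could look like this:

```python
# Minimal sketch of the TDQM Define-Measure-Analyze-Improve cycle.
# All dimensions, thresholds, metric functions, and sample records are hypothetical.

def completeness(records, field):
    """Share of records in which `field` is filled in."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records) if records else 0.0

customers = [
    {"name": "Ana Silva", "email": "ana@example.com"},
    {"name": "Jose Souza", "email": ""},        # missing email
    {"name": "Jose  Souza", "email": None},     # missing email, possible duplicate
]

# Define: quality requirements agreed with the information consumers become thresholds.
requirements = {
    "email completeness": (lambda recs: completeness(recs, "email"), 0.95),
}

# Measure, Analyze, Improve: apply each metric, compare with its threshold,
# and flag the dimensions that need an improvement action.
for name, (metric, threshold) in requirements.items():
    score = metric(customers)
    status = "ok" if score >= threshold else "candidate for improvement action"
    print(f"{name}: {score:.2f} (target {threshold}) -> {status}")
```

In the actual cycle, the Analyze step would also investigate the causes behind any metric below its threshold before an improvement action is planned.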
2.2 TIQM
TIQM [8] considers that a data quality project goes beyond a data improvement or cleaning process. It is a process that involves the entire organization; everyone must be responsible for the continuous improvement process. An action plan (Process P6), involving a set of activities or tasks organized into the other five processes, must be used, as follows:
Process P1 – assess the data meaning and the data structures: it refers to the need to know the data definitions in order to measure the quality of the database information.
Process P2 – assess the information quality: it analyzes quality by managing information as an asset.
Process P3 – measure the costs resulting from non-quality information: it identifies business performance measures, calculates the cost of the related information and of missing information quality, identifies customer segments, calculates the value for the customer, and calculates the information value.
Process P4 – perform data re-engineering and cleaning: it improves the information product by acting on the symptoms of missing quality. In this process, defective data is changed into data with an acceptable quality level.
Process P5 – improve the quality of the processes involved in information generation: it verifies the problems found during the quality assessment phase, analyzes their causes, plans and implements improvement processes to prevent defects, and acts directly on the causes of the problems. The improvements can include changes in business processes and in information systems, or applying the data improvement process itself.
Process P6 – establish an information quality environment (action plan): it represents the guidance and cultural requirements needed to sustain an environment of continuous information quality improvement. Therefore, it is the basis of the other processes and defines the action plan for implementing DQ.
Data and metadata architecture aspects are relevant for TIQM and guide the data quality treatment. However, the methodology uses formulas that are not clearly defined to calculate the value of data importance. Another important aspect, not addressed by the methodology, is the definition of the roles and responsibilities that each professional [4] should have in performing the processes.
2.3 Data Quality Dimensions
The DQ literature provides a comprehensive classification of data quality dimensions [9] [1] [10] [11] [8] [12] [13] [14] [15] [16]. However, there is no complete consensus either on the set of dimensions or on their definitions [4]. DQ problems are those noticed by the data users; thus, DQ dimensions are used within a context, according to the problems identified in that context. The dimensions impacted by the problems identified in the CRM were:
Accuracy: correct, error-free data, with consistent representation and content [11] [13].
Timeliness: the data is current enough to be used in the specific customer task [10].
Completeness: data sufficient for use [10] [13].
Reliability: related to the credibility of the data and of the system that stores and manipulates it, and to reliability with respect to some reference source [13].
Representation: ease of understanding and interpretation, with consistent and concise representation [10].
Table 1 presents the relationship between the data quality problems found in the CRM and the impacted dimensions.
Table 1. DQ problems in CRM x DQ dimensions
Registrations in duplicate: reliability, accuracy, consistency.
Several related registrations without identification of which one is valid: reliability, accuracy, effectiveness, timeliness.
Incomplete data: completeness, reliability, accuracy, added value, relevance.
A single field storing several data concepts: completeness, reliability, accuracy, added value, relevance, homogeneity, ease of understanding.
Non-standardized data: reliability, accuracy, representation consistency.
Typing errors: accuracy, reliability, reputation, added value.
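To make the link between the problems in Table 1 and the dimensions more concrete, the sketch below shows simple checks that would surface three of the problem types: registrations in duplicate (accuracy, reliability), incomplete data (completeness), and non-standardized data (representation consistency). The field names, sample records, and validation rules are assumptions made for illustration only; they are not taken from the studied CRM.

```python
import re
from collections import Counter

# Hypothetical customer registrations; the field names and values are assumptions.
customers = [
    {"cpf": "123.456.789-09", "name": "MARIA SOUZA", "email": "maria@example.com"},
    {"cpf": "12345678909",    "name": "Maria Souza", "email": ""},    # duplicate in another format
    {"cpf": "987.654.321-00", "name": "joao lima",   "email": None},  # incomplete registration
]

def normalize_cpf(cpf):
    """Strip punctuation so that differently formatted CPFs can be compared."""
    return re.sub(r"\D", "", cpf or "")

# Registrations in duplicate (accuracy, reliability): the same CPF appears more than once.
counts = Counter(normalize_cpf(r["cpf"]) for r in customers)
duplicated = [cpf for cpf, n in counts.items() if n > 1]

# Incomplete data (completeness): required fields left empty.
required = ("cpf", "name", "email")
incomplete = [r for r in customers if any(not r.get(f) for f in required)]

# Non-standardized data (representation consistency): CPF not in the canonical mask.
mask = re.compile(r"^\d{3}\.\d{3}\.\d{3}-\d{2}$")
non_standard = [r for r in customers if not mask.match(r["cpf"])]

print("duplicated CPFs:", duplicated)
print("incomplete registrations:", len(incomplete))
print("non-standardized CPFs:", len(non_standard))
```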
2.4 Related work
The literature presents papers that analyze several data quality methodologies. [4] and [17] compare the main methodologies for assessing and improving data quality, considering their phases and steps, data quality dimensions and metrics, strategies and techniques, data types, and the types of information systems regarded by each methodology. Despite the good survey of several methodologies made by [4] [17], highlighting their main characteristics, no evaluation of their application in a real case is made. Thus, the comparison presented is conceptual rather than practical.
Cai and Zhu [14] propose a model for quality assessment in Big Data. However, their proposal does not present the other stages of a data quality management methodology. Britto and Almeida [16] elaborate a guide to implement data quality exclusively in the DW environment. They discuss challenges, problems, and dimensions for DW, but do not explore their guide in other contexts.
In [5], the HDQM methodology is applied to an environment whose main business focus is the sale of wireless handheld devices and associated services for registering customer requests from restaurants, pubs, and bistros. HDQM defines a meta-model for describing and conceptually representing the important elements of a context, which represent organizational units, conceptual entities, resources, processes, dimensions, and quality metrics. The methodology is divided into three phases: (i) state reconstruction (reconstruction of knowledge about all the elements involved in the organization), (ii) assessment (obtainment of a quantitative assessment of the quality problems), and (iii) improvement (selection of improvement activities based on the DQ dimension/cost ratio). In that case study, only problems associated with the accuracy and timeliness dimensions were identified; thus, considering only these dimensions, metrics are proposed for DQ measurement of the structured, semi-structured, and unstructured data identified in the application context. Considering the CRM context contemplated herein, HDQM could have been used. However, TDQM and TIQM were preferred since they are more detailed regarding the support for surveying the DQ problems verified. In addition, the data in the study scope are structured, with no need to review or extend metrics for semi-structured and unstructured data.
3. EVALUATION OF APPLICABILITY TO CRM APPLICATION: TDQM AND TIQM
3.1 CRM – Real Case
A mobile telephone company is the target of this study. This company needs to retain customers by providing better quality services, cheaper rates, and plans more adequate to customers' needs, which requires assertiveness in the direct and indirect contact strategies with each of them. The DQ problems, and the need to use a methodology to improve the data, were identified due to a proposed project that would use marketing email as an effective channel for disclosing new company plans and promotions. For that purpose, a customer behavior profile study was conducted in order to develop more economical and attractive plans for customers.
The data quality problems identified by the user area, which prevented the execution of this strategy, were: impossibility to use email marketing, since customers did not have a registered email; customers with several addresses, which hindered identifying the most current one; and customers registered in duplicate, whose names were spelled differently, along with spelling errors in customer names and addresses.
In order to obtain a real data quality assessment (baseline), from which the priorities for action would be defined [18], a preliminary study was conducted with the following purposes: identifying the persons involved in the production of customer registration data; mapping the existing data architecture; confirming the DQ problems reported by the user areas and identifying others; and identifying the potential causes of the identified problems.
The information product chosen for the preliminary study was the customer registration form, containing customer identification, contact, and qualification data. The fields included were: CPF [Individual Taxpayer's Registry], name, gender, date of birth, (full) address, phone number, email, marital status, level of education, monthly income, and job. As a result of the study, registrations in duplicate, missing identification of the most current registrations, incomplete or missing data, non-standardized data, and typing errors were identified.
Figure 1. CRM Architecture Mapping
The mapping of the existing architecture (Figure 1) was performed based on interviews, especially with system analysts and DBAs. The support infrastructure systems and technologies, the points of
integration between the systems and the databases involved were thus identified. As shown in Figure 1, there were three source systems (Systems A, B, and C) that had individual customer tables in which data registration occurred.
System A – Line Sales System: the first registration point of basic customer data. This system supplied System B.
System B – Account Control System: through a batch replication process, it receives the basic customer information from System A and the additional services from System C. When requested through customer service, address updates were performed directly in this system, although without consistency with System A.
System C – Sales System for Other Services: this system was not directly integrated with System A, but was integrated with System B, as already detailed.
DW: the DW was supplied by extraction, transformation, and load (ETL) routines that operated at night.
Customer DM: similar to the DW, the customer DM was loaded through an ETL process and served as the database for the company's CRM.
Analysis and CRM applications: applications that allow customer data analysis and the generation of segmentations. The plans and promotions are wrapped in campaigns that, for example, are sent by direct mail or inserted in the company's website, among other media.
3.2 Applicability Assessment
The individual analysis of each methodology allowed verifying how the CRM DQ problems would be managed. The criteria for this assessment were the level of difficulty in implementing the methodology, its efficacy, and its completeness, measured as follows:
Difficulty level (DL): High – missing detail, generic activity, and lack of the conditions required to perform the activity; Medium – some detail, activity with a specific focus, and existing conditions required to perform the activity; Low – existing detail, activity with a specific focus, and existing conditions required to perform the activity.
Efficacy (EF): Satisfactory results – all the results expected from the methodology were achieved, ensuring the fulfillment of the process purposes and activities; Partially satisfactory results – at least one result was not achieved; Unsatisfactory results – no result was achieved.
Completeness (CR): regarding the activities and/or process steps proposed by the methodology, it is deemed complete if it was performed in its entirety, and incomplete if at least one step was not performed.
Completeness (CT): regarding the data quality problems identified in the context of the studied CRM, it is deemed complete if all problems are treated, including the identification of the causes related to the problems, the data correction, and the improvements to prevent the recurrence of the problems, and incomplete if at least one problem was not treated.
All data quality assessment processes were performed considering the dimensions presented in Section 2.3. The metrics for each dimension were those defined by the methodology itself (TIQM) or by a complementary reference (TDQM). In the latter case, the metrics were based on [18], which was chosen for being an integral part of the TDQM research group, and on the concepts in [10], as a standard for defining DQ concepts.
3.2.1 TDQM
Table 2 shows the compilation of the assessment results.
Table 2. TDQM applicability assessment (step: difficulty level DL, efficacy EF, completeness CR)
Define: DL low, EF satisfactory results, CR complete.
Measure: DL medium, EF satisfactory results, CR complete.
Evaluate: DL low, EF satisfactory results, CR complete.
Improve: DL medium, EF satisfactory results, CR complete.
Regarding CT, the following notes apply: for the Measure step, the methodology does not present metrics oriented to DW and DM environments or to metadata assessment; for the Evaluate step, it does not consider causes and assumptions related to metadata; for the Improve step, since improvement actions for metadata were not defined, there was no prioritization and targeting of such issues.
TDQM suggests the use of many techniques, so the difficulty level of its application was low. Yet, as the suggested techniques did not guide all of the methodology steps, some activities required complementary references to be performed, such as the activities of the Measure step. Therefore, it was deemed complete for the Define step only. It was possible to carefully analyze the DQ problems, finding their causes and directing improvement actions; for this reason, the methodology was deemed effective in its application. However, some weaknesses were identified: the literature on TDQM [7] [12] [11] [19] [20] does not explicitly approach the metadata issue, which is an essential point for data quality [6] [19] [21] [22]; there is no suggestion of metrics for data quality assessment, so a complementary reference is required; and it does not treat issues related to costs.
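One of the gaps noted above concerns metadata: TDQM does not guide the assessment of metadata or of the DW/DM environment. As a purely illustrative aside (the catalogue entries, field names, and rules below are assumptions, not artifacts of the studied CRM), a check of stored values against declared metadata can be sketched as follows:

```python
# Illustrative check of stored values against declared metadata.
# The catalogue entries, field names, and the sample record are hypothetical.
import re
from datetime import date

catalogue = {                     # declared metadata: expected type and, optionally, format
    "cpf":    {"type": str,  "pattern": r"^\d{3}\.\d{3}\.\d{3}-\d{2}$"},
    "birth":  {"type": date},
    "income": {"type": float},
}

record = {"cpf": "12345678909", "birth": "1980-05-01", "income": 3500.0}

for field, meta in catalogue.items():
    value = record.get(field)
    ok = isinstance(value, meta["type"])
    if ok and "pattern" in meta:
        ok = re.match(meta["pattern"], value) is not None
    print(f"{field}: {'conforms to' if ok else 'violates'} the declared metadata")
```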
3.2.2 TIQM
Table 3 shows the compilation of the assessment results.
Table 3. TIQM applicability assessment (process: difficulty level DL, efficacy EF, completeness CR and CT)
Process 1: DL low, EF satisfactory results, CR complete, CT not applicable.
Process 2: DL medium, EF partially satisfactory results, CR incomplete, CT not applicable.
Process 4: DL low, EF satisfactory results, CR incomplete, CT complete.
Process 5: DL medium, EF partially satisfactory results, CR incomplete, CT complete.
Process 6: DL high, EF unsatisfactory results, CR incomplete, CT not applicable.
Although the results of its application were partially satisfactory or even unsatisfactory for some activities (Table 3), the CRM DQ problems were duly identified and their respective causes were treated with solutions. Thus, the methodology was deemed effective.
The assessment of aspects related to metadata and the DW environment, as well as the emphasis given to data cleaning, are the strengths of this methodology, especially for the context in which it was applied. Therefore, the result of the TIQM application was deemed effective. Although the methodology includes an entire process intended to analyze the costs involved with DQ, the available literature on TIQM does not clearly detail how such costs should be obtained.
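Data re-engineering (Process P4) was one of TIQM's strengths in this context. The sketch below only illustrates the kind of consolidation such a process prescribes for the duplicate-registration problem; the survivorship rule (keep the most recently updated record per CPF), the field names, and the sample records are assumptions, not the rules actually applied in the project.

```python
import re

# Hypothetical duplicate registrations; field names, values, and the survivorship
# rule (keep the most recently updated record per CPF) are assumptions.
registrations = [
    {"cpf": "123.456.789-09", "name": "MARIA SOUZA ", "updated": "2016-03-01"},
    {"cpf": "12345678909",    "name": "Maria Souza",  "updated": "2017-01-15"},
    {"cpf": "987.654.321-00", "name": "joao lima",    "updated": "2016-11-20"},
]

def standardize(rec):
    """Bring CPF and name to a single, consistent representation."""
    digits = re.sub(r"\D", "", rec["cpf"])
    cpf = f"{digits[:3]}.{digits[3:6]}.{digits[6:9]}-{digits[9:]}"
    return {**rec, "cpf": cpf, "name": rec["name"].strip().title()}

# Consolidate: for each CPF, keep the most recently updated registration.
consolidated = {}
for rec in map(standardize, registrations):
    current = consolidated.get(rec["cpf"])
    if current is None or rec["updated"] > current["updated"]:
        consolidated[rec["cpf"]] = rec

for rec in consolidated.values():
    print(rec)
```

In TIQM terms, such cleansing acts on the symptoms; Process P5 would then act on the causes, for instance by enforcing the standardized format at the registration points (Systems A, B, and C).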
3.3 Analysis
From the assessment carried out, it is possible to affirm that, in order to implement any DQ methodology, the company needs to organize itself and to learn DQ concepts. In general, TDQM was deemed objective and detailed concerning the techniques for its application, although other references had to be used for some of its activities to be performed (definition of dimensions and metrics). In order to implement TDQM, the involvement of the various company areas can be gradual, due to the prioritization of the information products to be submitted to DQ management, allowing incremental implementation and investment.
TIQM is well detailed for issues related to the data and metadata structure, which facilitated its application. The same applies to the activities related to data re-engineering, its strength. However, it was very generic regarding the activities related to the assessment of the organizational structure and to data quality costs, and there is not enough detail for implementing these activities. In addition, although there is a slightly higher level of detail for the improvement activities, they are not very simple to execute. Due to this last point, the participation of at least one expert in the methodology was considered important to better guide the implementation work.
At the end of the process, it was concluded that a standard defining dimensions and the respective DQ metrics must be used along with the methodology chosen for DQ improvement. The chosen DQ methodology may not provide some or all of the dimensions and/or metrics required for the context concerned. The standard, on the other hand, tends to be complete, filling the methodology gap and allowing a regular management of the DQ dimensions.
4. COMPARATIVE ANALYSIS
4.1 Assessment Method
The assessment aimed to compare the use of each methodology and to verify which one was more adequate for the CRM environment studied. The applicability of each methodology, and the depth at which it treats the defined assessment criteria, were compared. The applicability was assessed considering the difficulty level (DL), the efficacy (EF), the performance completeness (CR), and the completeness of the management of the studied CRM problems (CT). Each applicability criterion was assessed according to the following scores: (i) DL – low: 2; medium: 1; high: 0; (ii) EF – satisfactory: 2; partially satisfactory: 1; unsatisfactory: 0; (iii) CR and CT – complete: 1; incomplete: 0.
The comparative criteria were defined considering the common points of the methodologies and the factors that directly influenced the solution of the problems related to the CRM presented. Thus, five assessment criteria were defined, as per Table 4, and each criterion was assessed by questions adapted from TIQM. Examples of questions used for Organizational Structure are:
a) Does it define activities that guide the definition of the DQ process?
b) Does it define a DQ team/responsible area?
c) Does it define training activities related to DQ?
d) Does it define the inclusion of the DQ program in the organization's strategies, with defined purposes?
e) Does it define the visibility of DQ from the user's point of view?
f) Does the methodology define information as a product with a defined lifecycle?
g) Does the methodology offer details on how to involve and organize the company for the DQ work?
h) Is it mandatory to have DQ experts to apply the methodology?
Table 4. Comparative analysis aspects (criterion: activities that should be identified)
Organizational structure: changes in operational processes, creation of an area in the structure, and delegation of responsibilities to ensure that data quality is approached at all points of the data and information lifecycle; training of the individuals involved in producing data and providing information; ensuring that the data quality program has a corporate scope and the commitment of all management levels.
Data and metadata structure: periodic assessment of the data structures in the database, to identify failures in the design and in the application of the adopted modeling technique; assessment of the metadata that follow the business development standard.
Data quality assessment: periodic assessment of the data quality to identify DQ problems; coverage of data quality at the individual level (isolated system) and at the corporate level (integrated systems, such as the DW).
Data re-engineering: activities of data cleaning and of refining the causes of the problems identified.
Continuous improvement: activities that plan the improvement of the production process of the information products.
Considering that TDQM has four steps and TIQM has six processes, an association between the activities of each methodology and the comparative assessment criteria was required. Table 5 presents this association.
Table 5. Association between the TDQM and TIQM structures
Organizational structure: TDQM Define; TIQM Process 6.
Data and metadata structure: TDQM Measure; TIQM Process 1.
Data quality assessment: TDQM Measure and Assess; TIQM Process 2.
Data re-engineering: TDQM Improve; TIQM Process 4.
Continuous improvement: TDQM Improve; TIQM Process 5.
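As a small illustration of how the applicability scores defined above can be aggregated per criterion (the assessments below are arbitrary placeholders, not the values reported in Table 6), the computation amounts to a simple lookup and sum:

```python
# Illustrative aggregation of the applicability scores (DL, EF, CR, CT) per criterion.
# The assessments below are arbitrary placeholders, not the values reported in Table 6.

DL = {"low": 2, "medium": 1, "high": 0}
EF = {"satisfactory": 2, "partially satisfactory": 1, "unsatisfactory": 0}
BIN = {"complete": 1, "incomplete": 0}

assessments = {
    "Organizational structure": ("low", "satisfactory", "complete", "complete"),
    "Data re-engineering":      ("medium", "partially satisfactory", "complete", "incomplete"),
}

for criterion, (dl, ef, cr, ct) in assessments.items():
    total = DL[dl] + EF[ef] + BIN[cr] + BIN[ct]
    print(f"{criterion}: total applicability score {total}")
```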
4.2 Assessment and Comparison
The individual applicability assessment of each methodology was presented in Section 3. Based on this assessment, and considering the scores, the association between the methodologies, and the comparison criteria, Table 6 shows the results. The applicability assessment of the approaches shows that TIQM is slightly better than TDQM for the context studied. Since TIQM immediately treats the problems identified, it appears to be more effective, as its strengths lie in the data and metadata structures, encompassing the DW and DM, and in data re-engineering. On the other hand, for a long-term data quality treatment, TDQM is more appropriate, since it deals with organizational aspects and continuous improvement with greater objectivity and a higher level of detail than TIQM.
Table 6. TDQM and TIQM applicability assessment by criterion (scores for DL, EF, CR, CT, and total)
TDQM applicability:
Organizational structure: DL 2, EF 2, CR 1, CT 1, total 6.
Data and metadata structure: DL 1, EF 1, CR 1, CT 0, total 3.
DQ assessment: DL 1, EF 1, CR 1, CT 0, total 3.
Data re-engineering: DL 1, EF 1, CR 1, CT 0, total 3.
Continuous improvement: DL 1, EF 2, CR 1, CT 0, total 4.
TDQM total: 19.
TIQM applicability:
Organizational structure: DL 0, EF 0, CR 0, CT 0, total 0.
Data and metadata structure: DL 2, EF 2, CR 2, CT 2, total 8.
DQ assessment: DL 1, EF 1, CR 0, CT 0, total 2.
Data re-engineering: DL 2, EF 2, CR 2, CT 2, total 8.
Continuous improvement: DL 1, EF 1, CR 0, CT 1, total 3.
TIQM total: 21.
5. CONCLUSION
In general, TDQM was deemed objective and detailed concerning the techniques for its application. Aspects that should be highlighted: (a) the involvement of the several company areas can be gradual; (b) over time, the methodology turns the data quality treatment into proactive actions; (c) weaknesses: no explicit treatment of metadata or metrics, nor of the issues related to the costs involved with poor data quality. Regarding TIQM, the following is highlighted: (a) attention to data and metadata structures and to data re-engineering; (b) weakness: the participation of at least one expert in the methodology is mandatory. Mixing the methodologies, using the best of each, can be a good strategy for implementing DQ processes in an organization; in the case studied, both methodologies treat information as a product, which facilitates their integrated use. Another point that should be highlighted is that, whatever methodology is used to improve data quality, it is essential to use appropriate software tools that support the data assessment, cleaning, and re-engineering processes, among others [2] [3].
This paper is restricted to, and was developed in the light of, a CRM, focusing on customer registration data and not including quality problems in business operations data. In addition, the comparison evidenced only the strengths and weaknesses of the methodologies used, without analyzing the possibility of integrating the proposals to build a unified DQ management methodology. This integration, considering structured and non-structured data, is viewed as a continuation of this work. The following is further suggested: a comparative analysis of DQ approaches for data quality treatment in other environments, such as big data [17]; and the development of a proposal to measure the costs related to data quality, as a way to measure the financial returns brought by DQ.
6. REFERENCES
[1] B. Saha and D. Srivastava, “Data quality: The other face of Big Data,” in Proc. IEEE International Conference on Data Engineering (ICDE), pp. 1294–1297, 2014.
[2] CMMI Institute, “Data Management Maturity (DMM) Model,” August 2014.
[3] DAMA International, The DAMA Guide to the Data Management Body of Knowledge, First edition. Technics Publications, LLC, 2009.
[4] C. Batini, C. Cappiello, C. Francalanci, and A. Maurino, “Methodologies for data quality assessment and improvement,” ACM Comput. Surv., vol. 41, no. 3, pp. 16:1–16:52, 2009.
[5] C. Batini, D. Barone, F. Cabitza, and S. Grega, “A data quality methodology for heterogeneous data,” Int. J. Database Manag. Syst. (IJDMS), vol. 3, no. 1, 2011.
[6] L. P. English, Information Quality Applied: Best Practices for Improving Business Information, Processes, and Systems. Wiley, 2009.
[7] R. Y. Wang, “A product perspective on total data quality management,” Commun. ACM, vol. 41, no. 2, pp. 58–65, Feb. 1998.
[8] L. English, “Total Information Quality Management,” DM Review Magazine, 2003.
[9] C. Batini and M. Scannapieco, Data Quality: Concepts, Methodologies and Techniques. Springer, 2006.
[10] ISO/IEC 25012:2008, “Software Engineering – Software Product Quality Requirements and Evaluation (SQuaRE) – Data Quality Model.” ISO/IEC, Switzerland, 2015.
[11] R. Y. Wang, M. Ziad, and Y. W. Lee, Data Quality. Kluwer Academic Publishers, 2002.
[12] L. L. Pipino, Y. W. Lee, and R. Y. Wang, “Data quality assessment,” Commun. ACM, vol. 45, no. 4, p. 211, Apr. 2002.
[13] J. E. Olson, Data Quality: The Accuracy Dimension. Morgan Kaufmann, 2003.
[14] L. Cai and Y. Zhu, “The challenges of data quality and data quality assessment in the Big Data era,” Data Science Journal, vol. 14, 2015.
[15] N. Abdullah, S. A. Ismail, S. Sophiayati, and S. M. Sam, “Data quality in Big Data: A review,” Int. J. Adv. Soft Comput. Its Appl., vol. 7, no. 3, pp. 16–27, 2015.
[16] M. S. Britto and J. R. J. Almeida, “Data quality for Data Warehouse – implementation guide,” in 3rd CONTECSI International Conference on Information Systems and Technology Management, 2006.
[17] C. Batini and M. Scannapieco, Data and Information Quality: Dimensions, Principles and Techniques. Springer, 2016.
[18] Y. W. Lee, Journey to Data Quality. MIT Press, 2006.
[19] E. G. L. Campos, V. L. R. Y. Shidomi, S. N. A. Souza, J. Pavon, and L. S. DeSouza, “Calidad de Datos – De La Teoria a La Práctica” [Data Quality – From Theory to Practice], in The 5th IBIMA International Conference on Internet & Information Technology in Modern Organizations, pp. 414–420, 2005.
[20] R. Wang, T. Allen, W. Harris, and S. Madnick, “An information product approach for total information awareness,” in 2003 IEEE Aerospace Conference Proceedings, vol. 6, pp. 6_3005–6_3020, 2003.
[21] B. Stvilia, L. Gasser, M. B. Twidale, S. L. Shreeves, and T. W. Cole, “Metadata quality for federated collections,” in Proceedings of the Ninth International Conference on Information Quality (ICIQ-04), 2004.
[22] K.-S. Ryu, J.-S. Park, and J.-H. Park, “A data quality management maturity model,” ETRI Journal, vol. 28, no. 2, pp. 191–204, 2006.