
Knowledge Discovery in Databases on the Example of Engineering Change Management

Armin Sharafi, Petra Wolf, Helmut Krcmar

Chair for Information Systems I17, Technische Universität München, Boltzmannstr. 3, 85748 Garching bei München, Germany
{armin.sharafi, petra.wolf, krcmar}@in.tum.de

Abstract. The objective of this research is to apply Knowledge Discovery in Databases to derive patterns in Engineering Change Requests. Our work describes a use case that aims to improve Product Development Processes. The approach began with establishing a business understanding in the context of the automotive industry. To gain a better sense of the domain, we conducted interviews, which allowed us to select a relevant test data set. We modelled a Text Mining process to analyse the whole set of records. The results show that requests have similarities in their content, e.g. in the cause of the Engineering Change. These similarities can be used as a source to develop further assistance for change request applicants. To reach this goal, the Text Mining process can now be applied to a data set of an automobile manufacturer containing the Engineering Change Requests of the last ten years.

Keywords: Knowledge Discovery in Databases, Text Mining, Automotive Product Development, Engineering Change Management

1 Introduction

Knowledge Discovery in Databases (KDD) is mainly applied in domains such as financial services (granting of credits), the electricity supply industry (load forecasting), and marketing and sales [1]. Its application to product development, and in particular to Engineering Change Requests (ECRs), is new and unexplored. This research describes a use case that applies the KDD process to product development. In high-growth markets with short product life cycles, the duration of the Product Development Process (PDP) can have a significant impact on profits [2]. As a consequence, it is becoming harder for a conventional PDP to fulfil rapidly changing requirements [3]. The automotive industry in particular is subject to constantly changing conditions (e.g. technologies, laws, competition) and consequently has to change its products, processes and production frequently [4]. PDPs in engineering have special characteristics, such as the involvement of multiple companies, a high information need and creativity. This domain therefore requires coordinated actions to address, e.g., control of complexity, planning reliability and regulatory frameworks.


The execution of Engineering Change Requests (ECRs) is one manifestation of such requirements in PDPs. Development processes, inspection processes and additional related activities are summarized in the concept of Engineering Change Management (ECM) [5, 6]. It includes the management and monitoring of organizational change processes [7, 8]. Engineering Changes must be seen as modifications of already shared documents or parts [9]. They are ultimately reflected in the products and in their design and production processes [4]. Especially in complex and large-scale projects, it is necessary to go through iterative processes. However, the mere creation of such a request consumes time and other resources and adds no value. We believe that ECM, and especially the creation of ECRs, can be supported by surfacing hidden yet valuable knowledge that can be found in the ECM history. Improvements in knowledge creation and data management can accelerate development results. Organisations can therefore achieve competitive advantages by recognising that the output of any PDP is not only the product itself (which is the output of the production process) but also information and knowledge about the future product and how to produce it. Such artefacts generated during the development process are often collected in an integrated product information model. Applying KDD to this data is therefore an obvious step: KDD can be used to detect analogies between ECRs and to derive patterns. The goal of this research is thus to discover patterns in historical ECR data in order to develop organisational and technical concepts that support the creation of otherwise time-consuming requests. Submitted change requests and their categorisation as a result of the analysis help to identify characteristics that serve as a basis for recurring elements of an ECR, or to identify templates for ECRs. These can be used to generate new change requests faster and with higher data quality. Based on the results of the KDD process and the qualitative interviews, support concepts will be developed in the next step. This research is guided by the following research question: What is a proper approach to analyse data of historical ECRs, and which patterns can be found?

2 Related Work and Research Approach

The process model of Fayyad, Piatetsky-Shapiro and Smyth [10] lends itself to structuring the KDD process (see Fig. 1). It is inherently iterative: it starts with the selection of data, followed by preprocessing, transformation into a more manageable format, the application of data mining algorithms and the interpretation of the results. We expanded the process by adding Business Understanding as the first step, which is one phase of the CRISP-DM model [11]. This step is essential for the successful implementation of a KDD project. CRISP-DM serves as a reference for the implementation of data mining projects; it includes all necessary phases and tasks and their relations to each other in the form of a (cyclic) process model.


Fig. 1. Extended overview of the steps that compose the KDD process following [10], with Business Understanding added as the first step

Our use case is a large German automobile manufacturer with a complex, federated and distributed ECM process supported by information systems. The manufacturer develops several automobiles every year and therefore has a large product development division. In such companies the number of ECRs has reached dimensions that make it impossible to manage them manually. The process for executing ECRs is therefore predefined and supported by an IT system. Analysing our use case, we found that about 70 new ECRs are written per day. Each of these ECRs contains data on the attributes listed in Table 1.

Table 1. Attributes of ECR data

ID                description   status    project            implementation date
cause of change   solution      benefit   date of creation   comment

In order to gain a better understanding of the domain, guideline-based interviews with 40 of the most active ECR creators were planned, of whom 16 were willing to be interviewed. The transcribed interviews were evaluated by qualitative content analysis, computer-aided with ATLAS.ti [12, 13]. On average, the interviews took 90 minutes. They were all based on the same interview guideline, which contained 21 questions about the as-is situation and the potential of the engineering change process. In addition, problems arising during the creation of ECRs, potential improvements and perceived similarities were identified. We also discovered that old ECRs are not used to derive knowledge that supports the creation of new ECRs. This is all the more surprising and interesting considering that the data set comprises approximately 120,000 ECRs collected over the last ten years.



3 Results

The interviews allowed us to establish a business understanding in the context of the analysis. An essential component is to identify support options for future ECRs by using and analysing the request history. On that basis we derived targets for our analysis in order to plan the KDD project. Table 2 shows the objectives and possible approaches to achieve them. They were derived from the qualitative content analysis.

Table 2. Objectives of KDD project and starting points

objective: prove the benefit of a keyword search for key issues in the change history; discover groups or subclasses of ECRs
possible approach: identification of keywords; term occurrences in ECRs; relationships in ECRs; patterns in ECRs

Furthermore, the interviews showed that term occurrences, patterns and groups or subclasses should be discovered in certain attributes of ECRs in particular. They are presented in Table 3.

Table 3. Objectives of KDD project and relevant attributes of ECR database

objective: prove the benefit of a keyword search for key issues in the change history
ECR attributes to be studied: cause of change; solution; affected products/projects; technical extent of problem

objective: formation of groups or subclasses of ECRs
ECR attributes to be studied: cause of change; solution; affected products/projects; technical extent of problem

To test our approach we selected a data set containing 1,402 change requests from 3 development projects (each representing a series-production vehicle) within a development period of 3 years. This data set includes, inter alia, information about the cause of change, a possible solution to the problem, the benefit of the solution and related comments. In line with Table 3, we selected criteria based on the statements of the interviewees. They recommended looking for similarities in the cause of change, which triggers the need for an ECR. Every change request has a free-text field in which the engineer describes the cause in a few sentences. Our KDD process therefore includes a Text Mining part.

Data Mining and Text Mining are (semi-)automated processes for detecting patterns in existing databases [1, 14]. The main difference between the two disciplines is the nature of the analysed information: Data Mining deals with structured data, Text Mining with unstructured data. Text Mining extracts new, previously unknown and useful information from texts and categorises it [15, 16]. Text Mining on this scale, however, can only be done with tools that support the whole KDD process. We chose RapidMiner [17] for our analysis and modelled the process with the following steps:

1. Read Excel
2. Data Conversion (Nominal to Text)
3. Select Attributes
4. Replace special document parts
5. Tokenize documents
6. Filter Tokens by length
7. Filter Stopwords
8. Stemming
9. Create word vector

These steps are common in Text Mining applications and can be extended by other methods such as classification and regression, clustering and segmentation, and correlation and dependency computation. Based on the attribute cause of change, we executed the process and calculated term occurrences for all change requests by creating a word vector. 5,783 different terms were found in the data, which can be analysed from different perspectives. The following table shows anonymized results for the most frequently occurring words. The findings show that the location of the problem is one of the most frequently occurring contents. Another is the description of affected parts: nearly every ECR record mentions affected parts. At this meta-level the results are not surprising, but they make detailed and specific evaluations possible.

Table 4. Word list of occurring words (extract)

keyword (1 to 5,783)   total occurrences   document occurrences   project A   project B   project C
area                   125                 116                    70          41          14
assembly               98                  90                     57          34          7
part x                 68                  56                     43          19          6
part y                 66                  50                     46          7           13
part z                 56                  48                     30          21          5
...                    ...                 ...                    ...         ...         ...
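
As an illustration only, the preprocessing and word-vector steps (steps 4 to 9) and the columns of Table 4 can be approximated outside RapidMiner in a few lines of Python. The sketch below is not the RapidMiner process used in this work: the stop word list, the naive suffix-stripping `stem` function, the token length limits and the sample records are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop word list; a real run would use a full list for the ECR language.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was", "not"}


def tokenize(text):
    """Steps 4-5: replace non-letter characters, lower-case and split into tokens.

    Only ASCII letters are kept here for brevity (an assumption of this sketch).
    """
    return re.sub(r"[^a-zA-Z]+", " ", text).lower().split()


def stem(token):
    """Step 8 stand-in: naive suffix stripping, not a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, min_len=3, max_len=25):
    """Apply steps 5 to 8 to one free-text field and return the stemmed tokens."""
    tokens = tokenize(text)
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]  # step 6
    tokens = [t for t in tokens if t not in STOPWORDS]            # step 7
    return [stem(t) for t in tokens]                              # step 8


def word_list(records):
    """Step 9: build one term-occurrence vector per record and aggregate them.

    records: iterable of (project, cause_of_change_text) pairs.
    Returns (total occurrences, document occurrences, occurrences per project).
    """
    total = Counter()
    documents = Counter()
    per_project = {}
    for project, text in records:
        vector = Counter(preprocess(text))
        total.update(vector)              # every occurrence counts
        documents.update(vector.keys())   # each record counts once per term
        per_project.setdefault(project, Counter()).update(vector)
    return total, documents, per_project


if __name__ == "__main__":
    # Hypothetical sample records: (project, free-text cause of change).
    sample = [
        ("A", "Assembly of the bracket collides with the housing in the door area."),
        ("A", "Clearance in the door area is missing."),
        ("B", "The housing is leaking after assembly."),
    ]
    total, documents, per_project = word_list(sample)
    for term, count in total.most_common(5):
        print(term, count, documents[term], per_project["A"][term], per_project["B"][term])
```

In this sketch the per-project counts are term occurrences, so for each keyword they add up to the total occurrences, as they do in Table 4.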

The distribution of keywords is a starting point for further calculations and visualisations. For example, Fig. 2 shows all 1,403 records and analyses the occurrences of the keywords area and assembly. We can see that 7 requests include both keywords and come from two different projects (colour of the dots). Further Text Mining computations, as well as the interpretation and evaluation of our results, are the necessary next steps. We plan to enlarge the data set and to analyse nearly all available ECR records.

Fig. 2. Scatter of term occurrences (axes: term occurrences of the keywords area and assembly; colours: Project A, B, C)
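
The comparison behind Fig. 2 can be sketched in Python as follows. This is a minimal illustration, assuming each record is already available as a list of preprocessed terms together with its project; the function names, the input format and the sample data are hypothetical and not part of the RapidMiner process.

```python
from collections import Counter


def keyword_points(records, keyword_x, keyword_y):
    """Return one (project, occurrences_x, occurrences_y) point per record.

    records: iterable of (project, tokens), where tokens is the list of
    preprocessed terms of one ECR (hypothetical input format).
    """
    points = []
    for project, tokens in records:
        counts = Counter(tokens)
        points.append((project, counts[keyword_x], counts[keyword_y]))
    return points


def containing_both(points):
    """Keep only the records in which both keywords occur at least once."""
    return [point for point in points if point[1] > 0 and point[2] > 0]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the ECR data set.
    sample = [
        ("A", ["area", "assembly", "door"]),
        ("B", ["area", "area", "seal"]),
        ("C", ["assembly", "bracket"]),
    ]
    points = keyword_points(sample, "area", "assembly")
    both = containing_both(points)
    print(len(both), sorted({project for project, _, _ in both}))
```

Each point corresponds to one dot in a scatter plot of this kind, and the project label can be mapped to the dot colour.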

4 Discussion

Based on the results of the interviews we analysed the current situation in ECM. Business Understanding is a relevant step that should be carried out before any KDD project. It allowed us to identify objectives for our KDD process. In the second phase, the developed Text Mining process generated word vectors, from which we identified relevant keywords for finding patterns in the data. Two approaches are now possible. First, we can use the results to find out which circumstances lead to change requests; second, we can use the data as a source for developing further assistance for change request applicants. For example, the data shows that, when describing the cause of change, it is relevant to locate the problem and to state the affected parts. To simplify the process, the localization of the problem could be realized, e.g., with a clickable parts list. The functionality of RapidMiner allows a relatively simple modelling of the Text Mining process. A wide variety of operators can be selected to build an appropriate analysis process in the tool, which allows different perspectives on the data. The options for visualising the results are also helpful for detecting interesting patterns in the data.


5 Conclusion and Outlook

Nowadays, the problem is not a lack of data but rather finding useful information in existing databases. This difficulty also appears in ECM, where the creation of ECRs can be accelerated or avoided by identifying hidden knowledge in the history of generated and recorded data. If such knowledge is discovered, developers can concentrate more on their main task, namely designing products creatively. Development processes become less time-consuming and companies can offer their products earlier. Better-organised ECM matters all the more given that ECRs consume one-third to one-half of engineering capacity [18]. This research project shows how Text Mining methods can be used to improve and coordinate processes that are supported by cooperative work systems. It states which information about existing processes is relevant for deriving possibilities for the development and use of supporting concepts. It therefore serves as a motivation to extend the integration of KDD processes into Information Technology (IT). Maimon and Rokach [19] stated that KDD processes are increasingly becoming part of integrated IT. The authors see benefits in easier and more convenient preprocessing steps, starting from the data sources and followed by Text Mining tools and a Data Warehouse. This is all the more important as the most time-consuming part of a KDD process is cleansing and transforming the data. The integration of these processes creates an awareness of the importance of precise data management and data use. It can therefore be recommended for IT-supported, complex and frequently occurring processes in which enough data is generated. The affected ECRs in the database can act as “lessons learned” and can be used, e.g., to adjust processes or to reuse solutions. A systematic approach to the creation of ECRs is an essential precondition for the successful implementation of KDD. Companies that develop products and fulfil this requirement have an advantage over their competitors.

References

1. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco (2005)
2. Howe, V., Mathieu, R.G., Parker, J.: Supporting new product development with the Internet. 100 (2000) 277-284
3. Eversheim, W., Roggatz, A., Zimmermann, H., Derichs, T.: Information management for concurrent engineering. European Journal of Operational Research 100 (1997) 253-265
4. Huang, G.Q., Mak, K.L.: Current practices of engineering change management in UK manufacturing industries. International Journal of Operations and Production Management 19 (1999) 21-37
5. Huang, G.Q., Yee, W.Y., Mak, K.L.: Development of a web-based system for engineering change management. Robotics and Computer Integrated Manufacturing 17 (2001) 255-267
6. Vianello, G., Ahmed, S.: Engineering changes during the service phase. ASME 2008 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, New York (2008)
7. Jarratt, T., Eckert, C., Clarkson, P.: Development of a product model to support engineering change management. Proceedings of the TMCE 2004 (2004) 331-342
8. Conrad, J., Deubel, T., Köhler, C., Wanke, S., Weber, C.: Change impact and risk analysis (CIRA) - combining the CPM/PDD theory and FMEA-methodology for an improved engineering change management. ICED'07 - International Conference on Engineering Design, Paris (2007)
9. Clark, K., Fujimoto, T.: Product Development Performance: Strategy, Organization, and Management in the World Auto Industry. Harvard Business School Press, Boston, MA (1991)
10. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in databases. Communications of the ACM 39 (1996) 24-26
11. Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., Wirth, R.: CRISP-DM 1.0: Step-by-step data mining guide. SPSS inc 78 (2000)
12. Atteslander, P., Cromm, J., Grabow, B., Klein, H., Maurer, A., Siegert, G.: Methoden der empirischen Sozialforschung. E. Schmidt Verlag, Berlin (2008)
13. Gläser, J., Laudel, G.: Experteninterviews und qualitative Inhaltsanalyse. VS Verlag für Sozialwissenschaften, Wiesbaden (2009)
14. Fundel, K.: Text Mining and Gene Expression Analysis Towards Combined Interpretation of High Throughput Data, München (2007)
15. Hearst, M.: What is Text Mining. (2003)
16. Chen, L., Nayak, D.: A case study of failure mode analysis with text mining methods. Australian Computer Society, Inc. (2007) 49-60
17. Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., Euler, T.: YALE: Rapid prototyping for complex data mining tasks. ACM (2006) 940
18. Soderberg, L.: Facing up to the engineering gap. The McKinsey Quarterly (1989) 3-23
19. Maimon, O., Rokach, L.: Introduction to Knowledge Discovery in Databases. In: Maimon, O., Rokach, L. (eds.): Data Mining and Knowledge Discovery Handbook. Springer-Verlag New York Inc (2005) 1-17
