Developing Knowledge Systems with Continuous Integration

Joachim Baumeister

Jochen Reutelshoefer

denkbares GmbH Friedrich-Bergius-Ring 15 97076 Würzburg, Germany

Institute of Computer Science University of Würzburg 97074 Würzburg, Germany

[email protected]

[email protected]

ABSTRACT
With the industrial success of knowledge-based systems, new requirements arise for Knowledge Engineering processes. Besides advanced knowledge acquisition tools, novel quality assurance techniques need to be established in order to maintain a safe development process. In Software Engineering, continuous integration, a collection of practices, has proved suitable for this task. In this paper, we transfer the general ideas of continuous integration from Software Engineering to Knowledge Engineering, and we demonstrate the implementation of a continuous integration tool in a state-of-the-art Knowledge Engineering workbench.

Categories and Subject Descriptors
K.6.3 [Software Management]: Software maintenance; I.2.1 [Applications and Expert Systems]: Industrial automation

General Terms
Knowledge Engineering, Software Engineering, Agility

Keywords
Continuous Integration, Testing, Knowledge Engineering

1. INTRODUCTION

During the last decades, the construction of knowledge-based systems has grown from an academic discipline into an industrial development process. In the past, traditional development processes and tools have been developed [1], while more recently agile development techniques [2] have come into focus. With the industrial impact of such systems, new requirements are posed to Knowledge Engineering techniques with respect to the development and maintenance process of knowledge bases. These requirements are very similar to those already stated for state-of-the-art Software Engineering:

• Guarantee a constant quality of the "product" over the development and maintenance process.
• Manage versions of the developed knowledge base.
• Support a distributed development process.
• Provide a running system at any time.

In the past years, a novel collection of techniques has shown a significant impact in Software Engineering by addressing the requirements stated above. The term continuous integration emerged from agile development techniques and pragmatic project automation [3]; it proposes a collection of practices for the engineering of software systems in order to improve the overall quality of the system [4]. According to Fowler, the term continuous integration (CI) can be summarized as follows:

"Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including tests) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly." [5]

Although it is only a set of simple practices, its strict application significantly improves the software development process. The most recommended practices of continuous integration are:

Use of a code repository: To allow for a distributed and asynchronous development.
Automated tests on all levels of the software: Tests on component, integration, and system level are provided, to lower the costs of manual quality management.
Automated building of the software: To simplify the creation of executable artifacts; automated tests substantially increase the value of the automated building step.
Frequent and timely integration of new code: To see whether modifications have an unexpected impact on the entire system, and to provide the new features early.
Easy access to the latest builds: To simplify the testing of new artifacts, but also to provide a fallback to older versions.

In this paper, we argue that the ideas and practices of continuous integration, as known from Software Engineering, can be transferred to Knowledge Engineering, and that continuous integration also significantly improves the development process in Knowledge Engineering. We demonstrate its application in combination with the knowledge modeling tool KnowWE, a Semantic Wiki extended by a plugin that provides continuous integration. From this prototype implementation, it is easy to see that continuous integration can be implemented in most Knowledge Engineering tools available today.

2. CONTINUOUS INTEGRATION IN KNOWLEDGE ENGINEERING

In this section, we propose the use of continuous integration practices for Knowledge Engineering, and we argue that the development of knowledge bases can also benefit from (tailored) continuous integration.

Knowledge Base Repository. A central repository stores the artifacts of the knowledge base together with older versions. A repository should not only store the formalized parts of the knowledge base but also include less formal parts relevant for the development process, such as organizational documents, sheets, etc. A central knowledge base repository provides the essential infrastructure of a (distributed) development project, allowing all knowledge engineers to contribute to the project.

Automated Building of the Knowledge Base. A build is an executable artifact of the knowledge base that is ready for deployment into the production setting. The developed knowledge base needs to be transferred into such a "binary version", which is later deployed for practical use. A CI system should ensure that the knowledge base can be transferred automatically into an executable format that can be accessed by the users.

Automated Tests of Current Builds. When introducing CI for knowledge bases, it is essential to provide automated tests for the knowledge. A test is automated if its expected, correct result is known a priori and it can be applied without manual interaction. After its execution, the computed results are compared with the expected results automatically. If they are not consistent, a development flaw has been detected and can be communicated to the developers. Such a test suite does not require human interaction, but simply returns failure or success when the tests have been executed. The literature offers a comprehensive collection of validation and verification methods for different knowledge representations, see for instance research on validation [6] and verification [7]. By means of automated tests the knowledge base becomes "self-testable". In consequence, tests can be applied inexpensively after every change. Frequent testing with a sufficient test suite increases the confidence of the domain specialists and the engineers involved [8].

Frequent Integration of Knowledge. It should be easy to integrate new developments of the knowledge base into the deployed production systems, for instance, extensions of the terminology or new rules. That way, the changes of the knowledge base can be integrated frequently and thus immediate feedback on their quality and utility can be generated. Necessary prerequisites of frequent integration are automated tests and version control. The former is required for generating quality feedback; the latter is needed to fall back to an older version when problems with the latest integration of new developments occur. Frequent integration is a fundamental process when peer developers are working in a distributed environment; the (positive and negative) effects of particular changes are then propagated in a timely manner.

Easy Access and Deployment. Access to builds, i.e., executable versions of the knowledge base including the corresponding test results, is important in continuous integration. In a development tool supporting continuous integration, it should be simple to download the latest build, but also previous build versions. These builds should be directly usable in a production environment. Easy access to build versions simplifies the flexible adaptation of the knowledge base and thus improves the agility of the development project. The possibility to easily backtrack to older versions of the knowledge base makes potentially problematic development steps less risky and therefore more attractive for the contributors.

The CI Dashboard. In the previous paragraphs, we discussed fundamental practices and techniques for continuous integration. For its practical use, the results of the development process and its continuous integration need to be communicated to the knowledge engineers. In Software Engineering, tailored dashboards have been proposed as an effective visualization technique to track the development process. At any time, a dashboard shows the overall quality state of the current version of the entire project at one glance and supports further action if necessary:

• The creation of a new build can be started from the dashboard.
• It generates and displays a summary of the feedback generated for the latest build, e.g., the results of the test suite. Links to detailed reports are given.
• It provides access to previous builds and the corresponding changes.
• A weather report shows the health of the knowledge base considering the past period of development, i.e., an assessment of the latest builds with respect to their quality.

A CI dashboard makes the effects of the current development activities visible to everyone involved in the development process, i.e., knowledge engineers, domain specialists, or end users.

In summary, the use of continuous integration provides evident benefits for the development of knowledge bases:

• Reduced risk during the development and evolution of a knowledge base: Thanks to frequent integration and automated tests, it becomes easier to detect problems early, for instance, when the latest changes have created anomalies. The reduction of risks allows for an agile process model, where changes can be flexibly integrated during the development. In case of problems, the frequent integration and tests uncover them quickly, which makes debugging much easier.
• With continuous integration it is possible to provide a running system at any time of the development process. With every integration, a production build is created and archived by a version control system. That way, independently of the current development stream, a reliable version of the system can always be downloaded and used when necessary.
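The notion of an automated test used in this section (expected result known a priori, no manual interaction, automatic comparison, plain success or failure) can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the names `ExpectedCase`, `run_suite`, and `toy_knowledge_base`, as well as the single toy rule, are assumptions and do not reflect the API or knowledge representation of any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: findings map an input name to its value.
Findings = Dict[str, str]


@dataclass(frozen=True)
class ExpectedCase:
    """One automated test: inputs plus the result known a priori."""
    name: str
    findings: Findings
    expected: str


def run_suite(derive: Callable[[Findings], str],
              cases: List[ExpectedCase]) -> List[str]:
    """Run every case without interaction; return names of failed cases."""
    return [case.name for case in cases
            if derive(case.findings) != case.expected]


def toy_knowledge_base(findings: Findings) -> str:
    """Invented one-rule knowledge base, standing in for a real build."""
    if findings.get("pressure") == "low" and findings.get("leak") == "yes":
        return "replace seal"
    return "no action"


SUITE = [
    ExpectedCase("leak detected",
                 {"pressure": "low", "leak": "yes"}, "replace seal"),
    ExpectedCase("normal operation",
                 {"pressure": "normal", "leak": "no"}, "no action"),
]

failed = run_suite(toy_knowledge_base, SUITE)
# In this sketch, a build counts as successful if no test failed.
build_successful = not failed
print("build succeeded" if build_successful else f"build failed: {failed}")
```

Because the suite returns only the names of failed cases, it can be run after every change, and its result can be passed directly to whatever reporting mechanism a project uses.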

In the following section, we propose a Semantic Wiki as a Knowledge Engineering tool that integrates CI practices quite intuitively.

3. CI WITH KNOWWE

Semantic Wikis [9, 10] extend the concept of usual wikis [11]: besides informal content (e.g., text, figures), they also capture formalized relations of domain knowledge. These formalized relations are inserted into the wiki by explicit annotation methods, for instance certain markup or input forms; the annotations form an (executable) knowledge base. That way, a Semantic Wiki serves as a collaborative Knowledge Engineering tool. The well-known wiki interface provides a large degree of freedom in structuring, and it allows for a workflow of incremental formalization.

Figure 1: Continuous integration in the Knowledge Engineering tool KnowWE. (Diagram labels: Knowledge Engineers, Edit Knowledge, KnowWE, Version Control, CI Plugin, Compile Knowledge, Integrate, Run Tests, Visualize Feedback, Knowledge Bases, Builds, CI Feedback.)

In the following, we present the continuous integration support of the Semantic Wiki KnowWE [12]. Figure 1 sketches the workflow of continuous integration within the Knowledge Engineering tool KnowWE. On the left, one can see the contributors editing the knowledge using a standard web browser. The wiki server is provided with version control and a CI plugin that allows for the continuous integration of knowledge bases. The initial continuous integration tool was published as a KnowWE plugin in August 2010 and refined and extended in the following months. It is contained in the default installation and has already been used in a number of academic and industrial applications in technical and medical domains. The tool can be configured easily to support tailored quality management for the respective project. Registered automated tests are performed on new states of the wiki knowledge base and give verbose feedback to the knowledge engineers through an integrated dashboard. The tests can be configured with three trigger modes: on page-save (every time a wiki article is saved), on a predefined schedule (for instance, nightly builds and tests), and on demand (build and test are started manually).

Figure 2: The continuous integration dashboard in KnowWE.

Failures are reported on the dashboard of the continuous integration plugin, which is shown in Figure 2. At any time, this dashboard displays the current state of the wiki knowledge base with respect to quality at one glance. It provides information on the modifications causing a certain problem as well as links to the content sections in the wiki where the discovered problem is located. Due to the instant compilation of changes to the knowledge base, interface errors (syntax errors) and object dependency errors in KnowWE are always discovered immediately after page-save actions and are also listed in the dashboard. A green ball indicates that the latest build succeeded (a red ball is displayed otherwise). A build is successful if no errors, anomalies, or failed tests are detected by the automated building and testing process.

Figure 3: The trend of the knowledge quality considering the recent development is visualized by different weather icons.

The trend of the knowledge base quality considering the recent development progress is visualized as a "weather report", shown in Figure 2-(1). It summarizes the quality of the sequence of the most recent builds. Figure 3 shows the different weather states used to represent the latest developments. If fewer than 20% of the recent builds were successful, thunder (a) is displayed. For a series of fully successful builds a sun (e) is displayed. The remaining weather states run in 20%-steps through rain (b), clouds (c), and sunny clouds (d).

The history of builds is listed in Figure 2-(2): older builds can be inspected by clicking on the build number, for instance, because the developer wants to check the reason for a build problem. For the selected build, the applied tests are shown in the center of the dashboard, see Figure 2-(3). In case of errors, the test gives a detailed report on the error here as well and provides links for further investigation and debugging. Also, one can see which modifications have been made with respect to the previous version using the built-in version control system of the wiki.

The KnowWE dashboard provides the central access point for obtaining the benefits of continuous integration during the development process. It makes the current state and the recent developments visible to every person involved in the development process. Further, it serves as a starting point for the targeted handling of problems when they occur.

However, development using continuous integration is only reasonable if the coverage and quality of the applied tests are sufficient. For this reason, the design of the continuous integration plugin allows for the simple integration of new (possibly project-specific) tests into the continuous integration workflow. As a basic component, KnowWE supports the definition of test suites for the given knowledge base. There, (sequential) test cases are defined within the wiki and attached to the continuous integration workflow by specifying a trigger mode.
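The weather report described in this section maps the success rate of the most recent builds to one of five icons. The following Python sketch shows one possible reading of that mapping; it is not the plugin's implementation. The text fixes only the two extremes (thunder below 20%, sun for fully successful builds), so the boundaries between rain, clouds, and sunny clouds used here (20%, 40%, 60%) are an assumption, as are the function name `weather_state` and the representation of a build history as a list of booleans. How many builds count as "recent" is also not stated and is left to the caller.

```python
from typing import Sequence


def weather_state(recent_builds: Sequence[bool]) -> str:
    """Map recent build outcomes (True = successful) to a weather icon name.

    Thresholds for the three middle states are assumed, not taken
    from the paper; only "thunder" and "sun" are specified there.
    """
    total = len(recent_builds)
    if total == 0:
        raise ValueError("at least one build is required")
    successes = sum(1 for ok in recent_builds if ok)

    # Integer comparisons avoid floating-point rounding at the boundaries.
    if successes * 5 < total:          # fewer than 20% successful
        return "thunder"
    if successes == total:             # every recent build succeeded
        return "sun"
    if successes * 5 < 2 * total:      # assumed: 20% up to below 40%
        return "rain"
    if successes * 5 < 3 * total:      # assumed: 40% up to below 60%
        return "clouds"
    return "sunny clouds"              # assumed: 60% up to below 100%


if __name__ == "__main__":
    history = [True, True, False, True, True]  # hypothetical build history
    print(weather_state(history))
```

Keeping the mapping in a single pure function makes the thresholds easy to adjust once the exact boundaries of a given dashboard are known.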

4. CONCLUSIONS

We introduced the application of continuous integration practices in Knowledge Engineering, and we argued that it can significantly improve the Knowledge Engineering process. Continuous integration is an effective set of practices to support agile development processes. While agile methodologies already play a major role in Software Engineering, they have also been a focus of research in Knowledge Engineering. Thus, we forecast that continuous integration will also emerge as a standard technique in practical Knowledge Engineering. In principle, every Knowledge Engineering tool is able to include support for continuous integration when the following preconditions are fulfilled:

1. Support for automated tests: The tool needs to be able to execute automated tests, so that the effects of knowledge modifications can be tracked.
2. Version control: Changes to the knowledge base need to be managed by a version control system, so that undesired effects can be backtracked easily.
3. A dashboard: The current state of the knowledge base needs to be visualized, and access to older versions of the knowledge base is necessary.
4. Easy deployment of the production system: The tool needs to guarantee that a stable (potentially older) version of the knowledge base in a ready-to-deploy format is available at any time.
5. (optional) Distributed access to the knowledge base: The development effectiveness can be improved when the tool supports distributed access to the knowledge base.

Today, many state-of-the-art tools already support most of these preconditions.

5. REFERENCES

[1] Schreiber, G., Akkermans, H., Anjewierden, A., de Hoog, R., Shadbolt, N., de Velde, W.V., Wielinga, B.: Knowledge Engineering and Management - The CommonKADS Methodology. 2 edn. MIT Press (2001)
[2] Baumeister, J., Seipel, D., Puppe, F.: Agile development of rule systems. In Giurca, Gasevic, Taveter, eds.: Handbook of Research on Emerging Rule-Based Languages and Technologies: Open Solutions and Approaches. IGI Publishing (2009)
[3] Cockburn, A.: Agile Software Development. Addison-Wesley (2002)
[4] Duvall, P., Matyas, S.M., Glover, A.: Continuous Integration: Improving Software Quality and Reducing Risk. Addison-Wesley (2007)
[5] Fowler, M.: Continuous integration: http://martinfowler.com/articles/continuousintegration.html
[6] Baumeister, J.: Advanced empirical testing. Knowledge-Based Systems 24(1) (2011) 83–94
[7] Preece, A., Shinghal, R.: Foundation and application of knowledge base verification. International Journal of Intelligent Systems 9 (1994) 683–702
[8] Beck, K., Andres, C.: Extreme Programming Explained: Embrace Change (2nd Edition). Addison-Wesley, Boston (2004)
[9] Schaffert, S., Bry, F., Baumeister, J., Kiesel, M.: Semantic wikis. IEEE Software 25(4) (2008) 8–11
[10] Krötzsch, M., Vrandečić, D., Völkel, M.: Semantic MediaWiki. In: ISWC'06: Proceedings of the 5th International Semantic Web Conference, LNAI 4273, Berlin, Springer (2006) 935–942
[11] Leuf, B., Cunningham, W.: The Wiki Way: Quick Collaboration on the Web. Addison-Wesley, New York (2001)
[12] Baumeister, J., Reutelshoefer, J., Puppe, F.: KnowWE: A semantic wiki for knowledge engineering. Applied Intelligence (2011)
