TEAM: Towards a Software Engineering Semantic Web

Hans-Jörg Happel
FZI Research Center for Information Technologies
Karlsruhe, Germany

Walid Maalej
Technische Universität München
Munich, Germany

Ljiljana Stojanovic
FZI Research Center for Information Technologies
Karlsruhe, Germany

ABSTRACT

Large software development projects are complex endeavours that involve numerous participants who may work across several sites and act in various roles. Each participant produces and consumes information relevant for the success of the project. In such settings, an effective and efficient allocation of knowledge is a hard challenge, especially if there is no central authority that enforces standards for the whole ecosystem. We consider semantic technologies an important enabler for improving information and knowledge sharing in such scenarios, because they help to exchange and interconnect software engineering knowledge across the web. In this paper, we describe the corresponding vision of a Software Engineering Semantic Web and the role of intelligent IDEs in benefiting from and contributing to it.

Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management—Programming teams; H.3.5 [Information Storage and Retrieval]: On-line Information Services—Data sharing

General Terms
Design, Management

1. INTRODUCTION

The development of software systems is a complex and knowledge-intensive activity [12, 28]. First, people with different backgrounds and expertise levels participate in roles such as end-users, requirements engineers, developers and managers. Each participant produces and consumes information, and accumulates experience and knowledge in specific areas. Second, changing requirements and evolving functionality result in shorter turn-around time for knowledge about both the application domain and system components. Third, working effectively with powerful frameworks or successfully employing a design pattern requires experience with the artifacts used. New versions and technologies in the solution domain are released more frequently, and experience with old versions ages rapidly.

Although these issues could be addressed by pragmatic tool support, coordination and knowledge sharing have long been side aspects in software engineering. For example, while modularity has been treated as a "technical" engineering concept [11], recent work has shown its importance for coordination and knowledge sharing [10]. Likewise, configuration management systems have been designed from a technical point of view, although they also strongly influence team collaboration [23].

This lack of attention to collaboration and knowledge exchange concerns has an impact on the efficiency of software development projects. Empirical studies have shown that technically driven design decisions influence coordination and knowledge sharing in development teams, which can in turn decrease their productivity [17]. A typical example of a lack of knowledge sharing is illustrated in [1], which describes the resolution of a bug in an Open Source community. Although fixing the bug required changing only one line of code, the bug was filed for three months, and the actual fix took almost one month. During this time, several discussions took place, with partly duplicated information.

Such examples show that software engineering researchers and practitioners should focus on coordination and knowledge sharing, especially when informal "hallway" communication is not possible due to the distribution of team members. Accordingly, an increasing interest in human issues such as coordination and knowledge sharing can be observed in the software engineering community in recent years [16, 26].

In the following section, we briefly summarize some existing work in collaborative software development. Afterwards, we describe the TEAM¹ approach towards knowledge sharing in distributed development teams. To this end, we introduce the notion of a Software Engineering Semantic Web and present the TEAM concepts for realizing this idea.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHASE’08, May 13, 2008, Leipzig, Germany. Copyright 2008 ACM 978-1-60558-039-5/08/05 ...$5.00.

¹ This work was partly supported by the European Commission (IST-35111-TEAM). http://www.team-project.eu

2. RELATED WORK

While the aspect of coordination has been addressed by a number of research approaches, pragmatic tool support is still limited [8]. Cook and Churcher propose the CAISE architecture [8] to assist the construction of such tools. This architecture allows isolated programmers to work collaboratively without sacrificing communication. CAISE-based tools achieve this by keeping all programmers within a group synchronized in real-time. They also provide customizable user awareness and project state information.

The Augur system [13] simultaneously visualizes the structure of a software system and the development process. By looking at how developers worked together on which parts of a system, users can tell how relationships between artifacts and developers have changed over time. Similarly, the Ariadne system highlights the social dependencies which underlie technical artifacts [24]. In [21] the authors describe an approach to observe and analyze collaborative activities in online groups by merging three data sources: communication artifacts, topical relationships and material relationships. This hybrid network shows how the simultaneous visualization of heterogeneous data reveals collaboration patterns which would not have been visible using social networks exclusively.

IBM Jazz [7] addresses collaboration needs by integrating communication and awareness functionalities into the development environment. Integrating collaboration brings the payoff of reduced friction in the development process, a greater sense of context, and immediate traceability between collaboration artifacts and system artifacts. A similar approach presented by Brügge et al. focuses on the modeling activities of software engineering [5].

While collaboration aspects are increasingly integrated into software development environments, the exchange of knowledge and expertise in development teams remains unsupported beyond communication tools. Existing solutions for knowledge sharing in software engineering are mostly based on centralized approaches. The Experience Factory [3], for example, embodies the assumption that sharing and using knowledge requires a process of knowledge extraction and refinement, whose aim is to eliminate all subjective and contextual aspects of knowledge, and to create an objective and general representation that can then be reused by other people in a variety of situations. More lightweight knowledge sharing approaches such as Wikis are more popular, but also lack direct integration into the working processes and context of developers [6].

Recent work suggests formal knowledge representation models (so-called ontologies) as a basis for leveraging software engineering knowledge. Witte et al. [27] describe an approach that helps to query artifacts for software maintenance tasks. The Dhruv system [1, 2] aims to assist the bug resolution process by recommending relevant information during bug inspection. To this end, Dhruv is integrated in a web-based bug tracking system, where it displays recommendations in a sidebar. Recommendations may involve source code files, mailing list discussions or similar bug reports. While nicely integrated into the working environment, Dhruv provides a central, server-based solution which is specific to bug resolution.

However, the underlying concept of semantic technologies is well-suited to support more decentralized knowledge sharing scenarios. In the remainder of this paper, we extend these ideas towards the concept of a Software Engineering Semantic Web. We show that intelligent integrated development environments [29] can play a key role in realizing this concept.

3. SOFTWARE ENGINEERING SEMANTIC WEB

The Semantic Web is a vision of leveraging more efficiently the knowledge that already exists on the Internet today. It focuses on the development of techniques that allow machines to better "understand" internet sites and thus help people in satisfying their information needs [4].

The vision of the "Semantic Wikipedia" [25] is a good example to illustrate this. The current Wikipedia contains a vast amount of encyclopedic knowledge that makes it possible, for example, to solve the task "find all ACM Turing Award winners born outside the US". However, it will take a human around 10 minutes to access the list of all Turing Award winners, visit each winner’s page and check where they were born. Although explicit relations (i.e. hyperlinks between the Turing Award and its winners) capture this knowledge, it requires a human to interpret the semantics of these relations. The idea of the Semantic Wikipedia is to provide means for expressing semantic information which goes beyond the expressivity of natural language. Formal relations such as "hasBeenAwarded" can be used to annotate the relation between the award and its winner. Thus, although the knowledge remains the same from a human point of view, machines can then interpret the information stated in the Wikipedia and thus help to answer our initial or even more complex questions automatically.

This kind of semantic information can be expressed by using formal models called ontologies. According to a widespread definition, "an ontology is an explicit specification of a shared conceptualization" [14], i.e. it captures information in a common, formal and machine-understandable fashion. Ontologies provide formal vocabularies which help to integrate and map semantically equal but syntactically different information. The vision of interconnected ontologies which capture large amounts of information on the Internet is called the Semantic Web [4]. Recently, standards for ontology representation such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL) have been endorsed by the W3C. While the creation of such formal models requires some effort and expertise, the case of the Semantic Wikipedia illustrates that such investments can yield large benefits. Ontologies have also attracted increasing attention in the Software Engineering community [HS06].

Similar to the encyclopedic knowledge in the Wikipedia, the internet also contains lots of information related to software projects. Large Open Source development portals such as SourceForge² contain various kinds of information, e.g. about software releases or bugs. Going beyond Open Source projects, the "hidden web" of intranets and company networks hosts even more such information.

² http://www.sourceforge.net

Since knowledge in software development projects is scattered across various people, systems, formats and spaces, the creation of a "Software Engineering Semantic Web" could help software developers to share a common format for integration and combination of data drawn from diverse sources. However, as in the case of the Wikipedia, most of this knowledge is explicit, but not machine-understandable. To provide information access capabilities similar to those of the Semantic Wikipedia, a Software Engineering Semantic Web requires the following building blocks:

• Means for leveraging the knowledge encoded by existing artifacts such as source code, documents or bug descriptions.

• Means for leveraging artifact metadata such as feature descriptions, compatible platforms or license information.

• Means for the simple and user-friendly authoring of additional semantic knowledge.

While there are first promising attempts in this direction (see e.g. section 2 or the DOAP ontology³), the effective usage, creation and exchange of such semantic data remains an open issue. In the following section we describe how the TEAM system provides such functionality to support knowledge sharing in distributed teams.

4. KNOWLEDGE SHARING WITH TEAM

The TEAM project develops extensions for IDEs which help to improve knowledge sharing in distributed development teams by applying semantic technologies.⁴ TEAM focuses on knowledge which is relevant during the implementation phase of software engineering, such as for reuse (e.g. how to use a certain library) or error handling (e.g. how to resolve a certain bug). In the following, we describe the conceptual building blocks and core features of the TEAM system.

TEAM provides means to leverage existing information related to reuse and error handling, such as entries from issue tracking systems, source code, software components and written documents. Ontologies capture the internal structure of these artifacts as well as different semantic interrelations among them. An information extraction component creates semantic representations of the artifacts and employs heuristics to infer additional, implicit relations among the artifacts (see e.g. [9, 1]).

For the easy acquisition of semantic knowledge, TEAM provides user-friendly lightweight semantic authoring and annotation. It is based on the concept of Semantic Wikis, which allow users to annotate formal relations among different entities. Further benefits of Semantic Wikis in a software engineering context are described in [20].

Besides explicit knowledge from development artifacts and written documents, TEAM also captures implicit knowledge from the developer’s interactions inside the IDE. Tools like Mylyn [19] have shown that those user interactions are a valuable source of implicit knowledge. While Mylyn records low-level user interactions, the TEAM context subsystem also employs statistical techniques to mine and analyze historic data in order to derive user activities and to discover new knowledge patterns.

The knowledge resources described so far are accumulated on the local machine of a software developer. While this can already have some benefit for information management, the full potential of a Software Engineering Semantic Web unfolds only when developers are enabled to share their knowledge within a project team or even in the larger scope of the web. This can be achieved by Peer-to-Peer based metadata management, which allows every node to become a publisher of semantic data that can answer basic structured queries. This is complemented by access control [22] as well as sharing mechanisms that assist developers in identifying knowledge items worth sharing [15].

To access the knowledge inside the Software Engineering Semantic Web which has been woven so far, TEAM finally provides semantic search and recommendation functionalities. They help developers to query for information and recommend knowledge which is relevant in the current situation of the developer. In contrast to existing approaches such as Hipikat [9] or Strathcona [18], our search and recommendation component relies heavily on the monitoring component, which provides rich information about the user profile. Thus, our architecture provides a flexible infrastructure for semantically matching different user contexts with suitable assistance such as relevant bug resolution hints, documents, example code or colleagues who are able to help.
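The extraction and query steps described in this section can be illustrated with a small sketch. It is not the TEAM implementation: the triple layout, the `se:` predicate names, the sample artifacts and the bug-ID heuristic are all assumptions made up for this illustration. The sketch represents artifacts as subject-predicate-object triples, infers an implicit relation between a commit and a bug report when the commit message mentions the bug number, and answers a basic structured query by pattern matching.

```python
import re
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical artifact descriptions, as an extraction component might emit them.
triples: Set[Triple] = {
    ("bug:42", "rdf:type", "se:BugReport"),
    ("bug:42", "se:summary", "NullPointerException in parser"),
    ("commit:a1f3", "rdf:type", "se:Commit"),
    ("commit:a1f3", "se:message", "Fix NPE in parser, see bug 42"),
    ("commit:a1f3", "se:modifies", "file:Parser.java"),
    ("commit:b7c9", "rdf:type", "se:Commit"),
    ("commit:b7c9", "se:message", "Update build script"),
    ("commit:b7c9", "se:modifies", "file:build.xml"),
}

# Assumed convention: commit messages refer to bugs as "bug 42" or "bug #42".
BUG_REF = re.compile(r"\bbug\s+#?(\d+)\b", re.IGNORECASE)


def infer_addresses_links(store: Set[Triple]) -> Set[Triple]:
    """Heuristic: link a commit to every known bug its message mentions."""
    known_bugs = {s for (s, p, o) in store
                  if p == "rdf:type" and o == "se:BugReport"}
    inferred: Set[Triple] = set()
    for s, p, o in store:
        if p != "se:message":
            continue
        for number in BUG_REF.findall(o):
            bug = f"bug:{number}"
            if bug in known_bugs:
                inferred.add((s, "se:addresses", bug))
    return inferred


def match(store: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(t for t in store
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


def files_touched_for_bug(store: Set[Triple], bug: str) -> List[str]:
    """Structured query: which files were modified by commits addressing the bug?"""
    commits = [t[0] for t in match(store, p="se:addresses", o=bug)]
    return sorted({t[2] for c in commits
                   for t in match(store, s=c, p="se:modifies")})


store = triples | infer_addresses_links(triples)
print(files_touched_for_bug(store, "bug:42"))  # prints ['file:Parser.java']
```

The query joins an inferred relation with an extracted one, which is the same mechanism that lets a machine answer the Turing Award question from section 3 once relations such as "hasBeenAwarded" are stated formally. A real system would more likely use RDF and a standard query language than an in-memory set of tuples.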

5. CONCLUSION AND OUTLOOK

The issues of team collaboration and knowledge sharing in software development teams have not been a first-order concern of software engineering tools for a long time. Recently, tools such as IBM Jazz introduced collaboration into the IDE and thus enable software developers to become a communication hub in their "web of co-developers". In this paper, we sketched the vision of a Software Engineering Semantic Web which goes even further by making developers "knowledge hubs", which can share knowledge with other developers in public or private webs. We also described how the extension of current IDEs can enable developers to benefit from and contribute to this Software Engineering Semantic Web. The TEAM project develops an open source prototype for decentralized and context-sensitive knowledge sharing in distributed software development teams. It provides a foundation for realizing a self-adapting, knowledgeable software development environment that a) learns developers’ knowledge and preferences from their behavior, b) discovers their information needs proactively and c) delivers the right information support in the right context automatically.

³ http://usefulinc.com/doap/

⁴ Within the project, a prototype will be developed for the Eclipse IDE.

6. REFERENCES

[1] Anupriya Ankolekar. Towards a Semantic Web of Community, Content and Interactions. PhD thesis, School of Computer Science, Carnegie Mellon University, 09 2005.
[2] Anupriya Ankolekar, Katia Sycara, James Herbsleb, Robert Kraut, and Chris Welty. Supporting online problem-solving communities with the semantic web. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 575–584, New York, NY, USA, 2006. ACM.
[3] Victor R. Basili, Gianluigi Caldiera, and H. Dieter Rombach. Experience Factory. In John J. Marciniak, editor, Encyclopedia of Software Engineering, volume 1, pages 469–476. John Wiley & Sons, 1994.
[4] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic Web. Scientific American, 284(5):34–43, May 2001.
[5] Bernd Bruegge, Allen H. Dutoit, and Timo Wolf. Sysiphus: Enabling informal collaboration in global software development. In ICGSE ’06: Proceedings of the IEEE international conference on Global Software Engineering, pages 139–148, Washington, DC, USA, 2006. IEEE Computer Society.
[6] Thomas Chau and Frank Maurer. A case study of wiki-based experience repository at a medium-sized software company. In K-CAP ’05: Proceedings of the 3rd international conference on Knowledge capture, pages 185–186, New York, NY, USA, 2005. ACM.
[7] Li-Te Cheng, Cleidson R.B. de Souza, Susanne Hupfer, John Patterson, and Steven Ross. Building collaboration into ides. Queue, 1(9):40–50, 2004.
[8] Carl Cook and Neville Churcher. Constructing real-time collaborative software engineering tools using caise, an architecture for supporting tool development. In ACSC ’06: Proceedings of the 29th Australasian Computer Science Conference, pages 267–276, Darlinghurst, Australia, Australia, 2006. Australian Computer Society, Inc.
[9] Davor Cubranic, Janice Singer, and Kellogg S. Booth. Hipikat: A project memory for software development. IEEE Trans. Softw. Eng., 31(6):446–465, 2005.
[10] Cleidson R. B. de Souza, David Redmiles, Li-Te Cheng, David Millen, and John Patterson. How a good software practice thwarts collaboration: the multiple roles of apis in software development. SIGSOFT Softw. Eng. Notes, 29(6):221–230, 2004.
[11] Premkumar Devanbu, Bob Balzer, Don Batory, Gregor Kiczales, John Launchbury, David Parnas, and Peri Tarr. Modularity in the new millenium: a panel summary. In ICSE ’03: Proceedings of the 25th International Conference on Software Engineering, pages 723–724, Washington, DC, USA, 2003. IEEE Computer Society.
[12] Gerhard Fischer and Matthias Schneider. Knowledge-based communication processes in software engineering. In ICSE ’84: Proceedings of the 7th international conference on Software engineering, pages 358–368, Piscataway, NJ, USA, 1984. IEEE Press.
[13] Jon Froehlich and Paul Dourish. Unifying artifacts and activities in a visual tool for distributed software development teams. In ICSE ’04: Proceedings of the 26th International Conference on Software Engineering, pages 387–396, Washington, DC, USA, 2004. IEEE Computer Society.
[14] Thomas R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199–220, 1993.
[15] Hans-Jörg Happel, Ljiljana Stojanovic, and Nenad Stojanovic. Fostering knowledge sharing by inverse search. In K-CAP ’07: Proceedings of the 4th international conference on Knowledge capture, pages 181–182, New York, NY, USA, 2007. ACM.
[16] James D. Herbsleb. Global software engineering: The future of socio-technical coordination. In FOSE ’07: 2007 Future of Software Engineering, pages 188–198, Washington, DC, USA, 2007. IEEE Computer Society.
[17] James D. Herbsleb and Audris Mockus. Formulation and preliminary test of an empirical theory of coordination in software engineering. SIGSOFT Softw. Eng. Notes, 28(5):138–147, 2003.
[18] Reid Holmes, Robert J. Walker, and Gail C. Murphy. Approximate structural context matching: An approach to recommend relevant examples. IEEE Transactions on Software Engineering, 32(12):952–970, 2006.
[19] Mik Kersten and Gail C. Murphy. Using task context to improve programmer productivity. In Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering, pages 1–11, New York, NY, USA, 2006. ACM.
[20] Walid Maalej, Dimitris Panagiotou, and Hans-Jörg Happel. Towards effective management of software knowledge exploiting the semantic wiki paradigm. In Proceedings of the Software Engineering 2008 (SE’08), volume 121 of LNI, pages 183–197. GI, 2008.
[21] Yevgeniy "Eugene" Medynskiy, Nicolas Ducheneaut, and Ayman Farahat. Using hybrid networks for the analysis of online software development communities. In CHI ’06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pages 513–516, New York, NY, USA, 2006. ACM.
[22] Narendula Rammohan. A note on secure p-grid. Technical Report LSIR-REPORT-2008-005, Swiss Federal Institute of Technology (EPFL), 2006.
[23] Anita Sarma, Zahra Noroozi, and André van der Hoek. Palantir: raising awareness among configuration management workspaces. In ICSE ’03: Proceedings of the 25th International Conference on Software Engineering, pages 444–454, Washington, DC, USA, 2003. IEEE Computer Society.
[24] Erik Trainer, Stephen Quirk, Cleidson de Souza, and David Redmiles. Bridging the gap between technical and social dependencies with ariadne. In Proceedings of the 2005 OOPSLA workshop on Eclipse technology eXchange, pages 26–30, New York, NY, USA, 2005. ACM.
[25] Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller, and Rudi Studer. Semantic wikipedia. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 585–594, New York, NY, USA, 2006. ACM.
[26] Jim Whitehead. Collaboration in software engineering: A roadmap. In FOSE ’07: 2007 Future of Software Engineering, pages 214–225, Washington, DC, USA, 2007. IEEE Computer Society.
[27] René Witte, Yonggang Zhang, and Juergen Rilling. Empowering Software Maintainers with Semantic Web Technologies. In Proceedings of the European Semantic Web Conference, ESWC2007, volume 4519 of Lecture Notes in Computer Science. Springer, July 2007.
[28] Yunwen Ye. Supporting software development as knowledge-intensive and collaborative activity. In WISER ’06: Proceedings of the 2006 international workshop on Workshop on interdisciplinary software engineering research, pages 15–22, New York, NY, USA, 2006. ACM.
[29] Andreas Zeller. The future of programming environments: Integration, synergy, and assistance. In FOSE ’07: 2007 Future of Software Engineering, pages 316–325, Washington, DC, USA, 2007. IEEE Computer Society.
