Use case to source code traceability: The developer ... - CiteSeerX

Use case to source code traceability: The developer navigation view point

Inah Omoronyia, Guttorm Sindre †
Dept. of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway

Marc Roper, John Ferguson, Murray Wood ‡
Dept. of Computer and Information Sciences, University of Strathclyde, Glasgow, Scotland

{inah1, Guttorm.Sindre}@idi.ntnu.no †, {john.ferguson, marc.roper, murray.wood}@cis.strath.ac.uk ‡

Abstract

Requirements traceability is a challenge for modern software projects where task dependencies and technical expertise are spread across system developers, abstract model representations such as use cases, and a myriad of code artefacts. This paper presents an approach that monitors the navigation trails left by developers when building code artefacts to realise project use cases. These trails are analysed to generate a relevance ranking of the entities that constitute a traceability link between use cases, code artefacts and the developers responsible for them. Investigation in a software development scenario shows that a range of use case traceability questions can be answered through visualisations which present ordered relevance lists of the entities associated with use cases, and through trace graphs in which the size of a node shows the importance, or ‘information centrality’, of a system entity.

1. Introduction

Use cases have become a widespread technique for capturing system functional requirements. Although traceability between use cases and the source code artefacts executing them has been shown to be beneficial, the identification and addition of traceability information remains a challenging and laborious task [4]. One main reason for this challenge is the interdependency that exists amongst use cases, developers and code artefacts. Traceability involves knowledge acquisition and sharing in a setting where task dependencies and technical expertise are spread across multiple developers, use cases and code artefacts [1]. It is necessary to understand the extent of relevance of each entity associated with an existing trace link based on the context of a selected development activity. With such trace links, answers can be given to trace related questions such as: Which code artefact has most affected the state or current condition of the use case? On which code artefact or use case has a developer worked most? Who is the appropriate individual to seek for help on a code artefact or use case? The importance of such questions is identified by the study of Ko et al. [7] on information needs in collocated software development teams, and Sillito et al. [10] on the nature of questions raised during a program change task.

The aim of this research is to identify trace links that relate code artefacts and developers to use cases, and the relative importance of such links, based on developer activities during software development. Trace links connecting code artefacts and developers to use cases are formed by monitoring events initiated by a developer working on a code artefact in the context of a use case. In this approach, relative importance, or relevance, is based on a number of factors: the kind of action a developer performs (e.g. create, edit or view); how often the action is performed; and how dominant the entities are, that is, how many other entities they are already associated with (their sphere of influence in the system).

The potential of this approach to traceability between use cases and relevant developers or code artefacts is demonstrated through a software development scenario explored using a prototype implementation. By capturing trails of interaction events as code is developed in the context of a use case, relevance rankings are automatically generated. These rankings allow the system to show the relative importance of both code artefacts and developers to system use cases. A historical view allows changes in relevance to be tracked across the project lifetime. Finally, a ‘trace graph’ view based on network analysis shows a high level perspective of the overall importance of each use case, developer and code artefact to a software project.

2. Example Scenario

Bill, Amy and Ruben are members of a team collaborating to achieve an online cinema ticketing system called TickX. There are two front-end use cases required to accomplish TickX: Purchase Tickets and Browse Movies. In addition, there will be some use cases for system administrators which are not included here. A number of code artefacts are being developed to achieve TickX, including Ticket.java, Customer.java, Account.java, Booking.java, Movie.java, MovieCatalog.java, and Cinema.java. While Amy and Bill have been collaborating to implement the Purchase Tickets use case, Ruben has been responsible for the Browse Movies use case. The following interaction trails were observed as these collaborators worked on their associated use cases:

• While Amy worked on Purchase Tickets she created and updated the Account.java and Customer.java code artefacts. She viewed and updated Booking.java a number of times. She also viewed MovieCatalog.java and Cinema.java.
• In Bill’s early work on the Purchase Tickets use case, he viewed Account.java and MovieCatalog.java. This was subsequently followed by his creation and update of Ticket.java and Booking.java.
• Ruben’s work on the Browse Movies use case involved the creation and update of MovieCatalog.java, Cinema.java and Movie.java. He also viewed Ticket.java a number of times.

In this scenario, the Purchase Tickets use case is associated with Bill, Amy and a number of code artefacts. Also, MovieCatalog.java is associated with the three collaborators as well as the two use cases. Traceability questions that can arise from such associations include: Which code artefacts have most affected the state of Purchase Tickets? Who are the appropriate developers to seek help from on Purchase Tickets? Which use cases has Amy spent most effort on? Which use cases does MovieCatalog.java contribute to? The approach taken in this research is to monitor the developers and code artefacts that constitute the work context of each use case, and to derive their relevance to that use case.
The novelty of the proposed approach is a system that not only harvests traceability links automatically, but also indicates the extent of the relevance of the links and can generate a traceability link history over the project timeline.

3. The CRI Model

This paper adapts a CRI (Continuum of Relevance Index) model [9] to address key traceability questions such as those mentioned in the previous section. It is first proposed that traceability between use cases, code artefacts and developers can be enhanced by capturing interaction events within a collaboration space. These are captured as developers go about their daily activities defining, updating and implementing the code associated with use cases, leaving historical traces behind. The basis of the CRI model is the monitoring of core interactions with code artefacts such as views, updates, creates and deletes within a collaboration space. The model is used to provide relevance rankings that depend on the work context of the developer. Relevance rankings of developers and code artefacts are then provided for specific use cases. Thus, a developer can be identified as being highly relevant to the current state of a particular use case, but irrelevant to the state of another use case, though all such use cases exist in the same collaboration space. The CRI model is able to capture these diverse relationships and reflect their relevance rankings.

The subset of entities considered by the model in a collaboration space includes projects, use cases, developers, and code artefacts. These entities are strongly inter-related. There exist many-to-many relationships between developers, use cases and artefacts, although no relations are currently supported for entity instances of the same type. Relationships between entities are established by the operations that a developer carries out.

Rather than monitoring the entire space of interactions that can occur, the CRI model focuses on a core set of four interaction types that influence the changing state of a software project: create, update, view and delete. A create event causes the manifestation of a code artefact within a collaboration space. Associated with an update is the update delta: the absolute difference in the number of characters associated with the code artefact before and after the event. A view event indirectly affects the state of artefacts, possibly enhancing the understanding needed to update the same artefact or other artefact instances. A delete event transforms an artefact to an intangible state, where it is unable to receive any further events. Deleted artefacts can later be viewed from a historical perspective.

During collaboration, different work contexts (associations between use case, developer and artefact entities) are formed. These work contexts are constantly changing in response to events, and entities may participate in many work contexts. Figure 1 shows example work contexts (represented as graphs) for Amy, Purchase Tickets and MovieCatalog.java.

Figure 1. Work context graphs.

Weightings are assigned to each interaction event type as shown in table 1. These were derived from the study of CVS records in real development projects [9], and are in line with related work by Fritz et al. [2] that identified the importance of the creator of code artefacts, and studies conducted by Zou and Godfrey [11] that suggested the need to distinguish between random and relevant view events. Thus viewing is weighted relatively lightly compared to creates and updates (updates being weighted by the size of the update in terms of the absolute number of characters changed).

Table 1. Interaction type weightings

Interaction type    Weighting factor
View                0.001
Update              0.0001 * δ
Create              0.01

δ: absolute update delta (magnitude of the update)
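The weightings of table 1 can be expressed as a small lookup function. The following Python sketch is illustrative only: the function and constant names are assumptions and not part of the CRI prototype, and because table 1 gives no weighting for delete events the sketch rejects them rather than guessing a value.

```python
# Weighting factors taken from Table 1; all names here are illustrative.
VIEW_WEIGHT = 0.001
CREATE_WEIGHT = 0.01
UPDATE_WEIGHT_PER_CHAR = 0.0001


def interaction_weight(kind, delta=0):
    """Return the Table 1 weighting for one interaction event.

    kind  -- "view", "update" or "create"
    delta -- signed change in character count (only used for updates)
    """
    if kind == "view":
        return VIEW_WEIGHT
    if kind == "create":
        return CREATE_WEIGHT
    if kind == "update":
        # The update delta is an absolute value, so shrinking a file counts too.
        return UPDATE_WEIGHT_PER_CHAR * abs(delta)
    # Table 1 lists no weighting for delete, so it is not guessed here.
    raise ValueError(f"no weighting defined for interaction type {kind!r}")
```

Under these weightings an update that changes 50 characters carries roughly the weight of five view events, or half that of a create event.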

In CRI it is assumed that the size of an entity’s work context, or the number of other entities on which it exerts its presence, is proportional to its relative influence in the collaboration space. A use case implemented by several developers and artefacts is considered to hold more information about the state of a project than a use case associated with only a small number of developers and artefacts. This dimension is captured by the concept of sphere of influence (SOI). SOI is a general concept used to capture both geographic and semantic groupings, and provides a well defined boundary for interactions [5]. SOI indicates the region over which an entity exerts some kind of relevance (determined by interaction events) and is defined by its work context. The SOI ratio is used to represent the relative influence an entity exerts on the collaboration space. The SOI ratio of an entity is defined as the ratio of the number of unique entity instances directly associated with an entity (the size of its work context) to the number of unique entity instances in the whole collaboration space. For the motivating example the SOI ratio of Amy is 6/9 (entities in Amy’s work context / total number of entities: 2 use cases and 7 classes).

The concepts of development work context, interaction events, and SOI ratio form the basis of the CRI model. This research uses these constructs to rank requirements traceability links between use cases and code artefacts / developers. These traceability links show how the different entities that constitute a software project are contributing to the evolution of the project. They can also guide developers who need answers to trace related questions. CRI is a linear model that cumulatively builds the relevance values of entity instances as they are associated with interaction events and as their SOI ratios vary. This paper discusses how these cumulative relevance values are derived for the history mode. A more general application of CRI [9] also derives relevance values for the recent mode and can provide a perception of the relevance of the most recent trace links that have been formed.
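The SOI ratio defined above can be sketched in a few lines of Python using Amy’s work context from the example scenario. The function and variable names are illustrative assumptions. The denominator follows the 6/9 example, where the collaboration space seen from a developer consists of the use cases and code artefacts; how the denominator is formed for other entity types is not asserted here.

```python
def soi_ratio(work_context, collaboration_space):
    """Size of an entity's work context relative to the collaboration space."""
    if not collaboration_space:
        raise ValueError("collaboration space is empty")
    return len(work_context) / len(collaboration_space)


use_cases = {"Purchase Tickets", "Browse Movies"}
artefacts = {"Ticket.java", "Customer.java", "Account.java", "Booking.java",
             "Movie.java", "MovieCatalog.java", "Cinema.java"}

# Amy's work context in the example scenario: one use case and five artefacts.
amy_context = {"Purchase Tickets", "Account.java", "Customer.java",
               "Booking.java", "MovieCatalog.java", "Cinema.java"}

amy_soi = soi_ratio(amy_context, use_cases | artefacts)  # 6 entities out of 9
```

Because the work context is a set, repeated interactions with the same artefact do not enlarge the SOI ratio; only newly associated entities do.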

3.1. CRI History Mode

In the history mode the ranking of entities associated with traceability links is computed by linearly combining the relevance values associated with an entity in a selected work context. The relevance gained as a result of an interaction event is dependent on the type of interaction event and the SOI ratio of the selected entity work context. More formally, the cumulative relevance x gained by an entity instance i in response to an interaction is given by equation 1, where t is the type of interaction (possible values shown in table 1), s the SOI ratio and n the total number of interactions associated with i. Thus, the relevance value for entity i after n interactions is based upon its previous value plus the value of the last interaction multiplied by the SOI ratio of i. Values are assigned to entities in a selected work context. A ranking of entities based on their relevance values forms the relevance list for that work context.

$$x_i^{(n)} = x_i^{(n-1)} + t_a^{(n)} \ast s_i^{(n)} \tag{1}$$
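Equation 1 and the resulting relevance list can be sketched as a running sum per entity followed by a sort. This is a minimal Python illustration, not the prototype’s code: the function names are assumptions, each event is taken to arrive already paired with its table 1 weighting and the SOI ratio in force at that moment, and the sample values (in particular those for Account.java) are chosen purely for illustration.

```python
from collections import defaultdict


def accumulate(events):
    """Apply equation 1 to a chronological stream of events.

    Each event is (entity, weighting, soi_ratio); the entity's cumulative
    relevance grows by weighting * soi_ratio.
    """
    relevance = defaultdict(float)
    for entity, weighting, soi in events:
        relevance[entity] += weighting * soi
    return dict(relevance)


def relevance_list(relevance):
    """Order entities from most to least relevant."""
    return sorted(relevance.items(), key=lambda item: item[1], reverse=True)


# Illustrative events within one work context (values are assumptions).
events = [
    ("MovieCatalog.java", 0.001, 4 / 6),  # a view with SOI ratio 4/6
    ("Account.java", 0.01, 0.5),          # a create with SOI ratio 0.5
    ("MovieCatalog.java", 0.001, 0.6),    # a later view with SOI ratio 0.6
]

relevance = accumulate(events)
ranking = relevance_list(relevance)
```

With these sample values the two views of MovieCatalog.java sum to roughly 0.0013, so the single create on Account.java still ranks first, which reflects the heavier weighting given to creation.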

3.2. Implementation and Illustration

The CRI model was implemented as a client-server architecture, where the Eclipse IDE for each developer is a client and the model processing logic and storage of event data are handled on the server. The client monitors sequences of view, update, create and delete events executed within Eclipse (chosen because of its open plug-in architecture). Figure 4 is a snapshot of an Eclipse view of the visualisation component of CRI. Developers can open, activate and deactivate use cases through the popup menu labelled 3 in figure 4. The workflow requires that each time a developer is to carry out a coding activity, they log into CRI and activate an existing use case located in the central repository or create a new one. For each client workstation, only one use case can be active at any one time; working on another use case requires that the developer activates the new use case, which automatically deactivates the previous one. Similarly, an active code artefact is the current artefact being viewed, updated or created. Switching to another artefact automatically deactivates the previous artefact. This workflow enables cross-cutting relations amongst artefacts, developers and use cases since, over their lifetime, and as they are used to achieve different aspects of a project, each can be associated with any number of other instances. As interaction events carried out by the developer are traced to the work context of an active use case and artefact on the server, the cumulative relevance value of each entity instance involved in the work context is recalculated.
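The activation workflow described above (one active use case and one active artefact per client, each replaced on switching) can be pictured with a small client-side state object. The Python sketch below is hypothetical: the class, its methods and the event tuple layout are assumptions made for illustration and do not describe the actual Eclipse plug-in or its server protocol.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ClientSession:
    """Hypothetical per-workstation state for one logged-in developer."""
    developer: str
    active_use_case: Optional[str] = None
    active_artefact: Optional[str] = None
    # Each entry: (developer, use case, artefact, interaction type).
    log: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def activate_use_case(self, use_case: str) -> None:
        # Only one use case is active, so this implicitly deactivates the old one.
        self.active_use_case = use_case

    def record(self, kind: str, artefact: str) -> None:
        if self.active_use_case is None:
            raise RuntimeError("activate a use case before working on code")
        # Touching an artefact makes it the active one, replacing the previous.
        self.active_artefact = artefact
        self.log.append((self.developer, self.active_use_case, artefact, kind))


session = ClientSession("Bill")
session.activate_use_case("Purchase Tickets")
session.record("view", "Account.java")
session.record("view", "MovieCatalog.java")
```

Each logged tuple ties a developer, a use case and an artefact to one interaction type, which is the information a server would need in order to update work contexts and relevance values.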

Two visualisation techniques for the ranking of entities based on their relevance values are presented. The first is an ordered list of trace links using varying colour intensity. Entities with the highest relevance values are at the top of the list. The colour intensity indicates the relative difference in relevance values of entities in the ranking. This visualisation also includes the capability to replay the historical evolution of each traceability attribute. The second visualisation uses network analysis to generate 3-partite trace graphs. All edges (representing interactions) are between entities in different subsets. The attributes of each edge are specified using the CRI value of entities in a context to determine its strength. Also, the size of each entity is proportional to an estimate of its Markov centrality, or global importance, in the graph.

To illustrate how CRI can be used to obtain trace links with weighted relevance, assume that the interaction event trails shown in figure 2 were used to achieve TickX. Any selected timeline corresponds to at least one event associated with a use case, a developer, and a code artefact. For instance, at timeline 1, a create event associated with Account.java was executed by Amy while working on the Purchase Tickets use case. Similarly, timeline 7 has two events: Ruben updated Cinema.java (absolute update delta 50) while working on Browse Movies, and Bill viewed Account.java as he worked on Purchase Tickets. Timeline 8 represents a view event carried out by Bill while working on the Purchase Tickets use case using MovieCatalog.java. Prior to this, timelines 1, 2, 5 and 7 are other events carried out by Amy and Bill using Account.java within the work context of Purchase Tickets. Thus, the SOI of Purchase Tickets at timeline 8 is 0.67 (the artefacts and developers in the Purchase Tickets work context number 4, and the total artefacts and developers involved with TickX number 6). The relevance value of MovieCatalog.java within the work context of Purchase Tickets at timeline 8 is 0.0007 (the view weighting of 0.001 multiplied by the SOI ratio of 0.67). The next event involving the use of MovieCatalog.java within Purchase Tickets is represented in timeline 13; the relevance value gained as a result of this event is 0.0006 and the cumulative relevance value is then 0.0013. By the end of the trail in timeline 25 MovieCatalog.java has obtained a cumulative relevance value of 0.0069 within the work context of Purchase Tickets. Similar relevance calculations are carried out for every other artefact or developer that has been associated with Purchase Tickets, and the ranking generates an ordered list of entities that have trace links to Purchase Tickets.

Figures 3 and 4 are ordered list visualisations of code artefacts and developers that constitute the trace links for the Purchase Tickets and Browse Movies use cases, based on the event trails of figure 2. Similar lists can be generated and viewed for every entity that has been used to achieve TickX. This is done by selecting the entity of interest from the appropriate pull down menu of the use case, artefact or developer tabs shown in figure 4. Although MovieCatalog.java has been used to achieve both use cases, its relative relevance to Browse Movies is greater than for Purchase Tickets. Figure 3 demonstrates a playback across selected timelines of the evolution of trace links of artefacts (labelled 1a-5a) and developers (labelled 2b-5b) associated with the lifetime of Purchase Tickets. This is obtained by moving the slider bar labelled 1 in figure 4. This replay demonstrates that the early phase of achieving Purchase Tickets involved Amy’s work on Account.java. Also, although Bill later became involved with Purchase Tickets, Amy remained more responsible for the use case. Also, Account.java maintained a highly relevant position throughout the lifetime of Purchase Tickets.

Figure 5 is a visualisation of a CRI-valued trace graph for TickX based on the events shown in figure 2. The button labelled 4 in figure 4 triggers the generation of the trace graph. This figure shows the relatively high centrality that MovieCatalog.java, developer Ruben, and the Browse Movies use case have in the project. An advantage of this visualisation is that it is based on indirect links amongst entities and hence generalised and void of work context.

Figure 2. Monitored interaction trails used to achieve TickX across 25 timelines

Figure 3. Historical replay

Figure 4. List visualisation of trace links

Figure 5. Trace graph for TickX

4. Discussion and Related Work

The requirements traceability model and associated visualisations investigated in this paper can indicate the relevance of trace links between use cases, developers and code artefacts. For instance, labels 1a-5a in figure 3 suggest which code artefacts have been most important while achieving the use case Purchase Tickets. Also, labels 1b-5b in figure 3 indicate which of the developers may know most about the use case. The historical view of changes in relevance across labels 1a-5a and 1b-5b shows how the relevance of developers and code artefacts has changed over the use case lifetime. The history replay technique helps address traceability related questions over the lifetime of a project, such as the key developers to ask questions about use cases as they evolve (see also related work by Gotel and Finkelstein [3]). Historical replay may also help understanding of how a software project has been achieved, which can help in software process improvement. Tracking the trend of relative relevance of entities during different work phases may also provide useful maintenance information, such as who was initially responsible for work on a use case. Also, trace graphs can help determine bottleneck entities in a project. For instance, deleting artefacts, updating use case descriptions or removing developers with high information centrality in a software project could be detrimental to the success of a project.

In the modelling of CRI, care must be taken to reduce the potential effect of spurious actions such as the accidental viewing of code artefacts. The low weighting factor of view events means that these only become significant with repeated viewing. The current implementation does not take account of the duration of a view event, which may contain potentially useful information, with longer views indicating greater significance. The obvious danger is that the developer is performing another task while the code artefact is open for viewing.
While the weighting of update events based on absolute update delta enables the measuring of negative updates, this approach does not accurately cater for cases where a specified number of characters were deleted and then rewritten. Neither does the current model distinguish between comments and program code, or consider potentially useful factors such as developer profile, group dynamics and the structural relations that exist between entities of the same type. Each of these factors can arguably improve the accuracy of CRI. Also, the CRI approach could easily be extended to support user stories instead of use cases, more fine-grained code artefacts (e.g. methods) and design artefacts (e.g. class diagrams).

There are several related approaches to the automation of trace link creation and maintenance. These include the application of text mining and Information Retrieval techniques to recover trace links between software artefacts and their abstract model representations [4]. Here, trace links are generated by computing the similarity between a query and each artefact that comprises a software project. This approach requires the existence of a set of initial trace links between artefacts and their abstract models. An approach closely related to CRI is the creation and maintenance of trace links by monitoring the modifications made by users and by analysing change history [8]. Degree of interest models such as Mylar [6] can provide useful traceability insight, but only from the viewpoint of artefacts that are related to a single task. While approaches based on configuration management systems identify a time stamp and update delta for a checked-in code artefact, they have no explicit mechanism to generate the relevance of developers or use cases associated with the code artefact.

The proposed CRI approach can provide useful insight into the traceability problem that derives from the multiple interdependencies that exist amongst the vast network of entities in current software projects. Existing systems tend to see traceability as a Boolean relationship (either there is a trace link, or there isn’t).
The novelty of CRI is the shift to a fuzzy understanding of traceability, where the extent of relevance of one entity to another is seen as a position in a continuum. Finally, the underlying assumption in CRI has been that the model captures the relevance of developers and artefacts to a use case. This assumption requires empirical validation. There may be other interesting factors which cause increased interaction by a developer on a code artefact in the context of a use case, such as volatility in the requirements related to this use case. This may be highlighted by the centrality of entities in the trace graphs.

5. Conclusion and Further Work

This paper presents a relevance indexing approach that enables software system developers to obtain answers to trace questions between requirements expressed as use cases and code artefacts or developers, such as: Which artefacts are most relevant to an identified use case? To which use cases is a particular artefact or developer most relevant? The measure of relevance is determined by monitoring the core interaction events during development of the software by developers while working in the context of a use case, together with the notion of sphere of influence, which adds weight to influential entities.

Potential for further work centres on refining the model design so that it takes account of same entity type relations, and on investigating the potential of finer grained measurement of viewing time and edit focus, e.g. at the method level. CRI assumes a general view of development activity without exploring its specific nature, e.g. maintenance, debugging or refactoring. There is also an important question of whether developer roles (e.g. manager, specialist, tester) should be recognised within CRI. There is also a need to validate the model in realistic development scenarios; initial feedback suggests that the trace graph can become unwieldy with increasing project size.

References

[1] D. Damian, S. Marczak, and I. Kwan. Collaboration patterns and the impact of distance on awareness in requirements-centred social networks. In Proc. RE ’07, pages 59–68, Delhi, India, 2007.
[2] T. Fritz, G. C. Murphy, and E. Hill. Does a programmer’s activity indicate knowledge of code? In Proc. ESEC-FSE ’07, pages 341–350, Dubrovnik, Croatia, 2007.
[3] O. Gotel and A. Finkelstein. Contribution structures [requirements artifacts]. In Proc. RE ’95, page 100, York, England, 1995.
[4] M. Grechanik, K. S. McKinley, and D. E. Perry. Recovering and using use-case-diagram-to-source-code traceability links. In Proc. ESEC-FSE ’07, pages 95–104, Dubrovnik, Croatia, 2007.
[5] C. Gutwin, S. Greenberg, and M. Roseman. Workspace awareness in real-time distributed groupware: Framework, widgets, and evaluation. In Proc. HCI ’96, pages 281–298, London, UK, 1996.
[6] M. Kersten. Focusing knowledge work with task context. PhD thesis, UBC, British Columbia, Canada, 2007.
[7] A. J. Ko, R. DeLine, and G. Venolia. Information needs in collocated software development teams. In Proc. ICSE ’07, pages 344–353, Minneapolis, USA, 2007.
[8] P. Mader, O. Gotel, and I. Philippow. Enabling automated traceability maintenance by recognizing development activities applied to models. In Proc. ASE ’08, pages 49–58, 2008.
[9] I. Omoronyia. Enhancing Awareness during Distributed Software Development. PhD thesis, Department of Computer and Information Sciences, University of Strathclyde, Glasgow, UK, December 2008.
[10] J. Sillito, G. C. Murphy, and K. D. Volder. Asking and answering questions during a programming change task. IEEE Trans. Softw. Eng., 34(4):434–451, 2008.
[11] L. Zou and M. W. Godfrey. An industrial case study of program artifacts viewed during maintenance tasks. In Proc. WCRE ’06, pages 71–82, Benevento, Italy, 2006.
