Seeing is believing: Linking data with knowledge - Semantic Scholar

1 downloads 0 Views 597KB Size Report
Jun 4, 2009 - which were inspired by artefacts in use at Rolls-Royce plc. Another tool that was ...... In: M.F. Costabile (ed.) Proceedings of the Working.
Original Article

Seeing is believing: Linking data with knowledge Aba-Sah Dadziea, *, Vitaveska Lanfranchia and Daniela Petrellib a

The Department of Computer Science, The University of Sheffield, Regent Court, 211 Portobello, Sheffield, South Yorkshire, S1 4DP, UK. b The Department of Information Studies, The University of Sheffield, Sheffield, South Yorkshire, S1 4DP, UK. E-mail: [email protected]

Corresponding author.

Abstract The analysis of data using a visual tool is rarely a task done in isolation, it tends to be part of a wider goal: that of making sense of the current situation, often to support decision-making. A user-centred approach is needed in order to properly design interaction that supports sense-making incorporating visual data analysis. This paper reports the experience gained in X-Media, a project that aims to support knowledge management (KM), sharing and reuse across different media in large enterprises. We report the user-centred design approach adopted and the design phases that led to the first prototype. A user evaluation was conducted to assess the design and how different levels of data, information and knowledge were mapped using alternative visual tools. The results show that a clear separation of the visual data analysis from other sense-making sub-tasks helps users in focussing their attention. Users particularly appreciated the data analysis across different media and formats, as well as the support for contextualising information within the broader perspective of KM. Further work is needed to develop more fully intuitive visualisations that exploit the richer information in multimedia documents and make the multiple connections between data more easily accessible. Information Visualization (2009) 8, 197 -- 211. doi:10.1057/ivs.2009.11; published online 4 June 2009 Keywords: user-centred design; information visualisation; knowledge sharing; ontologies; usability evaluation; human-computer interaction

Introduction

Received: 6 March 2009 Accepted: 8 April 2009

The classic case of the cholera epidemic around the Broad Street pump in nineteenth century London, the source of which the anaesthetist John Snow (1813–1858) discovered by geographically mapping deaths due to cholera over 10 days in late 1854,1,2 demonstrates the power of visualisation for analysis and problem solving: plotting cholera victims on a 2D spatial representation instead of a 1D temporal dimension supported Snow’s reasoning about the possible cause of the epidemic. The use of a map provided context for the factual evidence, supported data comparison and pointed out anomalies that needed to be explained before a hypothesis could be formulated.2 The map became a reasoning tool for Snow, who was looking for confirmation of his theory that the spread of the disease was through water and not air as was still believed at the time. A fundamental step performed by Snow when plotting the deaths on the map was that of transforming the data into information. Data are a product of observation, and information is a transformation of data into a more effective and usable form.3 Although data are somewhat reduced when transformed into information, for example, the temporal dimension was lost in Snow’s map, the descriptions created are richer and better support reasoning. Snow, for example, noted workers of a brewery close to the pump were not affected, prompting him to seek reasons for this exception.2

© 2009 Palgrave Macmillan 1473-8716 Information Visualization www.palgrave-journals.com/ivs/

Vol. 8, 3, 197 – 211

Dadzie et al

To understand the meaning of potentially related information, a further step is needed: that of transforming information into knowledge. Snow formulated a hypothesis and used other mappings of the data on the epidemic to collect further evidence for his theory.2 As this example shows, the step from information to knowledge is performed through reasoning. Knowledge, as a human characteristic, is individual and personal, related to facts or concepts and mediated through interpretation and judgement. Knowledge is ‘the result of cognitive processing triggered by the inflow of new stimuli’.4 Much debate has surrounded the hierarchical definition of knowledge as different from and superior to information and data (see Alavi et al 4 for an overview), the form knowledge has (tacit or explicit) and how it is shared with others.5 What is of interest from the perspective of this paper is which characteristic a system designed to support knowledge visualisation and management should have: ‘[they] may not appear radically different from other forms of information systems, but will be geared toward enabling users to assign meaning to information and to capture some of [users ’] knowledge in information and/or data’.4 We share the vision that data, information and knowledge are hierarchical and correspond to different layers of understanding. Different types of visualisations have then to be defined in order to support these different levels of thinking and understanding during sense-making activities. Sense-making has been defined as: ‘a motivated, continuous effort to understand connections [. . .] in order to anticipate their trajectories and act affectively’.6 As such all levels of data, information and knowledge have to be accessible to whoever is conducting an investigation. In complex situations it is not unusual that evidence comes from different sources: numeric data from multiple sensors, images captured under varying conditions and text generated for any of a number of purposes by different people. This multimedia material may be physically dispersed (for example, on widely accessible servers or locked away on individual computers), and may therefore need to be pulled together in a meaningful way in order to see the big picture emerging from a scattered plot. Although text may appear to provide explicit explanations that correspond to information or knowledge, any multimedia material may correspond to any of the levels of data, information and knowledge. Text could work as data if it records a factual event; numeric data can be transformed into information by a graphical display that illustrates patterns of similar occurrences. The first challenge for any visualisation tool that aims at supporting sense-making is to support a single visualisation across media, to abstract from data instances and move towards knowledge. As with all complex cognitive activity, sense-making can be decomposed into smaller tasks, for example, formulating a new hypothesis, questioning its validity, elaborating on the hypothesis with detail, comparing alternatives.7 These diverse tasks require different interactive support, not only in terms of visualisation

198

© 2009 Palgrave Macmillan 1473-8716

but also the need for manipulation and management. A second challenge is therefore to identify and design the most appropriate visualisation for the type of task being performed, keeping in mind that this is only one part of a larger and more complex activity. A third challenge is that sense-making in complex situations is rarely done on an individual basis, but tends to involve a team of experts engaged in collective discussions that lead to final decisions. Although the evidence gathering and analysis may be done by individuals, the results are presented to the larger team. Understanding human activities in such complex conditions is fundamental for the design of tools that effectively support users in collaborative and complex data analysis. This paper discusses the User-Centred Design (UCD) of a software environment to support teams in sense-making activities. A case that describes aerospace engineers investigating the root cause of an issue observed in one (or more) gas turbine engines is used throughout the paper as an example of complex data analysis and synthesis. A simulation of a typical case is presented where evidence is contained in cross-media legacy data and in new documents created during the investigation process. Working in partnership with end users during requirements definition, system design and evaluation, a set of visual data analysis and knowledge sharing tools were developed. Multiple user-centred techniques were employed, each feeding a different part of the system design; the design encompassed a number of visualisation solutions as well as a framework for decomposing the sense-making process into focussed tasks. A user evaluation confirmed the advantages a graphical overview provides to data analysis, knowledge creation and sharing. The paper is organised as follows: we continue with a review of the state of the art, looking at recent developments in information visualisation and updates in UCD methodologies. We then discuss a generic design framework and explain how it was applied in the X-Media8 project. We continue to look at the results of user workshops, which fed into the working prototype built for the first phase of X-Media. Finally we discuss the outcome of the series of meetings held with user representatives to ensure that the research and design ideas were translated into a prototype that communicated the potential for improved knowledge management (KM) to end users, while still allowing the technical and usability evaluations required to provide feedback to the research team and the project at large.

Information Visualisation as a Tool for Human Reasoning The power of information visualisation to support the analysis of large, constantly updating, complex data sets can be seen in the quantitative and anecdotal evidence gathered during the evaluation and use of myriad

Information Visualization Vol. 8, 3, 197 – 211

Linking data with knowledge

information visualisation applications.9–11 Using visualisation to improve data analysis has become the focus of the emergent field of visual analytics,12,13 which looks not just at visualisation in isolation, but within the context of human factors and KM, putting the user in the driving seat in order to harness optimal interaction between the human and the machine, and obtain truly effective visual analysis of the large, complex data involved. Obtaining the optimal visualisation for any data set, in order to extract and share the knowledge the data contains, is however not simple; considerations that need to be made include the target end users, the expertise they bring to the use of computers and visualisation tools, expertise in their domain, and the resources available for collecting, processing and analysing data.14 Generating visualisations that provide an abstraction of a data set such that an intuitive overview is obtained, with options for analysis of Regions of Interest (ROIs) within the context of the overview and that cater to the goals of the end user, remain challenges for both visualisation research and even the most advanced visualisation tools. Building visualisations that support complex analysis and problem-solving in X-Media, in addition to enabling improved sharing of the knowledge that this analysis retrieves, requires, as aptly described in Shrinivasan and van Wijk15 , visualisations that encapsulate all three of the data, the navigation and the knowledge views such that users are able to move effortlessly between the three perspectives. The definition of a suitable, intuitive overview may vary significantly, even for the same data set, depending on the user’s information needs. This may mean a visualisation that displays all the data, which is prone to users getting lost within the detail of what may be superfluous information. Alternatively, this may be defined as a filtered view that displays only the most important elements within the data or a set of clusters that group data by specified categories, with the danger of hiding relevant information. In either case, this is a decision that requires a good understanding of users’ data analysis and KM needs 10,16 and in most cases, an iterative cycle of design, development, evaluation and sometimes even a revision of requirements. The system we built for the first phase of the X-Media project includes a number of tools that support the KM process for the Issue Resolution (IR) test bed. We focus in this paper on the work done in graph visualisation to provide an abstraction of the knowledge that users build up during the IR process, based on previous learning captured in a domain ontology and the links to evidence retrieved from legacy data, newly created documents and implicit knowledge captured to a persistent data store. Recent work in interactive (tree and network) graph visualisation includes TreePlus,16 which uses an enhanced layout to minimise the clutter that is common for dense data sets, by interactively expanding the tree from a single node to visualise ROIs. TreePlus places an emphasis on the exploration of ROIs in a graph and the readability of node labels in the ROI rather than the more commonly

© 2009 Palgrave Macmillan 1473-8716

addressed challenge of providing an overview of (large) data sets and the cross-linking that may occur within the data. Perer and Shneiderman17 employ principles similar to those we found to be useful for exploratory analysis in the network visualisation application, SYF, they describe for systematic and exhaustive analysis of social networks for domain experts. SYF makes use of information obtained during use of the system to guide the discovery of new paths in the same or related data sets. An important feature of SYF is retaining the flexibility required to allow users to explore alternative paths that may lead to new insight, while still reminding users about relevant regions of the network that have not been explored. Support for flexible annotation of ROIs in SYF provides a method for recording the insight revealed during exploration of the information space, aiding the sharing and re-use of the information retrieved. Jigsaw11 employs a visual analytic system to aid sensemaking during the analysis of large text corpora, by highlighting inter-connections between entities across documents. Jigsaw was developed to counter the cognitive challenges faced in trying to remember the relationships between entities extracted from data, by using visual indices to represent the information contained within documents. Jigsaw focuses on helping users determine which (text) documents to analyse next and where patterns exist within a data set. X-Media needs to provide support not just for where to find the next relevant piece of evidence within cross-media documents, but also counter-evidence for hypotheses, how each piece of evidence fits within the entire picture, and measures of uncertainty for the evidence in question, before constructing the information space on top of the domain ontology that provides the structure around which the investigation is carried out. An important feature of Jigsaw that is only partially implemented in the current phase of the X-Media framework is the use of entity extraction to identify inter-connections between documents. Entity extraction and other knowledge acquisition modules in X-Media for text, images, numeric and cross-media data are currently run offline over the legacy documents for semantic and similarity search modules; we are still in the process of discussing requirements for feedback to the machine learning algorithms, to determine how best to update the knowledge repository dynamically with information from transparent and explicit annotation (by multiple users) of the data within the X-Media system.

Designing User-Centred Knowledge Views Iterative UCD18,19 has been used extensively as a way to support the progressive evolution of a design from the user requirements definition through to system deployment. Although this process is effective for well-specified user activities, it may be too narrow to support the exploration of as yet to be uncovered research ground.

Information Visualization Vol. 8, 3, 197 – 211

199

Dadzie et al

information on users’ profiles, their working practices, the various levels and types of expertise users bring to their work, and the amounts and types of data accessed. The understanding of the users gained from the study was instrumental in writing a scenario of use20 that captures the processes carried out by users. We extend the traditional scenario to include the following facets that complement the narration (see Figure 2):

Figure 1: The iterative, user-centred design-evaluation cycle followed.

Especially for research that involves intelligent systems, such as that being done in X-Media, design ideas are often generated from research experience and creativity, rather than through direct interaction with users. Users are unlikely to be able to envision possibilities that are alien to their everyday experience and it is therefore the designer’s responsibility to formulate a vision in a way that is understandable to the end user. The iteration is therefore not limited to new versions of a prototype, but extends to cycles of refinement of ideas generated by designers and validated by users in different ways, such as in focus groups.14 Requirements specifications and design ideas were discussed at an abstract level among researchers in X-Media and at a practical level with users. Our goal throughout the UCD cycle was to ensure common understanding and interpretation of the requirements, to identify where the requirements map to users’ tasks, and to determine what technology could be used to improve current practice. Abstracting from the pragmatic principles of UCD, four phases that link design and evaluation can be identified: Rationale, Conception, Envisioning and Evaluation (see Figure 1). The Rationale phase explores the aims of the system under design as well as the end user requirements. The Conception then follows, to translate the abstract objectives into a concrete description that directly feeds the prototype in the Envisioning phase of the conceptual design. Finally, Evaluation can take different forms: from a team-based validation of the underlying ideas in the Conception (design) and Envisioning phases, to formal usability evaluation of stable prototypes, which we have recently completed and which we report, to field trials. In X-Media the overall aim is to support the creation, management and sharing of organisational knowledge by means of semantic web technologies. Direct observations and semi-formal interviews were used to gather

200

© 2009 Palgrave Macmillan 1473-8716

1. Rationale: describes the technical motivations identified via the narration; 2. Technology: describes existing functionality and/or new technology to be developed to provide underlying support for the system; 3. Questions: lists the questions raised by developers to be discussed among the X-Media partners, to communicate issues and look for answers to them. In particular, this column is useful for communicating assumptions made about the system’s capabilities; 4. Interface: provides descriptions of or links to snapshots of the interface. Our design ideas for visual knowledge sharing were embedded in a mid-fidelity prototype that was used to discuss potential solutions with end users and researchers in dedicated workshops. The prototype was very effective in communicating our design ideas and engendered further discussions as it helped the users to understand more fully the potential for enhancing their everyday work. New ideas for interaction were developed that built on working practices in place, such as the interactive tables provided to users to organise their thoughts and link them to the knowledge contained in legacy data, which were inspired by artefacts in use at Rolls-Royce plc. Another tool that was envisaged during meetings with users evolved into the Root Causes Tree. This visual analysis tool was refined during focus group sessions: the mid-fidelity prototype was used to show examples of ways in which tree graphs and semantic networks (node-link graphs) could be employed for data analysis. The ability of the dynamic prototype to aid the communication of ideas was reinforced by the results of the discussions held; previous attempts to convey the added power that the use of such graphs could bring to KM had not been as successful. The users related especially well to the graphs as they already make use of other graphical tools for hypothesis investigation.

System Modules and Overall Architecture Design rationale The user studies conducted in the initial stages of the project allowed us to gain an understanding of the IR process. IR starts with the identification and definition of an issue (see Figure 2). This is followed by the initial evidence collection, starting from the first set of

Information Visualization Vol. 8, 3, 197 – 211

Linking data with knowledge

Figure 2:

Excerpts from a hypothetical instance of the issue investigation process.

documents received by the investigation team. Further search over legacy data is used to retrieve information on the same or similar issues. Team members begin to analyse the data and organise the evidence retrieved, a process which currently relies predominantly on the expertise distributed within the team. The ultimate goal is to identify evidence that leads to the most likely root cause of the issue. Regular meetings are held to discuss progress, review the analysis performed by individual team members, and confirm or discount potential hypotheses formulated. Although the entire issue investigation process involves a team, a lot of the analysis is performed on an individual basis, with team members meeting regularly to report findings and discuss hypotheses for the root cause of the issue. Personal networks are often used as routes to sources of expertise and knowledge. Supporting both collaborative and individual workspaces was therefore a very important feature of the interface design. We describe briefly here the integrated system, with an emphasis on the interactive elements that support the IR process. We then continue to discuss the support for visual analysis

© 2009 Palgrave Macmillan 1473-8716

that highlights relevant information and the relationships that occur between apparently disconnected pieces of evidence. The design rationale carried out resulted in user interaction to support 1. multiple levels of knowledge representation, from high-level summaries to organised analysis to scattered detail; 2. a clear breakdown of the knowledge captured, in order to support sense-making; 3. the contextualisation of information (contained within documents) and knowledge (contained within data elements recording users’ comments, conversations, open questions, requests and action lists) attached to points of relevance; 4. the use of cross-media documents and information and knowledge stored across multiple data elements; 5. the high-level analysis required by the knowledge workers within a simple interface with a low learning curve, and that is able to run on a typical end user portable or desktop computer.

Information Visualization Vol. 8, 3, 197 – 211

201

Dadzie et al

Figure 3: The Workbook and the Semantic Clipboard. The Workbook in the Analysis tab shows the dialogue (bottom, right) used to import resources – files or snippets of information, and set metadata to describe the data element created. The Semantic Clipboard at the bottom contains several data objects visualised using representative icons. (Please note the snapshots of the user interface have been edited for content.) While we focus on the contribution of the graphical analysis and knowledge sharing and presentation tools in this paper, a brief description of the additional tools that are used for the complete IR process is necessary to show the interdependencies between the tools and how each contributes to the full KM cycle. We use a breakdown of the process into four main stages that map to a collection of technologies developed to support end users: Status, Search, Analysis and Closure. The scenario describing IR in an aerospace engineering domain is used to illustrate the functionality developed to support KM.

Interaction with knowledge objects A core component for KM in X-Media is the creation of data objects to record explicitly the implicit knowledge acquired through experience and from other knowledge workers in a user’s social network. A simple visual encoding uses icons to describe the type of data contained in each object, supplemented by an editable, descriptive label using balloon text (which defaults to the file name or X-Media URI). A context menu is provided with options

202

© 2009 Palgrave Macmillan 1473-8716

for interacting with each data object and for retrieving the metadata stored for the data it references. The Semantic Clipboard is an always visible, temporary store used to share data objects within the information workspace (see the bottom region of Figure 3). Drag and Drop (DnD) functionality is used to move the icon representing each data object between the clipboard and the tools enabled to receive and process the knowledge contained. The result of this action differs based on tool usage, for example, dragging a data object onto a node in one of the graphs attaches this object as evidence for or against the root cause hypothesis that the drop target represents, while dragging a data object into an interactive table cell indicates that the information it contains provides evidence that complements the (free text) contents of that cell. The Analysis section of the tool contains the Workbook, which provides a permanent visual information space for storing data elements in the categories • documents, • conversations (also reusable as meeting records), • question and answer (Q&A) sessions (also used as action lists).

Information Visualization Vol. 8, 3, 197 – 211

Linking data with knowledge

Figure 4: Evidence collection and setting the status of the investigation. Initial evidence collection phase using the Case Definition Form and the interactive Contrasts & Comparisons and Issue Summary tables in the Status tab.

The Workbook serves as a knowledge container, a repository that stores a reference to each data element in use in the system. The Workbook provides functionality for sorting and categorising these data in ways that are meaningful to users, to extract the knowledge contained for reuse throughout the system. A more comprehensive data overview is available than in the Clipboard as the larger window allows more metadata to be displayed permanently for each element; users are able to obtain more information at a glance about each data element.

retrieved from documents and other data objects attached to each cell. The Issue Summary table (bottom, right, Figure 4) records, in addition to a summary of the current issue, data from instances of previous, related events. Work is in progress to feed the information collected into a scatter plot that displays the history of the engine of interest in a timeline that may also serve as a geographically based plot (using airport locations). The provision of a timeline was seen to be especially valuable at the start of the IR process, as it supports inference into the evidence available from historical data for the investigation.

Forms and interactive tables Interactive graphs The Status area of the system uses forms and interactive tables to capture the status of the investigation. At the start of each issue investigation the Case Definition Form (see top, left, Figure 4) is completed by users, to collect the initial search criteria (for example, engine serial number, component descriptions, initial observations and symptoms of the issue). The information is used to retrieve forensic evidence from legacy corpora. Two interactive tables allow users to track information obtained throughout the course of the investigation and link this to evidence retrieved from data. The Contrasts & Comparisons (or Investigation Tracking) table (centre, Figure 4) allows users to capture their reflections on the analysis being performed, supporting this with evidence

© 2009 Palgrave Macmillan 1473-8716

Although the use of node-link graphs and trees is not new (Herman et al21 among others) there has been significant work over time to harness the power of insight the graphical visualisation gives to users through its ability to provide an overview of data and the relationships that occur within data. Current research is also looking at support for intuitive navigation through data and methods that overcome the difficulties associated with traversing graphs, namely, clutter and the tendency to get lost while navigating through very large data sets. Further, it is often necessary to provide highly customised visualisations to make full use of the power that visualisation brings to data analysis and knowledge retrieval

Information Visualization Vol. 8, 3, 197 – 211

203

Dadzie et al

Figure 5:

Design sketches capturing the analysis and reasoning required in the Causes Tree shown in Figure 6.

in different domains.10,22,23 We find that involving users throughout the design and development of interactive, visual analysis is crucial if useful and usable tools are to result that will indeed aid users in performing effective KM. Knowledge workers in exploratory mode are most effective when they have a good understanding of the support provided by analytical tools.11 A very important requirement was therefore to give users control over the generation of the structure of the information space and the representation of the knowledge built up during the investigation of hypotheses for the root cause of an issue. Following on from the Rationale phase, we designed a module for visualising the structure of the root cause analysis, by constructing first a Root Causes Tree using an extensible tree graph to which may be attached evidence for or against each hypothesis for the root cause of the issue. Although the tree graph allows a focus on each cause and the evidence attached to it, relationships between cause nodes due to evidence held in common are not easily recognised. We therefore developed an alternative visualisation, the Knowledge Graph, using a semantic network, to reveal the secondary relationships hidden by the tree visualisation and to highlight clusters of related knowledge. The interactive graph module makes use of the prefuse visualisation toolkit.24 The visualisations are built on the structure of the Root Causes Ontology, one of the sub-ontologies that form a part of the Rolls-Royce domain ontology, that captures the knowledge and experience built up by engineers during root cause investigation. The causes ontology provides a structured repository that captures the knowledge acquired through experience in the field, and is used to guide users in a converging but exhaustive investigation of all avenues that could lead to the determination of the root causes of issues raised. This knowledge source is especially useful to end users with low experience in root cause analysis. Structuring the visualisations on the spine of the ontology results in the implicit tagging of data objects attached to the ontology concepts represented by graph

204

© 2009 Palgrave Macmillan 1473-8716

nodes. These annotations also simultaneously classify data into related clusters. The structure provided by the ontology further provides some degree of uniformity in the presentation of the analysis performed by individual knowledge workers, aiding the ability to share the knowledge extracted, while still maintaining the flexibility required to allow users to pursue alternative paths during exploratory discovery.

The root causes tree The tree graph visualisation provides a hierarchical breakdown of the root cause analysis, with each tree node representing a potential root cause of the issue under investigation. Figure 5 shows design sketches capturing the requirements from knowledge acquisition to the presentation of the knowledge retrieved to users. Before constructing the tree users would have recorded explicitly the knowledge gained from the evidence presented at the start of the investigation, using the forms and tables available for this purpose. This preliminary understanding of the analysis provides the basis needed to construct and populate the Causes Tree. The interactive build is constructed from a single root obtained by selecting the most appropriate concept from the domain ontology. Subsequent cause nodes may follow suggestions from the ontology, or users may create new concepts that describe potential root causes not previously recorded. Figure 6 shows an advanced stage in the construction of a Root Causes Tree based on users’ understanding of the knowledge built up during an issue investigation, to create the visual information space that encapsulates the analysis performed and the knowledge accumulated. Context is provided for the evidence that contributes to the analysis by attaching data objects to relevant cause nodes. Supplementary information may be recorded using comments attached to nodes; an example is shown in Figure 6. Graph node labels and simple colour coding highlight areas of interest and indicate the likelihood of involvement that a node may be the root cause of an issue. Node text

Information Visualization Vol. 8, 3, 197 – 211

Linking data with knowledge

Figure 6: The Root Causes Tree. A user's interpretation of the knowledge space created during an issue investigation is visualised using the Causes Tree. Evidence (from documents and conversation objects) is attached to several nodes, each with a level of confidence. An icon indicates that a comment is attached to the node Build (on the far right).

Figure 7: The Knowledge Graph. The Knowledge Graph visualises the knowledge space represented by the Causes Tree in Figure 6 such that inter-connections between cause nodes and associated evidence are highlighted. colour defaults to black, changing to orange if at least a single piece of evidence is attached to the node, and fading to grey when the node is explicitly discounted as a potential root cause of the issue. Colour is also used to indicate the degree to which each piece of evidence contributes positively or negatively to the assertion that a node is the root cause of an issue. Green indicates that the evidence for a cause is clearly not involved in the issue being investigated, changing gradually to amber when the likelihood of involvement cannot be determined, then to red when the evidence point to the cause clearly being involved in the issue.

The knowledge graph The main advantage of the Causes Tree is the structured view on the knowledge accumulated as the investigation

© 2009 Palgrave Macmillan 1473-8716

proceeds. One limitation, however, is that the tree graph displays only the relationships between connected node pairs and the evidence attached to each node. Although this may also be seen as an advantage, allowing users to focus on restricted areas of interest without the clutter that may occur for large, highly interconnected graphs, the Causes Tree does not provide a complete overview of all the relationships that occur in the knowledge base and that contribute to the root cause investigation. An alternative to this is the Knowledge Graph which uses a semantic network to visualise the connections between each piece of evidence and all cause nodes the evidence is attached to. This highlights the inter-relationships that may occur between causes, as illustrated in Figure 7. This is an option not available in tree graphs, as the layout hides the inter-linking (outside immediate parent–child relationships) that occurs within a data set (noted also by

Information Visualization Vol. 8, 3, 197 – 211

205

Dadzie et al

Lee et al16 among others). A benefit of the force-directed layout in the Knowledge Graph is the natural clustering of related nodes that simultaneously highlights outliers by pushing them away from the central region of the graph. An important requirement for the usability of graphs is the maintenance of a stable layout; the use of a forcedirected layout in the Knowledge Graph, however, means variations in the layout for each run of the spring algorithm. This is one advantage that tree graphs have over spring-based graphs.16 Recognising this, the option to disable the continuous laying out of the graph is provided; to optimise the layout when new nodes are added to the graph the spring forces may be re-enabled. Two additional options are available for working with what may, especially for a large graph, appear to be an overwhelming display of nodes. Clicking on a node centres the focus on that node, after which the number of levels displayed from the focus may be varied to hide nodes that fall outside a specified depth. Dynamic query sliders allow users to construct (up to four) simple AND queries to highlight data of interest, providing (multiple) foci within the overview. The Knowledge Graph employs the same colour coding scheme for nodes as in the Causes Tree. Cause node sizes are weighted based on the likelihood of involvement in the issue being investigated, providing an additional visual cue for analysis. Updates to either graph are reflected in the alternative view, in order to support seamless switching between the alternative visualisations.

Usability Evaluation In line with the UCD process followed, a formal usability evaluation was carried out at the end of the first phase of the X-Media project. We describe the methodology followed, present a profile of the users and an analysis of the results of the evaluation.

able to discuss actions to take based on the information available, and how best to make use of the interface to incorporate the new understanding they obtained as the analysis progressed. The participants switched roles and the control of the mouse between tasks in order to allow the observation of all users interacting with the system. This also controlled for any bias due to computer or other relevant expertise. A consent form and a pre-evaluation questionnaire were used to collect demographic information about the users. Two evaluators observed the users to capture qualitative (subjective) data on the use of and participants’ reactions to the system. A user satisfaction questionnaire was used to collect subjective feedback from the participants after the performance of each task and at the end of each session. Finally, brief, informal discussions were held with the participants after the evaluation sessions to obtain more information about their immediate reactions to the system and where they saw the potential for added value to their current work practices. The user tasks The two tasks were designed by usability experts (our research group) and domain experts (at Rolls-Royce) in order to obtain a setting that represented a realistic IR case, working from legacy data for a closed case. This ensured that the tasks were • targeted to both the intended audience and the evaluation goals, focusing on the aspects of the interface that would provide the most value to the work of the target end users; • realistic in the context of use; • technically correct and challenging enough to be able to effectively simulate a real IR case; • complex and open-ended enough for the participants to engage with both the tasks and the system.

Evaluation methodology

The two tasks follow:

Before carrying out the formal usability evaluation a series of lab-based stress tests were performed, to ensure that the system developed could support multiple, simultaneous use while still maintaining interactive response. A pilot was carried out with three IR experts on site at Rolls-Royce in Derby, England, to test the system and the evaluation procedure. The formal evaluation was carried out at the same location, and involved 12 users from three departments, Design, Development and Service, working in pairs to complete two tasks. The three user groups allowed us to examine potential relationships between users’ expertise or department and their approach to the analysis, as well as their understanding and the value they felt the system would bring to their normal work. Carrying out the evaluation in pairs allowed a setting that reflected more closely IR investigations carried out in the field: the participants were

Task 1: You have just joined an established issue investigation team. An IPT (Investigation Process Team) meeting is to start in 20 min. Use the X-Media system to review the status of the investigation before the meeting. Task 2: At the IPT meeting new evidence has been introduced. Following the meeting you have been asked to add the new evidence to the root causes tree as you feel is appropriate. This may require expanding the current tree and/or adding your own personal view to the current investigation record. You have 20 min for this activity.

206

© 2009 Palgrave Macmillan 1473-8716

The tasks, although organic, were designed to encourage the use of different aspects of the interface. Task 1 was aimed at the evaluation of the forms and tables in the Status tab of the system, which had been pre-populated

Information Visualization Vol. 8, 3, 197 – 211

Linking data with knowledge

Table 1: The range of specifications for the client machines. While the use of high-end machines with large screens and very high resolution is ideal in the use of visualisation applications it was important to perform the evaluation in an environment that matched users’ normal working environments and resources. The client-server setup used for the evaluation, while disconnected from the main LAN (local area network) for security reasons, mirrored, as far as was practicable, the typical set-up of computing resources for data analysis at Rolls-Royce CPU Intel Intel Intel Intel

Core Core Core Core

2 2 2 2

Duo Duo Duo Duo

T2400 @1.83 GHz T7300 @2.00 GHz T7200 @2.00 GHz T5600 @1.83 GHz

RAM

Hard disk

Resolution

O/S

1 GB 2 GB 1 GB 2 GB

120 GB 150 GB 120 GB 160 GB

17in, 1680 ∗ 1050 13in, 1280 ∗ 1024 15.4in, 1200 ∗ 800 17in, 1440 ∗ 900

Windows Windows Windows Windows

XP XP XP Vista

Table 2: Server specification CPU

RAM

Hard disk

O/S

Dual Intel Xeon 2.33 GHz Quad Core

16 GB

8 ∗ 73 GB

MS Windows Server

with background information that simulated the investigation in process: documents (presentations, reports, photos and tables) and other data elements (conversation and Q&A sessions), distilled into knowledge, that could be used to prepare a newcomer to the team to analyse the deeper levels of knowledge captured in the Root Causes Tree. The setting created was intentionally incomplete, containing also some contradictory evidence, in order to simulate a realistic case and provide a good measure of participants’ understanding of the tasks, the case presented and the system being evaluated. The system set-up The user interface described was run using copies of the JavaTM application on client machines (all laptops – see Table 1), each of which was connected to a single server (see Table 2) using a wired local area network (LAN) via a single hub. The server hosted the X-Media Kernel in a JBoss container (see https://www.jboss.org). The clients performed user validation with the Kernel via web services over HTTP at the start of each session, and all subsequent calls to the Kernel, such as document access and data element creation, instantiated and used web services as required. Interaction with the system was logged transparently, capturing events such as the opening of tool windows, actions carried out using button commands on forms, the creation of new nodes in either graph, and the transfer of data elements between windows using the DnD facility. Analysis of results User profiles At the present time only service engineers regularly perform formal issue investigation, while designers participate on occasion as experts. Development engineers have some experience of the process as they may carry out IR in the context of a new engine being tested.

© 2009 Palgrave Macmillan 1473-8716

Ages and years of experience were equally distributed between the three user groups. However, the approach to carrying out the tasks was found to vary based on users’ current roles and experience; some service engineers, for example, citing prior experience – ‘I know it cannot be it’ – as a reason for discounting a branch of the Causes Tree as leading to the root cause of the issue. While we acknowledge that the number of users who evaluated the system is too low to obtain statistical validity, the importance of performing the evaluation with real end users – grounding the evaluation within the context of use14,23,25,26 – provided more valid feedback on the value of the technologies developed and the new approach to KM that we are studying, than we would have obtained with a much larger user set of what would be non-domain experts and in an environment that did not match that of the end users.

Getting started and gaining understanding (Task 1) The first task required the participants to study the (pre-populated) information contained in the Status tab, then start the exploration of the Root Causes Tree. The interaction logs provide information on context switching: all but one user pair made use of all the tools in the Status and Analysis tabs, and one pair performed a number of searches across the repositories available. We did not analyse participants’ searching activity as the search tools were not assessed as part of the evaluation. The participants quickly became engaged with the task and the KM tools appeared to become transparent to them, especially those for whom IR is a regular component of their normal work. The participants got involved in completing the task, trying to resolve the issue, rather than just actively evaluating the system itself, despite being informed that this was a fictitious case. This provides a good degree of validation about the potential value for improved KM for the target user group. Participants’ responses to the questions for this task were compared with the expected (as provided by a

Information Visualization Vol. 8, 3, 197 – 211

207

Dadzie et al

domain expert), and show a good understanding of the current status of the investigation. The responses were also largely consistent across all participants; however, individual differences were noted in the more open-ended questions, for example, to the question: From your experience, is the root-cause tree exhaustive, that is, are all possible causes listed? If not, which ones should be added and where? Why? There are responses on opposite ends of the spectrum, with one being: I cannot at present determine this in the time available to look at the evidence buried within the software. and another: I can determine fairly readily from the assigned ratings what is the current most likely root cause. Overall, participants found it easy to complete the task. The general impression was that the material provided was neither difficult nor easy to assess, from which we are able to conclude that the task was well balanced. Although the participants reported that the time given to complete the task (20 min) was not enough to obtain a good enough understanding of the data in order to make a confident judgement about the potential root cause of the (fictitious) issue, responses to more probing questions indicated a good understanding of the data provided and the issue as a whole, confirmed also by the results for the second task. Recording intuition and incorporating new knowledge (Task 2) The second task aimed to assess the support X-Media provides for adding and/or aggregating new knowledge to an existing body of knowledge through interactive editing of the Causes Tree. As for Task 1, the interaction logs were analysed to obtain information on the participants’ behaviour while carrying out this task. The causes ontology provides a structured knowledge base from which to draw the Root Causes Tree; however, recognising that the ontology is neither absolute nor complete, the option is provided to create new nodes that fall outside the concepts defined, or to build sub-trees from the ontology that grow from a parent other than that defined in the ontology. This provides both the structure important for effective KM, as found by Perer and Shneiderman17 and the flexibility necessary to support the recording of new insight that the (visual) analysis helps to reveal. A fair balance between structure and flexibility that provides good support for analysis appears to have been obtained, as participants gained a good understanding of the structure of the information space quickly, enabling them to extend and continue to populate the base tree provided (see Figure 6). At least one new root cause node was defined by each set of participants, and at least one pre-existing node

208

© 2009 Palgrave Macmillan 1473-8716

was edited (new evidence attached, confidence values for previously attached evidence edited, and/or root cause nodes discounted as contributing to the issue under investigation). One pair created a new action list and attached this to a node in the tree, in addition to the new pieces of evidence provided for the evaluation that had to be imported into the system. Two sets of participants attached (free-standing) comments to nodes in the graph, to provide extra information about actions taken. Grouping by expertise, the only pair that created new nodes based only on suggestions from the ontology were designers, the group least likely to be familiar with the task. Confidence values set varied through the range of possible values (0–100 per cent), and half of all participants also included comments (free text annotation) to the pieces of evidence as they were attached to the tree. For the most part participants who accepted the default confidence value (50 per cent – ‘Not sure ’) indicated the need to investigate the evidence further before re-setting the value. In several cases the participants went back to edit the confidence values for different nodes in the tree. Overall, the service engineers appeared to explore the largest number of nodes across the tree, creating new nodes, attaching new evidence and editing confidence values. Interestingly, where participants created new cause nodes (not suggested by the ontology) they largely appeared to describe similar concepts, with only slight differences in terminology. Relatively significant differences, however, occurred in where on the Causes Tree a node that appeared to describe the same or a very similar concept was placed. Although the consistency in terminology used for node labels implies a degree of similarity in the understanding of the structure of the knowledge space, the variation in placement of what appeared to describe similar concepts may be because of the commonly occurring problem of interannotator disagreement. To limit the occurrence of this phenomenon stricter adherence to the structure provided by the ontology may be required. This could, however, result in a lower degree of flexibility, which is important to encourage the insight revealed during exploratory analysis. An optimal solution may be to restrict editing of the tree depending on the focus node. A simple example is the concept corrosion, which can only be caused a restricted set of initiators, and which in turn can only result in a restricted set of effects; it would be reasonable to restrict extending this node to only those suggestions provided by the ontology. However, for a more openended concept such as Foreign Object relaxing restrictions for extending the node would be the better option. Additional discussions with end users is necessary to determine how best to further develop the visual analysis tool and the evolution of the backing ontology. The graphical features used to communicate the semantics of the Root Causes Tree appear to have been well understood. Visual cues identified by participants include: ‘no sub-branches’ [for a node], [node] ‘coloured in

Information Visualization Vol. 8, 3, 197 – 211

Linking data with knowledge

Figure 8:

Overall measures of satisfaction.

Figure 9:

Participants' assessment of system learnability.

grey’, ‘no support data’ [for a node], features that participants found aided their understanding of the issue and the organisation of the evidence from which knowledge was extracted. The (compound) confidence value for each cause node is dynamically calculated using a formula provided by a domain expert, taking into account the weights (userspecified confidence) of the data elements attached to each node. This was correctly perceived by the participants as a critical feature; however, the actual confidence values displayed were widely criticised as difficult to interpret and potentially misleading. Participants found that the values displayed for the compound evidence sometimes conflicted with their personal judgement on which causes should be investigated. Overall, the Causes Tree was seen to have high potential value for the IR process, with two participants even asking if it was available for immediate use in the field. In contrast, only one participant appeared to appreciate the benefits provided by the Knowledge Graph, indicating it as their favourite aspect of the system, saying: [it] gives a useful overview of the extent of the investigation and interconnection which states precisely the intended purpose of the semantic network. Apart from this exception, the Knowledge Graph was the most criticised tool, with seven

© 2009 Palgrave Macmillan 1473-8716

participants describing it as confusing, having ‘too many links’ and being ‘complex’. Participants noted greater difficulty recognising the underlying data structure, generally preferring the clearer structure provided by the tree graph. This indicates that the functionality for centering the focus on selected node and reducing the number of connections shown from the node was not discovered and made use of. It should be noted that while the Knowledge Graph was mentioned during the introductory presentation the features for exploring the graph were not described in detail, as the focus was on the use of the Root Causes Tree. The comment ‘nothing more than tree’ points to the need to communicate more clearly to users the advantages the semantic network provides, that is, its ability to display inter-connections between the knowledge structure provided by the ontology and the data elements attached to concepts defined. The potential also exists to use the Knowledge Graph to browse the contents of the user’s entire knowledge space or the larger knowledge base, revealing additional, relevant knowledge users may not have previously discovered.

Measures of satisfaction An analysis of the user satisfaction questionnaire shows that 84 per cent found the system stimulating, 52 per cent

Information Visualization Vol. 8, 3, 197 – 211

209

Dadzie et al

Figure 10:

Participants' assessment of task flow and control over system.

thought the system was easy to use, and 83 per cent found it easy to learn to use. Half of the users found the system easy to explore, and just over half reported carrying out the tasks as straightforward. Figures 8 and 9 summarise participants’ responses. Questions addressing the task flow assessed the design that split the system into collections of tools organised into four main areas. The design appears to successfully convey a separation of the tasks (see Figure 10): three quarters of the participants found it easy to very easy to determine where to start from, and just over half found it easy to determine what to do next. The placement and location of data elements and comments on the Causes Tree was judged to be easy to very easy by three quarters of the participants. Half of the participants recorded positive judgements about the use of the Semantic Clipboard. Assessing participants’ feelings of control over the use of the system was a very important part of determining the success of the design and the potential for improved KM. Just over half of the participants felt it was easy to make contributions to the information space that demonstrated their understanding of the knowledge it contained. However, half of the participants stated they did not feel they had a good understanding of or control over the setting of confidence levels. 17 per cent of the participants did not commit to an answer to this question, and a quarter of the participants were moderately positive about control over setting confidence levels.



Comments on the most negative aspect of the system included the confidence weighting, with participants stating: ‘could be misleading adding weights to the possible cause’, ‘confidence ratings risk misleading conclusions’ and ‘confidence algorithm needs correcting’. Overall, most participants found the system as a whole to have a lot of potential for improved KM, both for their everyday tasks and for IR specifically. Providing context for information by explicitly attaching evidence and comments to nodes in the Root Causes Tree was judged to be the most positive feature of the system. The Knowledge Graph and the confidence weighting system were, however, noted as aspects that would require further research and development in order to bring them closer to users’ requirements for KM and sense-making.

Conclusions

This paper has looked at the use of an enhanced UCD process for the design, development and evaluation of a visual analysis system to support KM in complex domains, as part of the X-Media project. We describe the discussions and interaction with end users, and between the research partners, that led to the definition of requirements for KM and effective sense-making. The result is novel, cutting-edge technology that supports effective, intuitive sense- and decision-making in large organisations. We report our research into alternative methods for representing knowledge visually that collectively create a visual, semantic information space for structured, effective and timely knowledge generation and management:

1. employing semantic web technologies;
2. allowing user control over interactive visualisations that support exploratory and directed analysis of large, complex, cross-media corpora mapped to a backing domain ontology;
3. using a working prototype that integrates different technologies for KM, designed following a UCD methodology.

The integrated framework developed has been evaluated with users, returning constructive and very positive results and providing the grounding for a strong move towards adoption in the target end user communities. The next stage of the X-Media project will revisit the requirements specification and design, learning also from the results of the user evaluation, to guide further research into visual, user-centred KM.

Acknowledgements

The research described in this paper is funded by the X-Media project, sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-026978. Thanks to the other members of the X-Media team who contributed design ideas and technology that was integrated into the working prototype. Our appreciation goes especially to the project team at Sheffield University, who went through several iterations of multi-user testing of the prototype before the formal evaluation; to Andy Harrison and Tarrence Kennedy, whose contributions were invaluable; and to the other users at Rolls-Royce who were involved in the various discussions and who took part in the evaluations of the working prototype.

