Res Eng Design (2010) 21:43–64 DOI 10.1007/s00163-009-0075-4

ORIGINAL PAPER

Understanding engineering email: the development of a taxonomy for identifying and classifying engineering work

James Wasiak · Ben Hicks · Linda Newnes · Andy Dong · Laurie Burrow

Received: 1 April 2009 / Revised: 24 July 2009 / Accepted: 27 July 2009 / Published online: 21 August 2009
© Springer-Verlag London Limited 2009

Abstract It is widely believed that email is increasingly becoming the medium in which collaborative engineering work is done; yet this assumption has not been properly examined. Thus, the extent of engineering information contained in emails and their potential importance within the context of knowledge management are unknown. To address this question, a study was undertaken with a large aerospace propulsion company to investigate the role and characteristics of email communication in engineering design projects. This paper describes the development of a taxonomy and classification method for achieving an understanding of email content and hence its use. The proposed approach is based on relevant techniques for analyzing communication and design text. The method codes the content of email using a hierarchical scheme, assigning each email to categories and sub-categories that denote what topics the email is about, for which communicative purpose it has been sent, and whether it shows evidence of engineering work. The method is applied to a corpus related to the full life cycle of an engineering design project. Metrics for validation are discussed and applied to a sample case. Exemplar findings are presented to illustrate the type of investigations the method supports, including eliciting knowledge about project performance and identifying and accessing engineering knowledge. Finally, lessons from the development of the method, including a discussion of iteratively adaptive variants used to arrive at the final outcome, are discussed.

Keywords Knowledge management · Information handling behavior · Content analysis · Email · Engineering work

J. Wasiak · B. Hicks (✉) · L. Newnes
Department of Mechanical Engineering, Innovative Design and Manufacturing Research Centre, University of Bath, Bath, UK

A. Dong
Design Lab, Faculty of Architecture, Design and Planning, University of Sydney, Sydney, Australia

L. Burrow
Converteam Ltd, Rugby, UK

1 Introduction

Over the past decade, email has progressed from being a personal information communication technology to one that is centrally managed, archived and critical to daily business operation. A number of trends have impacted on knowledge-based firms and particularly engineering corporations, causing communication and information technologies to become an indispensable asset. Engineering companies often operate as part of a ‘‘virtual organization’’ under the paradigm of concurrent engineering with globally distributed partners, suppliers and customers. Of particular relevance to the context of this research project, a study of 1,400 UK companies showed that 75% of aerospace corporations practiced concurrent engineering (Ainscough and Yazdani 2000). As a result, these types of firms are aligning their communication with the management needs of a virtual culture (Burn and Barnett 1999; Lindquist et al. 2008). Investments in commercial tools to manage engineering data using product lifecycle management (PLM) systems are forecast to exceed US$30 billion by 2011 (2008). One key aspect of collaboration and information sharing in engineering design is keeping the geographically and temporally dispersed stakeholders in


the project aware of the information about a common designed object. Information about the same designed object must be shared and maintained between a number of parties, each of whom needs a different view of the same information (Bouikni et al. 2008; Maher et al. 2007). There is also a trend for aerospace companies to provide traceable information support for their products through life, adding a regulatory requirement for management of product information over long time periods (e.g., Marsden 2002, and the Radio Technical Commission for Aeronautics (RTCA) recommendation DO-178B for Software Considerations in Airborne Systems and Equipment Certification). For these reasons, among others, the need to communicate and manage large volumes of information in a timely and expeditious manner has never been more important. This paper focuses on email as the medium of information communication and management. Despite engineering firms’ investments in PLM systems, email remains one of the most widely used (in terms of distinct information units) asynchronous information sharing tools, with estimates that up to 10 trillion personal emails are sent annually (Gantz et al. 2007) and that half of employees spend 30% of their time working with emails (AIMM International 2006). Email has become central to working processes in enterprises; large archives of emails are kept in order that information can be usefully retrieved and reused (Fisher et al. 2006). There are huge difficulties, however, in searching through such an ‘‘overload of information’’ and identifying what is and is not of relevance (Eppler and Mengis 2004). Email is, unfortunately, one of the more idiosyncratically used information tools, constraining the potential space of management features in email programs (Wattenberg et al. 2005).
Whittaker and Sidner (1996) coined the term ‘‘email overload’’ to describe email as being functionally overloaded, used for everything from transferring files and organizing meetings to complex discussions, personal data archiving and contact management. Messages then require an array of responses including replying, taking action or simply filing. More recent research confirms that email is still used to handle a range of corporate processes, relating to customers, contracts and strategy (AIMM International 2003), and, for many workers, fills the role of a task management tool (Bellotti et al. 2003). What has changed dramatically is that the volume of email in archives has grown substantially; one study of corporate users at Microsoft Corporation found a tenfold increase in 10 years (Fisher et al. 2006). In engineering, established information and knowledge management strategies, including product data management (PDM) and PLM systems, have evolved to support the management of formal engineering information and data—including documentation, CAD models, meeting


minutes, formal correspondence and computational analysis. However, they currently fail to provide effective means for managing personal information, such as logbooks (McAlpine et al. 2006) and email, despite their core function in daily operations and their potential knowledge reuse opportunities. Within the context of this work, management includes the activities of capturing, representing, organizing, maintaining, visualizing, reusing, manipulating, sharing, communicating and disposing of information (Larson 2005; Treasury Board of Canada 2005). Of particular concern, from an organizational perspective, is that such project-centric information is not generally handled by other core business information systems, such as Enterprise Resource Planning (ERP), Supply Chain Management (SCM) and Customer Relationship Management (CRM). As a consequence, there exists almost no means for the management and reuse of this potentially important personal information. We acknowledge, however, that formal correspondence with customers will be captured through the CRM system. In addition, there is a legacy issue of historical emails stored in archives, the content of which is largely unknown except at a project or personal level (Hicks et al. 2008). These issues have been recognized by the industrial partner for this research project. The industrial partner designs, manufactures and supports electromechanical power systems across the globe and has a particularly strong presence in the navy and marine sectors. They highlight a number of key areas for improved knowledge management in which email is seen to play a part: (1) in supporting the way that employees network, share and acquire information and expertise; (2) in improving the management of records (information sources) such as determining what to keep, how to best organize it, and when to reuse it; and (3) in mining legacy archives for ‘knowledge’.
Pragmatically, they would like to manage email as they manage other product data with their existing PDM and PLM systems. That is, their aim is to make collective knowledge accessible, minimizing duplication of work, repetition of mistakes and expediting design activity. However, while their engineers claim that email is central to their work, it is not known to what extent they actually do their work through email and what type of work is done. There are available technological solutions that could improve the efficacy of email in engineering organizations. A significant body of research has centred on developing tools to aid with the management and processing of email. Management of email includes task management (Bellotti et al. 2005) and automated filing (Koprinska et al. 2007). Processing support includes automated email summarization (Dalli et al. 2004) and hierarchical organization using automatically generated ontologies (Yang and Callan 2008). More novel contributions have included using


mappings of communication networks to identify expertise (Kim 2002) and visualizations of conversation histories (Viegas et al. 2006). While these may be of benefit to engineering firms, it is first necessary to gain a substantiated understanding of how email is used in engineering design projects, and whether the content of the email is of any value to mine, particularly since engineering information is generally organized according to internal standards (Hicks et al. 2008) and email could also be organized similarly as part of efforts to codify information for knowledge management (McMahon et al. 2004). Despite the development of email management tools, research into how people use email for collaboration is still somewhat limited. One study of distributed collaboration showed email being used predominantly for group management (Graveline et al. 2000). Information–visualization-based studies identified organizational structures to find ‘‘key players’’ (Puade and Wyeld 2007). While these studies revealed important insights, there is an underlying presumption that the email is the work or that the volume of information exchange is indicative of the work being done. At present, there is little or no research to support such a presumption and in particular, whether or not email is becoming a medium in which work is done. Methodologically, studies of email user behavior have relied on user interviews (O’Kane and Hargie 2007), surveys (Wilson 2002; Renaud et al. 2006) and direct observation (Bellotti et al. 2003). These methods generally disregard the textual content of emails, which, we contend, offers an opportunity to better understand emails and their interaction with engineering (or other) task-related activities. For example, if an email contains a mathematical calculation of the dynamic stress in a component, then the email is itself a work product rather than an accompaniment to work and certainly more than an information exchange. 
The focus of this paper is therefore on the development of an approach to analyze the textual content of emails, and, in particular, to ascertain how email is used as a medium for work within engineering design projects. This paper first reviews and considers the applicability of existing methods that could form the basis for understanding the content of engineering email. Informed by this review, an approach is proposed to code email content, the process and principles of which are discussed in this paper. The development and evaluation of the approach and its underlying taxonomy is based on an application of the coding scheme to 300 design project emails. The content-coded emails are subjected to a two-level analysis, one at the level of the codes themselves, and a secondary analysis with respect to the project context of the email including date, sender and project background. Prototypical results are presented to highlight the understanding that can be elicited and in particular that which could benefit

engineering project management and strategies for knowledge management.

2 Research in the use of email

Email exchanges present an opportunity to explore both processes of communication and how the information is used in collaborative work. A significant body of empirical design research has explored many facets of such topics, but none specifically in the context of email. The following sections discuss existing methods for exploring email use. From the design studies community, the application of content analysis-based techniques including textual analysis of transcripts and documents, diary studies and interviews is presented to consider how the content of design text is currently being studied. The applicability of all these methods to the investigation of email use is explored, and the findings used to inform the development of a methodology for analyzing email content. Within the context of software development and email management tools, a number of aspects of email use have been explored. The motivation for these studies was driven by the need to understand why people use email and how they use email to accomplish tasks associated with their work. These studies commonly employ qualitative research techniques such as user interviews, surveys, and direct observation. Surveys have been used to explore the effectiveness of email compared with alternative means of communication, such as phone, face-to-face, and letter (Wilson 2002), and to gauge the effectiveness of employee time spent working with email (Jackson et al. 2006). Interviews have been used to explore similar topics (O’Kane and Hargie 2007) and to investigate the ways in which users process their email inboxes (Bellotti et al. 2003). While both offer a direct insight into explanations on how and why users utilize email in the ways that they do, interviews can potentially be more open ended, capturing more detail than surveys. Interviews tend, however, to be more time consuming and logistically complex for all parties.
Surveys and interviews are well suited to gather users’ opinions, but can only provide user-reported perceptions of their behaviors and activities. These may be emotionally influenced or misremembered. To substantiate the findings from surveys and interviews, direct observations of users can record statistical information, such as what proportion of time is dedicated to certain activities previously reported by users through interviews. To explore the efficiency of email use, Renaud et al. (2006) made simple observations, such as measuring the amount of time spent on email sessions, to supplement their interviews. More detailed observations have included


video recordings of 60-min sessions with users working through their inboxes, while verbalizing their actions to understand personal email processing behaviors (Bellotti et al. 2003). The primary disadvantage of observing users is that it can be time consuming, particularly when a study over a prolonged period is required. The sample period over which observations are made must be appropriately sufficient in order to be representative of the working day of the user. Where the research interests are motivated by the design of email management systems, these have relied heavily on statistical measures of email messages, such as the number of messages in inboxes, their size, date, and filing status (Whittaker and Sidner 1996; Mackay 1988). More recent studies have tended to use several data-collection methods. Prior to designing a task management tool, Bellotti et al. (2003) investigated how users process their emails. Mailbox statistics on characteristics such as threading and addressing were examined to support observations and interviews. A further application of statistical data is the generation of network maps, identifying communication paths between different users. Communication patterns between different parties might offer insight into where knowledge is generated. Tyler et al. (2005), among others, present a technique for generating these. Network maps can be used to identify and investigate communities of practice in specific organizations, both of which are precursors to guiding social practices of knowledge management. A summary of the application of the various techniques to several research areas is briefly presented in Fig. 1. A common thread among the research reviewed (Fig. 1) is that they explore email as part of work and particularly the day-to-day aspects of work. That is, these studies focus on how users perform information handling and management tasks through email rather than how email itself is a form of work. O’Kane et al. 
(2007) present one of the few such studies to examine the role of email within the context of a specific working environment. The areas of social interaction, knowledge and information management, contact management, content formation (mitigating ambiguity), and user effectiveness were identified. Methodologically, O’Kane’s approach is restricted to interviews of users, although these are comprehensively analyzed, rather than the content of the email itself. The investigation in this study aims to consider email from a project perspective, taking the view that the email is the work, that is, that email embodies work, rather than that email is the accompaniment to the work. In order to understand how email embodies work, it is contended that the textual content of emails should be examined. To date, the only research regarding the body of emails (main

textual content) has been directed towards automated classification tools (e.g., Koprinska et al. 2007). In part, this is due to the limited availability of email corpora; logistical and legal access issues, along with commercial sensitivity pose obvious hurdles. For the purpose of this work, email attachments are not examined as it is assumed that the content of the email will state the nature of the attachment as well as providing an indication of why it has been sent. The rationale for this is based largely on the authors’ own experience and preliminary evaluation of email corpora. Notwithstanding this, investigating the relationship between email and attachments could provide important insights for enhancing records and information management by capturing additional context and knowledge regarding, for example, design documentation. The content of the messages should enable an understanding of what emails are used for and, when interrelated with the timeline and details of the engineering design project, what drives the email ‘production’. This study takes the time frame of an engineering design project, which runs for a period of more than 6 months. By sampling content across email archives for a complete engineering design project, the study of a typical engineering project of several years’ duration and distributed across a number of sites becomes practicable. These messages can also provide a wider and more in-depth picture of email use than can be gained from any individual user. The perspective that the content of engineering email embodies work is based on prior design research in understanding design activities and the information handling behavior of engineers.

2.1 Understanding design through language

Empirical studies exploring the process of design have been a major thread of research for some time (Cross et al. 1992; Gero and Mc Neill 1998; Stempfle and Badke-Schaub 2002).
Methodologically, these studies rely on the examination of language, stemming from the realization that much of what happens in designing happens through language-based communication. A method that has been commonly utilized for this purpose is protocol analysis. Protocol analysis is based on the well-supported assertion that an individual’s verbal utterances provide evidence of their cognitive reasoning (Ericsson and Simon 1993). During the experimental study, the analyst elicits thinking processes from verbalizations made by the subject (designer) while performing a specified task. The verbalizations are recorded and transcribed. This transcript is then coded in accordance with a scheme, linking elements of text to key concepts, thereby converting text into data. The concepts identified by the coding process are then used to form the basis of secondary analysis. In Stempfle

Fig. 1 Existing research techniques for investigating different facets of email use

[Figure not reproduced. It relates research methods (user interviews, user surveys, user observation, mailbox statistics, email content classification / text analysis) to areas of investigation (interaction or role of email within workplace; processing emails, prioritisation and task management; efficiency / productivity of email use; effectiveness of email compared with other communication tools; inter-user networks from email traffic), marking where this research and the studies listed below fall. The cell-by-cell mapping is not recoverable from the extracted text.]

[1] (Jackson et al., 2006)
[2] (Renaud et al., 2006)
[3] (Wilson, 2002)
[4] (O'Kane and Hargie, 2007)
[5] (Bellotti et al., 2003)
[6] (Whittaker and Sidner, 1996)
[7] (Mackay, 1988)
[8] (O'Kane et al., 2007)
[9] (Tyler et al., 2005)

and Badke-Schaub (2002), for example, the frequencies of design processes are identified as well as how they evolve within conversations and across the entire design scenario. Huet (2006) studied structured design review meetings from an engineering process in a real life setting by coding transcripts using a predefined scheme. This coding scheme categorized the type of communication (clarifying, debating), who was speaking, and which area of the design artifact it impacted. Further details on the topic of discussion were annotated. This method captured the different elements of design knowledge generated during review meetings and was able to show how the knowledge evolved. The point is that language, or to be more precise what designers speak or write about what they are doing, is seen as a reliable form of data and is a form of design knowledge representation frequently used by design researchers. This is a trend upheld in other subject areas, where the analysis of language use is increasingly the primary source of data, rather than surveys, questionnaires, interviews or staged experiments.
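The coding step described in this section, in which transcript segments are labelled and the labels are then counted and traced through a conversation, can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the transcript, the speaker labels and the code "deciding" are invented for the example, while "clarifying" and "debating" are the communication types named in the text.

```python
from collections import Counter

# Hypothetical coded transcript: one (speaker, code) pair per utterance,
# in the order the utterances occurred. All values are invented.
coded_transcript = [
    ("A", "clarifying"),
    ("B", "debating"),
    ("A", "debating"),
    ("C", "clarifying"),
    ("B", "deciding"),
]


def code_counts(segments):
    """Count how often each code was assigned."""
    return Counter(code for _, code in segments)


def code_transitions(segments):
    """Count consecutive code pairs, a simple view of how a conversation evolves."""
    codes = [code for _, code in segments]
    return Counter(zip(codes, codes[1:]))


counts = code_counts(coded_transcript)
transitions = code_transitions(coded_transcript)
```

Counting consecutive pairs is only one possible way to look at how coded processes evolve; it is used here because it needs nothing beyond the ordered list of codes.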

The particular point we would like to draw out is that empirical design research has always dealt with the things designers say or write as data for analysis. The linguistic residue is treated as a window onto other phenomena, i.e., concept generation, decision-making, analysis or evaluation. There is a tacit presumption that the language embodies the work being done. The problem has been scaling up methods in protocol analysis to long-term design projects. It should be noted that design communication has been studied in real and large-scale settings. Rather than analyzing the content of conversations, Eckert (2001) reports using interviews and observations to achieve this. This is a somewhat more practical approach than recording, transcribing, sampling and analyzing conversation content in such settings. However, as we become more interested in the ‘natural’ activities of designers in context, content-based methods that can make use of engineering content such as email are needed to look for explanations of design activities based on the kinds of data that are routinely generated as part of designing.


Some characteristics of this project place limitations on the direct transferability of protocol analysis to email corpus investigation. The settings for such studies (e.g., Stempfle and Badke-Schaub 2002) have tended to be contrived design-orientated situations, lasting a few hours and involving half a dozen participants, usually engineering students. In contrast, industrial engineering projects often last for a number of months or years and involve the participation of many players including managers, administrators, technicians, external suppliers or contractors and customers. It is asserted that to properly understand how email is used in design projects, analysis of real projects should be undertaken. Although parallels can be drawn between contrived and real settings, the analysis of an engineering corpus from industry represents a significant change of scale. One implication is the difficulty in coding large volumes of text, which is in any case a highly labor-intensive process, particularly if the elements coded are small, such as clauses rather than paragraphs.

2.2 Information handling

Several studies on the information handling behavior of engineering designers confirm that the information they access and process embodies their design work. In these studies, the interest in the use of information artifacts to support the design process (McMahon et al. 2004; Aurisicchio 2005) is unrelated to uncovering what engineers do by having them do engineering. Nonetheless, the premise of the research is that the information processed and accessed represents external knowledge, and is thus another form of representation of the designer’s knowledge that the engineer does not write or speak of. Methods to understand engineering information access have included the use of content analysis (Lowe 2002) as well as interviews, diary studies and observations (Aurisicchio 2005).
In the architecture domain, the various information artifacts produced are studied as coordinative artifacts, which are design practices grounded in the use of material artifacts (Schmidt and Wagner 2004). What these studies have found is that much of the knowledge is not about the product per se. Lowe (2002) studied how engineers used information by classifying phrases within information they accessed as being technical terminology, contextual and product, resource or business related. The phrases collected within each of these groups were then assessed to explore what typified them. This approach was applied to engineering documents, such as reports, drawings, letters and emails. The emails examined were shown to contain significant contextual information and company-specific acronyms. They tended to be written in an informal style. Lowe’s method focused on distinguishing between the attributes of different

information sources. A disadvantage of Lowe’s method is that it requires coding on a very detailed level, which is consequently time consuming and impractical for analyzing a large corpus. Although not directly transferable, Lowe’s method does support the notion of using content to examine emails and their use. In order to learn how designers acquire and reason about the information they access, Aurisicchio (2005) asked engineers to compile a diary detailing each time they searched or requested information. In a similar way to Lowe (2002), Aurisicchio applied a coding scheme to surmise key characteristics of information seeking behavior, in this case from diaries and observations, rather than a document set. To obtain more detailed findings, Aurisicchio shadowed engineers for time periods of several hours each to observe their information use habits. Interviews were also conducted by Aurisicchio to gain a more complete understanding of how engineers operated, to contextualize and aid understanding of the collected data. The point is that to understand why work becomes embodied into information, it is necessary to appreciate the context within which work gets done. Without an understanding of context, it is very easy to misinterpret what designers are actually doing, or to misinterpret why certain data is or is not contained in their information products. In summary, language and the meaning of words encode designers’ knowledge and perspectives. What they speak and write about their design work and the information they access while working can embody their engineering design work. While designers’ verbalizations, conversations, and external information products are assumed to be reliable data sources of their work, email has only been presumed to embody engineering work. Moreover, extant methods to understand the content of language in engineering work need to be modified to handle the informality and scale of emails.
2.3 Implications

It is asserted that investigation of email use in engineering projects cannot be readily achieved in contrived settings. The variety of participants, locations, design complexity and timescales encountered in real projects cannot be easily represented by a laboratory scale experiment. In summary, there are two broad methodological approaches. First, studies that focus on the needs and views of the user of email have relied on methods such as interviews and surveys. Second, to investigate both communication and information in engineering design, a number of authors have analyzed text using coding schemes, adopting approaches that identify concepts relating both to subject matter and to processes (activities). Supplementary data, gathered from sources such as interviews and


observations, has been used for further analysis and is similarly appropriate for this study. This second approach is the one adopted in this research project, because a more holistic, project-orientated view is required, demanding a broader data collection method. Textual classification of email content, which has not been adopted in research to date, is contended to be appropriate. The record of the design process created by emails provides an opportunity to explore them in this way.

3 Research method

The previous sections reviewed existing methods and highlighted those methodological elements that are most appropriate for this study, particularly those which have been successfully applied in engineering design studies of similar context (e.g., Lowe 2002). In addition to these largely manual approaches, it would be remiss to neglect the potential of computational text analysis. In particular, there are machine-learning-based approaches to code emails by purpose using speech-act theory (Leuski 2004) or by activity (Dredze et al. 2006) using semantic message similarity and the sender/recipient information. However, the limited size of the target corpus, the need to mark up the corpus (in any case) for supervised classification, and the exploratory nature of this research project necessitate a manual approach. A methodical approach for analyzing text would provide an effective way of identifying consistencies, themes and patterns within collections of email messages (Patton 2002). In comparison with the methods previously discussed, an analysis of the content of emails has the major advantage of being unobtrusive. By examining emails retrospectively, it is possible to gain a direct insight into email usage without risk of influencing user behavior. It is also logistically simpler, enabling the study of large, complex, long-term engineering projects where real-time observation would not have been practical due to resource limitations. The proposed methodology, shown in Fig. 2, involves two key components: the collection of primary data via the coding of emails and the collection of secondary data from other sources, the latter of which is key to the analysis, corroboration and explanation of the former.
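To make the primary data concrete, the sketch below shows one way manually coded emails could be recorded and summarized in Python. It is an illustration under stated assumptions rather than the authors' tooling: the field names, the code labels, the senders and the dates are all invented, and normalizing by the number of coded emails in each group is an assumed stand-in, since the paper says only that mailbox statistics are used to normalize values.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CodedEmail:
    """One manually coded email; field names and code labels are hypothetical."""
    sent: date
    sender: str
    topic: str              # what the email is about
    purpose: str            # communicative purpose for which it was sent
    engineering_work: bool  # whether it shows evidence of engineering work


def code_frequencies(emails, attribute):
    """Relative frequency of each code value for one attribute, e.g. 'purpose'."""
    counts = Counter(getattr(e, attribute) for e in emails)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def engineering_share_by(emails, key):
    """Share of coded emails showing engineering work, grouped by key(email).

    Dividing by the group's own email count is an assumed normalization.
    """
    totals, hits = Counter(), Counter()
    for e in emails:
        group = key(e)
        totals[group] += 1
        hits[group] += e.engineering_work  # True counts as 1, False as 0
    return {g: hits[g] / totals[g] for g in totals}


# Invented sample data, for illustration only.
sample = [
    CodedEmail(date(2005, 1, 10), "pm", "schedule", "informing", False),
    CodedEmail(date(2005, 1, 24), "eng_a", "product", "problem solving", True),
    CodedEmail(date(2005, 2, 3), "eng_a", "product", "informing", True),
    CodedEmail(date(2005, 2, 17), "pm", "contract", "managing", False),
]

by_purpose = code_frequencies(sample, "purpose")
by_month = engineering_share_by(sample, lambda e: (e.sent.year, e.sent.month))
by_sender = engineering_share_by(sample, lambda e: e.sender)
```

Passing the grouping as a function keeps the same routine usable for aggregation by time, by actor or by any other attribute of the coded record.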

3.1 Primary data collection

The development of the coding scheme required primary data. For this reason, a set of emails was sampled and coded from a number of engineering design project corpora gathered from industry. The development proceeded by coding a sample of these emails across the duration of an industry engineering design project, which forms the core of this research and is discussed in Sects. 4 and 5. Once the method was validated, a regular sample of emails from one project was coded and analyzed. The codes are reported by frequency to reveal how commonly email is used for certain functions. The codes are then aggregated by:

1. Communicative processes to reveal how commonly email is used for engineering work.
2. Time and project lifecycle to reveal how the use of email changes throughout the project.
3. Actor to reveal how different individuals use email and if there is any commonality when actors share similar roles (e.g., project manager).

In all cases, mailbox statistics are used to normalize the numerical values. It is important to recognize the limitations of classifying the textual content of email. Although it allows a level of inference, such as why an email was sent, it does not provide the full context. Recall also that interactions outside of the email correspondence were not included, which for collocated team members is likely to be significant. Furthermore, the time spent with each email, effort committed, and feelings of the user cannot be easily captured. Hence, for the purpose of this exploratory study, the coding of the content made use of additional data collection through, for example, semi-structured interviews. This provided project background, timescales and key factors to corroborate the findings of the textual analysis and the proposed method.

3.2 Secondary data collection

The purpose of the secondary data is to contextualize the primary data, enabling a reasonable level of interpretation of the findings. General perceptions of email use can also be ascertained. This approach to support findings is similar to the one adopted by Aurisicchio (2005). Required details for the secondary data include establishing the parties involved, the roles of the actors, the objectives of the project and common acronyms. Three sources of secondary data are adopted: interviews with project participants, project documentation and the understanding of email use as presented in existing literature.

3.2.1 Semi-structured interviews

The method used semi-structured interviews (Wengraf 2001) to ensure that all necessary topics are covered and to provide sufficient scope to delve in greater depth into the areas most relevant to the interviewee. Interviews were conducted with members of the project team from a variety of roles to gain as broad a perspective as possible. The questions explored the technical, financial and contractual management of the project, the role of the actors in these activities, and their attitudes and experiences relating to email use within the company.

3.2.2 Project documentation

Project documentation was used to provide an overview of the project at different stages during its life. Reported events of a project were compared with the actual events as described in emails. This provided a perspective of whether useful information is contained in emails that might otherwise be lost. For this reason, both superseded and final versions of documents are of value. Of particular interest were project schedules that provide an overview of key phases in the project. Minutes and reports from stage review meetings provided further detail. In addition, organizational charts were used to determine the links between different actors and parties in the project and the various lines of reporting.

Fig. 2 Research method for study of email use in engineering projects
Primary data collection via coding: coding scheme development (detailed in Sections 4-5); apply the coding scheme to emails. These should be from ‘real’ industrial engineering design projects. The sampling should be significant and cover the duration of the project.
Secondary data collection: semi-structured interviews with project participants from a variety of roles; review of project documentation, including reports, planning charts and organizational diagrams.
Analysis: frequency of coding; change of frequency of coding over project; change in frequency of coding of individuals.

4 Method for coding scheme development

The adopted methodology for developing the coding scheme follows four stages, depicted in Fig. 3. The initial stage grounds ideas and directs the scheme to answer the necessary questions. The development stage applies and tests these ideas before refining them. The evaluation stage verifies the suitability of the developed scheme, following which the final scheme is presented. The issues of scope, reliability and validity are considered throughout.

4.1 Initiation

As recommended by Lincoln and Guba (1985), the scheme was developed with a predetermined guiding theory to ensure that it identified relevant characteristics so as to enable suitable analysis and findings. Three concept groupings were established which posed questions that the coding scheme should answer; these were based on the literature reviewed in Sect. 2.

What topics does the email discuss? The variety of subjects discussed in emails is an important element in understanding their use. The topics of conversation have been annotated alongside an existing coding (Huet 2006). The domain of interest such as the product, resources or business has also been identified by other codings (Lowe 2002).

Why is the email being sent? A number of the coding schemes have been highly focused on processes, sometimes disregarding the subject matter. This includes thought processes (Stempfle and Badke-Schaub 2002), reasons for information transactions (Aurisicchio 2005) and purposes for communicating (Huet 2006). It is asserted that there is always a rationale for sending an email, and, as such, this should be considered.


How is the content expressed? Having identified why information has been created, what that information is and, by virtue of an email’s meta-data, when and by whom it was sent, the remaining question begins with how. A parallel is drawn here with Eckert (2001) who, in exploring designer behavior, identified that it is important to understand how information is generated and created. This how is already partly known, the answer being via email, but the way in which this is written could reveal more. For this reason, consideration is also given to how the content is expressed.

4.1.1 Development

Given these facets for the coding scheme, the next step consisted of treating each one of these facets individually. Terminology from existing coding schemes and taxonomies relating to each of the above three questions was identified in engineering design literature. Based on these, several candidate structures for coding schemes were proposed by the research team. The team consisted of postgraduate students and academics with varying levels of industrial experience. The team members were all UK/USA nationals with English as their first language. The researchers applied the candidate schemes both independently and jointly to sample emails. These emails were derived from a variety of engineering projects to ensure general applicability of the scheme. Following their application, the proposed schemes were evaluated in relation to their scope, reliability and validity. Researchers compared their results and, on this basis, proposed variations and refinements to the schemes. Amended schemes were further applied and evaluated until a consensus was reached that a single suitable scheme had been developed. Throughout this development process, terminology from existing literature was used to ensure academic rigor in the grounding of the work.

4.1.2 Evaluation

Once suitable schemes were developed for the elements of what, why and how, these were brought together to form an overall scheme. This scheme was then applied to emails from a single engineering project by three of the researchers. Exemplar results showed frequencies of terms and how their use varied with time. All three researchers marked up a number of these emails, allowing their inter-coder reliability to be calculated. Patton (2002) suggested that three parties are best placed to judge the success of content analysis using a coding scheme. These are the developers, independent experts and someone with knowledge of the corpus. Hence the researchers, other academics and industry partners involved in the project attended a review meeting. A number of criteria relating to the scope, reliability and validity were then considered. Informed by the review, final minor adjustments were made to the scheme, arriving at a final coding scheme.

Fig. 3 Process of coding scheme development, evaluation and finalization
Initiation: the three dimensions are developed in parallel (1. what topics the email discusses; 2. why the email was sent; 3. how the content is expressed). Develop three tier scheme. Ground in existing work: use existing coding schemes and terminology from engineering design literature to construct an initial scheme.
Development: Apply scheme: several researchers apply proposed scheme to a sample of emails supplied by the industrial partner. Evaluate scheme: proposed scheme assessed with respect to key metrics (inter-coder reliability, stability etc). Refine scheme: based on the evaluation, propose an extended/refined coding scheme.
Evaluation: Application: three dimensions are integrated and the proposed scheme applied to a sample of 300 emails from a large engineering project. Expert appraisal: the research team, academics and engineering project team meet to critique the scheme. Final refinement: final refinements made to scheme.
Final scheme: proposed final scheme.

4.2 Ensuring coding quality

The development of a coding scheme should be through a carefully considered approach. Three interrelated facets, scope, reliability and validity (influenced by Krippendorff 1980), are considered for the purpose of this work.

4.2.1 Scope

Scope is used to describe the breadth and depth of the concepts that a coding scheme encompasses. A key objective of coding (email) content is to explore the presence of recurring themes (Patton 2002), which is achieved by describing characteristics of the text through a collection of labeled concepts. Labels that are either too broad or too narrow will reveal little. Codes may be developed in hierarchies or subsets to minimize such risk. Assessing whether a coding scheme captures the necessary relevant concepts that are present in a text is problematic for a number of reasons. A paragraph may provide necessary contextual information for understanding a concept, but isolating which portion to assign to the appropriate label is not clear-cut. Furthermore, one portion of text might relate to more than one coded concept. It is therefore impossible to quantify whether a document is fully coded or whether a coding scheme is complete. To overcome the latter, Patton (2002) proposes that researchers, observers and dataset providers should judge the scheme for its credibility, logicality and perceived inclusiveness. Ultimately, the scope of a coding scheme should be sufficient for the purpose of the proposed analysis.

4.2.2 Reliability

A reliable code will repeatedly produce the same results under the same conditions, regardless of how valid or useful these may be (Krippendorff 1980). Testing reliability requires some amount of duplication. A facet of this is inter-coder reliability, which measures agreement between two (or more) coders in their application of a coding scheme to a dataset (Kurasaki 2000). As well as demonstrating the general reliability of the scheme, it allows more than one person to bear the burden of coding text, without unduly influencing results. The differing backgrounds, skill sets, training and perspectives of coders will influence their coding decisions (Patton 2002). Researchers’ mark-up strategies may evolve throughout the process and it is recognized that the point in time at which an article is coded may influence outcomes (Lincoln and Guba 1985).

4.2.3 Validity

The validity of results is determined by their quality, their representation of true facts (Krippendorff 1980), and whether they meet their purpose. Results which are consistent with existing knowledge, or match the expectations of the study, are both indicators of validity (Krippendorff 1980). It is also desirable for the results to yield a new depth of understanding and to meet the purpose of the study (Patton 2002). Krippendorff also suggests that a coding scheme should be pragmatic, logical and straightforward to apply. For the findings of a coding study to be valid, it is also necessary for the sample to be representative of the entire case. This requires sampling to be evenly distributed and sufficient in size.

5 The coding scheme in detail

The coding scheme describes three aspects of email: what topics emails discuss, why they are sent and how their content is expressed. This section describes, in detail, how we arrived at the criteria for coding emails along these three aspects, including relevant background literature, iteration stages, and the underlying reasoning for iterations in the coding scheme. To avoid repetition and ambiguity (arising from changes during development), the definitions for terminology within the coding scheme are included only once with the final version of the scheme at the end of this paper. In presenting the detailed development of the coding scheme, we hope to make it clear how the researchers interpreted the content of emails to assign them to the appropriate codes.

5.1 What: development

This aspect of the coding scheme was intended to address the issue: What topics do emails discuss? An initial structure was based predominantly on work by Huet (2006) and Lowe (2002). Their schemes were appealing since, although coming from two different and relevant perspectives, their terminology largely overlaps. Huet’s scheme was designed to capture dialogue, of which emails constitute a form (Kim 2002), whereas Lowe’s scheme has been applied to engineering documentation and correspondence beyond design. The first terms used similarly by both Huet and Lowe are product and process to describe the artifact and act of designing it, respectively. Notably, these terms are also used by other authors (e.g., Ahmed and Wallace 2003). Huet and Lowe also used the term resources, encompassing physical, human or financial aspects. Only one disparity existed: where Huet refers to external factors which influence the design, Lowe refers to business issues. It was felt that these are similar, and bear an overlap with one another. This collection of terms, brought together in Table 1, was applied independently and cooperatively to a sample of emails by three researchers before findings were discussed.

Table 1 Initial coding scheme for what topics were discussed
Product: The artifact
Process: The design process
Resources: Resources used in process
External: Influences on process
Business: Business issues (possibly affecting process)

It was found that the term product was clear to apply and occurred in a significant proportion of emails. As a point of ontology, it was noted that the product may be non-physical, such as an item of software. The term process was heavily used, and was felt to cover a broad range of concepts, which required differentiation between them. This arose because the design process was not explicitly mentioned to form a topic of discussion as such, while any verb or action might be interpreted as a process. It was proposed that, for the purpose of analyzing emails, it would be clearer to identify project-related activity. The external and business terms were shown to be similar, and it was observed they reflected influences by the project on the design, and from the company on the design. A small proportion of emails were found to discuss resources explicitly. It was viewed, however, that these would be better considered as a facet of the project or company rather than a group in their own right.

Based on these findings, it was proposed that the terms process, resources, external and business would be replaced with the terms project and company. Identifying that an email is discussing the project, rather than the artifact which it produces, was felt to be intuitive. Similarly, company-related dialogue, abstract from a specific project, is conceptually distinct.

A second tier of coding was proposed, such that product, project and company formed three categories, within which a set of further terms existed. To select and develop these terms, existing literature was considered. Definitions and concepts surrounding the product have been thoroughly researched and described within engineering design. It was hence possible to select a suitable set of pre-existing terms to identify product emails. Li and Ramani’s (2007) taxonomy was selected for this purpose. Similarly structured or grouped terms were not found directly for the project or company facets. Using the core academic texts in the areas of engineering project management and organizational management (Field and Keller 1998; Gray and Larson 2000; Smith 2007; Haberberg and Rieple 2001), suitable terms were identified and grouped. The new scheme, an early version of which is shown in Table 2, was then incrementally developed through application to emails by the researchers.
The hierarchical approach enabled the top-level categories of product, project and company to be better defined by the features that comprise them, which, as we shall show later, improved the coder’s reliability. It was also appreciated that there is a trade-off between the reliability of a scheme and its detail. Having two levels made this compromise easier, allowing one level to be more reliable and concise, the other to explore greater detail. This revised layout and terms were found to be more intuitive and distinct to apply, improving reliability. It was

Table 2 Development of coding scheme structure for what topics were discussed
Product: Function; Material; Environment; Performance; Manufacturing
Project: Team; Risk; Cost; Schedules; Contracts; Deliverables; Time
Company: Stakeholders; Financial resources; Tools/methods; Human resources; Physical resources; Knowledge resources; Practices and procedures

also felt that the findings could be more easily interpreted. The terms within each of the categories were then developed, influenced by characteristics of emails from a variety of projects. Note again that the final version of the scheme, incorporating aspects of what, why and how, is presented at the end of the paper.

5.2 Why: development

This dimension of the coding scheme was intended to address the issue: Why was the email sent? As a first step towards identifying purposes of sending messages, terminology was compiled from both empirical and theoretical studies of design communication, which are summarized in Table 3. Schemes which had been intended for both formal interactions, such as meetings (Huet 2006), and informal interactions, such as normal working (Aurisicchio 2005), were reviewed. The breadth of these ranged from design-specific orientation (Valkenburg and Dorst 1998) to general engineering practices (Sim and Duffy 2003). This array of terms was then applied to a sample of emails, as per the method, and the findings were discussed between the researchers.

The collection of terms was found to be suitably diverse, capturing a variety of communication purposes, without omitting any. It was felt, however, that the process of applying the terms was not particularly pragmatic, given their variety and a lack of defined relations between them. As a result, terms that were redundant due to overlap were eliminated and the remaining terms presented within more structured groups. It was envisaged this would also make the scheme more reliable to use.

It was observed in the emails that a number of the communication purposes related to transactions; these could be further divided into two groups. Information

Table 3 Terminology identifying purposes for which communications occur
Term grouping: Ref.
Informing, sending information and receiving information: Aurisicchio (2005), Huet (2006) and Lusk (2006)
Managing, management (control, planning, coordination): Huet (2006), Sim and Duffy (2003), Stempfle and Badke-Schaub (2002) and Lusk (2006)
Requesting information: Eckert and Stacey (2001)
Justification: Eckert and Stacey (2001)
Reflecting: Valkenburg and Dorst (1998)
Analysis: Aurisicchio (2005) and Stempfle and Badke-Schaub (2002)
Evaluating: Aurisicchio (2005), Huet (2006), Sim and Duffy (2003), Stempfle and Badke-Schaub (2002) and Pahl and Beitz (1996)
Decision-making and decision: Huet (2006), Stempfle and Badke-Schaub (2002), Pahl and Beitz (1996) and Eckert and Stacey (2001)
Solving and solution generation: Aurisicchio (2005), Stempfle and Badke-Schaub (2002), Pahl and Beitz (1996) and Eckert and Stacey (2001)
Problem solving and resolving problems: Huet (2006) and Lusk (2006)
Goals: Stempfle and Badke-Schaub (2002) and Pahl and Beitz (1996)
Constraints and constraint negotiation: Pahl and Beitz (1996) and Eckert and Stacey (2001)
Confirmation: Aurisicchio (2005)
Conflict resolution: Eckert and Stacey (2001)
Comparison: Aurisicchio (2005)
Negotiating clarification: Eckert and Stacey (2001)

handling transactions include requests for information, and sending and receiving information. Management-related transactions involve directing or requesting other people to take action of some form, beyond a purely information handling process. The observation of information transactions was less than straightforward, given the incomplete representation emails give. For example, it was difficult to ascertain whether information in an email had been previously requested. It might have appeared in another part of the thread that was not detailed, or have occurred through another communication channel.

The remainder of communicative purposes was found to relate to problem-solving behavior, such as generating solution ideas and giving evaluations. Given the significance of problem solving in design activity (e.g., Simon 1969; Goel 1995) and the ambiguity in existing literature as to how well email might support this (compare Wilson 2002 with Lusk 2006), such a grouping enables valuable insight. The three categories proposed (information transactions, management transactions and problem-solving behavior) may be further supported by the work of Medland (1992). He identifies four types of communication relating to: delegation (cf. management transactions), awareness and reporting (cf. information transactions), and problem handling (cf. problem-solving behavior).

Based on the aforementioned rationale, overlapping terms were eliminated and remaining terms were grouped into transactions, with the subsets of information and management, and problem-solving behavior. The rationalized version is presented in Table 4. Acknowledging that it may not be possible to determine whether information has been freely initiated, or sent in response to a request, the term informing encompasses both. Facets of management transactions were found to provide an unnecessary level of detail and complexity, especially given that the activities involved, such as planning, were captured by the what dimension of the scheme. The scheme (Table 4) allowed details of interest to be captured, as well as presenting more general characteristics through the use of groupings. Management and information distribution have been identified as key uses of email in generic contexts (Wilson 2002), as has problem solving in design activity.

For completeness, it should be noted that more fundamental analysis methods, such as speech–act theory (Bach and Harnish 1979), were considered. Although there is the potential to provide more detailed analysis, considerable time, effort and training are required on the part of the researcher. It was not felt that the added benefit was a worthwhile compromise for either initial research into email use or contributing a usable scheme for wider engineering research.

5.3 How: development

This dimension of the coding scheme was intended to address the issue of: How is the content of emails expressed? The intention was to find out whether anything could be learned from the language used and from the characteristics of discussion. A number of established approaches were explored, the foremost of which was the process for interaction analysis

Table 4 Three proposed groupings of communicative purposes
Transactions
  Information transactions: Informing; Requesting information
  Management transactions: Managing (including control, planning, directing)
Problem-solving behavior: Exploring; Clarifying; Confirming; Debating; Identifying goals; Identifying constraints; Solutions; Evaluating; Decision-making; Reflecting

developed by Bales (1950, 1951). Bales used content analysis (via coding) to explore communication within small working groups. His method showed how types of phrases changed as discussions evolved towards decisions. It also showed how certain players dominated the group. Although Bales developed this technique to analyze co-located groups, it has been used to examine computer-mediated communication between virtual groups (Hiltz et al. 1980; Reid et al. 1996) and to compare this with face-to-face working. In the engineering design domain, Bales’s approach has been favorably reported in comparison with other methods for examining meeting communication (Gorse and Emmitt 2003) and has been subsequently applied (Gorse and Emmitt 2007).

Bales’s coding scheme, Fig. 4, captures both social and task-related characteristics. Both of these facets are particularly relevant to studies of designers, who work together solving problem-based tasks. Three types of positive socio-emotional reactions are described (group A) along with their negative equivalents (group D). Three types of question and matching responses (groups C and B) describe information relating to the task.

Fig. 4 Bales categories for analyzing interaction processes, adapted from Bales (1950)

A number of other techniques to explore how communication is expressed were considered. Among these, Huet (2006) applied an approach proposed by Conklin (2003), which labels each sentence as a question, answer, statement, or feeling. It was proposed that a collection of such elements would comprise communication purposes (why), such as clarifying. This mark-up has the advantage of being straightforward to apply. Also considered were methods to understand the subjective content of design text (Dong 2006). This method has previously been applied to show how the subjective content of design meetings influences knowledge generation (Dong et al. 2009) and the formation of shared understanding (Kleinsmann and Dong 2007). While the linguistic analysis technique is much more rigorous, it is more suitable for the detailed analysis of the relation between linguistic phenomena and activity. Nonetheless, this framework assisted the coders in identifying words that relate to Bales’s categories.

Having applied the techniques, Bales’s method was found to be the best-suited and sufficient without modification. The combination of social and task analysis of Bales’s method, along with its previous application to computer communication and design environments, gave support to its adoption. It was also found to be no more time consuming to apply than other methods, while providing a rich level of detail.

It was noted that the difference between the what and why parts of the scheme was clear, but the distinction between why and how was less obvious. How the content is expressed, coded using Bales’s approach, should be analyzed at the clause level, abstracted from the overall purpose or purposes of the message; these purposes should be identified on a paragraph basis as a why. It was recognized that the purposes of emails (why) might be constructed through components of how they were expressed; however, it was not considered relevant or beneficial to formally and theoretically link these.

6 Evaluation of scheme and approach

The what, why and how sub-schemes were applied to an email corpus in order to obtain exemplar findings and demonstrate the suitability of the scheme. A review meeting was held, attended by the research team, five academics from a number of universities, and representatives of the company from which the corpus was obtained. The suitability of the scheme was critically appraised in relation to its scope, reliability and validity.

6.1 Scope

Was the coverage broad enough and were any relevant characteristics failing to be captured? The use of the three aspects of what, why and how was felt to give broad and suitable coverage. No missing themes were identified either during the final application or by the review panel. The capture of interactions through why and how was viewed to be a major asset of the scheme.

Was the level of detail sufficient, or should coding elements be broken down further? The two-tier approach of the what part of the scheme was felt to be beneficial in capturing finer detail, as were the groupings used with the why part of the scheme. It was found that although the scheme had sufficient detail to annotate specific portions of text, the view of what the whole email contained was most appropriate.

Were there terms that overlapped unnecessarily? There was initially a considerable overlap between the product, project and company categories. Although this was noted to be useful, because emails could relate to more than one of these, it was felt that there was still too high an overlap caused by ambiguity. Improvements to definitions and clarification of terms within each category were undertaken to resolve this.

6.2 Reliability

Was the reliability between coders sufficient? A threshold of 0.7 was suggested. Lombard et al. (2002) found that inter-coder reliability is widely underreported, and question the validity of any such work without it. They identify a number of different indexes for measuring inter-coder reliability, noting that simple percentage agreement does not account for coincidental agreements. Acceptable values for agreement indexes are acknowledged to be subjective, but it is suggested that indexes of 0.9 or higher are almost universally acceptable, above 0.8 suitable for most circumstances, and 0.7 sufficient for exploratory research.
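The difference between simple percentage agreement and a chance-corrected index can be illustrated with Cohen's kappa (Cohen 1960), the index used for the figures reported in this section. The following is a minimal sketch for two coders who each assign one label per email; the function names and the ten example labels are hypothetical and are not taken from the study corpus.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items on which both coders chose the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two coders."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)  # observed agreement
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Agreement expected by chance, from each coder's label proportions.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical top-level "what" codes assigned to ten emails by two coders.
coder_1 = ["product", "product", "project", "project", "company",
           "product", "project", "company", "product", "project"]
coder_2 = ["product", "product", "project", "company", "company",
           "product", "project", "company", "project", "project"]

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
```

In this invented example the coders agree on eight of ten emails, yet kappa is noticeably lower than the raw agreement because part of that agreement is expected by chance.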
The figures for inter-coder reliability using Cohen’s kappa (k) (Cohen 1960) gave k [ 0.81 for the top-level codes (e.g., product, company and information transactions) and k [ 0.7 for the sub set terms (e.g., cost, feature and decision). It was also noted that the proportional use of each term was very similar between researchers. This was considered to be sufficient considering the exploratory nature of the study. Was the scheme consistently applied, or did researchers’ coding behavior change as time passed? To measure the stability of coding, two batches of different emails were coded several days apart, and the proportional use of each term compared, s. The stability of top-level codes was found to be s [ 0.84. Subset terms, because of their less frequent occurrence, were more difficult to meaningfully evaluate. Despite this, stability values of s [ 0.7 were


achieved. Although coders had perceived that they felt that their use of the coding scheme could vary throughout application, this suggests it was not the case. To optimize coding consistency and inter-coder reliability, it was suggested that in a large-scale application, coders should regularly re-train by coding emails together. It was also suggested that emails should not be marked up in chronological order, as such bias might affect observations of how email use changes with time. 6.3 Validity Were the conceptual groupings identified in the coding scheme recognized to be ‘truthful’? The grounding of terminology and structure within existing literature assured the conceptual rigor of the scheme. Peers agreed that the concepts that the scheme captured were well founded. Was the scheme reasonably efficient and effective to apply? The scheme was generally pragmatic and experienced coders were able to code an average email in 10 min. The most time-consuming process was found to be coding the elements of how content was expressed, due to its mark up at the sentence level. This facet of the scheme was also observed to be more valuable for exploring the micro rather than macro-perspective of email use. For this reason, it was proposed that in later case studies a large number of emails should be coded with the what and why to obtain an overall picture, and a smaller subset with how to learn about the intricacies of communication. Were the trends reflective of expectations, based on previous studies and knowledge of the specific project? The findings generally aligned with the company’s expectations and knowledge of the project. For example, a significant proportion of emails discussed ‘company’ type information, unsurprising as this was an in-house project. A large proportion of emails related to sharing information and to management, as reported by Wilson (2002). Were the trends shown found to offer new insight? 
A key insight was that a low proportion of emails showed problem solving, and, of these, very few showed any decision making. This challenged an assumption by the company that important decisions were being recorded in emails, and raised the possibility that rationale is being lost. Were the results appropriate to answer the research questions? It was shown that emails relating to the product peaked toward the end of the project schedule. This demonstrated the possibility of using emails to monitor the progress of projects, and that changes in email use over time can be captured by means other than mailbox size.
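
As a minimal illustration of the inter-coder reliability measure referred to above, Cohen’s kappa compares the observed agreement between two coders with the agreement expected by chance. The sketch below assumes that each coder assigns exactly one top-level code per email (the scheme itself permits several), and the function name and sample labels are hypothetical rather than taken from the study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one label per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: fraction of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of the marginal proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical top-level codes for ten emails (illustrative, not study data).
coder_1 = ["product", "product", "project", "company", "company",
           "project", "product", "company", "project", "company"]
coder_2 = ["product", "product", "project", "company", "project",
           "project", "product", "company", "project", "company"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Here the coders disagree on one email in ten, so the observed agreement is 0.9, while the chance agreement computed from the marginal label frequencies is 0.33.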


7 Guide to the coding of emails

The final version of the coding scheme is now presented in Table 5 along with guidance and examples as to its application. Generally, the coder should bear in mind any contextual information that can be reasonably ascertained from the document. This might include the text in the subject field and anything known about the sender and recipient(s), e.g., whether the email is inter- or intra-company. Coders should make reasonable inferences from what is presented but not unfounded assumptions. The first stage, coding ‘What topics does the email discuss’, uses the scheme presented in Table 5. Having read the email, the coder should, based on the predominant themes of the message, determine whether it relates most to the product, project or company. If it significantly relates to more than one category, then more than one code may be allocated. Following this, the coder should identify further details from within the selected category. In the example illustrated in Fig. 5, an email relates mainly to the product and discusses features and materials. Applying the what aspect of the scheme first gives the coder a chance to reflect on the overall nature of the message. The next stage, marking up the why, requires a further level of interpretation. Table 6 presents the coding scheme; terms are listed as either problem-solving behavior or transactions. The coder is required to first identify which bottom-level terms are demonstrated within the email, such as exploring or clarifying. This is the reverse of the what aspect of the scheme, where the top-level categories are identified first and then their facets. Further processing can then determine which types of transaction are demonstrated (information or management) and whether there is evidence of problem solving. Coders should highlight an appropriate portion of text for the coding term they are allocating. This may be a paragraph or a few words alone.
The quantity of words tagged by a particular label does not necessarily represent their significance, and assumptions should not be drawn on this basis (Lincoln and Guba 1985). If this were the case, coders would have to give much consideration to where to start and finish their selection of text, and overlapping concepts would also present an issue. Instead, the coding determines which terms do and do not occur within the email. Highlighting specific passages aids the coding process and enables a return to the message later for further review. Figure 6 demonstrates the coding of purposes in a portion of email text. The final element of the scheme is coding how content is expressed, through the application of Bales’s (1950)



Table 5 Coding terminology and definitions identifying what topics emails discuss

What topics does the email discuss? i.e., what subject matter it relates to

Product: the output of the project; it may be a physical artifact or software
  Functions: things the product must do, e.g., be fast
  Performance: how well the product achieves its functions
  Feature: the quality or characteristic with which the function is achieved
  Operating environment: objects that interact with the product
  Materials and components: materials and component selection and characteristics
  Manufacturing: consideration of manufacturing, assembly and transport
  Cost: consideration of costs, particularly unit costs
  Ergonomics: user interaction with product
  Specification: formal requirements definition for the product/design, or requirements for sub/super components of the product

Project: the domain within which the product is created
  Risk: assessing likelihood and weighting implications
  Plans: management of phases, activities and tasks
  Team: team selection, development
  Quality management: quality, standard or expectations
  Cost: financial arrangements at the level of the project, rather than specific component costs
  Time: durations or deadlines; any link or reference to time
  Manufacture: arranging manufacture, planning manufacture, in the context of the project
  Delivery: the delivery or provision of a specific component or sub-system
  Contracts: legal arrangements involving two or more parties setting out what is required from the project, often specifying costs and time
  Milestones and deliverables: targets to be achieved, or which have been achieved, related to formal stages within the project
  Documentation or knowledge resources: reference to general documentation resource, most likely PROMIS, or mention of knowledge management process specific to the project
  Administration: general administration related to the project, but not distinctly captured by one of the other terms above

Company: the sponsors or facilitators of a project
  Stakeholders: such as shareholders, customers, directors and their culture and politics
  Economic issues: costs and efficiency, market and product selection
  Human resources: people, availability, allocation, training, replacing
  Physical resources: ranging from offices to equipment
  Financial resources: cash, assets, borrowing
  Knowledge resources: current ability and stored information
  Tools and methods: specific testing and modeling techniques
  Practices and procedures: accumulated by the company, often developed through experience

method. Table 7 presents the 12 possible tags that can be applied to each clause or sentence; these are grouped into socio-emotional and task categories, with all terms having a polar opposite. The scheme is applied at the term level by the coder, with the groupings used to aid navigation of the scheme and later analysis. Examples in Fig. 7 show both social responses, such as ‘showing antagonism’ and task responses such as ‘giving orientation’. These labels are applied to sentences (presented as paragraphs for clarity in the figure).
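
The roll-up step described earlier in this section, in which bottom-level why terms are identified first and the transaction types and evidence of problem solving are derived afterwards, can be sketched as a small data structure. The class name, field names and sample email are hypothetical, and the grouping of terms is an assumption that follows one reading of Table 6.

```python
from dataclasses import dataclass, field

# Bottom-level "why" terms grouped by category (assumed reading of Table 6).
WHY_GROUPS = {
    "problem solving": {
        "goal setting", "constraining", "developing solutions", "evaluating",
        "decision making", "reflecting", "debating", "exploring",
    },
    "information transaction": {"informing", "requesting information", "clarifying"},
    "management transaction": {"confirming", "managing"},
}


@dataclass
class CodedEmail:
    """One coded email; names are illustrative, not part of the published scheme."""
    # Top-level topic (product, project, company) mapped to the sub-terms found.
    what: dict[str, set[str]] = field(default_factory=dict)
    # Bottom-level purpose terms the coder highlighted in the text.
    why: set[str] = field(default_factory=set)

    def why_groups(self) -> set[str]:
        """Roll bottom-level purpose terms up to their top-level groups."""
        return {group for group, terms in WHY_GROUPS.items() if terms & self.why}

    def shows_problem_solving(self) -> bool:
        return "problem solving" in self.why_groups()


# Hypothetical email: about product features and materials, sent to confirm and manage.
email = CodedEmail(
    what={"product": {"feature", "materials and components"}},
    why={"confirming", "managing"},
)
groups = email.why_groups()
print(groups, email.shows_problem_solving())
```

Recording only which terms occur, rather than how much text each term covers, mirrors the guidance that the quantity of tagged words should not be read as a measure of significance.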

8 Results

A selection of results from the analysis of emails from one of the engineering projects studied is now presented to demonstrate the potential of the method and to provide some insight into how much engineering work is embodied in


email. If engineering email embodies a significant amount of work, we would expect to find a high proportion of emails about the product (what emails are about) and a high degree of problem-solving behavior (why emails are sent). For the engineering project email surveyed, we did not find evidence to support these assertions. Reasons for this might include the type of design activity (new, adaptive or variant), the team composition, working practices, level of geographical distribution and prior experience. While detailed investigation of these aspects is beyond the scope of this paper, some background information is included to provide the reader with some further insight. The project lasted 3 years and involved the design of the control system for an electromechanical drive system. The project team consisted of software, control systems and mechanical engineers, a project manager, a project administrator, two chief engineers and the customer (an internal business unit). Team members were geographically distributed between a UK site and two



Fig. 5 Coding what topics the email discusses: the general theme is discussion of the product, and terms highlighted within this category are components, materials and functions

Table 6 Coding terminology and definitions identifying why emails are sent, with respect to transactions and problem-solving behavior

Why are the emails sent? i.e., their purpose

Problem-solving behavior
  Goal setting: identifying where the design is, and where it needs progressing to
  Constraining: imposing boundaries with requirements and desirables
  Developing solutions: it may encompass one or more of the following stages: searching, gathering, creating and developing solutions. Presentation of solutions for comment is also encompassed
  Evaluating: judging the quality, value and importance of something
  Decision making: considering key factors from evaluation and possible compromises to form a decision
  Reflecting: reflecting upon a design/product decision or process already adopted or occurred. Reflecting may question whether a new or further problem now exists
  Debating: discussing opposite views
  Exploring: discussing possibilities and ideas, invoking suggestions. A return is expected from the recipient

Transactions
  Information transactions
    Informing: sharing, presenting or distributing information with others. No response is required. It is passive
    Requesting information: direct request to another party to provide information, or further information. Including explicit responses to requests for information
    Clarifying: clearing up misunderstandings (both requesting and in response). Asking for explanations, resolving a general lack of clarity
  Management transactions
    Confirming: confirming or requesting confirmation of something
    Managing: includes arranging, directing and instructing. Implies action (such as a response) needs to be taken. Including process management outside of the organization, e.g., prompting arrangements/meetings with third parties

sites in mainland Europe. The corpus consisted of 739 emails sent by 63 people to 158 recipients. In addition to email tools, project members had access to standard video conference facilities and online conferencing tools. These were used to supplement the traditional design review meetings used as part of a stage-gate design process.



Fig. 6 The coding of why emails are sent; the purposes of confirming and managing are highlighted

The proportion of emails relating to the product, project and company is shown in Fig. 8, which also includes the percentage of emails that relate to two or more of these topic areas. The project to which the diagram relates concerned the development of an in-house system. This might be one reason for the large proportion of company-related email content. Despite this, a slightly higher proportion of the emails were seen to relate to information associated with the product rather than the project itself. Previous studies have suggested that the major purposes for emails are to manage and inform. This was supported by the findings in Fig. 9. To a far lesser extent, around one-fifth of emails in the project concerned contained discussion of an exploratory nature that is associated with generating design ideas. It is possible that engineering work in email is limited to the engineers alone. For this reason, emails were aggregated with respect to roles and by the topic of the email sent. The results reveal that, as might be expected, the topics of the emails that users send and receive reflect their roles within projects. In Fig. 10, the emails of project directors can be seen to involve more project than product discussion. Those more concerned with the detail of the engineering, the engineers and their managers, both discuss more product information, although their emails still contain significant project-related information. On the basis of these three data points, we could conclude that engineering email will include engineering design work, but to a rather limited degree, and that this is likely to be limited to email by ‘working’ engineers rather than project managers and other stakeholders. Despite the rather limited percentage of emails embodying engineering work, some of the most insightful findings the methodology can identify relate to changes over time in why emails were sent. Such analyses of changes over time present an opportunity to provide real-time evaluation of project progress in comparison with expectations. In Fig. 11, the number of emails containing content relating to problem solving throughout different phases of a project can be seen. These concur with user accounts of the project, where peaks in problem solving in emails align with specific events where difficulties or pressures were encountered in this project. The relatively low level of management throughout the project suggests that this project was well-coordinated (Coates et al. 2004). It is also interesting to note that the rate of change of the problem-solving emails follows that of the information emails. Their correspondence reaffirms that engineering design is an information-intensive activity (Hicks et al. 2002) which is heavily dependent upon the ability of engineers to access an adequate amount of appropriate, accurate and up-to-date information (Hicks 2007).
The graph implies that to produce one unit of ‘design content’ information could require 2–3 times as much ‘supporting’ information about the project or the company. This finding, in concert with the finding of the overlap in email content

Table 7 Coding terminology and definitions used to identify characteristics of how email dialogue is expressed

How is email content expressed?

Socio-emotional terms
  Positive reactions
    Shows solidarity: raises other’s status, gives help, reward
    Shows tension release: jokes, laughs, shows satisfaction
    Agrees: shows passive acceptance, understands, concurs, complies
  Negative reactions
    Shows antagonism: deflates other’s status, asserts or defends self
    Shows tension: asks for help, withdraws out of field
    Disagrees: shows passive rejection, formality, withholds resources

Task-related terms
  Sharing
    Gives opinion: evaluation, analysis, expresses feeling or wish
    Gives suggestion: direction, implying autonomy for other
    Gives orientation: information, repeats, clarifies
  Requesting
    Asks for opinion: evaluation, analysis, expression of feeling
    Asks for suggestion: direction, possible ways of action
    Asks for orientation: information, repetition and confirmation



Fig. 9 The proportion of emails relating to each of the four most common communicative purposes; most emails had more than one purpose

Fig. 7 The coding of how email content is expressed using terminology from Bales

Fig. 10 The product versus project bias of email collections for individuals with different project roles

containing product and project information in Fig. 8 suggests that efforts to more closely integrate product data management with project management are needed (Mesihovic et al. 2004).
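
The time-based view in Fig. 11, which counts emails per 60-day period for each purpose group, could be produced from coded emails along the following lines. This is only a sketch: the function name, the tuple layout and the sample dates are hypothetical and do not reproduce the study data.

```python
from collections import Counter
from datetime import date


def count_by_period(coded_emails, start, period_days=60):
    """Count emails per purpose group in consecutive fixed-length periods.

    coded_emails: iterable of (sent_on, groups) pairs, where groups is the set
    of purpose groups found in that email. An email with several purposes is
    counted once under each of them.
    """
    counts = {}
    for sent_on, groups in coded_emails:
        # Index of the period the email falls in, counted from the start date.
        period = (sent_on - start).days // period_days
        counts.setdefault(period, Counter()).update(groups)
    return counts


# Hypothetical coded emails (illustrative only).
emails = [
    (date(2004, 12, 1), {"information", "management"}),
    (date(2004, 12, 20), {"information"}),
    (date(2005, 2, 15), {"problem solving", "information"}),
    (date(2005, 3, 1), {"problem solving"}),
]
counts = count_by_period(emails, start=date(2004, 12, 1))
for period in sorted(counts):
    print(period, dict(counts[period]))
```

Comparing the per-period counts of problem-solving and information emails is one simple way to examine whether the two series rise and fall together, as observed above.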

Fig. 8 The distribution of topics of discussion within a sample of emails. Almost 80% of emails are related to the company

9 Conclusions

The research reported in this paper addressed the need for a methodology to investigate the content and role of email in

Fig. 11 The change in the level of problem solving email at different phases in the project. Series: Information, Management, Problem Solving. Vertical axis: number of emails sent in 60 day period (0 to 40). Horizontal axis: Dec 04 to Jun 06

engineering design projects. The paper presents the development of a method to classify the content of email, thereby resulting in a taxonomy and a vocabulary to codify and organize engineering knowledge contained in emails. Design studies have used methods of participant interview, and observation in limited or contrived settings, to study information use and communication. Textual sources, such as meeting transcripts and documents, have been analyzed using coding approaches to gather evidence. In the email domain, it was found that previous work had investigated email from a perspective of efficiency and improvement to the working practices of the user. Such research had relied mainly on statistical data and gathering of user opinion via interview and survey. An organizational perspective, considering email within wider knowledge management strategy and as a project support tool, has been largely overlooked. It was proposed to develop a method that analyzed the textual content of emails to offer insight as to their role in engineering projects. This offered the advantage of being able to analyze, in a practical manner, information gathered over an extensive time period from a real engineering project distributed over a number of sites. This approach also involved user interviews and analysis of project documentation to provide validation of the methodology and to explain the trends observed. To enable the classification of email content, a taxonomy and coding scheme were developed. This was targeted to identify: what topics emails discuss, why they were sent, and how their content is expressed. To achieve this, several researchers iteratively applied coding schemes to sample emails from several engineering projects, reviewed their suitability, and then proposed refinements and reapplied the schemes. Integrating existing coding schemes and domain-specific terminology into the development process grounded the development approach.
The concepts of scope, reliability and validity were used in the process of developing and


evaluating the coding scheme. The scope was considered in terms of the level of detail the scheme went down to and its breadth in covering all of the necessary concepts. The reliability in applying the scheme related to both inter-coder agreement and also consistent application over time. It was acknowledged that a balance between the scope and reliability is needed for the results to be valid but also for the scheme to be practicable. The scheme also needed to be conceptually accurate and produce a level of concurrence with existing knowledge about the design project studied; it was also important for the scheme to offer new insight and to produce findings relevant to the research aim. A selection of results was presented to illustrate the utility and potential of the method including the elicitation of knowledge about project performance and implications for identifying and accessing engineering knowledge. This research contributes a robust technique for the analysis of email use within engineering design projects. Furthermore, the methodology followed could be re-applied to develop new coding schemes for the investigation of email within alternative domains. The textual analysis of emails has been shown to offer an alternative way of investigating email use from a holistic perspective. It compares well with the more traditional methods of user survey, interview and observation. Content from long term, distributed and complex projects can be easily sampled, while the findings reflect on email use from a project, rather than an individual, perspective. The proposed taxonomy and methodological approach provide the basis for investigating and characterizing email use within the context of engineering organizations and engineering teams. 
This ability is a prerequisite for improving knowledge management, and in particular: improving the way that engineers network, share and acquire information and expertise; improving the management of records (information sources) such as determining what to keep, how to best organize it, and when to reuse it; and, mining legacy archives for ‘knowledge’. In pursuit of these aims, the authors intend to apply the method to a number of email corpora to characterize overall content of email in engineering projects and to understand the implications of trends and changing content over the project engineering phase. There also exist opportunities to examine the suitability of automated classification and language processing techniques, based on the proposed taxonomy, to provide real-time knowledge extraction and improved records management. Furthermore, the taxonomy could be inverted in order to provide the basis for tailored email clients that better support the communication and information sharing needs of engineers and engineering teams.

Acknowledgments The work reported in this paper has been undertaken as part of the EPSRC Innovative Manufacturing Research Centre at the University of Bath (grant reference GR/R67507/0). The work has also been supported by a number of industrial companies and engineers. The authors gratefully acknowledge this support and express their thanks for the advice and support of all concerned. In particular the authors would like to thank Laurie Burrow, Hamish McAlpine and Craig Loftus who contributed to the development of the candidate schemes.

References

Ahmed S, Wallace K (2003) Indexing design knowledge based upon descriptions of design process. International conference on engineering design ICED 03, Stockholm
AIMM International (2003) Email policies and practices: an industry study conducted by AIIM International and Kahn Consulting, Inc. Industry Watch
AIMM International (2006) Email management: an oxymoron? An industry study conducted by AIIM International and Tower Software. Industry Watch
Ainscough M, Yazdani B (2000) Concurrent engineering within British industry. Concurr Eng 8:2–11
Aurisicchio M (2005) Characterising information acquisition in engineering design, engineering department. Cambridge University, Cambridge
Bach K, Harnish RM (1979) Linguistic communication and speech acts. MIT Press, Cambridge
Bales RF (1950) A set of categories for the analysis of small group interaction. Am Sociol Rev 15:7
Bales RF, Strodtbeck FL (1951) Phases in group problem solving. J Abnorm Soc Psychol 46:485–495
Bellotti V, Ducheneaut N, Howard M, Smith T (2003) Taking email to task: the design and evaluation of a task management centered email tool. Association for Computing Machinery, Ft. Lauderdale
Bellotti V, Ducheneaut N, Howard M, Smith I, Grinter RE (2005) Quality versus quantity: e-mail-centric task management and its relation with overload. Hum Comput Interact 20:89–138
Bouikni N, Rivest L, Desrochers A (2008) A multiple views management system for concurrent engineering and PLM. Concurr Eng 16:61–72
Burn J, Barnett M (1999) Communicating for advantage in the virtual organization. IEEE Trans Prof Commun 42:215–222
Coates G, Duffy AHB, Whitfield I, Hills W (2004) Engineering management: operational design coordination. J Eng Design 15:433–446
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Measure 20:37–46
Conklin J (2003) Dialog mapping: reflections on an industrial strength case study. In: Kirschner P, Buckingham Shum S, Carr C (eds) Visualizing argumentation: software tools for collaborative and educational sense making. Springer, London
Cross N, Dorst K, Roozenburg N (1992) Research in design thinking. Delft University Press, Delft
Dalli A, Xia Y, Wilks Y (2004) FASIL email summarisation system. Proceedings of the 20th international conference on computational linguistics. Association for Computational Linguistics, Geneva
Dong A (2006) How am I doing? The language of appraisal in design. In: Gero JS (ed) Design computing and cognition ’06 (DCC06). Kluwer, Eindhoven
Dong A, Kleinsmann M, Valkenburg R (2009) Affect-in-cognition through the language of appraisals. In: Mcdonnell J, Lloyd P (eds) About: designing - analysing design meetings. Taylor and Francis, London
Dredze M, Lau T, Kushmerick N (2006) Automatically classifying emails into activities. Proceedings of the 11th international conference on intelligent user interfaces. ACM, Sydney
Eckert C (2001) The communication bottleneck in knitwear design: analysis and computing solutions. Comp Support Cooperative Work CSCW 10:29–74
Eckert CM, Stacey MK (2001) Dimensions of communication in design. 13th International Conference on Engineering Design (ICED’01), Glasgow
Eppler MJ, Mengis J (2004) The concept of information overload: a review of literature from organization science, accounting, marketing, MIS, and related disciplines. Inf Soc 20:325–344
Ericsson KA, Simon HA (1993) Protocol analysis: verbal reports as data. MIT Press, Cambridge
Field M, Keller L (1998) Project management. International Thompson Series Press, London
Fisher D, Brush AJ, Gleave E, Smith MA (2006) Revisiting Whittaker and Sidner’s “email overload” ten years later. Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work. ACM, Banff
Gantz J, Reinsel D, Chute C, Schlichting W, Mcarthur J, Minton S, Xheneti I, Toncheva A, Manfrediz A (2007) The expanding digital universe: a forecast of worldwide information growth through 2010. IDC, Massachusetts
Gero J, McNeill T (1998) An approach to the analysis of design protocols. Design Stud 19:21–61
Goel V (1995) Sketches of thought. MIT Press, Cambridge
Gorse CA, Emmitt S (2003) Investigating interpersonal communication during construction progress meetings: challenges and opportunities. Eng Construct Arch Manage 10:234–244
Gorse CA, Emmitt S (2007) Communication behaviour during management and design team meetings: a comparison of group interaction. Construct Manage Econ 25:1197–1213
Graveline A, Geisler C, Danchak M (2000) Teaming together apart: emergent patterns of media use in collaboration at a distance. Proceedings of IEEE professional communication society international professional communication conference and Proceedings of the 18th annual ACM international conference on Computer documentation: technology and teamwork. IEEE Educational Activities Department, Cambridge
Gray CF, Larson EW (2000) Project management: the managerial process, (1 Nov 2002), 2nd revised edition. McGraw-Hill Inc., USA, ISBN-10:0071213406
Haberberg A, Rieple A (2001) The strategic management of organisations. Financial Times/Prentice Hall, London
Hicks BJ, Culley SJ, Allen RD, Mullineux G (2002) A framework for the requirements of capturing, storing and reusing information and knowledge in engineering design. Int J Inf Manage 22(4):263–280. ISSN 0268-4012
Hicks BJ, Dong A, Palmer R, Mcalpine HC (2008) Organizing and managing personal electronic files: a mechanical engineer’s perspective. ACM Trans Inf Syst 26:1–40
Hicks BJ (2007) Lean information management: understanding and eliminating waste. Int J Inf Manage 27(4):233–249, May 2007. ISSN 0268-4012
Hiltz SR, Johnson K, Rabke AM (1980) The process of communication in face to face vs. computerized conferences: a controlled experiment using Bales Interaction Process Analysis. Proceedings of the 18th annual meeting on Association for Computational Linguistics. Association for Computational Linguistics Morristown, Philadelphia
Huet G (2006) Design transaction monitoring: understanding design reviews for extended knowledge capture. Department of Mechanical Engineering, University of Bath, UK
Jackson TW, Burgess A, Edwards J (2006) A simple approach to improving email communication. Commun ACM 49:107–109
Kim S (2002) User modelling for knowledge sharing in e-mail communication. Southampton
Kleinsmann M, Dong A (2007) Investigating the affective force on creating shared understanding. 19th international conference on design theory and methodology. ASME Press, New York
Koprinska I, Poon J, Clark J, Chan J (2007) Learning to classify e-mail. Inf Sci 177:2167–2187
Krippendorff K (1980) Content analysis: an introduction to its methodology. Sage Publications, Beverly Hills
Kurasaki KS (2000) Inter-coder reliability for validating conclusions drawn from open-ended interview data. Field Methods 12:179–194
Larson RR (2005) Information life cycle, a model of the social aspects of digital libraries. http://www.sims.berkeley.edu/courses/is202/f98/Lecture2/index.htm
Leuski A (2004) Email is a stage: discovering people roles from email archives. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, Sheffield
Li Z, Ramani K (2007) Ontology-based design information extraction and retrieval. Artif Intell Eng Design Anal Manuf AIEDAM 21:137–154
Lincoln YS, Guba EG (1985) Naturalistic inquiry. Sage Publications, New Delhi
Lindquist A, Berglund F, Johannesson H (2008) Supplier integration and communication strategies in collaborative platform development. Concurr Eng 16:23–35
Lombard M, Snyder-Duch J, Bracken CC (2002) Content analysis in mass communication: assessment and reporting of inter-coder reliability. Hum Commun Res 28:587–604
Lowe A (2002) Studies of information use by engineering designers and the development of strategies to aid in its classification and retrieval. Department of Mechanical Engineering. University of Bristol, UK
Lusk EJ (2006) Email: Its decision support systems inroads - an update. Decision Support Syst 42:328–332
Mackay WE (1988) More than just a communication system: Diversity in the use of electronic mail. Proceedings of the CSCW 89 Conference on Computer Supported Cooperative Work. ACM, Portland
Maher ML, Rosenman M, Merrick K (2007) Agents for multidisciplinary design in virtual worlds. AI EDAM 21:267–277
Marsden W (2002) Aerospace for materials: the quality and quantity of materials data generated and available within the aerospace industry is without parallel, because aerospace components operate under extreme conditions (information management). Adv Mater Process 160:37–39
Mcalpine H, Hicks BJ, Huet G, Culley SJ (2006) “An investigation into the use and content of the engineer’s logbook”. Design Stud, Springer 27(4):481–504, July 2006. ISSN 0142-694X
Mcmahon C, Lowe A, Culley S (2004) Knowledge management in engineering design: personalization and codification. J Eng Design 15:307–325
Medland AJ (1992) Forms of communications observed during the study of design activities in industry. J Eng Design 5:243–253
Mesihovic S, Malmqvist J, Pikosz P (2004) Product data management system-based support for engineering project management. J Eng Design 15:389–403
O’Kane P, Hargie O (2007) Intentional and unintentional consequences of substituting face-to-face interaction with e-mail: an employee-based perspective. Interact Comp 19:20–31
O’Kane P, Palmer M, Hargie O (2007) Workplace interactions and the polymorphic role of e-mail. Leadersh Organ Dev J 28:308–324
Pahl G, Beitz W (1996) Engineering design: a systematic approach. Springer, Berlin
Patton MQ (2002) Qualitative research and evaluation methods. Sage Publications, New Delhi
Puade OA, Wyeld TG (2007) Visualising collaboration: qualitative analysis of an email visualisation case study. Information Visualization, 2007. IV ’07. 11th International Conference. IEEE, Zurich
Reid FJM, Malinek V, Stott CJT, Evans J (1996) The messaging threshold in computer-mediated communication. Ergonomics 39:1017–1037
Renaud K, Ramsay J, Hair M (2006) “You’ve Got E-Mail!” Shall I Deal With It Now? Electronic mail from the recipient’s perspective. Int J Hum Comp Interact 21:313–332
Schmidt K, Wagner I (2004) Ordering systems: coordinative practices and artifacts in architectural design and planning. Comput Support Coop Work (CSCW) 13(5–6):349–408. doi:10.1007/s10606-004-5059-3, ISSN 0925-9724
Sim SK, Duffy AHB (2003) Towards an ontology of generic engineering design activities. Res Eng Design 14:200–223
Simon HA (1969) The sciences of the artificial. MIT Press, Cambridge
Smith NJ (2007) Engineering project management. Blackwell, Oxford
Stempfle J, Badke-Schaub P (2002) Thinking in design teams - an analysis of team communication. Design Stud 23:473–496
Teresko J (2008) Growing the PLM market - strong PLM growth will propel market to exceed $30 billion by 2011. Industry week, http://www.industryweek.com/
Treasury Board of Canada (2005) Framework for management of information: the information lifecycle. http://www.cio-dpi.gc.ca/
Tyler JR, Wilkinson DM, Huberman BA (2005) E-mail as spectroscopy: automated discovery of community structure within organizations. Inf Soc 21:133–141
Valkenburg R, Dorst K (1998) Reflective practice of design teams. Design Stud 19:249–271
Viegas FB, Golder S, Donath J (2006) Visualizing email content: portraying relationships from conversational histories. Association for Computing Machinery, Montreal, 10036-5701
Wattenberg M, Rohall SL, Gruen D, Kerr B (2005) E-Mail research: targeting the enterprise. Hum Comp Interact 20:139–162
Wengraf T (2001) Qualitative research interviewing: biographic narrative and semi-structured methods. Sage Publications, London
Whittaker S, Sidner C (1996) Email overload: exploring personal information management of email. The 1996 conference on human factors in computing systems
Wilson EV (2002) Email winners and losers. Commun ACM 45:121–126
Yang H, Callan J (2008) Ontology generation for large email collections. Proceedings of the 2008 international conference on Digital government research. Digital Government Society of North America, Montreal
