March 3, 2016

11:06:36am

WSPC/188-JIKM

1650004

ISSN: 0219-6492

FA1

Journal of Information & Knowledge Management
Vol. 15, No. 1 (2016) 1650004 (14 pages)
© World Scientific Publishing Co.
DOI: 10.1142/S0219649216500040

Supporting Competitive Intelligence at DuPont by Controlling Information Overload and Cutting Through the Noise

David P. Donohue* and Peter M. Murphy†
DuPont Experimental Station 320/186, 200 Powder Mill Road
P. O. Box 8352, DE 19803, Wilmington
*[email protected]
†[email protected]

Published 28 December 2015

Abstract. To overcome the problems of managing too much information and curating for the valuable content, DuPont's research, business, regulatory, manufacturing, legal, and marketing teams increasingly rely on the corporate library's competitive intelligence (CI) team to keep up with the latest Key Intelligence Topics (KITs) affecting their strategic goals and their decision-making processes. To meet the growing demand for CI news with constrained resources, the library CI team and the software application team designed and built CIntell, a user-friendly collection of technologies and services to harvest, store, curate, and publish secondary CI information. Using exclusively open source technologies (including Weka, Rome, MySQL, Solr), CIntell automatically harvests, filters, de-duplicates, tags, classifies, and stores public and subscribed secondary information in a structured database including news, research publications, patents, government reports, and web information. The CIntell web-based user interface facilitates searching, reviewing, organising, curating, and publishing CI news of interest to a project's owners. Implementation of CIntell has more than tripled the CI newsletter productivity of the library CI team and reduced the news clutter by more than half compared to using traditional alerting tools and sporadic DIY searching.

Keywords: Competitive intelligence; knowledge management; case study; information technology; open source software.

1. Introduction: DuPont's Competitive Intelligence Needs

For more than 200 years, DuPont has brought innovative science and engineering to the global marketplace through products, materials, and services in diverse industries including agriculture, automotive, apparel, building and construction, chemicals, electronics, energy, food and nutrition, government and public sector, health care and medical, marine, mining, packaging, printing, plastics, rail, safety and protection, and transportation. The competitive intelligence (CI) library team provides timely information, including marketing, business, regulatory, and legal news, patents, and scientific publications, to internal clients with diverse responsibilities so they can make effective decisions, whether at their desk or on the go. The following four personas exemplify the range of CI needs across the DuPont enterprise for harvesting, archiving, curating, and publishing timely CI information.

As the personas illustrate, the diverse demands on the DuPont CI library team have not permitted extensive analysis of the CI information or participation in many of the CI or decision-making processes for any team or business. The key value of the CI library team has been harvesting and curating vast amounts of information to deliver the few important, timely, critical, and necessary data needed to make good quality decisions throughout the company. In DuPont, CI responsibilities are often merely one of many responsibilities for a business analyst or a research scientist. By leveraging the skills of the CI library team, analysts and scientists are freed from searching for CI news and can instead focus on integrating that CI news into their team's workflow.

Alexis follows the textile industry, with particular attention to the safety and protective clothing markets for personal protective equipment (PPE) with DuPont's Kevlar®, Nomex®, and Tyvek® brands, as well as competitive brands in the PPE market. Safety standards for PPE clothing vary for firefighters, law enforcement, industrial workers, and other hazardous environments, requiring fibres and fabrics that are strong, comfortable, and provide unique features of cut resistance, heat shielding, chemical impermeability, and other protection. Alexis needs timely CI news on product introductions, customer specifications, industry standards, technical innovations, and incidents of injuries related to improper or inferior PPE. Alexis and the PPE CI team prefer to monitor the harvested and curated CI news with their RSS readers.

Dale supports a broad range of business goals and research projects by summarising energy issues in the Asia-Pacific region, especially sustainability, production, consumption, and economics.
Trends in government regulation and subsidies, NGO advocacy, energy generation, transportation, storage, and renewable sources affect investment decisions across the DuPont enterprise including solar power, wind energy, improving fuel efficiency, building materials, cellulosic ethanol, and other biofuels. Dale needs timely CI news that provides a current overview and a reliable forecast for the future of energy in China, Japan, Korea, Indonesia, India, Philippines, Australia, and other countries in the AP region. After reviewing the CI news each day, Dale publishes the critical information in a collaboration website for the team, which is read in many different languages.

Morgan monitors food safety issues, especially recalls, certified tests, and testing technology for E. coli, salmonella, and listeria bacterial contamination. Society's reliance upon large food manufacturing operations and the widespread food recalls due to contamination are increasing the need for fast, accurate pathogen testing across the global food industry. Morgan needs timely CI news on competitive threats in the food pathogen testing industry, especially the certification of testing methods, the implementation of government regulations, any announcements of food recalls due to bacterial contamination, and technical innovations in food safety testing. Morgan curates the CI news each week, organises the best 20 or so stories into a few categories, and publishes an email newsletter to the global food safety team.


Taylor analyses market trends for lightweight materials in the transportation industry. The demands to improve fuel efficiency, to increase payload capacity, to increase speed, and to reduce pollution are driving efforts to develop high-performance materials that significantly reduce structural weight without compromising durability and sustainability. Taylor needs CI news across the transportation industry including performance requirements, product innovations, government standards, and investments in manufacturing capabilities. Periodically, Taylor reviews and analyses the harvested CI information, then publishes a report summarising a few significant findings in preparation for critical decisions and strategic reviews.

2. Background Literature and Commercially Available Software

CI is critical to support strategic decision-making and planning, and to identify business opportunities and threats (Yap et al., 2013). Effective CI consists of the techniques and analyses for understanding the past, present, and future behaviours of competitors, suppliers, customers, governments, and other forces affecting the business climate (Prescott, 1995; Ettorre, 1995; Francis and Herring, 1999). Many reports document the positive business outcomes for effective CI programs. Secondary information (both free and fee-based) from the web is a critical component of a CI program (Abraham, 2003; Anica-Popa and Cucui, 2009). Paradoxically, increasing CI information harvesting can reduce knowledge, understanding, and effective decision-making by causing a shift in resource allocation from analysis and implementation to searching and curating (Morris and Edmunds, 2000). Information overload and nuisance stories continue to be a serious problem for CI professionals who use the web as a source of secondary CI information (Nordstrom and Pinkerton, 1999; Tu and Hsiang, 2000).
As web content, news publishing, and sources of CI information continue to escalate, the information overload problem will become even more acute. Valuable CI information is surrounded by useless and nuisance data, which has been characterised as information pollution or noise (Correia, 2013). Effective filtering or curating of the harvested CI information is increasingly important as the amount of available and harvested information proliferates. Desouza (2001) claimed that companies that effectively manage CI information without suffering from "information overload" are leaders in their industry. A longstanding CI goal is to get only the essential and important information to clients in a timely way (Foulds, 1992). Inefficient and ineffective CI news delivery systems share common flaws: wasting readers' time, missing important information, and failing to identify critical CI information in a timely manner.

Increasingly sophisticated web mining and text analytics are being integrated into CI programs both to broaden the scope of information harvested and to identify only the critical knowledge to support strategic decision-making. Cucui et al. (2010) concluded that "web mining techniques can be used to improve the quality of gathered data . . . from the web, in order to improve capabilities of business decision-making process and organisational performance." Sophisticated software, affordable data storage, and the vast amounts of information on the web facilitate the extraction, correlation, and integration of diverse knowledge from many sources into an operational and actionable knowledge base (Ponis and Christou, 2013). Information systems are an essential component of a corporate CI system, including advanced search engines, structured databases, and intelligent agents (Guimaraes and Armstrong, 1997). Morris and Edmunds (2000) concluded that combinations of information specialists and software are necessary to overcome information overload in a corporate environment. Also, Fowler and Hammell (2011) demonstrated that ". . . a hybrid intelligence/multi-agent system … which collates, compares and contrasts input from several traditional data mining applications . . . yields far more accurate results than any one application acting on its own." Fleisher (2007) concluded that the information specialist plays critical roles in managing secondary CI information, including finding, gathering, reviewing, curating, validating, analysing, and distributing CI to intelligence-seeking clients.

CIntell was first developed at DuPont as a basic, in-house news aggregator in 2005, combining simple, automated information gathering and human curation. In subsequent years, as the CIntell development team served more internal customers, a comprehensive list of requirements for an advanced CI system was compiled (Fig. 1). The CIntell development team used these criteria to inform a search strategy to identify commercially available software to meet the requirements. After a year-long investigation in 2010–2011, the team identified several commercial products that met the majority of the requirements.
In the course of piloting these products, it became clear that at least one of the key requirements could not be met by any available vendor at the time. Meanwhile, the CIntell development team had considerable in-house development expertise in building these systems. Further, we recognised that much future customisation would be required by customers. Thus, the team decided to build a CI information and knowledge management system rather than buy an existing software package.

• Clearly segregated project areas
• Secure, without the possibility of proprietary content being made available on the web
• Daily harvest of content from targeted sources
  o Public and subscribed content publishers
  o RSS
  o Non-RSS sources
  o Upload of spreadsheets
• Custom taxonomies
• Fully support all languages and character sets
• De-duplication
• User Interface for curating content
• Publishing to a variety of internal DuPont interfaces
  o email newsletter
  o RSS feeds
  o Atlassian Confluence
  o SalesForce.com
  o Microsoft SharePoint
  o Windows file share (as .docx files)
  o web pages (in an iframe)
• Future flexibility to extend the information tracked, to integrate with other DuPont interfaces or content sources, and to conduct data analysis and visualisation

Fig. 1. Functional requirements for next-generation CI service for DuPont.

3. CIntell Software Architecture

CIntell was written in Java. It was built on numerous open source Java products (Fig. 2). CIntell has two main software components. First, the CIntell Agent runs daily to retrieve, filter, tag, and store new content from target content sources. Second, the CIntell User Interface permits curators to search, organise, and publish the most relevant subset of that content.

CIntell Agent
• Rome RSS parser (Rome Feed Parser, 2014)
• Web-Harvest web scraper (Web-Harvest, 2010)
• MySQL Database (MySQL, 2014)
• Apache Commons libraries (Apache Commons, 2014; Apache Lucene and Solr Search Engine, 2014; Apache Tomcat, 2014)
• Lucene search and tagging
• Weka text classification

CIntell Interface
• Apache Tomcat web server (Apache Tomcat, 2014)
• JQuery javascript framework (JQuery Javascript Framework, 2014)
• Solr search engine

Fig. 2. Open source software used by CIntell.

As a service offering, CIntell is organised into distinct projects. Each project caters to a distinct DuPont team, focused on research, development, regulatory, legal, business, or marketing. Each project has a distinct set of tables in the CIntell MySQL database. These tables store the content harvested for that project. Each project within CIntell has its own distinct configuration, also stored in the MySQL database. This project configuration specifies which sources to pull information from, a taxonomy of terms that characterise the domain space, and other information relevant to where content will be published.

When the CIntell Agent runs, it works through its list of active projects. For each project, the CIntell Agent retrieves the queue of sources from the MySQL database. The CIntell Agent then follows the steps illustrated in Fig. 3.

(1) Harvest: The CIntell Agent contacts each source, retrieving the latest records from that source. For the majority of CIntell sources, the Rome RSS parser harvests semi-structured content from RSS or Atom feeds. The CIntell Agent also supports harvesting from unstructured, non-RSS sources. To harvest records from web search engines, it employs Web-Harvest. Web-Harvest is an open source product, which can extract structured information from web pages. Web-Harvest can convert an unstructured web page into a list of document objects, each with fields of data such as title, description, date of publication, etc. CIntell can also harvest records from sources which only provide content updates via email, which is another unstructured content source. To do this, CIntell employs JavaMail for programmatically parsing content out of an email inbox. For retrieving structured content from APIs (such as Salesforce.com), it makes Java API calls (Fig. 4). Regardless of the means of harvesting, each new article is inserted into the MySQL database, complete with key fields such as title, description, and URL link to the full text.

Fig. 3. Information flow for the CIntell Agent.




• public sources
• subscribed content sources
• news
• news aggregators
• scholarly (PubMed, etc.)
• scientific journals
• regulatory (government websites)
• patent databases

Fig. 4. Types of information CIntell can access.

(2) Filter: For the majority of the records, obtained from RSS or Atom feeds, the CIntell Agent filters out any records matching a pre-defined Boolean query term. The query syntax used by the filtering engine is identical to that used by the tagging engine, described below.

(3) De-duplicate: The CIntell Agent filters out any new records which are duplicates of existing records in the database. Since duplicate records might come from multiple sources, the CIntell Agent employs a de-duplication algorithm that looks at a simplified hash of the title, along with other record metadata such as date of publication. Various de-duplication strategies are possible. CIntell preserves near-duplicate stories since multiple versions of a significant news story or technical topic can provide greater overall insight.

(4) Tag: Another way that CIntell extracts structured information from its less-structured (textual) content sources is by tagging articles against a project-specific taxonomy. The CIntell Agent applies these tags to each record from the project taxonomy, based on key terms found in the record. The CIntell Agent uses Apache Lucene to test each tag from the project taxonomy against each new record. This tagging engine uses the same search syntax as the Solr search engine within the CIntell Interface. Consequently, CIntell users may devise and test new query terms for tagging or filtering by searching for those terms within the CIntell Interface.

(5) Summarise: The CIntell Agent uses Essential Summarizer (2014), a proprietary tool for auto-generating shorter summaries from larger sections of text. Each CIntell project can set how many lines of summary to generate for each record it harvests. This step uses advanced text mining techniques to pull key lines out of the full text of the article.

(6) Classify: The CIntell Agent uses its Weka-based text classification model (Hall et al., 2009), plus any boosting rules stored in its taxonomy, to make an automatic determination as to whether the record is Relevant or Trash. To employ this auto-classifier, the CIntell user must first manually define a suitably large example set of approximately 200 records in each of the two designations (Relevant or Trash). Each project may use the auto-classifier, which uses the SGD (stochastic gradient descent) text classification algorithm, to build a model based on the manually classified examples. During harvesting, this text classification model automatically classifies each new record as Relevant or Trash (or neither). The CIntell Interface makes it clear which records have been auto-classified in this way versus those for which a human has made the Relevant/Trash classification. Classification is a critical feature in CIntell to streamline curation prior to publication.

(7) Store: The CIntell Agent stores new records in the MySQL database, and indexes those records into the Solr index.
Each project stores its records separately since different teams may harvest the same document, yet reach different conclusions about the record's importance to their individual goals.
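
Steps (2) and (3) above can be illustrated with a short sketch. This is not CIntell's code (CIntell is written in Java and filters with Lucene-style Boolean queries); it is a minimal Python illustration that assumes plain substring terms for filtering and assumes the de-duplication key is a hash of the lower-cased, punctuation-free title combined with the publication date. The names `Record`, `simplified_title`, `dedup_key`, `filter_out`, and `de_duplicate` are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """A harvested article with the key fields named in step (1)."""
    title: str
    description: str
    published: date
    url: str


def simplified_title(title: str) -> str:
    """Lower-case the title and drop punctuation and extra whitespace."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def dedup_key(record: Record) -> str:
    """Assumed key: hash of the simplified title plus the publication date."""
    raw = f"{simplified_title(record.title)}|{record.published.isoformat()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def filter_out(records, blocked_terms):
    """Drop records whose title or description contains any blocked term.

    Simplification: substring terms stand in for a Boolean query.
    """
    blocked = [term.lower() for term in blocked_terms]
    kept = []
    for record in records:
        text = f"{record.title} {record.description}".lower()
        if not any(term in text for term in blocked):
            kept.append(record)
    return kept


def de_duplicate(records, seen_keys):
    """Keep records whose key is new; seen_keys is updated in place."""
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen_keys:
            seen_keys.add(key)
            unique.append(record)
    return unique
```

Under these assumptions, two copies of a story that differ only in capitalisation or punctuation and share a publication date collapse to one record, while a follow-up story with a different title or date is kept, which is consistent with the stated goal of preserving near-duplicates. In practice `seen_keys` would be loaded from the project's stored records rather than held in memory.
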

4. News to Knowledge: CIntell Information Roles, Responsibilities, and Workflows

The central administrative role for any CIntell project is its Curator. Figure 5 provides an overview of the workflow performed by the Curator role. The Curator role is responsible for configuring a CIntell project, and for its day-to-day operation. First, the Curator sets the CIntell project up with the optimal set of sources, taxonomy terms, and publishing strategy, to best meet the strategic needs of the project team. Once a CIntell project is up and running, the Curator periodically reviews content that the CIntell Agent has harvested, manually classifies each record as Relevant or Trash, and updates the harvest and tagging strategies. Depending on the circumstances and the project requirements, the responsibilities of the Curator role can be shared in different ways between the complementary talents of the library CI team and the project owners. Generally, the library CI team has more expertise in information science, and the project owners have more subject matter expertise and credibility within their team.

1. Interface with the DuPont project team, to determine their CI information needs.
2. Configure searches and content feeds, which will harvest CI information to monitor.
3. Add taxonomy of concepts of relevance to the project, with rules for automatically tagging records with these concepts.
4. Contact CIntell software team to launch CIntell Agent.
5. Import any CI records from other databases.
6. Confirm that CIntell Agent is running properly.
7. Classify records as Relevant or Trash.
8. Review records that are harvested, marking these as Featured, Relevant or Trash. Add or remove tags as necessary.
9. Revise taxonomy and harvesting strategy, to best meet the needs of the CI team.
10. Publish selected records as email newsletter, RSS Feed, or to the team's collaboration space.

Fig. 5. Workflow in setting up and running a CIntell project.

When setting up a new CIntell project, members of the CIntell team, the library CI team, and the project owners discuss the goals of the CIntell project including the competitors, the customers, the industries, any brand names, trade associations, and government agencies of interest (Fig. 5(1)). The scope of the CIntell project may include technical, patent, marketing, business, and regulatory information. Often the project owners have some simple alerts and DIY searching that they are already using to monitor CI news. These sources can be integrated into the new CIntell project, whether from publicly available or fee-based, subscribed sources of information.

Since RSS feeds are a common way of publishing content, the Curator will search for the best CI news from the best sources, and save each search strategy as an RSS feed. For each feed, the URL for that RSS feed is added to a table of sources for the new CIntell project (Fig. 5(2)). Filtering rules can be added to each RSS feed, which can "filter in" or "filter out" records based on keywords found or not found in the article (Fig. 3(2)). CIntell's filtering capability allows targeted harvesting from general information RSS feeds, e.g. public blogs, government agencies, and trade association publications.

RSS feeds are not available for every important source of CI news. CIntell's harvesting capability does allow for other ways of gathering information (Fig. 3). Some sources of CI information are still only available as email alerts or by directly viewing a webpage. Project owners and CI librarians gather the list of these email alerts and webpages when setting up their new CIntell project. CIntell has the capability to parse email alerts into individual news stories. Another harvesting feature of CIntell is to use a collection of key search terms and APIs to automatically conduct multiple searches in many different databases. Setting up a new CIntell project involves determining which searches and key search terms should complement the RSS feed harvesting.

Often CIntell project owners have already gathered some records through other searches. These records may be in a bibliographic database (e.g. EndNote) or in spreadsheets on their computers. CIntell allows importing of a Microsoft Excel spreadsheet for a collection of records, which is a valuable way for project owners to transition from a scattered collection of unstructured and unorganised records to a common, searchable, structured, collaborative CIntell database (Fig. 5(5)).

An important feature of CIntell is the tagging of CI records to organise the database for easy recall and publishing (Fig. 5(3)). The search strategy and the taxonomy for a project will have much in common, though record tags are often poor search terms. A hierarchy of tags allows records to be grouped by competitor, customer, technology, etc. For each tag, one or more keywords are configured in a Boolean logic query. For each harvested record, CIntell searches for the query in the record. When the query is found, the tag is automatically applied to the record (Fig. 3(4)). Tags also facilitate automatic record classification as Relevant or Trash, which facilitates curation prior to publishing.

When the CIntell project is fully configured with its RSS feeds, search terms and sources, email alerts, taxonomy, and associated filtering and tagging rules, the CIntell Agent is turned on (Fig. 5(6)). The CIntell Agent activates at a specified interval, usually daily.
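
The tagging rule described above, one Boolean keyword query per tag in a hierarchical taxonomy, can be sketched as follows. This is an illustrative simplification in Python, not CIntell's Lucene-based engine: each query is assumed to be expressed as three term lists (OR, AND, NOT) matched by case-insensitive substring, and the tag hierarchy is assumed to be encoded as a slash-separated path. The names `TagRule`, `apply_tags`, and `group_by_top_level`, and the example tag paths, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TagRule:
    """One taxonomy tag with a simplified Boolean keyword query."""
    path: str                                    # e.g. "Technology/Pathogen testing"
    any_of: list = field(default_factory=list)   # OR terms: at least one must appear
    all_of: list = field(default_factory=list)   # AND terms: every one must appear
    none_of: list = field(default_factory=list)  # NOT terms: none may appear

    def matches(self, text: str) -> bool:
        lowered = text.lower()
        if any(term.lower() in lowered for term in self.none_of):
            return False
        if not all(term.lower() in lowered for term in self.all_of):
            return False
        # An empty OR list places no further constraint on the record.
        return not self.any_of or any(term.lower() in lowered for term in self.any_of)


def apply_tags(text, rules):
    """Return the tag paths whose query matches the record text."""
    return [rule.path for rule in rules if rule.matches(text)]


def group_by_top_level(tag_paths):
    """Group tag paths by their top-level category (competitor, technology, ...)."""
    groups = {}
    for path in tag_paths:
        top = path.split("/", 1)[0]
        groups.setdefault(top, []).append(path)
    return groups
```

Substring matching is a deliberate shortcut here (the term "test" would also match "contest"); a real query engine tokenises the text first. Because the same rule object could be evaluated against any text, the sketch also mirrors the point made earlier that a tagging query can be tried out as an ordinary search before it is added to the taxonomy.
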
The Curator's role, which is shared between the CI librarians and the project owners, is to monitor what is harvested and to assign a relative importance to the records harvested: Featured, Relevant, or Trash (Fig. 5(7,8)). As the Curator reviews the harvested records, she can adjust the harvest strategy, the collection of sources, the RSS feeds, and the harvesting filters to automatically limit the nuisance records, reduce the information overload, and target valuable CI information as the project's and team's goals evolve (Fig. 5(9)). Interviews with experienced CIntell users have consistently indicated that a refined harvesting strategy reduces useless information by at least half, and improves the systematic gathering of critical CI information.

The Curator uses the CIntell User Interface to oversee the day-to-day operations of a CIntell project. Figure 6 illustrates some of the features of this interface. The CIntell User Interface is an AJAX-powered web application that supports advanced search for retrieving records based on search terms, the project taxonomy (tags), dates, and CIntell-specific designations like Relevant/Trash or publication history. More importantly, the CIntell Interface enables the Curator to annotate records within the project database, and route them to end users by numerous means. For example, the Curator can add or remove tags from the project taxonomy, classify a record as Relevant or Trash, generate email newsletters, or publish selected subsets of content to a variety of DuPont content repositories. The Curator can perform a number of bulk actions against a large subset of records, including deletion, tagging, classifying, publishing, and capturing machine-generated summaries.

Fig. 6. The CIntell user interface.

After the Curator has classified several hundred records as Relevant and as Trash, she can opt to utilise the auto-classifier (Fig. 5(7)). The CIntell Agent will then use the Weka machine learning engine and the SGD text classification algorithm to build a classification model based on the Relevant and Trash example sets created by the Curator. The CIntell Agent uses this model to classify each new record as Relevant or Trash, or neither. In the CIntell Interface, an icon appears beside those records which were manually classified, distinguishing them from those which were auto-classified. While the Curator can override any auto-classification, this CIntell feature has proven to be a valuable time-saver for CI professionals who routinely only have time to read the most important CI information, but who also want an archive of CIntell information for their business or R&D team that they can search as new KITs emerge and when strategic CI reviews are conducted.

CIntell offers multiple means of publishing, so CI information may be delivered in the form most useful to its intended audience (Fig. 1). The Curator may publish a selection of records based on key terms, taxonomy tags, relevance, and other criteria. The Curator can publish content manually, or can set email newsletters to be published automatically, at the desired interval and day of week. The most popular publishing feature is the email newsletter. Each CIntell email newsletter is configured (i) to contain a subset of content harvested from each record, (ii) for an email recipient list, and (iii) to publish either automatically (daily, weekly, monthly) or manually. The typical CIntell newsletter begins with just the CI news headlines (with an embedded URL to the complete CI news story) arranged in logical groups selected for each readership: customers, competitors, government and NGO activity, technology, etc. The CIntell newsletter then repeats the same stories in the same groups with a few lines of summary for each CI news story. The response to the CIntell newsletter layout has been universally positive from the CI community and other newsletter recipients throughout DuPont. By receiving the news in logical groups and in increasing levels of detail, readers can quickly scan all their CI news, then focus on the particular stories that catch their interest.

After the records have been curated, CIntell automatically summarises, formats, organises, and publishes CI email newsletters (Fig. 5(10)). Before CIntell, the library CI team collectively spent about 136 h manually harvesting, curating, and publishing 18 newsletters each week. Improved training, implementing best practices, and utilising CIntell have resulted in the library CI team collectively spending about 40 h harvesting, curating, and publishing over 60 CI newsletters and alerts each week with a combined readership of over 1800 (Murphy, 2014). While many of these CIntell newsletters are fully automated alerts, the manually curated CI newsletters benefited from the automation of harvesting, classifying, and publishing. Manually publishing newsletters required an average of 32.8 min per newsletter to perform the duties of curating, collating, formatting, and publishing email newsletters.
Using CIntell, the average time per newsletter fell from 32.8 min to 6.7 min, an 80% reduction in the overall time spent delivering CI newsletters. Since email newsletters vary in length and complexity, a Six Sigma project quantified the improvement in just collating, formatting, and publishing email newsletters on a per-story basis. Manual collating, formatting, and publishing averaged 1.20 min per story (28 newsletters, 786 stories, std. dev. = 0.29 min). With CIntell, the same tasks averaged only 0.33 min per story (12 newsletters, 279 stories, std. dev. = 0.15 min), a 73% time savings (p < 0.005). Customer feedback has been overwhelmingly positive on the quality, content, and style of the enhanced CI news delivery (Murphy, 2014).

In our experience, regular interviews with CI newsletter and project owners are a much more valuable way than surveys alone to monitor our internal clients' satisfaction with the library team's CI news delivery. A 10–15 min call a few times a year allows newsletter and CIntell project owners to feel responsible for the quality, content, style, and scope of their CI newsletter. These interviews also allow the library CI team to stay alert to the important KITs and surprises that are critical to the CI newsletter readerships. The general consensus across the few dozen CI newsletter owners
interviewed is that CIntell harvesting and CI newsletter delivery have reduced information overload and nuisance CI news by at least half.

5. Conclusions

DuPont's software applications team and library CI team designed and built CIntell, a user-friendly collection of technologies and services to harvest, store, classify, curate, and publish secondary CI information. CIntell has allowed research, business, regulatory, manufacturing, legal, and marketing teams to manage information overload and to reduce nuisance information by at least half while more systematically gathering valuable CI news from assorted free and subscribed sources. CIntell has more than tripled the productivity of the library CI team for publication of curated CI news, and provides a variety of CI news delivery options. CIntell effectively combines open source technologies with custom-designed features, which has allowed it to meet DuPont's diverse CI information needs without compromising budget constraints, business requirements, or security, resulting in more effective decision-making across the DuPont enterprise.

Acknowledgments

The authors wish to acknowledge the innovative work of their colleagues at DuPont, who were instrumental in designing and building the CIntell service: Anupam Bhattacharya, Mark Cornthwaite, Nancy Linwood Lewis, Boyd Reed, Gillian Reynolds, Amarendra Kumar Thakur, and Myrna Thomas.

References

Abraham, A (2003). Business intelligence from web usage mining. Journal of Information & Knowledge Management, 2(4), 375–390.
Anica-Popa, I and G Cucui (2009). A framework for enhancing competitive intelligence capabilities using decision support system based on web mining techniques. International Journal of Computers, Communications & Control, 4, 326–334.
Apache Commons (2014). Available at http://commons.apache.org/. Accessed on 9 April 2014.
Apache Lucene and Solr Search Engine (2014). Available at http://lucene.apache.org/. Accessed on 28 April 2014.
Apache Tomcat (2014). Available at http://tomcat.apache.org/. Accessed on 8 May 2014.
Correia, CC (2013). Are your intelligence efforts information overloaded? Competitive Intelligence Magazine, 16(1), 37–42.
Cucui, G, I Cucui and I Anica-Popa (2010). Using web mining technologies to improve competitive intelligence capabilities: A historical perspective. Transformations in Business & Economics, 9(19), 461–471.
Desouza, KC (2001). Intelligent agents for competitive intelligence: Survey of applications. Competitive Intelligence Review, 12(4), 57–63.
Essential Summarizer (2014). Available at https://essential-mining.com/summarizer/index.jsp?ui.lang=en. Accessed on 14 May 2014.
Ettorre, B (1995). Managing competitive intelligence. Management Review, 84(10), 15–19.
Fleisher, CS (2007). Using open source data in developing competitive and marketing intelligence. European Journal of Marketing, 42(7/8), 852–866.
Foulds, S (1992). Electronic distribution of business intelligence. Competitive Intelligence Review, 3, 79–81.
Fowler, CA and RJ Hammell II (2011). A hybrid intelligence/multi-agent system approach for mining information assurance data. In 2011 Ninth International Conference on Software Engineering Research, Management and Applications. Washington, DC, USA: IEEE Computer Society.
Francis, DB and JP Herring (1999). Key intelligence topics: A window on the corporate competitive psyche. Competitive Intelligence Review, 10(4), 10–19.
Guimaraes, T and C Armstrong (1997). Exploring the relations between competitive intelligence, IS support, and business change. Competitive Intelligence Review, 9(3), 45–54.
Hall, M, E Frank, G Holmes, B Pfahringer, P Reutemann and I Witten (2009). The WEKA data mining software: An update. SIGKDD Explorations, 11(1), 10–18.
JQuery Javascript Framework (2014). Available at http://jquery.com/. Accessed on 12 May 2014.
Morris, A and A Edmunds (2000). The problem of information overload in business organisations: A review of the literature. International Journal of Information Management, 20, 17–28.
Murphy, PM (2014). Dealing news to support competitive intelligence. Online Searcher, 38(4), 10–15.
MySQL (2014). Available at http://www.mysql.com/. Accessed on 12 May 2014.
Nordstrom, RD and RL Pinkerton (1999). Taking advantage of internet sources to build a competitive intelligence system. Competitive Intelligence Review, 10(1), 54–61.
Ponis, ST and IT Christou (2013). Competitive intelligence for SMEs: A web-based decision support system. International Journal of Business Information Systems, 12(3), 243–258.
Prescott, JE (1995). The evolution of competitive intelligence. International Review of Strategic Management, 6, 71–90.
Rome Feed Parser (2014). Available at http://rometools.github.io/rome/. Accessed on 18 April 2014.
Tu, HC and J Hsiang (2000). An architecture and category knowledge for intelligent information retrieval agents. Decision Support Systems, 28, 255–268.
Web-Harvest (2010). Available at http://web-harvest.sourceforge.net/. Accessed on 17 February 2010.
Yap, CS, MZA Rashid and DA Sapuan (2013). Strategic uncertainty and firm performance: The mediating role of competitive intelligence practices. Journal of Information & Knowledge Management, 12(4), 1–14.