Available online at www.sciencedirect.com
ScienceDirect Procedia Manufacturing 3 (2015) 4115 – 4120
6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the Affiliated Conferences, AHFE 2015
A context-aware decision support tool for assessing and mitigating drivers of civil instability

Ryan S. Mullins, Adam Fouse, Robert McCormack, Stacy Lovell Pfautz

Aptima, Inc., 12 Gill St Suite 1400, Woburn, MA, USA 02140
Abstract

Military planners and decision-makers face a number of challenges with the shift towards operating within diverse, multidimensional, and unconventional environments. Leaders require a deeper understanding of the broader social and civil context in which operations occur, including the underlying factors that contribute to instability and the drivers of conflict. This understanding is often derived through the analysis of textual data. Both traditional and non-traditional sources – such as news articles, blog entries, and tweets – represent a vast amount of data that can be brought to bear on problems ranging from measuring the progress of missions to forecasting important changes in the environment. The volume and velocity of this data requires processing tools that can help users understand the concepts and events being discussed. Additionally, these data are inherently ambiguous, and the automated processing techniques necessary for aggregating and analyzing data may introduce further uncertainty. The validity and veracity of data, sources, assumptions, and conclusions must be carefully considered prior to action. Planners and decision-makers require new, data-driven tools that aid in selecting the best course of action through interactive exploration and assessment processes.

In this paper, we describe an ongoing research and development effort to create a context-driven, web-based tool that aids planning and decision-making by providing a more comprehensive understanding of the civil component of operational environments. This tool allows users to rapidly find, organize, and assess complex data across multiple phases of civil information management. This includes: (1) researching and assessing civil vulnerabilities; (2) developing plans to address identified vulnerabilities; and (3) tracking ongoing trends and progress towards goals and objectives. Our tool assists the user in each phase by collecting, processing, and recommending data and analyses that are contextually relevant to their task. By offloading data collection, supporting data organization, and providing personalized recommendations, our tool allows users to focus their efforts on verifying, interpreting, and assessing the information needed to recommend the best course of action.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of AHFE Conference.
Keywords: Machine-supported reasoning; Cross-cultural decision making; Context-aware systems; Topic modeling; Knowledge management; Civil affairs
2351-9789 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of AHFE Conference. doi:10.1016/j.promfg.2015.07.984
1. Introduction

The modern operating environment is characterized by increased uncertainty as the global community faces challenges such as fulfilling the basic needs of populations (e.g., access to food and water), friction within urban megacities, and evolving threats from non-state actors. These dynamic environments present a high potential for large-scale social unrest, disruption, and disorder. Decision-makers need to understand how multiple, interconnected dimensions – including social and civil components – combine to drive instability and conflict.

Commanders and decision-makers derive their understanding of these components partly from information provided by Civil Affairs units. Civil Affairs (CA) is the practice of using military elements to provide civil and humanitarian services to a nation during peacetime, crises, or emergencies [1]. In the course of Civil Affairs Operations and Civil-Military Operations, CA teams (CATs) engage local, provincial, and national communities to assess and affect public opinion and welfare. Their goal is to promote stability and improve the quality of life in these communities and regions by reducing their vulnerabilities to drivers of civil unrest. To effectively engage these communities, CATs must understand the context in which they will be operating. Often, this contextual understanding is synthesized from written reports of in-situ observations as well as open-source information, such as news and social media. This synthesized knowledge is captured in CA Area Studies pre- and post-deployment [2]. This process of collecting, mining, processing, analyzing, maintaining, and delivering civil information and analysis products is known as Civil Information Management (CIM).

CA forces face several challenges related to CIM and developing their understanding of the civil component of the operating environment. For example, CATs often operate in remote, isolated, or unstable nations, where up-to-date information is sparse and unreliable. The most reliable information available is found in engagement reports written during prior deployments, which are often stored as PowerPoint slides and PDFs in emails, where they remain largely inaccessible. Information gaps frequently occur between deployments, leading to a loss of situation awareness that must be supplemented with information from news and social media outlets. Because the amount of available data grows at a faster pace each day, CIM can quickly lead to information overload. Without a comprehensive “civil picture”, it becomes increasingly difficult to understand the various underlying factors that contribute to instability, or to determine whether the actions CA forces are taking to address those vulnerabilities have a positive impact.

To address these challenges, CA soldiers require interactive, intelligent tools that help them quickly and easily access the information they need. Aptima is currently developing Visualizations for Integrating, Communicating, and Tracking Reasoning Electronically (VICTRE) – a web application for capturing, processing, and recommending contextually relevant information to CA soldiers. This paper presents an overview of the techniques employed by VICTRE to support more efficient and thorough CIM processes.

2. Context-aware systems

The tools needed to support CA soldiers and operations require machines to understand the human context in which information is used. Defining context is a difficult task, with many possible interpretations based on domain-specific needs [3-5].
In this research, we define context as the explicit and implicit information about the relationships and interactions between entities and the environment that may impact interpretation or decisions [6]. Machines require explicit functionality in three key areas to understand and make use of human context: (1) modeling and reasoning about entities and topics in information; (2) capturing and modeling user interactions with information; and (3) semantically representing knowledge in a meaningful way for both human and machine agents.

2.1. Topic modeling

The amount and variety of data that CA soldiers face when conducting area studies can be staggering. Advanced text processing methods can help reduce this burden by providing a quantitative framework in which information can be synthesized from the corpus of documents. To this end, we employ Latent Dirichlet Allocation (LDA), which creates a statistical model of topics referenced in the text [7]. LDA improves on previous topic extraction methods, such as Latent Semantic Analysis (LSA) [8] and Probabilistic Latent Semantic Analysis (PLSA) [9], through the use of a generative model that results in more reasonable and human-readable topics. Topics in this context are formalized as probabilistic distributions over words and features found in the documents. In practice, topics provide a gist of a document’s contents. For example, one might state that a document is about protests and elections. The LDA model provides a likelihood that a topic is mentioned in a document, which enables searching, filtering, and sorting of documents based on shared content. In addition, topics can be used within other models to identify and track temporal trends – by relating likelihood values to the publication dates associated with source documents – and spatial trends – by rectifying topics representing named places with spatial databases of such places, like GeoNames. These relationship-enriched topics provide some insight into the opinions, attitudes, and priorities of a population.
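For illustration, the sketch below shows how a small document collection could be modeled with LDA to obtain per-topic word distributions and per-document topic likelihoods. The use of the open-source gensim library, the toy corpus, and the parameter values are assumptions made for this example; the paper does not prescribe a particular LDA implementation.

```python
# Illustrative sketch only: gensim, the toy corpus, and the parameters are
# assumptions for this example, not details taken from VICTRE or the paper.
from gensim import corpora, models

# Toy documents standing in for engagement reports, news articles, and tweets.
documents = [
    "protest crowd gathered outside the election commission office",
    "local council announced a new water distribution schedule",
    "election results delayed amid protest in the capital",
    "aid convoy delivered food and water to the northern district",
]

# Tokenize, build the vocabulary, and convert documents to bag-of-words vectors.
tokenized = [doc.lower().split() for doc in documents]
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(tokens) for tokens in tokenized]

# Fit a small LDA model; num_topics is a modeling choice, not a value from the paper.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)

# Each topic is a probability distribution over words ...
for topic_id, words in lda.show_topics(num_topics=2, num_words=5, formatted=False):
    print(topic_id, [(word, round(prob, 3)) for word, prob in words])

# ... and each document receives a likelihood for each topic, which supports
# searching, filtering, and sorting documents by shared content.
print(lda.get_document_topics(corpus[0]))
```

The per-document likelihoods produced in the final step are the values that can be related to publication dates, or to gazetteer entries such as those in GeoNames, in order to track temporal and spatial trends.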
2.2. User activity and recommendations

User activity data allows machines to build a model of users’ behaviors, and to infer how the user does work in various contexts [10]. Such data have been extracted for a variety of purposes in the literature. The modeling of user activity data as it relates to usability [11] and intent inference [12] is of particular interest to this research. The primary data collected to create these models are time-stamped interaction data, for example a mouse click on a web page. Mouse click and movement data on their own can identify which portions of the interface are being used at a given time, thereby showing where users are doing their work with the application. However, these data lack the necessary relationships to understand when and how information will be useful in the context of that work. Recommendation systems merge interaction data with references to the information being used during those interactions [13, 14]. Workflow models provide a statistical representation of the information needs of an individual over time given some tasking [15]. These models are fed both the available information and interaction data, which allows the machine to understand when, where, and how information could be utilized. Information is recommended to the user when the user enters a state flagged by the models.
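To make the notion of time-stamped interaction data concrete, the sketch below logs simple interface events and surfaces the most frequently used information items. The event fields and the frequency-based heuristic are illustrative assumptions; they stand in for, and greatly simplify, the workflow models described above.

```python
# Illustrative sketch: a minimal time-stamped interaction log and a naive,
# frequency-based recommendation heuristic. The event fields and the heuristic
# are assumptions for illustration, not the models used by VICTRE.
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class InteractionEvent:
    timestamp: datetime   # when the interaction occurred
    ui_component: str     # which part of the interface was used
    item_id: str          # the information item involved (report, article, tweet)
    action: str           # e.g. "click", "annotate", "tag"


def log_event(log: list, ui_component: str, item_id: str, action: str) -> None:
    """Append a time-stamped interaction event to the activity log."""
    log.append(InteractionEvent(datetime.now(timezone.utc), ui_component, item_id, action))


def recommend(log: list, top_n: int = 3) -> list:
    """Return the most frequently used items as a stand-in for model-driven recommendations."""
    counts = Counter(event.item_id for event in log)
    return [item for item, _ in counts.most_common(top_n)]


activity_log: list = []
log_event(activity_log, "area-study-editor", "report-017", "annotate")
log_event(activity_log, "map-view", "geo-feature-42", "click")
log_event(activity_log, "area-study-editor", "report-017", "tag")
print(recommend(activity_log))
```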
2.3. Semantic knowledge representation

Ongoing interest in the Semantic Web and rapid investment in the Internet of Things have spurred research on computational representations of semantic knowledge. The World Wide Web Consortium (W3C) publishes data formats such as the Resource Description Framework (RDF), the Web Ontology Language (OWL), and JavaScript Object Notation for Linked Data (JSON-LD) that can be used to describe concepts and the relationships between those concepts. Semantic data is often modeled as statements about entities, events, or concepts and their relationships in the form of subject-predicate-object expressions, known as triples [16]. These triples are stored within a database known as a triplestore. While RDF is logically a graph, triplestores tend not to store data internally as a graph, which makes it more difficult to query efficiently. Researchers are investigating the use of new storage and representation formats, such as NoSQL databases, that can help deal with the volume, variety, velocity, and veracity of RDF data [17]. Theoretically, the key strength of ontology-based models is the ability to use semantics to dynamically discover hidden relationships between entities and documents.

However, when the data are drawn from disparate documents and datastores, these relationships can be lost due to spelling variations, different entities sharing the same name, and a lack of communication among the technologies that produce the RDF from the raw documents. This challenge, termed entity resolution [18], arises in the absence of the Unique Name Assumption, when each name does not have an unambiguous mapping to a unique entity. While the ontology’s relational structure attempts to reduce the user’s manual process of resolving entity mentions across RDF statements, users are still burdened with identifying gaps in knowledge that form when new, possibly inconsistent, statements are added about existing entities and relationships.
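A brief sketch of the triple representation described above follows. It uses the open-source rdflib library with hypothetical URIs and predicates; neither is drawn from VICTRE's implementation.

```python
# Illustrative sketch: subject-predicate-object triples with the open-source
# rdflib library. The namespace, URIs, and predicate names are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/civil/")

g = Graph()

# "The provincial council announced a water project in the northern district."
council = EX["ProvincialCouncil"]
project = EX["WaterProject"]
g.add((council, RDF.type, EX.Organization))
g.add((project, RDF.type, EX.Infrastructure))
g.add((council, EX.announces, project))
g.add((project, RDFS.label, Literal("Northern district water project")))

# Without the Unique Name Assumption, a differently spelled mention may or may
# not refer to the same real-world entity; reconciling such mentions is the
# entity resolution problem discussed above.
council_alt = EX["Provincial_Council"]
g.add((council_alt, RDFS.label, Literal("provincial council")))

# Query the graph for every announcement relationship.
for subject, obj in g.subject_objects(EX.announces):
    print(subject, "announces", obj)
```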
3. The VICTRE tool

VICTRE is a tool designed to support CA forces as they plan and reflect on their operations. These activities revolve around CIM, which includes tasks such as creating and briefing CA Area Studies [2]. These studies provide a structured view into an area and the current state of the social and civil infrastructure. CA soldiers gather information related to each of the areas in this taxonomy, develop courses of action (COAs) to affect some subset of these areas, and brief these COAs pre- and post-deployment. Each COA is designed to engage the community in a way that reduces its susceptibility to civil vulnerabilities. The information needs and complexities of the Area Study and COAs require significant effort and time from CA soldiers. VICTRE aims to reduce the required effort through a mix of automation, visualization, and human-centered interaction.

3.1. Natural language processing and topic modeling

Arguably the most difficult task for soldiers creating Area Studies is finding and processing information relevant to the operation. VICTRE uses a natural language processing (NLP) pipeline to automate the extraction of key information from large volumes of textual information. The primary sources of information used by CA soldiers are reports from prior engagements and deployments and web-based research. These data provide insights into the overarching themes and trends related to civil vulnerabilities in the region from official sources. Social media data is another data source that is seeing increased use by CA soldiers, specifically because it provides a view into the unofficial, unfiltered opinions of the populace. The VICTRE NLP pipeline is capable of processing these three types of information and performing both entity extraction and topic modeling. Entity extraction is used to pull out references to people, places, and concepts from the text. These entities form the basis for linking raw data. Our topic modeling system uses LDA to extract and model the topics discussed in all three data types. The topic-modeling capability updates the models at regular intervals to capture changes in conversational focus over time. The LDA-generated models are used by the NLP pipeline to assist the entity extraction tools. New source materials are run through these models to identify the topics of discussion therein. Identified topics are used to support the entity extraction, enabling some reconciliation of ambiguous entities based on the similarity of topics in the source data.
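The entity extraction step can be pictured with the short sketch below, which uses the open-source spaCy library and its small English model. Both are assumptions chosen for illustration; the paper does not identify the specific extraction tools in VICTRE's pipeline.

```python
# Illustrative sketch: extracting people, places, and organizations from text
# with the open-source spaCy library. spaCy and the en_core_web_sm model are
# assumptions for illustration, not tools named in the paper.
import spacy

nlp = spacy.load("en_core_web_sm")

text = (
    "The United Nations delivered food and water to Kandahar after the "
    "provincial council requested assistance."
)

doc = nlp(text)

# Named entities become candidate nodes for linking raw data together;
# topic similarity can later help reconcile ambiguous mentions.
for ent in doc.ents:
    print(ent.text, ent.label_)
```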
3.2. Semantic knowledge graph

VICTRE represents extracted information in a semantic knowledge graph. This graph is a central repository of information available to the CA soldiers, including raw data (e.g., prior Area Studies, news articles, tweets), the data extracted by our NLP pipeline, and the data generated by soldiers using the tool (e.g., annotations, tags). A multi-attributed property graph links these data together. In this type of graph, relationships linking entities can be attributed with properties in the same way that entities are. Attributed relationships allow the graph to explicitly represent the nature of relationships, which would otherwise be misattributed to or derived from the properties of entities in other systems. This more naturalistic representation allows VICTRE to better represent and understand the context in which these data reference and relate humans.

The VICTRE semantic knowledge graph can be deployed on several open-source persistent storage layers, including Apache HBase and Cassandra. On top of this persistence layer sits the Titan graph database, a distributed graph database implementation that enables the analytics VICTRE and CA soldiers require. The TinkerPop graph-computing framework provides the interface for accessing and manipulating data in the Titan database. This stack enables efficient scaling in response to data volume or user activity.
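As a rough illustration of how an attributed relationship might be written and read through the TinkerPop interface, the sketch below uses the gremlinpython driver against an assumed Gremlin Server endpoint. The endpoint, labels, and property names are hypothetical and are not taken from VICTRE's schema.

```python
# Illustrative sketch: writing and reading an attributed relationship through
# the TinkerPop interface with the gremlinpython driver. The server address,
# labels, and property names are assumptions for illustration only.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")  # assumed endpoint
g = traversal().withRemote(conn)

# Entities (vertices) carry properties ...
leader = g.addV("person").property("name", "community leader").next()
meeting = g.addV("engagement").property("location", "northern district").next()

# ... and so does the relationship (edge) that links them, letting the graph
# state the nature of the relationship explicitly rather than deriving it from
# the entities' own properties.
(g.V(leader)
  .addE("participated_in").to(__.V(meeting))
  .property("role", "host")
  .property("sentiment", "positive")
  .next())

# Read back the attributed relationship.
print(g.E().hasLabel("participated_in").valueMap().toList())

conn.close()
```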
3.3. Context-aware recommendations

The VICTRE recommendation engine reasons about the information stored in the semantic knowledge graph, and ranks the knowledge according to the interests of the soldier. To support personalization, the recommendation engine needs to understand how information interrelates, and how information is used in the analysis and COA development process. As described above, the graph provides the understanding of how information relates in the human context. In addition, it stores interaction data – such as which user interface component is being used, user profile information, and the state of the soldier’s assessment – to support an understanding of when information is used. The recommendation engine uses these data to infer when and how information will be relevant to the soldier.

4. Conclusions

In this paper, we presented an overview of VICTRE, a context-aware web application focused on CIM and the deployment of a Civil Affairs Team to an area of interest. CA forces use VICTRE to organize information about potential sources of civil instability in the country, and to decide which sources of instability need to be targeted to optimize the overall stability of the area. To make this determination, the soldiers use open-source news reports and social media in combination with assessments performed of people, places, and other entities in their area of interest. They must then assess progress, measure impact, and adjust strategies. VICTRE supports these tasks by combining natural language processing, graph-based knowledge representation, and interactive visualizations to bring contextually relevant information to the attention of soldiers and decision makers so they can more rapidly find, organize, and visualize civil information. While this work is ongoing, when VICTRE is complete we will evaluate the utility and usability of the tool with trained CA soldiers.

Acknowledgements

The research reported in this paper was sponsored by the U.S. Army Research Laboratory under contract W911QX-13-C-0186. The views and conclusions contained herein are those of the authors and should not be interpreted as presenting the official policies or position, either expressed or implied, of the U.S. Army Research Laboratory or the U.S. Government unless so designated by other authorized documents. Citation of manufacturer’s or trade names does not constitute an official endorsement or approval of the use thereof. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

References

[1] U.S. Department of the Army. Civil Affairs Operations. Field Manual 3-57. Washington, DC: U.S. Department of the Army, October 2011.
[2] U.S. Department of the Army. Civil Affairs Planning. Army Techniques Publication 3-57.60. Washington, DC: U.S. Department of the Army, June 2012.
[3] Cappelli, P., & Sherer, P. D. (1991). The missing role of context in OB: The need for a meso-level approach. Research in Organizational Behavior, 13, 55–110.
[4] Dey, A. K., & Abowd, G. D. (2000). Towards a better understanding of context and context-awareness. ACM Conference on Human Factors in Computing Systems, Workshop on the What, Who, Where, When, and How of Context-Awareness.
[5] Johns, G. (2006). The essential impact of context on organizational behaviour. Academy of Management Review, 31, 386–408.
[6] Pfautz, S. L., Ganberg, G., Fouse, A., Picciano, P., & Schurr, N. (In press). A general context-aware framework for improved human-system interactions. AI Magazine.
[7] Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(4-5), 993-1022.
[8] Dumais, S. (2005). Latent Semantic Analysis. Annual Review of Information Science and Technology, 38, 188.
[9] Hofmann, T. (1999). Probabilistic Latent Semantic Indexing. Proceedings of the Twenty-Second Annual International SIGIR Conference.
[10] Jameson, A. (2001). Modelling both the context and the user. Personal and Ubiquitous Computing, 5(1), 29-33.
[11] Atterer, R., Wnuk, M., & Schmidt, A. (2006). Knowing the user's every move: user activity tracking for website usability evaluation and implicit interaction. In Proceedings of the 15th International Conference on World Wide Web (pp. 203-212). ACM.
[12] Guo, Q., & Agichtein, E. (2008). Exploring mouse movements for inferring query intent. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 707-708). ACM.
[13] Mobasher, B., Cooley, R., & Srivastava, J. (2000). Automatic personalization based on web usage mining. Communications of the ACM, 43(8), 142-151.
[14] Lin, W., Alvarez, S., & Ruiz, C. (2002). Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery, 6, 83-105.
[15] Sharp, A., & McDermott, P. (2009). Workflow modeling: tools for process improvement and applications development. Artech House.
[16] Powers, S. (2003). Practical RDF. O'Reilly Media, Inc.
[17] Cudré-Mauroux, P., Enchev, I., Fundatureanu, S., Groth, P., Haque, A., Harth, A., ... & Wylot, M. (2013). NoSQL databases for RDF: an empirical evaluation. In The Semantic Web–ISWC 2013 (pp. 310-325). Springer Berlin Heidelberg. [18] Benjelloun, O., Garcia-Molina, H., Menestrina, D., Su, Q., Whang, S. E., & Widom, J. (2009). Swoosh: a generic approach to entity resolution. The VLDB Journal—The International Journal on Very Large Data Bases, 18(1), 255-276.