Knowledge Base Support for Decision Making Using Fusion Techniques in a C2 Environment

Amanda Vizedom, Ph.D.
Cycorp, Inc.
3721 Executive Ctr Dr
Austin, TX 78731
Voice: 512-342-4043
Fax: 512-342-4040
[email protected]

Raymond A. Liuzzi, Ph.D.
AFRL/IFTD
525 Brooks Road
Rome, New York 13441
Voice: 315-330-3577
Fax: 315-330-2563
[email protected]

Mark Foresti
AFRL/IFTD
525 Brooks Road
Rome, New York 13441
Voice: 315-330-3577
Fax: 315-330-2563
[email protected]

Abstract

The subject of this paper is the problem of information fusion, and the potential of knowledge base technology to play a role in solving it. To explain and illustrate this potential, the paper offers a basic explanation of a particular knowledge base technology (Cyc), and describes a particular project (CPOF), and the work done for it, in which that technology is put to work on the problem.

1. Information Explosion (or, be careful what you wish for)

Decision environments are increasingly characterized by plentiful data; plentiful data, however, has not translated into plentiful knowledge. Instead, the lack of feasible ways to process vast quantities of diverse data forces decision makers to ignore much of it, without any reliable way of knowing whether they are ignoring, or attending to, the right part. Today's decision-maker is caught in the middle because neither humans nor computers alone can effectively handle large amounts of sophisticated information. Complex problem domains, large search spaces, and lots of data but little information all contribute to the problem. Traditional database technology cannot solve this problem because it addresses the management of data, not the management of the information process.

Good decisions require understanding. Humans are good at understanding, but are overwhelmed by the sheer volume and complexity of the data in many domains. Computers can handle volumes of data, so we look to them for help in cutting through this overload. But if a computer is to be able to cut through the data usefully, reducing volume and complexity without hiding critical information, it has to be able to understand the data and its potential significance. Data alone cannot provide this understanding. Rather, understanding emerges when knowledge is applied to data: against a background of domain and general knowledge, data gains its significance. This is why we must turn to emerging knowledge base technology to effectively address the decision-making problems of tomorrow. Knowledge is the key in tomorrow's C2 systems, and at the core of this solution is knowledge base technology.

How can knowledge bases help? By supplying the background knowledge that gives data its meaning, and by providing a way to interpret, compare, and combine data based on an understanding of that meaning. Below we give a brief explanation of knowledge bases, and of how they can be so utilized. We then discuss the example that gives this paper its title: an application of Cyc knowledge base technology to information fusion issues in command and control decision environments, as part of DARPA's Command Post of the Future (CPOF) program. (9)

2. Knowledge Base Technology

A knowledge base is a compilation of information about the world. That information is broken into small pieces and represented in some machine-friendly format, such as a logical axiom or an if-then rule. Knowledge representation languages and methods vary widely, but knowledge bases share this feature: they are optimized to handle general knowledge rather than specific data. (Fig 1) The distinction between knowledge bases and databases rests on this distinction between general knowledge and specific data. A knowledge base (KB) is optimized for storage of general, potentially complex knowledge -- the kind that can be instantiated. A database (DB), on the other hand, generally does not have the means to represent such general principles, but is optimized to store very specific data, such as lists of items and attributes. Some examples:

(a) A KB might contain the fact that elephants are mammals; general rules about mammals (that they bear live young, lactate, have hair, etc.) and exceptions to those rules; principles of genetics and inheritance; rules about circuses and their animals; and definitions of weight capacities and other concepts and relationships involved in transportation. A DB might contain a list of the 15 elephants owned by Barnum & Bailey, with each elephant's age, gender, and weight.

(b) A KB might contain information about doctors in general, e.g.: doctors have degrees from medical schools; most often they set up practice near where they went to school; income depends in part on specialty. A DB might contain a list of information about individual doctors, such as medical school attended, specialty, state of practice, and prescription records.

(c) A KB might contain principles related to transportation, military activities, and weapons and their use, along with rules about weapons ranges, the effects of weather on equipment and terrain, resource consumption, and resource acquisition. A DB might contain a list of all military units in a region or operation, with each unit's supply status table or current location.
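To make the contrast concrete, here is a minimal sketch in Python of how the elephant example might be encoded in each style. The predicates and rule format are invented for illustration; they are not CycL or any real KB syntax.

```python
# A DB is optimized for specific facts: rows about known individuals.
db_elephants = [
    {"name": "Jumbo", "age": 32, "gender": "M", "weight_kg": 5400},
    {"name": "Topsy", "age": 28, "gender": "F", "weight_kg": 4300},
]

# A KB is optimized for general, instantiable knowledge: class
# memberships, rules with variables, and exceptions to those rules.
kb_facts = [
    ("isa", "Elephant", "Mammal"),
]
kb_rules = [
    # "All mammals bear live young" -- a rule quantifying over a class
    # variable, something a flat table of individuals cannot express.
    {"if": [("isa", "?X", "Mammal")], "then": ("bearsLiveYoung", "?X")},
    # Exceptions can be stated declaratively as well.
    {"if": [("isa", "?X", "Platypus")], "then": ("not-bearsLiveYoung", "?X")},
]
```

The DB rows say nothing beyond the individuals listed; the KB rule quantifies over a class, so it applies to any elephant the system ever encounters.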

[Figure 1 contrasts a declarative knowledge base with procedural code, each applied to problem-specific data (military units and activities, organizations and movements, geography, weather, terrain, vehicles) against common models of space, time, action and causality, and physical objects. In the declarative KB, domain knowledge is explicitly stated in domain theories (concept definitions, laws, equations, relations, constraints), basic representation concepts (sets, sequences, arrays, quantities, measures, probabilities), and general inference procedures. In procedural code, domain knowledge is implicit in the programmer's step-by-step instructions, which are optimized for computational efficiency. A sidebar defines the key term: a knowledge base is a set of representations of facts about the world; each individual representation is called a rule or axiom, expressed in a knowledge representation language.]

Figure 1

Knowledge representation languages may be either declarative or procedural, and they vary greatly in their expressiveness. The advantages of declarative knowledge representation are substantial, however, and it is unusual to find a procedural KB of any significant size or breadth. Such advantages include the independence of knowledge representation from mode of use, the independence of knowledge representation from indexing, and the preservation of implicit knowledge in the KB itself, independently of the code that uses it. These advantages, in turn, result in greater reusability of knowledge, and in greater coverage with less bulk. The degree of advantage is large enough to render procedural representation infeasible for any case in which the domain is large, or in which knowledge from multiple domains is needed.

The special value of knowledge bases is that they provide the foundation for reasoning, in which new information is inferred from what is already known. This goes beyond lookup; reasoning with a knowledge base involves applying and combining general knowledge to reach conclusions that are implicit in, but not explicitly contained in, the information given. That knowledge-based reasoning enables diagnosis, monitoring, and general query-answering at a depth not possible with a DB alone.

Take, for instance, a relatively simple question such as: "Which air vehicles can move to (a specific) region R?" Assume that we have a DB and a KB whose domains are appropriate to answering this question. To answer such a question using the database, a procedure must be written specifically to determine the answer. The writer of the procedure must know and make explicit which pieces of data should be combined, exactly how they should be combined, and where they are found relative to the indexed fields of the database; all of this must be written into the procedure. To answer the question using the knowledge base, the question can be put directly. A KB for the domain will contain precisely the concept definitions and rules that determine which information comes into play and what rules govern the relationships -- that is, which information should be used and how it should be combined. From this knowledge, the answer can be inferred. Furthermore, the answer can be extracted from a DB only if the DB was designed with such questions in mind; indexing limits the query possibilities.

DB designers must know just what kinds of questions will be asked, and must represent the data accordingly. For declarative KBs, the representation and indexing are independent, so the possible queries need not be restricted on the basis of indexing. Knowledge can be used to infer new knowledge in ways never anticipated by the KB architects.

To accomplish this reasoning, a declarative knowledge base is used in combination with an inference engine. An inference engine consists of procedural code that reasons over the knowledge in a KB. Generally, inference engine code models some combination of general inference principles, rules of valid argument, and/or procedures for handling special classes of reasoning.

Knowledge base development involves the identification and capture of relevant knowledge -- the concepts (types, relationships, properties) and axioms needed for reasoning in some domain. This identification and capture process is known as knowledge acquisition. A declarative, formal representation of concepts and relationships is known as an ontology, and the process of acquiring, analyzing, and formally representing those concepts is known most often as ontological engineering, sometimes as knowledge engineering. There are a variety of knowledge acquisition methods, and their comparative suitability depends significantly on both the domain and the application. Generally, Subject Matter Experts (SMEs) provide domain knowledge, and ontologists or knowledge engineers use a variety of methods to collect, analyze, and formally represent that knowledge. Currently, while clerks can handle entry of data into a database, the same is not yet feasible for entry of knowledge into a knowledge base. Development of tools to allow entry of knowledge into a KB by people other than ontologists (e.g., by SMEs directly) is a priority in the field. (7)
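Returning to the air-vehicles example above, here is a toy backward chainer in Python. All names (the predicates, the vehicles, the canMoveToR rule) are invented for illustration, and the guard function is a stand-in for the arithmetic reasoning a real inference engine would perform.

```python
# Toy KB: ground facts plus one general rule. The "how to combine the
# fields" lives in the declarative rule, not in a per-question procedure.
FACTS = [
    ("isa", "helo1", "AirVehicle"), ("isa", "uav1", "AirVehicle"),
    ("rangeKm", "helo1", 400), ("rangeKm", "uav1", 150),
    ("distToRegionKm", "helo1", 320), ("distToRegionKm", "uav1", 220),
]
RULES = [{
    # canMoveToR(?V) <- isa(?V, AirVehicle), rangeKm(?V, ?RG),
    #                   distToRegionKm(?V, ?D), ?RG >= ?D
    "head": ("canMoveToR", "?V"),
    "body": [("isa", "?V", "AirVehicle"), ("rangeKm", "?V", "?RG"),
             ("distToRegionKm", "?V", "?D")],
    "guard": lambda b: b["?RG"] >= b["?D"],
}]

def unify(pattern, fact, bindings):
    """Extend bindings so pattern matches fact, or return None."""
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            if p in b and b[p] != f:
                return None
            b[p] = f
        elif p != f:
            return None
    return b

def match_body(goals, bindings):
    """Prove each body literal against the facts, threading bindings."""
    if not goals:
        yield bindings
        return
    pattern = tuple(bindings.get(t, t) for t in goals[0])
    for fact in FACTS:
        new_b = unify(pattern, fact, bindings)
        if new_b is not None:
            yield from match_body(goals[1:], new_b)

def ask(pred, var="?V"):
    """Backward-chain from a query through any rule with a matching head."""
    return [b[var] for rule in RULES if rule["head"][0] == pred
            for b in match_body(rule["body"], {}) if rule["guard"](b)]

print(ask("canMoveToR"))  # ['helo1']: range 400 covers 320 km; uav1 cannot
```

Note that no procedure specific to this question was written: the same chainer answers any query the rules cover, which is the indexing-independence point made above.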

3. How Knowledge Bases Can Help with Information Fusion: Data + Knowledge = Understanding

In cases where useful information can be gotten from a variety of sources, those sources are usually not compatible. The information in each is represented in an idiosyncratic way, in terms of both semantics and schema. Furthermore, each information source is itself largely opaque, in that the meaning is mostly in the eye of the beholder. Thus the problem: how can a schema's meaning be represented, not just its names and formats, so that we can combine information across sources and get results that are correct and useful? Unless data models and representations are the same across all sources being fused, understanding is necessary for information fusion. Information sources must be integrated semantically, and not just syntactically.

Knowledge base technology provides the missing piece for integration of information across such incompatibilities in meaning. Semantic integration is enabled by: (a) a representation language rich enough to handle all of the data types and values, as well as their subtle differences, equivalencies, and relationships; (b) an ontology containing the concepts and relationships that give the data meaning; and (c) flesh on those bones: constraints and general knowledge about all those concepts and relationships. The concept definitions, rules, and relationships that give data meaning are precisely the kind of thing that knowledge bases handle. By accessing and interpreting data according to the knowledge in a KB, we bring the meaning of the data out of the beholder's head and into the computer. And we can do this for as many data sources as we like. Once the meaning of data in a particular data source is represented in a KB, the knowledge in that KB can be used to reason about the implications of that data, and to compare and combine that data with data from other sources so represented. The general knowledge enables reasoning that follows unanticipated links between new data points and data types. And it enables combining the data from multiple sources without the integration problem of explicitly relating each data field in each source to each data field in every other source.
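As a minimal sketch of such semantic integration (the field names, units, and the shared concept UnitMovement are all invented for illustration): two feeds report the same kind of event under different schemas, and a small mapping layer, standing in for a KB's schema knowledge, lets them be fused on shared concepts.

```python
# Two sources report the same kind of event under different schemas
# and units.
source_a = [{"unit_id": "1-77 AR", "spd_kph": 32, "act": "MOV"}]
source_b = [{"callsign": "HN-3", "speed_mph": 18, "activity": "road march"}]

# The mapping layer encodes what each field *means* in shared terms --
# the role played by a KB's representation of each source's schema.
def normalize_a(rec):
    return {"unit": rec["unit_id"], "speed_kph": rec["spd_kph"],
            "activity": {"MOV": "UnitMovement"}[rec["act"]]}

def normalize_b(rec):
    return {"unit": rec["callsign"], "speed_kph": rec["speed_mph"] * 1.609,
            "activity": {"road march": "UnitMovement"}[rec["activity"]]}

fused = [normalize_a(r) for r in source_a] + [normalize_b(r) for r in source_b]
moving = [r["unit"] for r in fused if r["activity"] == "UnitMovement"]
print(moving)  # ['1-77 AR', 'HN-3'] -- combined across incompatible schemas
```

In a real KB the mapping is declarative knowledge about the schemas rather than hand-written functions, so no pairwise source-to-source translation code is needed.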


[Figure 2 shows the Cyc system: a knowledge base of assertions in CycL; an inference engine combining HL modules with general logical deduction; and an API connecting the system to Cyc-powered applications and external data sources.]

Figure 2: Cyc Knowledge Base System

4. KB-Enabled Information Fusion to Support C2 Decision-Making: Cyc in the Command Post of the Future

4.1 Cyc Knowledge Base and Inference Engine

Cyc® is a very large, multi-contextual knowledge base and inference engine (1,2,3,4,5). The Cyc knowledge base contains a vast, declaratively-represented body of knowledge about the world. The Cyc inference engine reasons over this knowledge, inferring new assertions and adding them to the KB, returning them as answers to a query, or both. (Figure 2) Cyc is being developed at Cycorp, a company that spun off from the Microelectronics and Computer Technology Corporation (MCC), where Cyc was an R&D project from 1984 to 1994. Cycorp's president, Douglas Lenat, is also the originator of the Cyc system.

A number of features set Cyc apart from other knowledge base technologies. First, the scope of Cyc is many times larger than that of most KBs. Second, and partly because of the generality of Cyc's domain, the representation language used in Cyc is more expressive than most, and capable of representing more complex knowledge and meta-knowledge. Third, the Cyc KB is multi-contextual; its knowledge is clustered into smaller bundles, via a kind of context called a microtheory.

As mentioned above, knowledge bases vary widely in their scope, and in the expressiveness of the representation languages they use. Some knowledge bases are restricted to fairly narrow domains; others are more general. The Cyc knowledge base is targeted at a very general, and ambitious, level: the broad range of "common sense" knowledge that makes adult humans competent to make their way around the world. At its heart is an upper ontology, consisting of the most general terms and concepts and their definitions, relationships, and constraints. These include concepts like tangibility, being a set, localization in time, spatial relations, motion, belief, events, and actions.

Knowledge representation in Cyc utilizes a very expressive (nth-order logic) language. This language evolved in response to the limitations of less expressive approaches, such as frames and slots; these were not adequate to capture and reason with the range and complexity of concepts and rules that even the average toddler is able to understand and process.

An early decision was made to give Cyc some way to cluster pieces of information that rest on more-or-less the same assumptions. Those common assumptions could then in effect be factored out: they become a property of the cluster, and within the cluster each piece of knowledge can be treated without repetitively stating those assumptions. Each of these clusters is called a microtheory. The Cyc Knowledge Base is divided into a few thousand microtheories, each comprising assertions that share common assumptions. Microtheories are inter-related by a hierarchy of inter-visibility and inheritance; a microtheory can be a sub-microtheory of many others. When there are no inheritance links between two microtheories, they are effectively isolated. Every assertion belongs to at least one microtheory, and each query, like each assertion, is made in a microtheory.

The power of this contextualized, and multi-contextual, representation is two-fold. First, the KB can contain representations of views or theories which are mutually inconsistent; appropriate microtheory placement enables those divergent representations to coexist without causing contradictions in inference. Second, inference can use microtheory location to ignore irrelevant information. That ability is essential if inference over such a large knowledge base is to be efficient and produce relevant answers.
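A toy rendering of the microtheory idea in Python (the structure and microtheory names are invented; real Cyc microtheories carry far richer machinery): assertions live in contexts, and a query sees only its own context and that context's ancestors, so mutually inconsistent theories coexist without contaminating each other's inferences.

```python
# Each microtheory names its parent(s); an assertion made in one Mt is
# visible only to queries made in that Mt or its descendants.
MT_PARENTS = {
    "BaseKB": [],
    "ModernMilitaryMt": ["BaseKB"],
    "WWIIMt": ["BaseKB"],
}
ASSERTIONS = {
    "BaseKB": {("isa", "Tank", "ArmoredVehicle")},
    "ModernMilitaryMt": {("maxSpeedKph", "Tank", 70)},
    "WWIIMt": {("maxSpeedKph", "Tank", 40)},  # contradicts the modern Mt --
                                              # harmlessly, in its own context
}

def visible_mts(mt):
    """A microtheory sees itself and everything it inherits from."""
    seen, stack = set(), [mt]
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(MT_PARENTS[m])
    return seen

def ask(mt, pred, subj):
    """Answer a query in a microtheory, using only visible assertions."""
    for m in visible_mts(mt):
        for (p, s, o) in ASSERTIONS.get(m, ()):
            if p == pred and s == subj:
                return o
    return None

print(ask("ModernMilitaryMt", "maxSpeedKph", "Tank"))  # 70
print(ask("WWIIMt", "maxSpeedKph", "Tank"))            # 40
```

The visibility restriction is also what keeps inference focused: a query in one context never even examines the irrelevant assertions of an isolated context.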

The Cyc knowledge base can be divided into three levels: the Upper Ontology, the intermediate theories of general knowledge, and the task-specific knowledge. (Figure 3)

[Figure 3 depicts the three levels: the Upper Ontology, covering fundamentals (sets, time, tangibility, events, . . . ); the Middle Ontology, intermediate-level general knowledge applicable across domains (weather, terrain, vehicles, communication, . . . ); and the Lower Ontology, task-specific and domain-specific knowledge (computer networks, military force structure, business rules).]

Figure 3: The Cyc Knowledge Base in Three Levels

The Upper Ontology, as described earlier, concerns fundamental concepts such as individuals, collections, set membership, tangibility, time, actions, and events. The lowest level concerns task-specific knowledge, such as rules about the maximum speed of an F-15 fighter in relation to air temperature and pressure. The Middle Ontology is midway between those two extremes. Common sense knowledge, applicable across domains, is the heart of this level. Some middle-level examples: if something is being shipped, both the stuff being shipped and the vehicle by which it is shipped are at the destination location when the shipment is complete (they might then go somewhere else). If someone dies, they aren't available to accept assignments later (they don't buy stuff anymore, either). If someone you care about gets hurt, you feel bad. When it rains, exposed objects get wet. Many, if not most, of these rules have exceptions, and Cyc has to know the common exceptions as well. The middle level of knowledge is vast and complex. For an intuitive check on this, consider that it corresponds roughly to what a human has to know in order to be competent at navigating the world, to have enough common sense to handle novel situations. Now consider how many years it takes for a human to reach that point.

Inference in Cyc is accomplished in two directions, forward and backward.

Forward inference is done at assertion time: when a new assertion is made, the system automatically spins out some of the consequences of the new information. Backward inference is done at query time: when a query is asked, the system reasons by sub-goaling backward from the problem given, via the knowledge in the KB, to produce, if it can, a reliable answer. This reasoning is performed in part by logical deduction. Logical deduction is often painfully slow, so Cyc has hundreds of hand-crafted special-purpose reasoning modules (known as Heuristic-Level modules, or HL modules) that serve as efficient replacements for general deduction. HL modules are validity-preserving; their role is to serve as a shortcut for common types of reasoning. At each moment, Cyc calls on an efficient HL module if any applies; if none applies, it falls back on general theorem proving. Resource bounds can be set for a particular query or application: time, number of answers, number of back-chains, number of steps.

Cyc was used in the DARPA High Performance Knowledge Bases (HPKB) program (6). HPKB was concerned with developing innovative technologies supporting the construction of knowledge bases, ontologies, and associated libraries of problem-solving strategies. In HPKB, Cyc was evaluated on how well it interpreted international events, predicted possible evolutions of and reactions to crisis situations, and provided crisis indications and warnings. It may be applied to improved training of intelligence analysts, and/or to providing a corporate memory about past international crises. Cyc is currently being used in the HPKB follow-on, Rapid Knowledge Formation (RKF) (7), whose mission is to enable domain experts to build up knowledge bases of their expert knowledge themselves, without help from knowledge engineers, using dialogue, analogy, and similar methods. Cyc is also being used in a variety of other government projects, including Information Assurance, the DARPA Agent Markup Language (DAML) (8), and others. Commercially, Cyc applications exist or are underway in a variety of domains, including computer network vulnerability assessment, image retrieval, intelligent question-answering, and Semantic Knowledge Source Integration (SKSI).
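To make the forward/backward distinction described above concrete before moving on, here is a toy sketch in Python. The rules, predicates, and depth bound are invented, with the bound standing in for Cyc's back-chain and step limits.

```python
facts = set()

# Forward inference: fires at assertion time, spinning out consequences.
def assert_fact(fact):
    facts.add(fact)
    if fact[0] == "isa" and fact[2] == "ArtilleryUnit":
        # consequence asserted automatically, without being asked
        assert_fact(("capableOf", fact[1], "IndirectFire"))

# Backward inference: sub-goals at query time, under a resource bound.
def holds(goal, depth=3):
    if goal in facts:
        return True
    if depth == 0:          # bound reached: give up rather than run forever
        return False
    pred, subj, obj = goal
    if pred == "threatTo" and obj == "LZ-North":
        # sub-goal: anything capable of indirect fire threatens the LZ
        return holds(("capableOf", subj, "IndirectFire"), depth - 1)
    return False

assert_fact(("isa", "2-4FA", "ArtilleryUnit"))
print(holds(("threatTo", "2-4FA", "LZ-North")))  # True, via one sub-goal
```

In this sketch the hard-coded "threatTo" branch plays the role of an HL module: a special-purpose shortcut used when it applies, with general deduction (here, the fact lookup) as the fallback.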

4.2 Bringing Knowledge and Data Together

Cycorp employs a staff of 40 or so ontological engineers, whose job it is to develop and add to the KB. Many years of trial and error went into the evolution of the Cyc team's current methods of adding assertions, training staff to add assertions, and figuring out what sorts of people to recruit and how to test them for aptitude at ontological engineering. As mentioned above regarding KB development in general, ontological engineering is a highly specialized skill, and it is not yet feasible for non-ontologists to add knowledge to the Cyc KB. This puts a limit on the speed of KB development, and one of Cycorp's top priorities, within the current DARPA Rapid Knowledge Formation program (7), is the development of tools via which a wider variety of people can add knowledge to Cyc, with help from Cyc itself.

However, a useful development arose from this premium on ontologists' time. For most applications, one wants to use a KB not by itself, but by applying it to some specific data. For a number of projects, Cycorp faced the fact that it was not feasible to add vast quantities of data by hand, nor was it desirable to copy data into Cyc if it already existed in a structured data source. The question was: how were the data and the knowledge to be combined, so that the knowledge could be used to interpret and reason about the data? In response to this challenge, the Cyc team developed two ways of using data from structured sources.

In the first approach, the data remains external, and its schema is represented in Cyc. An inference engine module uses that knowledge of the schema to recognize when some reasoning requires a piece of information that is available from the external data source. When the module recognizes such a match, it formulates and sends the appropriate query, gets the response, and passes it back to the inference engine. The schema representation allows Cyc to understand what the resulting data mean, and the reasoning proceeds just as if the information had been looked up internally. This method has the benefit that the data can continue to be maintained and updated by an external party, and Cyc can use the updated information as it appears.
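A sketch of this first approach, using SQLite as a stand-in for the externally maintained source (the table, predicate name, and schema map are invented for illustration):

```python
import sqlite3

# External source maintained by a third party (here, an in-memory stand-in).
ext = sqlite3.connect(":memory:")
ext.execute("CREATE TABLE units (unit TEXT, supply_days REAL)")
ext.execute("INSERT INTO units VALUES ('1-77 AR', 2.5), ('HN-3', 6.0)")

# Schema knowledge: which predicate is backed by which table and columns.
SCHEMA_MAP = {
    "supplyDaysRemaining": ("units", "unit", "supply_days"),
}

def lookup(pred, subj):
    """When reasoning needs a predicate backed by an external source,
    formulate the query, fetch, and hand the value back to inference."""
    table, key_col, val_col = SCHEMA_MAP[pred]
    row = ext.execute(
        f"SELECT {val_col} FROM {table} WHERE {key_col} = ?", (subj,)
    ).fetchone()
    return row[0] if row else None

# Inference can now treat the external data as if it were internal:
if lookup("supplyDaysRemaining", "1-77 AR") < 3:
    print("flag: resupply needed before next operation")
```

Because the fetch happens at reasoning time, updates made by the external maintainer are visible to the very next inference, which is the benefit noted above.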

In the second approach, the schema is also represented in Cyc, but instead of real-time queries to the external source, the knowledge is "slurped" into the Cyc KB. This method has also been used on semi-structured information. (1) For example, the CIA World Fact Book is written in English, but each country's brief is divided into the same categories, each of which has almost identically-structured sentences that are therefore relatively easy to parse and understand. The parsed forms are converted into full-fledged expressions in the predicate-calculus-based language that Cyc uses. They are then placed into the proper microtheories, ready to be used in inference.
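A sketch of this slurping approach on semi-structured text (the sentence template, country names, and predicate are invented, and far simpler than real World Fact Book parsing):

```python
import re

# Country briefs share a template, so a simple pattern recovers structure.
briefs = {
    "Freedonia": "Natural hazards: flooding; earthquakes.",
    "Sylvania": "Natural hazards: typhoons; volcanic activity.",
}

PATTERN = re.compile(r"Natural hazards: (.+)\.")

assertions = []
for country, text in briefs.items():
    m = PATTERN.match(text)
    if m:
        for hazard in m.group(1).split("; "):
            # Converted into a full-fledged assertion, ready to be placed
            # in the proper microtheory and used in inference.
            assertions.append(("naturalHazard", country, hazard))

print(assertions)
# [('naturalHazard', 'Freedonia', 'flooding'), ...]
```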

4.3 Command Post of the Future (CPOF)

The Command Post of the Future (CPOF) program is a technology-based program that will create an adaptive, decision-centered visualization environment for the future commander. The commander's job, in the future as in the past, will be to make decisions and monitor their execution in the midst of great uncertainty. In the future, the commander's success will depend on using information dominance to increase the speed and precision of those decisions. At the same time, survival will depend on being small and mobile; large command complexes will not survive on the highly lethal future battlefield. The commander's portal into this information environment will need to be easily operated by a small, distributed staff.

The goal of CPOF is to shorten the commander's decision cycle to stay ahead of his adversary's ability to react. To achieve this operational goal, the technical objective is to develop the technology necessary to create an adaptive, decision-centered information visualization environment for the future commander and his immediate staff. Since the primary focus of CPOF is the creation of a new level of visualization and human-systems interaction technology, the development approach emphasizes the creation and continual testing of candidate technologies with prospective users. The key technology capabilities to be developed in support of a CPOF are the following:

(1) an integrated visualization environment where the commander and his staff can view immediately understandable presentations of the changing battlefield situation, presentations which are tailored to the situation and the command decisions of interest;

(2) a powerful and comprehensive human-systems interaction and human-to-human collaboration capability (through speech and gesture understanding, language understanding, and smart-room technology) to enable the commander and his staff to rapidly explore the information environment, without requiring dozens of staff members to operate and integrate multiple information systems;

(3) a command post dialog manager which will automatically track current command post activities to enable the automatic generation of tailored presentations of relevant information that suit the changing command post staff members, decisions, and topics of interest;

(4) an integrated suite of knowledge bases, intelligent agents, plan sentinels, and information management assistants which would automate many of the lower-level staff functions and automatically invoke and operate supporting situation awareness, planning, and analysis applications; and

(5) a modular, portable suite of hardware and software components that encourages small physical dimensions for solution sets, and can be quickly configured and tailored to various command environments (stationary and mobile), at different echelons of command.

Looking into the future, program leaders do not want to develop something in the laboratory that cannot be utilized, or be effective, in the field. To avoid that, subject matter experts (SMEs), including retired and senior military officers, have been brought into the program to work with the labs in developing new doctrine and conducting experiments to verify which technologies will be effective. It is the focus of the CPOF program to justify every technical decision with experimental results. Any new concept developed must maintain the existing command post functions and allow the commander to interface with the system in a natural way. In line with government mandates to keep everything as simple as possible, to avoid military specifications, and to use commercial-off-the-shelf (COTS) components to the greatest degree possible, the CPOF program is not looking to develop all of its own technologies.

Instead, the program will focus primarily on the visualization, human-computer interaction, and knowledge-based information integration needed to realize the envisioned system. However, the project description points out that the envisioned CPOF would require the development of a wide range of technologies that do not exist today, such as high-bandwidth wireless communications, a comprehensive distributed data management and distribution system, integrated battlefield situation models and databases, a wide array of planning and decision aids, and survivable command vehicles. Some of these technologies can be drawn from other projects that are addressing the deficient technology areas, and other concepts will be developed to support the CPOF objectives.

4.4 Cyc in the CPOF: Supporting C2 Decisions by Fusing Information Using Knowledge-Base Technology

A significant piece of the challenge described above lies in the problem of data-richness and comparative information-poverty. Contemporary and future command posts feature many sources of data, far too many to be continuously monitored by human command staff. But data from any of these sources could be crucial at any time. Furthermore, simplistic data monitoring techniques won't help. It is rarely the case that important data points can be identified within a single source; whether the data from one source are important at any given time depends highly on what else is going on. Simply alerting when a significant value or single-source combination occurs will either miss most of the important cases (when the significance of the data is missed for lack of context from other sources), or produce alerts too frequently (when any value that could conceivably be important is flagged as important), or both. In other words, it is the combination of data from the multiple sources that yields important information. For example, consider a command post for a brigade-sized Task Force. Here are some data sources that might be involved:

(a) Data Source 1: Feed of electronic field reports from 1st Battalion. A number of sources send reports to this feed; reports are tagged with the identity of the originator. A portion are humanly generated reports on enemy activities, friendly situation, and battle damage assessments. Another portion are automated reports generated by the battalion's unmanned ground sensors.

(b) Data Source 2: Feed of electronic field reports from the Task Force's Host Nation battalion (equipment, organization, capabilities, and doctrine different from those of 1st Bn).

(c) Data Source 3: Feed of electronic field reports from the Battalion's Future Combat Systems company (or other elite/advanced unit, employing state-of-the-art sensors and weapons, with capabilities different from those of 1st Bn).

(d) Data Source 4: Feed of automated periodic GPS reports from all Task Force elements (or perhaps all but the HN Bn).

(e) Data Source 5: JSTARS feed.

(f) Data Source 6: GIS data link: digital terrain database for the region, maintained and updated by a third party.

(g) Data Source 7: Weather service feed. Forecasts and conditions updates for the region.

(h) Data Source 8: Intel feed from Division.

(i) Data Source 9: Battalion-wide logistics database.

(j) Data Source 10: Battalion-wide electronic orders.

This is a simplified problem; the command environment is characterized by even more information complexity than this. There are many data fusion problems and opportunities here. Just a few of them:

(i) Data sources 1 and 2, e.g., likely feature some fields and values that are syntactically the same, but mean something different depending on the respective force doctrines. This is also likely in Joint environments, where different branches of service may have different doctrinal meanings.

(ii) The implications of data values depend on the capabilities of the data source and the situation, as well as doctrine. For example, "moving" means something different in a report from JSTARS, which is primarily capable of detecting objects of a certain size moving over a certain speed, than it does in a report from a HUMINT source trained to define "moving" as relocation of a unit, distinct from patrolling or other kinds of motion. Also, a report of a company-sized enemy unit should be treated with slightly different implications and confidence if the source has a clear view behind and around that enemy unit (where accompanying units would likely be) than if the source has an obstructed view (such that the rest of a battalion could easily be behind that company without the source's knowledge).

(iii) Data source 8 involves different concepts than do sources 1-3. The former is likely more concerned with purposes, methods, and capabilities; the latter are likely more concerned with current activities. The two can only be put together in light of knowledge about the relationships (dependencies and implications in both directions) between current activities and purposes, methods, and capabilities.

(iv) Logistics planning could be greatly improved if data source 9 could be updated based on information from sources 1-4. It could be improved even more if the logistical implications of activities specified in the orders (source 10) were understood, and could be compared to the supply status in source 9, and to the terrain and weather information in sources 6 and 7, such that potential logistical problems were noticed ahead of time.

(v) Reports about enemy units, locations, and activities may not seem important on their own, but together may demonstrate a pattern. That pattern in turn may signal enemy intent. Discovering the pattern, however, may require bringing together spot reports from multiple human and automated sources, terrain data, and intelligence reports, all against a background of knowledge about military activities, capabilities, equipment, and forces.

Each of these challenges is a natural fit for a knowledge-based solution. In each case, background knowledge can enable meaningful interpretation of the available data.

A knowledge base application can reason over this data, looking for significant implications and possible problems. This is the framework of the application of Cyc for the CPOF. Cycorp is developing an application that uses general and military background knowledge to understand and fuse battlefield data. The specific focus of the work under CPOF is problem (v), the last of the points above: to extract information about fundamental patterns, then interpret those patterns to suggest possible enemy intent.

The first half of this task is to pick out the interesting patterns amidst the data chaos. Patterns of interest are based on knowledge of battlefield and general fundamentals -- from an understanding of equipment capabilities, effects, and principles of use, to an understanding of terrain, spatial relations, and weather effects. Cyc will be reasoning over the combined data, looking for intent-signaling qualitative patterns. Significant patterns discovered by Cyc could include such things as: massing of artillery firepower on a location of high tactical value; positioning of artillery forward, relative to own units or main battle activity; an increase in logistics activity near areas which are good candidates for landing zones; a qualitative change in what recon is trying to get a look at; intense use of high-ticket artillery in certain areas or against certain types of targets; or significant supply movement into a currently unoccupied area.

The second half of the task is to predict enemy operations based on these patterns: what could they be about to do, given these patterns? What are they likely to do? Here, knowledge of equipment and unit capabilities, enemy doctrine, trafficability principles, and the desirability of various types of objectives (among other things) is applied, along with individual weather, terrain, and other data points as relevant, to reason about the possible significance of the observed data. For CPOF, given limited time and resources to identify and represent the relevant tactical knowledge, Cyc's prediction will focus on Air Assault operations. (Note: there will be some scattered coverage in other areas, and some understanding of general tactical significance will fall out naturally from knowledge already present in the KB.)

In both sub-tasks, the background knowledge is essential to the reasoning.

Very few indicators are based on single data points, and very few are based on sequences or combinations of data in a particular field or from a single source. Neither the patterns nor their significance could be inferred, except in the most simplistic cases, without application of that knowledge. The domain and the data are just too complex to be handled by approaches such as pre-representation of all possibly significant data combinations; novel combinations are always occurring. The knowledge in the KB makes it possible to understand the meaning of the data, and to apply that understanding in combining the data (e.g., positions of units from field reports, terrain data from GIS, weather data, and intelligence and field reports on unit equipment and capabilities), inferring the important patterns (e.g., they are massing high-ticket artillery capability on air defenses along the northern air avenue of approach), and inferring the possible significance of those patterns (an air assault along that avenue may be in the works).

The potential benefits to the commander are significant. Human attention to every one of the incoming data points is not feasible, and the patterns frequently involve putting together too many pieces out of the numerous streams. Some portions of the reasoning are computational, and thus unlikely to be performed well by a human; some are qualitative and nuanced, and thus out of reach for technology that rests on databases alone. The background knowledge makes all the difference: if the meaning of an incoming data point is understood, it can be combined meaningfully with other data. Knowledge base technology enables this understanding, and so enables the combination of new data points with data from other sources, and with data known to be available elsewhere and looked up automatically. The knowledge base also enables reasoning about that information and what it does, or doesn't, imply, given the specifics of the situation. The commander, or other user, can be given summaries of the data according to these qualitative characteristics and implications -- i.e., in a way that is intuitive and meaningful.

This CPOF development path, recall, addresses just one of the information fusion challenges mentioned above. In turn, the challenges listed are just a few of the many to be found in the C2 environment. Each one of those challenges, and many more not listed, is a natural fit for knowledge base technology.
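As a final sketch, here is what the pattern-finding half of the task might look like in miniature. The report format, thresholds, and the massing test are all invented; a real knowledge-based solution would derive unit types, weapon ranges, and "high tactical value" from general knowledge rather than from constants.

```python
from math import hypot

# Fused, normalized reports from several sources (values invented).
reports = [
    {"unit": "arty-A", "type": "Artillery", "x": 10.0, "y": 11.0},
    {"unit": "arty-B", "type": "Artillery", "x": 11.5, "y": 10.5},
    {"unit": "arty-C", "type": "Artillery", "x": 10.5, "y": 12.0},
    {"unit": "mech-1", "type": "Mechanized", "x": 40.0, "y": 5.0},
]
# e.g., air defenses along the northern air avenue of approach
HIGH_VALUE = {"AD-site-North": (11.0, 11.0)}

def massing_near(target_xy, kind="Artillery", radius=3.0, threshold=3):
    """Qualitative pattern: at least `threshold` units of a kind within
    `radius` of a high-value location."""
    tx, ty = target_xy
    near = [r["unit"] for r in reports
            if r["type"] == kind and hypot(r["x"] - tx, r["y"] - ty) <= radius]
    return near if len(near) >= threshold else []

for name, xy in HIGH_VALUE.items():
    units = massing_near(xy)
    if units:
        print(f"pattern: artillery massing near {name}: {units}")
        print("possible significance: air assault preparation on this avenue")
```

The second, harder half of the task -- mapping such patterns to likely enemy operations -- is where the doctrinal and tactical knowledge in the KB, rather than fixed thresholds, does the work.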

It is knowledge that gives the data from each source its meaning, and that supplies the concepts and relationships governing the implications when information from multiple sources is combined. Knowledge bases bring these concepts and relationships out of the user's head and into the realm of the computer. The current problem lies in the gap between machine handling of speedy computation and large quantities of data, on the one hand, and human understanding of the meaning of data, interpreted according to its source, and of the potential implications of that data, on the other. Information fusion requires both types of ability: data handling and understanding. Knowledge base technology bridges that gap.

References:

(1) Cycorp's home page, http://www.cyc.com
(2) Pierce, E. Knowledge Source Documentation. Cycorp, Inc.
(3) Winston, P. H. Artificial Intelligence (Third Edition).
(4) Lenat, D. (1995). "Cyc: A Large-Scale Investment in Knowledge Infrastructure." Communications of the ACM 38, no. 11 (November).
(5) Kahlert, R. C., and Mahler, D. "Semantic Knowledge Source Integration." Cycorp, Inc.
(6) Lin, A., and Starr, B. (1998). "HIKE (HPKB Integrated Knowledge Environment): An Integrated Knowledge Environment for HPKB (High Performance Knowledge Bases)." In Proceedings of the IEEE Knowledge and Data Engineering Exchange Workshop (KDEX'98).
(7) RKF (2001). Rapid Knowledge Formation project, http://projects.teknowledge.com/RKF
(8) DAML (2001). DARPA Agent Markup Language, http://www.daml.org/
(9) CPOF reference, DARPA BAA 98-7, 1998.
