Proceedings of the 10th WSEAS International Conference on SYSTEMS, Vouliagmeni, Athens, Greece, July 10-12, 2006 (pp417-424)

Automatic Modeling and Usage of Contextualized Human Behavior Models

HANS FERNLUND and AVELINO GONZALEZ
School of Electrical Engineering and Computer Science
University of Central Florida
Orlando, FL 32816-2450
USA

Abstract: - Modeling human behavior is a complex task, as it may involve unpredictability, sub-conscious knowledge and intuition. This paper presents initial results from a novel algorithm that creates human behavior models automatically by observing humans as they perform. Furthermore, the paper uses conclusions from the No Free Lunch Theorems to argue for the scalability of the modeling algorithm. These implications of the No Free Lunch Theorems are universal and might be applicable to many related areas of modeling.

Key-Words: - Human Behavior Modeling, Context-based Reasoning, Genetic Programming, No Free Lunch Theorem

1 Introduction

Striving to make simulation models as realistic as possible, many different approaches have been suggested for human behavior modeling (e.g. COGNET, ACT-R, SOAR, etc.). However, human behavior is not deterministic and often far from optimal. Human behavior in traffic is one area where behavior is far from optimal: if humans performed optimally in traffic, there would be no accidents and congestion would be rare. There is support in the literature that many human behavior models suffer from overly doctrine-like behavior and are not very realistic [1], [10], [16], [17]. Sometimes the models reflect how things theoretically should be conducted, but not how humans actually perform them in reality [3]. In certain fields, behavior gained through experience may outperform behavior that strictly follows known rules, regulations and doctrines. Soldiers who have experienced several real combat engagements seem to have a higher mission success rate than newcomers. This may be related to the richness of the data in real environments, which can never be fully expressed in doctrines or manuals.

We have developed an algorithm that learns human behavior and models simulated entities by observing humans as they perform. We combine a human behavior modeling paradigm (Context-based Reasoning) with a machine learning algorithm (Genetic Programming) into a new method called Genetic Context Learning (GenCL). By using a machine learning algorithm, the mechanisms exist to capture features of human behavior that are otherwise hard to elicit, even from experienced experts. Unpredictable and somewhat surprising behavior has been captured in previous work [8]. The method is semi-automatic (some manual preprocessing of data is necessary) and supports fast, just-in-time creation of the desired models. Modeling data have been captured both in simulated and in real environments, and models have successfully been evolved. The semi-automatic modeling reduces the analytical and design effort of modeling human behavior. Modeling through observation further supports the creation of models with difficult or unwanted behavior (e.g. drunk driving, insurgent behavior, etc.). This paper presents results established in different application areas with this new modeling technique and draws on results from the No Free Lunch Theorems that support the scalability of the new method. We start off with a brief description of Context-based Reasoning (CxBR) and Genetic Programming (GP) and how these two techniques are combined.

2 Context-based Reasoning (CxBR)

Gonzalez and Ahlers [9] presented CxBR as a modeling paradigm that can efficiently represent the tactical behavior of humans in intelligent simulated agents. Results have shown that it is especially well suited to modeling such behavior. CxBR is based on the idea that:
− A situation calls for a set of actions and procedures that properly address the current situation.
− As an exercise plays out, a transition to another set of actions and procedures may be periodically required to address a new situation.
− Things likely to happen under the current situation are limited by the current situation itself.
CxBR encapsulates knowledge about appropriate actions and/or procedures for specific situations, as well as compatible new situations, into hierarchically-organized contexts. All the behavioral knowledge is stored in the Context Base (i.e., the collection of all contexts). The top layer of contexts in the hierarchy contains the Mission Context. At the next layer are Major Contexts and, below those, a number of Sub-Context layers can exist.
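To make the hierarchy concrete, the following is a minimal sketch, in Python, of how such a context base might be represented. The class names, fields and example contexts are our own illustrative assumptions based on the description above, not code from the paper.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class SubContext:
        # Reusable abstraction of a function performed within a Major Context
        name: str
        action: Callable

    @dataclass
    class MajorContext:
        # Primary control element; exactly one is active per agent at a time
        name: str
        action: Callable                                  # behavioral rules/functions
        trigger: Callable = lambda situation: False       # activation rule (situational awareness)
        sub_contexts: List[SubContext] = field(default_factory=list)
        compatible: List["MajorContext"] = field(default_factory=list)  # contexts that may follow

    @dataclass
    class MissionContext:
        # Defines scope, goals, plan and constraints; does not control the agent
        name: str
        goals: List[str]
        constraints: Dict[str, object]
        major_contexts: List[MajorContext]

    # Hypothetical context base along the lines of Figure 1
    road_march = MajorContext("Road March", action=lambda state: "follow planned route")
    assault = MajorContext(
        "Assault",
        action=lambda state: "engage enemy",
        trigger=lambda situation: situation.get("contact") == "inferior force",
        sub_contexts=[SubContext("Bounding Overwatch", lambda state: "bound forward"),
                      SubContext("Flank Attack", lambda state: "attack the flank")])
    road_march.compatible = [assault]
    mission = MissionContext("Assault and Destroy",
                             goals=["destroy the enemy force"],
                             constraints={"time_limit_minutes": 120},
                             major_contexts=[road_march, assault])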


Figure 1 shows an example of a context structure from a simple context base that models contextual components of tank platoon behavior. Mission Contexts define the mission to be undertaken by the agent. While it does not control the agent per se, the Mission Context defines the scope of the mission, its goals, the plan, and the constraints imposed (time constraints, weather, etc.).

[Figure 1: Context-base organization. A Mission Context ("Assault and Destroy") at the top; Major Contexts (e.g., Road March, Assault) below it; Sub-Contexts (e.g., Bounding Overwatch, Flank Attack) at the lowest layer.]

The Major Context is the primary control element for the agent. It contains functions, rules and a list of compatible Major Contexts that can follow it. Identification of a new situation can now be simplified because only a limited number of all situations are possible under the currently active context. Sub-Contexts are abstractions of functions performed by Major Contexts that may be too complex for one function, or that may be employed by other Major Contexts. This encourages re-usability. Sub-Contexts de-activate themselves upon completion of their actions. One and only one Major Context is always active for each agent, making it the sole controller of the agent. When the situation changes, a transition to another Major Context may be required to properly address the emerging situation. For example, a tank platoon may make contact with an inferior force, requiring a transition from a Road March to an Assault Major Context. Transitions between contexts are typically triggered by events in the environment, some planned, others unplanned. Events internal to the agent (e.g., mechanical breakdown) can also trigger transitions. Note that the context structure is derived from the different contexts a human could reside in within the current problem domain. The contexts are not derived from doctrines, directions, or operating manual descriptions or procedures; however, these descriptions or procedures might assist during the definition of human behavior context structures. CxBR is a very intuitive, efficient and effective representation technique for human behavior. A full description of CxBR can be found in [9].

In this research, the knowledge in the contexts (i.e., the specific action of the agent within a context) and the knowledge to identify the situation and apply the correct knowledge are evolved by a machine learning algorithm (GP). Before learning takes place, an empty set of contexts is created. The knowledge is then created by the GP machine learning algorithm, which observes experts performing. By using this approach, the aim is to create a rich and realistic model of expert behavior.
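As a rough illustration of this control scheme, the following sketch (ours, building on the data structures sketched earlier in this section and assuming a hypothetical sense_environment function and agent_state dictionary) shows one Major Context active at a time, with transitions limited to compatible contexts:

    def run_agent(mission, agent_state, sense_environment):
        # Exactly one Major Context is active and is the sole controller
        active = mission.major_contexts[0]             # e.g., start in Road March
        while not agent_state.get("mission_complete", False):
            situation = sense_environment(agent_state)
            # Only compatible contexts are considered, so only a limited
            # number of situations must be checked under the active context
            for candidate in active.compatible:
                if candidate.trigger(situation):       # e.g., contact with an inferior force
                    active = candidate                 # Road March -> Assault
                    break
            active.action(agent_state)                 # act under the current context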

3 Genetic Programming (GP)

GP [15] is derived from Genetic Algorithms. Both are stochastic search algorithms. The search process in GP looks for the most suitable program to solve the problem at hand. The target system for the GP could be a CPU, a compiler, a simulation or anything else that can execute the pre-defined instructions, from now on referred to as a program. GP evolves source code representing a program that can address a specific problem. This makes it very suitable for use together with CxBR: GP can build complete software programs that support the internal construction of the CxBR context base. Evolving a program with GP is described in Figure 2.

[Figure 2: Evolution in GP. Initialization, followed by a loop of Evaluation, Selection, Reproduction and Update population that takes generation i to generation i+1, repeated until the end condition is reached.]

To make GP work, some basic requirements must be satisfied. Initially, we need a set of individuals (i.e. programs that represent different solutions to the problem). The function set in GP defines all the functions and operators of which each individual can consist. The function set, along with additional constants and variables, must be defined prior to learning; this affects the learning performance of the GP algorithm. Then, all the individuals need to be evaluated in some manner as to what degree they are able to solve the problem. The individuals with better suitability are preferably preserved and survive, or breed new individuals for the next generation (i.e. selection). The next GP step is to evolve the individuals (i.e. reproduction) in some manner that preserves the "good" features and develops even better individuals. The most common genetic operators, crossover and mutation, support this development and evolution of the individuals. The criteria for stopping the evolutionary process can be a maximum number of evaluations made, a maximum number of generations evolved, the fitness reaching a certain level, or other measurable criteria. When the evolutionary process is finished, there exists a computer program that will solve the problem to a certain degree.
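The loop in Figure 2 can be made concrete with a small sketch. The following is our own toy symbolic-regression example, not the paper's implementation; the function set, parameters and operators are illustrative.

    import operator
    import random

    FUNCS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
    TERMS = ['x', 1.0, 2.0]

    def random_tree(max_depth=3):
        # A program is a nested tuple (operator, left, right) or a terminal
        if max_depth == 0 or random.random() < 0.3:
            return random.choice(TERMS)
        op = random.choice(list(FUNCS))
        return (op, random_tree(max_depth - 1), random_tree(max_depth - 1))

    def evaluate(tree, x):
        if tree == 'x':
            return x
        if isinstance(tree, float):
            return tree
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))

    def fitness(tree, samples):
        # Lower is better: squared error against the observed samples
        return sum((evaluate(tree, x) - y) ** 2 for x, y in samples)

    def depth(tree):
        return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

    def crossover(a, b):
        # Naive subtree swap: replace one child of `a` with the whole of `b`
        if isinstance(a, tuple):
            op, left, right = a
            return (op, b, right) if random.random() < 0.5 else (op, left, b)
        return b

    def evolve(samples, pop_size=200, generations=50):
        pop = [random_tree() for _ in range(pop_size)]               # initialization
        for _ in range(generations):
            pop.sort(key=lambda t: fitness(t, samples))              # evaluation
            parents = pop[:pop_size // 2]                            # selection
            children = []
            for _ in range(pop_size - len(parents)):                 # reproduction
                child = crossover(random.choice(parents), random.choice(parents))
                if random.random() < 0.05 or depth(child) > 8:       # mutation / bloat control
                    child = random_tree()
                children.append(child)
            pop = parents + children                                 # update population
            if fitness(pop[0], samples) < 1e-6:                      # end condition
                break
        return min(pop, key=lambda t: fitness(t, samples))

    # e.g., learn y = x*x + 1 from observed input/output pairs
    best = evolve([(x, x * x + 1) for x in range(-5, 6)])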

4 Genetic Context Learning

Combining CxBR with GP turns out to be quite synergistic. GP adds a machine learning algorithm to CxBR that automates the process of creating human behavioral agents. This can reduce cost and effort, and can also detect behavioral patterns of the expert that would be difficult to detect by traditional means. Before learning takes place, an empty context structure is manually defined. The context structure here refers to a definition of the contexts and sub-contexts that will be used by the agent in executing its required task. The context structure is highly mission-specific, and is designed by the engineer charged with developing the agent, possibly with assistance from a subject matter expert. Note that this context structure is quite coarse and could easily be defined by someone with basic knowledge of the domain, or from doctrines. This initial context structure is empty and does not contain any knowledge. However, such a predefined structure prunes the search space, which enhances the learning capabilities of GP [11]. Instead of creating the context knowledge by hand, we use the GP process to build the knowledge within the contexts and the situational awareness knowledge that activates the correct context for the current situation. The GP's evolutionary process provides the CxBR frame with appropriate knowledge. This new approach to modeling tactical agents is called Genetic Context Learning (GenCL). The individuals in the genetic population are components of the context base, and a simulator is used to simulate an individual's behavior (see Figure 3). The behavior from the simulator is then compared with the observed expert performance, and a fitness measure is established to evaluate the model's appropriateness. The evolutionary process strives to minimize the discrepancies between the performance of the contexts created by GP and the expert's performance.

The GP process automatically builds the action knowledge within the contexts and the knowledge to apply the right context in a specific situation, thereby providing the CxBR context base with appropriate knowledge. The evolutionary process strives to minimize the discrepancies between the performance of the agents being evolved by the GP and the observed human performance.

4.1 Learning in GenCL

The learning process begins by observing and collecting data from a human performer carrying out the mission/task of interest. A means to automatically collect data from the observed subject is required. The observed and logged data are then manually parsed according to which data sequences apply to which predefined context. This recorded human performance data serves as the fitness measure base that determines the appropriateness of the individuals being evolved. Figure 3 shows the components and their interactions in GenCL. The source-code individual in the GP module represents part of the context base. It is fed into a CxBR simulator that executes it to produce the performance results of that individual. The behavior resulting from the CxBR simulator is then compared with the observed human performance (the fitness function) and a fitness measure is computed. Depending on the current learning task, an individual can describe either the action within a specific context or the set of rules that determine context activation (i.e. situational awareness). Accordingly, the population contains either individuals competing to represent the action within the context or individuals competing to represent the context activation process. If the evolving context is not at the lowest context level, the individual might contain action knowledge from contexts at the lower level as part of its function set.

[Figure 3: Genetic Context Learning. The GP module feeds source-code individuals into a CxBR simulator; the resulting behavior is compared against the observed human performance to compute each individual's fitness.]
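A minimal sketch of how such a fitness measure might be computed is shown below. It assumes, purely for illustration, that both the observed human and the simulated individual produce time-stamped state traces with position and speed fields; the names and the simulator interface are ours.

    def fitness_against_observation(individual, observed_trace, cxbr_simulator):
        # Lower is better: accumulated discrepancy between the evolved
        # individual's simulated behavior and the recorded human performance
        simulated_trace = cxbr_simulator.run(individual, steps=len(observed_trace))
        error = 0.0
        for human, agent in zip(observed_trace, simulated_trace):
            error += (human["x"] - agent["x"]) ** 2          # position
            error += (human["y"] - agent["y"]) ** 2
            error += (human["speed"] - agent["speed"]) ** 2  # dynamics
        return error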

5 Experiments and Results

Two areas of human behavioral modeling have been investigated (car driving in city traffic [8] and tank behavior on the battlefield [7]), with two different techniques for collecting the performance data. The data for the automatic modeling of car driving behavior were captured with the help of a driving simulator. Collecting data from a simulator is rather simple because the environment and events can be captured from the simulated environment; there is no need to deal with sensors or the related noise present in the real world. The data for behavior modeling of tanks on the battlefield were collected from live exercises where two opposing tank platoons made unanticipated contact. The data collection was possible because the exercise was equipped with a Deployable Instrumentation Training System (DITS) that records all the soldiers' and vehicles' movements and actions during the exercise. In DITS, each soldier and vehicle is equipped with a Global Positioning System (GPS) receiver, and all firing events are simulated with lasers. All this information, and much of the status information, is transmitted to a server where the data are stored for further analysis (e.g. after action review). A small collection of this data was made available to the learning algorithm: position, speed, heading, turret heading, player status, use of the Laser Range Finder, and Fire and Hit results from the fire simulation. Further data came from a terrain classification (e.g. forest, open field, water, etc.) of the environment database.

The first experiments with GenCL were conducted in a commercial driving simulator, and the approach was to model human behavior automatically through observation. The objective was to capture individual behavior patterns among the drivers and to focus on human behavioral features. Unpredictability and inconsistency are typical human behavioral features. We found that if we let the traffic lights in a city traffic environment change from green to yellow (about to become red) at the instant the driver was 30 meters from the light, a decision had to be made whether to stop the car or to continue. This seemed to trigger the desired behavior: four of the five drivers occasionally stopped and occasionally ran the light. The fifth driver always stopped at all lights. Validation data were collected with the same five drivers four months later. This time they drove a different route, and the traffic lights in the city environment changed at different distances (30, 35 and 40 meters from the light). The agents were thus validated in new, unseen scenarios where the lights did not change in the same manner as during training. The results of this research showed that the synergistic combination of the CxBR modeling paradigm and GP was able to learn from observation, generalize to new situations, and create stable agents, and that the performance of the automatically created agents did not deteriorate compared to that of the manually created ones [8].

The second area of human behavioral modeling concerned live exercise data from battling tank platoons [5], [7]. As we moved from a well-defined environment without any major noise (i.e. a simulated environment) to a real environment where all the data are collected from different sensors, more preprocessing of the data became necessary. One example: when the unit carrying a GPS receiver stands still, the GPS cannot provide an accurate heading (due to limitations of GPS). Furthermore, the amount of data that influences the performance also increases in a real environment, and a real environment contains a fair amount of noise. Lastly, the freedom of movement of a tank in open terrain is much greater than that of a car in city traffic, where the latter is limited to a lane on the road. All these factors make the automatic learning problem for this application much more complex. Data were collected on five different occasions when two tank platoons engaged in unanticipated contact (as unanticipated as it can get in a repeated exercise) at two different locations. Investigation of the recorded scenarios identified three Major Contexts: Road March, Contact, and Hasty Defense. Additionally, three Sub-Contexts of the Contact context were observed: Attack, Bounding Overwatch and Flank Assault. The initial results presented in [7] cover the Major Contexts Road March and Contact with the Sub-Context Bounding Overwatch. The GenCL learning approach learned the behavior of the tactical agent by observing a collection of data from these contexts. The initial results show that the approach with CxBR and GP was able to learn and generalize the behavior of the tanks in this real and highly complex environment. Note that the behavioral actions of the agents in the two experiments mentioned here were built solely by GP. No initial or supplementary knowledge was provided by humans. The only human interference with the learning process was the manual preprocessing of the data samples and the definition of the context hierarchy.

6 Using the Models

In the application where the model represented a tank crew, the purpose was to develop an automatic After Action Review (AAR) system. The idea was to model an expert tank crew with GenCL, pair it with a trainee group, and compare its behavior in the trainee's situations. Each participant in a large exercise can then use a tool for self-evaluation, where an expert agent is put into the same situations as the trainee and their discrepancies can be monitored. Two major advances were made during this research: 1) detecting contextual discrepancies, and 2) agent synchronization. A discrepancy can be a simple momentary deviation, as when the position, movement, or firing action of the trainee is significantly different from the agent's. Another type of discrepancy occurs when the context of the human trainee is different from that of the agent. The first is rather easy to determine by merely overlaying the locations and actions of the trainee onto those of the expert agent. The second type of discrepancy is the more significant of the two, but also more difficult to discover. To make a useful comparison to a context-based model, the AAR system must infer the context in which the trainee is currently operating. The context in which the expert model resides is given by the context-based model itself. Inferring a trainee's intentions and the set of skills being used at the time of the comparison can provide a very useful means of reviewing his performance. The problem, of course, is how to infer the context in which the human is operating. The approach used in this research is a pattern-matching technique that compares the trainee's actions with all possible context actions of the expert agent. The comparison that results in the closest match indicates the context in which the trainee is most likely operating. This matching of patterns can be said to infer the context and/or sub-context in which the trainee is operating.
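The first, physical kind of discrepancy might be checked with something as simple as the sketch below; the state fields and tolerance values are illustrative placeholders, not values from the paper.

    def momentary_discrepancy(trainee, expert, pos_tol=50.0, heading_tol=30.0):
        # Overlay the trainee's location and actions onto the expert agent's
        # and flag significant momentary deviations
        dx, dy = trainee["x"] - expert["x"], trainee["y"] - expert["y"]
        position_off = (dx * dx + dy * dy) ** 0.5 > pos_tol                      # meters, assumed
        heading_off = abs(trainee["heading"] - expert["heading"]) > heading_tol  # degrees, assumed
        firing_off = trainee["firing"] != expert["firing"]
        return position_off or heading_off or firing_off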

6.1 Detecting Deviating Intents

The basic idea is to insert context agents into the simulation. These agents operate in the background and behave as if they have only one Major Context; this makes a context agent unable to transition among contexts. One context agent is assigned to each possible Major Context. The objective is to compare each of their behaviors with the trainee's, thereby inferring which Major Context the trainee is likely to be operating under. The difference between a context agent and the expert agent is that the former exhibits the behavior of one specific Major Context at all times, while the expert agent exhibits behavior that makes use of all these contexts as managed by the CxBR mechanism. If the inferred trainee context disagrees with the expert agent's context, a potential contextual discrepancy exists (i.e., deviating intents). The complete expert agent is still used for detecting physical discrepancies between the trainee and the expert [5]. To infer the context of the trainee, the discrepancies between the trainee and each of the context agents can now be compared. Contextual discrepancies are temporal in nature, so in order to detect one, the expert agent and the trainee need to be monitored over a period of time. Such a contextual discrepancy suggests that their intents differ. Finally, when the deviation is classified as a context discrepancy, the algorithm needs to back-track the events in order to find the start of the deviation.
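A sketch of this inference step is given below, under the assumption of hypothetical context agents that can be replayed over a window of recorded trainee states; the interfaces and the discrepancy metric are ours.

    def infer_trainee_context(trainee_window, context_agents, replay):
        # One background agent per Major Context; the context whose agent
        # tracks the trainee most closely over the window is the inferred one
        scores = {}
        for name, agent in context_agents.items():
            agent_window = replay(agent, steps=len(trainee_window))
            scores[name] = sum((t["x"] - a["x"]) ** 2 + (t["y"] - a["y"]) ** 2
                               for t, a in zip(trainee_window, agent_window))
        return min(scores, key=scores.get)

    def deviating_intent(trainee_window, context_agents, expert_context, replay):
        # Potential contextual discrepancy: the inferred trainee context
        # disagrees with the context of the full expert agent
        return infer_trainee_context(trainee_window, context_agents, replay) != expert_context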

6.2 Agent Synchronization

The other important result derived from the automatic After Action Review system concerns agent synchronization. When comparing the trainee with the expert agent, the interesting part is to know what the expert would have done in the same situation that the trainee experienced. This means that the expert agent needs to be superimposed periodically on the trainee, and their actions compared. Hence, the agent cannot be totally autonomous. As the recorded data are replayed in the simulation environment, all of the actions displayed, except for the agent's, come from data recorded during the real exercise. This means that the expert agent cannot affect the outcome of the simulation in any manner. If the agent is left autonomous in the simulation for an extended period of time (i.e., no synchronization is performed), the actions of the agent might not be valid. If the expert agent is exposed to the enemy or opens fire in the simulation, the enemy will not see it and therefore not react to it, because in the simulation the enemy's actions are recorded and replayed. Even if the agent is not interacting with the enemy or its own troops, the agent and the trainee might continue their missions on completely divergent paths. The accumulated deviations would then be large enough to trigger a discrepancy for the trainee. Such a discrepancy by definition reflects a potentially serious mistake by the trainee, but in this case it might be the result of accumulated errors rather than an obvious misbehavior. Hence, the only sound use of the agent is to superimpose it on the trainee and compare their actions for a short period of time in order to determine whether the trainee's actions are acceptable. When the agent is forced into the situation of the trainee in this manner, we say that the agent is synchronized with the trainee. During synchronization, the agent is forced to regain the same state as the trainee, in terms of location, time and status, and is also forced to operate in the same context as the trainee. The agent also needs to update its temporal memory. The agent is continuously synchronized with the trainee at a predetermined time interval. If the discrepancies between the agent and the trainee are minor during the time between synchronizations, the actions of the trainee are deemed acceptable. In this manner, we do not compare the trainee's behavior with one optimal (or sub-optimal) action pattern of the agent. In any situation, there are probably a number of correct ways to handle the situation. Here we determine whether the action of the trainee is close to an action the agent might have taken. In other words, the comparison strives to determine whether the action of the trainee is acceptable, not to compare it with one "correct" behavior.
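The synchronization scheme might look like the following sketch; the interval, threshold, deviation metric and agent interface are all our illustrative assumptions.

    ACCEPT_THRESHOLD = 1.0   # illustrative acceptance threshold

    def deviation(trainee_action, expert_action):
        # Placeholder metric; a real system would compare position,
        # movement and firing actions over the interval
        return 0.0 if trainee_action == expert_action else float("inf")

    def review_with_synchronization(recorded_replay, expert_agent, sync_interval=30):
        findings = []
        for step, trainee_state in enumerate(recorded_replay):
            if step % sync_interval == 0:
                # Force the agent back into the trainee's location, time,
                # status and context, and refresh its temporal memory
                expert_agent.synchronize(trainee_state)
            expert_action = expert_agent.act(trainee_state)
            if deviation(trainee_state["action"], expert_action) > ACCEPT_THRESHOLD:
                findings.append((step, trainee_state["action"], expert_action))
        return findings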


7 Scalability of GenCL

One might ask how well the GenCL algorithm will model more complex human behavior. We believe that GenCL is scalable, and some support for this can be found in the No Free Lunch Theorem (NFLT). The original NFLT [18] states that if one search algorithm is applied to all possible problems, the average performance equals that of a random search. Conversely, if one problem is applied to all possible algorithms, the average performance also equals that of random search. The NFLT was later proved to hold for optimization problems [19] and for supervised learning algorithms [21]. The implication and importance of these theorems is that for a learning algorithm to be successful, it has to be optimized for the problem at hand. While striving to develop black-box learning algorithms that perform better than others, several researchers have shown instances where the NFLT does not hold. Igel and Toussaint conclude that if only a small subset of all possible problems is considered, the NFLT does not hold [13]. Droste et al. show that if restrictions are enforced on the complexity of the functions to be optimized, one can construct a black-box optimizer that outperforms random search [4]. Christensen and Oppacher show that data analysis and a "general and widely applicable set of conditions" help to create an algorithm able to surpass a random search [2]. Moreover, Wolpert shows in [20] that even a small constraint, or information that might seem trivial regarding the problem domain, actually adds a substantial amount of information to the learning process. What they are all actually addressing is the implication of the NFLT that, in order to improve the performance of a machine learning algorithm, domain knowledge needs to be incorporated into the learning procedure [11], [19]. In other words, the proper way is to analyze the problem and then design an appropriate algorithm optimized for that specific problem.

In essence, the NFLT treats a small change in the configuration of a machine learning algorithm as a totally new algorithm. For example, Artificial Neural Networks (ANNs) with different numbers of hidden nodes are not regarded as the same algorithm. By changing the topology or parameter settings, a new learning algorithm is constructed. In essence, this is how one adjusts the algorithm to different problems and hence incorporates domain knowledge into the learning process. The importance of the domain-knowledge implication is that if we use the algorithm in a new domain, the probability of success is greater if domain knowledge can easily be incorporated into the learning procedure. Similarly, if a learning algorithm is to be scalable and able to handle more and more complex tasks, it is an advantage if the modeling paradigm supports knowledge representation at different complexity levels. If such support exists, it also helps the learning algorithm use the correct methods in searching for solutions to the specific task at hand. In other words, it will be easier to adjust the learning algorithm with proper tools and restrictions for the current complexity level. Yet again, we would like to emphasize that it is small restrictions and constraints that give major advantages to the learning algorithm [20]. These small restrictions and constraints significantly reduce the search space for the learning algorithm and make it manageable. Additionally, as the search space gets smaller, the number of unacceptable sub-optima decreases. It is of course important that the "right" restrictions are made, and this boils down to the support given by the algorithm for incorporating prior domain knowledge. Two features of GenCL support improving learning capabilities through domain knowledge: 1) the non-transforming property of knowledge, and 2) the partitioning of knowledge into contexts. The first property provides the algorithm with the appropriate tools and low-level restrictions. The latter provides a hierarchical knowledge structure where more complex tasks might reuse simpler ones as a means to solve parts of the complex task. Furthermore, it gives the algorithm the ability to choose the appropriate knowledge to apply in different situations.

7.1 Non-transforming Property of GP

Some learning algorithms, like Artificial Neural Networks, perform a transformation of the knowledge as learning takes place. As an ANN learns, appropriate weights are updated to better transform the inputs to proper outputs. When learning is over, the knowledge stored in the ANN is a set of weights that is difficult to correlate to the problem it confronted. In order to optimize the learning properties for the problem, the ANN needs to be dimensioned (e.g. the number of layers and hidden nodes in a feed-forward network) and the learning parameters need to be adjusted. When the problem space is transformed during learning and the solution space shows little or no correlation to it, the knowledge needed to improve the learning capabilities of the algorithm (i.e. to incorporate domain knowledge) also transforms. In other words, if an ANN is used, the knowledge needed to improve learning is knowledge about ANNs and how to apply this kind of algorithm to the problem. The experience of an ANN engineer might help learning, by applying rules of thumb and knowledge about the learning properties of the algorithm; expert knowledge about the problem domain no longer helps to improve learning. If the algorithm does not transform the search space, a-priori domain knowledge can more easily be used in the learning process.

The Genetic Programming machine learning algorithm can be regarded as a non-transforming algorithm if the genome representation used is source code (or any other code easy for humans to interpret). The algorithm slightly transforms the search space prior to learning, since source code is not the normal way someone describes a problem. Still, high-level source code is designed as a simple and deterministic translation of the English language, interpretable and understandable to a computer. This source code transformation is almost negligible and fairly understandable to most humans, at least to programmers. Domain knowledge from experts is therefore useful for improving learning. One way to incorporate domain knowledge in GP is to modify the function set used by the GP during learning. If the GP is using high-level programming source code to encode the genome, this is a rather straightforward operation. If some features of a complex, real-world problem under investigation can be identified, appropriate functions can be used in the learning process (e.g. an exponential function, second-order differential equations, etc.). We could even design a procedure that reflects such a feature (or parts of it) and then let the GP algorithm use this procedure as part of the function set, as sketched below. If the learning algorithm transforms the search space in some manner in order for learning to take place, the clear connection between the problem space and the solution space might be lost. We would like to emphasize that the non-transforming feature of GP creates a rather simple mapping from the problem space to the search space, and with it the ability to include domain knowledge in the learning process. This feature might ease the use of the learning algorithm.
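For instance, a hand-rolled GP function set might be extended as in the sketch below; both added primitives are our own illustrative examples of encoding domain features, not functions from the paper.

    import math

    # Generic arithmetic function set: name -> (implementation, arity)
    FUNCTION_SET = {
        "+": (lambda a, b: a + b, 2),
        "-": (lambda a, b: a - b, 2),
        "*": (lambda a, b: a * b, 2),
    }

    # Incorporating domain knowledge: if the behavior is suspected to involve
    # exponential responses, expose exp directly to the GP
    FUNCTION_SET["exp"] = (math.exp, 1)

    # A hypothetical hand-designed procedure reflecting a known feature of the
    # driving domain (braking distance, assuming ~7.5 m/s^2 deceleration),
    # offered to GP as a building block in its function set
    FUNCTION_SET["braking_distance"] = (lambda speed: speed * speed / (2 * 7.5), 1)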

7.2 Partitioning the Problem Domain into Contexts

While the non-transforming property of GP and the ease of mapping domain knowledge to appropriate function sets apply mainly to low-level, detailed knowledge, the partitioning of the problem domain into contexts gives a structure for handling many different types of knowledge. This provides a tool for handling complex scenarios with knowledge applicable in different contexts. The CxBR context organization enhances the probability of a successful GP evolution because it reduces the search space in the learning task. Before learning takes place, a structure of empty contexts is manually defined. Only the contexts and their relations are defined; they contain neither behavioral knowledge for any situation nor any knowledge indicating when a context shall be active. This knowledge is later evolved by GP.

Dividing a problem space into different contexts is fairly easy when it comes to human behavior modeling. It is our opinion, with support from cognitive psychology [6], [14], that people behave in a context-based fashion. A coarse context structure is therefore straightforward to define, from shallow domain knowledge or with the help of doctrines or manuals. By using the hierarchical context structure of CxBR, a structure is created where the search space for the learning algorithm is limited for each part, while the agent is able to use all the parts together in a complex system of contextual knowledge.
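The effect of this partitioning on the learning task can be sketched as follows, again with our own hypothetical interfaces: each context's knowledge is evolved against only the slice of observed data parsed into that context, so every GP run searches a small, context-local space.

    def learn_context_base(empty_structure, observations, evolve):
        # The context structure is defined beforehand but carries no knowledge;
        # GP fills in action knowledge and activation rules per context
        for context in empty_structure.all_contexts():
            data_slice = [obs for obs in observations
                          if obs["context"] == context.name]
            context.action = evolve(data_slice)        # action knowledge
        for context in empty_structure.major_contexts():
            transitions = [obs for obs in observations
                           if obs["context"] == context.name and "next_context" in obs]
            context.trigger = evolve(transitions)      # activation knowledge
        return empty_structure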

8 Conclusions

In this paper we presented results from the use of the novel human behavior modeling paradigm GenCL. We have added theoretical results from the NFLT to provide support for this learning paradigm in more complex problem domains. As the scalability analysis indicates that one important factor for any machine learning algorithm is how easily prior domain knowledge can be incorporated into the learning process, the approach might be applicable in many other problem domains. One such domain might be system identification by grey-box modeling. In grey-box modeling, some features of the system are known and incorporated into the model while others are not. One approach is to use a machine learning algorithm to optimize these unknown model features. Here it is important that the machine learning algorithm can take advantage of the known features of the system to be modeled.

References:
[1] Calder, R., Smith, J., Courtemanche, A., Mar, J., and Ceranowicz, A., "ModSAF Behavior Simulation and Control", Proceedings of the Third Conference on Computer Generated Forces and Behavioral Representation, Orlando, FL, March 1993, pp. 347-356.
[2] Christensen, S. and Oppacher, F., "What can we learn from No Free Lunch? A First Attempt to Characterize the Concept of a Searchable Function", Proceedings of GECCO 2001, Morgan Kaufmann, pp. 1219-1226.
[3] Deutsch, S., "Notes Taken on the Quest for Modeling Skilled Human Behavior", Proceedings of the Third Conference on Computer Generated Forces and Behavioral Representation, Orlando, FL, March 17-19, 1993, pp. 359-365.
[4] Droste, S., Jansen, T., and Wegener, I., "Optimization with randomized search heuristics - the (A)NFL theorem, realistic scenarios, and difficult functions", Theoretical Computer Science, 287 (2002), pp. 131-144.
[5] Ekblad, J., Gonzalez, A., Fernlund, H., and Barath, P., "Automatic Detection of Discrepancies in After Action Review", Proceedings of the Interservice/Industry Training, Simulation and Education Conference, Orlando, FL, Nov 2005.
[6] Endsley, M., "Towards a Theory of Situational Awareness in Dynamic Systems", Human Factors, 37(1), 1995, pp. 32-64.
[7] Fernlund, H., Gonzalez, A., Ekblad, J., and Barath, P., "Evolving Human Behavior Models from Live Exercise Data", Proceedings of the Interservice/Industry Training, Simulation and Education Conference, Orlando, FL, Nov 2005.
[8] Fernlund, H., Gonzalez, A. J., Georgiopoulos, M., and DeMara, R. F., "Learning Tactical Human Behavior Through Observation of Human Performance", IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, Vol. 36, No. 1, February 2006, pp. 128-140.
[9] Gonzalez, A. J. and Ahlers, R. H., "Context-Based Representation of Intelligent Behavior in Training Simulations", Transactions of the Society for Computer Simulation International, Vol. 15, No. 4, December 1998.
[10] Henninger, A., Gonzalez, A., Gerber, W., Georgiopoulos, M., and DeMara, R., "On the Fidelity of SAFs: Can Performance Data Help?", Proceedings of the Interservice/Industry Training, Simulation and Education Conference, Orlando, FL, 2000.
[11] Ho, Y., Zhao, Q., and Pepyne, D., "The no free lunch theorems: Complexity and security", IEEE Transactions on Automatic Control, 48(5), May 2003, pp. 783-793.
[12] Hsu, W. H. and Gustafson, S. M., "Genetic Programming for Layered Learning of Multi-agent Tasks", Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2001, San Francisco, CA, July 9-11, 2001.
[13] Igel, C. and Toussaint, M., "On classes of functions for which No Free Lunch results hold", Information Processing Letters, Vol. 86, No. 6, 30 June 2003, pp. 317-321.
[14] Klein, G. A., "Recognition Primed Decisions", Advances in Man-Machine Research, W. Rouse (ed.), Greenwich, CT: JAI Press, 1989, pp. 47-92.
[15] Koza, J. R., Genetic Programming, MIT Press, Cambridge, MA, 1992, ISBN 0-262-11170-5.
[16] Ourston, D., Blanchard, D., Chandler, E., and Loh, E., "From CIS to Software", Proceedings of the Fifth Conference on Computer Generated Forces and Behavioral Representation, Orlando, FL, May 1995, pp. 275-285.
[17] Smith, S. H. and Petty, M. D., "Controlling Autonomous Behavior in Real-Time Simulation", Proceedings of the Second Conference on Computer Generated Forces and Behavioral Representation, Orlando, FL, March 1992, pp. 27-40.
[18] Wolpert, D. H. and Macready, W. G., "No free lunch theorems for search", Technical Report SFI-TR-95-02-010, Santa Fe Institute, 1995.
[19] Wolpert, D. H. and Macready, W. G., "No free lunch theorems for optimization", IEEE Transactions on Evolutionary Computation, Vol. 1, No. 1, 1997, pp. 67-82.
[20] Wolpert, D. H., "Any Two Learning Algorithms Are (Almost) Exactly Identical", Proceedings of the ICML-2000 Workshop on What Works Well Where, 2000.
[21] Wolpert, D. H., "The supervised learning no-free-lunch theorems", Proceedings of the 6th Online World Conference on Soft Computing in Industrial Applications, 2001.
