Metacognition for Self-Regulated Learning in a Dynamic Environment†

Darsana P. Josyula, Franklin C. Hughes, Harish Vadali, Bette J. Donahue, Fassil Molla, Michelle Snowden, Jalissa Miles, Ahmed Kamara and Chinomnso Maduka
Department of Computer Science, Bowie State University, Bowie, MD 20715
[email protected]

† This research has been supported in part by grants from NSF and NASA.

Abstract—This paper describes a self-regulated learning system that uses metacognition to decide what to learn, when to learn and how to learn in order to succeed in a dynamic environment. Metacognition gives the system the ability to monitor anomalies and to dynamically change its behavior to fix or work around them. The dynamic environment for the system is an air traffic control domain that has six approach vectors for planes to land. The system has access to three basic approach strategies for choosing a landing terminal: Nearest Terminal, Free Terminal and Queued Terminal. In addition, the system has access to a supervised-learning algorithm that can be used to create new strategies. The system has the ability to generate its own training data sets to train the supervised learner. The metacognitive component of the system monitors various expectations; anomalies in the environment cause expectation violations. These expectation violations act as indicators for what, when and how to learn. For instance, if an expectation violation occurs because aircraft are not being assigned approach vectors within a given time threshold, the system automatically triggers a change in landing strategies. Examples of anomalies that cause expectation violations include closing one or more of the six approach vectors or changing all of their geographical locations simultaneously. In either case, the system responds by assigning the planes to one of the currently active approach vectors.

I. INTRODUCTION

The ability of an agent to learn about its environment and make decisions based on that information can mean the difference between success and failure. Recent work [1], [2], [3] on human learning suggests that the best learners are the ones who practice self-regulated learning. Self-regulated learning refers to the ability of an agent to determine when to learn, what to learn and how to learn. Knowing when to learn requires an ability to judge when to start and stop learning. Knowing what to learn requires an ability to identify the specific piece of knowledge that is lacking. Knowing how to learn requires an ability to choose the best available learning strategy to learn what is required. Metacognition, the act of thinking about thinking, is an integral part of this process; it allows agents to change the way learning occurs. For students, an example of these steps is exam preparation.

A student notes that an exam is approaching and decides that his knowledge of some subject is lacking. The student then chooses the best way to learn the required material. He may re-read the section of the textbook that deals with that subject; if he is more hands-on, he may instead solve sample problems; if he is more visual, he may find a video tutorial online. The student stops reading, solving, or watching when he is confident that his knowledge has improved enough to earn the grade he desires. Metacognition-guided self-regulated learning gives the student the ability to construct a learning strategy that is appropriate for a particular problem.

In this paper we discuss how a metacognitive component can allow self-regulated learning in a dynamic environment — an air traffic control domain. The air traffic controller (ATC) is tasked with deciding which approach path each incoming aircraft should be assigned. The ATC is equipped with three basic approach path selection strategies and is capable of creating new strategies based on different learning algorithms. The metacognitive component of the ATC helps it learn the landing strategy that is most appropriate for the configuration of the environment in which it is situated. The learning may be as simple as switching from the current strategy to a different, pre-existing strategy or as complicated as discovering a new strategy to add to its repository of available strategies. The metacognitive component of the ATC initiates the creation of new strategies or the switch to alternate strategies in response to failed expectations. For creating new strategies, the metacognitive component of the ATC sets the training parameters (e.g., training duration, data set size and desired output function) based on the current environmental configuration, triggers the creation of a training data set using those parameters and initiates a supervised learning algorithm to train on the data set. The following sections describe the domain in which the ATC and its metacognitive component were tested, the real power behind metacognition as illustrated by examples, related work, our conclusions and future work.

II. AIR TRAFFIC CONTROL SIMULATOR

The air traffic control simulator has two major components: (i) the ATC, which monitors the traffic within a specified radar range and directs aircraft toward available approach paths, and (ii) the aircraft, which fly toward the ATC-monitored radar area, wait for direction from the ATC on an approach path and use that approach path for landing.

The simulated x-y area is a 10,000 by 10,000 unit square with the ATC at its center: (5000, 5000). Aircraft have a z component representing their altitude (minimum: 0, maximum: 500). The ATC's radar range is a square with corners at (2500, 2500) and (7500, 7500). The aircraft are spawned randomly in the region outside the ATC's radar range. Each aircraft initiates a connection with the ATC upon creation and is issued a unique ID. The aircraft use their default destination, the ATC's location at (5000, 5000), to determine their initial flight path. Aircraft outside of the ATC's radar range fly under their own guidance until they cross into the area, at which point they begin to circle until the ATC determines and communicates instructions. Communication between the aircraft and the ATC is accomplished over a TCP/IP socket connection. All aircraft land at the ATC's location and must fly there through one of six approach paths located within 1000 units of the ATC. Once an aircraft lands, its trajectory and current position are erased from the GUI.

A. Aircraft Actions

Each aircraft can perform the following actions (a minimal sketch of two of them follows this list):

1) Fly to a destination at (x, y, z): The aircraft determines the proper velocities in the x, y and z directions to take it from its current location to the destination in as straight a line as possible.

2) Fly through multiple destinations: The aircraft flies through a list of destinations in order, each time flying straight from one goal to the next and ending at the final destination.

3) Fly in a circle: The aircraft flies in a circle of a specified radius.

4) Delay: The aircraft delays its flight by slowing down its speed.

5) Communicate with the ATC: The aircraft communicates with the ATC by sending messages to the ATC and receiving messages from the ATC. The aircraft can send one of two messages:

• An update containing the aircraft's current location, goal location and flight path.
• A message telling the ATC that it has successfully landed and will disconnect.

The aircraft can also receive messages from the ATC (see Section II-B1) that contain instructions on the actions to be performed.

6) Land: The aircraft checks its location against a list of possible approach paths; if it matches one, it begins a landing maneuver that takes the aircraft from its current location and altitude to the ATC's location and an altitude of 0.
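A minimal sketch of actions 1 and 5 follows. The class name, field layout and plain-text message format are our own illustrative assumptions; the paper specifies only that updates travel over a TCP/IP socket.

import java.io.PrintWriter;
import java.net.Socket;

class AircraftSketch {
    double x, y, z;   // current position; z is altitude (0..500)
    double speed;     // distance covered per simulation tick

    // Action 1: move one tick toward (dx, dy, dz) in as straight a line as possible.
    void stepToward(double dx, double dy, double dz) {
        double vx = dx - x, vy = dy - y, vz = dz - z;
        double dist = Math.sqrt(vx * vx + vy * vy + vz * vz);
        if (dist <= speed) { x = dx; y = dy; z = dz; return; }  // arrives this tick
        x += speed * vx / dist;   // scale the unit direction vector by the speed
        y += speed * vy / dist;
        z += speed * vz / dist;
    }

    // Action 5: report position and goal to the ATC (message format assumed).
    void sendUpdate(Socket atc, String id, double gx, double gy) throws Exception {
        PrintWriter out = new PrintWriter(atc.getOutputStream(), true);
        out.printf("UPDATE %s pos=%.1f,%.1f,%.1f goal=%.1f,%.1f%n",
                   id, x, y, z, gx, gy);
    }
}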

B. ATC Actions

The ATC can perform the following actions:

1) Communicate with each aircraft: The ATC can send one of the following messages to one or more of the aircraft:
• An ID message that contains a unique identifier that the aircraft receives on creation. This ID is used by the ATC for tracking and modifying the flight path of specific planes, identifying new planes entering its radar range and distinguishing between different aircraft shown on the GUI display.
• A terminal approach message that contains instructions for the aircraft to use one of the six (or more) approach paths.
• A delay message, which the aircraft can satisfy by either slowing down or flying in a circle.
• A destination updating message that contains one or more destinations the aircraft must fly through.
The ATC can also receive messages from all aircraft within the radar range.

2) Alter approach paths: The ATC can add, delete, or edit existing approach path locations. There must always be a minimum of one approach path, but there is no maximum.

3) Change strategy for choosing approach paths: The ATC must choose between multiple strategies for determining the most efficient approach path an aircraft should take. The ATC (with its metacognitive component) is capable of choosing between multiple arrival strategies. Nearest Terminal, Free Terminal and Queued Terminal are the basic strategies available to the ATC. In addition to these, the ATC is capable of creating other strategies by self-initiated supervised learning (see Section II-B4). The basic strategies are discussed next; the first two are sketched in code after this list.
• Nearest Terminal Strategy: Under this strategy, aircraft are assigned to the closest approach path. Once an aircraft has been cleared to approach from a specific approach path, that path is unavailable to others until the first aircraft has landed. Other aircraft reaching the ATC radar range must wait their turn for assignment to an empty approach path by circling at the outer radar region.
• Free Terminal Strategy: Under this strategy, aircraft are assigned to the nearest free approach path. As in the Nearest Terminal Strategy, once an aircraft has been cleared to approach from a specific approach path, that path is unavailable to others until the first aircraft has landed. Unlike the Nearest Terminal Strategy, however, other aircraft may be diverted to another free path if such a path is available. If no path is free, this strategy works like the Nearest Terminal Strategy and the other aircraft must wait their turns by circling at the outer radar region.
• Queued Terminal Strategy: This strategy is the Nearest Terminal strategy with two important differences: (i) the approach path is not closed down after an aircraft has been assigned to it, and (ii) the speed of each aircraft can be manipulated to avoid collisions. The advantage of this strategy is that the duration for which an aircraft circles before it is assigned an approach path may be shorter, if the aircraft can move toward an approach path without colliding with other aircraft that are already moving toward their designated approach paths. The strategy places a maximum of five aircraft into a landing queue for each approach path. Once the queue is full, other aircraft are forced to circle and wait until there is room in the queue. While the first aircraft is on approach at full speed, the second aircraft's flight path from its current location to the starting point of the approach path is checked against the first aircraft's flight path. If the ATC detects that a collision between the two aircraft is likely, it calculates a modified slower speed for the second aircraft and directs it to fly at that speed. The third aircraft in the queue has its likely flight path checked against the first and second, and so on.
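The sketch below shows how the two simpler strategies could be expressed behind a common interface. The interface and type names are assumptions made for illustration, not the paper's API; the Queued Terminal strategy is omitted for brevity since it also requires the speed-manipulation machinery described above.

import java.util.List;

class ApproachPath {
    double x, y;        // start of the approach path
    boolean occupied;   // closed to others while an aircraft is landing
    double distanceTo(double ax, double ay) { return Math.hypot(x - ax, y - ay); }
}

class AircraftState { double x, y, speed; }

interface ApproachStrategy {
    // Returns the chosen approach path, or null if the aircraft must circle.
    ApproachPath choose(AircraftState a, List<ApproachPath> paths);
}

class NearestTerminalStrategy implements ApproachStrategy {
    public ApproachPath choose(AircraftState a, List<ApproachPath> paths) {
        ApproachPath nearest = null;
        for (ApproachPath p : paths) {
            if (nearest == null || p.distanceTo(a.x, a.y) < nearest.distanceTo(a.x, a.y)) {
                nearest = p;
            }
        }
        // Only the single nearest path is a candidate; circle if it is busy.
        return (nearest != null && !nearest.occupied) ? nearest : null;
    }
}

class FreeTerminalStrategy implements ApproachStrategy {
    public ApproachPath choose(AircraftState a, List<ApproachPath> paths) {
        ApproachPath best = null;
        for (ApproachPath p : paths) {
            if (p.occupied) continue;   // consider free paths only
            if (best == null || p.distanceTo(a.x, a.y) < best.distanceTo(a.x, a.y)) {
                best = p;
            }
        }
        return best;   // null means every path is busy: circle as in Nearest Terminal
    }
}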

4) Create New Strategies using Supervised Learning: The ATC can discover new strategies for choosing approach paths by self-initiated supervised learning. The ATC creates its own training data set dynamically, with a variable number of data points gathered from a virtual ATC that operates in the system background. The virtual ATC spawns virtual planes at random points or from specific regions, according to the training data requested. The size of the training data is determined by the metacognitive component. Each input for the training data set includes (i) the approach paths, (ii) a set of locations and speeds of aircraft that are in the radar range and (iii) the current speed and location of the aircraft for which the approach terminal and flight speed need to be determined. Each output includes the approach path that the aircraft should be assigned to and the flight speed for that aircraft. The output data for the training set is determined by applying the Queued Terminal strategy to the created input data. Once the training data is created, varying configurations of a supervised learning algorithm are applied to each training set in order to determine whether new strategies with lower anomaly rates can be created for situations modeling the training data. Because of this feature, the ATC is capable of dynamically learning new strategies, increasing its robustness and its ability to operate in a dynamic environment. The supervised learning algorithm used in the current version of the simulator is a backpropagation neural network.
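The following is a hedged sketch of this data set generation. The feature encoding, the Oracle interface standing in for the Queued Terminal strategy and the random scene generation are all our own simplifications of what the paper describes.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class Example {
    double[] input;    // encoded approach paths, radar traffic and target aircraft
    double[] output;   // assigned approach path index and flight speed
}

class TrainingSetBuilder {
    // Stands in for the Queued Terminal strategy labeling each generated scene.
    interface Oracle { double[] label(double[] input); }

    static List<Example> build(int size, int inputDim, Oracle oracle, Random rng) {
        List<Example> data = new ArrayList<>();
        for (int i = 0; i < size; i++) {   // size is set by the metacognitive component
            Example ex = new Example();
            ex.input = new double[inputDim];
            for (int j = 0; j < inputDim; j++) {
                ex.input[j] = rng.nextDouble() * 10000;  // virtual plane/path coordinates
            }
            ex.output = oracle.label(ex.input);          // apply Queued Terminal
            data.add(ex);
        }
        return data;
    }
}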

Out of this learning comes a completely new strategy that is added to the ATC's repertoire alongside the Nearest Terminal, Free Terminal and Queued Terminal strategies. This process allows the ATC to learn new strategies that can help in a dynamic environment. If the newly formed strategy is put into use and the metacognitive component continues to see expectation violations, the size of the training data is increased and the strategy creation process is restarted.

5) Avoid Collisions: The three basic strategies guarantee that there will be no direct collisions among the aircraft inside the ATC's radar region; however, this need not be the case for the self-learned strategies. Also, under the Queued Terminal strategy it is possible to have aircraft right next to each other, and this close proximity is dangerous in the real world. For these two reasons, a separate fail-safe collision avoidance mechanism is available to the ATC. This mechanism automatically maintains minimum safe distance zones around each aircraft by calculating their future flight paths and manipulating the speed of one or more aircraft should those paths intersect (a sketch of this check follows after this list).

6) Perform Metacognitive Monitoring and Control: The ATC is equipped with a metacognitive component that monitors expectations and offers responses when the expectations are not being fulfilled. In our dynamic environment the ATC's ability to perform metacognition makes it a far more effective tool than it would otherwise be. Metacognition makes it possible for the ATC to realize when its standard strategies are ineffective and guides the ATC to learn and apply new ones. Figures 1 through 4 show the flow of information to and from the components of the ATC.
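A minimal sketch of the fail-safe separation check follows; the look-ahead horizon, safety radius and back-off factor are illustrative values that the paper does not specify.

class CollisionGuard {
    static final double SAFE_DIST = 200.0;  // assumed minimum separation (units)
    static final int HORIZON = 50;          // ticks to project ahead

    // Reduce the second aircraft's speed until its predicted straight-line
    // path keeps a safe distance from the first aircraft's path.
    static double safeSpeed(double[] p1, double[] v1,
                            double[] p2, double[] dir2, double requested) {
        double speed = requested;
        for (int tries = 0; tries < 30 && conflicts(p1, v1, p2, dir2, speed); tries++) {
            speed *= 0.9;   // back off 10% and re-check
        }
        return speed;
    }

    static boolean conflicts(double[] p1, double[] v1,
                             double[] p2, double[] dir2, double speed) {
        for (int t = 1; t <= HORIZON; t++) {
            double dx = (p1[0] + v1[0] * t) - (p2[0] + dir2[0] * speed * t);
            double dy = (p1[1] + v1[1] * t) - (p2[1] + dir2[1] * speed * t);
            if (Math.hypot(dx, dy) < SAFE_DIST) return true;  // too close at tick t
        }
        return false;
    }
}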

C. ATC Component Interaction

Figure 1. The basic interaction between the components of the ATC.

Figure 1 broadly describes the interaction of the metacognitive component, the control modules, the user interface and the knowledge base. The control modules update and access the knowledge base, while the metacognitive component checks the observations in the knowledge base against its own expectations in order to adjust the control modules and inform the user when needed. The user interface allows the user to alter the performance of the control modules and to introduce anomalies for testing purposes. Figures 2 through 4 show the detailed interactions of each component under different circumstances.

Figure 2. The detailed interactions of the components of the ATC when the metacognitive component is not involved in decision making.

Figure 2 shows the operation of the ATC when the chosen strategy is performing as desired. The strategy executor feeds the flight data and approach path locations into the algorithm of the currently selected strategy to calculate each aircraft's designated flight velocity and approach path, which are then communicated to each aircraft. The metacognitive component monitors the observations but otherwise is not involved, because each of them falls within the expected thresholds.

Figure 3. The detailed interactions of the components of the ATC when the metacognitive component is actively taking part in decision making.

Figure 3 shows the operation of the ATC when the chosen strategy is not performing as desired. Within the knowledge base are the current observations of collision frequency, flight speeds, and flight durations. The metacognitive component has, within its own separate knowledge base, expectations in the form of threshold values for each of these observations. If the observed value of collision frequency is beyond the threshold value, the metacognitive component registers an expectation violation and lowers the current strategy's effectiveness rating. The metacognitive component then instructs the strategy chooser to search through the strategy repository for the highest rated strategy and implement it. Once it is implemented, the metacognitive component waits for further observations. Should more expectation violations occur, it can instruct the strategy creator to initiate the creation of entirely new strategies and then use the strategy chooser to implement them in an attempt to find a more effective solution, all while keeping the user updated on its actions through the user interface.
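The loop in Figure 3 could be condensed as below; the repository interface and the threshold map are assumptions, since the paper describes the behavior rather than an API.

import java.util.Map;

interface StrategyRepository {
    void penalizeCurrent();        // lower the current strategy's rating
    String current();
    String highestRated();
    void activate(String name);    // strategy chooser
    void createNewStrategy();      // strategy creator (trains on fresh data)
}

class MetaMonitor {
    Map<String, Double> thresholds;   // e.g. "flightDuration" -> 100.0
    StrategyRepository repository;

    void check(Map<String, Double> observations) {
        for (Map.Entry<String, Double> obs : observations.entrySet()) {
            Double limit = thresholds.get(obs.getKey());
            if (limit != null && obs.getValue() > limit) {
                onViolation();   // an expectation has been violated
            }
        }
    }

    void onViolation() {
        repository.penalizeCurrent();
        String best = repository.highestRated();
        if (best.equals(repository.current())) {
            repository.createNewStrategy();  // nothing better exists yet: learn one
        } else {
            repository.activate(best);       // switch to the highest rated strategy
        }
    }
}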

Figure 4. The detailed interaction of the User Interface with the various control modules of the ATC.

Figure 4 shows how the user interface (UI) can be used to introduce anomalies into the system. From the UI, users can change approach path locations, initiate strategy creation, manually select the current strategy, and control the flow of communication to and from the aircraft. The expectations of the metacognitive component were chosen such that the component itself need not be aware of each of the numerous ways the system can break down. All of the anomalies have some effect on the observations stored in the knowledge base, which triggers the metacognitive component should the values cross their thresholds, as the following examples illustrate.
• The user blocks the communication of a designated approach path to an aircraft, so that the aircraft continues to circle, not knowing that it should have received a clearance to approach. The aircraft's flight duration increases, eventually crossing the threshold value and starting the metacognitive component's response.
• The user moves the approach path locations. The collision frequency and/or the flight durations could be adversely impacted, raising them enough to trigger the metacognitive component's response.
• The user selects a strategy that is not efficient for the current set of aircraft, communicates an incorrect approach path or sends an aircraft into an infinite loop. In each case the increase in flight duration triggers the metacognitive component's response.

With proper expectations and response strategies in place, all possible failure modes need not be hard coded into the system. Metacognition allows the system to deal with different failures by monitoring expectations and responding to expectation violations.

III. THE POWER OF METACOGNITION

The ATC makes its metacognitive component aware of the following observations: (i) aircraft circle times, (ii) flight speeds, (iii) aircraft locations, (iv) the number of times the collision avoidance mechanism is used under the current strategy and (v) the strategies that are available in the repository. The expectations of the metacognitive component are as follows:
• Each flight will land within 100 seconds of coming into the ATC's radar region.
• The aircraft's speed should match the speed the ATC assigns to it.
• The collision avoidance system should be used a minimal number of times.
• The number of strategies that are available must be greater than 5.

If an expectation violation occurs because there are not enough strategies available for choosing terminals, the metacognitive component triggers the strategy creator to generate new test data in order to create a new strategy by applying various supervised learning algorithms. Once the ATC learns a new strategy, that strategy becomes part of the strategy repository. Should the flight times continue to exceed expectations, or the collision avoidance mechanism be overworked, the metacognitive component asks the ATC to change to another strategy that has not been tried yet; the new strategy could be a newly discovered one. If further expectation violations occur, the metacognitive component can tell the strategy creator to increase the number of data points it uses to generate training data, in order to create a more robust strategy that provides more efficient results. The metacognitive component can tell the ATC to use any of the strategies it knows, and it can delete strategies that do not perform. This escalation is sketched below.
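The escalation described above can be read as an ordered policy; the sketch below encodes that order under assumed names and an illustrative starting data set size.

class ResponsePolicy {
    interface Atc {
        int strategyCount();
        boolean hasUntriedStrategy();
        void switchToUntriedStrategy();
        void createStrategy(int trainingSetSize);
        void deleteWorstStrategy();
    }

    static final int MIN_STRATEGIES = 5;  // from the expectation above
    int trainingSetSize = 1000;           // illustrative starting size

    void respond(Atc atc) {
        if (atc.strategyCount() <= MIN_STRATEGIES) {
            atc.createStrategy(trainingSetSize);   // too few strategies: learn one
        } else if (atc.hasUntriedStrategy()) {
            atc.switchToUntriedStrategy();         // try an untried strategy first
        } else {
            trainingSetSize *= 2;                  // more data for a more robust strategy
            atc.createStrategy(trainingSetSize);
            atc.deleteWorstStrategy();             // prune strategies that do not perform
        }
    }
}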

A. Illustration

The following example illustrates how the metacognitive component helps the ATC regulate its learning of the best strategy to use in a dynamic environment.

Figure 5. Trajectories of multiple aircraft flying toward the ATC.

Figure 5 shows the GUI with multiple aircraft beginning their flights toward the ATC. The darkened center rectangle represents the ATC's radar range. Once an aircraft crosses into the radar range, it circles until the ATC consults its current strategy and communicates instructions. The default strategy is Nearest Terminal. The aircraft and their trajectories are represented by the thick dark lines; their current positions are marked by the ID/Altitude labels. The ATC is located at the center of the radar region. Each of the lines connected to the ATC's location represents one of the six approach paths the aircraft must use. Although the number and location of these approach paths can change, shown here are the six default values. Under the Nearest Terminal strategy, an aircraft entering the ATC's radar range is assigned to its closest approach path if that path is free; otherwise it begins to circle. The disadvantage is that multiple aircraft could be circling while waiting for one approach path when adjoining approach paths are free. Figure 6 illustrates this occurrence.

Figure 6. The southwestern aircraft 8LOOE, 2S7A2 and EXKJI are made to circle while QRUSA lands from the western approach path and MQJAX approaches the southwestern approach path. Both northern approach paths are empty.

The metacognitive component's expectation that all aircraft should arrive at the ATC within 100 seconds of crossing into the ATC's radar region fails, since many of the aircraft are circling while their nearest approach paths are full. It responds by having the ATC change the strategy to the Free Terminal strategy. Under the Free Terminal strategy, aircraft are cleared to approach from the nearest free terminal. This allows the ATC to land aircraft without wasting as much time, thereby fulfilling the stated expectation. Figure 7 shows the result of the change in strategy from Nearest Terminal to Free Terminal. Aircraft 2S7A2, 8LOOE and EXKJI are sent to approach paths that are available instead of waiting at the edge of the radar range for their closest approach paths to become available.

Figure 7. All aircraft immediately head to approach paths as determined under the Free Terminal Strategy.

When the distribution of aircraft is spread out, the metacognitive component selects the Free Terminal strategy; however, as Figures 8 and 9 illustrate, with a more concentrated distribution of aircraft the metacognitive component selects the Queued Terminal strategy.

Figure 8. For illustration purposes all aircraft are created such that their nearest approach is the western approach path.

Under the Nearest Terminal strategy, all but the first aircraft in Figure 8 would be left circling while the approach path was unavailable, leading to very long flight durations for the last flights to cross into the radar region. With the Queued Terminal strategy, however, five of the aircraft are immediately sent to the closest approach path, with the other two circling for a smaller amount of time. This strategy is best used when the concentration of aircraft is high for multiple approach paths.

Figure 9. The Queued Terminal strategy directs five aircraft at once to approach by manipulating their flight speeds to avoid any collisions. Two aircraft must wait for the first aircraft to land before one of the two is permitted in the approach.

Figure 9 illustrates a high concentration of aircraft at one approach path being successfully instructed under the Queued Terminal strategy. Using flight speed manipulation, this strategy can provide efficient approach path instructions during periods of high traffic.

Our approach to solving the problem of learning strategy selection in a dynamic environment necessarily cannot rely upon any explicit knowledge of the environment itself. The metacognitive component therefore does not incorporate hard-coded heuristics for determining learning strategies, since dynamic environments can render any such heuristics obsolete from one moment to the next. Instead, the metacognitive component looks solely at failed expectations and determines in real time how to respond based on its array of available actions. Say, for instance, the approach paths need to be changed: perhaps the system is being deployed at a different airport, or some terminal has to go down for maintenance. An ATC with metacognition is able to treat this situation the same way it treats the others. Aircraft that are sent to incorrect approach path locations would begin to circle and their flight times would increase. The metacognitive component of the ATC notices that flight times are outside of the expected values and instructs the ATC to create a new training data set that incorporates the new approach path locations. The resulting new strategy then fulfills the required maximum flight time expectation.

IV. RELATED WORK

Metacognition has been shown to be a key ingredient in problem solving and learning in research across various fields, including psychology, education and linguistics. For example, a study [3] involving college students and their ability to predict the grades they would receive on examinations, based on their self-awareness of their academic strengths and weaknesses, found that the more accurate the student's prediction, the higher the student's score. The authors summarize that “expert learners are also skillful at metacognitive knowledge monitoring”. The ability, or lack thereof, of a problem solver such as a student to recognize that their knowledge has come to a point that will allow them to succeed at difficult tasks related to that knowledge can predict future success. This relationship between self-regulated learning and metacognitive knowledge monitoring, as witnessed in human beings, is the motivation behind our approach to regulating the learning of systems deployed in dynamic environments.

In the field of artificial intelligence, metacognition has been applied in various ways to help systems learn. Cox [4], [5] presents a computational theory of introspective failure-driven multistrategy learning. The reasoning of an agent is represented explicitly in knowledge structures called Meta-XPs. The Meta-XPs explain how and why reasoning fails. This knowledge is used by the learner to determine the proper learning strategy. The theory is implemented in a case-based reasoning system called Meta-AQUA [6], [7]. This system reads stories sentence by sentence and attempts to understand them. If the knowledge base fails to provide an explanation for a sentence, a reasoning failure is generated. When a reasoning failure occurs in Meta-AQUA, the system autonomously creates new learning goals based on introspective analysis of its own successes and failures at the performance task.

A nonlinear planner then selects a learning strategy based on the learning goal. This approach to selecting the best learning strategy is based on the machine's prior knowledge of which strategy best fits the determined goal. However, in certain dynamic environments the effectiveness of any one strategy can change with time, so it is harder to explicitly state the best strategy for each learning goal.

Raja and Lesser [8] describe a reinforcement learning technique that allows agents to learn meta-level control policies governing decisions in multiagent environments. The system learns a meta-level Markov Decision Process (MDP) model that represents the system behavior for a particular environment from a set of states, actions, transition probabilities and a reward function. The system learns the MDP model by making random decisions to collect state transition probabilities. While these studies focus on learning metacognitive control knowledge that can help in domain activities like task scheduling, our research focuses on applying metacognition to self-regulate the acquisition of new knowledge (discovering a new approach-choosing strategy) or the revision of existing knowledge (changing the current landing strategy).

Fox and Leake [9], [10] use a model of the reasoning process to derive expectations about the ideal reasoning behavior of the system; the actual reasoning is compared to this ideal to detect reasoning failures. Their system uses introspective reasoning to monitor the retrieval process of a case-based planner and to detect the retrieval of inappropriate cases. When retrieval problems occur, the explanations for the failures are evaluated and used to update case indices in order to improve future performance. This work is close in spirit to ours in its use of expectations as a means to detect failures. While their system monitors expectations to improve the performance of a case-based planner, our system monitors expectations to improve the performance of an agent situated in a dynamic environment.

The POIROT project [11] presents an architecture that combines a set of machine learning approaches to learn complex procedural models from a single demonstration in a medical evacuation airlift planning domain. The overall learning framework is based on goal-driven learning that performs targeted searches for explanations when observations do not agree with the developing model and creates new knowledge goals for the different learning components. A meta-control learning moderator signals when to propose learning hypotheses and when to evaluate them, based on the knowledge needs of the different components. While POIROT focuses on learning a generalized hierarchical task model from a demonstration of a sequence of web service transactions, our work focuses on learning a task model that is appropriate for the current environmental setting.

Cox and Raja [12] believe that “at the meta-level, an agent must have a model of itself to represent the products of experience and to mediate the choices effectively at the object level. Facing novel situations the successful agent must learn from experience and create new strategies based upon its self-perceived strengths and weaknesses.” In this way, the system becomes a complete cognitive system capable of making decisions and having a clear understanding of its own capabilities, its relationship to the problem and the environment in which it exists.

V. CONCLUSIONS AND FUTURE WORK

At this point the metacognitive component is helpful in choosing strategies that reduce expectation violations, but it has a limited number of learning algorithms that it can apply to the training data set. In the future we hope to increase the number and type of classifiers available for new strategy creation. To this end we are exploring the Weka data mining and machine learning software [13] from the University of Waikato. This software package contains multiple machine learning algorithms behind a Java API and has independently produced solutions to several training data sets. Integrating this package into the strategy learning module of the ATC will allow for more diverse strategies, yielding an increase in efficiency and robustness in dynamic environments.
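As a sketch of that integration, assuming Weka 3's standard Java API and an ARFF export of a self-generated training set (file name illustrative), a classifier could be trained as follows.

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

class WekaStrategyLearner {
    static Classifier learn(String arffPath) throws Exception {
        Instances data = DataSource.read(arffPath);    // e.g. "training.arff"
        data.setClassIndex(data.numAttributes() - 1);  // label: chosen approach path
        J48 tree = new J48();   // one of many learners Weka provides
        tree.buildClassifier(data);
        return tree;
    }
}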

Also in the future, we would like to replace the metacognitive component by having the ATC communicate with the Metacognitive Loop (MCL) software [14], [15], [16], which reasons with the same metacognitive algorithms but makes use of three generic ontologies (indications, failures and responses) to note anomalies, assess failures and respond to the situation. MCL has been tested successfully in other domains, and its use in the air traffic control domain will further test the generality of the MCL ontologies. MCL uses a Bayesian network to select a proper response for a particular input and requires only a reconfiguration of the fringe nodes when applied to different domains. This project will be used to support the further evolution of MCL, in hopes of creating a general reasoning system that can be applied in any domain with minimal reconfiguration. Use of MCL will give the system access to a more complete decision-making system that uses more complex methods of evaluation to learn proper reactions to expectation failures.

REFERENCES

[1] A. L. Wenden, “Metacognitive knowledge and language learning,” Applied Linguistics, vol. 19, no. 4, pp. 515–537, 1998.
[2] W. P. Rivers, “Autonomy at all costs: An ethnography of metacognitive self-assessment and self-management among experienced language learners,” The Modern Language Journal, vol. 85, no. 2, pp. 279–290, 2001.
[3] R. Isaacson and F. Fujita, “Metacognitive knowledge monitoring and self-regulated learning: Academic success and reflections on learning,” Journal of the Scholarship of Teaching and Learning, vol. 6, no. 1, pp. 39–55, 2006.
[4] M. T. Cox and A. Ram, “Introspective multistrategy learning: On the construction of learning strategies,” Artificial Intelligence, vol. 112, pp. 1–55, 1999.
[5] M. T. Cox, “Introspective multistrategy learning: Constructing a learning strategy under reasoning failure,” Ph.D. dissertation, College of Computing, Georgia Institute of Technology, Atlanta, USA, 1996. [Online]. Available: hcs.bbn.com/cox/thesis/
[6] A. Ram and M. Cox, “Introspective reasoning using meta-explanations for multistrategy learning,” in Machine Learning: A Multistrategy Approach IV, R. Michalski and G. Tecuci, Eds. San Mateo, California: Morgan Kaufmann, 1994, pp. 349–377.
[7] A. Ram, “AQUA: Questions that drive the understanding process,” in Inside Case-Based Explanation, R. C. Schank, A. Kass and C. Riesbeck, Eds. Hillsdale, New Jersey: LEA, 1994, pp. 207–261.
[8] A. Raja and V. Lesser, “A framework for meta-level control in multi-agent systems,” Autonomous Agents and Multi-Agent Systems, vol. 15, no. 2, pp. 147–196, 2007.
[9] S. Fox, “Introspective learning for case-based planning,” Ph.D. dissertation, Department of Computer Science, Indiana University, Bloomington, IN, 1995.
[10] S. Fox and D. Leake, “Introspective reasoning for index refinement in case-based reasoning,” Journal of Experimental and Theoretical Artificial Intelligence, vol. 13, pp. 263–288, 2001.
[11] M. H. Burstein, R. Laddaga, D. McDonald, M. T. Cox, B. Benyo, P. Robertson, T. Hussain, M. Brinn, and D. V. McDermott, “POIROT - integrated learning of web service procedures,” in AAAI, D. Fox and C. P. Gomes, Eds. AAAI Press, 2008, pp. 1274–1279.
[12] M. Cox and A. Raja, “Metareasoning: A manifesto,” BBN Technical Memo TM-2028. Cambridge, MA: BBN Technologies, 2007. [Online]. Available: http://www.mcox.org/Metareasoning/Manifesto/manifesto.pdf
[13] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Morgan Kaufmann, 2005.
[14] M. L. Anderson and D. R. Perlis, “Logic, self-awareness and self-improvement: The metacognitive loop and the problem of brittleness,” Journal of Logic and Computation, vol. 15, no. 1, 2005.
[15] M. Schmill, D. Josyula, M. L. Anderson, S. Wilson, T. Oates, D. Perlis, and S. Fults, “Ontologies for reasoning about failures in AI systems,” in Proceedings of the Workshop on Metareasoning in Agent Based Systems at the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems, 2007.
[16] M. L. Anderson, S. Fults, D. P. Josyula, T. Oates, D. Perlis, M. D. Schmill, S. Wilson, and D. Wright, “A self-help guide for autonomous systems,” AI Magazine, 2008.
