Tangible Ambient Intelligence with Semantic Agents in Daily Activities
Sébastien Dourlens*, Amar Ramdane-Cherif, Eric Monacelli
LISV, University of Versailles Saint Quentin, Centre Universitaire de Technologie, 10/12 Avenue de l'Europe, 78140 Vélizy, France
([email protected], [email protected], [email protected])
* Corresponding author. E-mail: [email protected].
Abstract. A system should be able to interact with a ubiquitous network of heterogeneous sensors and actuators in order to exploit the maximum available information. Event-intensive environments, reactive systems, multiple communication languages and the integration of various devices make the architecture and the final system so complex that performance and efficiency tend to suffer. Existing solutions are mostly dedicated to solving particular problems. The proposed solution is a different approach that extracts the meaning of the situation using semantic agents in order to manage the interaction processes in the human environment. Our components, semantic agents and services, are web services that compose the architecture and act in the environment. Agents possess a novel rational memory and an inference engine in order to model and reason with words about the behaviours of entities. Agents and services exchange events in a knowledge representation language (KRL) close to natural language. They work together to provide ambient intelligence for taking care of disabled or elderly people at home. Two important points are highlighted: first, building the architecture with several scenarios of composition of services and agents; second, how agents and services interact to provide support.
Keywords: Interaction, Semantic Agents, Daily activities
1. Introduction

Interaction between robots or machines and the human environment using agents is complex. We are looking for solutions that reinforce the understanding and disambiguation of this interaction. As a possible solution, we propose in this paper to design a pervasive architecture composed of generic components. To build such architectures, several existing standard technologies must be chosen and taken into account. Different scientific domains must also be combined into our encompassing semantic-driven and semantic-aware system architecture. Semantics therefore assumes a core role in the architecture and the proposed interaction processes, namely raising the cognition level of the system. Pervasive systems consist of humans, robots and smart components cooperating by performing different tasks. In these systems, a richer symbolic representation facilitates the interaction with humans, other entities and objects, and the storage of incoming facts. As it is a human environment, it is desirable to share a narrative KRL [60,61] close to human natural language (NL). Frames are well adapted to representing complex events [40,47]. This KRL can be used to communicate events between all agents and between agents and services, and to memorize knowledge of the world. This knowledge represents what the entities and the behaviours (events produced by entities) are and, in general, what is happening in the environment, in order to take decisions. Robotic interaction is multimodal because several input and output modalities must be managed between agents and the environment [1]. Sensors and actuators are driven by input/output services. Services are driven by agents. Our architecture must therefore realize the composition of services and agents to create agencies. Indoor applications can be extended to outdoor environments using the ubiquitous network and temporarily composing with other agencies. To explain the main idea of this paper and as a proof of concept, we developed a multi-agent architecture embedded in robots and a house.
It supports human activities of daily living (ADL) [32]. Robots have to assist humans in a pervasive environment [53,20,11,39]. Robots contain agents connected to the network. The objectives of the application are to ensure house security with a smart monitoring or alarm system, human security and health-care assistance for elderly or disabled persons, people's comfort by simplifying human tasks, and companionship for lonely people or kids by reading, playing and dialoguing. Jack is a human being alone at home. His house is composed of walls, doors, windows and intelligent equipment connected to a domotic network. The robots and the house contain semantic agents and services, as defined below. Robots are able to communicate with all sensors and actuators such as video cameras and vocal synthesis, and to perform face and object recognition. The Nao robot acts as a companion, the Spykee robot as a security guard, and the Roomba robot as a vacuum cleaner taking care of the cleanliness of Jack's house. The agents' memory possesses a set of models of event to recognize or use other agents and services of the network. Many events and concepts related to house objects and human activities are already present in memory. In this paper, we bring functional and exchangeable components to fulfil the above requirements. Section 2 presents the interaction problem that appears in multimodal systems interconnected to a human environment. Section 3 presents the requirements that we expect to meet. In particular, components of the pervasive architecture must be as generic as possible, respect current standards and use recently available technologies. Section 4 presents the modelling of the interaction architecture and its components. Section 5 presents the modelling of the pervasive architecture. In Section 6 the application to daily activities is presented. Section 7 presents a comparison of different semantic architectures. Section 8 gives the conclusion.

2. Interaction Problem

Until recently, interaction was limited to human-machine interfaces. Human-robot interaction (HRI) [13,18] is a growing research field not limited to user interfaces or to the hardware and electrical engineering specific to the robotics research domain. Multimodal interaction is a part of HRI that involves event acquisition and the awareness, interpretation and execution contexts. Communication, cooperation and coordination between entities and groups of entities must be established to ensure the global interaction according to the composition of agents.
Multimodal interaction relies on two important processes, well presented in [29]; to summarize the author's view: multimodal fusion goes from low-level integration (signal information) to high-level storage of meaning (semantic event information) by composing and correlating data coming from multiple sources (sensors, interaction context, software services or web services) [30]. Information fusion therefore refers to particular mathematical functions, algorithms, methods and procedures for data combination. Multimodal fission, in the opposite direction, is the process of physically acting or showing a reaction to the inputs in the current situation. According to the decision rules applied after the fusion process, fission splits the semantic results of the decision into single actions to be sent to the actuators. Multimodal fusion is a central question for providing effective and advanced human-computer interaction using complementary or redundant modalities. It helps provide a more informative, exact, complete and reliable interpretation. The cross-modal dependency between modalities allows reciprocal disambiguation and improves recognition in the interpretation of the scene or the state of the world at a high semantic level [23,27,17,28,22]. Multimodal fission defines the best modalities and actions to perform in the environment depending on the current context and the evaluation of the events resulting from the fusion step. Essential requirements for multimodal fusion and fission engines are:
- synchronization of modalities by sending events;
- cognitive algorithms including a formal logic;
- context representation considering all concepts and actions;
- data transfer bandwidth sufficient for efficient applications and the real-time constraints of real life; and
- robotic systems and a simulation environment to validate the architecture.

3. Requirements

The multi-agent system is a useful paradigm for distributed artificial intelligence and distributed knowledge management [3,34,10,6,41]. The advantages of agents are that they are autonomous, interact with their environment and interact with other agents. Different types of agents exist: reactive (preprogrammed reaction), intelligent (able to choose or adapt the reaction to the situation) and hybrid. Intelligent agents can be programmed, cognitive or rational. Rational agents [50] use logic to infer decisions and execute plans.
Cognitive agents have functional similarities to the human brain. BDI agents [48] are cognitive agents with beliefs, desires and intentions, exploited to solve some modal inference conflicts. Our semantic agents only need to be cognitive. They are simple generic agents: all have exactly the same piece of code, which is the inference engine (based on formal logic), and an embedded semantic memory. The memory content, and in particular the models, varies from one agent to another depending on its fusion or fission work in the architecture. Our architecture must be designed to provide behavioural interpretation, so the memory is required to store events or facts happening in the environment. Fusion agents and fission agents are autonomous but must manage facts at different behavioural levels of abstraction. The memory must be built in a knowledge representation language compatible with an ontological inference engine to capture classes of concepts, instances of concepts and models of event. The inference engine is in charge of querying ontologies with the goal of extracting the meaning of past facts from this memory. Domain ontologies are graphs of linked concepts, largely modelled using the Web Ontology Language (OWL) [35,44,52]. Formal logic brings consistency checking of stored data and correct resulting inferences (reasoning). These agents must also be able to work together in a local network (ambient intelligence), but not only there. To extend interaction to the human world, they can discover new services (pervasive architecture) and interact with them (ubiquitous network). Web services were designed for this purpose. Web services [38,4,46] are services that can execute multimodal software or hardware functions. They use the Simple Object Access Protocol (SOAP¹) and their abstract definitions are listed in WSDL files or UDDI servers. Semantic web services [36,43,55,31,33,58] can even store and exchange information using a knowledge representation language. An interesting idea is to have the same communication language for all agents and services. A second reason is that our agents do not need ACL languages like the Knowledge Query and Manipulation Language (KQML²), based on KIF³, to get direct answers: all events are propagated and filtered from the input services towards the output services, so agents do not manage dialogue processes or internal interactions between them. Agents act like state machines. This is better suited to discrete events and avoids locking the parallelism, autonomy and adaptation of agents. The Foundation for Intelligent Physical Agents (FIPA⁴) standards from the IEEE Computer Society also provide useful guidance for managing the physical agents or services driving our hardware.

¹ http://www.w3.org/TR/soap12-part0
² http://www.cs.umbc.edu/research/kqml/papers/
³ http://www-ksl.stanford.edu/knowledge-sharing/kif
⁴ http://www.fipa.org
4. Multimodal Interaction Architecture

4.1. Introduction

The objective of our architecture is to realize the fusion and fission processes using all useful situational knowledge (i.e. the past and current states of entities) stored in the autonomous agents' memory.
Fig. 1. Pervasive architecture for multimodal interaction
Figure 1 presents the fusion and fission agents connected to input/output services in the network. In our case, services represent any smart or purely reactive components of the pervasive environment. They are ambient entities communicating in only one direction. Each new state or answer to a message from a previous agent comes back through the environment, which forms an external loop for the collaborative team of agents. Obviously, the meaning of the situation must be extracted quickly to take a reactive decision, and this meaning is essential to obtain a correct interpretation. Extracting the meaning of the situation and refining the situation require developing a description of the current relationships among entities and events in the environment context [59,8]. The extraction of meaning to understand what is happening, as well as the ontological storage [60,61] of events, is thus central to the interpretation. Events are stored under related classes of behavioural models at different levels of subsumption. The next sections present our realization of the interaction architecture.
4.2. "Put That There" Scenario

We take the famous "Put That There" scenario [9], an academic and demonstrative example of multimodal interaction. Figure 2 shows the fusion of events coming from three input services of the types "Gesture Sensor", "Object Recognition" and "Vocal Recognition". They can be embedded in the robot or be part of the house. The fusion process is in charge of merging the events sent by input services within a period of time into a composed event that represents "a human giving a vocal order and pointing at an object and a location". Once this meaning of the situation is obtained, the fission process may decide to send orders or plans to output services ("Move Arm" and "Open/Close Hand") managing the robot actuators, which will act by grasping the pointed object, moving the arm to the pointed location and releasing the object. The "Put That There" scenario will be used throughout this paper to illustrate the proposed concepts.
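As an illustration only (this sketch is ours, not the paper's implementation; the Event class and fuse_put_that_there function are hypothetical names), the fusion step of this scenario can be approximated in a few lines of Python: events from the three input services are correlated within a time window and merged into one composed event.

from dataclasses import dataclass

@dataclass
class Event:
    source: str       # producing service, e.g. "Vocal Recognition"
    roles: dict       # EKRL-like role/argument pairs
    timestamp: float  # reception time in seconds

def fuse_put_that_there(events, window=2.0):
    # The three modalities that must co-occur for this composed event.
    needed = {"Vocal Recognition", "Gesture Sensor", "Object Recognition"}
    picked = [e for e in events if e.source in needed]
    if {e.source for e in picked} != needed:
        return None  # a modality is missing: no fusion possible
    span = max(e.timestamp for e in picked) - min(e.timestamp for e in picked)
    if span > window:
        return None  # events too far apart in time
    merged = {}
    for e in picked:
        merged.update(e.roles)  # redundant roles disambiguate each other
    return Event("Fusion Agent", merged, max(e.timestamp for e in picked))

The composed event returned here would then be handed to the fission process, which maps it onto "Move Arm" and "Open/Close Hand" orders.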
Fig. 2. "Put That There" scenario

4.3. Semantic Agents and Services

To fulfil our requirements and obtain the most adaptable, distributed, ubiquitous and reliable architecture possible, we design two types of autonomous components, semantic agents and services, permitting the construction of any multimodal architecture in interaction with the human world (Figure 3).

Fig. 3. Ambient intelligence network

Input/output services (light grey) are connected to software libraries or physical entities of the environment (black). To respect our definition of multimodal interaction, we have two kinds of agents (circles): fusion agents and fission agents. They will be differentiated in Section 5. Figure 4 presents the basic structure of an agent and a service. An agent contains its knowledge base (called the agent's memory), its inference engine and its communication module. A service has only code and standard memory (properties and methods in the generic object-oriented programming model), a communication module and a hardware controller module. The hardware controller enables the service to receive information from a sensor or to drive an actuator. The communication module contains the network card and its semantic functionalities to write and send events in the environment knowledge representation language (EKRL) or to receive them.

Fig. 4. Semantic agent and service components
Services are standard web services able to create messages and communicate in EKRL. The role of semantic services is to send to agents any information obtained from the environment through hardware sensors, to execute a software function as a simple web service, or to execute orders to control actuators. Services can be seen as reactive agents with no cognitive part but with enough code to build EKRL messages for communication with agents and to realize the process they are designed for. Semantic agents are also web services, but a little more sophisticated because they are cognitive or functional; they possess their own abilities and program to achieve their tasks and goals. Semantic agents contain an embedded inference system able to process the matching operation. They are intelligent agents with cognitive abilities to answer queries. Scenarios or execution schemes are stored in their knowledge. It is important to note that the semantic agent is a generic component of our model of architecture. The only difference between two agents is the knowledge in memory, not the execution code.
In particular, specific query models (also called models of rule) stored in the agent's memory define the agent's program, i.e. the role of the agent in the organization. Agents receive, filter and attach facts to their own models of event. In the same spirit, they send event orders or new composed facts according to the internal query models of the function for which they were conceived. A developer can program agents by reusing default concepts and models and by adding models of rule to the agent's memory with our editor presented in Section 4.4. Figure 5 shows that semantic agents and services are pure standard web services using SOAP for interoperability. They interact with the environment via the network (wireless or wired), which is also part of the environment. OWL-S repositories and UDDI directory servers for discovery are also input services of our architecture.
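To fix ideas, the following sketch (ours; the endpoint address, operation name and instance values are hypothetical, and any SOAP 1.2-capable stack could replace the hand-built envelope) shows how a service could post an EKRL event to an agent over plain HTTP.

import requests

ekrl_event = (
    "Exist: Available Service\n"
    "SUBJECT: composition\n"
    "SENDER: camera_service\n"       # hypothetical instance values
    "DATE: 2011-06-01 10:00:00\n"
    "LOCATION: living_room"
)

# Minimal SOAP 1.2 envelope carrying the EKRL event as a text payload.
envelope = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
    '<soap:Body><SendEvent xmlns="urn:ekrl"><event>'
    + ekrl_event +
    "</event></SendEvent></soap:Body></soap:Envelope>"
)

response = requests.post(
    "http://192.168.0.10:8080/agent",  # hypothetical agent endpoint
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "application/soap+xml; charset=utf-8"},
)
print(response.status_code)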
Fig. 5. Networking messages

In Sections 4.5 and 5, we will present their importance in the composition of services and agencies. Using external standard web services requires an XML-to-EKRL wrapper (input services), as they do not use EKRL semantics. As a result, our architecture is operating-system independent and built on standards in order to be compliant. To achieve composition, agent memories keep the service profile, service grounding and service model in memory. Agents and services have an IP address and one or several TCP ports. Mobile agents have to change IP addresses while moving from one network connection to another. Additional security schemes may be added to manage the privacy of information, services or the network when the mobile agent acquires a new address or a Kerberos ticket is granted to access a service.

4.3.1. Event Knowledge Representation Language

The Event Knowledge Representation Language (EKRL) is a semantic formal language L that can describe events in a narrative way very close to natural language. EKRL is fully used to build event messages and to store facts directly under their classes of models in the agent memory. The formal system is composed of a formal language based on variable-arity relations in predicate logic (event frames). It permits semantic inference in order to extract the meaning of the situation. In EKRL, frames are predicates with slots that represent pieces of information. A slot is represented by a role associated with an argument. A predicate P is a semantic n-ary relationship between roles and arguments and represents a simple event se or a composed event ce; it is denoted by the following formula:

P((R1 A1) … (Rn An))

where Ri is a predicate role and Ai a list of arguments. The roles Ri are the possible roles (dedicated variables storing arguments) in the event model, and the Ai are the possible combinations of values or instances of concepts in a stored model or fact.

(name of the event model)
Natural language: '(optional natural language sentence description)'

Fig. 6. Event model description
Figure 6 shows a sample model written with the EKRL syntax. The list of all roles is part of the meta ontology of the agent memory. A Role can be OBJECTIVE, SOURCE, BENEFICIARY, MODALITY, TOPIC, CONTEXT, MODIFIER, DATE and so on. Models of event are models of predicates; instances of a predicate are specific to a situation.

Exist: Available Service
SUBJECT: composition
SENDER: services
DATE: date time
LOCATION: location

Fig. 7. "Exist: Available Service" event model
Figure 7 presents the predicate model of the "available service" event. "Exist" is one of the root predicates of the models ontology tree (among MOVE, RECEIVE, BEHAVE and OWN). "Exist" is a general event model of the ontology of models of event, expressing the creation or discovery of anything. SUBJECT, SENDER, DATE and LOCATION are the roles of this predicate. This event model permits agents to learn about new services that are connected and available for composition. The event is sent by services to announce their availability. From the instances of this model, agents can learn the service name, the sending date and the location of the service.
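To give a concrete feel for such frames, here is a minimal Python encoding (ours; the Frame class and the instance values are hypothetical) of the model in Figure 7 and of a fact stored under it.

from dataclasses import dataclass

@dataclass
class Frame:
    root: str    # root predicate, e.g. "Exist"
    name: str    # model name, e.g. "Available Service"
    roles: dict  # role -> argument (concept or instance)

    def to_ekrl(self) -> str:
        lines = [f"{self.root}: {self.name}"]
        lines += [f"{role}: {arg}" for role, arg in self.roles.items()]
        return "\n".join(lines)

# A fact, i.e. an instance of the "Exist: Available Service" model.
fact = Frame("Exist", "Available Service", {
    "SUBJECT": "composition",
    "SENDER": "vacuum_service",        # hypothetical service name
    "DATE": "2011-06-01 10:00:00",
    "LOCATION": "kitchen",
})
print(fact.to_ekrl())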
4.3.2. Agent's memory

The agent's memory is the most important piece of software for the agent to be cognitive: it provides the abilities to store and retrieve events, understand the situational meaning and create new events to be sent to other semantic agents and services. Meta concepts, concepts, models of event and instances (facts, past scenarios and context knowledge) are stored in ontologies [19,42] using OWL relationships [35]. The agent's memory is a knowledge base enabling the storage of all events coming from the network; it is used for cognitive operations, recalling any past facts, and for reasoning and acting following stored models of event. This memory contains a domain ontology called the Concepts Ontology and a second ontology called the Models Ontology, fully linked with concepts in frames to give the agent its cognitive abilities. Figure 8 represents the knowledge base and inference modules of the agent. StoringEngine and QueryEngine are the functional parts of the ontological inference engine. Cylinders are parts of the knowledge base: respectively, ontology classes and instances in the Meta, Concepts and Models Ontologies. All inserted facts coming from the network are fully linked in a rational way to concepts and models in the agent memory.
Fig. 8. Storage and querying in the agent memory
In Figure 9, the Meta Ontology is a domain ontology containing all types of nodes and relationships, roles and modifiers used to build the Concepts and Models Ontologies. The meta ontology contains meta concepts like OWL subsumption, membership, equivalence, synonymy and discrepancy relationships, and some linguistic modalities.
Fig. 9. Knowledge base content
The Concepts Ontology is a common domain ontology containing hierarchically sorted concepts with instances of concepts. It is fully compatible with OWL in order to allow the import of new concepts. The Models Ontology is a knowledge base containing hierarchically sorted models of event with instances of models, called facts. It embeds templates of events in the form of predicates and instances of events. The content of a model is an EKRL frame filled with concepts and instances of concepts to build the facts.

4.3.3. Agent Inference Objective and Main Code

We propose to use EKRL for agent communication, for storage and as the event format. On other platforms, agent behaviours are programmed in Java, C or C++; designers of our agents just need to insert EKRL events to program them. Generally, an ontology is a shared blackboard used to transmit data between agents in ACL messages. Programming-wise, it is then necessary to develop converters and to master the standards and protocols, which becomes the work of a software engineering expert and is difficult to implement and maintain. To define and control applications, current standards are high-level protocols requiring programming effort rather than a more natural language used to execute tasks and interpret the situation. Using this natural language is one of the motivations of our contribution. Agents with an inference engine were rational but not particularly cognitive. To enhance their capacities, some of
our predicates are made similar to those of the event calculus [26,49,54]. One more point: our ontologies are not only used to communicate but also to store facts and compose new facts of a higher level in the case of fusion and of a lower level in the case of fission. We include models of event corresponding to actions or scenarios in the memory, so that the memory can recognize facts of this nature and send meaning of a higher level. It is a modelling and agent-programming memory rather than a self-adapting memory. We wanted a well-organized symbolic memory that can fulfil robotic interaction requirements. Models are used so that:
- the agent can store the various events;
- the agent can read its program, which takes the form of query models stored in memory, because the work of a fusion or fission agent is only to produce events from those already present in its memory;
- the developer can question the memory to check instances of concepts, events and recorded facts;
- the developer can program the agent: add or modify concepts, models of event and query models to store facts directly under the corresponding models of event.
EKRL is thus also used as a programming language, besides being a language of communication and storage. The agent inference engine processes memory information and is used to:
1. store events in memory under existing models of event;
2. query the memory (using query models) to find direct answers (direct matching) or indirect answers (matching requiring the execution of operations), using operations on concepts as arguments of a role and operations on events (other predicates) as arguments of a role;
3. execute scenarios (send/broadcast several events to one or several service agents).
Algorithm 1 is the main algorithm executed by the agent's code. Two methods, krl_listen() and krl_send(), permit communication with other agents. These functions use a model of events and fill the different roles with values or concepts related to the service job. The main loop stores new facts and
checks the models of queries (the program of the agent). When new facts are added or queries match query instances, the main algorithm checks for new matching facts and sends them to the other agents able to accept them (i.e. those having the same models in memory). If the event model does not exist, the new fact is simply ignored: the agent will not manage the meaning of this kind of event and will concentrate on the tasks for which it is programmed.

Do
  Fact ← krl_listen()
  If Fact Then
    NewFact ← StoreFact(Fact)
  End If
  [Result, [Queries]] ← QueryEvents(QueryModel)
  If Result Or NewFact Then
    For Each Query Of [Queries]
      [Result, [MatchEvents]] ← QueryEvents(Query)
      If Result Then krl_send([MatchEvents])
    Next
  End If
Loop

Algorithm 1. Main code
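A rough Python rendering of Algorithm 1 follows (our sketch; krl_listen, krl_send, store_fact and query_events stand for the functions described in this section and are assumed to be supplied by the agent runtime).

def main_loop(agent):
    # Endless perception/reaction cycle of a semantic agent.
    while True:
        fact = agent.krl_listen()                 # blocking receive of one EKRL event
        new_fact = agent.store_fact(fact) if fact else False
        ok, queries = agent.query_events(agent.query_model)
        if ok or new_fact:
            for query in queries:                 # query models are the agent's program
                ok, match_events = agent.query_events(query)
                if ok:
                    agent.krl_send(match_events)  # propagate composed facts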
StoreFact Function
This function stores a fact into memory under its event model. If Result (Boolean) is true, the fact has been stored under its model; otherwise, the event is ignored. If no corresponding event model is found, the agent is not designed to process this type of event, which reduces its memory load and workload. The prototype of the function is: [Result] ← StoreFact(Fact). The [vectorname] syntax denotes a vector named vectorname.
[ParseError, RootPredicate, Predicate, [Roles], [Arguments]] ← Split(Fact)
If ParseError Then Return False
EventPredicateID ← Matching(RootPredicate, Predicate)
If EventPredicateID > 0 Then
  StoreRA(EventPredicateID, [Roles], [Arguments])
  Return True
Else
  Return False
End If

Algorithm 2. StoreFact
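In Python, and substituting a plain dictionary for the SQL tables, StoreFact could be sketched as follows (our illustration; the textual frame format is the one of Figure 7, and the dictionary-backed memory is an assumption of ours).

def store_fact(memory, fact_text):
    # memory: {(root, name): [facts]}, each fact a role -> argument dict
    lines = [l.strip() for l in fact_text.splitlines() if l.strip()]
    if not lines or ":" not in lines[0]:
        return False                       # parse error: event ignored
    root, _, name = (p.strip() for p in lines[0].partition(":"))
    key = (root, name)
    if key not in memory:
        return False                       # no event model: not this agent's job
    pairs = (l.split(":", 1) for l in lines[1:] if ":" in l)
    memory[key].append({r.strip(): a.strip() for r, a in pairs})
    return True

# Usage: the agent only accepts events it has a model for.
memory = {("Exist", "Available Service"): []}
store_fact(memory, "Exist: Available Service\nSENDER: camera_service")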
QueryEvents Function
This function queries the memory for models of event and facts. If Result is true, the events found are sent to all other agents; otherwise, no event is sent. The prototype of the function is: [Result, [MatchEvents]] ← QueryEvents(QueryModel or QueryModelID)
If (QueryModelID) Then QueryModel ← Get(QueryModelID)
[ParseError, RootPredicate, Predicate, [Roles], [Arguments]] ← Split(QueryModel)
If ParseError Then Return [False, EmptySet]
[EventsPredicateID] ← Matching(RootPredicate, Predicate, [Roles], [Arguments])
If count(EventsPredicateID) > 0 Then
  [MatchEvents] ← GetTextEvents([EventsPredicateID])
  Return [True, [MatchEvents]]
Else
  Return [False, EmptySet]
End If

Algorithm 3. QueryEvents
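Under the same assumptions (dictionary-backed memory, textual frames, and a "?" argument of our own convention marking an unconstrained role), QueryEvents can be sketched as below; note that the paper's Matching also walks the subsumption hierarchy, which this exact-match version omits.

def query_events(memory, query_model):
    # Returns (Result, MatchEvents), mirroring the prototype above.
    lines = [l.strip() for l in query_model.splitlines() if l.strip()]
    if not lines or ":" not in lines[0]:
        return False, []                   # parse error
    root, _, name = (p.strip() for p in lines[0].partition(":"))
    pairs = (l.split(":", 1) for l in lines[1:] if ":" in l)
    wanted = {r.strip(): a.strip() for r, a in pairs}
    matches = [fact for fact in memory.get((root, name), [])
               if all(a == "?" or fact.get(r) == a   # "?" = unconstrained role
                      for r, a in wanted.items())]
    return bool(matches), matches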
Matching Function
This function performs the matching operations between predicates and roles. StoreRA() and ReadRA() are respectively SQL INSERT and SELECT operations on the Role-Arguments table, filtered by the ID arguments given to these two functions. MODIFIER is a role that modifies the sense of an event. According to the modifiers, the matching result can vary: if the sense of the fact is negative, the sense of the composite event is inverted, so the matching takes this linguistic modality into account as well. The prototype of the function is: [EventsPredicateID] ← Matching(RootPredicate, Predicate, [Roles], [Arguments])
ReadConcept(QueryArgID) is an SQL SELECT operation on the nodes table where the type of node is concept class or instance, and where these nodes lie under the given node following the subsumption relationship of the links table. The SQL request gives all nodes of the subtree, sorted. Arguments of the "date", "location", "context", "content" and "value" roles are compared with specific meta operators like ">", …