Distributed, reconfigurable architecture for robot companions exemplified by a voice-mail application
Maksym Figat, Tomasz Kornuta, Marcin Szlenk and Cezary Zieliński
Institute of Control and Computation Engineering, Warsaw University of Technology
Nowowiejska 15/19, 00-665 Warsaw, Poland
Email: {M.Figat, T.Kornuta, M.Szlenk, C.Zielinski}@ia.pw.edu.pl
Abstract – The limited computational capabilities of robots' on-board computers necessitate the distribution of their control software in a cloud. This paper presents a distributed, reconfigurable control architecture based on decomposition into agents, two of which are executed on the robot: the core agent is a fixed part of the controller, providing the task-independent robot capabilities, whereas the dynamic agent, loaded from the cloud when required, executes the task-dependent program. These two agents execute the task, additionally utilising the capabilities of the cloud. The system is presented on an exemplary application of a NAO robot sending voice-mails.
I. INTRODUCTION
Robots assisting humans in everyday tasks require sophisticated capabilities mimicking those possessed by humans, e.g. visual perception leading to scene understanding, planning by decomposition of complex problems into simpler ones, or bimanual manipulation of articulated objects. This is especially evident in the case of robots designed to assist the social inclusion of the elderly, where the diversity of the tasks to be executed is overwhelming [?]. This leads to high complexity of the robots' control systems, which in turn requires computational power that usually exceeds the capabilities of current on-board computers.

The technical goal of the RAPP project (Robotic Application for Delivering Smart User Empowering Applications [?]) is to overcome those computational limitations by distributing robot controllers over a network of computers, potentially forming a computational cloud. In contrast to the efforts in the field of Big Data and Cloud Computing for robotic purposes [?], made in projects such as KnowRob [?] or RoboEarth [?], our research focused on a modifiable architecture that enables effective distribution of the computations between the robot on-board computer and the cloud, while facilitating the injection of user's tasks into the system.

In [?] we presented a new distributed architecture, based on decomposition of a system into a set of cooperating agents. The assumed approach was previously used for the specification of control systems of fixed composition, e.g. governing the work of a set of robots forming a fixture during machining of parts of airplane fuselages [?] or a dual-robot system with haptic feedback [?]. The system presented in this paper builds on the idea introduced in [?]. It enables a dynamic change of the composition of the control system, according to the requirements of the task. In particular, it enables downloading from the cloud repository and running agents that execute the user's task.

The contribution of this paper is a detailed definition of the roles, operation and cooperation of the subsystems forming the developed distributed controller. In particular, a voice-mail application is presented. It is dynamically loaded from the repository and executed on a NAO robot, enabling the user to dictate a message to the robot by voice and dispatch it to the recipient. This application was selected because, on the one hand, it is simple to understand and, on the other, it requires robot effectors (loudspeakers), receptors (microphones) and external cloud services (the Gmail server). Hence it contains all the elements that future RAPP application developers need to build more complex robot tasks.

II. GENERAL STRUCTURE OF THE RAPP SYSTEM
Designing a system that is not only distributed but also reconfigurable requires appropriate tools. The utilized approach is based on decomposition of the system into a set of individual, yet cooperating entities, called agents. In particular, because some of them are supposed to possess real bodies (e.g. the agent controlling the NAO robot), we refer to them as embodied agents [?], [?].
An embodied agent a_j, where j is the designator of the agent, may be composed of five types of subsystems:
• Real effectors E_{j,h} – devices affecting the environment,
• Real receptors R_{j,l} (exteroceptors) – devices gathering information about the state of the environment,
• Virtual effectors e_{j,n}, responsible for presenting the real effectors to the control subsystem in a form that considerably simplifies the expression of the task,
• Virtual receptors r_{j,k}, responsible for aggregating data obtained from the real receptors into a form accepted by the control subsystem,
• Control subsystem c_j, responsible for the realisation of the task allotted to the agent and coordination of the associated subsystems.
It is assumed that an embodied agent possesses a single control subsystem and zero or more real and virtual effectors and receptors, hence the additional indices h, l, n and k. In particular, an agent might not possess any effectors or receptors, yet be useful for performing some external computations; such agents are called computational agents.
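The composition rules above can be illustrated with a small data structure. This is only a sketch: the class name `EmbodiedAgent`, its field names and the method `is_computational` are hypothetical and are not part of the RAPP software; the subsystem labels are taken from the symbols used in fig. 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmbodiedAgent:
    """Illustrative structure of an embodied agent a_j (hypothetical class)."""
    designator: str                                              # j
    control_subsystem: str                                       # c_j, exactly one
    real_effectors: List[str] = field(default_factory=list)      # E_{j,h}
    real_receptors: List[str] = field(default_factory=list)      # R_{j,l}
    virtual_effectors: List[str] = field(default_factory=list)   # e_{j,n}
    virtual_receptors: List[str] = field(default_factory=list)   # r_{j,k}

    def is_computational(self) -> bool:
        """True when the agent owns no effectors and no receptors."""
        return not (self.real_effectors or self.real_receptors
                    or self.virtual_effectors or self.virtual_receptors)


# The core agent owns hardware; the cloud agent only has a control subsystem.
core = EmbodiedAgent(
    designator="core",
    control_subsystem="c_core",
    real_effectors=["E_core"],
    real_receptors=["R_core"],
    virtual_effectors=["e_core"],
    virtual_receptors=["r_core"],
)
cloud = EmbodiedAgent(designator="cloud", control_subsystem="c_cloud")
```

Under this reading, `cloud.is_computational()` holds while `core.is_computational()` does not, which mirrors the distinction between agents with real bodies and purely computational ones.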
Fig. 1. General 4-agent system structure
Operation of embodied agents is defined in terms of behaviours, i.e. patterns of cyclic executions of actions, which are parametrized by transition functions describing the performed operations in a strict, mathematical way. Behaviours are assembled into complex networks, which are modelled as finite state machines (FSMs). In this paper, however, the focus is on the distributed nature of the system, so the cooperation between agents and their internal subsystems is presented without the formal definitions of behaviours, which can be found in e.g. [?], [?].
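As a rough illustration of the notion of a behaviour, the sketch below repeats a transition function cyclically. It makes two assumptions that go beyond the text above: that the cycle ends when a terminal condition is satisfied, and that a cycle limit guards against endless loops. The names `run_behaviour`, `transition`, `terminal` and `max_cycles` are hypothetical; the formal definitions are given in the cited works.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def run_behaviour(state: S,
                  transition: Callable[[S], S],
                  terminal: Callable[[S], bool],
                  max_cycles: int = 1000) -> S:
    """Apply `transition` cyclically until `terminal` holds or the limit is hit."""
    for _ in range(max_cycles):
        if terminal(state):       # assumed stopping rule (not from the paper)
            break
        state = transition(state)  # one cycle of the behaviour
    return state


# Toy usage: a counter "behaviour" that stops once it reaches 3.
final_state = run_behaviour(0, lambda s: s + 1, lambda s: s >= 3)
```

An FSM in this picture is simply a network whose nodes each run one such behaviour and whose edges select the next node.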
Fig. 1 presents the general structure of the RAPP system, consisting of agents residing in the cloud and on a robot. The cloud part of the system contains a repository agent a_rep and a cloud agent a_cloud. The former holds the RApp store, enables the robots to download RAPP applications (RApps in short) delivered by the RApp providers and, additionally, may offer certain computational services. The agent a_cloud represents one or more agents that might be activated by a_rep or are simply external service providers, such as external databases, email servers etc. – because of this there is a striped background behind a_cloud. Each robot contains a specific core agent a_core, which governs both its effectors and receptors, thus exposing the robot hardware through an appropriate representation to the RApps. The core agent a_core communicates with a_rep, thus it is able to download RApps. RApps are composed of a single dynamic agent a_dyn executed on a robot and might be supplemented with agents running in the cloud. While a_dyn is executing, a_core becomes passive, whereas a_dyn supervises the robot using the predefined services offered by a_core.

The implementation of the above described system is based on the integration of several different information technologies and programming frameworks [?]. The agents running on the robot, i.e. a_core and a_dyn, were implemented as ROS (Robot Operating System) nodes [?], and thus use ROS-based communication methods (topics, actions and services). The virtual effectors and receptors of the core agent a_core are built on top of NAOqi – a distributed, object-oriented framework enabling access to NAO robot sensors and actuators. The repository agent a_rep was implemented using the HOP framework [?], which is used for downloading RApps, communication between a_dyn and a_cloud, and invoking services provided by the repository agent a_rep. The interoperability between HOP programs and ROS nodes is possible due to the Rosbridge server, which provides a JSON API to ROS functions for non-ROS programs.

III. GENERAL BEHAVIOUR OF THE CORE AGENT
As already stated, each robot contains a specific core agent a_core. This agent is responsible for communication with the user, downloading the dynamic agent a_dyn from the RAPP platform, and providing the dynamic agent with an interface to the robot's hardware. The hardware is represented in a_core as virtual effectors and virtual receptors. Examples of NAO virtual effectors are: e_{core,body}, which is responsible for the motion of all of its limbs and the head, and e_{core,ls}, which governs the loudspeaker and executes text-to-speech transformation. Examples of NAO virtual receptors are: r_{core,cam}, which is responsible for acquiring images by the camera, r_{core,sonar}, which detects obstacles using ultrasonic sensors, r_{core,touch}, which detects obstacles using micro-switches located in the feet of the robot, and r_{core,mic}, which detects sounds using microphones and recognizes words. All those virtual entities are commanded by the control subsystem c_core.

The general behaviour of the core agent a_core is presented as an FSM in fig. 2. After initialization the core agent registers itself with the RAPP platform by connecting to the repository agent a_rep and starts to listen to the user's commands. These can be of two forms: either an isolated keyword or a long command. The two forms are interpreted differently. The isolated keywords are known to the core agent, which itself decides which dynamic agent should be downloaded in response. The long commands are only recorded by the core agent and then sent for interpretation to the RAPP platform, which decides which dynamic agent to download. In both cases a successfully interpreted user's command results in downloading and activating the adequate dynamic agent a_dyn. If interpretation of the command fails, the core agent returns to listening to the user. After the dynamic agent has been activated, the core agent awaits and executes commands coming from the dynamic agent. Once the dynamic agent has completed its task, it informs the core agent, which can then destroy it and go back to listening to the user's requests.

Fig. 2. Graph of the FSM governing the behaviour of the core agent a_core
[States: s1 Initialization; s2 Register with RAPP Platform; s3 Listen to the user; s4 Interpret command by the RAPP Platform; s5 Interpret command; s6 Load the Dynamic Agent; s7 Activate Dynamic Agent; s8 Wait for Dynamic Agent command; s9 Execute Dynamic Agent command; s10 Destroy the Dynamic Agent; s11 Unregister from RAPP platform; s12 Finish; s13 Record command; s14 Inform user. Transition labels: Isolated word, Long command, Failed, Abort, Task finished.]

The core agent a_core provides many different services that can be invoked by the dynamic agent a_dyn to control the robot and to both receive and process sensor data. Each such service is represented as a sub-FSM symbolised by the double encircled node in the graph in fig. 2. It should be noted that all real-time critical services are provided by the core agent in its state s9, so the delays introduced by the communication with the dynamic agent and the cloud may slow down the activities of the system as a whole, but will not be dangerous. Three sample services used in the exemplary application are presented later in the paper.

IV. GENERAL BEHAVIOUR OF DYNAMIC AGENTS
The activities of the control subsystem of the dynamic agent a_dyn are described by the FSM presented in fig. 3. The dynamic agent is application dependent, however some of its parts are task independent. The graph nodes labelled s1 and s5 form the task-independent part. The behaviours associated with those nodes are responsible for creating the connection to the core agent and prompting it to destroy the dynamic agent a_dyn, respectively. The behaviours associated with the nodes s2, s3 and s4 must be defined by the application provider. As these are usually compound behaviours defined by a lower-level FSM, they are drawn as double circles. Here three phases of application execution have been distinguished: initialisation, task execution and termination.

Fig. 3. Graph of the FSM governing the activities of the control subsystem of a dynamic agent a_dyn
[States: s1 a_core command channel initialization; s2 Task initialization; s3 Task execution; s4 Task termination; s5 Send termination command.]
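The split between the task-independent nodes (s1, s5) and the provider-defined nodes (s2, s3, s4) of fig. 3 can be sketched as a fixed skeleton that accepts three callbacks. This is a minimal illustration under stated assumptions: `CoreChannel`, `run_dynamic_agent` and `make_phase` are hypothetical names, and the channel is a stand-in that only logs what would be sent to a_core; it is not the ROS-based implementation used in RAPP.

```python
from typing import Callable, List


class CoreChannel:
    """Hypothetical stand-in for the command channel to a_core (logs only)."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def open(self) -> None:
        self.log.append("s1: channel initialised")

    def send_termination(self) -> None:
        self.log.append("s5: termination command sent")


Phase = Callable[[CoreChannel], None]


def run_dynamic_agent(channel: CoreChannel,
                      initialise: Phase,
                      execute: Phase,
                      terminate: Phase) -> None:
    """Run the five nodes of fig. 3 in order."""
    channel.open()              # s1: task independent
    initialise(channel)         # s2: defined by the application provider
    execute(channel)            # s3: defined by the application provider
    terminate(channel)          # s4: defined by the application provider
    channel.send_termination()  # s5: task independent


def make_phase(label: str) -> Phase:
    """Build a trivial phase that just records its label."""
    def phase(channel: CoreChannel) -> None:
        channel.log.append(label)
    return phase


channel = CoreChannel()
run_dynamic_agent(channel,
                  make_phase("s2: task initialisation"),
                  make_phase("s3: task execution"),
                  make_phase("s4: task termination"))
```

In a real RApp each of the three phases would itself be a lower-level FSM; here they are reduced to single callbacks to keep the ordering visible.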
V. EXEMPLARY APPLICATION: NAO ROBOT FACILITATING DISPATCHING OF VOICE-MAIL
The exemplary application provides a simple method of sending emails containing voice messages recorded by the NAO robot, and is controlled verbally by the user. This example was selected because of its simplicity and completeness. The application has to assume control of the robot effectors (loudspeakers) and receptors (microphones), and it utilizes external services in the form of the Google mail server (Gmail) to send messages, thus all elements of the system are involved. Upon user request the application is loaded from the RAPP repository (a_rep) and activated on the robot control computer. The structure of the voice-mail application is presented in fig. 4. The dynamic agent a_dyn invokes services offered by the core agent a_core. Additionally, it utilizes services provided by the "Gmail server", which plays the role of the cloud agent a_cloud. Services provided by the core agent a_core and communication between agents (implemented in the form of ROS services, topics and NAOqi library function calls) are presented in the following sequence diagrams. Asynchronous messages are represented by an arrow with a concave head, whereas synchronous services by a solid triangle arrowhead [?].

Fig. 4. Structure of the RAPP system controlling NAO responsible for sending a voicemail

A. Selected core agent services
The voice-mail application requires just three services from the core agent a_core, enabling a_dyn to command the robot to say a given sentence, recognize a word or record a voice message.

1) Core agent say text service (fig. 5): causes the control subsystem c_core of a_core to delegate to the virtual effector e_{core,ls} the synthesis of sound based on the provided text. The ALTextToSpeech NAOqi module plays the role of that virtual effector. It calls the synchronous function say to transform the given text into sound.

Fig. 5. Sequence diagram of the RAPP say service of the core agent a_core

2) Core agent recognize word service (fig. 6): causes the control subsystem c_core of the core agent a_core to delegate to the virtual receptor r_{core,mic} the recognition of isolated words from a limited vocabulary. The virtual receptor r_{core,mic} consists of the following NAOqi modules: ALSpeechRecognition, ALSoundDetection and ALMemory. At the very beginning it invokes two functions from the ALSpeechRecognition module. The function setVocabulary updates the set of words that should be recognized by the speech recognition software, whereas the subscribe function starts the process that transforms sounds into recognised words. If a word is recognised with a probability lower than a certain threshold, the word is treated as unrecognised; otherwise the control subsystem c_core returns the recognized word.

Fig. 6. Sequence diagram of the RAPP recognize word service of the core agent a_core
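The message order of the recognize word service (set the vocabulary, subscribe, poll for results until one exceeds the threshold, unsubscribe) can be sketched as follows. The sketch does not call NAOqi: `FakeRecognizer` is a hypothetical stub standing in for r_{core,mic}, its method names only loosely follow the labels of fig. 6, and returning `None` when the stub runs out of hypotheses is an assumption made so that the example terminates.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Hypothesis = Tuple[str, float]  # (recognised word, probability)


class FakeRecognizer:
    """Hypothetical stub for the virtual receptor r_core,mic (no NAOqi calls)."""

    def __init__(self, hypotheses: Iterable[Hypothesis]) -> None:
        self._hypotheses: Iterator[Hypothesis] = iter(hypotheses)
        self.vocabulary: List[str] = []
        self.listening = False

    def set_vocabulary(self, words: Iterable[str]) -> None:
        self.vocabulary = list(words)

    def subscribe(self, name: str) -> None:
        self.listening = True        # "start listening" in fig. 6

    def unsubscribe(self, name: str) -> None:
        self.listening = False       # "stop listening" in fig. 6

    def get_data(self) -> Optional[Hypothesis]:
        return next(self._hypotheses, None)


def rapp_recognize(receptor: FakeRecognizer,
                   dictionary: Iterable[str],
                   threshold: float,
                   name: str = "rapp") -> Optional[str]:
    """Return the first vocabulary word whose probability reaches the threshold."""
    receptor.set_vocabulary(dictionary)
    receptor.subscribe(name)
    try:
        while True:
            hypothesis = receptor.get_data()
            if hypothesis is None:   # assumption: stub exhausted, give up
                return None
            word, probability = hypothesis
            # Below-threshold results are treated as unrecognised and skipped.
            if word in receptor.vocabulary and probability >= threshold:
                return word
    finally:
        receptor.unsubscribe(name)   # always stop listening


receptor = FakeRecognizer([("alice", 0.2), ("bob", 0.9)])
recognised = rapp_recognize(receptor, ["alice", "bob"], threshold=0.5)
```

Here the low-confidence hypothesis for the first word is discarded and the second one is returned; the `finally` clause is a design choice of this sketch that keeps the receptor from being left in the listening state.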
3) Core agent voice record service (fig. 7): causes the control subsystem c_core of the core agent a_core to delegate to the virtual receptor r_{core,mic} the recording of sound for a given period. NAOqi provides the ALAudioRecorder module, which forms a part of the virtual receptor r_{core,mic}. To start gathering sound samples from the microphones, the function startMicrophonesRecording of the ALAudioRecorder module is invoked asynchronously. When the recording time is up, the control subsystem c_core of the core agent a_core commands the virtual receptor to send an asynchronous message to R_{core,mic} to stop recording. Once the recording stops, the service returns the path to the recorded file.

Fig. 7. Sequence diagram of the RAPP record service of the core agent a_core

B. Dynamic agent
Task execution is described by the FSM presented in fig. 8. In the exemplary task of sending a voice-mail, each FSM node is associated with a synchronously called service. The first five nodes utilize services provided by the core agent a_core. They have been implemented as ROS services. The cloud agent a_cloud provides the service assigned to the last node s36. In order to provide developers with an easy way of programming the NAO robot, all services were encapsulated in functions: the RAPP API functions. These functions form the base language for NAO robot programming. They are presented by UML sequence diagrams.

Fig. 8. The graph of the FSM governing the execution of the user task – sending a voice-mail
[States: s31 Say: "starting word recognition"; s32 Get e-mail address based on the recognised word; s33 Say: "recording message"; s34 Record message content; s35 Say: "sending e-mail"; s36 Send the e-mail.]

1) The record message behaviour of the dynamic agent (fig. 9): a_dyn calls the RAPP API function rapp_record, sending time as a parameter, and suspends itself until the core agent a_core returns the reply.

Fig. 9. Sequence diagram of the "Record message" behaviour of dynamic agent a_dyn

2) The inform user behaviour of the dynamic agent (fig. 10): a_dyn calls the RAPP API function rapp_say, sending text as a parameter. The activities of the agent are suspended until the core agent a_core returns the reply.

Fig. 10. Sequence diagram of the "Inform user" behaviour of dynamic agent a_dyn

3) The recognize word behaviour of the dynamic agent (fig. 11): a_dyn calls the RAPP API function rapp_recognize, sending a set of words that should be recognized by the virtual receptor r_{core,mic}. The dynamic agent a_dyn suspends itself until the core agent a_core terminates the recognition behaviour and returns the recognized word as a reply. Once it has been
done, the dynamic agent a_dyn pairs the recognized word with an associated email address.

Fig. 11. Sequence diagram of the "Recognize word" behaviour of dynamic agent a_dyn

4) The send email behaviour of the dynamic agent (fig. 12): The "Gmail server" is used as a cloud agent a_cloud to service requests received from the dynamic agent a_dyn. Communication between the two agents is based on SMTP (Simple Mail Transfer Protocol). The dynamic agent suspends itself after requesting the service to send the email. The behaviour ends when the cloud agent a_cloud replies.

Fig. 12. Sequence diagram of the "Send email" behaviour of dynamic agent a_dyn

VI. CONCLUSIONS
The presented system was composed of a desktop computer and a NAO robot, communicating with a Gmail server via a wireless network. The a_core and a_dyn agents were launched on the NAO robot control computer, whereas the repository agent a_rep was executed on a desktop computer, providing the HOP server with a set of downloadable packages containing, among others, the described voice-mail application.

The repository and core agents are still being extended. In particular, we are still extending the set of functions forming the RAPP API for the development of applications, i.e. diverse dynamic agents a_dyn. This requires a broad range of exemplary tasks, hence we are currently developing a more complex task of hazard detection. People suffering from mild dementia tend to forget things. Among other things, they forget to shut windows, doors or taps when going out. The NAO robot can localize such situations. This application utilizes several complex functions provided by the core agent, such as motion, navigation, vision and communication functions, not described in this paper. However, we decided to present the voice-mail application, because it requires robot effectors (loudspeakers), receptors (microphones) and external services (Gmail), yet it is simple enough to be described briefly and thus to guide future RAPP developers. This is crucial, because the RAPP repository will be opened, enabling developers with different programming skills to produce and share their own robotic applications.

For this reason we also want to simplify the process of task development. One of the possibilities is to provide tools facilitating task specification and code generation based on the Model Driven Engineering (MDE) approach [?]. In robotics there is a noticeable increase of interest in this kind of approach. Prominent examples of recent progress in this field include the Automax toolchain [?], based on the Eclipse Modelling Framework (EMF) [?], and the toolchain based on JGraphChart and RAPID used in [?]. In our solution, we are planning to prepare a metamodel based on EMF and then use the Graphical Modelling Framework (GMF) to provide essential tools, enabling the user to visually program the dynamic agent task and automatically generate the ROS packages.
ACKNOWLEDGMENT
The authors acknowledge the support of the European Union within the RAPP project funded by the 7th Framework Programme (Collaborative Project FP7-ICT 610947).