IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS - PART C: APPLICATIONS AND REVIEWS


Autonomous Application Recovery in Distributed Intelligent Automation and Control Systems Thomas Strasser, Member, IEEE, and Roman Froschauer

Abstract—During the last decade a clear trend towards distributed automation in industrial systems has been observable. This means that applications are executed across heterogeneous control devices and communication networks. One of the main drivers of this development was the availability of cheap computing and communication resources. Moreover, there is a strong market demand for operating and adapting automation and control services without any downtime. As a result, appropriate approaches for recovering and (re-)configuring automation and control devices, as well as their services and functions, during full operation are needed. The relatively new standard IEC 61499 “Function Blocks” provides a reference model for the development and implementation of distributed Industrial Process Measurement and Control Systems (IPMCS). It provides a scalable and open architecture to model distributed automation and control applications. The high-level goals of IEC 61499 can be summarized as interoperability, (re-)configurability and portability of distributed applications for IPMCS. Therefore, it provides a very good basis for dynamic (re-)configuration and recovery of applications and status information in heterogeneous IPMCS and may overcome some of the shortcomings of present-day systems. The main purpose of this article is to present and discuss a general concept for autonomous recovery of applications within the context of distributed automation and control systems, which has been implemented using the IEC 61499 reference model.

Index Terms—Distributed intelligent automation, automatic application recovery, (re-)configuration, Holonic Manufacturing System (HMS), Industrial Process Measurement and Control Systems (IPMCS), IEC 61499, Lower Level Control (LLC).

I. INTRODUCTION

TODAY, manufacturing processes are increasingly performed by automated systems and, as a consequence, the level of automation in factories and plants will increase steadily. Moreover, today’s production machines and systems are often built out of mechatronic components in order to modularize them and therefore to allow a higher degree of reuse and reconfigurability. This is a very important point which has to be taken into account in order to fulfil today’s changing customer demands [1]–[4]. Associated with this trend is an increase in the complexity of such solutions and approaches. One major factor is the usage of a large number of (control) devices, such as Programmable Logic Controllers (PLC), Industrial PCs (IPC) and Embedded Controllers, along with sensors and actuators from different vendors, in order to manage and control the industrial plants.

T. Strasser is with the Energy Department of AIT Austrian Institute of Technology, 1210 Vienna, Austria (e-mail: [email protected]). R. Froschauer is with AlpinaTec GmbH, 5203 Köstendorf, Austria (e-mail: [email protected]). Manuscript received April 28, 2011; revised September 30, 2011; accepted January 20, 2011.

The installed automation devices, which are part of the mechatronic components, are often connected by heterogeneous communication networks, and the (control) applications are usually distributed and executed across these networks. Such a concept is called a distributed Industrial Process Measurement and Control System (IPMCS) in the literature [5]. Interoperability, (re-)configurability and portability requirements are difficult to achieve in such complex IPMCS [5]–[8]. Moreover, the automation market has raised a demand for “zero-downtime” operation in IPMCS [7], [9]. Appropriate methods for (re-)configuring and recovering automation devices and distributed applications, as well as their corresponding services and functions, during full operation are needed.

Current IPMCS already provide rudimentary services for the recovery of faulty or replaced (control) devices after the detection of failures. Some IPMCS support recovery of applications, parameters and configuration data after loss of such information. A major shortcoming of state-of-the-art IPMCS results from their proprietary recovery approaches, which do not support heterogeneous communication networks. Moreover, standard-compliant implementations of recovery features are missing [10].

Considering the requirements and shortcomings of present-day IPMCS briefly described above, the IEC 61499 standard [11] may provide an appropriate solution. It features a scalable and open reference architecture to model automation and control applications for distributed IPMCS. Furthermore, it supports interoperability, (re-)configurability and portability of distributed automation and control applications. In addition, it provides a device management interface and therefore delivers the basis for dynamic (re-)configuration and recovery of automation and control applications, also in heterogeneous (communication) environments.

A. Contributions of this Article

The main purpose of this article is to present and discuss a general concept and an architecture for avoiding unwanted downtimes to a maximum extent in distributed IPMCS. The proposed concept facilitates the exchange of hardware components (i.e., automation and control devices) without any need for extra configuration. Its purpose is to enable a kind of “Plug & Work” for IPMCS components. Current approaches for device recovery services in the field of decentralized and distributed control systems for industrial purposes are considered and analyzed with regard to their shortcomings. On the basis of these shortcomings a new distributed approach to device replacement with reduced downtime is presented. Moreover, a mapping of this concept to IEC 61499 for distributed automation systems is presented. It supports an automatic recognition of new devices in a distributed IPMCS and an automatic deployment of control applications, device parameters, user data, device configurations and states. In order to autonomously recognize new devices, and for bi-directional transfer of data and information, a standardized remote-configuration mechanism is required. The IEC 61499 architecture supports remote configuration by design. This enables building runtime systems and applications that support online recovery and reconfiguration. This article also discusses a prototypical implementation of this approach on the basis of IEC 61499 function blocks. Furthermore, the prototype was tested using an Ethernet-interconnected distributed demonstration system on the basis of the well-known IEC 61499-compliant Function Block Development Kit (FBDK) [6].

B. Organisation of this Article

Related work and useful concepts for device recovery are presented in section II. Different solutions are briefly explained, especially for control and field devices, which form the basic infrastructure of distributed IPMCS, with a special focus on their recovery features. The basic idea of the proposed recovery concept is discussed in section III. Moreover, section IV gives an overview of the IEC 61499 standard, its configuration management interface and its capabilities to enable a standardized way to implement automatic application and device recovery. In addition, the mapping of the recovery concept to IEC 61499 elements, the recovery architecture and the related algorithms are presented in section V. Section VI presents a prototypical implementation, whereas section VII explains the performed experiments. A summary of the outcomes of the proposed approach and conclusions are provided in section VIII.

II. RELATED WORK

This section provides a brief overview of important developments in industrial automation. In addition, concepts for device recovery and their advantages and disadvantages in centralized, decentralized and distributed IPMCS are addressed.

A. From Centralized to Distributed Automation Systems

The most common topology of IPMCS is the usage of centralized PLCs controlling simple and complex machines or even whole production facilities. Every sensor and actuator is directly connected to this single point of intelligence and a more or less modularized control program drives the process. A major advantage of PLCs is the relatively easy location of faulty (field) devices, because each Input/Output (I/O) connection is assigned to a defined position. The big disadvantage of PLCs is that each device needs its own electrical connection to the PLC, which is vulnerable to failures. Moreover, a central PLC itself is a single point of failure since it is responsible for the execution of the whole control application [5].

The next step in the evolution of IPMCS was the introduction of decentralized architectures, which use decentralized I/O modules to reduce the number of direct connections to the PLC. These modules are connected via field bus systems [12], such as PROFIBUS, INTERBUS, DeviceNet, etc., and nowadays via Industrial Ethernet [13]–[15] (e.g., PROFINET I/O, EtherCAT, Ethernet POWERLINK, etc.). The major problem again is the dependence on the central PLC [5], [16].

In order to cope with these shortcomings modern IPMCS are designed to work in a distributed way. The main goal of such systems is to distribute the intelligence to more than one device in order to increase reliability and performance. In most cases the devices are connected via a flat communication architecture (e.g., CAN, Ethernet, Industrial Ethernet, etc.), which enables the implementation of user-specific hierarchies, like the well-known Server/Client principle [5], [7], [9], [16].

B. Recovery of Device Configuration

The replication and recovery of data is essential in order to guarantee the operation of decentralized and distributed IPMCS in case of failures. In order to be able to restore control applications, configuration data (e.g., device parameters, etc.) and state information, relevant device data have to be replicated in advance. The possibilities reported in the literature can be divided into three categories [10], [16], [17]:
• Type of Storage: network-based or central/local storage,
• Type of Information/Data which shall be recovered: applications/programs, states, configuration data, addresses, user data, etc., and
• Automatic or User-driven Recovery: recovery of information/data with or without user interaction and/or usage of special software tools.

The following existing approaches, methods and concepts of device recovery services for IPMCS are discussed in detail:

1) Centralized/Local Systems: They are characterized by not using a communication network to export any kind of data to an external data storage. Hence the valuable data has to be stored in an appropriate non-volatile memory. In modern automation systems PLCs with PC-conforming interfaces and/or IPCs are becoming very common. Therefore, cheap and standardized storage media, such as CompactFlash cards or USB sticks, are used as backup memories. These systems usually only support application program and configuration recovery. Current centralized IPMCS do not support any kind of state recovery, although a need for a new kind of system supporting these features can be felt [16].

2) Decentralized Systems: In decentralized systems the devices are often connected via master/slave-based field bus systems and nowadays also with Industrial Ethernet approaches [10], [16], [17]. The slaves are running their own control program and the master is responsible for the communication between the slaves. In a very common approach the master stores the configuration parameters of its slaves, and in case of recovery (e.g., after a failure) the user can download the last known parameter set to a replaced slave. In most cases the master cannot automatically assign an address which belongs to a physical position in the system. Therefore, the user is responsible for telling the master where the replaced slave is located and which kind of process data is related to this slave.

STRASSER AND FROSCHAUER: AUTONOMOUS APPLICATION RECOVERY IN DISTRIBUTED INTELLIGENT AUTOMATION AND CONTROL SYSTEMS

The proprietary approaches of configuration and data recovery are briefly explained in Table I [16].

TABLE I
BRIEF OVERVIEW OF RECOVERY FEATURES OF IMPORTANT DECENTRALIZED SYSTEMS (I.E., FIELD BUS SYSTEMS) [16]

Decentralized System | Provided Recovery Feature(s)
PROFIBUS [18], [19] | manual address recovery; device recovery information (including user data) is stored at the engineering host (i.e., General Station Description (GSD) file) and the master device; amount of recovery information is limited by the memory size of the master device
PROFINET I/O [13], [20] | master-controlled master/slave system; offers certain recovery features (i.e., recovery of IP addresses, I/O configurations and symbolic names); a slave can be assigned to one or more master controls to achieve a kind of redundancy
DeviceNet [21]–[23] | CAN-based, message-oriented producer/consumer communication model, which supports Automatic Device Replacement (ADR); usage of a scanner device which scans the network and stores configuration information of every known device; scanner transfers stored configuration data to new (recovered) devices; due to the limited memory of the scanner this approach is not suitable for program and/or state recovery; furthermore the specific problem of two faulty devices of the same type is not covered
INTERBUS [12], [24], [25] | due to the logical ring structure of the INTERBUS system a faulty device can be detected very easily; after the replacement of a faulty device the new device automatically gets its address and receives its configuration from the master device

In summary, several decentralized systems support the recovery of device configurations, a small amount of user data and in one case the physical device address. DeviceNet and PROFINET I/O both use a kind of master device or scanner which is responsible for recovery. Furthermore, these systems support continuous operation of the other system parts during recovery. PROFIBUS and INTERBUS are more or less traditional systems which only support the recovery of configuration data, whereas INTERBUS additionally supports physical address recovery due to its ring structure.

3) Distributed Systems: With special regard to the replacement and recovery of a faulty device during operation, some vendors have developed different distributed approaches, which are briefly outlined in Table II. Such concepts for the automatic recovery of faulty devices work differently, whereas the storage of configuration data is done similarly in most approaches. In the case of PROFINET CBA [26], recovery mechanisms that store configuration data on a non-volatile memory or on specific controllers are supported. DeviceNet stores information and data in a scanner device with a very limited amount of memory. Both systems support the recovery of network-configuration data, parameters and programs in their corresponding communication environments (i.e., Ethernet/PROFINET or DeviceNet). A recovery across heterogeneous communication networks, which are often used in industrial applications, is currently not supported.

TABLE II
BRIEF OVERVIEW OF RECOVERY FEATURES OF IMPORTANT DISTRIBUTED SYSTEMS (I.E., FIELD BUS SYSTEMS) [16]

Distributed System | Provided Recovery Feature(s)
DeviceNet with DeviceLogix [22] | similar to DeviceNet; supports special field devices, which are programmable; the usage of function blocks enables the recovery of applications, if there is enough memory in the scanner
PROFINET Component-Based Automation (CBA) [13], [20], [26] | Option 1: similar to some centralized approaches, MultiMediaCards/Micro Memory Cards (MMC) are used to store programs and parameter sets; the Link Layer Discovery Protocol (LLDP) is able to detect faulty and new devices; the new device gets all recovery information from the data stored on the MMC. Option 2: specific field devices and controllers with the function “Replace device without interchangeable medium” can be used without the need for an MMC; the replaced field device receives the parameter sets from the corresponding controller

4) Missing Features: Considering the concepts and approaches discussed above, there are still some features which are not or only partly supported by present-day systems. Basically, in most cases the user is responsible for setting up appropriate network addresses. Therefore, a way of automatic address assignment is missing and needs to be developed, whereas commonly used technologies like the Dynamic Host Configuration Protocol (DHCP) might be considered for this. Another issue arises from typical setups of current IPMCS, which mostly feature heterogeneous devices and networks. Consequently, a control program recovery is not or hardly possible, due to the lack of a defined and device-independent application and communication interface. Furthermore, none of the approaches described before supports any kind of state recovery or fully automated transfer of applications or configuration data. Modern automation and control systems have to execute their tasks within defined real-time constraints and therefore the recovery of faulty devices should also be done within at least soft real-time constraints. The last issues in particular cannot be fulfilled by any present-day approach, which raises the demand for a completely new kind of automation and control system. An overview of the most important shortcomings and missing features of current systems is given in Table III [10], [16].

TABLE III
MISSING RECOVERY FEATURES OF CURRENT AUTOMATION SYSTEMS [16]

Missing Feature | Reason
Automatic address recovery | A device-independent communication interface and standardized device identification are missing
Automatic application and configuration recovery | A device-independent application and communication interface for recovery in heterogeneous communication networks is missing
State recovery | An appropriate (standard-compliant) representation and execution model for state recovery in a distributed IPMCS is missing
Real-time execution | Real-time capable hardware and software are missing, and a complete engineering model is missing too

III. APPLICATION AND DEVICE RECOVERY CONCEPT

This section provides an overview of the idea behind the recovery of applications and device configurations in case of failures. Moreover, a recovery concept for distributed IPMCS in modular manufacturing systems and machines is introduced, which allows a Plug & Work approach to be realized and therefore minimizes unplanned production downtimes. In addition, key requirements for the realization and implementation of this concept are discussed.

Fig. 1. Recovery scenario: exchange of a faulty control device. (Control Devices 1 to n, connected by a heterogeneous field bus system / communication network, execute Control Applications A to E; a damaged or faulty programmable embedded controller is exchanged.)

Fig. 2. Overview Recovery Concept. (Step 1: replication of device configuration, device parameters, application parts and application states among Control Devices 1 to n. Step 2: after replacement of a faulty device, automatic detection and recovery of device configuration, device parameters, application parts and application states.)

A. Recovery Concept in General

The replication and recovery of automation and control applications and device configurations (i.e., parameters, application parts and states, etc.) in case of hardware and software failures¹ is becoming more and more important in automated manufacturing systems composed of mechatronic components. Figure 1 shows a typical scenario for such systems, in which a faulty device (e.g., a mechatronic component with a programmable embedded controller) has to be exchanged with a new one. In order to minimize the downtime of the production equipment, the exchange and ramp-up of the new device should be performed as fast as possible. An ideal procedure would be to remove the faulty device and to plug in the new one (i.e., connect the mechanical and electrical hardware to the rest of the production system), with the configuration and setup of the software done automatically. This approach would lead to Plug & Work components, which are comparable to the well-known Plug & Play technology from the computer technology domain [27].

In order to realize such automatic recovery of automation and control applications as well as complete device configurations, which is necessary to fully achieve the Plug & Work vision, an appropriate control software environment and the corresponding recovery algorithms and services have to be developed. Figure 2 shows the basic idea behind the recovery procedure. The whole configuration of a device (i.e., parameters, states, application parts, etc.) is periodically replicated to other control devices. In case of a device failure (e.g., hardware fault) and after its replacement, the saved data and information from the other devices are transferred back in order to restore the whole device setup. During this procedure no additional configuration work has to be performed by an engineer or a technician.

Fault models and control device failure detection [28], [29] are not directly covered by the presented approach. It focuses only on the replication of applications and device configurations to other devices; in case of a hardware or software failure the configuration and application data are automatically transferred back in order to reduce the unwanted downtime.

¹ Examples for failures are: mechanical breakdown, loss of communication (e.g., broken cable), defective control equipment, etc.

B. Key Application and Device Recovery Requirements

For the realization of the replication and recovery concept briefly introduced above, the following requirements have to be taken into account in addition to the missing features described in section II-B4:
• Open System Architecture: This should allow the extension of existing recovery services if necessary, even during operation of the control devices.
• Platform Independence: The implementation of the recovery concept has to be platform independent and must be portable to different controller environments.
• States Monitoring: A monitoring mechanism for system and device states must be implemented to ease fault diagnosis, replication and recovery. The monitoring should include, but not be limited to, a logging of application data, connections, active tasks, active applications, etc.
• Timing Behavior: Switching between time-critical and uncritical execution of recovery services should be considered. Real-time execution of the recovery services may be necessary in some phases of the recovery process and unnecessary in others.
• Different Communication Networks: Since usually more than one communication approach is used in a complex IPMCS, the replication and recovery of applications and device configurations should be possible across heterogeneous networks.
• Standard-compliant Realization: In order to allow the replication of applications and device configurations as well as their recovery in case of a failure in a heterogeneous communication and hardware environment, the implementation should be compliant with domain standards.

A closer look at the replication and recovery possibilities and services of present-day IPMCS, as described in section II, shows that some of the above presented requirements
can be fulfilled. In principle, missing features, services and functions for the automatic application recovery can also be implemented using state-of-the-art technology like PROFINET CBA, DeviceNet, etc. A recovery solution that is independent of hardware and communication network, which is a key requirement in heterogeneous environments, is hardly realizable with these technologies. Analyzing these shortcomings, IEC 61499 might be considered as a possible alternative solution due to its open system architecture. Therefore, a short introduction of its basic architecture and recovery possibilities is provided below.

IV. IEC 61499 AND ITS BASIC CAPABILITIES FOR AUTOMATIC APPLICATION AND DEVICE RECOVERY

The following section presents details about the IEC 61499 reference model. Moreover, possibilities and services of automatic recovery of faulty devices using IEC 61499 concepts (especially the management interface) are also explained.

A. The IEC 61499 Reference Model

The IEC 61499 standard [11] provides an approach to handle the increased complexity of next generation automation systems as mentioned in the introduction. This IEC reference model has been defined as a methodology for modeling open distributed IPMCS to obtain a vendor-independent system architecture. It serves as a reference architecture that is developed especially for distributed, modular, and flexible control systems and meets the fundamental requirements of open distributed systems as defined e.g., by Christensen [30], [31] or Lewis [5]. The IEC 61499 standard has even more ambitious objectives: They can be described by examining the three issues portability, (re-)configurability, and interoperability (for details see e.g., [5], [6], [8], [32]–[36] and [37]).

The standard defines concepts and models that allow modular control software which is encapsulated in FUNCTION BLOCKS (FB) to be assembled to control applications and later on distributed to (embedded) controller nodes (i.e., called DEVICES in IEC 61499). It specifies an architectural model in a generic way and extends the FB model of its predecessor IEC 61131-3 [38] with an additional event handling mechanism. FBs are an established concept for industrial applications to define robust and reusable software components. They can store the software solution for various problems and they have a defined set of input and output parameters, which can be used to connect them to form complete automation and control applications.

The essential difference between IEC 61499 and IEC 61131-3 [38] is the execution model [39]. IEC 61131-3 has a cyclic execution model for control algorithms, but IEC 61499 is based on events, which means that IEC 61499 also supports asynchronous execution [5]. As a result, distributed IEC 61499 applications and/or application parts can be executed in a synchronous way through time-triggered events but also in an asynchronous way, as discussed e.g., by Strasser et al. [40] or Li Hsien et al. [41].

The underlying communication network or field bus system and its corresponding protocols are not directly covered by the IEC 61499 standard. An IEC 61499 “Compliance Profile for Feasibility Demonstrations”, which was provided by the Holonic Manufacturing
Systems (HMS) consortium, specifies its usage based on Ethernet [6] for distributed IEC 61499 automation and control applications. Nevertheless, other communication networks and field bus concepts (e.g., Ethernet/IP, Ethernet POWERLINK, PROFIBUS, CAN, Modbus/TCP, etc.) can also be integrated in IEC 61499 solutions [15], [42]–[47].

In summary, the most important concepts of the IEC 61499 reference model are the event-driven execution approach, the possibility of distributing control applications to different control devices, the standardized management interface providing a basic set of reconfiguration services and the application-centered modeling methodology. More details about the IEC 61499 management interface and its usage for reconfigurable automation and control are given in [5], [9], [11], [48]–[51] and [8]. The IEC 61499 standard therefore provides an ideal starting base for the architecture of next generation automation and control systems for various domains (e.g., Manufacturing [9], [35], [51], [52], Power and Energy Systems [33], [53], Logistics [54], etc.). Moreover, closed loop control applications realized with IEC 61499 have already been discussed by [55], [56] and [57]. In addition, the usage of IEC 61499 for Lower Level Control (LLC) in a HMS, performing real-time control, is already reported by various authors like Lepuschitz et al. [51] and Vrba et al. [58], to mention a few of them.
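The event-driven execution approach summarized above can be illustrated with a small Python sketch. It is not an IEC 61499 runtime and does not use any API of the standard or of FBDK; the class `FunctionBlock`, the function `run_event_chain` and the block names are hypothetical and chosen only for illustration. Each block runs its algorithm only when an event arrives, and its output event triggers the connected successor blocks, in contrast to a cyclic scan in which every block runs in every cycle.

```python
from collections import deque


class FunctionBlock:
    """Hypothetical minimal function block: runs only when an input event arrives."""

    def __init__(self, name, algorithm):
        self.name = name
        self.algorithm = algorithm  # maps an input data dict to an output data dict
        self.successors = []        # blocks triggered by this block's output event

    def connect(self, other):
        self.successors.append(other)


def run_event_chain(start, data):
    """Dispatch events in FIFO order and return the execution trace."""
    trace = []
    queue = deque([(start, data)])
    while queue:
        fb, inputs = queue.popleft()
        outputs = fb.algorithm(inputs)
        trace.append((fb.name, outputs))
        # The output event carries the output data to every connected block.
        for successor in fb.successors:
            queue.append((successor, outputs))
    return trace


# Illustrative two-block application: scale a raw value, then limit it.
sensor = FunctionBlock("SENSOR", lambda d: {"value": d["raw"] * 2})
limit = FunctionBlock("LIMIT", lambda d: {"value": min(d["value"], 100)})
sensor.connect(limit)

trace = run_event_chain(sensor, {"raw": 72})
```

In this sketch nothing executes until the initial event is injected, which is the property that makes asynchronous execution possible; a time-triggered source event would reproduce cyclic behavior.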

B. Possibilities of Recovery of Faulty Devices and Applications using IEC 61499

For the application and state recovery of faulty devices, standards and guidelines have to be specified. Such rules are especially necessary in heterogeneous communication network environments, where well-defined interfaces are also essential. A minimum set of interfaces which should be applied is listed below [10], [16], [59]–[61]:
• Communication Interface: Defines a method that ensures the communication between devices and applications in heterogeneous networks.
• Application Execution Interface: Defines a kind of hardware abstraction layer for a device-independent representation of applications. Such applications should be executable without the need for recompilation, especially on heterogeneous devices.
• Application Transfer: Defines an interface for the application transfer between devices and engineering tools.
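To make the role of these three interfaces more concrete, the following Python sketch expresses them as abstract base classes with a trivial in-memory implementation. All class and method names (`CommunicationInterface`, `LoopbackNetwork`, `InMemoryDevice`, `export_application`, and so on) and the JSON application format are assumptions made for this illustration; neither the article nor IEC 61499 prescribes them.

```python
import json
from abc import ABC, abstractmethod


class CommunicationInterface(ABC):
    """Moves opaque payloads between devices, hiding the actual network."""

    @abstractmethod
    def send(self, receiver: str, payload: bytes) -> None: ...

    @abstractmethod
    def receive(self, receiver: str) -> bytes: ...


class ApplicationExecutionInterface(ABC):
    """Accepts a device-independent application description for execution."""

    @abstractmethod
    def load(self, application: dict) -> None: ...

    @abstractmethod
    def loaded_applications(self) -> list: ...


class ApplicationTransfer(ABC):
    """Serializes applications for transfer between devices and tools."""

    @abstractmethod
    def export_application(self, name: str) -> bytes: ...

    @abstractmethod
    def import_application(self, payload: bytes) -> None: ...


class LoopbackNetwork(CommunicationInterface):
    """Hypothetical stand-in for a real network: one FIFO mailbox per receiver."""

    def __init__(self):
        self._mailboxes = {}

    def send(self, receiver, payload):
        self._mailboxes.setdefault(receiver, []).append(payload)

    def receive(self, receiver):
        return self._mailboxes[receiver].pop(0)


class InMemoryDevice(ApplicationExecutionInterface, ApplicationTransfer):
    """Hypothetical device that stores applications as plain dictionaries."""

    def __init__(self):
        self._apps = {}

    def load(self, application):
        self._apps[application["name"]] = application

    def loaded_applications(self):
        return sorted(self._apps)

    def export_application(self, name):
        return json.dumps(self._apps[name]).encode("utf-8")

    def import_application(self, payload):
        self.load(json.loads(payload.decode("utf-8")))


network = LoopbackNetwork()
old_device, new_device = InMemoryDevice(), InMemoryDevice()
old_device.load({"name": "conveyor", "function_blocks": ["SENSOR", "LIMIT"]})

# Transfer the application description to the replacement device.
network.send("new_device", old_device.export_application("conveyor"))
new_device.import_application(network.receive("new_device"))
```

Because the application travels as a device-independent description rather than compiled code, the receiving device can load it without recompilation, which is the property the Application Execution Interface asks for.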

In the IEC 61499 standard no rules or guidelines for the implementation of a recovery system are given but the standard defines very interesting concepts and methods which can be used for the above introduced interfaces [5]. The abstract and of course hardware-independent representation of devices, applications and FBs is fundamental for a heterogeneous concept of application and device recovery. On the basis of the shortcomings and missing features of current approaches— as described in section II-B4—the following list provides a brief overview of possibilities using the IEC 61499 reference model to implement an automatic application recovery of faulty devices in heterogeneous communication networks [16]:


• Automatic Address Recovery: The IEC 61499 standard does not define how to assign unique identification addresses to devices, but it defines a hardware-independent concept of representing information. This concept may be used to implement a device identification vector, which is used to identify new devices and assign addresses to them. On the basis of this identification a distributed automation and control system would be able to restore the address of a replaced faulty device by checking the type of the faulty and of the replacement device.
• Automatic Application and Configuration Recovery: The abstract application interface is used to interpret, transfer and execute applications on different types of devices. Current IEC 61499 runtime systems (e.g., FBDK [6], 4DIAC [40], nxtControl [62], etc.) use a middleware-based approach to reach a mostly hardware-independent interface for the execution of programs. These concepts enable a hardware-independent transfer and therefore restoration of programs and configuration data in heterogeneous networks. Moreover, the usage of IEC 61499 Communication Service Interface FBs allows the encapsulation of different communication protocols in abstract communication patterns (e.g., Server/Client, Publish/Subscribe), which makes the control application more independent of the underlying communication network [15], [43], [45], [47].
• State Recovery: The concept of the abstract application representation can be extended by adding an appropriate method of representing the actual states of program parts (i.e., FBs). Currently, IEC 61499 does not provide a guideline or even a recommendation for this², but due to its openness it would be possible to define a kind of statemap which belongs to the appropriate application description. Using the defined ways of communication, this statemap may be used to recover the states of a faulty device and of the control applications and application parts mapped to it.
• Real-time Execution: Since IEC 61499 does not specify how to implement a runtime system, the implementation of a real-time capable runtime system is possible too. For this purpose the engineering model has to be extended with appropriate concepts for defining and verifying real-time constraints. A very promising approach has already been introduced by Zoitl [9].

Basically, the IEC 61499 reference model opens various possibilities, but with regard to the shortcomings of current approaches and concepts (see section II-B4) it can only overcome all these missing features if some extensions are made. In particular, the configuration management interface has to be extended with additional services for device operations, such as identification and storage of configuration data and states.

Fig. 3. Recovery client/server architecture [16]

Fig. 4. Recovery multi-master concept [16]

V. RECOVERY ARCHITECTURE AND ALGORITHM

This section describes the main concept of communication and interaction between recoverable devices and recovering devices as well as the resulting architecture. Section V-A explains the differences and main functions of these two types of devices. Section V-B introduces the basic algorithm for registering recoverable devices in distributed systems and explains the required processes. Section V-C explains the communication messages in further detail and gives examples of their usage. The different methods of transferring applications and states from and to devices are explained in Section V-D. In Section V-E an idea for recovering states is introduced and illustrated using an example. Finally, Section V-F gives an example of how to implement a complete recovery system.

A. The Multi-Master Concept

Traditional recovery concepts, which are mostly used in storage networks for databases or even control systems, use a client/server architecture as shown in Figure 3 (see also [63]). Every client tries to store its valuable data on a central recovery server, either client- or server-driven. In case of failure the client can restore the application from the server. For this type of system numerous algorithms of replica propagation have been developed (for details see [17]). The big disadvantage of this approach is its vulnerability to failures of the central server. Therefore the multi-master concept, as shown in Figure 4, introduces the ability to integrate more than one recovery server in the network. Because of its changed set of features the server will be called recovery master or master-device from now on. Depending on the requested redundancy the number of master-devices can be increased arbitrarily. This requires new network communication features and a management rule set to determine the execution behavior. The approach described in this article is based on a multiple access network structure where every device has the ability to send messages to and receive messages from every other device (for example Ethernet or CAN). Each of these network technologies supports a kind of multi-cast communication, which is used instead of, or in addition to, numerous direct point-to-point connections [10], [16].

Footnote 2: This issue could be easily described in an IEC 61499 compliance profile.

B. Registration and Recovery Procedure

The device registration process describes the communication between master-devices and slave-devices in the network and how to assign new slave-devices to one or more corresponding master-devices that take care of them in case of a recovery. The replica placement strategy used comes close to the well-known “client-based replica” or “pull approach”, whereas some aspects are similar to the “push approach” too (for details see [17]). The application process describes how a slave application is queried, stored and transferred back to the slave-device by one or more master-devices.

With regard to a software-based implementation the Master-Device (MD) and the Slave-Device (SD) can be represented by a master component and a slave component. The term master-device is therefore equivalent to the term “device with master component”, where the functionality is provided by an additional software component, such as a master application. Similarly, the slave-device can also be called “device with slave component”, where the slave component is supposed to be a part of each standard device. Furthermore the whole set of devices may also be called Recovery Group (RG) (see Figure 5), because this group of devices contains all necessary participants for a system recovery.

Fig. 5. Definition of a recovery group [10], [16]

The RG approach can be divided into two main operational sequences:
• Device Registration Process (DRP): It describes the communication between MDs and SDs in the network and how to assign new SDs to one or more corresponding MDs responsible for them in case of a recovery. The replica placement strategy used comes close to the well-known “client-based replica” or “pull approach”, whereas some aspects are similar to the “push approach”.
• Application Query and Transfer Process (AQTP): This process describes how a slave application is queried, stored and transferred back to the SD by one or more MDs.

The devices use a pre-defined set of messages to communicate with each other.
These messages are called
• Master Register request (MR),
• Slave Register request (SR),
• Slave Update request (SU),
• Slave Recovery request (SR), and
• Slave Delete request (SD)
and are used by several system processes [10], [16]. These processes and the associated algorithms are described in the following subsections.

1) Master-Component Registration Algorithm: The master-component registration process describes how devices with master-components (i.e., MDs) communicate with other MDs, especially when a new MD is added to the network. Right after the start-up of all MDs there is no organization between the devices and no device knows whether it should respond to a message or not. The main goal is therefore that the MDs automatically build a hierarchy among themselves by using self-generated random numbers, also called master-IDs. This master-ID is generated by every MD right after the initialization of the network interface for multi-cast capable communication. The master-ID can be any positive integer from 1 to n, where n depends on the highest possible integer number of the specific device. After this step a master register request is sent to the other members of the recovery group (i.e., via multi-cast). Every other MD receives this message and adds the information about the new device to its own master-list. The message contains the physical address of the new MD, such as an IP address in an Ethernet-based network, and its master-ID. If the message contains information about an MD which is already in the list, its information is updated. Each MD performs these steps, which are also shown in Figure 6, and can therefore check whether its own master-ID is unique or not. In case an MD detects that its master-ID exists twice, it changes its own master-ID to a new random number and sends a new master register request message. This process may be repeated until every master-ID is unique. With regard to deterministic execution the master-list may be implemented statically, because otherwise the list might grow too large when too many devices are added. In that case the size of the list has to be defined before start-up. If an MD receives a message with a master-ID equal to −1, the sending device is removed from the receiver’s list [10], [16].

Fig. 6. Master registration sequence [10], [16]
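The master-component registration steps described above can be illustrated with a minimal Python sketch. It is not the implementation from [10], [16]: the class name `MasterDevice`, the bound `max_id` and the tuple-shaped register request are assumptions made for illustration, and the network transport is left out entirely. One deliberate shortcut is labeled in the code: after a clash the new ID is drawn so that it avoids all IDs already known, instead of relying only on repeated announcements.

```python
import random
from dataclasses import dataclass, field

REMOVE_ID = -1  # master-ID value that requests removal from every master-list


@dataclass
class MasterDevice:
    """Hypothetical model of a device with master component (MD)."""
    address: str                      # physical address, e.g. "192.168.1.1:2000"
    max_id: int = 1000                # assumed upper bound n for master-IDs
    rng: random.Random = field(default_factory=random.Random)
    master_id: int = 0                # 0 = no ID drawn yet
    master_list: dict = field(default_factory=dict)  # address -> master-ID

    def startup(self):
        """Draw a random master-ID and return the register request to multi-cast."""
        self.master_id = self.rng.randint(1, self.max_id)
        return (self.address, self.master_id)

    def on_master_register(self, sender, sender_id):
        """Process a master register request.

        Returns a new register request if the own master-ID clashed,
        otherwise None.
        """
        if sender == self.address:
            return None                        # own message, nothing to do
        if sender_id == REMOVE_ID:
            self.master_list.pop(sender, None)  # sender leaves the recovery group
            return None
        self.master_list[sender] = sender_id   # add new entry or update existing one
        if sender_id > 0 and sender_id == self.master_id:
            # Duplicate ID: draw again. Shortcut (assumption): skip IDs already known.
            taken = set(self.master_list.values())
            while self.master_id in taken:
                self.master_id = self.rng.randint(1, self.max_id)
            return (self.address, self.master_id)
        return None
```

In this sketch the caller is expected to multi-cast every request returned by `startup` or `on_master_register` to the recovery group, so that all master-lists converge to the same content.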

Fig. 7. Slave registration sequence [10], [16]

Fig. 8. Slave recovery sequence [10], [16]

2) Slave-Component Registration Algorithm: After the master-component registration process is completed, the devices with slave-components (i.e., SDs) need to register themselves at the RG to establish the relationship between SD and MD. They therefore start sending registration messages with the purpose of being assigned to one or more MDs responsible for them in the case of failure. The main part of accepting SDs is done by the MDs, very similar to the master-component registration process. The slave-component registration process works on the basis of lists containing information about registered SDs in each MD. Contrary to the master-list, which contains the same data in each MD, the slave-list contains only those devices which should be served by the specific MD. In case of a slave register request a hierarchy is derived from the master-ID of each MD. Depending on the specific position of each MD in this hierarchy and on the requested amount of redundancy, the information of an SD will only be kept by those MDs which have the highest master-IDs. The algorithm (shown in Figure 7) assumes that the RG contains enough MDs to cope with all the redundancy requests and that the MDs are registered with each other. Furthermore each MD has to listen and wait for incoming messages [10], [16].

If a device wants to register for a recovery, it has to publish a slave register request message which contains its network address, its requested redundancy and an application identifier. Every MD receives this message and checks whether it is allowed to accept the SD registration or not. On the basis of the master-list and the master-IDs the MD can determine its own position within the hierarchy, and if it is between “highest position” and “highest position minus redundancy” it is allowed to accept the SD. The redundancy therefore determines the number of MDs accepting a new SD. After accepting an SD the master-ID is automatically lowered by a defined value or set to a random number between zero and the old ID. The change of the ID is published to the RG using an extra master registration message (this process is similar to the master-component registration process described in section V-B1). If the MD receives a message from an SD which is already in its list, the stored information is deleted. Every slave register request therefore causes a new distribution of the SD information across the whole RG. All stored information about this SD is deleted to protect the system from version conflicts. After a successful registration of an SD the MD tries to retrieve the requested application either from the requesting device or from a local repository [10], [16].

3) Slave-Component Update Algorithm: This mechanism is used to update existing device information for SDs which are already registered at one or more MDs. In case of an application update on an SD, this device may send a slave update request message to the RG, which contains the same information as a slave register request message. In contrast to the register request, which would result in a completely new distribution of the device information, the update request is used to force the MD to update its information, including the stored application data. The RG has no other possibility to detect an application update; this message can therefore be used to keep every application up to date and to prevent the RG from delivering old versions to devices [10], [16].

4) Slave-Component Recovery Algorithm: Following the registration and update processes, the slave recovery request starts the prepared recovery process, as shown in Figure 8. Every MD which has the requesting SD in its list starts to transfer the locally stored application to the requesting device. The multiple connections between the devices are handled by the “first come - first serve” principle. Therefore, no additional algorithm is necessary and the number of MDs transferring an application to an SD can be increased without any need for further configuration. The way in which the application is transferred depends on the implementation and is not covered in further detail in this work.
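The acceptance rule of the slave-component registration algorithm (V-B2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `Master`, the parameter `id_step` (the "defined value" by which the master-ID is lowered) and the lower bound of 1 for the lowered ID are hypothetical choices, and publishing the changed master-ID to the RG is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical MD state relevant for slave registration."""
    address: str
    master_id: int
    master_list: dict = field(default_factory=dict)  # other MDs: address -> master-ID
    slave_list: dict = field(default_factory=dict)   # served SDs: address -> application id

    def on_slave_register(self, slave_addr, app_id, redundancy, id_step=1):
        """Handle a slave register request; return True if this MD accepts the SD."""
        # Every register request redistributes the SD, so stale data is dropped first.
        self.slave_list.pop(slave_addr, None)
        if self.master_id <= 0:
            return False  # a silent master (ID 0) never accepts
        # Position in the hierarchy: number of known MDs with a higher master-ID.
        higher = sum(1 for mid in self.master_list.values() if mid > self.master_id)
        if higher >= redundancy:
            return False  # enough higher-ranked MDs will keep the SD
        self.slave_list[slave_addr] = app_id
        # Lower the own ID so that other MDs are preferred next time.
        # Assumption: never drop to 0, which is reserved for silent masters.
        self.master_id = max(1, self.master_id - id_step)
        return True
```

With three masters holding the IDs 30, 20 and 10 and a requested redundancy of 2, only the two masters with the highest IDs accept the slave in this sketch, which matches the rule that the redundancy determines the number of accepting MDs.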
[10], [16]

5) Slave-Component Delete Algorithm: There are two ways of deleting an SD entry from the slave-list of an MD [10], [16]:
• Normal Device Deletion: A normal device deletion simply deletes a specified device information entry from the slave-list of a specified MD. In a distributed system this kind of deletion should be avoided, because the information of a deleted SD is not taken over by any other MD. This procedure can therefore result in undefined behavior of the RG, due to wrong information about the requested redundancy and the actual number of slave-list entries belonging to a specific MD.
• Redundancy-keeping Device Deletion: The second approach for deleting an SD solves the problem mentioned above and automatically keeps the redundancy level for an application at the requested amount. Using the slave-component registration process, as described in section V-B2, an SD information entry can be deleted without changing the redundancy level of an application or an SD. The MD generates an “alias” slave delete request which is sent to its own message dispatcher. On receiving this message the MD deletes the SD from its list and generates an “alias” slave register request containing the information about the deleted SD. The MD now sets its master-ID to zero and announces this by sending a new master register request. After this the prepared slave register request message is sent to the RG. By temporarily setting its master-ID to zero the deleting MD prevents itself from accepting the SD again, without storing the old device information. The other MDs react to the slave register request and distribute the information as in the slave-component registration process. After this procedure the deleting MD generates a new random master-ID and sends another master register request message to tell the other devices that it is accepting messages again.

Fig. 9. Simple Master-Slave List [16]

6) Multiple Application Recovery on Slave Devices: The approach described in V-B4 assumes that there is only one application per slave. In case of a recovery request every master-device starts to transfer the stored application to the slave-device, and the single connection handler of the slave-device accepts one incoming connection. The other master-devices will get a connection time-out and stop their recovery.
In case of multiple applications which need to be recovered, the missing hierarchy causes problems, because every master-device starts to transfer a different application. Due to the “first come - first serve” principle only one application is recovered (see Figure 9); the others receive a time-out. To adapt the simple approach to multiple application recovery, two extensions are possible [16]:
• Recovery Group Listener on Master-Device: This concept extends the simple master-slave list with an additional slave-list (also called passive slave-list) for each MD (see Figure 10). Every time a master-device receives a slave-based message it determines whether it is responsible for this request or not by looking up the slave information in its slave-list or adding it to the list. If the device is not responsible the message is ignored. The recovery group listener determines which master-devices are accepting this message by using the master-ID. The ignored message is stored in extra slave-lists which belong to the other master-devices in the recovery group. This leads to a different way of reacting to recovery messages. In case of an incoming recovery message the master-device has to check whether it is responsible or not. If the recovery request is accepted it has to determine which other master-devices are responsible for this device using the additional slave-lists. By using the master-ID every master-device can build its own hierarchy to find out in which order the master-devices should transfer applications to a slave-device.
• Multiple Connection Handler on Slave-Device: This extra component manages multiple connections on the slave-device and is called “Multiple Recovery Connection Manager” (MRCM). Unlike the normal “Recovery Connection Manager” (RCM), which allows only one connection, this component accepts one connection per application. The MRCM multiplexes the multiple connections onto the single connection allowed by the RCM. Therefore no change is needed in the master-device, because the multiple connections caused by each master-device responsible for a recovery message are handled by the slave-device. The additional component increases the memory requirements on the slave-device.

Fig. 10. Passive Slave List Concept [16]

Depending on the different types of slave-devices and the number of applications which need to be recovered, one of the approaches described above may be appropriate. In most cases the recovery group listener will be the preferable solution because of the low memory requirements on the slave-device.

C. Communication Messages in Detail

The messages are designed to be as short as possible to save network traffic.
Using a packet-based communication protocol, such as UDP multi-cast, it should be possible to transfer all needed information within one datagram. This ensures a fast response and good overall performance of the system. Additionally, short messages make it easier to adapt this concept to different industrial fieldbus systems.

1) Common Message Structure: The following data is contained in every message sent from or to the recovery group. The common parameters of each message are [16]:
• Message Identifier: Defines the type of a message.
• Device Identifier: This identifier specifies a unique identification of a device. A device shall be characterized through a set of parameters which can be used to generate a unique hardware identification. Especially in case of recovery it is necessary to determine whether a new device has replaced an old one and whether the stored application is executable on the new device. Therefore device classes and compatibility profiles must be designed to ensure that the recovery group can identify new devices. In Ethernet-based networks a static IP address might be used for identification. Based upon this information any device must be able to establish a data connection to this device.

Fig. 11. Ethernet master (a) and slave (b) registration message [16]

2) Special Message Parameters: The following special message parameters are considered:
• Application Identifier (slave-based communication): This identifier specifies which application shall be recovered. In case the device has several sub-devices (such as several instances of an application or resources in IEC 61499) this tag must also contain information about the location of the application on the device. Based upon this any device must be able to find an application on the device.
• Redundancy (slave-based communication): Using this information field a slave device can tell the recovery group which level of redundancy is desired for the application or device. The redundancy is expressed as an integer number greater than zero. Additionally a redundancy value equal to zero can be used to express the request to remove the slave device from the whole recovery group, which means from every slave-list.
• Master-ID (master-based communication): The master-ID is used to build a hierarchy of master devices. Under common circumstances the master-IDs are within the range from 1 to n (where n is limited by the smallest integer representation used by a master-device in the network). A master-ID equal to zero indicates a silent master which is currently not able to react to incoming messages or is executing a redundancy-keeping slave delete.
Additionally a master-ID with a value of -1 can be used to express the request to remove the master device from the recovery group, which means from every master-list.

3) Sample Message Structure on the Basis of Ethernet: Figure 11a shows the master registration message for an Ethernet-based network, whereas Figure 11b shows the slave registration message.

D. Application Query and Transfer based on IEC 61499

After explaining the methods of managing distributed devices to recover applications, this section explains how an application can be transferred from or to a device.

1) Step by Step Application Transfer: This method uses single commands for transferring an application from or to the device. After each request the communication partner must send a response to ensure that the last requested action was processed without error. In case of an error the response contains an appropriate error code. When transferring an application from a master to a slave device the commands are generated from an XML-based application description. The basic communication model is shown in Figure 12.

Fig. 12. Step by step application transfer [16]

As described in section IV, IEC 61499 defines several commands to communicate with remote devices. An application transfer is done similarly to an IEC 61499 development tool such as the FBDK [6] and typically uses the commands CREATE, WRITE and QUERY. The commands are sent and received in a specified XML syntax which is described in [11] and [6]. An example command is given in Listing 1.

Listing 1. IEC 61499 XML-syntax for application transfer [6], [11]

The main disadvantage of this method is the high network traffic caused by the large number of response messages in case of a full program transfer. Furthermore this kind of transfer can take a long time due to the delays caused by processing all the response messages.

2) Burst Application Transfer: This approach enhances the method described in V-D1 by omitting the response messages. The remote device receives a continuous stream of XML commands, caches them and executes them after receipt. Due to the ascending numbering of the requests the remote device can determine whether it has received every request or not. The burst mode is initialized with a special start message and terminated with an optional stop message, each of which contains information about the length of the script. Using this information the receiving device can determine whether it has enough cache memory to store the script or not. This method considerably improves the speed of transmission and lowers the network traffic. The approach extends the specification of the IEC 61499 “Compliance Profile for Feasibility Demonstrations” (for details see [6]) by modifying the general management interface.

3) File-based Application Transfer: The approaches described above all send XML commands, either step by step or as a complete stream, using the XML syntax defined in [11]. The “file-based application transfer” goes a step further: it transfers the whole XML description of an application directly to the other device. The receiving device has its own small XML parser to parse the file and generate the commands and scripts which are necessary to create the requested application. The locally generated commands can then be executed directly on the device through the local device manager. Furthermore the handling of errors is done on the remote device, which saves a lot of computing power on the programming device. The master devices are only responsible for distributing and storing the XML-based application description. Using the IEC 61499 device manager each slave device can program itself by parsing the XML file. Additionally, the XML file can be compressed using appropriate embedded compression algorithms to reduce the network traffic. Furthermore the checksum of the compressed file can be used to identify different versions of an application or to determine whether the application has changed or not. The main advantage of this approach is its universality, because giving the remote device its own XML parser opens the door to numerous features, such as XML-based logging. The main disadvantage is the increased memory consumption due to the local XML engine and its features.
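Two of the ideas above, the completeness check of the burst mode (V-D2) and the compression and checksum of the file-based transfer (V-D3), can be sketched in a few lines of Python. The function names are hypothetical, `zlib` and SHA-256 merely stand in for the unspecified "embedded compression algorithms" and checksum, and the XML snippet in the usage example is an invented placeholder rather than valid IEC 61499 syntax.

```python
import hashlib
import zlib


def missing_requests(received_ids, script_length):
    """Burst mode: return the request IDs in 1..script_length that never arrived."""
    return sorted(set(range(1, script_length + 1)) - set(received_ids))


def pack_application(xml_text):
    """Compress an XML application description; return (payload, checksum)."""
    payload = zlib.compress(xml_text.encode("utf-8"))
    return payload, hashlib.sha256(payload).hexdigest()


def has_changed(stored_checksum, payload):
    """Version check: compare the checksum of a payload with the stored one."""
    return hashlib.sha256(payload).hexdigest() != stored_checksum


def unpack_application(payload, checksum):
    """Verify and decompress a received application description."""
    if hashlib.sha256(payload).hexdigest() != checksum:
        raise ValueError("checksum mismatch, transfer must be repeated")
    return zlib.decompress(payload).decode("utf-8")


if __name__ == "__main__":
    xml_v1 = "<Application Name='demo'><FB Name='F1'/></Application>"  # placeholder XML
    payload, checksum = pack_application(xml_v1)
    print(unpack_application(payload, checksum) == xml_v1)
    print(missing_requests([1, 2, 4], 5))
```

A master device in this sketch would store `payload` and `checksum` per slave and use `has_changed` when a slave update request arrives, so that an unchanged application does not have to be queried again.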

E. State and Data Transfer

In order to enable the backup and restore of states and variables the presented approach assumes that the devices are capable of querying their own states and variables and of storing them in a file or any other defined format. Therefore a standardized syntax is needed to store the states and data in a reusable way. Reference [11] presents Document Type Definitions (DTDs) to exchange IEC 61499 library elements between software tools. Following these DTDs, this approach defines a new DTD to express the states and variables of an application with reference to the XML description of the application. Basically the new DTD adds an additional attribute expressing the actual state and omits unnecessary elements such as comments or type declarations. Listing 2 shows the most important part of this new DTD; the full DTD can be found in [16].

Listing 2. DTD description for state recovery [16]

Fig. 13. Resulting Recovery System Overview [16]

According to the DTD in Listing 2, an XML file containing all states and input/output values can be generated, which is also called a state-map. In case of recovery this state-map and the corresponding application description are used to recover the last valid state on a device. The update of the state-map is triggered by a state transition, whereas it depends on the application whether every transition causes an update or not. The transitions are executed within every Basic FB (BFB) which has an internal state machine, or within every Service Interface FB (SIFB, see [11]). A device can store the actual state or the currently active sequence of FBs using the state field. If there is no active state or sequence, “inactive” is stored. In case of recovery this information can be used to set an FB into the same state as before the recovery. The connected inputs and outputs of an FB can be restored by adding two fields to the representation of each connection. These fields contain the actual value of the source output and of the destination input. Using this state-map together with the transfer concept described in V-D3 leads to a simple and effective approach for transferring and thus recovering applications and their last known states and variables. The XML format has the main disadvantage of an increased overhead for storing states and values, but because the networks are heterogeneous a platform-independent method is preferred [16].
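To make the state-map idea more concrete, the following Python sketch writes and reads such a file with the standard `xml.etree.ElementTree` module. Since the DTD of [16] is not reproduced here, all element and attribute names (`StateMap`, `FB`, `Connection`, `State`, `SourceValue`, `DestinationValue`) are assumptions chosen for illustration and do not claim to match the actual DTD.

```python
import xml.etree.ElementTree as ET


def build_state_map(app_name, fb_states, connections):
    """Serialize FB states and connection values into a state-map XML string.

    fb_states:   dict FB name -> state name (None means no active state)
    connections: list of dicts with the keys source, destination,
                 source_value and destination_value
    """
    root = ET.Element("StateMap", Application=app_name)
    for name, state in fb_states.items():
        # "inactive" is stored when there is no active state or sequence
        ET.SubElement(root, "FB", Name=name, State=state or "inactive")
    for con in connections:
        ET.SubElement(
            root, "Connection",
            Source=con["source"], Destination=con["destination"],
            SourceValue=str(con["source_value"]),
            DestinationValue=str(con["destination_value"]),
        )
    return ET.tostring(root, encoding="unicode")


def read_fb_states(state_map_xml):
    """Return the stored state of each FB, e.g. to restore it after a recovery."""
    root = ET.fromstring(state_map_xml)
    return {fb.get("Name"): fb.get("State") for fb in root.iter("FB")}
```

A device could call `build_state_map` on selected state transitions and hand the result to its master devices; after a replacement, `read_fb_states` yields the states that have to be set once the application itself has been restored.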

F. Resulting System Architecture

The approaches for transferring applications described in this section are strongly based on the capabilities of IEC 61499 and its XML interface, whereas the master/slave registration can be implemented in any programming language. The application transfer uses the XML interface and extends it where necessary (management interface and its commands). For state and variable recovery the basic runtime system must be extended with the functionality of saving states and variables to memory or to disk. The master/slave registration uses multi-cast communication and an implementation-independent concept for managing all devices. The main concept outlined in Figure 13 therefore represents a holistic approach which can be fully integrated into the IEC 61499 specification.


Fig. 14. Target program collector function block [16]. [Figure: interface of the TargetPrgmCltr FB (type QueryXML) with events INIT, REQ, INITO, CNF and data ports QI, DEST, APPNAME, QO, XML, STATUS.]

VI. IMPLEMENTED IEC 61499-COMPLIANT RECOVERY SOFTWARE FRAMEWORK

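Fig. 15. Replication master function block [10], [16]. [Figure: interface of the ReplicationMaster FB with events INIT, LISTEN, DEL, ID, INITO, RECOVER, DISCOVER, CNFID and data ports QI, QO, LCLAPP, LCLID, DeltaID, MSTCACHE, SLVCACHE, Automatic, DEST, MASTERID, MSTDELTA, RMTAPP, MSTLIST, SLVLIST.]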

The concept and architecture described in sections III and V formed the basis for the development of a prototype IEC 61499 FB library. This library contains FBs for basic communication (e.g., Replication Master and Replication Slave) as well as several FBs for transferring, storing and parsing a control application.

Fig. 16. Replication slave function block [10], [16]. [Figure: interface of the ReplicationSlave FB with event inputs REGISTER, RECOVER, UPDATE, event outputs REGOUT, RECOUT, UPDOUT and data ports QI, MGR_ID, REDUNDANCY, APPNAME, QO, STATUS.]

A. Command Execution Engine

The Command Execution Engine (CEE) is an extension to the standard IEC 61499 management interface built from several new FBs. It is responsible for executing management commands on any device, including its own. Furthermore, it generates the XML-based management commands from an application description. This application description can either be queried directly from a device or be read from an XML file. Using these features, an application can be queried, stored and transferred back to a slave-device. The following list provides an overview of the IEC 61499 FBs which make up the CEE [16]:

• Embedded XML Parser: This FB parses a given character string into a defined script syntax. The string must be in the IEC 61499 XML format and must contain an application description for at least one device. The script syntax is equivalent to the XML management command structure of IEC 61499. On the basis of this structure, all management commands necessary for transferring an application from a master to a slave device can be created. To keep execution predictable, a dynamically sized list cannot be used for storing this script. Therefore, the script is stored in an internal array whose size is set at the initialization of the block.
• Program Storage: This FB stores multiple applications in script syntax. The scripted format consumes less memory than the XML format. After a new slave-device has been discovered and its application has been retrieved, this FB stores the application until a slave-device requests a recovery. The current implementation stores every application in the memory of the device. This concept may be extended by a file-oriented storage of scripts.
• Target Program Collector: This FB queries an application from a remote device and provides the response as an XML string. This string may be an application description and may be used to generate an application script. Figure 14 shows the FB interface.
• Request Generator: This block generates an IEC 61499-conforming XML management command from four arguments.
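To make the fixed-size script storage and the command generation more concrete, the following Python sketch parses a small application description into script entries and turns each entry into a management command. It is an illustration under assumptions, not the XSAX or RQST_GEN FB: the sample application, the class name ScriptParser, the function make_request and the tuple layout of a script entry (request type, object type, two data fields) are hypothetical. The generated command (a Request element with ID and Action attributes wrapping an FB or Connection element) follows the usual form of IEC 61499 management commands and should be checked against the standard [11].

```python
import xml.etree.ElementTree as ET


class ScriptParser:
    """Parses an application description into a fixed-capacity script."""

    def __init__(self, cache_size):
        # Allocated once at initialization; the array never grows afterwards.
        self.script = [None] * cache_size
        self.length = 0

    def _append(self, entry):
        if self.length >= len(self.script):
            return False  # script does not fit into the preallocated array
        self.script[self.length] = entry
        self.length += 1
        return True

    def parse(self, xml_text):
        """Fill the script array; return False if the capacity is exceeded."""
        self.length = 0
        root = ET.fromstring(xml_text)
        for fb in root.iter("FB"):
            if not self._append(("CREATE", "FB", fb.get("Name"), fb.get("Type"))):
                return False
        for con in root.iter("Connection"):
            entry = ("CREATE", "Connection", con.get("Source"), con.get("Destination"))
            if not self._append(entry):
                return False
        return True


def make_request(request_id, req_type, obj_type, data1, data2):
    """Build one XML management command from the four script arguments."""
    request = ET.Element("Request", ID=str(request_id), Action=req_type)
    if obj_type == "FB":
        ET.SubElement(request, "FB", Name=data1, Type=data2)
    else:
        ET.SubElement(request, "Connection", Source=data1, Destination=data2)
    return ET.tostring(request, encoding="unicode")


# Hypothetical sample application, not taken from the article.
APP_XML = """
<Resource Name="RES1" Type="EMB_RES">
  <FBNetwork>
    <FB Name="START" Type="E_RESTART"/>
    <FB Name="CYCLE" Type="E_CYCLE"/>
    <EventConnections>
      <Connection Source="START.COLD" Destination="CYCLE.START"/>
    </EventConnections>
  </FBNetwork>
</Resource>
"""

parser = ScriptParser(cache_size=500)
parsed_ok = parser.parse(APP_XML)
commands = [make_request(i + 1, *parser.script[i]) for i in range(parser.length)]

too_small = ScriptParser(cache_size=2)
fits = too_small.parse(APP_XML)  # three entries do not fit into two slots
```

Storing compact tuples instead of the XML text reflects the remark that the scripted format needs less memory; the capacity check mirrors the fixed array size chosen at block initialization.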

B. Distribution Algorithm

The distribution algorithm is also an extension to the standard IEC 61499 management interface and is realized by introducing new FBs. These FBs implement the distribution algorithm as described in section V. The following list provides an overview of the IEC 61499 FBs which make up the distribution algorithm [16]:

• Replication Master: This FB encapsulates the master component explained in section V. It manages the messaging between all devices of a recovery group. The messaging interface is implemented using the UDP multicast group 224.0.105.1 at port 61499, which is currently not reserved. Master and slave devices are registered by the internal list manager using one of the messages listed in V-B. If a new slave has been registered successfully, the Discover event is fired to trigger an application query by the command execution engine. If a slave-device requests a recovery, the Recover event may lead to an application transfer from the master-device to the new slave-device. All these operations are illustrated at the end of this section in a system overview and an interaction diagram which clarify the main concepts. The interface of this FB is shown in Figure 15.
• Replication Slave: This FB encapsulates the slave components described in section V. It is able to send three different messages to the recovery group, which is accessible at 224.0.106.1 with port 61499. The messages (Register, Update, Recover) are explained in section V-B. This FB may be used to enable application recovery on a device. The interface of this FB is shown in Figure 16.
• Live-Ping: This block is intended to be used as a helper block for a ReplicationMaster block. Triggered by a cyclic event input, such as E_CYCLE, it sends a cyclic Master registration message to keep its device information up to date. This message leads to a periodic master-info update in every master-device. If a time-to-live counter is implemented, this block is mandatory, because otherwise master-devices without a Live-Ping block will be dropped from the recovery group.

C. System Overview and Sequence Diagram

The FBs introduced above can be used to implement a recovery application with any IEC 61499 implementation (e.g., FBDK [6], 4DIAC [64], etc.). Every FB can be placed on every device. Therefore, it is also possible to implement an application which contains both a ReplicationMaster and a ReplicationSlave. The ReplicationSlave may participate in another recovery group, so this mechanism can be used to cascade the backup of devices (see Figure 17) and thereby enable the recovery of master devices. The following sections explain how the FBs described above are used to build a complete recovery system.

Fig. 17. Cascading two recovery groups [16]. [Figure: Recovery Group #1 and Recovery Group #2, each consisting of master and slave devices and linked by combined master/slave devices.]

1) Master Device Implementation: A master device contains a ReplicationMaster block and the Command Execution Engine, which consists of the FBs described above. Figure 18 shows the implementation of a master-device. The FB named “Master” is an instance of a ReplicationMaster and is responsible for managing master/slave registration. A Discover event leads to an application query by the “Target Program Collector”. The queried application description is passed to the “Parser”, which parses the XML string and generates an application script. After parsing, the application script is stored in the “Program Storage” and is thus kept available for a possible recover request. Such a request is signaled by a Recover event, which leads to a transfer of the application stored in the Program Storage to the requesting client. The “Request” block transforms the stored script items into the IEC 61499 management command structure, which is necessary for the communication with the device manager of a slave-device. The transformed commands are sent to the device by a standard CLIENT_2_1 FB.

Fig. 18. Master device recovery engine implementation [16]. [Figure: FB network consisting of START1 (E_RESTART), MASTER (ReplicationMaster), TARGETPRGCOLLECTOR (QueryXML), PARSER (XSAX), PRGSTORAGE (AppStorage), REQUEST (RQST_GEN) and CLIENT (CLIENT_2_1).]
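The interplay of these blocks on the master device can be summarized in a short Python sketch. It is only a behavioral model under simplifying assumptions: the class and method names are hypothetical, the application query and the command transfer are injected as plain callables instead of networked FBs, and the toy query result is a simple text format rather than IEC 61499 XML.

```python
class MasterDevice:
    """Behavioral model of the master-side recovery engine."""

    def __init__(self, query_app, parse, send_command):
        self.query_app = query_app        # stands in for the Target Program Collector
        self.parse = parse                # stands in for the Parser
        self.send_command = send_command  # stands in for Request + CLIENT_2_1
        self.storage = {}                 # stands in for the Program Storage

    def on_discover(self, slave_id):
        """Discover event: query, parse and store the slave's application."""
        description = self.query_app(slave_id)
        self.storage[slave_id] = self.parse(description)

    def on_recover(self, slave_id):
        """Recover event: send the stored script to the requesting device."""
        script = self.storage.get(slave_id)
        if script is None:
            return False  # nothing stored for this device
        for entry in script:
            self.send_command(slave_id, entry)
        return True


sent = []
master = MasterDevice(
    # toy description format "name:type;name:type", not IEC 61499 XML
    query_app=lambda slave_id: "FB1:E_CYCLE;FB2:E_SWITCH",
    parse=lambda text: [tuple(item.split(":")) for item in text.split(";")],
    send_command=lambda slave_id, entry: sent.append((slave_id, entry)),
)
master.on_discover("DEV1")
recovered = master.on_recover("DEV1")
unknown = master.on_recover("DEV2")
```

In this model a recover request for a device that was never discovered is simply rejected; how the real ReplicationMaster reacts in that case is not specified in this section.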

2) Slave Device Implementation: A slave device contains an instance of a ReplicationSlave block whose “Register” event input is connected to the last INITO output in the event chain of the application. This block may be part of the management resource of the device, in which the device manager and the communication server are situated. If the application is started on the slave-device for the first time, the ReplicationSlave block registers the device for recovery using the Register event. Right after the registration, the “TargetProgramCollector” transfers the application to a master-device. If the device is replaced by a new device, the internal management resource may contain a ReplicationSlave and a function to detect whether there is an application on the device or not. If there is no application on the device, the Recover event may be fired to trigger an application transfer from a responsible master-device to the slave-device. The Update event may only be fired by a user request or by an appropriate application. The sample implementation contains a ReplicationSlave and some manual input fields to enable manual control of the block behavior. Figure 19 shows the implementation of a slave-device.

Fig. 19. Slave device sample implementation [16]. [Figure: FB network around a ReplicationSlave instance (REPSLAVE) with an E_RESTART block, IN_EVENT buttons labeled Register, Recover and Update, IN_TEXT and IN_ANY input fields for MGR_ID (localhost:61499), APPNAME (DEV1) and REDUNDANCY, and an OUT_TEXT field for STATUS.]

3) Sequence Diagram: The sequence diagram in Figure 20 illustrates a typical procedure for configuring a master and a slave device. First, the user designs and implements a recoverable system. To this end, a master and a slave application have to be implemented using an IEC 61499 editor, such as FBDK [6] or 4DIAC [64]. The management applications are then downloaded to the master-device and the slave-device. As mentioned in VI-C2, the ReplicationSlave block must be at the end of the INIT event chain. After the slave application has been downloaded and started, the ReplicationSlave sends a Slave registration message to the recovery group. The ReplicationMaster on the master-device accepts the slave-device and queries the requested application from the slave-device. The application is stored in a local storage block, such as the ApplicationStorage. If the slave-device is replaced by a new one without any user application, the slave-device management sends a Slave recovery request; in response, the master-device starts to load the stored application onto the slave-device [10].

Fig. 20. Recovery system sequence diagram [10]. [Figure: lifelines User, Development tool, Device (with master app.) and two Devices (with slave app.); the messages include: implement system, start master app., start slave app. and load user app., register request, check responsibility, query user app., send user app., store user app., remove slave from network, add new slave (with slave app.) to network with no user app. on it, detect existing app., recover request, load and parse user app. into script, load slave user app., start recovered app.]

VII. RECOVERY EXPERIMENTS AND TESTS

This section presents the arrangement, procedure and results of an experimental verification of the device recovery concepts explained in section V. For the tests described in this section the FBDK [6] has been used, but other IEC 61499 implementations such as the 4DIAC initiative [64] can be used as well.

A. Test Arrangement and Function Block Test

First of all, each FB has to be tested for correct behavior. This can be done with the FB tester included in the FBDK. The ReplicationMaster is an almost entirely network-triggered FB, and its functionality can only be tested in combination with a corresponding ReplicationSlave FB on another device. Therefore, the test was performed using two PCs with FBDK. The application test uses a similar test arrangement; its main focus is the correct transmission of the program from the master-device to the slave-device. Another important factor is the memory consumption of the whole application and of the specific FBs. Although memory consumption is of little relevance on normal PCs, it is of interest with regard to the use of these programs on embedded devices.

B. Application Test

The application test is carried out in a test arrangement similar to that of the FB test. After the Master application and the Slave application have been started, an additional, simple “Hello World” user application is started. Using the graphical user interface of the Slave application, each message is triggered manually. Furthermore, each action is logged in a command line window, together with additional information about the current state of the application and its FBs. The slave application registers itself at the master application. The master application then queries the XML application description of the “Hello World” user application using the “File-based application transfer” described in V-D3. After querying the user application, the embedded XML parser FB (i.e., the XSAX FB) parses the application description into a script and stores it locally using the AppStorage FB. This completes the preparation for a future recovery. Now the slave application and the user application are terminated to simulate a faulty device. The slave application is started again to simulate a new replacement device with no user application. The new slave application now sends a Recovery request to the master-device, which initiates the transfer of the stored application script to the slave-device. The application script is converted into IEC 61499 management commands and transferred to the device manager of the slave device using the common “Step-by-step application transfer” described in V-D1. The FBs RQST_GEN and CLIENT_2_1 are responsible for these steps. After the transfer is completed, the recovered user application is started and the process continues.

C. Results

In the tests described above, the algorithm turned out to be stable for a small number of devices (up to 5 devices were used): each message was delivered as expected, and the autonomous management of slave devices using the random master-ID performed well. Tests with a higher number of devices have not been carried out. One problem of the algorithm is the assumption that communication messages can be processed in zero time. In fact, processing an incoming or outgoing message takes a short amount of time, during which no other messages can be accepted. Under common circumstances this problem is solved with multi-threaded communication handlers, but in an embedded implementation this might not work because of limited memory. To overcome this problem, a safe communication protocol may be necessary to detect missing messages. Although the current implementation is not able to show the full potential of this approach, the great advantage of the concept turned out to be its openness for future implementations. On the basis of the recovery group communication scheme, various applications and extensions may be developed and implemented in the future.
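The text leaves open what a protocol for detecting missing messages would look like. One common building block is a per-sender sequence number that lets a receiver notice gaps; the Python sketch below illustrates only this idea. It is not part of the presented implementation, and the class name, the sender identifiers and the notion of a sequence field in the recovery group messages are hypothetical.

```python
class GapDetector:
    """Tracks per-sender sequence numbers and reports skipped messages."""

    def __init__(self):
        self.expected = {}  # sender id -> next expected sequence number

    def receive(self, sender, seq):
        """Return the sequence numbers that were missed before this message."""
        nxt = self.expected.get(sender, seq)
        if seq < nxt:
            return []  # duplicate or late message, nothing new is missing
        missing = list(range(nxt, seq))
        self.expected[sender] = seq + 1
        return missing


detector = GapDetector()
first = detector.receive("slave-1", 0)
second = detector.receive("slave-1", 1)
gap = detector.receive("slave-1", 4)        # messages 2 and 3 never arrived
duplicate = detector.receive("slave-1", 4)  # repeated message is ignored
```

A receiver could answer a non-empty result with a retransmission request; whether the additional state per sender fits into the memory budget of an embedded device would have to be evaluated.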


VIII. CONCLUSION AND OUTLOOK

With the approach presented in this article it is now possible to manage distributed automation and control devices connected to a multiple-access network, such as Ethernet or CAN, in such a way that autonomous application and state recovery in IEC 61499 based automation systems becomes possible. Several new software components, implemented as IEC 61499 FBs, manage the recovery without further user interaction. The proposed concept of “recovery groups” is a self-organizing approach to querying applications in which new or faulty devices are automatically identified and recovered. The communication between the recovery group members is message-oriented and can easily be used in various types of communication networks, such as Ethernet and CAN. The application transfer is done using the defined IEC 61499 XML-based application interface.

In the tests, the management algorithm turned out to be stable for a small number of devices, but it still needs to be tested with more devices. Furthermore, the system has not been tested in a mixed network topology, because the current prototype is designed to work with Ethernet only. One major problem that came up during the tests was the amount of memory needed in devices with a master component, which may not be appropriate for embedded devices. Each master-device must have a lot of free memory in order to have enough space to recover at least one device. This gets even worse if the desired redundancy is increased to three or more. The amount of memory needed for recovery may be too much for high-volume embedded devices, and it is arguable whether a customer is willing to pay for such devices or prefers to stick to traditional centralized server approaches.

Many of the concepts introduced in this article are not possible with the current status of IEC 61499 implementations. Therefore, state recovery and some other concepts are presented only theoretically. To implement these features on the basis of IEC 61499, several extensions, such as a network-wide application identifier or an enhanced query operation, are needed. For further work the following aspects have to be considered: secure communication, reduction of memory consumption, support for heterogeneous networks and handling of resource-constrained devices. Since it is not possible to include all these extensions in the IEC 61499 standard in the short term, it would be a good opportunity to develop an IEC 61499 compliance profile for autonomous application recovery in distributed automation systems. Such a compliance profile can be expected to influence the standard in the long term.

REFERENCES

[1] Y. Koren, U. Heisel, F. Jovane, T. Moriwaki, G. Pritschow, G. Ulsoy, and H. Van Brussel, “Reconfigurable manufacturing systems,” CIRP Annals - Manufacturing Technology, vol. 48, no. 2, pp. 527–540, 1999.
[2] S. Lee and D. Tilbury, “Deadlock-free resource allocation control for a reconfigurable manufacturing system with serial and parallel configuration,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 37, no. 6, pp. 1373–1381, Nov. 2007.
[3] H. Unver, “System architectures enabling reconfigurable laboratory-automation systems,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 41, no. 6, pp. 909–922, Nov. 2011.

[4] L. Wills, S. Kannan, S. Sander, M. Guler, B. Heck, J. Prasad, D. Schrage, and G. Vachtsevanos, “An open platform for reconfigurable control,” Control Systems, IEEE, vol. 21, no. 3, pp. 49–64, June 2001.
[5] R. W. Lewis, Modeling control systems using IEC 61499. IEE Publishing, 2001, no. ISBN: 0 85296 796 9.
[6] J. H. Christensen. (Access Date March 2011) HOLOBLOC.com - Function Block-Based, Holonic Systems Technology. [Online]. Available: www.holobloc.com
[7] V. Vyatkin, J. Christensen, and J. Lastra, “OOONEIDA: an open, object-oriented knowledge economy for intelligent industrial automation,” Industrial Informatics, IEEE Transactions on, vol. 1, no. 1, pp. 4–17, Feb. 2005.
[8] V. Vyatkin, “IEC 61499 as enabler of distributed and intelligent automation: State-of-the-art review,” Industrial Informatics, IEEE Transactions on, vol. 7, no. 4, pp. 768–781, Nov. 2011.
[9] A. Zoitl, Real-Time Execution for IEC 61499, 1st ed. Durham, North Carolina, USA: International Society of Automation, 2008.
[10] R. Froschauer, F. Auinger, G. Grabmair, and T. Strasser, “Automatic Control Application Recovery in Distributed IEC 61499 based Automation and Control Systems,” in IEEE 2006 Workshop on Distributed Intelligent Systems (DIS’06), June 15 - 16, Czech Republic, 2006.
[11] IEC 61499: Function blocks, Part 1 - 4, International Electrotechnical Commission Std. IEC 61 499, 2005. [Online]. Available: www.iec.ch
[12] G. Schickhuber and O. McCarthy, “Distributed fieldbus and control network systems,” Computing Control Engineering Journal, vol. 8, no. 1, pp. 21–32, Feb. 1997.
[13] J. Feld, “Profinet - scalable factory communication for all applications,” in Factory Communication Systems, 2004. Proceedings. 2004 IEEE International Workshop on, Sept. 2004, pp. 33–38.
[14] L. Seno, S. Vitturi, and C. Zunino, “Real Time Ethernet networks evaluation using performance indicators,” in Emerging Technologies Factory Automation, 2009. ETFA 2009. IEEE Conference on, Sept. 2009, pp. 1–8.
[15] J. J. Scarlett and R. W. Brennan, “Evaluating a new communication protocol for real-time distributed control,” Robotics and Computer-Integrated Manufacturing, vol. 27, no. 3, pp. 627–635, 2011.
[16] R. Froschauer, “Autonomous application and state recovery in IEC 61499 based, distributed automation & control systems,” Master’s thesis, University of Applied Sciences Wels, Ried im Innkreis, 2005.
[17] A. S. Tanenbaum and M. V. Steen, Distributed Systems: Principles and Paradigms, 1st ed. Upper Saddle River, NJ, USA: Prentice Hall PTR, 2001.
[18] PI, “PROFIBUS - System Description,” PROFIBUS & PROFINET International, Karlsruhe, Germany, Tech. Rep. 4.002, November 2003. [Online]. Available: www.profibus.com
[19] E. Tovar and F. Vasques, “Real-time fieldbus communications using PROFIBUS networks,” Industrial Electronics, IEEE Transactions on, vol. 46, no. 6, pp. 1241–1251, Dec. 1999.
[20] PI, “PROFINET - System Description,” PROFIBUS & PROFINET International, Karlsruhe, Germany, Tech. Rep. 4.132, Oct. 2002. [Online]. Available: www.profibus.com
[21] S. Biegacki and D. VanGompel, “The application of DeviceNet in process control,” ISA Transactions, vol. 35, no. 2, pp. 169–176, 1996.
[22] ODVA, “DeviceNet Technical Overview,” Open DeviceNet Vendor Association, USA, Tech. Rep., Oct. 2004. [Online]. Available: www.odva.org
[23] F.-L. Lian, J. Moyne, and D. Tilbury, “Performance evaluation of control networks: Ethernet, ControlNet, and DeviceNet,” Control Systems, IEEE, vol. 21, no. 1, pp. 66–83, Feb. 2001.
[24] W. Blume and W. Klinker, “The Sensor/Actuator Bus: Theory and Practice of Interbus-S,” Phoenix Contact, Landsberg, Germany, 1994.
[25] A. Büsing and H. Meyer, Interbus - Praxisbuch. Hüthig, 2002.
[26] K. Trkaj, “Users introduce component based automation solutions,” Computing Control Engineering Journal, vol. 15, no. 6, pp. 32–37, Jan. 2004.
[27] T. Cucinotta, A. Mancina, G. Anastasi, G. Lipari, L. Mangeruca, R. Checcozzo, and F. Rusina, “A real-time service-oriented architecture for industrial automation,” Industrial Informatics, IEEE Transactions on, vol. 5, no. 3, pp. 267–277, Aug. 2009.
[28] G. Menkhaus and B. Andrich, “Metric suite for directing the failure mode analysis of embedded software systems,” Information Systems Journal, pp. 266–273, 2005. [Online]. Available: http://dblp.uni-trier.de/db/conf/iceis/iceis2005-3.html#MenkhausA05
[29] H. Pentti and H. Atte, “Failure mode and effects analysis of software-based automation systems,” VTT Industrial Systems STUKYTOTR 190, no. 09, p. 37, 2002. [Online]. Available: www.stuk.fi/julkaisut/tr/stuk-yto-tr190.pdf


[30] J. H. Christensen. (Access Date March 2011) Basic Concepts of IEC 61499. [Online]. Available: www.holobloc.com/papers/1499_conc.zip
[31] J. H. Christensen. (Access Date March 2011) Design patterns for systems engineering with IEC 61499. [Online]. Available: www.holobloc.com/papers/1499_despat.zip
[32] R. Brennan, “Toward Real-Time Distributed Intelligent Control: A Survey of Research Themes and Applications,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 37, no. 5, pp. 744–765, Sep. 2007.
[33] T. Strasser, M. Stifter, F. Andren, D. Burnier de Castro, and W. Hribernik, “Applying Open Standards and Open Source Software for Smart Grid Applications: Simulation of Distributed Intelligent Control of Power Systems,” in 2011 IEEE Power & Engineering Society (PES) General Meeting, July 24-29, Detroit, Michigan, USA, 2011.
[34] K. Thramboulidis, “Different perspectives [Face to Face; ‘IEC 61499 function block model: Facts and fallacies’],” Industrial Electronics Magazine, IEEE, vol. 3, no. 4, pp. 7–26, Dec. 2009.
[35] V. Vyatkin, Z. Salcic, P. Roop, and J. Fitzgerald, “Now That’s Smart!” Industrial Electronics Magazine, IEEE, vol. 1, no. 4, pp. 17–29, 2007.
[36] V. Vyatkin, “The IEC 61499 standard and its semantics,” Industrial Electronics Magazine, IEEE, vol. 3, no. 4, pp. 40–48, Dec. 2009.
[37] A. Zoitl and V. Vyatkin, “Different perspectives [Face to face; ‘IEC 61499 architecture for distributed automation: The “glass half full” view’],” Industrial Electronics Magazine, IEEE, vol. 3, no. 4, pp. 7–23, 2009.
[38] IEC 61131: Programmable controllers, Part 3: Programming languages, International Electrotechnical Commission Std. IEC 61 131, 2003. [Online]. Available: www.iec.ch
[39] A. Zoitl, T. Strasser, C. Sünder, and T. Baier, “Is IEC 61499 in harmony with IEC 61131-3?” Industrial Electronics Magazine, IEEE, vol. 3, no. 4, pp. 49–55, Dec. 2009.
[40] T. Strasser, A. Zoitl, J. Christensen, and C. Sünder, “Design and Execution Issues in IEC 61499 Distributed Automation and Control Systems,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 41, no. 1, pp. 41–51, Jan. 2011.
[41] Y. Li Hsien, P. Roop, V. Vyatkin, and Z. Salcic, “A Synchronous Approach for IEC 61499 Function Block Implementation,” Computers, IEEE Transactions on, vol. 58, no. 12, pp. 1599–1614, Dec. 2009.
[42] F. Andrén and T. Strasser, “An Open Source based Distributed Control System with Decentralized I/Os controlled via Industrial Ethernet,” in Emerging Technologies and Factory Automation, 2011. ETFA 2011. 16th IEEE Conference on, Sept. 2011.
[43] W. Leonhardsberger and A. Zoitl, “Using EtherNet/IP with IEC 61499 communication function blocks,” in Holonic and Multi-Agent Systems for Manufacturing - 5th International Conference on Industrial Applications of Holonic and Multi-Agent Systems, HoloMAS 2011, Toulouse, France, August 29-31, 2011, pp. 39–49.
[44] G. Morán, F. Pérez, D. Orive, E. Estévez, and M. Marcos, “IEC 61499 Service Interface Function Blocks to Access Decentralized Peripherals with Profibus DP,” in 13th IFAC Symposium on Information Control Problems in Manufacturing, vol. 13, 2009, pp. 886–891.
[45] C. Schwab, M. Tangermann, A. Luder, A. Kalogeras, and L. Ferrarini, “Mapping of IEC 61499 function blocks to automation protocols within the TORERO approach,” in Industrial Informatics, 2004. INDIN ’04. 2004 2nd IEEE International Conference on, June 2004, pp. 149–154.
[46] F. Weehuizen and A. Zoitl, “Using the CIP Protocol with IEC 61499 Communication Function Blocks,” in Industrial Informatics, 2007 5th IEEE International Conference on, vol. 1, June 2007, pp. 261–265.
[47] V. Vyatkin, M. deSousa, and A. Zoitl, Industrial Communication Systems. CRC Press, 2011, ch. Communication Aspects of IEC 61499 Architecture, pp. 55–1–55–22.
[48] R. Brennan, P. Vrba, P. Tichý, A. Zoitl, C. Sünder, T. Strasser, and V. Mařík, “Developments in Dynamic and Intelligent Reconfiguration of Industrial Automation,” Computers in Industry - An International, Application Oriented Research Journal, Elsevier Editorial, vol. 59, pp. 533–547, 2008.
[49] M. Khalgui and H. Hanisch, “Reconfiguration Protocol for Multi-Agent Control Software Architectures,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 41, no. 1, pp. 70–80, Jan. 2011.
[50] M. Khalgui, O. Mosbahi, Z. Li, and H.-M. Hanisch, “Reconfiguration of distributed embedded-control systems,” Mechatronics, IEEE/ASME Transactions on, vol. 16, no. 4, pp. 684–694, Aug. 2011.
[51] W. Lepuschitz, A. Zoitl, M. Vallée, and M. Merdan, “Toward Self-Reconfiguration of Manufacturing Systems Using Automation Agents,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 41, no. 1, pp. 52–69, Jan. 2011.
[52] R. Brennan, M. Fletcher, and D. Norrie, “An agent-based approach to reconfiguration of real-time distributed control systems,” Robotics and Automation, IEEE Transactions on, vol. 18, no. 4, pp. 444–451, Aug. 2002.
[53] N. Higgins, V. Vyatkin, N.-K. Nair, and K. Schwarz, “Distributed Power System Automation With IEC 61850, IEC 61499, and Intelligent Control,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 41, no. 1, pp. 81–92, Jan. 2011.
[54] G. Black and V. Vyatkin, “Intelligent Component-Based Automation of Baggage Handling Systems With IEC 61499,” Automation Science and Engineering, IEEE Transactions on, vol. 7, no. 2, pp. 337–351, Apr. 2010.
[55] F. Auinger, T. Strasser, and J. H. Christensen, “Using IEC 61499 Function Blocks (FB) for Closed Loop Control Applications,” in Proceedings of the International IMS Forum 2004, Cernobbio, Italy, May 17-19 2004, pp. 37–45.
[56] J. Lastra, A. Lobov, and L. Godinho, “Closed loop control using an IEC 61499 application generator for scan-based controllers,” in Emerging Technologies and Factory Automation, 2005. ETFA 2005. 10th IEEE Conference on, vol. 1, Sept. 2005, pp. 323–330.
[57] T. Strasser, F. Auinger, and A. Zoitl, “Development, implementation and use of an IEC 61499 function block library for embedded closed loop control,” in Industrial Informatics, 2004. INDIN ’04. 2004 2nd IEEE International Conference on, June 2004, pp. 594–599.
[58] P. Vrba, P. Tichý, V. Mařík, K. Hall, R. Staron, F. Maturana, and P. Kadera, “Rockwell Automation’s Holonic and Multiagent Control Systems Compendium,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 41, no. 1, pp. 14–30, Jan. 2011.
[59] H. Garcia, A. Ray, and R. Edwards, “A reconfigurable hybrid supervisory system for process control,” in Decision and Control, 1994., Proceedings of the 33rd IEEE Conference on, vol. 3, Dec. 1994, pp. 3131–3136.
[60] M. Guler, S. Clements, L. Wills, B. Heck, and G. Vachtsevanos, “Transition management for reconfigurable hybrid control systems,” Control Systems, IEEE, vol. 23, no. 1, pp. 36–49, Feb. 2003.
[61] C. Shelton, P. Koopman, and W. Nace, “A framework for scalable analysis and design of system-wide graceful degradation in distributed embedded systems,” in Object-Oriented Real-Time Dependable Systems, 2003. (WORDS 2003). Proceedings of the Eighth International Workshop on, Jan. 2003, pp. 156–163.
[62] nxtControl. (Access Date September 2011) nxt generation software for nxt generation customers. [Online]. Available: www.nxtcontrol.com
[63] R. Spalding, Storage Networks: The complete reference. McGraw-Hill & Osborne, 2003.
[64] PROFACTOR Research. (Access Date March 2011) 4DIAC Framework for Distributed Industrial Automation and Control. [Online]. Available: www.fordiac.org

Thomas Strasser (M’09) holds a Ph.D. degree in Mechanical Engineering with a focus on automation & control theory from Vienna University of Technology. He is a senior scientist at the AIT Austrian Institute of Technology in the domain of Smart Grids with a special focus on advanced automation concepts. He worked for more than 6 years as a senior researcher at PROFACTOR research in the field of reconfigurable automation for intelligent manufacturing systems. Dr. Strasser has coordinated various national and international projects in the domain of intelligent automation and control systems, and he is a member of IEC SC65B/WG15, which maintains the IEC 61499 standard. In addition, he is a member of the IEEE societies IES, SMCS and PES. He is actively involved in the technical committees IES TC-IA, IES TC-SG, IES Standards-TC, SMCS TC-DIS, and the PES Task Forces on Open Source Software and Real-Time Simulation for power systems.


Roman Froschauer holds a Ph.D. degree in Computer Science with a focus on software product lines from Johannes Kepler University Linz. Currently, he is working as a senior software engineer at AlpinaTec GmbH in the domain of satellite and antenna test systems. He worked for 5 years as a research associate at the Upper Austrian University of Applied Sciences in the field of distributed automation systems. He was the coordinator of the national FITIT FRONTICS project. Furthermore, he is a reviewer for various conferences in the field of automation systems (e.g., INDIN’07, INDIN’09, IFAC’09).
