Available online at www.sciencedirect.com

ScienceDirect

IFAC-PapersOnLine 49-25 (2016) 001–007

14th IFAC Conference on Programmable Devices and Embedded Systems
October 5-7, 2016. Brno, Czech Republic

Distributed PLC Based on Multicore CPUs - Architecture and Programming

Adam Milik*, Edward Hrynkiewicz**

Institute of Electronics, Silesian University of Technology of Gliwice, Poland (e-mail: *adam.milik@polsl.pl; **[email protected]).

Abstract: The paper presents a complete approach to distributed control system design including architecture and programming. The system consists of distributed controllers connected with a network. A multiple core controller architecture is proposed utilizing independent bit and word units in MIMD structure. Calculation process is efficiently synchronized by original implementation of semaphored memory system. The program partitioning and distribution among processing units is made automatically by the proposed compiler. For a purpose of program distribution and scheduling a graph based representation of the standard program is used. Developed conversion model enables compilation of selected standard languages into independent graph based form. The distributed controller architecture has been prototyped with use of FPGA devices that enables specific implementation and evaluation of performance. The data exchange was implemented with use of deterministic protocol based on token passing.

© 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: PLC, LD, IL, SFC, FPGA, compiler, distributed system, control network, MIMD

2405-8963 © 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Peer review under responsibility of International Federation of Automatic Control. 10.1016/j.ifacol.2016.12.001

1. INTRODUCTION

A control system design requires two fundamental components: a hardware platform that executes a program, and programming tools that efficiently translate and optimize a control algorithm from a form acceptable to a designer into a form acceptable to a machine. The hardware platform is expected to deliver the highest possible throughput. In many applications an almost instant response of the controller is expected. Programming languages are standardized by IEC 61131-3, which is continuously updated (Cenelec, 2013, John and Tiegelkamp, 2010). This standard creates the foundation for control system design and implementation. It allows developing alternative constructions of automatic control resources (hardware and programming tools) that follow and meet the given requirements while offering improved performance.

1.1. Trends and solutions

Improving PLC performance is the subject of various research efforts. Two main trends can be distinguished: programmable solutions and direct hardware implementations. Both utilize FPGA devices, which enable evaluation of different concepts and their hardware implementation. The programmable implementations attempt to implement efficient CPU structures that closely follow the IEC 61131-3 standard requirements. An example of a custom designed CPU is shown in (Chmiel et al., 2016). The CPU design is intended to follow the instruction list (IL) language as closely as possible.

A direct hardware implementation of a control system allows significantly reducing response time and increasing throughput. This is a result of massively parallel execution of a program. Analysis of control program properties published in (Davidson et al., 1992) shows that Boolean instructions constitute about 69% of a program. Programmable logic devices like FPGAs are perfectly suited for implementing Boolean operations. These two facts fit each other. An interesting FPGA architecture for direct implementation of ladder diagrams (LD) was proposed in (Welch and Crletta, 2001). Massively parallel processing in hardware is only possible when a proper translation of the sequential program to parallel execution is made. The idea of ordered and optimized sequential execution was shown in (Du et al., 2010). The presented work was based on manual transformations proposed for recovering the operation of a ladder diagram as a sequential function chart (SFC), described in (Falcione and Krogh, 1993). The proposed methodology depends on the order of rungs and on the organization of operations within the rungs. The other shortcoming of the proposed method was its application to logic operations only. This problem has been partially addressed by ideas presented in (Ichikawa et al., 2011). The development of high level synthesis tools enabling hardware implementation of a C language subset encouraged a method that translates a control program to a C language equivalent and later synthesizes it (Economacos and Economacos, 2012, Economacos et al., 2015). Optimization of the result is based on a design space search for the optimal transformation of an input program to the respective C structure. The proposed methodology seems to be excessive in time and resources. The idea of a custom compiler was widely extended by the author in (Milik and Hrynkiewicz, 2014, Milik, 2016). A formal method was developed for translating an input program given in a mixed form of SFC, LD and IL into an abstract intermediate form that is later mapped to hardware resources with the ability to share them. The presented methodology allows obtaining controllers with throughput times measured in fractions of a microsecond that are programmed with standard PLC languages.


Dedicated controllers implemented directly in hardware require a complex and time consuming implementation process. A programmable implementation allows for fast and simple implementation, which makes it attractive. The control program is represented as a sequential control process in the form of a single threaded calculation process. Observations of the program structure suggested the use of a multiprocessor CPU architecture. Initially the idea of splitting the instruction stream into bit and word operations was implemented (Chmiel and Hrynkiewicz, 2008). This allows introducing partially parallel operation of the bit and word processing units, as shown in (Chmiel and Hrynkiewicz, 2010). This idea can be applied directly to existing IL program execution without any additional operations. It was observed that execution performance depends on instruction interleaving that enables parallel operation of both units before data synchronization is requested. A significant improvement of execution performance was observed after introducing a compiler able to optimize logic dependencies, discussed in (Kim et al., 1999).

Distributed systems introduce additional requirements in comparison to centralized controllers. Centralized control systems offer remote input-output modules connected to the PLC CPU by means of dedicated networks. The idea of distributing control tasks across the controlled system and processes was shown in (Hashemi Farzaneh et al., 2013). Distributing controllers across the controlled system enables increased reliability and fail safe operation. Distributing the control process across multiple PLCs enables task partitioning and reduces the overall response time by shortening the calculation time. Manual task partitioning and distribution among CPUs requires extra effort. It is also a potential source of incorrect operation. An idea of task partitioning at the level of function blocks is discussed in (Becekr et al., 2015). The presented approach attempts to distribute task blocks among computation resources. The proposed model does not allow detailed scheduling and distribution of calculations, which require a detailed calculation model in the form of a graph.

It can be observed that existing solutions focus on the development of one aspect, either the hardware layer or the software layer. There is a lack of solutions that attempt to deliver a comprehensive solution utilizing a hardware-software co-design process. In summary, we attempt to answer the following question: is it possible to introduce a distributed multiple core PLC CPU and efficiently implement a standard control program on it?

Fig. 1. A control system with distributed PLCs. (Diagram: processes #1 to #n, each with sensors and actuators, are served by mPLC CPU cores with memory and semaphores; network controllers join the units over a network link.)

1.2. Paper outline

In this paper we attempt to address the complete problem of developing a distributed control system. The presented solution consists of a custom distributed hardware platform, shown in Fig. 1, and custom programming tools whose flow diagram is depicted in Fig. 2. It is intended to reduce the overall response time by utilizing a multiple core CPU architecture operating in a network linked distributed control system. Section 2 gives a detailed description of the designed custom hardware platform. It is based on a multiple core CPU with tight synchronization of calculation threads. For process synchronization a dedicated hardware semaphored memory is designed. A distributed system requires reliable network controllers with deterministic data exchange among all units. In the presented solution a dedicated network controller was designed with a media independent interface. The dedicated architecture consumes fewer hardware resources than standard cores and additionally implements safety control that is responsible for handling data exchange errors and placing controllers into exception states. The communication protocol is based on a deterministic data exchange scheme. The section is concluded with implementation results and performance estimations. The implementation complexity of particular cores and of the complete controller unit is shown.

For an automation designer the presented control system is seen as a single unit. Section 3 presents the designed custom compiler and mapper (Fig. 2). It allows a user to concentrate on control algorithm implementation. Formal program analysis and a target independent graph representation enable two level program partitioning. Partitioning starts from the disjoint assignment of calculation tasks to respective controllers. This step enables automatic construction of data exchange packages and network organization. The presented controller utilizes a multiprocessor unit architecture. The second step schedules the respective computation tasks among processing units (program). The semaphored memory is an essential component responsible for processing synchronization without additional overhead.

Fig. 2. A distributed control system synthesis flow. (Diagram: mixed language program input in LD, IL and SFC; analysis and optimization; graph based program intermediate form; hardware mapping to synthesizable Verilog HDL; task partitioning, instruction mapping and allocation to an instruction stream; multiple cores CPU distribution mapping and remote data exchange. The figure marks the originally developed and addressed problems.)

2. DISTRIBUTED MULTIPLE CORE CONTROLLER ARCHITECTURE

The distributed controller architecture (dmPLC) consists of multiprocessor units (mPLC) linked by a network connection. The unit architecture is shown schematically in Fig. 3.

Fig. 3. Typical PLC computation cycle. (Block diagram: system control, program memories, bit CPUs, word CPU, bit memory, word memory, I/O modules, and a network controller with a serial network link.)

The multiprocessor architecture utilizes two types of computation resources: a bit (or Boolean) unit and a word unit. The unit architecture has been inspired by research in the domain of multiprocessor systems and control units discussed in (Edwards et al., 2009, Chmiel and Hrynkiewicz, 2010). In order to synchronize the operation of processors of different data types, their instruction sets contain a minimal subset of common instructions:

$$O = O_B \cup O_W, \qquad O_{BT} = O_B \cap O_W \qquad (1)$$

where $O$ is the set of all implemented instructions, $O_B$ is the subset of instructions implemented by the bit CPU, $O_W$ is the subset of instructions implemented by the word CPU, and $O_{BT}$ is the common subset consisting of bit transfer and test operations. Additionally, a system control module is implemented that is responsible for transferring data between local input-output modules and memory. Data exchange with other controllers is possible through the network controller. Calculations and data exchange processes are synchronized through the semaphored memory system, which is a dedicated part of the data memory. Semaphored memory is necessary for exchanging shared variables that synchronize the calculation process.

2.1. Processing units architecture

The processing unit is constructed from dedicated Boolean and arithmetic processing cores. They constitute a multiple core architecture that enables parallel execution of multithreaded programs. In a control system the most natural partitioning is data type dependent, with bit and word types as the partitioning criterion. The implementation of the processing units was adapted to the specific abilities of the FPGA device architecture. Both processors utilize pipelined Harvard architectures. The processing was adapted to the IL execution model utilizing direct argument addressing:

$$cr_B = cr_B \,(op_B)\, arg_B : op_B \in O_{BO}, \qquad cr_W = cr_W \,(op_W)\, arg_W : op_W \in O_{WO} \qquad (2)$$

where $O_{BO}$ and $O_{WO}$ denote the sets of bit and word operations, $cr_B$ and $cr_W$ denote the current result (accumulator) bit and word registers, and $op_B$ and $op_W$ denote bit and word operations belonging to the respective subsets. The IL implements parenthesis operations. In order to support nested calculations, an embedded stack is implemented in the form of a ring register inspired by the Simatic S7-200 family (Siemens, 2011). Every time a new argument is loaded into the respective cr register, its previous content is automatically pushed onto the stack without any overhead. The ring register architecture allows for unbalanced stack operations that result from the implied push operation at load. Apart from typical push and pop instructions, the top of the stack can be used as an argument by instructions that simultaneously pop the stack.

Fig. 4. Bit CPU block diagram. (Blocks: instruction pointer IP with +1 increment and JMP input, instruction memory, bit data memory with RQ/RDY handshake, logic unit LU, current result register CRB, conditional execution bit path, and a stack with PUSH/POP; memories use RAMB16 blocks and registers separate the pipeline stages.)

Fig. 5. Word CPU block diagram. (Blocks: instruction pointer IP with +1 increment and JMP input, instruction memory, bit and word data memories with RQ/RDY handshake, arithmetic unit AU with Z and C flags, current result registers CRB and CRW, and a stack with PUSH/POP; memories use RAMB16 blocks and registers separate the pipeline stages.)

The bit processor block diagram is shown in Fig. 4. The architecture utilizes 4-stage pipelined data processing. The pipelined approach distributes combinatorial operations across all stages, which reduces the instruction execution time. Memory access has been designed to operate with an arbitration system. The request and ready notification between memory and processor is mainly intended for semaphored operation that controls the processing flow between data consumers and producers. The word processor is used for implementing arithmetic operations and its block diagram is shown in Fig. 5. It is equipped with a minimal bit processing path that allows for data exchange with the bit space for synchronizing with bit processing units.
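The accumulator model of Eq. (2), together with the implied push at load, can be illustrated with a short behavioral sketch. The Python model below is only an illustration under stated assumptions: the class name, the mnemonics and the ring depth of 8 entries are hypothetical and are not taken from the paper, and pipeline timing is ignored.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical mnemonics; the paper does not list the instruction names.
BIT_OPS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
}


@dataclass
class BitCpuModel:
    """Behavioral model of cr_B = cr_B (op_B) arg_B with a ring-register stack."""
    depth: int = 8                  # assumed ring register depth
    cr: bool = False                # current result register cr_B
    top: int = 0                    # position of the newest stack entry
    ring: List[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.ring = [False] * self.depth

    def _push(self, value: bool) -> None:
        # The ring never overflows: the oldest entry is silently overwritten.
        self.top = (self.top + 1) % self.depth
        self.ring[self.top] = value

    def _pop(self) -> bool:
        value = self.ring[self.top]
        self.top = (self.top - 1) % self.depth
        return value

    def load(self, arg: bool) -> None:
        # Loading a new argument implies pushing the previous result.
        self._push(self.cr)
        self.cr = arg

    def op(self, name: str, arg: bool) -> None:
        # Direct argument addressing: cr = cr (op) arg.
        self.cr = BIT_OPS[name](self.cr, arg)

    def op_stack(self, name: str) -> None:
        # The top of the stack is the argument and is popped at the same time.
        self.cr = BIT_OPS[name](self._pop(), self.cr)


def or_of_ands(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Evaluate (a AND b) OR (c AND d) as an IL-like instruction sequence."""
    cpu = BitCpuModel()
    cpu.load(a)
    cpu.op("AND", b)
    cpu.load(c)          # implied push of (a AND b)
    cpu.op("AND", d)
    cpu.op_stack("OR")   # pops (a AND b) and combines it with (c AND d)
    return cpu.cr
```

In this sequence no explicit push is needed: the second load saves the partial result and the stack-argument OR consumes it. The push caused by the first load is never popped, which is the kind of unbalanced use that a ring register tolerates.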

2.2. Multithreaded calculations synchronization

Parallel operating units require synchronization mechanisms that assure ordered execution of computations. In multithreaded systems thread synchronization is controlled by semaphores (Silbershatz and Galvin, 2008). A semaphore is controlled by a supervisor, which allows the data flow between consumer and producer processes to be synchronized. It requires exclusive access to memory, which blocks other processes and introduces additional overhead during access (Saglam and Mooney, 2001). This is acceptable when the execution time of the synchronized tasks is longer than the semaphore handling time. In control systems tasks are relatively short and consist of several instructions. This specific property requires an implementation with an acceptable processing time.

The problem was solved by introducing a semaphored memory system whose operation is explained schematically in Fig. 6. The proposed solution merges a memory space with multithreaded access control. This concept reduces the overhead of controlling data flow to a minimum. A variable shared between a data producer process and data consumers remains in either the unassigned or the assigned state. An unassigned variable is allowed to be written, while an assigned variable is allowed to be read. The data flow is controlled by a system associated with the memory cell, shown in Fig. 6.A, that joins the data producer process with a single data consumer. For each consumer an independent instance of the semaphore system is created. For efficient implementation distributed RAM cells were used, enabling implementation of up to 64 shared cells for 6 input LUT architectures. Two separate memories are used to record a data write flag and a data read flag. Both operations toggle the content of the respective flag, allowing for data read or write, as shown schematically in Fig. 6.B and 6.C. Synchronization of calculation processes is shown in the form of Petri net fragments, where consumers are triggered as soon as data is produced (Fig. 6.D) or the producer is allowed to deliver data as soon as all consumers have completed processing of the most recent data item (Fig. 6.E).

2.3. Network controller unit

The network controller unit is responsible for exchanging data with other units. The semaphored memory system triggers data transfer immediately after the precomputation cycle is completed and data blocks are ready for exchange. Data are exchanged over an Ethernet network accessed through the physical unit (PHY) available on the development boards. The media independent interface (MII) links the PHY unit with the FPGA device. One of the controllers in the network is established as the master and initializes the data exchange process. The network is isolated from other network traffic and operates as a token passing system, even though the utilized architecture is dedicated to the CSMA/CD protocol. The structure of the data blocks and their origin are known at the design stage, which allows creating a deterministic data exchange process. After power-up the master unit initializes the operation of the entire system. It checks for the presence of all slave devices. After successful completion of the presence check, a run notification is transmitted that places the entire system in run mode. In run mode the program loop is evaluated cyclically. The master unit transmits the first data frame. Each data frame passes transmission rights to the next module. A response timeout protects against waiting infinitely for data arrival. Exceeding the timeout period asserts a data error flag that starts the exception service.

The custom controller implements the minimal Ethernet standard requirements. This minimal approach reduces implementation complexity. In order to deliver data to all controllers in the network, a broadcast destination address is used. Token passing is based on recognition of the package sender in a deterministic sequence created at design time. The controller implemented in the FPGA device is responsible for transmitting and receiving data. It is configured by a memory array that contains an ordered sequence of entries describing exchange block properties. Each entry gathers the network identifier of the sender, the data block sizes, and the start addresses of the bit and word blocks.
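The sender-based token passing described above can be sketched as a small simulation. This is a minimal sketch under assumptions, not the authors' implementation: the entry field names, the function names, and the modelling of a timeout as "the expected sender is not in the set of responding units" are all hypothetical, and frame contents are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class ExchangeEntry:
    """One entry of the design-time exchange table (hypothetical field names)."""
    sender_id: int    # network identifier of the sending controller
    bit_start: int    # start address of the bit block
    bit_size: int     # size of the bit block
    word_start: int   # start address of the word block
    word_size: int    # size of the word block


def token_holder(table: List[ExchangeEntry],
                 last_sender: Optional[int]) -> Optional[ExchangeEntry]:
    """Return the entry allowed to transmit after a frame from last_sender.

    None as last_sender marks the start of a cycle (the master transmits first).
    None as result marks the end of the cycle.
    """
    if last_sender is None:
        return table[0]
    for pos, entry in enumerate(table):
        if entry.sender_id == last_sender:
            return table[pos + 1] if pos + 1 < len(table) else None
    raise ValueError("frame from a sender missing in the exchange table")


def run_exchange_cycle(table: List[ExchangeEntry],
                       responding: Set[int]) -> Tuple[List[int], bool]:
    """Simulate one cycle; return (senders seen in order, data error flag)."""
    frames: List[int] = []
    last: Optional[int] = None
    while True:
        entry = token_holder(table, last)
        if entry is None:
            return frames, False          # all blocks exchanged
        if entry.sender_id not in responding:
            return frames, True           # timeout: data error flag asserted
        frames.append(entry.sender_id)    # broadcast frame seen by every unit
        last = entry.sender_id


# Example table: unit 0 is the master, followed by units 1 and 2.
TABLE = [
    ExchangeEntry(0, bit_start=0, bit_size=8, word_start=0, word_size=2),
    ExchangeEntry(1, bit_start=8, bit_size=4, word_start=2, word_size=1),
    ExchangeEntry(2, bit_start=12, bit_size=4, word_start=3, word_size=1),
]
```

Because every unit holds the same table and every frame is broadcast, each unit can decide locally whether it is the next sender, so no separate token frame is needed in this model.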

(Fig. 6.A, diagram omitted: semaphores unit with a semaphore write port (WR, WR_EN, WR_RDY, address AW[3:0]), a semaphore read port (RD, RD_EN, RD_RDY, address AR[3:0] and AR[4]), and two flag memories MW and MR.)
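The handshake of the semaphored memory from Section 2.2 can be illustrated with a behavioral sketch. The encoding used here is an assumption made for illustration: a cell is treated as unassigned when its write flag equals its read flag and as assigned when they differ, so that each operation only toggles its own flag. The class and method names are hypothetical, and one flag pair is kept per consumer.

```python
from typing import Any, List, Tuple


class SemaphoredCell:
    """One shared variable guarded by toggle flags, one flag pair per consumer."""

    def __init__(self, consumers: int = 1) -> None:
        self.value: Any = None
        self.write_flags: List[bool] = [False] * consumers   # toggled by writes
        self.read_flags: List[bool] = [False] * consumers    # toggled by reads

    def write_ready(self) -> bool:
        # The producer may write only when every consumer has read the last item.
        return all(w == r for w, r in zip(self.write_flags, self.read_flags))

    def read_ready(self, consumer: int) -> bool:
        # A consumer may read only when a new item was written for it.
        return self.write_flags[consumer] != self.read_flags[consumer]

    def write(self, value: Any) -> bool:
        """Try to write; return False when the producer has to wait."""
        if not self.write_ready():
            return False
        self.value = value
        self.write_flags = [not flag for flag in self.write_flags]
        return True

    def read(self, consumer: int) -> Tuple[bool, Any]:
        """Try to read; return (False, None) when the consumer has to wait."""
        if not self.read_ready(consumer):
            return False, None
        self.read_flags[consumer] = not self.read_flags[consumer]
        return True, self.value
```

A short usage example with two consumers: `cell = SemaphoredCell(2)`, then `cell.write(5)` succeeds, a second `cell.write(6)` is refused until both `cell.read(0)` and `cell.read(1)` have returned the value 5.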

2.4. Controller operation outline and implementation

CLK

B.

¯¯+¯¯ RD+A4

10

RD&A4

WR ¯¯

11

WR

C.

WR 10

00 RD&A4

¯¯ WR

Memory Cell

Semaphores

Valid Data Valid Data

RD WR RD WR RD RD WR WR

Empty

WR WR WR WR #0 #1 #2 #3

¯¯+¯¯ RD+A4

D.

The distributed controller architecture (Fig. 3) implements independent units responsible for data exchange (System Control, Network Controller) and data processing (Bit and Word CPUs). All modules are synchronized by means of semaphored memory that simplifies synchronization mechanisms and allows for scalability. Table 1 gathers implementation results of particular components of controller. An experimental implementation was made for Xilinx’s FPGAs family. There were selected Spartan 3 (S3) that represents 4 input LUT architecture and Virtex 5 (V5) and 6 (V6) families that are representatives of 6 input LUT

Semaphored Memory Block Data consumers

E. WRi

RD1 WR

RD1

RD2

RDn

WR_RDY RD2

RDn

WRi

Fig. 6. Semaphored memory block architecture and operation 4



of logic and arithmetic expressions. Except of standard nodes a conditional multiple choice node was introduced enabling conditional processing flow. A raw form of an EDFG is created during sequential language analysis (LD, IL or SFC). It assures proper order of expressions (rungs) evaluation or token passing in SFCs. Elementary logic and arithmetic optimizations are applied after generations.

Tab. 1. Controller blocks implementation results Unit

FPGA

LUTs

FFs

S3 V5 V6 S3 V5 V6 S3 V5 V6 S3 V5 V6

52 31 37 354 284 296 285 231 248 941 813 823

63 63 63 168 172 174 185 185 185 593 395 364

Bit CPU Word CPU Network Ctrl. (RX/TX) PLC B2A1N

Distr. RAM 1 1 1 16 16 16 8+1 8+1 8+1 45 45 45

fmax [MHz] 130.5 203.6 431.6 86.2 175.4 280.1 144.9 341.2 454.1 78.5 169.5 245.1

Block RAM 1+1 1+1 1+1 1+1 1+1 1+1 2+1 2+1 2+1 8 8 8

3.1. Mapping of control task for distributed system The variables set consists from subsets that are associated with particular controller physical signals. A controller variables set consists of variables that are arguments and assignment targets. An assignment target variable defines a support set of variables which are referred by its subgraph.

architectures. The controller architecture is scalable and allows the desired processing structure to be selected. Control programs are dominated by bit operations. In order to balance the controller performance for bit and word operations, the proposed distributed PLC unit utilizes two bit processing units, one word (arithmetic) processing unit and a network data exchange unit. It is referred to under the code name B2A1N. It can be observed that the performance of the controller is about 13 ns per instruction for Spartan 3, with a top performance of 4.1 ns per instruction in the Virtex 6 implementation. It should be noted that 3 independent instructions can be executed at a time, which results in an average of 4.33 and 1.37 ns per instruction, respectively. A detailed assessment of the controller performance requires considering the programming methodology presented further.
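The timing figures quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes that each processing unit completes one instruction per clock cycle, which is what the quoted numbers imply but is not stated explicitly; the function names are illustrative only.

```python
def cycle_time_ns(fmax_mhz):
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / fmax_mhz


def avg_ns_per_instruction(cycle_ns, parallel_units):
    """Average time per instruction when all units issue in the same cycle."""
    return cycle_ns / parallel_units


# fmax of the complete PLC B2A1N taken from Table 1
FMAX_MHZ = {"S3": 78.5, "V6": 245.1}
periods = {fpga: cycle_time_ns(f) for fpga, f in FMAX_MHZ.items()}

# Rounded per-instruction times quoted in the text (13 ns and 4.1 ns),
# divided by the 3 processing units (2 bit CPUs + 1 word CPU) of B2A1N
quoted = {
    "S3": avg_ns_per_instruction(13.0, 3),
    "V6": avg_ns_per_instruction(4.1, 3),
}

for fpga in FMAX_MHZ:
    print(f"{fpga}: period {periods[fpga]:.2f} ns, average {quoted[fpga]:.2f} ns")
```

The exact clock periods come out as 12.74 ns and 4.08 ns, so the 4.33 ns and 1.37 ns averages follow from the rounded values of 13 ns and 4.1 ns.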

$$V_A = V_{AW} \cup V_{AR}, \qquad V_{SH} = \bigcup_i \sup(v_{AW_i}) \setminus V_{AR} \qquad (3)$$

where: $V_A$ is the variables set of controller A, which consists of the $V_{AR}$ (referred) and $V_{AW}$ (assigned) subsets, and $V_{SH}$ is the shared variables subset constituted by referred variables that do not belong to the $V_{AR}$ subset. The $V_{SH}$ is calculated as the common part of all controller variables sets. If $V_{SH} = \emptyset$, the control system does not require data exchange and the controllers operate independently of each other. Usually $V_{SH}$ is a non-empty set. The $V_{SH}$ set items are exchanged by means of the network controller. The $V_{SH}$ consists of subsets associated with the respective controllers. In order to satisfy the formal design of EDFG mapping to the instruction set and of configuring the network data exchange, the $V_{XCH}$ set of variables is created as a copy of $V_{SH}$. Respective $V_{SH}$ and $V_{XCH}$ items are linked by directed edges that allow data blocks for network exchange to be generated. Each read node of a variable $v_{SH_i}$ in a subgraph delivering $v_{XW}$, such that $v_{SH_i} \notin V_X$, is substituted by the read node of the variable $v_{XCH_i}$. An exemplary partitioning process is shown in Fig. 8. Read and write nodes (squares) are coloured depending on their association with the respective controller (A or B). An input graph (Fig. 8.A) is partitioned between controllers A and B. The initial EDFG is rewritten in order to disjoin edges linking variables subsets belonging to different

The developed hardware platform is not suitable for manual programming and requires a dedicated programming tool that enables the use of standard PLC languages as an input (Fig. 2). An input program can be developed as a mix of SFC, IL and LD. It is translated into a target-independent directed acyclic graph form called an Enhanced Data Flow Graph (EDFG). This form was originally developed by the author for implementing control programs as dedicated hardware structures (Milik and Hrynkiewicz, 2014; Milik, 2016). The EDFG was inspired by (Gajski et al., 1994) and extended to cover the specific requirements of control systems (Fig. 7). Attributed edges were introduced to simplify the handling

Fig. 7. EDFG expansion for instruction mapping (graph items: AND, OR, ADD and MUL nodes; conditional selection node with I and E inputs; simple edge; complement (NOT) edge)

Fig. 8. EDFG partitioning for control distribution (panels A, B, C; legend of variables origin: local A, B; shared AB, BA)

3. PROGRAMMING




controllers and to substitute them with the respective exchange set variables.
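The shared-variable computation of Eq. (3) and the creation of the exchange copies can be sketched with plain set operations. The sketch assumes that the referred set of controller A holds the variables tied to its own signals, which is one reading of the definition; the variable names, the `shared_variables` helper and the `xch_` prefix are invented for the example.

```python
def shared_variables(support_sets, local_referred):
    """Union of the support sets of all assigned variables, minus the
    variables that the controller can read locally."""
    referred = set().union(*support_sets.values())
    return referred - set(local_referred)


# Hypothetical controller A: it assigns y and z
support_a = {
    "y": {"a", "b", "e"},   # sup(y): variables referred by the subgraph of y
    "z": {"b", "c"},        # sup(z)
}
local_a = {"a", "b"}        # variables associated with controller A signals

v_sh = shared_variables(support_a, local_a)

# Exchange set: one copy per shared variable; reads of a shared variable
# are redirected to its copy, which the network controller fills
v_xch = {v: f"xch_{v}" for v in sorted(v_sh)}
```

Here `v_sh` evaluates to the set containing `c` and `e`. An empty result would mean that controller A needs no data from the network.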

An improvement of $V_{SH}$ creation is shown in Fig. 8.C. This preliminary method is still under evaluation. It attempts to reduce the size of $V_{SH}$ by local pre-processing. It utilizes the expansion of multiple-argument nodes and processes the nodes as long as they depend on variables associated with controller (local) signals. This method reduces the cardinality of $V_{SH}$ in many cases. When the resulting subset of $V_{SH}$ contains more items than the one obtained with the non-optimized approach, the non-optimized result is used. Pre-processing of shared variables costs execution time, whereas distributing variables among multiple controllers allows multiple calculations to run in parallel. This trade-off requires comparing the scheduled block execution in both cases. Currently, shared variables allocation precedes the scheduling process, and this ordering requires further evaluation.

[Fig. 10 panels: per-CPU instruction listings (AND, OR, ST); legend: valid variable access; empty variable access (wait); CPU idle]

$$s_1 = abcd \;:\; \text{cpu 2} \qquad (7)$$

The paper presents the distributed multiple-core PLC (dmPLC) architecture together with the respective programming tools. The hardware platform consists of independent data processing units and of local and remote data exchange interfaces. The operation of the processing units and the data exchange interfaces is synchronized through a semaphored memory system.

$$s_0 = abcd \;:\; \text{cpu 1} \qquad (6)$$

4. CONCLUSIONS

The execution time is equal for both architectures. The B2 unit offers an efficiency of 78.6% (11 active cycles out of 14), while the B3 unit offers an efficiency of 61.9% (13 out of 21). A typical PLC program consists of many tasks similar to the one considered. Task distribution introduces the additional cost of partial result exchange and waiting cycles. Waiting for shared variables is handled efficiently by the semaphored memory, which stalls the requestor until the variable is ready for the desired operation (read or write). A semaphored variable does not introduce the additional wait cycles observed in a multitasking system, as shown schematically in Fig. 10.B.
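The stall-until-ready behaviour of a semaphored variable can be imitated in software with a condition variable. The sketch below is only a behavioural analogy of the hardware mechanism, assuming one producer and one consumer per cell; the multiple read ports of Fig. 6 are not modelled, and the class name `SemaphoredCell` is hypothetical.

```python
import threading


class SemaphoredCell:
    """One-word memory cell with a full/empty flag (behavioural model only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write(self, value):
        with self._cond:
            # Stall the writer until the previous value has been consumed.
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read(self):
        with self._cond:
            # Stall the reader until a value is available.
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value


s0 = SemaphoredCell()
results = []


def cpu0():
    e = False
    results.append(e or s0.read())      # waits for the partial result s0


def cpu1():
    a = b = c = d = True
    s0.write(a and b and c and d)       # produces the partial result s0


threads = [threading.Thread(target=cpu0), threading.Thread(target=cpu1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread runs first, the consumer obtains the value only after the producer has stored it, so no explicit polling or task switching is needed in the program itself.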

An exemplary task distribution is shown in Fig. 10, which implements the following expression:

$$y = abcd + e + f(g + h) \qquad (5)$$

The expression was scheduled for architectures with two and three Boolean CPUs, referred to as the B2 and B3 architectures. For the B2 architecture, the following decomposition was applied:

$$y = s_0 + e + f(g + h) \;:\; \text{cpu 0}$$

For the B3 architecture, three calculation threads were started with the following assignments:

$$y = e + s_0 + s_1 \;:\; \text{cpu 0}$$
$$s_0 = f(g + h) \;:\; \text{cpu 1}$$
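That both decompositions preserve the original expression can be checked exhaustively over all 256 input combinations. The sketch below uses the partial results given in the paper (for B2, $s_0 = abcd$ on cpu 1; for B3, $s_0 = f(g+h)$ on cpu 1 and $s_1 = abcd$ on cpu 2). It runs sequentially, so it verifies only the logic and says nothing about timing; the function names are illustrative.

```python
from itertools import product


def reference(a, b, c, d, e, f, g, h):
    """y = abcd + e + f(g + h), evaluated as a single expression."""
    return (a and b and c and d) or e or (f and (g or h))


def b2(a, b, c, d, e, f, g, h):
    s0 = a and b and c and d              # cpu 1
    return s0 or e or (f and (g or h))    # cpu 0


def b3(a, b, c, d, e, f, g, h):
    s0 = f and (g or h)                   # cpu 1
    s1 = a and b and c and d              # cpu 2
    return e or s0 or s1                  # cpu 0


all_inputs = list(product([False, True], repeat=8))
b2_ok = all(b2(*v) == reference(*v) for v in all_inputs)
b3_ok = all(b3(*v) == reference(*v) for v in all_inputs)
```

Both flags evaluate to `True`, which confirms that the partial results `s0` and `s1` can be computed on separate CPUs without changing the value of `y`.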

3.2. Mapping for Multiple Core Execution

The EDFG has been partitioned among the execution units. The applied optimizations merged nodes into multiple-argument operations. The mapping procedure and instruction generation require transforming the EDFG into a form where each node is directly mapped into an equivalent instruction or a sequence of instructions that is an equivalent of the following model:

$$cr_i = cr_{i-1} \,(op_i)\, arg_i, \qquad cr_0 = \begin{cases} 0 & op_1 \in \{ADD, OR\} \\ 1 & op_1 \in \{MUL, AND\} \end{cases} \qquad (4)$$

where: $cr_i$ denotes the current result of the i-th operation in the considered sequence, $op_i$ denotes the i-th step operation in the sequence, and the index 0 denotes the initial condition of the operation. An exemplary mapping transformation is shown in Fig. 9. An input EDFG consisting of multiple-argument operations is the subject of the expansion process (A). An equivalent graph accommodated for PLC mapping is shown in case C.

The conditional choice node introduces execution time dependencies: the condition must be evaluated before choosing the desired operation sequence that produces the result (cases B and D). The expansion process is an iterative procedure. It starts from the variable assignment and traces back to the argument nodes. The expansion process utilizes an ASAP scheduling approach, which allows the arguments that are to be expanded to be determined.

Fig. 9. EDFG expansion for instruction mapping

Fig. 10. Operation scheduling example for two and three CPUs
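The current-result model of Eq. (4) can be illustrated by expanding a multiple-argument node into a sequence of two-operand steps and executing them. The sketch below is a minimal interpretation under stated assumptions: logic values are represented as the integers 0 and 1, the `ST` step and the helper names `expand` and `run` are hypothetical, and the load instructions of the real instruction set are not modelled.

```python
OPS = {
    "ADD": lambda x, y: x + y,
    "MUL": lambda x, y: x * y,
    "OR": lambda x, y: x | y,
    "AND": lambda x, y: x & y,
}


def expand(op, args, target):
    """Expand a multiple-argument node into (operation, argument) steps
    followed by a store of the result."""
    return [(op, arg) for arg in args] + [("ST", target)]


def run(steps, env):
    """Execute the steps following cr_i = cr_(i-1) (op_i) arg_i."""
    first_op = steps[0][0]
    if first_op not in OPS:
        raise ValueError(f"sequence must start with an operation, got {first_op}")
    # Initial condition cr_0: neutral element of the first operation
    cr = 0 if first_op in ("ADD", "OR") else 1
    for op, arg in steps:
        if op == "ST":
            env[arg] = cr
        else:
            cr = OPS[op](cr, env[arg])
    return env


# s0 = a AND b AND c AND d, with logic values as 0/1
logic_env = run(expand("AND", ["a", "b", "c", "d"], "s0"),
                {"a": 1, "b": 1, "c": 1, "d": 0})

# total = x + y + z
arith_env = run(expand("ADD", ["x", "y", "z"], "total"),
                {"x": 2, "y": 3, "z": 4})
```

Starting from the neutral element means that every argument, including the first one, is handled by the same kind of step, which keeps the mapping from a node to an instruction sequence uniform.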



The semaphored memory system allows for scalability of the system, enabling easy integration of additional processing modules. The proposed system was implemented with the use of FPGA devices. The performance of a single processing unit (bit or word) is comparable to that of the S7-319 PLC CPU (Siemens, 2011). Multiprocessor operation with automatic program implementation offers significantly higher performance. A prototype of the respective programming tool was developed that enables the use of standard PLC languages. The compiler enables automatic implementation of a program in a distributed system, addressing all aspects from exchanging data to distributing calculations among processing cores. Research on controller architectures and programming tools is being continued. It is planned to introduce a floating-point arithmetic core and to improve the compiler partitioning and scheduling strategies.

ACKNOWLEDGEMENT

This work was supported by the Ministry of Science and Higher Education funding for statutory activities (BK220/RAu-3/2016).

REFERENCES

Becker M., Sandström K., Behnam M. and Nolte T. (2015). A many-core based execution framework for IEC 61131-3. Industrial Electronics Society, IECON 2015 - 41st Annual Conference of the IEEE, Yokohama, 2015, pp. 004525-004530. doi: 10.1109/IECON.2015.7392805

Cenelec (2013). EN 61131-3, Programmable Controller - Part 3: Programming Languages. Intern. Standard, Management Centre, Avenue Marnix 17, Brussels.

Chmiel M., Kulisz J., Czerwiński R., Krzyżyk A., Rosół M. and Smolarek P. (2016). An IEC 61131-3-based PLC implemented by means of an FPGA. Microprocessors and Microsystems, vol. 44, pp. 28-37.

Chmiel M. and Hrynkiewicz E. (2010). Concurrent operation of processors in the bit-byte CPU of a PLC. Control and Cybernetics, vol. 39, iss. 2, pp. 559-579.

Chmiel M. (2008). On reducing PLC response time. Bulletin of the Polish Academy of Science - Technical Science, vol. 56, iss. 3, pp. 229-238.

Davidson J.W., Rabung J.R. and Whalley D.B. (1992). Relating static and dynamic machine code measurements. IEEE Transactions on Computers, vol. 41, no. 4, pp. 444-454, Apr. 1992. doi: 10.1109/12.135557

Du D., Xiaodong X. and Kazuo Y. (2010). A study on the generation of silicon-based hardware PLC by means of the direct conversion of the ladder diagram to circuit design language. The International Journal of Advanced Manufacturing Technology, Springer London, vol. 49, issue 5, pp. 615-626.

Economakos C. and Economakos G. (2012). C-based PLC to FPGA translation and implementation: The effects of coding styles. International Conference on System Theory, Control and Computing, 12-14 Oct. 2012, pp. 1-6.

Economakos C., Kiokes G. and Economakos G. (2015). Using advanced FPGA SoC technologies for the design of industrial control applications. Information, Intelligence, Systems and Applications (IISA), 2015 6th International Conference on, Corfu, 2015, pp. 1-6. doi: 10.1109/IISA.2015.7388129

Edwards S.A., Sungjun K., Lee E.A., Liu I., Patel H.D. and Schoeberl M. (2009). A disruptive computer design idea: Architectures with repeatable timing. IEEE International Conference on Computer Design ICCD, 4-7 Oct. 2009, Lake Tahoe, CA, USA, pp. 54-59.

Falcione A. and Krogh B.H. (1993). Design Recovery for Relay Ladder Logic. IEEE Control Systems, vol. 13, no. 2, pp. 90-98, April 1993.

Gajski D., Dutt N., Wu A. and Lin S. (1994). High-Level Synthesis: Introduction to Chip and System Design. Kluwer Academic Publishers.

Hachtel G. and Somenzi F. (1996). Logic synthesis and verification algorithms. Springer.

Hashemi Farzaneh M., Feldmann S., Legat C., Folmer J. and Vogel-Heuser B. (2013). Modeling Multicore Programmable Logic Controllers in Networked Automation Systems. IECON 2013 - 39th Annual Conference of the IEEE, Vienna, 2013, pp. 4398-4403. doi: 10.1109/IECON.2013.6699843

Ichikawa S., Akinaka M., Hata H., Ikeda R. and Yamamoto H. (2011). An FPGA implementation of hard-wired sequence control system based on PLC software. IEEJ Trans Elec Electron Eng, 6: 367-375. doi: 10.1002/tee.20670

John K.H. and Tiegelkamp M. (2010). IEC 61131-3: Programming Industrial Automation Systems: Concepts and Programming Languages, Requirements for Programming Systems, Decision-Making Aids. Springer-Verlag, Berlin Heidelberg.

Kim H.S., Lee J.Y. and Kwon W.H. (1999). A compiler design for IEC 1131-3 standard languages of programmable logic controllers. SICE Annual, 1999. 38th Annual Conference Proceedings of the, Morioka, 1999, pp. 1155-1160. doi: 10.1109/SICE.1999.788715

Milik A. (2016). On hardware synthesis and implementation of PLC programs in FPGAs. Microprocessors and Microsystems, vol. 44, pp. 1-15.

Milik A. and Pulka A. (2014). On FPGA dedicated SFC synthesis and implementation according to IEC61131. International Conference on Signals and Electronic Systems (ICSES). doi: 10.1109/ICSES.2014.6948730

Mocha J. and Kania D. (2012). Hardware Implementation of a control program in FPGA structures. Electrical Review, Dec. 2012, vol. 88, iss. 12a, pp. 95-100.

Saglam B.E. and Mooney V.J. (2001). System-on-a-chip processor synchronization support in hardware. Design, Automation and Test in Europe, Conference and Exhibition, Munich, 2001, pp. 633-639. doi: 10.1109/DATE.2001.915090

Siemens (2011). SIMATIC S7-300 Instruction list S7-300 CPUs and ET 200 CPUs. Siemens AG, Nurnberg.

Silberschatz A. and Galvin P.B. (2008). Operating System Concepts. John Wiley & Sons.