Utilizing Parallization and Multicore Architectures for ... - CiteSeerX

Utilizing Parallization and Multicore Architectures for Scheduling Large-Scale Wireless Mesh Networks

Song Han, Aloysius K. Mok
Dept. of Computer Science, University of Texas at Austin, Austin, TX 78712
{shan, mok}@cs.utexas.edu

Lawrence Waugh
Texas Multicore Technologies, Inc., 12912 Hill Country Blvd, Austin, TX 78738

Mark Nixon, Deji Chen
Emerson Process Management, 1100 W. Louis Henna Blvd., Bldg I, Round Rock, TX 78681
{mark.nixon, deji.chen}@emerson.com

Fred Stotz
Advanced Micro Devices, 7171 Southwest Parkway, Austin, TX 78735

ABSTRACT WirelessHART™ was released in September 2007 and became an IEC standard (IEC 62591) in April 2010. It is the first open wireless communication standard specifically designed for process measurement and control applications deployed in harsh and noisy environments. Compared with wireless community networks such as Wi-Fi, these applications have stringent requirements for communication reliability and real-time performance. Missing or delayed process data may severely degrade the overall monitoring and control performance. To address these issues, we designed efficient algorithms that construct reliable graph routes and data link layer communication schedules to achieve reliable, real-time end-to-end communication in WirelessHART mesh networks. In this article, we further improve the scalability and efficiency of these algorithms to make them feasible for large-scale WirelessHART mesh networks. We evaluate the performance of the enhanced algorithms on the AMD Embedded G-Series Dual-Core T56N platform. Our experimental results show that the algorithms achieve highly reliable routing, improved communication latency, and stable real-time communication in large-scale networks, at the cost of modest overheads in schedule generation and device configuration. Texas Multicore Technologies' (TMT) SequenceL™ language is used to exploit the multicore capabilities of the AMD Embedded G-Series Dual-Core processor.

KEYWORDS WirelessHART, Wireless mesh network, Network scheduling, Parallel processing, Multicore processor

I.

Introduction

Monitoring and control over wireless networks has become an increasingly important technology in process control applications [1], [2], [5], [6], [7], [8], [9]. Several organizations, such as the HART Communication Foundation [3] and ISA [10], have developed standards that are being deployed across a wide range of applications in many industries. To be accepted, wireless applications must be secure [2] and must be designed for use in control applications [5]. For wireless technology to be accepted in control applications, latency must be carefully managed. To minimize latency, communications need to be organized so that packets are not delayed en route from source to destination. To make devices easier to maintain, battery life must be extended as long as possible. To minimize energy usage, devices should be kept in a low-power mode as much as possible. This state is referred to as sleep mode in this paper. At the communication distances typical of sensor networks, listening for information on the radio channel costs about as much energy as transmitting data. The single biggest power-saving measure is to turn both the radio and the sensor off during idle times, i.e., to put the device to sleep. Turning the device off requires advance knowledge of when the device will be idle. The approach taken by WirelessHART is to configure each device with knowledge of when it should wake up, perform some function, and go back to sleep. This configuration is performed by the Network Manager and is called data link layer scheduling. In WirelessHART, data link layer communications are precisely scheduled using an approach referred to as Time Division Multiple Access (TDMA) [3]. The vast majority of communications are directed along graph routes. Scheduling is performed by a centralized Network Manager, which uses overall network routing information in combination with the communication requirements that devices and applications have provided.
The schedule is subdivided into slots and transferred from the Network Manager to individual devices. Devices are only provided with the slots in which they have transmit or receive requirements. The Network Manager continuously adapts the overall network graph and network schedule to changes in network topology and communication demand. It communicates network management commands through the Network Layer to the network of WirelessHART Devices. The scope of the Network Manager is illustrated in Figure 1. The Network Manager initializes and maintains network communication parameter values, provides mechanisms for devices joining and leaving the network, and manages dedicated and shared network resources. The network layer provides an interface through which network management functions can be invoked. The Network Manager is also responsible for collecting and maintaining diagnostics about the overall health of the network and for reporting them to host-based applications. By concentrating the management workload in a centralized Network Manager, the WirelessHART standard allows devices to be less complex. Less complex devices consume less energy, which in turn leads to longer battery life. As a consequence, the Network Manager's computation becomes more complex and grows exponentially with the network size, which can be prohibitive for large-scale networks. This paper examines this problem and investigates running the Network Manager in parallel on multicore processors.

The remainder of this paper is organized as follows. Section II briefly describes the WirelessHART network architecture. Section III presents the fundamental synchronization mechanism applied in WirelessHART networks. Section IV provides the details of reliable graph routing and communication schedule construction in WirelessHART and summarizes algorithms for performing this scheduling. Section V presents SequenceL, a declarative, functional language for fully utilizing multicore platforms. Section VI provides an overview of the AMD Dual-Core T56N platform. Section VII describes how we parallelized the scheduling algorithm using SequenceL. Section VIII presents our design and implementation of the test environment. Section IX summarizes our experimental results. We conclude the paper and discuss future work in Section X.

Figure 1. Network Management Scope

II.

WirelessHART Architecture

Many of the desirable features of the WirelessHART network, such as self-healing, self-organization, and redundant routing, are achieved through the establishment and updating of a network communication schedule. The Network Manager is responsible for creating this schedule and the associated connections, and for distributing the schedule to the individual devices in the network. This scheduling function may be broken into the following phases:

1. Support devices joining the network. As part of this, the Network Manager is responsible for authenticating devices and orchestrating the join process.
2. Establishment of routes. As part of this, the Network Manager is responsible for creating routes that plant automation hosts, gateways, other devices, and the Network Manager itself can use to communicate with the application layer in Devices.
3. Scheduling data link layer communications. As part of this, the Network Manager is responsible for establishing the Superframes and Slots that the user layer application of a Device may use to transfer process data, alerts, diagnostics, and other traffic to the gateway for access by the plant automation host. The Superframes also include slots for network management and the join process.
4. Scheduling control functions. For Devices that are actuators, interlocks, or any other device that affects the process, the Network Manager is responsible for establishing the Routes, Superframes, and Slots that the plant automation host may use to send target values, modes, setpoints, and outputs to the user layer application in field devices.
5. Adapting the network. The Network Manager continually collects data from devices on the health of connections and on traffic patterns, and uses this information to adjust routing and scheduling.

The effectiveness of the overall network ultimately comes down to a combination of routing and scheduling. The services provided in the protocol stack allow network communications to be established in many ways.

Figure 2. Example of a WirelessHART Network

The WirelessHART network, shown in Figure 2, supports a wide variety of devices from many manufacturers. Each of these devices is classified as a Field Device, Router, Access Point, Gateway, or Handheld. A Field Device is any device directly connected to the WirelessHART network. A Field Device transmits and receives WirelessHART packets and performs the basic functions necessary to support network formation and maintenance. A Router Device is a Device that forwards packets from one Device to another. A Gateway Device connects the WirelessHART network to a plant automation network, allowing data to flow between the two networks. An Access Point is a device that connects the WirelessHART network to a WirelessHART Gateway and Network Manager. A Handheld Device is used in the installation, control, monitoring, and maintenance of Devices. Handheld Devices are portable equipment operated by plant personnel. The network is formed and managed by a centralized Network Manager. Network synchronization and management are discussed in Sections III and IV.

III.

Time Synchronization Mechanism in WirelessHART networks

WirelessHART is a TDMA-based network protocol, and every communication in it is time-synchronized. The basic time unit of communication activity is a fixed-length timeslot shared by all Devices, which provides the time base for scheduling process data transmission. WirelessHART defines a timeslot duration of 10 ms. This is sufficient for sending or receiving one packet per channel and the accompanying acknowledgement, including guard-band times for network-wide synchronization.

Precise time synchronization is critical to the operation of networks based on time division multiplexing. Since all communication happens in timeslots, the Devices must share the same notion of when each timeslot begins and ends, with minimal variation. WirelessHART applies several mechanisms for time synchronization. In a WirelessHART network, time propagates outwards from the Gateway [3]. When a new device first joins a WirelessHART network, it has no idea what the current time is. For each incoming data link layer packet, the device records $T_a$, the time at which the packet's first bit arrives. Because of the strict timeslot structure, the device can derive the start of the next timeslot, $T$, from the packet's arrival time:

$$T = T_a + 10\,\mathrm{ms} - \mathit{TsTxOffset}$$

where $\mathit{TsTxOffset}$ is the offset within the slot at which preamble transmission starts.

Synchronization happens not only during the join process but also during a node's normal operation. A receiving node always compares the start time of an incoming data link layer packet with the expected arrival time measured on its own clock. The difference is the drift between the two clocks. The receiver includes this difference in the time adjustment field of the corresponding ACK packet. Each node is assigned a designated time source node. Whenever a node receives an ACK from its time source, it adjusts its clock based on the time adjustment field. If the sender is the time source of the receiver, the receiver adjusts its clock directly from the measured time difference. Together, these adjustments provide network-wide time synchronization in WirelessHART mesh networks.
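The two rules above can be sketched in Python. One rule derives the next slot boundary from a packet's arrival time. The other corrects a clock only against the designated time source. This is a minimal sketch built on our own assumptions, not on the standard. Each clock is modeled as an integer offset, in microseconds, from true network time, and propagation delay is ignored. The sign convention for the time adjustment field is hypothetical: it is actual minus expected arrival, measured on the receiver's clock. The helper names are also illustrative.

```python
from dataclasses import dataclass
from typing import Optional

SLOT_US = 10_000  # WirelessHART timeslot length: 10 ms, expressed in microseconds


def next_slot_start(t_a_us: int, ts_tx_offset_us: int) -> int:
    """T = Ta + 10 ms - TsTxOffset, all in integer microseconds."""
    return t_a_us + SLOT_US - ts_tx_offset_us


@dataclass
class Node:
    name: str
    offset_us: int                      # ASSUMED model: local clock minus true network time
    time_source: Optional[str] = None   # name of this node's designated time source

    def local(self, true_us: int) -> int:
        """Reading of this node's clock at a given true network time."""
        return true_us + self.offset_us


def on_receive(receiver: Node, sender: Node, expected_local_us: int,
               true_arrival_us: int) -> int:
    """Measure drift as (actual - expected) arrival on the receiver's clock.

    The returned value is what the receiver would place in the ACK's time
    adjustment field (sign convention is a hypothetical choice). If the
    sender is the receiver's time source, the receiver corrects itself directly.
    """
    drift = receiver.local(true_arrival_us) - expected_local_us
    if receiver.time_source == sender.name:
        receiver.offset_us -= drift
    return drift


def on_ack(sender: Node, acker: Node, time_adjustment_us: int) -> None:
    """Apply an ACK's time adjustment only if the ACK came from our time source."""
    if sender.time_source == acker.name:
        sender.offset_us += time_adjustment_us
```

In this model, suppose a device's clock runs 300 µs ahead of its time source, an Access Point. The Access Point measures a drift of -300 µs and returns it in the ACK, and applying that adjustment brings the device's offset back to zero.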

IV.

Reliable Graph Routing and Scheduling

Forming a WirelessHART mesh network combines network layer routing with data link layer scheduling. Routing is performed first to establish the uplink, downlink, and broadcast graphs. Scheduling is then used to allocate network resources. There are two methods of routing packets in a WirelessHART network: source routing and graph routing.

A. Source Routing

With source routing, the forwarding devices need no pre-configuration. To send a packet to its destination, the source Device includes in the network layer header an ordered list of the devices through which the packet must travel. As the packet is routed, each routing device uses the next Device address in the packet to determine the next hop. Since packets may reach a destination without explicit setup of intermediate devices, source routing requires knowledge of the complete network topology.

B. Graph Routing

The Network Manager contains a complete list of Routes, Connections, and Devices. When devices are initially added to the network, the Network Manager stores all Neighbor entries, including signal strength information, as reported by each Device. The Network Manager uses this information to build the network topology. This topology is not a complete map of the network, because a large number of possible (but suboptimal) links are removed. The topology is assembled by optimizing several properties, including reliability, hop count, reporting rates, power usage, and overall traffic flow. A key part of the topology is the list of connections that link devices together. A graph in graph routing is a directed subgraph of the network topology, and every graph in a network is associated with a unique Graph Id. To send a packet on a graph, the source Device includes the Graph Id in the packet's network header. All Devices on the way to the destination must be pre-configured with graph information that specifies the neighbors to which the packets may be forwarded.
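The following Python sketch contrasts the two forwarding schemes by showing what a forwarding device does in each case. It is illustrative only. The per-device graph configuration is modeled as a hypothetical dictionary from Graph Id to a list of neighbor names. `send` stands in for one data link layer transmission and reports whether an ACK arrived. The round-robin retry order and the `max_opportunities` limit are our assumptions, not behavior mandated by WirelessHART.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model of a device's graph configuration: Graph Id -> neighbors.
ConnectionTable = Dict[int, List[str]]


def source_route_next_hop(route: List[str], me: str) -> Optional[str]:
    """Source routing: the ordered device list travels in the network header.

    Returns the device that follows `me` in the list, or None if `me` is the
    destination. No per-device configuration is consulted.
    """
    i = route.index(me)
    return route[i + 1] if i + 1 < len(route) else None


def graph_route_forward(table: ConnectionTable, graph_id: int,
                        send: Callable[[str], bool],
                        max_opportunities: int = 4) -> Optional[str]:
    """Graph routing: look up the Graph Id and transmit to a listed neighbor.

    `send(neighbor)` models one transmit opportunity and returns True when a
    data link layer ACK comes back. Neighbors are tried round-robin at
    successive opportunities (an ASSUMED retry policy). Returns the neighbor
    that acknowledged, or None if the packet is still unacknowledged.
    """
    neighbors = table.get(graph_id, [])
    if not neighbors:
        return None  # this device is not configured for the graph
    for k in range(max_opportunities):
        neighbor = neighbors[k % len(neighbors)]
        if send(neighbor):
            return neighbor  # ACK received: the packet may leave the transmit buffer
    return None
```

Suppose a device holds `{7: ["B", "C"]}` and neighbor B fails to acknowledge. The packet goes to B at the first opportunity and to C at the next, and C's ACK ends the attempt. Listing two neighbors for one Graph Id provides this path redundancy.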
In a properly configured network, every device has at least two devices in the Graph through which it may send packets. The packet travels along the paths corresponding to the Graph Id until it reaches its destination or is discarded. A device is configured with a Graph Id through its Connection table, whose entries include the Graph Id and a neighbor address. Redundant paths may be set up by associating more than one neighbor with the same Graph Id. With graph routing, a device routing a packet performs a lookup in the Connection table by Graph Id and sends the packet to any of the listed neighbors. Once any neighbor acknowledges receipt of the packet (a data link layer acknowledgement), the routing device may release the packet and remove it from its transmit buffer. If no acknowledgement is received, the device attempts to retransmit the packet at its next available opportunity.

C. Scheduling

Each WirelessHART network contains exactly one overall schedule, which is created and managed by the Network Manager. The schedule is subdivided into Superframes, and each Superframe is further subdivided into frame-relative links that repeat as the Superframe cycles. Figure 3 shows how devices may communicate in a simple three-slot Superframe. Devices A and B communicate during slot 0, devices B and C communicate during slot 1, and slot 2 is not used. The link schedule repeats every three slots.

                TS0     TS1     TS2
    Cycle N     A->B    B->C    (idle)
    Cycle N+1   A->B    B->C    (idle)
    Cycle N+2   A->B    B->C    (idle)

Figure 3. Example of a Three-slot Superframe

The sizes of Superframes should follow a harmonic chain, i.e., every period in the chain should divide every longer period. Examples of harmonic chains are 1, 2, 4, 8, 16, … and 3, 6, 12, 24, …. More generally, any set of periods of the form $a \cdot b^n$ (for fixed integers $a$ and $b$) forms a harmonic chain. A given WirelessHART network may contain several concurrent Superframes of different sizes. A Superframe is a product of both channels and time slots. Multiple Superframes may be used to define different communication schedules for various groups of devices, or to run the entire network at different duty cycles.

D. Generating Broadcast and Uplink Graphs

Ensuring reliable and real-time communication in WirelessHART mesh networks is critical for many process measurement and control applications. In a typical WirelessHART network, each device has a designated sample rate at which it publishes its process data to the Gateway through multi-hop transmissions. In the other direction, the Gateway periodically sends control data back to the devices. To help relay these different types of data, the standard defines three types of communication graphs. The network shares one broadcast graph for propagating common control messages and one uplink graph for devices to publish process data. If needed, each device also has a unique downlink graph from the Gateway for forwarding device-specific control messages to it. In our previous work [1], we abstracted the reliability requirements for packet routing defined in the standard and designed efficient algorithms to construct these reliable graphs. We define a broadcast graph to be reliable if and only if each device has at least two parents from which it can receive broadcast messages. This significantly increases the chance that broadcast messages propagate to the entire network. Unlike the broadcast graph, the uplink graph is used by the devices to forward their process data to the Gateway at a required sample rate.
The uplink graph is considered reliable if and only if every device in the network other than the Access Points has two children through which it can forward its packets to the Gateway. If the communication between a device and one of its children is broken, the process data can still be delivered to the Gateway through the alternative child.

When constructing the reliable broadcast graph, our algorithm maintains a set of explored nodes in the network. It incrementally adds the remaining nodes to the explored set, always choosing a node that has at least two incoming edges from explored nodes and the smallest average number of hops from the Gateway. This process continues until all nodes in the network are explored. If at some step no remaining node has two incoming edges from the explored set, an error is reported, which triggers the Network Manager to execute appropriate recovery actions. The algorithm that constructs the reliable uplink graph follows the same principle. It reverses all edges in the original network topology, constructs the reliable broadcast graph on the reversed topology, and then reverses all of that graph's edges back.

E. Generating Downlink Graphs

Unlike the broadcast and uplink graphs, downlink graphs exist one per device. A downlink graph is defined to be reliable if and only if every intermediate node in it has two children to relay messages to the destination device. Constructing efficient and scalable downlink graphs with low configuration cost in WirelessHART networks is challenging. In [1], we proposed the Sequential Reliable Downlink Routing (SRDR) approach. Instead of constructing a completely new graph from the Gateway to each device, SRDR lets each node keep only a small local graph that maintains reliable routing from its parents. The reliable downlink graph to a given node can then be constructed by assembling the intermediate nodes' local graphs in a given order. These local graphs serve as building blocks for constructing downlink graphs to different destinations, so existing device configurations can be reused. This significantly reduces the overall configuration overhead and improves the scalability of downlink routing.

The data link layer communication schedule is then constructed from these routing graphs. Our approach allows multiple devices to compete for the retry links to the same device. It also splits the traffic from one device among all of its successors, thereby reducing the bandwidth allocated on each of them.
The communication schedules on the successors are designed so that, combined, they have the same communication pattern as the original device. The global communication schedule is then divided into sub-schedules, which are distributed to the corresponding devices. These sub-schedules work together to guarantee that periodic process and control data between the devices and the Gateway are forwarded over multiple hops in a timely manner.
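The Python sketch below covers the reliable broadcast graph construction of Section IV.D and the edge-reversal step used for the uplink graph, expressed as a greedy loop. It is a simplified sketch, not the algorithm of [1] verbatim, and several choices in it are our own assumptions. The explored set starts from the Access Points, passed as `roots` at hop count 0. A node's hop count is the mean of its explored parents' hop counts plus one. Ties are broken by node name, and all incoming edges from explored nodes are kept in the resulting graph.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (src, dst): src can transmit to dst


def reliable_broadcast_graph(edges: Iterable[Edge], roots: Set[str],
                             nodes: Set[str]) -> Tuple[Set[Edge], Dict[str, float]]:
    """Greedily grow the explored set from `roots` (ASSUMED: Access Points, hop 0).

    At each step, among unexplored nodes with >= 2 incoming edges from explored
    nodes, pick the one with the smallest average hop count (mean of parent
    hops + 1, an assumed definition); ties go to the smallest node name.
    Raises ValueError if some node can never obtain two explored parents.
    """
    edge_set = set(edges)
    explored: Set[str] = set(roots)
    hops: Dict[str, float] = {r: 0.0 for r in roots}
    graph: Set[Edge] = set()
    while nodes - explored:
        best: Optional[str] = None
        best_avg, best_parents = 0.0, []  # type: float, List[str]
        for v in sorted(nodes - explored):
            parents = [u for u in explored if (u, v) in edge_set]
            if len(parents) < 2:
                continue  # not yet reachable from two explored parents
            avg = sum(hops[u] + 1 for u in parents) / len(parents)
            if best is None or avg < best_avg:
                best, best_avg, best_parents = v, avg, parents
        if best is None:
            raise ValueError(f"no two explored parents for: {sorted(nodes - explored)}")
        explored.add(best)
        hops[best] = best_avg
        graph.update((u, best) for u in best_parents)
    return graph, hops


def reliable_uplink_graph(edges: Iterable[Edge], roots: Set[str],
                          nodes: Set[str]) -> Tuple[Set[Edge], Dict[str, float]]:
    """Reverse every edge, build the broadcast graph, then reverse it back."""
    reversed_edges = [(dst, src) for src, dst in edges]
    graph, hops = reliable_broadcast_graph(reversed_edges, roots, nodes)
    return {(dst, src) for src, dst in graph}, hops
```

The `ValueError` plays the role of the error report that triggers the Network Manager's recovery actions. In the uplink result, every device ends up with two outgoing edges toward the Access Points, which matches the two-children reliability condition.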
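The Superframe rules of Section IV.C can also be made concrete in Python. The following sketch rests on assumptions of our own. It ignores channels and indexes slots by an absolute slot number counted from network start. A Superframe is represented as a list of (transmitter, receiver) links, with `None` marking an idle slot. Under this representation, the three-slot Superframe of Figure 3 is `[("A", "B"), ("B", "C"), None]`.

```python
from typing import List, Optional, Tuple

Link = Tuple[str, str]  # (transmitter, receiver); representation is an assumption


def active_link(superframe: List[Optional[Link]], asn: int) -> Optional[Link]:
    """Link scheduled at absolute slot number `asn`; None means the slot is idle.

    The Superframe repeats every len(superframe) slots, so only
    asn modulo the Superframe length matters.
    """
    return superframe[asn % len(superframe)]


def is_harmonic_chain(periods: List[int]) -> bool:
    """True if every period divides every longer one.

    Checking adjacent pairs after sorting suffices because divisibility is
    transitive.
    """
    p = sorted(periods)
    return all(longer % shorter == 0 for shorter, longer in zip(p, p[1:]))
```

With the Figure 3 list, slot 4 maps to slot 1 of its cycle (B->C) and slot 5 is idle. Periods such as 3, 6, 12, 24 pass the harmonic check, while 4 and 6 do not, because 4 does not divide 6.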

V.

SequenceL

SequenceL (SL) is a declarative, functional language developed over more than 20 years by NASA and Texas Tech University. It was brought to market in 2009 by Texas Multicore Technologies, Inc., under an arrangement with Texas Tech [12], [13]. SequenceL was originally designed as a documentation language. It therefore allows algorithms to be translated directly into code, without the developer first having to re-cast the requirements into traditional computational paradigms. In addition, SequenceL is self-parallelizing: if an algorithm is expressed in SequenceL, the SequenceL compiler generates massively parallel C++ code that is provably race- and deadlock-free. It can also generate GPU code, if desired, for a GPU or APU (hybrid CPU/GPU) target platform. Several differences between SequenceL and more traditional languages such as Python are:



• Much of the Python code is devoted to bookkeeping, such as setting up loops and keeping track of index variables. SL code is more task-based, and most operations can be expressed in one or two lines. Since variables cannot be re-assigned in SL, no such bookkeeping is involved. Users only introduce a variable to hold the result of a calculation they want to refer to several times, or for clarity and brevity when writing a lengthy line of code.



• SL functions tend to be extremely short, so it is often easier to write two functions to do two similar things. In procedural languages, functions tend to be longer, so they are often overloaded and passed arguments that alter their exact meaning.



• SL functions often follow a pattern in which a top-level function simply calls a helper function with the same arguments, and the helper function does all of the work. The reason for this indirection is to let SL perform a Normalize-Transpose (NT) operation, which allows sets of arguments to be passed at once. Normally, in this pattern, all of the "programming" happens in the one or two lines of the helper function. The top-level function serves only to perform the NT implicitly and then call the helper function to do the work. For this reason, even very simple operations implemented in one or two lines often still require a second, one-line top-level function.



• Since SL is a declarative language, SL functions tend to be more self-explanatory. For instance, consider the SRDR algorithm [1]. Traditional code such as Python often iterates over a range of nodes, performs some comparison, and builds up a list of the elements that passed, which it then returns. This may also be done using Python's "filter" function to avoid an explicit for-loop. However, that approach usually requires defining helper or lambda functions, is harder to read and understand, and essentially just removes the for-loop; the underlying computation is the same. In SL, the programmer just writes a function that determines whether an element meets the criterion. The programmer then passes that function the whole set of elements at once, rather than one element at a time, and gets back those that passed. For this reason, much SL code consists of successively applied filters. For instance, a prime-finding program can be written in one or two lines: define a prime as a number that is not divisible by any number between 1 and itself, then pass that function a list of the first n integers. The returned result is all of the primes […]

    […] and not exact):
        returnList.append({"node": node, "parentsInS": parentsInS})
    return returnList

The same functionality was delivered by 6 lines of SequenceL:

    // for any nodeID, return its parents from the set of Edges (normally Graph, but may be DLG edges)
    parentsOf(nodeID, edgeList(1)) := parentsOf_(nodeID, edgeList);
    parentsOf_(childID, edge) := edge.src when edge.dst = childID;

    // for any nodeID, return any parents which are members of the set Set
    parentsIn(nodeID, Set(1), graph) := parentsIn_(parentsOf(nodeID, graph.edges), Set);
    parentsIn_(parentID, Set(1)) := parentID when subset([parentID], Set);

    // given a list of nodeIDs, return those who have at least/exactly n parents in set Set
    atLeastNparentsIn(nodeID, Set(1), graph, n) := nodeID when size(parentsIn(nodeID, Set, graph)) >= n;
    exactlyNparentsIn(nodeID, Set(1), graph, n) := nodeID when size(parentsIn(nodeID, Set, graph)) = n;

The first two SL functions correspond exactly to the first Python function. They are a good example of the top-level/helper pattern. The function parentsOf() takes two arguments and passes them along to the helper function parentsOf_(). The helper defines, trivially, that given a node and an edge, if the edge's destination is the node, then the edge's source must be that node's parent. Given just those two one-line functions, you can pass the top-level function a node and the entire graph, and it returns a list of all of the node's parents within the graph. The second Python function, nodesWithAtLeastNparentsInS(), is implemented by the other four one-line functions in SL. The first two of these follow the top-level/helper pattern described above and serve the same purpose as the embedded function isInS() in the Python code. They return the subset of a set of nodes that are parents of the target node, essentially intersecting the parents of a node with a set of other nodes. The last two functions exemplify another difference between Python and SequenceL. The programmer wanted to be able to find nodes with either "exactly" n parents or "at least" n parents. In a procedural language, writing a separate function for each case would have been extremely repetitive. In the 10-line Python function, only one line differed between the two variants and the other nine were identical. The programmer therefore added the "exact" argument, which changes the meaning of the function. If it is true, the function returns exact matches only; if it is false, the function returns "at least" matches. In SequenceL, however, each function was only one line long, so it was simpler to write one function to do one thing and another to do the other. The developer can then call whichever one is needed, rather than calling a more generic version and passing it arguments to control its behavior.
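For comparison, the SequenceL functions above can be rendered as filter-style Python list comprehensions. This is our own sketch, not the authors' original Python code. `Edge` is a hypothetical named tuple, and the functions take an edge list directly rather than a graph object with an `edges` field. The explicit comprehension over `nodes` stands in for the mapping that SL's Normalize-Transpose performs implicitly. The prime filter described in the list above is included in the same style; treating the bound n as inclusive is an assumption.

```python
from typing import List, NamedTuple, Set


class Edge(NamedTuple):  # hypothetical edge type for this sketch
    src: str
    dst: str


def parents_of(node: str, edges: List[Edge]) -> List[str]:
    """Like parentsOf/parentsOf_: the source of every edge whose destination is node."""
    return [e.src for e in edges if e.dst == node]


def parents_in(node: str, s: Set[str], edges: List[Edge]) -> List[str]:
    """Like parentsIn/parentsIn_: the parents of node that are members of s."""
    return [p for p in parents_of(node, edges) if p in s]


def at_least_n_parents_in(nodes: List[str], s: Set[str],
                          edges: List[Edge], n: int) -> List[str]:
    # The comprehension over nodes replaces SL's implicit NT over the node set.
    return [v for v in nodes if len(parents_in(v, s, edges)) >= n]


def exactly_n_parents_in(nodes: List[str], s: Set[str],
                         edges: List[Edge], n: int) -> List[str]:
    return [v for v in nodes if len(parents_in(v, s, edges)) == n]


def primes_up_to(n: int) -> List[int]:
    """Filter-style primes: k > 1 with no divisor strictly between 1 and k (n inclusive, assumed)."""
    return [k for k in range(2, n + 1) if all(k % d != 0 for d in range(2, k))]
```

As in the SequenceL version, the "at least" and "exactly" variants are two separate one-line functions rather than one function controlled by an `exact` flag.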

// given a list of ints (or scalars comparable by "
