Lookahead Revisited in Wireless Network Simulations ∗

Jason Liu    David M. Nicol
Department of Computer Science
Dartmouth College
Hanover, NH 03755
{jasonliu,nicol}@cs.dartmouth.edu

Abstract

Rapid growth in wireless communication systems motivates the development of technology supporting the simulation of large-scale wireless systems. However, it is widely recognized that wireless communications do not have the substantial “lookahead” needed by conservative synchronization protocols. This paper focuses on identifying and exploiting lookahead for such models. We find lookahead in three ways, exploiting characteristics of low power networks, the transceiver logic, and the way in which protocol stacks are typically constructed. We show how these observations allow a variety of conservative synchronization protocols to take advantage of lookahead, describe a synchronization method we use, and empirically examine the performance this method offers on a large-scale simulation of a sensor network intended for homeland defense scenarios.

1. Introduction

The design of wireless communication systems calls for the evaluation of a large number of tradeoffs, against a backdrop of increasing time-to-market pressure for applying each new technology. The size and complexity of such systems create hard requirements for efficient modeling and analysis tools. A simulation toolset must be fast, and must support the evaluation of very large networks.

We are interested in the design and evaluation of large-scale ad hoc wireless sensor networks, in the context of emergency response [13]. Because of the obvious difficulty in testing a design before deployment, we are involved in developing an integrated scalable simulation environment for wireless ad hoc sensor networks. Our area of particular research interest is routing.

∗ This work was supported in part by DARPA Contract N66001-96C8530, NSF Grant ANI-98 08964, NSF Grant EIA-98-02068, and Dept. of Justice Contract 2000-CX-K001.

The environment is called the Simulator of Wireless Ad hoc Networks, or SWAN, and is implemented on top of DaSSF—our conservative parallel simulation kernel [12]. The system model is that of a large network of sensor stations, each equipped with one or more sensors and a low-power radio. The radio range is small, e.g. 200 meters. The radio is used to transmit sensor information gathered at the station itself, or to receive and forward similar messages transmitted by other stations within “hearing” range. Issues of power consumption and access control loom large in these designs, and routing strategies have an especially important influence on the system’s performance and ultimate lifetime.

Borrowing design elements from other network simulators, such as WiPPET [11], GloMoSim [3], SSFNet [7], and x-kernel [16], SWAN uses the ISO/OSI stack model and provides a standard abstraction for building, at each sensor station, a protocol graph, which consists of a stack of protocol sessions—each modeled as a stand-alone protocol module. The advantage of this design is that each protocol module is insulated from the internals of other protocol layers and from the particulars of the underlying simulator. Indeed, our earlier work with SWAN embedded commercially available ad hoc routing software into the simulator, running the routing logic by executing the code directly. Such capabilities are very important, as we want to separate protocol design issues from concerns related to the performance of the simulator.

It is well known that lookahead is essential to the performance of parallel simulations. Loosely speaking, lookahead is the ability to predict what will happen, or not, in the simulated future. It defines the fundamental asynchrony among various parts of the simulation model, and gives a lower bound on when a logical process may execute actions that affect the state of another logical process. Useful for both optimistic and conservative synchronization approaches, it is absolutely critical for conservative synchronization. Lookahead calculations in wired networks frequently rely upon latency across individual communication channels. These latencies can be substantial, measured in tens or hundreds of milliseconds.


However, in a wireless network the same measure of latency is much smaller—in our case the time for a bit to be generated by electronics in the sender, travel 200 meters as a radio wave, and be recognized by electronics in the receiver—is about 6 µsec. The recognition of the first bit of an incoming message may change the state of the receiver. In such a case, in the absence of information about when a station might next transmit, an idle receiver must be prepared to start receiving a message at any time no greater than 6 µsec ahead of a potential sender’s simulation time. This places high demands, and induces high overhead, on a conservatively synchronized parallel simulation.

Simulation of wireless communication systems often requires detailed modeling of the transmission medium. This means that a great deal of computation is involved in determining with some accuracy just how long it takes a transmitted bit to be received, at what power it is received, and what contribution that transmission makes to the interference of other transmissions. Detailed radio medium models may be necessary for high-fidelity modeling; their computational cost can easily be much larger than the cost of modeling the protocols, and the computation may be quite parallelizable, in which case the limited lookahead is not such a great detriment to performance. This very point was made recently by Takai et al. [18]. However, as pointed out in [1], there is an important difference between design models and verification models; the former may have much reduced requirements for fidelity, and hence computational cost, than the latter. Although it is extremely important to have a very accurate and detailed channel model for verification purposes, when doing preliminary design simulations this level of detail is unwanted, unused, and unnecessary [9]. As a result, it is important to determine how to achieve good parallel performance in models that use simplified signal propagation and interference calculations. Since the latency of one bit through the air provides inadequate lookahead, we must look elsewhere.

In this paper we emphasize model characteristics which can be used to substantially increase lookahead. One is short radio range. Even though it may take only 6 µsec for a bit to travel from sender to receiver on one hop, if the message’s ultimate destination is out of radio range of its source, that message has to be received, buffered, and retransmitted. This substantially increases the lookahead between stations that are out of direct radio range of each other. A second characteristic is that protocols are often expressed as finite state machines, and a protocol’s future behavior with respect to the protocols above it and below it in the stack can often be predicted from the protocol’s state. This regular structure allows us to define a methodology for passing conditional lookahead all the way from the application layer to the physical layer. A third characteristic is that typically a transceiver cannot transmit on the same channel, at the same time, as a message it is actively receiving.

An entire message must be received before further actions can be taken by a sensor station. These three observations can be exploited by a variety of conservative protocols, as we will show.

The paper is organized as follows. In Section 2, we discuss related work. Section 3 presents an overview of our integrated simulation environment for wireless ad hoc sensor networks. Section 4 describes the lookahead we’ve identified, conservative protocols that might exploit it, and means of extracting it from the IEEE 802.11 protocol as an example. Section 5 describes a simple synchronization method which uses this lookahead, while experiments using this technique are presented in Section 6. Conclusions are given in Section 7.

2. Related Work

The GloMoSim [3] project is a clear leader in the area of parallelized wireless simulation. It consists of a set of library modules, each of which simulates a specific wireless communication protocol in the protocol stack. The environment is built with PARSEC, a C-based parallel discrete-event simulation language that supports both optimistic and conservative parallel simulation schemes [17]. Originally, a GloMoSim model was viewed as a collection of entities, each representing a single protocol layer within a mobile station and each capable, in principle, of being a logical process. Based on this design, Meyer and Bagrodia devised an approach to improve lookahead by analyzing the data flow of messages among mobile stations [15]. The key idea behind this “path lookahead” was determinism of message paths through a protocol stack. The best case for path lookahead is when a message moves all the way up, or all the way down, the protocol stack without any dependency loops forming between layers in the graph. Since the protocol layers are representable as logical processes, each layer needs to maintain lookahead values with its neighbors in the stack, and use null messages to advance that lookahead. By contrast, the models and protocols we consider with SWAN have tight feedback coupling between some of the protocol layers, a characteristic which largely negates any potential for path lookahead. Also, SWAN does not find or exploit parallelism within a protocol stack, so protocol sessions are not individual logical processes; this follows a strategy similar to the one adopted in a later redesign of GloMoSim.

The later GloMoSim version defines logical processes that are firmly rooted in a spatial decomposition of the physical domain [18]. The lookahead value for a spatial region is calculated as the minimum of (a) the shortest amount of time needed for a radio signal to propagate completely across the spatial partition, and (b) the processing time required for a radio to begin transmitting a packet.


With this small lookahead, logical processes in the parallel simulator must synchronize frequently, no matter what kind of synchronization protocol is adopted. In [18], Takai et al. showed that a detailed channel propagation model not only has a significant impact on the results of the simulation, but also can help achieve good performance for parallel simulation. The problem space we are interested in is slightly, but importantly, different. The evaluation of routing protocols in the early stages of design does not need the accuracy of detailed radio models, and so the small lookahead which is adequate for GloMoSim’s detailed radio simulations is inadequate for us.

GloMoSim and SWAN are among the most current reflections of work on parallelized wireless simulation. Interest in the area has been active in recent years, and the emphasis and problems considered vary from project to project. It is, however, important to distinguish between simulations of ad hoc networks and those of cellular communications. Cellular networks have different radio characteristics than ad hoc networks, principally because ad hoc networks use very low power radios. Low power has ramifications that we will use to our considerable advantage.

3. An Overview of the SWAN Model

The research reported in this paper is a result of our interest in the design and evaluation of large-scale ad hoc wireless sensor networks that would be deployed when there is a natural catastrophe or a terrorist attack. Our goal is to develop an integrated simulation environment for wireless communication networks that can be executed efficiently on parallel architectures. SWAN is implemented on top of DaSSF, which is a process-oriented, conservatively synchronized parallel simulator based on C++ [12]. DaSSF’s overriding design goal is to support scalable, efficient simulation of large-scale complex systems. DaSSF has been shown to deliver excellent performance in simulations of a variety of communication networks [7, 6].

3.1. The Architecture of SWAN

The conceptual framework of SWAN consists of two kinds of interacting sub-models: the node model and the environment model. An instance of the node model represents a sensor station; there are as many instances as there are sensor stations in the wireless network. The node model provides a standard abstraction for building the ISO/OSI stack: each sensor station is represented as a protocol graph, which consists of a stack of protocol sessions, each of which implements the logic of a protocol layer in the ISO/OSI stack.

The environment model represents the geographic and radio environment, and can be further divided into an RF channel model, a terrain model, a mobility model, and a traffic model. The RF channel model, together with the PHY layer in the node model, simulates radio propagation and interference. The terrain model provides geographic information about the environment. The mobility model describes the process by which mobile stations move within the geographic terrain.1 The traffic model describes the process by which sensor stations generate application-level traffic among themselves. Depending on the application, both mobility and traffic can be built inside the node model’s application layer protocol. It is important to recognize the interactions and dependencies between the different models. For example, the geography described by the terrain model determines gross radio propagation characteristics.

Data packets follow the data path, traveling from the application layer, down to the PHY layer, and eventually out to the radio channel. The data path is bidirectional. Adjacent protocol layers interact through method calls; each protocol defines three common methods. The push and pop methods are used to transfer protocol data messages between adjacent protocol sessions. When a protocol session decides to send a packet, it invokes the push method of the protocol session below. Eventually, the packet is sent out from the station and distributed to the neighboring stations within radio range. The receiving station determines whether to accept or drop the packet, based on the calculated receiving power of the packet relative to the prevailing interference. An accepted packet will bubble up the protocol stack: when a protocol session decides to send a packet up the protocol stack, the pop method of the protocol session above is called. The third method is control, and is used for transferring control messages between adjacent protocol layers. For example, the behavior of some protocols may depend on the state of a lower protocol layer; the control method of the lower protocol session can be used to query its current state. Also, when the state of the lower protocol session changes, it may invoke the control method of the session above to notify it of the change. A protocol session may also use timers, which provide time advancement for the session: a timeout callback method is invoked after a pre-specified amount of simulation time has elapsed.

From the perspective of a protocol session, the three methods (push, pop, and control) and the use of timers define the only interactions a session has with the rest of the simulation model. This follows the x-kernel design, which insulates the internals of a protocol from its supporting simulator.

1 At present, we don’t have mobility in SWAN, though it is not especially difficult to include in our model. For the present discussion we assume that stations do not move.


The behavior of a protocol can therefore be represented as a state machine, and the state of the protocol session at each sensor station may only change through these well-defined interactions. This design turns out to be very important, as we exploit conditional lookahead based on the interactions between protocol sessions.
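To make these interactions concrete, the sketch below shows one way such a protocol-session interface could be expressed in C++. The class and method names (ProtocolSession, push, pop, control, setTimer, timeout) are illustrative assumptions based on the description above, not SWAN's actual API.

```cpp
// Minimal sketch of a protocol-session interface; names are illustrative
// assumptions, not SWAN's actual classes.
struct Message;        // a protocol data unit passed up or down the stack
struct ControlMessage; // control information exchanged between adjacent layers

class ProtocolSession {
public:
    virtual ~ProtocolSession() {}

    // Invoked by the session above to send a protocol message down the stack.
    virtual void push(Message* msg) = 0;

    // Invoked by the session below to deliver a received message up the stack.
    virtual void pop(Message* msg) = 0;

    // Invoked by an adjacent session to query state or signal a state change.
    virtual void control(ControlMessage* ctrl) = 0;

    // Callback invoked by the simulator when a timer set by this session expires.
    virtual void timeout(int timerId) = 0;

protected:
    ProtocolSession* above = nullptr;  // neighboring sessions in the protocol graph
    ProtocolSession* below = nullptr;

    // Ask the simulator to fire timeout(timerId) 'delay' simulation-time units
    // from now (body omitted; the simulation kernel would provide it).
    void setTimer(int timerId, double delay);
};
```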

3.2. RF Channel Model

The RF channel model describes the propagation of electromagnetic waves in geographical space. In SWAN, the RF channel model collaborates with the PHY layer of each sensor station to simulate radio propagation and interference. Radio propagation models range from simple calculations of path loss as a function of the distance between sender and receiver, to fairly complicated models that involve signal attenuations calculated from either statistical channel impulse response models or detailed radio ray-tracing methods. Different levels of detail in the simulation of radio propagation and interference exact different computational costs.

In SWAN, radio propagation and interference are calculated inside the PHY layer of each sensor station. This layer approximates the packet loss probability and latency as a function of the radio environment and traffic in the region of the receiver. When a sensor station begins to transmit, every other station within a radius R of the transmitter is notified, where R is the maximum distance a radio signal can travel without too much power loss. Each receiving station determines whether the packet was lost according to the calculated signal receiving power relative to the present interference. It is important to note that the propagation and interference model adopted by SWAN is at the packet level: a message represents an entire packet transmitted between sensor stations. The receiving power of the whole message is calculated and compared against the total receiving power of other messages in transmission known to the sensor station, and the resulting signal-to-noise ratio (SNR) is used to determine whether to accept the packet. While there has been research that deals with finer levels of detail in terms of propagation and interference (e.g. WiPPET [11]), the packet-level model is commonly used in wireless ad hoc simulations and should be sufficient for our purpose of evaluating routing behavior.
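As an illustration of this packet-level calculation, the sketch below pairs a textbook free-space (Friis) path-loss formula, the kind of simple propagation model used in the experiments of Section 6, with an SNR threshold test. The function names, and the reduction of interference to a single noise-plus-interference power, are simplifying assumptions of ours rather than SWAN's code.

```cpp
const double kPi = 3.14159265358979323846;

// Free-space (Friis) received power for transmit power pt (watts), antenna
// gains gt and gr, wavelength lambda (meters), and distance d (meters):
// Pr = Pt * Gt * Gr * lambda^2 / (4 * pi * d)^2.
double freeSpaceRxPower(double pt, double gt, double gr,
                        double lambda, double d) {
    double denom = 4.0 * kPi * d;
    return pt * gt * gr * lambda * lambda / (denom * denom);
}

// Packet-level acceptance: the whole packet is accepted if its receiving
// power, relative to the total power of interfering transmissions plus
// noise, exceeds an SNR threshold.
bool acceptPacket(double rxPower, double interferencePlusNoise,
                  double snrThreshold) {
    return rxPower / interferencePlusNoise >= snrThreshold;
}
```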

4. Lookahead

4.1. Three Useful Model Characteristics

We now identify three problem characteristics that increase our ability to find lookahead, and show how to exploit them. The first one we note is that the PHY layer, which models the transceiver logic of the sensor station, has useful insensitivities in its behavior.

Once a sensor station begins broadcasting a frame (the basic unit of transmission, e.g. 1000 bytes), it sends the whole frame, regardless of any later transmissions that might simultaneously occur. This gives the sensor station what Lubachevsky called an “opaque period” [14], during which its output behavior is completely unaffected by changes in input.

A second characteristic derives from the fact that sensor stations transmit at low power. A transmitter’s signal is detectable only by stations within a relatively short distance of it, such as 200 meters. In such a network, a message originating at one station, ultimately destined (after routing) for another that is a kilometer away, will have to travel through multiple hops. In a similar fashion, a state change at a station may have to take several steps before it can affect the state of another station located some distance away. Coupling this fact with the sensor station’s opaque period—the requirement that a frame be completely received independent of later transmissions that might simultaneously occur—we can compute much better distance-based lookahead than a bit’s transit time through the air. The time required for a state change at sensor s to ripple out through the network, traveling through k intermediate hops, and finally affect the state at sensor t, is at least k times the time required to transmit a full frame.

A third characteristic is that the stack model of protocols supports the expression and propagation of conditional lookahead all the way from the application layer to the physical layer. We will say more about this shortly. For the present discussion, it is enough to say that a protocol stack can maintain a lower bound NTT (Next Time to Transmit) on the time at which the next message transmission will take place, assuming no additional messages are received by the station. The NTT value is always at least as large as the station’s current simulation time. Also, we will assume that, if the station is in the process of transmitting, NTT is at least as large as the time when the transmission will stop.
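The distance-based bound just described can be written down directly: with radio range R and minimal frame transmission time F, a state change needs at least one full frame time per intermediate hop to propagate. The helper below is a sketch under those assumptions; it uses straight-line distance only and ignores the additional one-bit latencies, so it remains a valid lower bound.

```cpp
#include <algorithm>
#include <cmath>

// Lower bound on the simulation time needed for a state change at one
// station to affect a station 'distance' meters away, given radio range R
// (meters) and minimal frame transmission time F (seconds).
double distanceLookahead(double distance, double R, double F) {
    if (distance <= R) {
        return 0.0;  // direct neighbors: no intermediate hop is required
    }
    // Each hop covers at most R meters, so at least ceil(distance / R) - 1
    // intermediate stations must fully receive and retransmit the frame.
    double intermediateHops = std::ceil(distance / R) - 1.0;
    return std::max(1.0, intermediateHops) * F;
}
```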

4.2. A Classical View

Given these observations we can describe the synchronization problem classically, in terms of shortest paths through a graph. Imagine a graph where every sensor station is represented by three nodes, one called “stack”, one called “transmitter”, and the third called “receiver”. Directed edges describe frame flows. The edge from a station’s stack node to its transmitter node is weighted by the stack’s current NTT value. If stations A and B can communicate, then there is an edge from A’s transmitter node to B’s receiver node and an edge from B’s transmitter node to A’s receiver node, with both edges weighted by the latency of a one-bit transmission between them, L(A, B). We still have to account for messages that are received and retransmitted.


We do this by assuming that, while a station is receiving a frame, the value NTT is at least as large as the time needed for the frame to be fully received. Since the station cannot transmit while it is receiving, this is a reasonable assumption. However, when the station is not actively receiving a frame, we know that the next frame to eventually come in must be fully received by the station before being retransmitted. To account for this delay, we direct an edge from the station’s receiver node to its transmitter node. When the station is not receiving a frame, that edge is weighted by F—the minimal frame transmission time. On the other hand, when the station is receiving a frame, the edge weight is defined to be ∞, since NTT covers this case. Figure 1 illustrates a simple case of this, where stations A, B, and C all communicate with each other, but station D is able to communicate only with C. The figure reflects a system state where no frames are being transmitted. The graph is constructed in such a way that a lower bound on the time when a station next receives a frame can be found by computing the shortest of all shortest paths from other stations’ stack nodes to its receiver node.

Viewed in this way, we see that any one of a number of conservative synchronization protocols might be used to control computation of that shortest path (or lower bounds on it). Our sole purpose in describing the synchronization problem graphically is to illustrate that the formulation is ideally suited for the Time of Next Event [8], the Bounded Lag [14], or Distances Between Objects [2] algorithms, which take similar views of the synchronization problem. More localized protocols can exploit bits and pieces of these observations. For example, in the classical null message protocol, station s_i can offer s_j (to which it is connected) lookahead equal to the minimum of (i) NTT_i plus the latency of one bit, and (ii) the transmission time of one full frame. This value expresses the fact that if a frame not presently known to s_i is to be the next one s_i transmits, it must first be fully received, but that if NTT_i plus a one-bit latency is smaller than a frame’s transmission time, then NTT_i is unconditionally a lower bound on the next transmission time. The Carrier Null Message algorithm [4] may improve upon this calculation by looking at NTT values of stations in s_i’s neighborhood. Later we will describe yet another method we implemented in SWAN, and study its behavior. The more important point is that low power sensor networks have lookahead properties which are exploitable by conservative synchronization algorithms.
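Under one reading of the null-message rule above (interpreting the full-frame term relative to the current simulation time, since a frame not yet seen needs at least a full frame time from now before it can be forwarded), the offer a station makes to a directly connected neighbor reduces to a few lines. This is a paraphrase with assumed variable names, not a quotation of any implementation.

```cpp
#include <algorithm>

// Lookahead station i can offer a neighboring station j under the classical
// null message protocol: station i cannot transmit before
// min(NTT_i, now + F), and whatever it transmits takes at least the one-bit
// latency L(i, j) to reach station j.
double nullMessageOffer(double now, double ntt_i,
                        double frameTime, double oneBitLatency) {
    double earliestTx = std::min(ntt_i, now + frameTime);
    return earliestTx + oneBitLatency;
}
```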

4.3. Computation of NTT

Each sensor station implements a stack of protocols. Each element of the stack represents a protocol session and implements the logic of the protocol for a particular communication layer.

A protocol session is modeled as a state machine. There are five ways the state of a protocol session S may change:

1. The protocol session above S sends down a protocol message using S’s push method. This may cause S to apply operations to the message, for example, continuing to push the message down the stack (or sending it out as a packet event to the RF channel if S is the PHY layer), or enqueuing the message for later processing. In any case, the state of S changes.

2. The protocol session below S sends a protocol message up by invoking S’s pop method. The popped protocol message was originally received by the station’s PHY layer.

3. A control message is received from the upper protocol session, through a call to S’s control method. For example, the control message can be used by the upper protocol session to query the current state of S.

4. A control message is received from the protocol session beneath S in the stack, again through a call to S’s control method. This is used by the lower protocol session to notify S of a state change.

5. A timer scheduled by S expires, causing a callback function to execute. As a result, S may change its state.

The structure of the protocol stack supports a general methodology for expressing and propagating what we call conditional lookahead. The key observation is that analysis of the stack state might yield a lower bound b on the time of the next transmission, but that receipt of a new frame, as yet unknown, might well cause the station to transmit earlier than b. One can determine whether the conditional lookahead becomes unconditional by considering the earliest time at which a new frame can be fully received. Our conditional lookahead is based on the conditional event approach of Chandy and Sherman [5].

Now suppose that each protocol session maintains a timestamp value called Next Potential Push Time (NPPT), which is a lower bound on when the protocol session will next push a protocol message down the protocol stack, assuming that its state does not first change as a result of a call to pop or a call to control from the layer below. This is just like NTT as we have described it, in that NPPT reflects knowledge of frames (or bounds on their arrivals) from everything above the session in the stack, but is conditioned on no changes affecting it from activity below it in the stack. In principle, then, a protocol session can observe the NPPT value of the session above it, analyze its own state, and produce its own NPPT value as a result. Therefore, if the application layer is able to make a prediction of when it will next generate a transmission, its NPPT will reflect that, the NPPT of the transport layer will incorporate the NPPT of the application layer, and so on, potentially being carried all the way down to the PHY layer.


[Figure 1. Sensor network synchronization graph: stations A, B, C, and D are each represented by stack, receiver, and transmitter nodes; edges are weighted by the stack’s NTT, the minimal frame transmission time F, and one-bit latencies L(·,·) between stations in radio range.]

However, if, say, the NPPT of the network layer is large and the link layer is aware of a frame being received which will be re-routed, the link layer’s NPPT will reflect the transmission time of the frame it is aware of, and not the NPPT of the network layer. Since S’s NPPT value is a function of the NPPT value of the session above it and its own state, we need only recompute the NPPT at each of S’s state transitions, or when the session above changes its own NPPT value. We allow for the possibility that a protocol or application developer does not provide substantive NPPT values—it is always the case that S’s NPPT value is at least as large as the simulation clock. Observe that the stack’s overall Next Transmission Time (NTT) is simply the NPPT value of the PHY layer. Since no protocol session exists below it, and receipt of a message is equivalent to a call to pop, it follows immediately that NTT is monotone non-decreasing across all state changes of all sessions in the protocol stack, provided that no additional messages are received at the station.
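The recomputation rule can be summarized as a small calculation: each session's NPPT is never earlier than the simulation clock, is bounded by the session's own state-dependent prediction, and may be lowered by work the session above will push down. The sketch below is an assumed skeleton of that top-down pass, not SWAN's implementation; in the real model each session recomputes its NPPT incrementally at its own state transitions rather than in one sweep.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct SessionState {
    // Bound derived from the session's own state (e.g., a pending timer or a
    // queued frame): a lower bound on when it will push down because of work
    // it already knows about; +infinity if nothing is pending.
    double ownBound = std::numeric_limits<double>::infinity();
    // Next Potential Push Time, filled in by the pass below.
    double nppt = 0.0;
};

// Recompute NPPT from the application layer (index 0) down to the PHY layer
// (last index, assuming a non-empty stack); the return value is the stack's
// NTT, i.e. the PHY layer's NPPT.
double recomputeNTT(std::vector<SessionState>& stack, double now) {
    // Nothing sits above the application layer, so no pushes arrive from above.
    double upperNppt = std::numeric_limits<double>::infinity();
    for (SessionState& s : stack) {
        // The session may next push down either because of its own pending
        // work or because of what the layer above will hand it, but never
        // before the current simulation time.
        s.nppt = std::max(now, std::min(s.ownBound, upperNppt));
        upperNppt = s.nppt;
    }
    return stack.back().nppt;
}
```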

4.4. Conditional Lookahead for IEEE 802.11

We now describe the stack-based conditional lookahead ideas concretely, in the context of a detailed IEEE 802.11 MAC protocol model. One of the challenges is to think through such a protocol to determine how to change NPPT in a way that ensures it does not decrease on a state change caused by a timer firing, or by a call to push from above. IEEE 802.11 is a protocol standard for wireless local area networks. The 802.11 MAC protocol provides asynchronous, time-bounded delivery service over a wireless channel [10].

Its primary access method is called the Distributed Coordination Function (DCF). The core mechanism used by DCF is the Basic Access Method, which we describe briefly in the following. A packet can be transmitted from one station to another when the channel is idle. When the packet is received successfully, the receiver must send back an acknowledgment (ACK). This is because the transmitter cannot determine whether a packet was faithfully delivered to its destination just by listening to the channel—it may not be able to observe collisions at the receiver. If the transmitter does not receive an ACK within a certain period of time, it presumes the packet is lost and schedules a retransmission. Before a sensor station initiates a transmission, whether of a data packet or an acknowledgment packet, it must first check whether there is an ongoing transmission over the channel. The transmission can proceed if the channel has been idle for a certain time interval called the Inter-Frame Space (IFS). If the medium is busy, the transmission must be deferred until the end of the ongoing transmission, and a random exponential backoff is used to avoid possible collision.

The IEEE 802.11 MAC layer protocol model used in our simulator is extracted from the GloMoSim code base, and modified slightly to compute conditional lookahead. It is a fairly complex model that contains detailed state transitions for the DCF method, including both the basic access method and the extended access method using RTS/CTS (Request-To-Send/Clear-To-Send). Our simplified model contains a total of 12 states. Aside from the idle state, the states can be classified into three categories: waiting for timers, waiting for transmissions, and waiting for responses (from other sensor stations). We modified the code by embedding statements that update the NPPT value at state changes.

When the MAC protocol is in one of the “waiting for timers” states, it is waiting for a pre-determined period of time to elapse. For example, the state WFBO specifies that the MAC layer is in the backoff state.


The backoff timer is set to expire at the time when the backoff value will have been decremented to zero. The NPPT value, therefore, can be assigned to the time when the timer expires. The backoff timer will expire if nothing else happens before then. One may be concerned about the case where the channel becomes busy during this time because another station starts a transmission; in that case, the timer should stop. It is still correct to use the timeout value as NPPT, since NPPT is by definition conditional, assuming no additional messages are received by this station. As another example, the MAC layer is in the state WFNAV when the sensor station is waiting for the channel to become idle again. A Network Allocation Vector (NAV) is constructed when the station overhears a packet transmission over the channel not targeted to it, or when an RTS or CTS frame is received. The sensor station must wait until the end of the current ongoing transmission. After that, if there are packets waiting to be sent in the queue, the station still needs to back off for a random amount of time and wait for an IFS before it starts to transmit. The NPPT can therefore be set to include the NAV, the pre-sampled backoff value, and the IFS.

The MAC protocol is in one of the “waiting for responses” states if it is waiting for its peer to respond with a packet. There are three states in this category: WFCTS, WFDATA, and WFACK. WFCTS means the station has sent an RTS frame to allocate the channel and is waiting for a CTS frame from its peer. Similarly, WFDATA means the station is waiting for the data frame after a CTS has been transmitted, while WFACK means the sensor station is waiting for the ACK frame. In each case, the station is waiting for a response from its peer and, if the response does not come back in time, a retransmission will be scheduled. The NPPT can be set to the timeout for retransmission.

When the MAC layer is transmitting a packet, it is in one of the “waiting for transmissions” states. The NPPT is limited to the current simulation time at the moment the protocol enters the state. The PHY layer beneath the MAC layer can take advantage of the situation: there is an opaque period during which the sensor station will not send any other packet while the current transmission is in progress, and the PHY layer protocol uses that to set its own NPPT.

When the MAC layer is idle and the radio channel is idle, the MAC layer will send a packet down as soon as one is pushed down to it from the layer above. The NPPT, in this case, can be derived from the NPPT value of the upper protocol layer. This is the only situation in this model where the NPPT from the upper layer is used. If the layer above the MAC layer does not deal with NPPT, the current simulation time is used. In our experiments, we will show that the lookahead obtained in this way, using NPPT from the application level, is essential to the overall lookahead improvement, especially when the MAC layer is often idle. The benefit from application-level lookahead knowledge is, however, limited to times when the network traffic is light.
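Collecting the state-by-state rules above in one place, the sketch below shows the kind of NPPT assignment embedded at MAC state changes. The state names follow the discussion, but the enum, the context fields, and the omission of the remaining states of the 12-state model are simplifications of ours.

```cpp
#include <algorithm>

enum class MacState { IDLE, WFBO, WFNAV, WFCTS, WFDATA, WFACK, TRANSMITTING };

struct MacContext {
    MacState state;
    double now;               // current simulation time
    double backoffExpiration; // absolute time at which the backoff timer fires (WFBO)
    double navEnd;            // absolute end of the overheard transmission (WFNAV)
    double backoff;           // pre-sampled backoff interval
    double ifs;               // inter-frame space
    double responseTimeout;   // absolute time at which a retransmission would be scheduled
    double upperNppt;         // NPPT of the protocol session above the MAC
    bool   queueEmpty;        // no packet is waiting to be sent
};

// Conditional NPPT of the MAC layer, following the state-based rules above.
double macNppt(const MacContext& c) {
    switch (c.state) {
    case MacState::WFBO:          // the backoff timer must run out first
        return c.backoffExpiration;
    case MacState::WFNAV:         // wait out the NAV, then an IFS and a backoff
        return c.navEnd + c.ifs + c.backoff;
    case MacState::WFCTS:
    case MacState::WFDATA:
    case MacState::WFACK:         // next transmission is at earliest the retransmission
        return c.responseTimeout;
    case MacState::TRANSMITTING:  // NPPT pinned at entry time; the PHY layer's
        return c.now;             // opaque period is exploited below the MAC
    case MacState::IDLE:          // only here does the upper layer's NPPT matter
        return c.queueEmpty ? std::max(c.now, c.upperNppt) : c.now;
    }
    return c.now;                 // unreachable; keeps the compiler satisfied
}
```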

[Figure 2. Border and interior stations: stations A, B, and C lie near the processor boundary; D, E, and G are interior.]

5. A Simple Synchronization Scheme

We have implemented a simple synchronization protocol that exploits the ideas developed in this paper. To begin with, like GloMoSim, we assume that the physical domain has been partitioned into relatively large subdomains, defining a logical process to be all simulation activities associated with one subdomain. Exploiting the low power characteristics of the networks of interest, we assume that the smallest dimension of a subdomain is still so large that a transmitted frame cannot cross it. This allows us to classify each station as being “border” or “interior”, depending on whether the station is close enough to the partition border to directly interact with stations in other partitions. Figure 2 illustrates this idea. The dashed circles around stations A, B, and C indicate the extent of radio interaction; these three stations are all border stations, while D, E, and G are interior stations (E and D are close enough to the boundary that they could be border stations if some station on the other side were placed to interact with them).

Now, at any point in simulation time, the processor simulating B and C needs to be concerned about when station A might next begin to transmit, just as the processor simulating A needs to be concerned about when B or C next begin to transmit. Each of these stations has an NTT lookahead value, but that value is conditional. However, if F is the time required to transmit a new frame, we know that within the next F units of simulation time the next frame whose transmission is heard across the partition boundary cannot be one that presently resides in any interior node.



Thus, if the minimum NTT value among A’s, B’s, and C’s is within F units of the present, then that lookahead value is unconditional.

There are a number of ways to exploit this observation. We explore one that is suitable for an architecture with a relatively small number of processors. This method builds a synchronization window in which all processors can execute without further synchronization. At a global synchronization point at time t, each processor scans its list of border stations, finds the least NPPT value among them, and compares it with time t + F. It offers the least of these values to a global min-reduction, the value of which defines the ending point of the next synchronization window. The main attractions of this method are its simplicity and relatively low overhead. Its main drawback is its scaling behavior, for the method essentially serializes all communication that occurs on partition boundaries. A way to use these same ideas but achieve better scaling is to synchronize locally, by having two processors—instead of all—compute the same type of value (minimum of NTT, t + F) with the minimum taken only over border stations at the boundary between those two processors. We are also exploring that approach, but report here only on the windowing algorithm.
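One synchronization step of the windowing method can be sketched as a local scan over border stations followed by a global min-reduction. The reduction is expressed with MPI_Allreduce purely for illustration; the experiments in Section 6 use a user-level barrier facility of DaSSF rather than MPI, and the variable names here are assumptions.

```cpp
#include <mpi.h>
#include <algorithm>
#include <vector>

// At a global synchronization point at simulation time t, each processor
// offers the smaller of (a) the least NPPT among its border stations and
// (b) t + F; the global minimum over all processors is the end of the next
// synchronization window, up to which everyone may run without synchronizing.
double nextWindowEnd(const std::vector<double>& borderStationNppt,
                     double t, double F) {
    double localOffer = t + F;
    for (double nppt : borderStationNppt) {
        localOffer = std::min(localOffer, nppt);
    }
    double windowEnd = 0.0;
    MPI_Allreduce(&localOffer, &windowEnd, 1, MPI_DOUBLE, MPI_MIN,
                  MPI_COMM_WORLD);
    return windowEnd;
}
```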

6. Experiments

We now look at experiments with the window-based synchronization approach, having implemented the algorithm on top of the DaSSF kernel. Rather than change the synchronization protocol inside the kernel, we used a feature of the library which supports user-level barrier synchronization among processors. The system we study implements the conditional lookahead extensions to 802.11, and uses a simple free-space signal propagation model at the PHY layer. The experiment simulates a geographical region of 10 × 10 kilometers, over which we deploy 10,000 stations uniformly at random. We set the radio range R to be 200 meters. We assume 802.11 uses direct sequence spread spectrum (DSSS) at the PHY layer with parameters set according to the specification; only the Basic Access Method is used in our study. The shared radio channel supports a maximum capacity of 1 Mbps. Each station may connect to a number of neighboring stations inside the radio range, and generates frames with exponentially distributed inter-arrival times. Each time it sends a frame, the station chooses an immediate neighbor randomly among all its neighboring stations as its peer. The inter-arrival rate depends on the number of neighboring stations: if n is the number of stations inside the transmission range of a station (including itself), C is the capacity of the radio channel, and K is the frame size (in bytes), then the frame inter-arrival rate of the station is set to αC/(8Kn), where α is the busy-ness factor we use to control the frame injection rate into the network. Larger values of α define a higher frame inter-arrival rate and therefore a higher traffic load for the network. In the experiment, we vary both α and K.

The application layer serves to model the traffic source. However, it can also provide hints to lower layers about the time of the next packet to send—through NPPT. In the experiment, we consider both cases, with and without the application layer’s NPPT value. The goal is to study how performance depends on application-level knowledge in the simulation.

Figure 3 shows the average barrier synchronization window size, with and without hints from the application layer, as we vary the number of processors, the busy-ness factor α, and the frame size. One way to think of the window size is that it reflects a system-wide lookahead, to be compared with the 6 µsec single-bit transmission latency. We first consider the effect of application hints. Comparing the two plots in the left column of Figure 3 to the two on the right, it is clear that with additional information from the application layer, the synchronization window size can be much larger than the one derived solely from the state of the MAC and PHY layers. With application hints, better lookahead is obtained when we decrease the busy-ness factor α. This is expected: the smaller α is, the larger the inter-arrival time becomes, and the better the NPPT value obtained from the application layer. Also, if the channel is idle more often (with smaller α), the NPPT value from the application can be used more often, since the MAC layer is idle while its queue is empty. As we see, when the network is highly saturated (e.g. α = 10), the window size is close to that without using hints at all.

The trend is reversed when we do not use the NPPT value from the application layer. As we increase the traffic load, we get a larger synchronization window. This is because the channel becomes busier and there are a greater number of collisions and backoffs, in which case the MAC layer can obtain a better NPPT value knowing that it is holding back its next transmission. Note that an anomaly is observed when α = 0.1 and the frame size is 1 KB; the y-axis of the top right plot of Figure 3 is drawn with a different scale to magnify it. We conjecture that the anomaly comes from an implementation particularity: a better lookahead is obtained when we closely inspect the idle state of the MAC layer. Another important observation is that, as we increase the number of processors, the synchronization window becomes smaller. This is due to the fact that more stations are involved in the min-reduction that computes the next window size. Moreover, if we increase the frame size, better lookahead is obtained. This is because the longer transmission delay is effectively used by the MAC and PHY layers to derive NPPT.
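For reference, the traffic-model arithmetic above amounts to the following small sketch; the random-number plumbing is an assumption of ours.

```cpp
#include <random>

// Per-station frame inter-arrival rate: alpha * C / (8 * K * n), where C is
// the channel capacity in bits per second, K the frame size in bytes, and n
// the number of stations within radio range (including the station itself).
double frameArrivalRate(double alpha, double capacityBps,
                        double frameBytes, int neighbors) {
    return alpha * capacityBps / (8.0 * frameBytes * neighbors);
}

// Sample the next inter-arrival time from an exponential distribution.
double nextInterArrival(double rate, std::mt19937& rng) {
    std::exponential_distribution<double> dist(rate);
    return dist(rng);
}
```

With C = 1 Mbps, K = 1 KB, n = 10, and α = 1, for example, the rate works out to 10^6 / (8 × 1000 × 10) = 12.5 frames per second per station; the actual value of n varies with the random placement of stations.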


[Figure 3. Synchronization window size: average synchronization window size (µsec) versus number of processors, with and without application hints, for frame sizes of 1 KB and 8 KB and α = 0.1, 0.5, 1.0, and 10.]

Figure 4 shows the parallel speedup against a sequential simulation with a single event list. The experiment is conducted on a Sun Enterprise 6500 with 14 UltraSPARC-II 400 MHz processors and a total of 7 GB of memory. We use the GNU C++ compiler with an optimization level of 3. In this experiment, we fixed the frame size at 1 KB and changed the busy-ness factor α to vary the traffic load. Better parallel performance is observed when using application-level knowledge to gain larger lookahead. As we reduce the traffic load (by decreasing α), the difference in the average size of the synchronization window between the two cases (with and without application hints) becomes larger. A smaller window results in more frequent synchronization, and therefore a larger gap in the speedup curves between each pair as we reduce the traffic load. While it is obvious for the case without application hints that the speedup diminishes, since the lookahead value worsens as α gets smaller, it seems counter-intuitive that the speedup for the case with application knowledge gets worse as well. We believe this is due to the decreasing computation load per processor, which is offset by the increasing communication and synchronization overhead as more processors are included.

In general, the application-level knowledge helps a great deal when the traffic in the network is light. It remains a challenging task to maintain parallel performance when the traffic load is light and the computational granularity is small. When the network traffic load becomes heavier, the MAC layer and the PHY layer take advantage of the better lookahead extracted from the protocol state machine, and the dependency on application-level knowledge is removed. While parallel performance varies with model parameters, we have observed speedups as high as 5 on 8 processors.

7. Summary and Future Directions

Latency is a common source of lookahead in parallelized simulations of communication networks. However, latency in wireless simulations is tiny. This proves to be a serious challenge when the state of a receiver changes with the recognition of the first bit of a new transmission. Simulations that employ detailed models of the radio are somewhat immune to the problem, as so much computation must be performed to model the environment within a one-bit latency time. However, simulations which do not need such detail in the radio channel are commonly used; for these, other forms of lookahead must be found.


[Figure 4. Speedup (1 KB frame): parallel speedup versus number of processors, with and without application hints, for α = 10, 0.5, and 0.1.]

This paper reexamines the problem in the context of a low-power ad hoc sensor network that might be deployed in a homeland defense scenario. We observe three characteristics in this domain which give rise to lookahead on a much larger time-scale, and point out how this lookahead might be exploited by a variety of conservative synchronization protocols. We demonstrate a methodology for expressing and propagating lookahead through a protocol stack—all the way from the application layer to the physical layer—and analyze the IEEE 802.11 protocol in detail to show how to extract lookahead from it. Finally, we describe a simple global windowing protocol that uses this lookahead, and study its performance on a model with 10,000 sensor stations. Experiments show that application-level knowledge can play an important role and can be used to derive a large synchronization window. Its importance is obvious when the traffic load of the network is light; good lookahead in this case is essential to maintaining decent speedup for parallel simulation. When the network traffic is heavy, the reliance on application-level lookahead is diminished, as the MAC layer and the PHY layer can obtain good lookahead from their state machines. A speedup of 5 on 8 processors is observed with only the simple free-space propagation model.

Future directions of our research involve performance studies that include real-time scalable routing protocols with larger wireless network models. We are also investigating alternative synchronization methods with better scalability properties.

References

[1] L. Ahlin and J. Zander. Principles of Wireless Communications, 2nd edition. Studentlitteratur, Lund, Sweden, 1998.
[2] R. Ayani. A parallel simulation scheme based on distances between objects. Proceedings of the SCS Multiconference on Distributed Simulation, 21(2):113–118, 1989.
[3] L. Bajaj, M. Takai, R. Ahuja, K. Tang, R. Bagrodia, and M. Gerla. GloMoSim: A scalable network simulation environment. Technical Report 990027, UCLA Computer Science Department, 1999.
[4] W. Cai and S. J. Turner. An algorithm for distributed discrete-event simulation – the “carrier null message” approach. Proceedings of the SCS Multiconference on Distributed Simulation, 22(1):3–8, January 1990.
[5] K. M. Chandy and R. Sherman. The conditional event approach to distributed simulation. Proceedings of the SCS Multiconference on Distributed Simulation, 21(2):93–99, 1989.
[6] J. Cowie, H. Liu, J. Liu, D. Nicol, and A. Ogielski. Towards realistic million-node Internet simulations. Proceedings of the 1999 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’99), June 1999.
[7] J. H. Cowie, D. M. Nicol, and A. T. Ogielski. Modeling the global Internet. Computing in Science & Engineering, 1(1):42–50, 1999.
[8] B. Groselj and C. Tropper. The time of next event algorithm. Proceedings of the SCS Multiconference on Distributed Simulation, 19(3):25–29, July 1988.
[9] J. Heidemann, N. Bulusu, J. Elson, C. Intanagonwiwat, K. Lan, Y. Xu, W. Ye, D. Estrin, and R. Govindan. Effects of details in wireless network simulation. Proceedings of the 2001 SCS Multiconference on Distributed Simulation, pages 3–11, January 2001.
[10] IEEE Computer Society LAN/MAN Standards Committee. Wireless LAN medium access control (MAC) and physical layer (PHY) specification, 1997.
[11] O. E. Kelly, J. Lai, N. B. Mandayam, A. T. Ogielski, J. Panchal, and R. D. Yates. Scalable parallel simulations of wireless networks with WiPPET: Modeling of radio propagation, mobility and protocols. Mobile Networks and Applications, 5:199–208, 2000.
[12] J. Liu and D. M. Nicol. DaSSF 3.1 User’s Manual, April 2001.
[13] J. Liu, L. F. Perrone, D. M. Nicol, M. Liljenstam, C. Elliott, and D. Pearson. Simulation modeling of large-scale ad hoc sensor networks. European Simulation Interoperability Workshop (Euro-SIW 2001), June 2001.
[14] B. D. Lubachevsky. Efficient distributed event-driven simulations of multiple-loop networks. Communications of the ACM, 32(1):111–123, January 1989.
[15] R. A. Meyer and R. L. Bagrodia. Path lookahead: A data flow view of PDES models. Proceedings of the 13th Workshop on Parallel and Distributed Simulation (PADS’99), pages 12–19, 1999.
[16] S. W. O’Malley and L. L. Peterson. A dynamic network architecture. ACM Transactions on Computer Systems, 10(2):110–143, May 1992.
[17] R. Bagrodia, R. Meyer, M. Takai, Y. Chen, X. Zeng, J. Martin, B. Park, and H. Song. PARSEC: A parallel simulation environment for complex systems. IEEE Computer, 31(10):77–85, October 1998.
[18] M. Takai, R. Bagrodia, K. Tang, and M. Gerla. Efficient wireless network simulations with detailed propagation models. Wireless Networks, 7:297–305, 2001.
