Unified Development and Deployment of Network Protocols ∗
André Herms and Daniel Mahrenholz
University of Magdeburg, Universitätsplatz 2, D-39106 Magdeburg, Germany
{aherms, mahrenho}@ivs.cs.uni-magdeburg.de
Abstract
In this paper we describe GEA, an interface that enables the development of event-driven network protocols, their testing in a simulated network, and their deployment using a single, unmodified code base. Normally, testing and deployment require separate implementations, which results in significant development and maintenance overhead. This is due to the different APIs of the underlying systems. As a solution to this problem we propose a common, generic, event-based interface called GEA. We present its design principles and show by example how to implement a network protocol with it. Finally, we compare the execution of the protocol implementation in both environments to show the effectiveness of our approach.
1
Introduction
The development and deployment of a new networking protocol, especially for a wireless network, is not a one-way process but a cycle of several steps. The design phase results in a formal specification that can be implemented. With this implementation one can test and evaluate the protocol and use the results to revise the design and its implementation. Testing and evaluation become more important when the behavior of the environment is hard to predict, which is especially the case for wireless networks.
Our project group aims to develop protocols for large-scale multi-hop ad-hoc networks that are able to provide quality of service in these highly dynamic communication topologies. For the evaluation of the protocols two challenges arise. First, tests should be done in a reproducible way while allowing various setups of the network topology. Second, scalability tests require a high number of wireless nodes, and finding rare critical conditions calls for complex state monitoring. Simulators for wireless networks such as ns-2 [12] or GloMoSim [16] fit these requirements best. In a simulated test, arbitrary setups can be created that allow extensive testing of various aspects of the protocols. As a simulation gives the same results when started with the same inputs, tests are fully reproducible, and as the simulated wireless nodes consist only of data, their states can be monitored efficiently.
But certain aspects cannot be evaluated with simulations. The models used in simulations only allow a rough approximation of the characteristics of wireless communication. In particular, the modelling of physical characteristics relies on strongly simplified assumptions such as line-of-sight propagation in undisturbed environments. These do not match the real conditions found, for example, in an office environment. For testing, measurements with real hardware in real scenarios seem unavoidable.
∗ This work has been supported by the German Research Foundation (DFG), grant no. NE 837/3-1.
The consequence is that both methods should be used: simulation and real measurements.
Over the last few years we developed RGCP [10], the Reliable Group Communication Protocol, a real-time capable protocol for a single wireless cell that provides atomic broadcasts. During this time we implemented versions for Windows NT [9], RTLinux [15], Linux, and the network simulator ns-2. From this experience we learned several things. First, although we had a formal specification of the protocol, we could not create an implementation that runs on all platforms. This was mainly due to the use of threads in the implementation and other fundamental differences in the underlying APIs. Second, adding new features or experimenting with different parameter settings increases the divergence between the implementations. The use of source code version control systems can reduce this effect, but to avoid it entirely each modification has to be implemented in all versions concurrently, which significantly increases the implementation work. Third and last, as ns-2 does not provide threads, we learned that we could implement the same protocol in a purely event-driven manner, which can serve as common ground on all platforms.
This experience prompted us to rethink the way we implement our network protocols. This is especially important as we currently develop protocols for large-scale wireless networks with up to 1000 nodes. Due to personnel and equipment constraints, most development has to be done using simulation. At the same time we are running tests on real hardware to investigate WLAN-specific effects that are not considered in the simulation. Here, a unified implementation improves the development process, so we developed an abstraction layer called GEA. It also helps in later porting the protocol to various platforms.
The remainder of this paper is structured as follows. Section 2 discusses some related work.
Section 3 presents GEA as a platform for event-driven protocol development and deployment. In section 4 we illustrate the development process with an example protocol and show measurement values comparing different deployed versions. Section 5 finally gives an outlook on our next activities regarding GEA.
2
Related Work
The idea of a unified protocol implementation is not new, and there are already several solutions for this problem. Nsclick [11] is a combination of the popular network simulator ns-2 and the Click modular router [6], a software library that allows routing software to be constructed from basic building blocks. The Click language is used to specify routing protocols, which are then executed on top of ns-2 or the native Click execution environment. The main difference to our approach is the additional abstraction layer, which restricts the user to certain protocol primitives.
Another possible solution is network emulation, or real-time network simulation. Here a simulator models the properties of a real network, and the deployment version of the protocol runs on top of this network. A number of network emulation tools are available that simulate the network at different levels of detail. Tools that do not require modifications to the program under test are Dummynet [13], x-Sim [2], the Hitbox pseudo device [4], and ONE (Ohio Network Emulator) [1], which capture TCP/IP traffic and simulate effects like propagation delays, link bandwidth, and packet drops. A more sophisticated solution is NIST Net [3], which supports a wide range of protocols and various effects that can be observed in a live network. Emulation at a more detailed level is possible using the emulation extension of ns-2 [5]. With our own improvements [7, 8] to ns-2 it is even possible to emulate wireless networks in real time. But as we also showed, this requires a lot of processing power to achieve high accuracy, which strongly restricts the scalability of the emulated model. Pure simulation therefore remains essential for highly precise simulations of large wireless networks.
3
GEA Concepts
The development of a new protocol often starts in a simulation environment. But later the code will reside in a production environment, possibly in the operating system kernel. As explained in the introduction, the code should remain the same when moving from the development to the deployment phase, to minimize development costs and to improve maintainability. To do so, GEA provides a common interface for the protocol code, no matter whether it runs in a simulator, as a user-level application, or as a driver in the operating system kernel (see Figure 1).
Figure 1: System Architecture using GEA
There are two fundamental requirements for this common interface. First, it has to provide all functions needed to implement a communication protocol. Second, it should be as slim as possible, i.e. it should contain only a minimal set of functions, to improve portability and to reduce the overhead introduced by using it. In the next sections we take a closer look at the functions that are needed to implement communication protocols and then at how these functions can be mapped to the functions provided by the different execution platforms.
3.1
Required Functions
As the minimal set of functionality we identified the following items: transmitting data, receiving data, waiting for a specified time, and concurrent waiting for I/O activity with timeouts. Network protocols communicate via data packets. Therefore they need at least some way of sending and receiving data. The communication to the application using the protocol is not considered here because it is not related to the interface to the execution environment. Additionally, for handling packet loss or broken communication links, a timeout mechanism is required. And to implement a timing scheme, a protocol must be able to wait for an arbitrary time period.
3.2
Design and Implementation of GEA
The following subsections discuss how the required functionality can be implemented in a small and efficient manner.
3.2.1
Concurrent Event Handling
The most important point is how the concurrent handling of sending, receiving, and waiting is done. There are several possibilities: threads, polling, or event notification.
Polling is simple to use. The protocol continuously checks whether data is available, whether data can be transmitted, or whether a timeout has been reached. But this consumes all CPU resources. This can be avoided by polling periodically instead of continuously, but then actions are only possible in the polling slots, which leads to poor responsiveness.
Threads could be used to allow concurrent actions inside the protocol software. But we considered this unnecessary, and in fact potentially disadvantageous for our needs. The use of threads requires handling synchronization and protecting critical sections. In a complex system this is hard to manage, and mistakes often lead to critical bugs. These are hard to track down as they depend on the timing behavior of the underlying system.
Our solution is based on event notification. There is one central unit, the event handler. The protocol can register for events such as the reception of a message or the occurrence of a timeout. The event handler notifies the protocol when such an event occurs. We decided to use simple callback functions for this, implemented as C function pointers. The resulting mechanism is efficient, as the protocol code only has to run when there is something to do. Synchronization is not required as only one event is processed at a time. The only drawback is that the callbacks must not take too much time; otherwise the delivery of other events is delayed.
3.2.2
Handles and Callbacks
The central system primitives in GEA are handles. A handle is normally associated with an input or output resource and provides functions for sending and receiving. Handles are also used for event notification. For this purpose the event handler provides a function waitFor(Handle *h, AbsTime t, Event e, void *data), which registers a callback. The parameter e is a function pointer of the form event(Handle *h, AbsTime t, void *data). The parameter t defines a timeout.
To receive data, a handle is created and a callback is registered on it with waitFor. When data arrives on the handle, the callback is executed. The occurrence time and the data pointer specified at registration are passed as parameters. The callback can then retrieve the packet, process it, and possibly register further callbacks. The timeout parameter defines when the handle should stop waiting for data. When the timeout is reached, the callback is called as well, so the protocol has to check the status of the handle to detect whether a timeout happened.
A special case is waiting without an I/O handle. This can be waiting for a self-generated event or for a certain period of time. For the first case GEA provides so-called DependHandles. These can be used like other handles but are triggered internally by calling complied on them. The second case is handled by a pseudo handle of type Blocker. A waitFor on a Blocker always ends with a timeout after the specified time. The callback invoked on this timeout can then execute the delayed commands.
3.2.3
Time Representation
As we use GEA for the development of real-time protocols, the handling of time is very important, because the external communication has to meet timing restrictions. GEA helps to fulfil them by working with absolute time values. Many runtime environments use relative values for time representation. Common are calls like sleep(2), which stops the process execution for 2 seconds from the current time. The critical point for real-time systems is that the current time at the moment of the call is not precisely known. Calling sleep(1) ten times does not result in sleeping exactly 10 seconds, because there is always a delay caused by the processing in between. These delays accumulate and can lead to unintended times. We avoid this by using absolute values for time representation. A data type called AbsTime is used for this. The type Duration represents differences between absolute times.
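The drift argument can be made concrete with a small simulation. The following sketch is illustrative only and not part of GEA: times are plain integer milliseconds (the real AbsTime and Duration types are not shown in this paper), and processingMs is an assumed fixed processing cost per iteration. It compares a loop of relative waits with a loop of waits on absolute deadlines.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for AbsTime and Duration (assumption: plain
// integer milliseconds; GEA's real types are not shown in the paper).
using AbsTimeMs = std::int64_t;
using DurationMs = std::int64_t;

// Relative waiting: each wait starts only after the processing is done,
// so the processing cost is added to every period.
AbsTimeMs runRelative(int rounds, DurationMs period, DurationMs processingMs) {
    AbsTimeMs now = 0;
    for (int i = 0; i < rounds; ++i) {
        now += processingMs;  // work done before the wait is issued
        now += period;        // like sleep(period), counted from "now"
    }
    return now;
}

// Absolute waiting: the next deadline is derived from the previous
// deadline, not from the current time, so processing cost does not add up.
AbsTimeMs runAbsolute(int rounds, DurationMs period, DurationMs processingMs) {
    assert(processingMs <= period);  // otherwise deadlines would be missed
    AbsTimeMs now = 0;
    AbsTimeMs deadline = 0;
    for (int i = 0; i < rounds; ++i) {
        now += processingMs;  // same work as in the relative variant
        deadline += period;   // previous deadline plus one period
        now = deadline;       // wait until the absolute deadline
    }
    return now;
}
```

With a period of 1000 ms and an assumed processing cost of 5 ms, ten relative waits end at 10050 ms, whereas the absolute variant ends at exactly 10000 ms.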
3.3
Mapping to the underlying System
GEA provides an abstraction layer between the protocol and the underlying runtime system. Its functions are mapped to the corresponding functions of the system below. This is done as efficiently as possible to avoid overhead.
3.3.1
POSIX with System Call select
The usual way of implementing event-based services in POSIX systems is the select system call. It multiplexes I/O events by waiting on a set of file descriptors. An optional timeout parameter applies to the whole waiting operation.
The mapping of GEA functions to the POSIX system calls is straightforward. As POSIX systems represent every interface by a file descriptor, the GEA handles are directly mapped to them. The read and write operations on a handle are done by the corresponding system calls. The only point where no direct mapping is possible is the timeout mechanism. GEA maintains a timeout for every handle, but the select system call allows only one global timeout. This problem is solved by keeping all active timeouts sorted and using the time of the first element for the select call. As there are normally only a few handles active and the insertion of a new timeout is an O(log n) operation, the overhead is negligible.
Some POSIX-compatible systems provide system calls with a more efficient interface than select. These could also be used, but select provides the best portability.
3.3.2
Mapping to ns-2
ns-2 is a discrete-event simulator. It therefore already has an event handling mechanism that GEA can use, and only a few additions are necessary. An implicit assumption of GEA is that every instance is bound to a network node. Writing to a network handle corresponds to sending a packet from the associated node. In the deployment version this association is obvious. But ns-2 contains all nodes of the network, so each handle is assigned to a node. A transmission in a callback is implicitly done by the corresponding node. As the processing time of network protocols is not considered in ns-2, there is no overhead from the virtual node's point of view. Furthermore, the use of GEA does not add noticeable overhead to the simulation because the event scheduler of the simulator is used directly. This is done through the C++ interface instead of the Tcl interface, so that the overhead of Tcl code interpretation is avoided.
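Returning to the POSIX mapping of Section 3.3.1, the reduction of per-handle timeouts to the single timeout accepted by select can be sketched as follows. This is a minimal illustration under stated assumptions, not GEA's actual code: the class TimeoutQueue, the function waitReadable, and the use of microsecond integers for absolute time are all hypothetical. Only select, fd_set, FD_ZERO, FD_SET, and timeval are standard POSIX.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sys/select.h>
#include <sys/time.h>

// Hypothetical helper: keeps per-handle deadlines sorted so that the
// earliest one can be handed to select().
class TimeoutQueue {
public:
    using AbsTimeUs = std::int64_t;  // assumption: microseconds since some epoch

    // Insertion into the ordered container costs O(log n).
    void add(AbsTimeUs deadline, int fd) { deadlines_.emplace(deadline, fd); }

    bool empty() const { return deadlines_.empty(); }

    // File descriptor whose timeout expires first.
    int earliestFd() const {
        assert(!empty());
        return deadlines_.begin()->second;
    }

    // Converts the earliest absolute deadline into the relative timeout
    // that select() expects; a deadline already in the past yields zero.
    timeval nextSelectTimeout(AbsTimeUs now) const {
        assert(!empty());
        AbsTimeUs remaining = deadlines_.begin()->first - now;
        if (remaining < 0) remaining = 0;
        timeval tv;
        tv.tv_sec = static_cast<time_t>(remaining / 1000000);
        tv.tv_usec = static_cast<suseconds_t>(remaining % 1000000);
        return tv;
    }

    void popEarliest() {
        assert(!empty());
        deadlines_.erase(deadlines_.begin());
    }

private:
    std::multimap<AbsTimeUs, int> deadlines_;  // sorted by deadline
};

// One select-based wait on a single descriptor. Returns select()'s
// result: >0 if fd is readable, 0 on timeout, -1 on error.
int waitReadable(int fd, const TimeoutQueue &queue, TimeoutQueue::AbsTimeUs now) {
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    timeval tv = queue.nextSelectTimeout(now);
    return select(fd + 1, &readSet, nullptr, nullptr, &tv);
}
```

An ordered container is used here because it gives the O(log n) insertion mentioned in the text; a binary heap would serve equally well.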
3.4
Unified Implementations
To get a truly unified implementation we need a container that holds the whole protocol code so that it can be initialized and run unmodified in the different execution environments. As we use C++, we defined a set of header files as the common interface. The protocol is compiled into a shared library which uses this interface. The implementation of the interface is specific to the underlying system. By using a dynamic shared library loader we bind the protocol to the specific implementation, which is either an extended ns-2 or a simple program for POSIX systems. Especially for ns-2 this has the advantage that we do not need to rebuild the simulator every time we change the protocol code, which can take several tens of minutes on a modern PC. At least for ns-2 and POSIX systems we can use the same binary for simulation and the real system.
As an example, the start of a protocol instance in the simulator looks like:
$node_(0) gea_start ./multihop.so -l
The counterpart in POSIX is:
gea_start ./multihop.so -l
Loading protocol instances into the Linux or RTLinux kernel requires the use of kernel modules. These are very similar to shared libraries and so will only require recompilation of the unmodified protocol source code.
4
Example Protocol Implementation
To illustrate the usage of GEA to implement, test and deploy a protocol, we will give a simple example. Our example routing protocol will use multi-hop source routing, i.e. the initial sender (master) specifies a list of IP addresses and stores them in a UDP packet together with a hop counter. Every routing node (client) that receives a packet increases the hop counter and then forwards the packet to the next address in the list. The last address in the list is the address of the master. This way we can measure the round trip times if the packet does not get lost on the way.
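The forwarding rule of this example protocol can be sketched in a few lines. The paper does not give the packet layout, so the struct RoutePacket, its field names, and the function forward below are assumptions made for illustration; in particular, addresses are stored as plain 32-bit integers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of the round-trip packet (the wire format
// is not specified in the paper; names and types are assumptions).
struct RoutePacket {
    std::vector<std::uint32_t> route;  // IPv4 addresses, master's address last
    std::uint32_t hopCount = 0;        // number of nodes visited so far
};

// What a node does on reception: increase the hop counter and look up
// the next address in the list. Returns false when the list is exhausted,
// i.e. the packet has arrived back at the master.
bool forward(RoutePacket &pkt, std::uint32_t &nextAddr) {
    assert(!pkt.route.empty());
    ++pkt.hopCount;
    if (pkt.hopCount >= pkt.route.size()) return false;
    nextAddr = pkt.route[pkt.hopCount];
    return true;
}
```

In this sketch the master sends the packet to route[0] with a hop count of zero; each client then calls forward and transmits to the returned address, and the master recognizes a completed round trip when forward returns false.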
4.1
Implementation Details
Figures 2 and 3 show excerpts from the implementations of the client and the master class. The excerpts show only the parts of the code that are relevant for the event processing. The actual packet processing and the initialization are left out for better readability.
At startup the client creates two UDP handles: hRecv for receiving and hSend for sending packets. Then it starts to wait on hRecv until a packet arrives or a timeout occurs. Both events trigger the method recv_event. In this method the client checks (line 3) whether it received a packet on the handle. If so, it reads (line 4) and processes the packet. Then it schedules (line 7) a send event (method send_event) to be executed when the handle hSend is ready to send. In any case this method schedules (line 10) an event on itself to be executed when the next packet arrives or a timeout occurs. This guarantees the continuous operation of the client.
client.cc
 1  void Client::recv_event(Handle *hRecv, AbsTime t, void *) {
 2      /* ... */
 3      if (hRecv->status == Handle::Ready) {
 4          hRecv->read(packet, maxPacketSize);
 5          /* ... */
 6          /* prepare transmission */
 7          GEA.waitFor(hSend, t + SendTimeout, send_event, 0);
 8      }
 9      /* reschedule recv_event */
10      GEA.waitFor(hRecv, t + RecvTimeout, recv_event, 0);
11      /* ... */
12  }
13
14  void Client::send_event(Handle *hSend, AbsTime t, void *) {
15      /* ... */
16      hSend->write(packet, packetSize);
17  }
Figure 2: Client Code
The master works in a similar way. But in contrast to the client, which works aperiodically, it has a periodic element. The master uses three handles: hRecv and hSend for the packet transmission, and the special handle blocker to create a periodic loop. This loop (method periodic_event) does two things. First (line 3) it schedules the next activation of itself with a fixed time period. Second (line 5) it schedules the creation and transmission of the next round-trip packet. This event (method send_event) creates a new round-trip packet and immediately sends it to the network. It schedules no new events.
The method recv_event is triggered when the master receives a packet. This happens when a packet completes its round trip through the network. The method then calculates the round-trip time (RTT) and logs it for the statistics. Finally it schedules an event to be triggered again when the next packet arrives or a timeout occurs.
The implementations of the client and the master always use timeouts when waiting on the various handles. In this example the timeout is actually needed only to create the periodic loop in the master. But GEA requires the use of timeouts to ensure that no application can block indefinitely. If there are no pending events left in the GEA event queue, the application stops.
master.cc
 1  void Master::periodic_event(Handle *blocker, AbsTime t, void *) {
 2      /* schedule next period */
 3      GEA.waitFor(blocker, t + periodTime, periodic_event, 0);
 4      /* start transmission */
 5      GEA.waitFor(hSend, t + SendTimeout, send_event, 0);
 6  }
 7
 8  void Master::send_event(Handle *hSend, AbsTime t, void *) {
 9      /* create new packet */
10      hSend->write(packet, packetSize);
11  }
12
13  void Master::recv_event(Handle *hRecv, AbsTime t, void *) {
14      if (hRecv->status == Handle::Ready) {
15          hRecv->read(packet, maxPacketSize);
16          /* calculate RTT, ... */
17      }
18      GEA.waitFor(hRecv, t + RecvTimeout, recv_event, 0);
19  }
Figure 3: Master Code
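The callback pattern in the client and master code relies on an event handler that processes one event at a time. As a rough sketch of how such a handler can work, the following self-contained example runs callbacks against a simulated clock. It is not the GEA source: the class EventHandler, the dataPending flag, the Ticker struct, and the integer AbsTime are assumptions made for illustration. Only the waitFor parameter list and the callback form follow Section 3.2.2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using AbsTime = long;  // assumption: simulated time in milliseconds

struct Handle {
    enum Status { Undefined, Ready, Timeout };
    Status status = Undefined;
    bool dataPending = false;  // set by the (simulated) environment
};

using Event = void (*)(Handle *h, AbsTime t, void *data);

class EventHandler {
public:
    // Register callback e to run when h has data or when time t is reached.
    void waitFor(Handle *h, AbsTime t, Event e, void *data) {
        pending_.push_back({h, t, e, data});
    }

    // Process events one at a time until none are registered or the
    // next timeout lies beyond 'end'.
    void run(AbsTime end) {
        while (!pending_.empty()) {
            std::size_t pick = pending_.size();
            // Prefer a handle with pending data ...
            for (std::size_t i = 0; i < pending_.size(); ++i)
                if (pending_[i].h->dataPending) { pick = i; break; }
            const bool ready = pick != pending_.size();
            if (!ready) {
                // ... otherwise take the earliest timeout.
                pick = 0;
                for (std::size_t i = 1; i < pending_.size(); ++i)
                    if (pending_[i].timeout < pending_[pick].timeout) pick = i;
                if (pending_[pick].timeout > end) return;
                if (pending_[pick].timeout > now_) now_ = pending_[pick].timeout;
            }
            Entry entry = pending_[pick];
            pending_.erase(pending_.begin() + static_cast<std::ptrdiff_t>(pick));
            entry.h->status = ready ? Handle::Ready : Handle::Timeout;
            entry.h->dataPending = false;
            entry.e(entry.h, now_, entry.data);  // callback may call waitFor
        }
    }

    AbsTime now() const { return now_; }

private:
    struct Entry { Handle *h; AbsTime timeout; Event e; void *data; };
    std::vector<Entry> pending_;
    AbsTime now_ = 0;
};

// Periodic callback that reschedules itself on a handle that never
// receives data, in the spirit of periodic_event and the Blocker handle.
struct Ticker { EventHandler *gea; int ticks; AbsTime period; };

void tick(Handle *blocker, AbsTime t, void *data) {
    Ticker *self = static_cast<Ticker *>(data);
    ++self->ticks;
    self->gea->waitFor(blocker, t + self->period, tick, data);
}
```

Registering tick once with a timeout of 100 and a period of 100, and then calling run(1000), executes the callback ten times. Because only one callback runs at a time, no locking is needed, which is the property the event-driven design is meant to provide.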
4.2
Testing
To test our implementations we set up a group of nine PCs running Linux with kernel 2.4.25. All PCs were equipped with IEEE 802.11b WLAN cards communicating at 11 Mb/s in ad-hoc mode. To test the implementations in a disturbed environment, we set up a parallel WLAN cell on the same channel with two hosts communicating at a high packet rate. In the ns-2 simulation we created a wireless scenario matching the physical environment. To adjust the ns-2 error model we had to measure the raw packet loss rate. To do so we used broadcast ICMP packets (ping) as described in [7]. The loss rates varied considerably, with a mean value of about 15%. We used this value as the loss probability in a uniform error model in the simulation.
To run the actual experiment we set up one node as master and all other nodes as clients. The master periodically transmitted a multi-hop packet that contained the list of all clients plus the address of the master as the last destination. The master calculated the round-trip time for all packets that completed the round trip. Additionally, it detected all lost packets. We ran this experiment 1000 times in the simulation and on the real system.
             min [ms]   max [ms]   average [ms]
Linux           29.95     499.38          66.99
simulation      46.08     139.24          68.41
Table 1: Round trip times of multihop packets
The results of the test are shown in Table 1. The round-trip times on the Linux system vary more than in the simulation, but the average values are similar. There are several reasons for this. First and most important, the ns-2 error model assumes a uniform distribution of the transmission errors, whereas in the live network we expect a non-uniform error distribution. But we were only able to measure the average error rate there, which results in a comparable number of retransmissions in both environments and explains the similar average transmission times. In the live network, bursts of transmission errors and short error-free periods could cause the higher variance of the measured times. Second, the scheduler of the Linux system can introduce delays even though no other applications were running on the system. Last, the implementation of the simulated WLAN hardware and MAC layer has to be considered as a reason for the different behavior. We did not investigate this any further because our goal was not to evaluate the WLAN simulation of ns-2 but rather the performance of our example protocol in both environments. Here the protocol performed comparably in both, which shows the effectiveness of GEA.
5
Conclusion and Outlook
In this paper we presented GEA, an event-based interface for implementing network protocols that can then be simulated in ns-2 and deployed in a production environment using a single, unmodified source. We explained the advantages of such an implementation with regard to maintenance and development costs. We also showed that these advantages come with only negligible overhead, because GEA provides an efficient mapping from a uniform interface to different runtime systems. Finally, we presented an example protocol to demonstrate the effectiveness of our approach.
Further work will concentrate on porting to other execution environments, improving the simulation, and making GEA available to legacy applications.
The main drawback of the simulation environment is that the processing time of events is not considered. On a real system, execution times that are too long delay later events. This cannot happen in the simulation. For a more realistic behavior of the simulated network this effect should be added, which requires a monitoring mechanism for the event processing times.
A second point is the use of legacy applications with GEA. At the moment only programs written for GEA can be used. A normal application such as an FTP server cannot be used because it relies on the normal interface functions provided by the operating system. Redirecting these to the GEA system should be possible, for example by trapping system calls or modifying the system libraries.
For using GEA protocols on different platforms, the interface must be ported to other systems. One such platform is RTLinux [14], which provides hard real-time capabilities. Alternatively, GEA will be ported to the internal Linux kernel interfaces, so that protocols developed with GEA can run in kernel context.
References
[1] Allman, M., and Ostermann, S. One: The Ohio Network Emulator. Tech. rep., Ohio University, 1996.
[2] Brakmo, L. S., and Peterson, L. L. Experiences with Network Simulation. In Measurement and Modeling of Computer Systems (1996), pp. 80–90.
[3] Carson, M., and Santay, D. NIST Net: a Linux-based network emulation tool. SIGCOMM Comput. Commun. Rev. 33, 3 (2003), 111–126.
[4] Danzig, P., Liu, Z., and Yan, L. An Evaluation of TCP Vegas by Live Emulation. In Proceedings of ACM SIGMetrics '95 (1995).
[5] Fall, K. Network Emulation in the VINT/NS Simulator. Proceedings of the fourth IEEE Symposium on Computers and Communications (1999).
[6] Kohler, E., Morris, R., Chen, B., Jannotti, J., and Kaashoek, M. F. The Click modular router. ACM Transactions on Computer Systems 18, 3 (August 2000), 263–297.
[7] Mahrenholz, D., and Ivanov, S. Real-Time Network Emulation with ns-2. In Proceedings of the 8-th IEEE International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2004) (Budapest, Hungary, October 2004).
[8] Mahrenholz, D., and Ivanov, S. Adjusting the ns-2 Emulation Mode to a Live Network. Informatik aktuell KiVS 2005 (2005), 205–217.
[9] Mock, M., Nett, E., and Schemmer, S. Efficient Reliable Real-Time Group Communication for Wireless Local Area Networks. In 3rd European Dependable Computing Conference (Prague, Czech Republic, 1999).
[10] Nett, E., and Schemmer, S. Reliable Real-Time Communication in Cooperative Mobile Applications. IEEE Transactions on Computers 52 (2003), 166–180.
[11] Neufeld, M., Jain, A., and Grunwald, D. Nsclick: Bridging Network Simulation and Deployment. In Proceedings of the 5th ACM International Workshop on Modeling Analysis and Simulation of Wireless and Mobile Systems (2002), ACM Press, pp. 74–81.
[12] The Network Simulator ns-2. http://www.isi.edu/nsnam/ns/.
[13] Rizzo, L. Dummynet: a simple approach to the evaluation of network protocols. ACM Computer Communication Review 27, 1 (1997), 31–41.
[14] RTLinux homepage. http://www.fsmlabs.com/.
[15] Vandersee, S. Effiziente Realisierung in SDL spezifizierter Mikroprotokoll-Architekturen. Diploma thesis, Institute for Distributed Systems, University of Magdeburg, 2004.
[16] Zeng, X., Bagrodia, R., and Gerla, M. GloMoSim: A Library for Parallel Simulation of Large-Scale Wireless Networks. In Workshop on Parallel and Distributed Simulation (1998), pp. 154–161.