Autonomic and Decentralized Management of Wireless Access Networks

S. Schuetz, K. Zimmermann, G. Nunzi, S. Schmid, M. Brunner
Abstract—In this article, we apply autonomic and distributed management principles to wireless access networks. Most interesting is the application of autonomic properties and behaviors, including adaptive, aware, and automatic operation, in a decentralized setting. In particular, we present a generic and autonomic management architecture for decentralized management of wireless access networks, such as GERAN/UTRAN, E-UTRAN, WiMAX or WLAN. For evaluation purposes, we apply this architecture to the management of a Wireless LAN network, and we evaluate the architecture and some of the autonomic management functions through simulations, a prototype implementation and the setup of a real-world testbed for experimentation with the proposed management approach.

Index Terms—Autonomic Management, Decentralized Management, Wireless Networks, Radio Access Networks
Manuscript received September 15, 2006; revised March 16, 2007 and August 20, 2007; accepted August 21, 2007. The associate editors coordinating the review of this paper and approving it for publication were J. P. Martin-Flatin and A. Keller. S. Schuetz, G. Nunzi, S. Schmid, and M. Brunner are with NEC Laboratories Europe. K. Zimmerman is with Accenture. Digital Object Identifier: 10.1109/TNSM.2007.070905

I. Introduction

Installation and configuration of large wireless networks in heterogeneous environments are challenging, time-consuming and error-prone tasks, even for trained people. Once deployed, such wireless networks require continuous intervention to provide the operational environment required for services to run smoothly, to recover from faults, and to maximize the overall performance of the network. This is particularly difficult because wireless environments are typically very dynamic. First, the number of mobile devices as well as their mobility and traffic patterns change constantly in a wireless network. Second, wireless networks often use shared, unlicensed spectrum, as is the case for IEEE 802.11a/b/g base stations. As a consequence, different wireless access networks share this unlicensed spectrum in an uncoordinated fashion, which can generate additional disturbance on top of outside interference caused by other electronic equipment. To provision effective and efficient network services under these demanding conditions, continuous network management that adapts to environmental changes both reactively and proactively is important. Manual network management, whereby a human operator takes care of the continuous observation, operation, maintenance and optimization, is very costly. Since such high operational costs are not affordable for wireless access providers, there is a strong need for automated management functionality. A few wireless network technologies already include some adequate management mechanisms. However, even though such functions exist, they are typically limited to managing
physical or link-specific characteristics and do not cover management of higher-layer internetworking operation. For example, existing IEEE 802.11a/b/g WLAN base stations can automatically select the radio channel, transmission power and link speed; however, they cannot intelligently configure link-specific characteristics that require coordination between neighboring base stations beyond what they can immediately observe themselves. Moreover, they are not able to autonomously configure higher-layer settings such as routed IP connectivity.

This article describes an autonomic approach to the management of wireless access networks. The advantage of an autonomic solution is that new network elements (e.g., base stations) joining an existing wireless network integrate themselves seamlessly; the rest of the system adapts to their emergence dynamically. This enables wireless networks to automatically configure themselves in accordance with high-level rules or policies that specify what is desired, not how it is accomplished. These rules or policies define the purpose of the network, its overall goals or business-level objectives.

A novel idea of this article is the application of autonomic management paradigms in a decentralized manner. Each component must be able to operate fully autonomously, in a stand-alone fashion. When several components detect or sense each other's presence, they start to coordinate their management actions to increase the efficiency and effectiveness of the overall system. Existing approaches to the management of wireless networks, even if they expose some autonomic properties, are typically centralized. Traditionally, a central network controller periodically analyzes the available information and computes a global configuration for the whole network. It pushes this configuration out to the individual base stations in a piecemeal fashion; alternatively, these devices pull their respective configurations from the controller. However, such a centralized approach has several disadvantages. First, it creates a central point of failure, which renders the whole system unusable when the controller fails. Second, a central controller limits scalability, because it represents a bottleneck for processing and communication functions, especially in environments that require frequent configuration changes. Third, it complicates the system, because it introduces additional infrastructure, i.e., the central controller.

This article presents a decentralized approach for the autonomic management of collaborating network elements. The individual network elements aggregate and share network information. They implement a distributed algorithm that uses the shared information to compute a local configuration at each network element which is consistent with the overall network purpose and goal. More specifically, we apply this scheme to the management of wireless access networks, whereby the network elements are base stations. An important feature of the proposed system is the wireless monitoring component, which provides the necessary feedback for the autonomic logic to take appropriate management decisions.
FIGURE 1. System overview.
Section II of this article presents related work. Section III defines the underlying autonomic principles that guide the design of the wireless management system. Section IV describes the basic functionality and operation of the autonomic and decentralized management architecture and its application to the management of wireless access networks. Section V presents the results of the simulations conducted and the experimentation with our prototype system. Section VI discusses the results and lists the experiences gained through this work, before Section VII finally concludes this article.
II. Related Work

Management of wireless networks is possible through centralized, distributed or hybrid solutions. Whereas centralized systems use a single master device to configure the base stations, decentralized, distributed solutions avoid such a single point of failure and collaboratively implement a fully distributed management solution. With any of the three approaches, the challenge is that all wireless base stations must arrive at a consistent, system-wide configuration. This section describes existing approaches for all three paradigms and briefly discusses more recent developments that also follow an autonomic approach.

Several companies provide centralized management solutions for groups of base stations [1, 2]. The majority of these systems implement link-layer "wireless switches" that connect base stations, acting as wireless bridges, to a switched wired network. The link-layer switch implements the management component. This centralized, link-layer approach offers traffic and channel management, as well as policy, bandwidth and access control. However, such centralized link-layer solutions also have drawbacks. Link-layer broadcast domains cannot grow arbitrarily due to the scalability issues associated with broadcast traffic. Additionally, the topology of the wired network may not allow a direct connection of the management system to the base stations. Solutions that operate at the network layer overcome this shortcoming.
Decentralized management solutions are popular for configuring mobile ad hoc networks (MANETs) [3]. These management systems typically focus on the challenging task of enabling peer-to-peer communication in highly dynamic, mobile environments [4]. Because of their nature, i.e., every base station decides based on its local scope [5] rather than on instructions from a central manager, they are closely related to the autonomic approach presented in this article. Ongoing research efforts [6, 7] attempt to design self-configuring solutions for MANETs. However, in contrast to those approaches, the autonomic solution presented in this article focuses on the configuration of stationary wireless access networks for mobile clients with the primary goal of improving efficiency and performance.

With respect to decentralized management of infrastructure-based wireless networks, related work [8, 9] focuses on the auto-configuration of base stations with the goal of achieving the best coverage in a given geographical area. Early results suggest using transition rules similar to cellular automata to change the local configuration of a base station when it receives the current states of its neighbors. Although these proposed algorithms can support some of the specific applications that the autonomic approach also implements, such as regulating transmission power, they are not a platform for arbitrary management functions. In contrast, the focus of the work presented herein is to develop an autonomic and decentralized management platform that can support many types of management functions, not limited to the configuration of radio parameters.

Hybrid approaches to wireless network management, such as the Integrated Access Point of Trapeze Networks [10], push some functionality from a central system into the base stations, which are therefore slightly more complex than the simple wireless bridges of centralized approaches. Although hybrid systems improve scalability, they do not completely address the drawbacks of centralized systems; e.g., they still have central points of failure.
FIGURE 2. Node overview and autonomic management framework.
III. Underlying Autonomic Principles

The high-level properties that every autonomic system, and thus also the proposed autonomic network management system, should exhibit in order to self-govern its operation and to fulfill its purpose or goal are automatic, aware and adaptive operation [11].

Automatic operation: An autonomic system must be able to bootstrap itself when it starts and to configure its basic functions according to the status and context of its environment, without involving a user or system administrator. Moreover, this also implies that the system must be able to control its internal resources and functions on its own. This in turn requires an autonomic system to anticipate the resources needed to perform its tasks and to acquire and use these resources without human involvement. For example, a new base station must integrate and configure itself into an existing wireless network without the involvement of a human administrator. Therefore, the base station must automatically configure its frequency, signal strength, network addresses and routing.

Aware operation: To allow an autonomic system to configure and reconfigure itself in a dynamic environment, it is important for the system to be self-aware. The system needs detailed knowledge of its components, resources and capabilities, its current context and status, as well as its relation to other systems that are part of its environment, in order to make informed management decisions. As a result, a key requirement of an autonomic system is a monitoring mechanism that provides the necessary feedback to its control and management logic. Continuous monitoring is necessary to identify whether the system meets its purpose. The feedback information forms the basis for adaptation, self-optimization and re-configuration. In addition, monitoring is also important to identify anomalies or erroneous operation in the system and hence provides important input for safety and security mechanisms. Finally, for economic reasons, autonomic systems must also be able to monitor the services obtained from their suppliers and the services offered to their consumers in order to control or adhere to the agreed level of service.

Adaptive operation: The awareness of an autonomic system allows the system to adapt to changes in the operational context and the environment (guided by its current policies and purpose). Because of this, management in an autonomic system never finishes; the autonomic system continuously adapts by monitoring its components and fine-tuning its operation. For a system to be able to adapt, it must be able to control its internal operations, i.e., its configuration, state and functions. Adaptation allows the system to cope with temporal and spatial changes in its operational context, either in the short term (e.g., in case of exceptional conditions such as malicious attacks or faults) or in the long term (e.g., for the purpose of optimization).
IV. Autonomic and Decentralized Management Architecture for Wireless Networks

This section describes an autonomic and decentralized management system that builds on the autonomic principles defined in the previous section. It defines the basic system elements, specifies the target management functions and the autonomic monitoring system, and finally introduces the developed self-configuration and self-management approach. We focus on the application of the system to wireless network management, but believe that the same architecture, with minimal adaptation, also works for non-wireless networks.
A. Basic Components and Assumptions

As illustrated in Fig. 1, the autonomic and decentralized management architecture consists of a set of serving network elements running the decentralized and autonomic management framework. The autonomic management system is distributed across all serving network elements; in the application scenario discussed here, the serving elements are the wireless network base stations. Base stations exchange information to maintain an updated view of the current status of their neighbors. In this way, autonomic functions can make local decisions based on knowledge of their network context. This process is referred to as "synchronization with neighbors" in Fig. 1.

The autonomic management node architecture illustrated in Fig. 2 is integrated into the network elements. It contains a local store of management data that is synchronized with the local control software and hardware (or can be read on demand) over an application programming interface (API). This API is essentially an abstraction layer towards the physical resources of the wireless base stations and is already present in many advanced platforms today. Moreover, the decentralized management framework includes a dedicated component for synchronizing management information with neighboring network elements. A network element can have different neighbor relationships depending on the application. For example, in Fig. 1 one relationship is based on potential radio coverage overlap, while another is built among stations that implement a certain feature.

A base station typically has one or several radio interfaces and at least one fixed-line uplink interface. In many cases the uplink is also wireless, but emulates a fixed-line interface such as ATM over wireless. Due to the higher reliability of the uplink interface, we assume that this interface is typically used for communication at the management plane.
B. Management Functions as Autonomic Applications

The primary task of the autonomic and decentralized management system is to coordinate and control the element functions and operation. When applied to a wireless network, this includes the coordination and control of radio properties, such as frequency assignment and transmission power, among a group of neighboring base stations, as well as the implementation of system-wide functions, such as load balancing. By exchanging utilization information with neighboring network elements, base stations can distribute load by increasing or decreasing their transmission power. A fully loaded base station, for example, can push clients at the edge of its coverage area off to other base stations by lowering its transmission power. An integrated sensing or monitoring sub-system provides the necessary feedback information to the autonomic applications to enable adaptation to changes in the operational environment and context.

A second task of the autonomic management system is self-protection of the network elements and thus of the overall network. Self-protection also relies on the integrated sensing or monitoring component. For example, traffic monitoring and analysis can be used to detect potential security threats and inform the management system, which in turn can take the appropriate actions to protect the network element. Protection from rogue network elements and MAC address spoofing are examples of attacks that can be detected and counteracted in this way in a WLAN environment. The management system blacklists such malicious nodes and disseminates their presence throughout the system, warning the overall wireless network (this blacklisting step is sketched at the end of this subsection).

It is important to note that the proposed management system is a platform for autonomic and decentralized management that can support many other management functions. The platform offers common functionality, such as information exchange, transactional semantics and security functions, on top of which many different management capabilities can be provided.
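To make the self-protection step concrete, the following C sketch shows how a detected rogue station could be blacklisted locally and flagged for network-wide dissemination. It is a minimal illustration only: the data structure and the mgmt_db_put/sync_mark_global hooks are hypothetical placeholders, not the prototype's actual API.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_BLACKLIST 64

struct blacklist_entry {
    unsigned char mac[6];   /* offending station */
    time_t detected;        /* when it was detected locally */
};

static struct blacklist_entry blacklist[MAX_BLACKLIST];
static int blacklist_len = 0;

/* Placeholder hooks; in the prototype these roles are played by the
 * information-management and synchronization modules. */
static void mgmt_db_put(const char *key, const void *val, size_t len)
{
    (void)val; (void)len;
    printf("store %s in local management database\n", key);
}

static void sync_mark_global(const char *key)
{
    printf("flag %s for epidemic dissemination\n", key);
}

/* Called by the monitoring sub-system when a rogue station is detected. */
static void report_rogue(const unsigned char mac[6])
{
    char key[40];

    if (blacklist_len >= MAX_BLACKLIST)
        return; /* table full; a real system would age out old entries */

    memcpy(blacklist[blacklist_len].mac, mac, 6);
    blacklist[blacklist_len].detected = time(NULL);

    snprintf(key, sizeof(key), "blacklist/%02x:%02x:%02x:%02x:%02x:%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    mgmt_db_put(key, &blacklist[blacklist_len], sizeof(blacklist[0]));
    sync_mark_global(key);
    blacklist_len++;
}

int main(void)
{
    const unsigned char rogue[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
    report_rogue(rogue);
    return 0;
}
```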
C. Autonomic Support Functions

The autonomic management system requires a set of supporting functions as part of the framework. First of all, neighborhood management is a process that discovers and maintains a list of neighboring network elements. An autonomic element can define different neighbor relationships with other nodes; each application running on the autonomic element is responsible for building such relationships. For example, to define the neighbor relationship in terms of wireless coverage, the system needs to find neighboring base stations with overlapping cell areas. Such a list of neighbors is also supported in the management framework of WiMAX [28], where it can be used for handover, but the list needs to be populated through the management interface (as defined by IEEE 802.16g). Delegating this process to an autonomic module would clearly cut the costs associated with a manual configuration of the neighbor list (a possible neighbor-table structure is sketched at the end of this subsection).

Second, the policy system is in charge of guiding the autonomic applications by imposing rules on them and configuring thresholds or other constraints for the autonomic behavior. Note that the policy system here does not directly interact with the networking part of the device, as usually proposed in the literature [25-27].

Finally, security is a major concern. Again, depending on the deployment scenario or the application, different mechanisms are required. For certain scenarios, co-operative trust systems, which slowly build up trust between individual entities in an autonomic fashion based on the perceived behavior of other entities, can be used. In such cases, trust is established over a long period in which a neighbor proves to function correctly and does not try to cheat. In commercially sensitive deployments, trust will be established through configuration of the appropriate keying material and security mechanisms when the equipment is installed.
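The following C sketch illustrates how such a per-application neighbor table could be maintained; the types, relationship names and timeout value are illustrative assumptions, not part of the prototype.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_NEIGHBORS 32

/* Each autonomic application maintains its own relationship type. */
enum neighbor_relation {
    REL_RADIO_OVERLAP,   /* overlapping radio coverage (e.g., channel allocation) */
    REL_FEATURE_PEER     /* peers implementing a common feature (e.g., load balancing) */
};

struct neighbor {
    unsigned char mac[6];
    enum neighbor_relation relation;
    time_t last_seen;    /* refreshed by scans or synchronization messages */
};

static struct neighbor table[MAX_NEIGHBORS];
static int table_len = 0;

/* Insert or refresh a neighbor entry for a given relationship. */
static void neighbor_update(const unsigned char mac[6], enum neighbor_relation rel)
{
    int i;

    for (i = 0; i < table_len; i++) {
        if (table[i].relation == rel && memcmp(table[i].mac, mac, 6) == 0) {
            table[i].last_seen = time(NULL);
            return;
        }
    }
    if (table_len < MAX_NEIGHBORS) {
        memcpy(table[table_len].mac, mac, 6);
        table[table_len].relation = rel;
        table[table_len].last_seen = time(NULL);
        table_len++;
    }
}

/* Drop neighbors that have not been seen for 'timeout' seconds. */
static void neighbor_expire(int timeout)
{
    int i = 0;
    time_t now = time(NULL);

    while (i < table_len) {
        if (now - table[i].last_seen > timeout)
            table[i] = table[--table_len];   /* overwrite with the last entry */
        else
            i++;
    }
}

int main(void)
{
    const unsigned char ap[6] = { 0x00, 0x1a, 0x2b, 0x3c, 0x4d, 0x5e };
    neighbor_update(ap, REL_RADIO_OVERLAP);
    neighbor_expire(300);
    printf("neighbors: %d\n", table_len);
    return 0;
}
```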
D. Monitoring and Remote Control of Autonomic and Decentralized Processes

An autonomic network element is able to change some of its parameters autonomously according to changes in the environment (e.g., based on a change of network conditions or on information delivered by neighbor nodes). A network of such autonomic elements promises increased scalability, low installation costs and better performance. However, such solutions introduce new possible causes of failure due to the inherent complexity of a fully distributed configuration algorithm: common problems are configuration loops, oscillations of information changes and network traffic overload. The lack of a sophisticated monitoring system for autonomic processes and of remote control functions is expected to cause serious doubts for operators considering the adoption of autonomic solutions in existing networks. Especially in the beginning, when autonomic systems are first deployed, it is crucial for operators to be able to observe the behavior of the autonomic system and to interfere with the autonomic process until they have developed trust in the new technology. For this purpose we propose that an autonomic node should maintain a north-bound interface towards a central monitoring application. This interface should be regarded only as an optional element; its use is further discussed in paragraph G.

In general, monitoring of the autonomic process can be accomplished by observing and filtering the local and neighbor synchronization processes (see Fig. 3). The monitoring process should allow tracing of the actual changes in the configuration (i.e., inform the operator about the exact values/parameters that have been changed) or just the frequency of parameter changes and/or samples of the configuration state. The filtering mechanism should allow the operator to block the synchronization of some values/parameters in order to overrule the autonomic process (e.g., to stabilize the autonomic behavior) and to overwrite the synchronization of particular settings through manual configuration of the value/parameter.
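A minimal sketch of such a per-parameter filter is shown below, assuming simple integer-valued parameters; the enum, structure and function names are illustrative and not the prototype's actual interface.

```c
#include <stdio.h>
#include <string.h>

/* Per-parameter filter state, settable by the operator through the
 * (optional) north-bound interface. */
enum filter_mode {
    FILTER_PASS,      /* accept autonomically synchronized values */
    FILTER_BLOCK,     /* ignore synchronized updates (freeze the parameter) */
    FILTER_OVERRIDE   /* replace synchronized updates with a manual value */
};

struct param_filter {
    const char *name;
    enum filter_mode mode;
    int manual_value;
};

static struct param_filter filters[] = {
    { "tx_power", FILTER_PASS,     0 },
    { "channel",  FILTER_OVERRIDE, 6 },   /* operator pins the channel to 6 */
};

/* Apply the filter to an incoming synchronized value; returns the value the
 * local configuration should actually adopt and reports the decision so the
 * central monitor can trace configuration changes. */
static int filter_apply(const char *name, int synced_value, int current_value)
{
    size_t i;

    for (i = 0; i < sizeof(filters) / sizeof(filters[0]); i++) {
        if (strcmp(filters[i].name, name) != 0)
            continue;
        switch (filters[i].mode) {
        case FILTER_BLOCK:
            printf("monitor: %s update %d blocked\n", name, synced_value);
            return current_value;
        case FILTER_OVERRIDE:
            printf("monitor: %s forced to %d\n", name, filters[i].manual_value);
            return filters[i].manual_value;
        case FILTER_PASS:
            break;
        }
    }
    printf("monitor: %s changes from %d to %d\n", name, current_value, synced_value);
    return synced_value;
}

int main(void)
{
    int channel = filter_apply("channel", 11, 1);   /* stays pinned at 6 */
    int power   = filter_apply("tx_power", 30, 50); /* adopted: 30 */
    printf("channel=%d tx_power=%d\n", channel, power);
    return 0;
}
```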
FIGURE 3. Monitoring of the decentralized autonomic management process.
In the remainder of this section we apply the proposed autonomic management system to a wireless network based on IEEE 802.11a/b/g Wireless LAN base stations. Note, however, that the same algorithms can be adopted for the management of other radio networks, such as 3GPP UTRAN or evolved UTRAN [20].
E. Monitoring of the Wireless Network for Self-Awareness

The basic functionality of a wireless monitoring system is the ability to capture and analyze network traffic, e.g., to identify interference or security attacks, and to provide the results to the management system. This feedback forms the basis for the autonomic behavior (self-configuration and self-management) of the system. Monitoring of a deployed wireless network helps to detect many causes of problems. For example, inter-channel and cross-channel radio interference can significantly decrease the effective data rate of the network. Monitoring of traffic helps to detect such interference and can hence trigger re-configuration of the wireless network. Traffic measurements can also help to secure and protect wireless networks. For example, analysis of the measurements can detect intrusion attempts and verify that friendly networks are sufficiently protected, i.e., access is authenticated and/or data is encrypted.

In addition, the monitoring system can also be used for the self-organization of the distributed management plane, for example to sense the neighboring base stations in the same radio coverage area. One particular challenge for the self-organization of the distributed management plane is the detection of neighboring base stations (with overlapping coverage areas) that are unable to sense each other. Consequently, their configurations will not be coordinated, leading to an inconsistent overall network configuration. Such base stations should be aware that they are neighbors and coordinate their configurations. Gathering and analysis of feedback information can address such overlap problems. For example, if clients periodically inform the network of other base stations within their radio range (which is already done in 3G networks, but not in WLAN), the management system can update the neighbor relation when a client enters an overlap area, eliminating or at least significantly reducing the overlap problem. Figure 4 illustrates this feedback process. Moreover, the direct feedback from the monitoring system enables detection of interference or spotty coverage, can help to identify rogue base stations, and can aid the location tracking of clients.

The autonomic management system presented in this article uses monitors that provide the results of their continuous measurement efforts as feedback to the autonomic control process by storing the information in the local database. However, for some tasks, like detecting neighbors in a WLAN, a base station needs to scan the environment across all radio frequencies. Since dedicated measurement nodes are expensive, the measurement typically needs to be done on an operational system. If multiple wireless interfaces are available, one interface can periodically be switched to monitoring mode to scan the different frequency bands. When identifying a potential problem, the system can focus its monitoring efforts on the detected occurrence to track the problem at hand. In many cases a base station only has operational interfaces, but these can still be used in times of no or low activity.
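As a simple illustration of the last point, the following C sketch shows one way to decide when an operational interface may briefly be diverted for a scan; the load metric and thresholds are assumptions for illustration only.

```c
#include <stdio.h>
#include <time.h>

/* Decide whether the single operational radio can be briefly switched to
 * monitoring mode. Thresholds and the traffic metric are illustrative. */
struct scan_policy {
    double max_load;          /* scan only if channel utilization is below this */
    int    min_interval_sec;  /* do not scan more often than this */
    time_t last_scan;
};

static int scan_allowed(struct scan_policy *p, double current_load, time_t now)
{
    if (current_load > p->max_load)
        return 0;                       /* radio is busy serving clients */
    if (now - p->last_scan < p->min_interval_sec)
        return 0;                       /* scanned recently enough */
    p->last_scan = now;
    return 1;                           /* safe to scan all channels now */
}

int main(void)
{
    struct scan_policy policy = { 0.2, 300, 0 };

    if (scan_allowed(&policy, 0.05, time(NULL)))
        printf("switching interface to monitor mode for a full channel scan\n");
    else
        printf("deferring scan; interface stays in service\n");
    return 0;
}
```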
F. Self-Configuration and Self-Management

Although each base station should be able to operate in a stand-alone fashion, autonomic management of a wireless network that provides connectivity to a geographic region requires collaboration among the involved base stations.
FIGURE 4. Integration of terminal monitors into the management system.
This collaboration occurs through periodic information exchange across the management interfaces (e.g., the uplink interface), which allows each individual base station to synchronize its local management information database and adapt its configuration consistently with its peers.

When a base station starts up, it first enters a probing phase, which includes monitoring the environment, and then it configures its interfaces (including the uplink) and organizes itself into the radio access network by finding its peers. After the base station has successfully started, it performs a channel scan to detect other base stations in its immediate neighborhood and determines their identifiers. In other words, the base station identifies its current context as well as its relation to other systems in the surrounding environment. Finally, it contacts these neighbor stations over its management interface and, after successful authentication and authorization [12], integrates itself into the network-wide information exchange. In case adjacent base stations boot up simultaneously (as might happen after a power outage), a short randomized delay is introduced to de-synchronize the self-configuration process. During the bootstrapping phase, the base station also auto-configures its network components for communication, routing and addressing, i.e., it obtains a subnet delegation for its wireless service area and configures its uplink interface, routing table and DHCP server for the wireless network appropriately [13].

Once configured, the base station starts to participate in the information exchanges with its peers, which provide the basis for the collaboration among the base stations and the autonomic management of the overall wireless network. The system can use different kinds of exchange mechanisms for different types of information. Information that is globally important, such as encryption parameters or attack status, is disseminated throughout the complete wireless network using an epidemic communication mechanism [17]. Information that is of local significance only, such as radio frequencies, transmission power or link utilization, is only disseminated locally among the affected neighboring base stations. This differentiation by information type is crucial for large autonomic systems in order to improve the scalability properties of the system.

Information that requires global dissemination to all cooperating base stations is challenging with respect to both scalability and consistency. System-wide parameters, such as the wireless protocol, WLAN SSID, security parameters, or attack status, are examples of such global information. The system can efficiently disseminate global information using epidemic communication paradigms. Unlike simple broadcasting, which does not scale to large networks, an epidemic approach allows information updates to be piggy-backed onto the periodic information exchanges between neighboring base stations. This technique prevents broadcast storms when global state updates are frequent. Disseminating a global configuration change throughout the network in a consistent manner requires transactional semantics. This is a well-known challenge in distributed systems and a wide variety of approaches exist [18]. A distributed locking scheme, whereby a network element that wants to update global information first needs to acquire the lock for that information element before it can disseminate the update, has been used and evaluated in our prototype system (see Section V).
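The following C sketch outlines the local logic of such a distributed locking scheme: lock requests are sent to all cooperating peers and the global update is disseminated only once all grants have arrived. Message transport, timeouts and tie-breaking between concurrent requesters are omitted, and all names are illustrative rather than the prototype's actual implementation.

```c
#include <stdio.h>
#include <string.h>

struct global_lock {
    char name[32];      /* which global parameter the lock protects */
    int  peers;         /* number of cooperating base stations */
    int  grants;        /* lock grants received so far */
    int  held;
};

/* Placeholder: in the prototype, lock messages are piggy-backed onto the
 * periodic neighbor exchanges. */
static void send_lock_request(int peer, const char *name)
{
    printf("lock request for %s sent to peer %d\n", name, peer);
}

static void lock_start(struct global_lock *l, const char *name, int peers)
{
    int i;

    memset(l, 0, sizeof(*l));
    snprintf(l->name, sizeof(l->name), "%s", name);
    l->peers = peers;
    for (i = 0; i < peers; i++)
        send_lock_request(i, name);
}

/* Called when a peer answers; returns 1 once the lock is held and the update
 * may be disseminated epidemically. A full implementation also needs timeouts
 * and a tie-breaking rule for concurrent requesters. */
static int lock_grant_received(struct global_lock *l)
{
    if (++l->grants >= l->peers)
        l->held = 1;
    return l->held;
}

int main(void)
{
    struct global_lock lock;
    int i;

    lock_start(&lock, "wlan_ssid", 3);
    for (i = 0; i < 3; i++) {
        if (lock_grant_received(&lock))
            printf("lock held: disseminating new SSID\n");
    }
    return 0;
}
```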
G. Central Network Management System

Although the proposed management scheme is decentralized, and thus allows continuous management through the individual network elements in the event of failures, it is advantageous to have a central component, a centralized NMS, that handles those management functions that are best handled by a centralized entity for reasons of cost, efficiency or administrator interaction. For example, the system policies that drive the autonomic operation of each individual network element are best defined centrally. Hosting this role on a central node, which has a global view of the network domain, is more efficient than requiring each network element to obtain a global view of the network domain.

The centralized NMS also offers an ideal point of access for human administrators to interact with the network system. The NMS provides the human-machine interface, for example a graphical user interface that allows the human administrators to monitor the network. It also contains some traditional management functions for fault management in case human involvement is required. The monitoring capabilities of the autonomic management system are particularly important for human administrators at the beginning, when the system is first deployed: it is crucial for the administrators to gain confidence in the system by observing and testing its autonomic operation. Finally, in order to allow administrators to express the purpose or intent of the system, the policies that constrain or control the autonomic behavior and decisions are entered into the distributed system through this centralized system.
V. Evaluation

This section presents a first evaluation of the management functions through simulation and a feasibility study of the proposed management system through prototype implementation and experimentation.
A. Quantitative Analysis

This section presents evaluation results of the characteristics of the autonomic management system.
FIGURE 5. Initial convergence times of a group of base stations (random topologies considered). (Axes: topology size [number of base stations] vs. time [sec]; mean and standard deviation shown.)

FIGURE 6. Terminal-supported measurements. (Axes: clients providing feedback [%] vs. connected networks [%]; fitted mean shown.)
The measurements in this section were obtained with a home-grown, Java-based simulator and investigate groups of up to 100 base stations.

1) Initial Self-Organization Convergence

During the probing phase, every base station periodically scans, detects and contacts its neighbors to initiate an epidemic information exchange. This operation takes approximately 2 seconds on a single base station. This study investigates how these single operations of each base station contribute on a global scale to completing the epidemic information exchange. In this sense we speak of "convergence", i.e., the process of achieving unique common information globally in the network; the convergence time is made up of the single contributions of the probing phase. We are particularly interested in the behavior of the convergence time with respect to the size of the network.

The following scenarios simulate a group of base stations that all start up within a few seconds of one another. The experiments measure convergence times over 500 repetitions and calculate the mean and standard deviation. Each experiment uses a randomly generated, connected base station topology, i.e., the aggregate coverage area of the base station group is not geographically partitioned. The number of base stations is a parameter of the experiment and grows up to 100 in increments of 10, with two additional group sizes of 5 and 15 to investigate the behavior of small groups. Results are shown in Fig. 5. For smaller groups of 1-20 base stations, the mean initial self-organization time quickly increases from 17 to approximately 20 seconds. For larger groups of 20-100 base stations, the mean initial self-organization time remains between 20 and 25 seconds. As a result, the network convergence time does not grow significantly with the number of base stations present. Although the simulations only demonstrate this effect for groups of up to 100 base stations, this trend is expected to continue for larger groups. This behavior results from the fact that the initial configuration of the base stations depends only on neighboring cell information and hence there is no need for interaction on a global scale. The convergence time is thus related to the density of nodes in our simulations, i.e., the number of neighbors. In fact, the time depends on the maximum number of neighbors present in the network. In the results shown in Fig. 5, the convergence time becomes asymptotic at around 7 nodes, which is the maximum number of neighbors in our simulations.

Next, we evaluate the effect of including terminal support for detecting neighbors. While feedback from terminals clearly provides additional information for learning the complete topology of the network, the effectiveness of this approach is not obvious; some applications, for example, require a critical population of users to achieve noticeable results. The size of the population supporting an autonomic feature, like the feedback from terminals, is a fundamental estimator of the initial costs for the complete autonomic system. The basis for this evaluation are topologies like those in the simulated scenario above, which include base stations with overlapping coverage areas that are outside of each other's radio range, as shown in Fig. 4. Such network topologies appear to be partitioned to the distributed management system, because some base stations fail to discover their radio neighbors. The experiments measured the percentage of topologies that become closed, i.e., those topologies that are initially partitioned but become connected when external information from the terminals is integrated. For each experiment, 1000 repetitions with a separate random topology per repetition were executed, and the mean and standard deviation were calculated. The number of base stations and mobile nodes in the network was fixed at 100 base stations and 500 mobile nodes. The measurement parameter was the percentage of mobile nodes providing measurement input, varied from 1 percent to 100 percent. This analysis allows us to investigate how many external nodes have to provide input before the overall management system shows significant benefits.

Figure 6 shows that with 20 percent of the mobile nodes acting as sniffers, the first networks can be connected. With an increasing percentage of sniffers, the percentage of connected network topologies increases more or less linearly. When all mobile nodes act as sniffers, around 50 percent of the network topologies are closed. At first glance, this result seems low. However, it must be considered that only those nodes that are within an overlapping coverage area can provide information unknown to the management system. The mobile nodes did not move during the experiments and can therefore only provide information for one position within the topology. In other words, the numbers in Fig. 6 represent a worst-case scenario, because user mobility would certainly increase the effect of the terminal feedback. Further investigations should include mobile node movement; in that case, it is expected that mobile nodes can provide much more input, as they traverse the coverage areas of several base stations and thus have a higher probability of detecting overlapping coverage areas. The results shown here highlight the strength of the decentralized approach in terms of scalability with a growing number of participating base stations, as well as the effect of terminal support for the initial self-organization. For a detailed evaluation of that scheme see [21].
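The following C sketch illustrates the terminal-feedback mechanism used in these experiments: a client report listing audible base stations is checked against the known neighbors, and any unknown station triggers the establishment of a new neighbor relationship over the management interface. Data structures and helper names are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

#define MAX_KNOWN 32

static unsigned char known_neighbors[MAX_KNOWN][6];
static int known_count = 0;

static int is_known(const unsigned char mac[6])
{
    int i;
    for (i = 0; i < known_count; i++)
        if (memcmp(known_neighbors[i], mac, 6) == 0)
            return 1;
    return 0;
}

/* Placeholder for contacting the reported base station over the wired
 * management interface and starting information synchronization. */
static void establish_neighbor(const unsigned char mac[6])
{
    if (known_count < MAX_KNOWN) {
        memcpy(known_neighbors[known_count++], mac, 6);
        printf("new radio neighbor learned from terminal feedback\n");
    }
}

/* Handle one client report: 'heard' lists the base stations the terminal
 * currently receives. */
static void handle_client_report(unsigned char heard[][6], int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (!is_known(heard[i]))
            establish_neighbor(heard[i]);
}

int main(void)
{
    unsigned char report[2][6] = {
        { 0x00, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e },
        { 0x00, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e },
    };
    handle_client_report(report, 2);
    printf("known neighbors: %d\n", known_count);
    return 0;
}
```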
2) Channel Allocation

Channel allocation is the classical application for wireless LAN deployments: automating it helps to achieve better overall network performance by avoiding interference as much as possible, a problem that shows up specifically in the 802.11b standard.
FIGURE 7. Quality measure of channel allocation function.

FIGURE 8. Power adjustments in the first version.

We performed a set of simulations showing the benefit of the scheme and of some variants of the algorithm, and its behavior under different topologies. The channel allocation mechanisms we used are 1) use the channel in the middle of the largest gap between the channels of known neighbors (middle_dist), and 2) use a channel that is at least three channels away from all known neighbors; when this is not possible, choose a channel in the middle of the largest gap (best_dist_first). The second algorithm has been used in the implementation presented in Section V.B after we gained experience with the simulations. As performance metric we use the sum of the estimated interference between access points. For the estimation of the interference we use the overlapping coverage area weighted by a geometric function of the difference of the channels/frequencies. In the following simulation, we assume 60 seconds of simulation time, 50 ms network delay, and boot-strapping of access points equally distributed within the first second of the simulation. The simulation has been run 50 times for each data point.

Figure 7 shows the two mentioned algorithms compared to a random allocation of channels. Note that we assume that the neighbors are manually configured with the knowledge of the planned network; this factors out the effects of automatic neighbor detection as shown in the previous section. The topology used is a grid topology with 200 access points, and the randomness denotes the random displacement from the original perfect grid (a perfect grid has randomness 0). The results show that the middle-distance algorithm performs worse, because it gives away space in the spectrum where a better allocation would eventually be possible. It is interesting to see that this performance indicator of the network shows the same behavior for the random algorithm and for the autonomic case with respect to the network topology. This implies that a self-configuring algorithm cannot completely compensate for the topology effects, but it performs significantly better than a random allocation.
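For illustration, the following C sketch reconstructs the channel selection strategy described above: it first tries to find a channel at least three channels away from every known neighbor (best_dist_first) and otherwise falls back to the channel that maximizes the distance to the closest neighbor channel, i.e., the middle of the largest gap. It is a simplified reconstruction under these assumptions, not the code used in the simulator or the prototype.

```c
#include <stdio.h>
#include <stdlib.h>

#define NUM_CHANNELS 11   /* 802.11b/g channels 1..11 (region dependent) */

/* Distance from candidate channel 'ch' to the closest neighbor channel. */
static int channel_distance(int ch, const int *neighbors, int n)
{
    int i, d, best = NUM_CHANNELS;

    for (i = 0; i < n; i++) {
        d = abs(ch - neighbors[i]);
        if (d < best)
            best = d;
    }
    return best;
}

static int pick_channel(const int *neighbors, int n)
{
    int ch, d, best_ch = 1, best_d = -1;

    if (n == 0)
        return 1;   /* no known neighbors: any channel will do */

    for (ch = 1; ch <= NUM_CHANNELS; ch++) {
        d = channel_distance(ch, neighbors, n);
        if (d >= 3)
            return ch;          /* best_dist_first: non-overlapping fit found */
        if (d > best_d) {       /* remember the middle of the largest gap */
            best_d = d;
            best_ch = ch;
        }
    }
    return best_ch;             /* middle_dist fallback */
}

int main(void)
{
    int neighbors[] = { 1, 6 };

    printf("selected channel: %d\n",
           pick_channel(neighbors, sizeof(neighbors) / sizeof(neighbors[0])));
    return 0;
}
```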
3) Distributed Load Balancing

Load balancing is a critical application for an autonomic wireless base station (see [22] for an extensive study). On the one hand, load balancing can dramatically increase the performance of a wireless network, because spare resources are assigned to the overloaded base stations. On the other hand, load balancing continuously adapts the configuration of base stations to the current load, and an uncontrolled algorithm might lead to instability in the network. For this application, we control the transmission power of the base stations in order to change the configuration, so that congested base stations attract fewer clients and non-congested ones attract more. This is also known as the "cell-breathing" method [23]. Note that, depending on the technology and business model, better solutions are possible, but this one is relatively simple and shows good performance and adaptability.

We conducted a simulation to verify the dynamics of the distributed algorithm. The algorithm has been designed through an analysis of use cases: basically, certain actions are enforced when an overload condition is met. The simulations analyzed a large hotspot with 20 base stations over 40 minutes. Due to space limits, we report here the results for 7 base stations in a connected area over a limited period. Figure 8 shows the change of the variable "transmission power" over time as a result of the autonomic adjustments at the distributed base stations. Around minute 124, a peak of load is generated in the area covered by those base stations. The dynamics of the graph clearly show that oscillations appear when the algorithm adapts the transmission power to the load conditions. This happens because a closed loop is present in the overall system: when base station AP5 is overloaded and therefore reduces its transmission power, the neighboring base stations increase their transmission power; these in turn can reach the overload limit as well and immediately start to reduce their power, and so on. Each of these single actions is certainly correct, but the overall behavior over time is not consistent. In other words, the autonomic element is "too adaptive" with respect to the phenomenon under control.

To make the system more stable, we enforced an oscillation avoidance mechanism. It is based on a counter that blocks opposite commands within a certain time window. For instance, if a base station has recently increased its transmission power, a command to reduce it again is blocked during this time window. This approach has the advantage of being a simple oscillation avoidance mechanism, while at the same time being effective in reducing the impact of instability at the service level. Indeed, Fig. 9 shows the behavior of the enhanced mechanism, which proves to be much more stable. The performance metrics at the service level, on the other hand, were only marginally affected: the number of rejected users increased by 5.7 percent, while the number of users moved between two base stations was reduced by 79 percent. Clearly, this is a good tradeoff between the maximization of radio resources and performance at the service level. Note that moving a user between base stations usually degrades the service for a short period of time.
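The oscillation avoidance rule can be summarized by the following C sketch, which blocks a power adjustment that would reverse a recent one while a hold-down window is still open; the window length and data types are illustrative assumptions.

```c
#include <stdio.h>
#include <time.h>

#define HOLD_DOWN_SEC 120

struct power_ctrl {
    int    last_direction;   /* +1 = increased, -1 = decreased, 0 = none */
    time_t last_change;
};

/* Returns 1 if the adjustment was applied, 0 if it was blocked. */
static int adjust_power(struct power_ctrl *pc, int direction, time_t now)
{
    if (direction == -pc->last_direction &&
        now - pc->last_change < HOLD_DOWN_SEC) {
        printf("opposite command blocked (hold-down active)\n");
        return 0;
    }
    pc->last_direction = direction;
    pc->last_change = now;
    printf("transmission power %s\n", direction > 0 ? "increased" : "decreased");
    return 1;
}

int main(void)
{
    struct power_ctrl pc = { 0, 0 };
    time_t t = time(NULL);

    adjust_power(&pc, -1, t);        /* overload: reduce power */
    adjust_power(&pc, +1, t + 30);   /* blocked: inside the hold-down window */
    adjust_power(&pc, +1, t + 200);  /* allowed again */
    return 0;
}
```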
B. Implementation on Commercial WLAN Access Points

While a simulation approach was used to acquire some performance and scalability measures of the proposed management system, we also implemented the system to run on commercial WLAN routers. The implementation served as a proof-of-concept and a first feasibility analysis, but it clearly could not be used for a scalability evaluation with a hundred base stations, as done in paragraph A. Note that commercial WLAN access points are limited in memory and CPU.
FIGURE 9. Effect of oscillation avoidance.
The WLAN access point in the decentralized management system is an IP router for a delegated IP subnet, able to operate stand-alone. It needs at least two network interfaces: one to provide wireless services to its clients and a second interface (wired or wireless) for uplink connectivity. The reason is to avoid using precious wireless bandwidth for control and management traffic. Base stations automatically distribute the available address space among themselves, configure subnets for client connectivity on their wireless interfaces and configure the addressing of their wired uplinks. IP auto-configuration occurs through an integrated mechanism that was developed as part of an earlier research effort [13].

1) Description of the Implementation

The hardware platform for the prototype is the Linksys WLAN Router WRT54GS (v1.1) (http://www.linksys.com). It is equipped with a 200 MHz Broadcom 4712 MIPS CPU, 8 MB of flash ROM and 32 MB of RAM. It offers wireless service through a WLAN 802.11b/g interface and provides uplink connectivity through a 10/100 Fast Ethernet card. The WRT54GS runs a Linux operating system by default, but it can only be accessed via a Web-based management interface. Therefore, we flashed the WLAN routers with OpenWRT [24], a small Linux distribution for embedded devices that also supports the Linksys WRT54GS. It provides a cross-compiler toolchain for custom software development and allows access to the WLAN router via SSH.

The implementation is programmed in C and compiled using the OpenWRT cross-compiler. The software comprises several modules. First, a scanning module is responsible for gathering information about the immediate radio neighborhood of a WLAN router. It uses the wireless-tools library to access the wireless network interface. A full scan of the 802.11b/g frequency band returns a set of neighboring MAC addresses plus SSIDs, etc. Second, a neighborhood management module takes the scanned information as input to contact the neighboring WLAN routers and maintain the neighborhood relationships. Third, a communication module is responsible for (de)marshalling messages and sending/receiving management information for synchronization. Fourth, an information management module coordinates the exchange of management information between the WLAN routers according to the neighborhood relationships. Additionally, it parses the scanned and exchanged information to build up the local management information database. Last, a configuration management module evaluates the management information database to derive a suitable configuration for the WLAN router and sets the wireless parameters of the interfaces appropriately.
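The following C skeleton summarizes how these five modules could be tied together in the single-threaded control loop of the prototype (cf. Section VI); the function bodies are placeholders and the cycle period is an illustrative assumption.

```c
#include <stdio.h>
#include <unistd.h>

/* Placeholder module entry points; the actual prototype accesses the radio
 * through the wireless-tools library and talks to neighbors over the wired
 * uplink. */
static void scan_radio_environment(void)   { printf("scanning 802.11b/g channels\n"); }
static void update_neighborhood(void)      { printf("contacting newly seen neighbors\n"); }
static void exchange_management_info(void) { printf("synchronizing with neighbors\n"); }
static void rebuild_information_base(void) { printf("updating local management database\n"); }
static void apply_configuration(void)      { printf("setting wireless parameters\n"); }

int main(void)
{
    int rounds = 3;   /* the prototype loops indefinitely */

    while (rounds-- > 0) {
        scan_radio_environment();      /* scanning module */
        update_neighborhood();         /* neighborhood management module */
        exchange_management_info();    /* communication module */
        rebuild_information_base();    /* information management module */
        apply_configuration();         /* configuration management module */
        sleep(10);                     /* period between scan/sync cycles */
    }
    return 0;
}
```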
FIGURE 10. Testbed setup.
2) Testbed Setup

Based on the previously described implementation, we set up a testbed to show the feasibility of the approach in a real environment. Figure 10 shows the setup with three access points running our autonomic management software, a visualization machine, and one or more (manually controlled) interfering access points. The effects of the autonomic behavior can be monitored by the radio visualizer, which scans the network from a central place and sees all the access points. External access points (not shown in the figure) or the interfering base station (under our manual control) trigger a change of behavior of the network and show how the network reconfigures itself when the interfering access point is set to channels already used by the network.

The visualization system shown in Fig. 10 displays the current radio environment. Apart from its use for demonstration, it also has some other useful features for monitoring wireless networks and could be built into the access point when enough memory is available and a free, non-serving WLAN card is available for constant monitoring. Besides the interception of wireless traffic, the measurement system collects additional information from every node that is detected, including the node's wireless mode (infrastructure or ad hoc), frame and byte counts, associated base station information such as the Service Set Identifier (SSID), physical-layer information such as signal strength, and statistical information about higher-layer protocol use. Beyond what many other measurement systems collect, this system also records sequence number information for each captured frame. Analysis of patterns in the sequence numbers of captured frames is an effective technique to determine the presence of various anomalies and attacks [14, 15]. As said before, these functionalities could be put on an access point as well, but this would require a more powerful device than the one we had, which is normally not worth building commercially.

For intrusion detection, the measurement system also maintains a database of known MAC addresses that it correlates with the IEEE's "organizationally unique identifier" list [16]. This can help to identify spoofed MAC addresses. If the occurrence rate of new MAC addresses exceeds a configurable
threshold, the autonomic management system is informed to take appropriate action.
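A possible realization of this rate check is sketched below in C; the window length and threshold stand in for what the text calls a configurable threshold and are illustrative assumptions.

```c
#include <stdio.h>
#include <time.h>

#define WINDOW_SEC   60
#define MAX_NEW_MACS 10

struct mac_rate {
    time_t window_start;
    int    new_macs;
};

/* Call whenever the measurement system sees a MAC address for the first
 * time; returns 1 if the configurable threshold has been exceeded within
 * the current observation window. */
static int record_new_mac(struct mac_rate *r, time_t now)
{
    if (now - r->window_start >= WINDOW_SEC) {
        r->window_start = now;   /* start a new observation window */
        r->new_macs = 0;
    }
    if (++r->new_macs > MAX_NEW_MACS) {
        printf("possible MAC spoofing: notifying autonomic management\n");
        return 1;
    }
    return 0;
}

int main(void)
{
    struct mac_rate rate = { time(NULL), 0 };
    int i, alarm = 0;

    for (i = 0; i < 12; i++)
        alarm |= record_new_mac(&rate, time(NULL));
    printf("alarm raised: %s\n", alarm ? "yes" : "no");
    return 0;
}
```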
VI. Discussion

The proof-of-concept implementation and the testbed setup confirmed that the applications designed for the autonomic management system for wireless networks can be supported on commercial access points. This has been achieved by developing the autonomic management module that interfaces with the legacy API of the device, as shown in Fig. 2. The experience we have gained makes us confident that the architecture presented in Section IV can also be used for other network management scenarios. For example, the next-generation radio access network of the cellular world, i.e., the evolved UTRAN [20], which aims at a highly distributed or flat architecture, could be a good target for such a framework.

However, the implementation showed some limitations in the current setup that could appear as potential problems for other autonomic management systems in the wireless domain in the future. The main problem we experienced was the accuracy of the wireless measurements; the scanning module provided rather unstable results: sometimes it detects certain nodes in its radio environment during one scan, but not during the next. We assume that the instability of the scan results is caused either by a limitation of the wireless-tools library, by the WLAN interface driver, or by the hardware itself. Note that the instability is not only due to radio fluctuation, as the problem also occurred between base stations located next to each other, i.e., with very strong signal strength. As the wireless measurements provide the basis for maintaining the neighbor relationships, these were also rather unstable, leading to certain problems in reaching a consistent configuration throughout the testbed. We applied some heuristics to reduce the problems of unstable scan results, e.g., by introducing a "scanning history": a node is then only removed from the list of neighbors if it cannot be detected during a certain timeframe. This naturally keeps nodes in the list for a longer time even after they have disappeared, so a good tradeoff must be found between inaccuracy and timeliness of the information. This experience brings up an issue that has been neglected so far by many autonomic models, namely the trustworthiness and reliability of the information sensed by the autonomic element. While operations performed remotely on a central station are inherently more static (since they are based on a fixed management interface and other common limitations), a localized decision of an autonomic element is more sensitive to wrongly sensed information. We believe that this problem will have a bigger impact in the deployment of future autonomic systems, when more applications will be included and merged together.

A second problem was due to the fact that the process of scanning the environment, evaluating the results, computing a new configuration and setting the new configuration took about 6-7 seconds overall. Given that we used rather frequent changes in the testbed environment to see the effects, the new configuration settings might already be based on old data. This effect is further increased by the fact that the prototype implementation is single-threaded and therefore updates sent by the neighboring base stations are not evaluated during the scanning and reconfiguration process.
However, we do not believe this problem to be severe for real-life systems, as the scanning and reconfiguration process will typically be performed less frequently in a deployed wireless network than in our testbed setup.

Another interesting behavior observed in the testbed is the adaptation of the autonomic base stations to the existence of other base stations not belonging to
the autonomic system. Other base stations were present in the neighboring offices and were polluting the environment, but clearly they could not be involved in the self-adaptation process, because they were not running the autonomic applications. Nevertheless, the important result was that the base stations were able to recognize the "compatible" neighbors through the uplink interface, identify (through sensing) the configuration of the "non-compatible" base stations as fixed, and create an optimized configuration taking these fixed configurations into account as constraints. The ability of autonomic base stations to interwork with legacy equipment is an important aspect for the success of the new technology. Since network operators are not keen on sustaining large initial expenses for a complete installation of new feature-added devices, backwards compatibility is essential and allows operators to gradually deploy the new technology. However, as we have learned through our experiments, it is important that autonomic systems are aware of legacy equipment, so that they can adapt and optimize their operation accordingly.

The results obtained through the simulations are important, too. The first one is the proof of scalability for large numbers of self-configuring devices. The asymptotic convergence time is certainly a satisfactory performance for a management process, whereas the effort of a centralized solution typically increases with the size of the network under control. The advantage of the autonomic approach is that it can break the whole process into local decisions, whose maximum convergence time depends only on the degree of directly collaborating nodes. Another important outcome, concerning the cost of deployment, has been presented with the results on terminal feedback. While this is a useful feature that compensates for the limits of the scanning module, it is of course not feasible to substitute all the terminals in a network. As a result, we provide a worst-case scenario to measure the effectiveness of a reduced population of terminals integrated with the autonomic management system. The positive results, which already show benefits of the scheme with as little as 30 percent of the terminals supporting it, provide incentives for a partial deployment of such newly featured terminals.

The load balancing application showed how the autonomic management approach performs in a highly dynamic and mobile environment. The rationale behind this application is that load balancing should not be implemented as standalone functionality, but rather be integrated into a common autonomic engine. We have therefore investigated the advantages and problems of this approach by stimulating and analyzing scenarios where continuous adjustments are required. In these situations the problem of oscillations arises, but the proposed stabilizing mechanism has proven to be effective.
VII. Conclusion

This article presents a decentralized, autonomic management architecture and its application to the management of wireless networks. The proposed system is a platform for autonomic management that offers generic mechanisms supporting various management functions. This common functionality includes mechanisms for information exchange, transactional semantics and security functions, which are required to realize many different management capabilities.

A novel feature of the autonomic management system is the integrated wireless monitoring component that is part of the base stations. This component determines common causes of problems through real-time analysis of live network measurements. The monitored feedback provides the system with the necessary awareness of its status and defines the context for autonomic control. The feedback also provides a basis for
base stations to automatically bootstrap and manage themselves. The evaluation of the autonomic management system focuses on the scalability of the initial self-organization and on the analysis of the various management applications.
Although the current autonomic management system implementation targets WLAN networks, the general idea of decentralized, autonomic management certainly applies to other wireless networks. The current system provides a decentralized management middleware built on generic methods for information dissemination that can be easily adapted to other network technologies and can support many different management functions. Based on the experiences with this system, we believe that a further degree of generalization is possible; specifically, the set of reusable autonomic management components could be consolidated into a more general solution. Furthermore, we have not yet studied the interworking of the autonomic base stations with an autonomic centralized management system. So far, the centralized management system is only used for configuring policies and for monitoring the autonomic process, but not for any autonomic control based on overall network knowledge. In such a setting, we anticipate problems resulting from interacting control loops with potentially contradictory results and decisions. Finally, as observed in the real implementation, scanning is not error-free. Studying the effects of erroneous measurement results on an autonomic decision engine is an interesting item for future work, potentially even in a general way.
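To make the structure of the per-base-station control loop more concrete, the following Python fragment sketches a possible monitor-filter-decide-act cycle, including a simple median filter as one conceivable way to reduce the impact of occasional erroneous scan results. The callbacks scan, decide and apply_config, as well as the parameter values, are hypothetical placeholders rather than the interfaces of the presented system.

    import statistics
    import time

    def control_loop(scan, decide, apply_config, interval_s=10.0, window=5):
        # Illustrative per-base-station cycle: monitor -> filter -> decide -> act.
        # scan(), decide() and apply_config() are hypothetical callbacks, not the
        # interfaces of the system presented in this article.
        history = []                                  # recent measurements, e.g. channel utilization
        while True:
            measurement = scan()                      # raw, possibly erroneous scan result
            history = (history + [measurement])[-window:]
            filtered = statistics.median(history)     # simple outlier suppression
            new_config = decide(filtered)             # local decision, e.g. channel or power change
            if new_config is not None:
                apply_config(new_config)              # reconfigure only this base station
            time.sleep(interval_s)

In the terminology used above, scan would correspond to the wireless monitoring component, decide to the autonomic decision logic, and apply_config to the local reconfiguration of the base station.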
Acknowledgement
This document is a byproduct of the Ambient Networks and the Autonomic Network Architecture projects, partially funded by the European Commission under its Sixth Framework Programme. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of these projects or the European Commission.
Simon Schuetz ([email protected]) received his Diploma degree in Computer Science and Business Administration from the University of Mannheim, Germany, in 2004. He then joined NEC Laboratories Europe in Heidelberg, Germany, where he works mainly in the context of EU/IST research projects. He has published several scientific papers in international workshops, conferences, and journals, and participates in IETF standardization activities.
Kai Zimmermann is currently with Accenture, Munich, Germany. He received a Master of Computer Science (M.Comp.Sc.) from the University of Ulm, Germany, and performed his master thesis work at NEC Europe Ltd. in Heidelberg, Germany.
Giorgio Nunzi ([email protected]) is a Research Staff Member at NEC Laboratories Europe in Heidelberg, Germany. He works primarily on network management for radio access networks and has published several scientific papers in the area of network and service management. He is active in 3GPP and is a member of the Technical Program Committees of several conferences, such as IM and NOMS. He received an M.Sc. ("Laurea" degree) in Electronic Engineering from the University of Perugia, Italy, in 2003.
Stefan Schmid ([email protected]) received his M.Sc. in Computer Science from the University of Ulm, Germany, in 1999. He then joined the Computing Department at Lancaster University, U.K., from which he graduated with a Ph.D. in Computer Science in November 2002. In March 2004, he joined NEC Laboratories Europe, where he is currently leading the Next Generation Networking research group. He is a member of the IEEE and has published many scientific papers in international workshops, conferences, and journals.
Marcus Brunner ([email protected]) is a chief researcher at the Network Laboratories of NEC Europe Ltd. in Heidelberg, Germany. He received his Ph.D. from the Swiss Federal Institute of Technology (ETH Zurich), while working in the Computer Engineering and Networks Laboratory (TIK) of the Electrical Engineering Department, and his M.S. in Computer Science from ETH Zurich in 1994. Aside from his involvement in various national and international projects, his primary research interests include network architectures, programmability in networks, and network and service management. He actively participates in standards organizations such as the Internet Engineering Task Force (IETF), the Distributed Management Task Force (DMTF), ETSI TISPAN, and 3GPP. He is also an active member of the network management research community, serving on the Organizing and Technical Program Committees of major network management conferences such as NOMS, IM, DSOM, the Policy Workshop, and IPOM, and he is currently TPC co-chair of NOMS'08. He serves on the editorial boards of the IEEE Transactions on Network and Service Management and the Journal of Network and Systems Management. In the networking area, he also serves on the TPCs of IEEE Globecom, ICC, and LCN, and is currently IEEE Globecom'08 symposium co-chair for Next Generation Networking and Internet Protocols.