IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 23, NO. 10, OCTOBER 2005

2049

An Active Model-Based Prototype for Predictive Network Management

Stephen F. Bush and Sanjay Goel

Manuscript received May 25, 2004; revised April 14, 2005. S. F. Bush is with General Electric Global Research, Niskayuna, NY 12309 USA (e-mail: [email protected]). S. Goel is with the School of Business, University at Albany, State University of New York, Albany, NY 12222 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/JSAC.2005.854108

Abstract—If current trends continue, the next generation of enterprise networks is likely to become a more complex mixture of hardware, communication media, architectures, protocols, and standards. One approach toward reducing the management burden caused by growing complexity is to integrate management support into the inherent function of network operation. In this paper, management support is provided in the form of network components that, simultaneously with their network function, collaboratively project future state and adjust those projections based upon actual network state. It is well known that more accurate predictions over a longer time horizon enable better control decisions. This paper focuses upon improving prediction; the many potential uses of predictive capabilities for network control will be addressed in future work.

Index Terms—Atropos, network management, network prediction, simple network management protocol (SNMP).

I. INTRODUCTION

It is an accepted tenet in system management that greater complexity leads to greater overhead and higher rates of failure. This problem will grow as enterprise management systems become more complex due to rapid advances in increasingly specialized communication media and protocols. A robust network must be self-aware: it should sense anomalies in order to correct and protect itself through local interactions, and it should do so as an inherent feature of network operation. The focus in this paper is on a framework for distributed projection of management variables, irrespective of implementation. The definition and goal of proactive network management used in this paper is the precise projection of faults as early as possible before they occur. Atropos has been implemented in an active network [5], which provides a framework for code within packets to execute upon intermediate network nodes. Atropos is a prototype system; an actual production system may use any number of alternative implementations. This paper also provides insight, lacking in the current literature, into the relative increase in speedup and lookahead and into the tradeoff between prediction accuracy and overhead.

A less flexible implementation in legacy systems might be achieved by building dedicated network component models directly into legacy network devices such as today’s routers. However, these models would be immobile and not easily updated or removed, and they would most likely require the network device to be taken down when models are changed or updated. It would be very difficult for nonactive systems to transmit models within the network in a dynamically changing algorithmic form. A better mechanism for using Atropos to manage legacy networks would be to provide an active network overlay capable of monitoring legacy nodes. Atropos could reside in the active network overlay, providing a predictive management service for the legacy network. This has the added benefit of preparing the legacy network for transition to a fully active network.

While the details of any particular use of predictive management are outside the scope of this paper, examples of potential control decisions using predictive management capability include mobile wireless location management [7] and network security [4]. An Atropos load projection model allows resources and routing to be better managed by anticipating traffic in order to optimize load distribution within the network. This framework allows “What if…?” scenarios to become an integral part of the network. Finally, Atropos-enhanced components can protect themselves by taking proactive, evasive action, such as migrating to safe hardware before an anticipated disaster occurs.

The most significant contribution of this paper is an evaluation of the ability of the Atropos prototype to couple distributed and parallel simulation with network operation. A prototype framework is proposed into which code representing network component models can be injected. The models operate in a distributed manner, simulating ahead of the real network, but with continuous verification and correction based upon actual network state. In optimistic logical process synchronization techniques, e.g., time warp [15], [21], causality can be relaxed in order to trade model fidelity for speed. The framework introduced here relaxes prediction accuracy in exchange for speed.
If the system being simulated can be queried in real time, prediction accuracy can be verified and measures can be taken to keep the simulation in line with actual performance. Section II presents a review of the relevant literature. Section III describes the Atropos system and its architecture. Section IV provides a detailed analysis of the Atropos components and of its operational behavior. Section V summarizes the paper and outlines directions for future research.

II. LITERATURE REVIEW

Previous work in prediction of communication network resources has been motivated by the need to support adaptation of distributed applications. For an application distributed across a network to perform optimally, it must control and adapt to resources such as bandwidth, latency, and processing. Knowledge of future network resource availability allows an application to optimize the scheduling of events [19]. An example is determining the best location and time for migrating a service in a distributed application. Several systems and components supporting predictive capability have been designed and implemented to execute on the legacy Internet. Three notable implementations discussed here are the Network Weather Service (NWS) [23]–[25], Remos [8], [17], and the Resource Prediction System (RPS) [9], [10]. These systems independently arrived at similar high-level architectures in order to meet common-sense requirements, namely scalability, flexibility, robustness, and minimal overhead. Each implements a framework while decoupling itself from the actual predictive techniques or applications, and all assume common elements: sensors, persistent state, modular models/predictors, and an application API. Sensors in each of these systems are software components designed to monitor network or host resource usage. Passive sensors monitor and record real-time resource information, while active sensors stimulate the system and monitor the response. For example, an active host CPU load sensor might execute a CPU-intensive program solely to record the response. Note that the term “active” in this case should not be confused with active networks, the environment in which Atropos resides. Like Atropos, Remos sensors make extensive use of simple network management protocol (SNMP) information. Modelers/predictors in these systems are modular programs that implement algorithms using information provided by sensors for application-specific purposes.
Atropos allows predictive algorithms to be injected into the network as active packets, which remain resident within the network. Persistent state has been used in previous predictive systems to hold predicted data until needed by modelers/predictors, as well as to bind system components to their names and locations. In the Atropos prototype, SNMP is the standard management interface that supports access to both current and predicted management information base (MIB) values. Access may occur from any location within the network. Because this is a prototype, our goal is ease and consistency of use; if a significantly better and more ubiquitously deployed management standard were to become available, it could be used without affecting the fundamental predictive framework. While Atropos has analogs to the common architectural components in the previous work mentioned above, it differs because it is designed for use within an active network. This paradigm shift, which allows dynamic code changes and insertions deep within the network, adds another dimension along which the overhead of the predictive system can be distributed. Legacy predictive systems are restricted to the application layer of the network and have relatively limited control of the network itself. Previous predictive systems recognize the tradeoff between the intrusiveness of sensors and predictors and the accuracy of the resulting predictions. Because the Atropos system injects code into the network, it must also be keenly aware of this issue. The authors of NWS [24] note the potential for sensors to detect their own activity and take steps to avoid it; this problem is also discussed with regard to Atropos operation.

Atropos also takes a distinctive approach in that it encourages distributed and parallel prediction algorithms. Atropos collects SNMP information and predicts network behavior, and it distributes the prediction algorithm close to the sites at which samples are collected, minimizing the overhead of transporting data to a central prediction site. Because Atropos decouples the predictive framework from the actual algorithm that implements prediction, RPS libraries could be used for experimentation and for implementing new prediction algorithms. Atropos is similar to NWS in that it adjusts prediction accuracy by continuously comparing predictions with past actual values. NWS runs multiple predictors simultaneously and uses the one that has recently been most accurate for the next prediction. Atropos has so far avoided this approach because its research focuses upon minimizing model size. However, jointly minimizing the model size and prediction error of multiple simultaneously running models would be an interesting application of the minimum description length (MDL) [13], [22] technique.

III. ARCHITECTURAL CONSIDERATIONS

Atropos provides a framework for fault prediction that couples concepts from distributed simulation and active networking. Results collected from a single Atropos node for load prediction are shown in Fig. 1. In today’s management systems, a MIB maintains current state values. In Atropos, load is predicted as real time, called wallclock time, advances. Thus, future values are available on the node as well as current values. In Fig. 1, the local virtual time (LVT), or future time, runs ahead of wallclock time (current time). Predicted load values are refined until wallclock reaches the LVT of a particular value.
These data were collected using SNMP [18], [20] by polling an active execution environment that was enhanced with Atropos. The contents of the Atropos MIB are documented in [2]. Valleys occur between the sampled values in the surface plot shown in Fig. 1 because no interpolation is attempted between samples. A diagonal line on the LVT/wallclock plane, running from the right corner to the left corner, separates LVT in the past from LVT in the future; future LVT is toward the back of the graph, and past LVT is toward the front. Starting from the right-hand corner, examine slices of fixed wallclock over LVT; each slice shows both the past values and the predicted values for that fixed wallclock. As wallclock progresses, the system corrects out-of-tolerance predictions; thus, LVT values in the past relative to wallclock are corrected. By examining a slice of fixed LVT, the prediction accuracy can be determined from the graph. Consider the line defined by a fixed value of LVT. Load values remain constant for all wallclock values equal to or greater than that LVT value. This final, invariant value is the actual value that occurred, while previous values were attempts to refine the prediction.

Fig. 1. Load prediction application results on a single node.

Fig. 2. Temporal overlay.

A. Atropos Models and Virtual Messages

The architecture is designed to exploit the benefits of active networking: the ability to place fine-grained executable models in the network to enhance communication. A virtual message is a packet, either active or passive, that carries state information anticipated to exist in the future. A streptichron (from classical Greek meaning to “bend time”) is a virtual message in the form of an active packet that facilitates prediction by carrying information in algorithmic form, affecting a node’s notion of time. The fine-grained executable models carried by streptichrons, which are necessary to represent future behavior, are introduced in the form of active packets. The models are called fine-grained because they are intended to represent small portions of space and/or time. Executable code representing future behavior has the potential to be more compact than equivalent nonexecutable data transmitted piecemeal. The justification for this can be found in the relationship among compression, prediction, and computation defined via Kolmogorov complexity. The Kolmogorov complexity [16] of a string is its optimal algorithmic compression bound; better prediction enables a greater compression ratio of a given model. Applications based upon estimates of Kolmogorov complexity can be found in [4] and [14].
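Kolmogorov complexity itself is uncomputable, so any practical use relies on an estimate. A common computable stand-in, assumed here for illustration and not necessarily the estimator used in [4], [6], or [14], is the length of a string after general-purpose compression, which upper-bounds its Kolmogorov complexity up to an additive constant (the size of the decompressor). The minimal Python sketch below uses the standard-library `zlib` module to show that a regular, predictable load trace compresses far better than an irregular one; both traces are hypothetical.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes needed for the zlib-compressed form of `data`.

    Kolmogorov complexity is uncomputable; a real compressor only gives an
    upper bound (plus the constant size of the decompressor).
    """
    return len(zlib.compress(data, 9))


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size; larger means more regular."""
    return len(data) / compressed_size(data)


# Hypothetical per-interval packet counts from a steady audio flow:
# highly regular, hence easy to predict and to compress.
regular_trace = bytes([50, 52, 50, 48]) * 250

# Hypothetical irregular trace of the same length (pseudo-random bytes).
irregular_trace = os.urandom(len(regular_trace))

print(f"regular:   {compression_ratio(regular_trace):.1f}x")
print(f"irregular: {compression_ratio(irregular_trace):.2f}x")
```

The gap between the two ratios mirrors the argument in the text: the more predictable the data, the shorter the description (program or compressed form) needed to reproduce it.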

B. Temporal Overlay

The goal of incorporating an optimistic parallel and discrete event simulation as part of network operation is to allow as much concurrent and parallel operation as possible within the network for projecting future events. Each node processes its portion of the simulation ahead of wallclock time at the same time as other nodes. The Atropos framework allows the operation of an individual node to be simulated forward in time on multiple nodes; for simplicity, only the case of each node simulating its own future is presented. This can be logically viewed as a virtual overlay network running temporally ahead of the actual network. As shown in Fig. 2, a virtual network modeling the actual network can be viewed as overlaying the actual network. The axis labeled “space” represents the efficient use of large numbers of nodes, while the axis labeled “time” represents the LVT of each node. The overlay shown above the actual system slides ahead of the actual system in time. A motivating factor for this approach becomes apparent when Atropos is viewed as a model-based predictive control technique in which the model resides inside the system to be controlled. The network environment is inherently parallel; a technique that takes maximum advantage of parallelism enhances predictive capability.

C. Atropos Architectural Components

The architecture comprises three main components: 1) driving processes (DP); 2) logical processes (LP); and 3) virtual messages. From a high-level perspective, driving processes create virtual messages, and logical processes process and forward virtual messages. The driving process samples the values to be predicted and generates a prediction. The actual mechanism used for predicting output from any application is application dependent and is a modular, dynamically injected component. For the experiments in this paper, load and packet destinations along each link were monitored via SNMP, and a simple curve-fitting algorithm was used within the driving process to extrapolate future load. The CPU prediction model and performance results, particularly important to active networks, can be found in [11] and [12].

The logical process manages the execution of the virtual overlay on a single node and is primarily responsible for virtual time management of the temporal overlay network via rollback. Rollback is the process of setting the LVT backward in selected components of the temporal overlay by sending anti-messages stored in the send queue (QS); see [15] for details on the fundamental operation of the rollback mechanism. Two types of messages can induce rollback: out-of-order virtual message arrivals and out-of-tolerance messages caused by prediction inaccuracy. A tolerance is set on the maximum allowable deviation between predicted and actual values; if this tolerance is exceeded, a rollback to wallclock time occurs. The logical process’s notion of the current time advances based upon the receive time (TR) of virtual messages, as discussed in detail shortly. A sliding lookahead window is maintained so that the logical process’s virtual time progression into the future is bounded. Logical processes operate by reacting to the arrival of virtual messages.

Fig. 3. Atropos logical process and message structure.
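The two logical-process rules just described, the tolerance check that triggers a rollback to wallclock time and the sliding window that bounds how far LVT may run ahead, can be sketched as follows. This is a minimal Python illustration, not the Atropos code; the class name `ToleranceCheck`, the symmetric absolute-deviation test, and the parameter values in the usage lines are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ToleranceCheck:
    """Hypothetical per-LP policy: accuracy tolerance plus lookahead bound."""
    tolerance: float         # max allowed |predicted - actual| (e.g., packets)
    lookahead_window: float  # max distance LVT may run ahead of wallclock (s)

    def out_of_tolerance(self, predicted: float, actual: float) -> bool:
        # Assumption: deviation is measured as an absolute difference.
        return abs(predicted - actual) > self.tolerance

    def may_advance(self, wallclock: float, next_tr: float) -> bool:
        # A virtual message is processed only if its receive time (TR)
        # lies within the sliding window ending at wallclock + window.
        return next_tr <= wallclock + self.lookahead_window

    def lvt_after_check(self, lvt: float, wallclock: float,
                        predicted: float, actual: float) -> float:
        # Out-of-tolerance prediction: roll LVT back to wallclock time.
        if self.out_of_tolerance(predicted, actual):
            return wallclock
        return lvt


check = ToleranceCheck(tolerance=250.0, lookahead_window=160.0)
print(check.may_advance(wallclock=1000.0, next_tr=1150.0))  # True
print(check.lvt_after_check(1150.0, 1000.0, predicted=9000.0,
                            actual=9400.0))                  # 1000.0
```

Keeping the tolerance and the window as separate parameters reflects the tradeoff studied in Section IV: tightening the tolerance raises accuracy, while the window caps how much rollback work a single correction can cause.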

D. Virtual Message Structure

Atropos messages contain the send time (TS), receive time (TR), anti-toggle (A), and the actual message payload itself (M). TR is the time at which this message is predicted to be valid at the destination logical process. TS is the time at which this message was sent by the originating logical process. The “A” field is the anti-toggle field and is used for creating an anti-message to remove the effects of false messages, as described later. A message also contains a field for the current real time (RT), which is used to differentiate a real message from a virtual message. A message that is generated and time-stamped with the current time is called a real message. Messages that contain future event information and are time-stamped with a time greater than wallclock are called virtual messages. If a message arrives at a logical process out of order or with invalid information, that is, out of tolerance, it is called a false message. A false message causes a logical process to roll back. The receive queue, shown in Fig. 3, maintains newly arriving messages in order of TR. The receive queue is an object residing in an active node’s SmallState, which is state left behind by an active packet. Once a virtual message leaves the receive queue, the virtual time of the logical process, known as LVT, is updated to the TR of that virtual message.

E. Operational Description

The operation of the system is explained by following the flow of messages from their source, the driving process, through the logical processes. Virtual messages ultimately originate from driving processes, shown in the virtual overlay of Fig. 2, that predict future events and inject them into the system as virtual messages. Following the arrows upward in Fig. 3, real messages enter the physical process and virtual messages enter the logical process. The state of the logical process is periodically saved in the state queue (SQ), shown as the simulation cache in Fig. 3. Each sample, shown in Fig. 3, contains the value and the LVT at which the value existed. These state values are used to restore the logical process to a known valid state when false messages are received. State values are continuously compared with actual values from the physical process to check prediction accuracy, which in the case of load prediction is the number and arrival time of predicted and actual packets received. If the prediction error exceeds a specified tolerance, shown in Fig. 3, a rollback to an earlier LVT occurs. Continuing upward along the arrows in Fig. 3, any virtual messages that are generated as a result of the physical process or model computation proceed to the send queue. The QS maintains copies of virtual messages to be transmitted, in order of their TSs, in order to implement rollback; it is required for the generation of anti-messages during rollback. Anti-messages annihilate the corresponding virtual messages when they meet, correcting for previously sent false messages. Annihilation is simply the removal of both the original message and its anti-message. After leaving the QS, virtual messages travel to their destination logical process.
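The receive queue, SQ, QS, rollback, and anti-message annihilation described above follow the time warp pattern [15]. The single-node Python sketch below is an illustration under stated assumptions rather than the Atropos implementation: the payload is a numeric load increment added to the LP state, state is saved to the SQ after every message, each processed message is forwarded with a hypothetical fixed delay of 1 s, and the RT field is omitted.

```python
import heapq
import itertools
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VirtualMessage:
    ts: float           # send time (TS)
    tr: float           # receive time (TR): when the message is predicted valid
    payload: float      # payload (M); here a hypothetical load increment
    anti: bool = False  # anti-toggle (A): True marks an anti-message


class LogicalProcess:
    FORWARD_DELAY = 1.0  # hypothetical fixed forwarding delay (s)

    def __init__(self) -> None:
        self.lvt = 0.0
        self.state = 0.0
        self.receive_queue = []          # heap of (TR, seq, message)
        self.state_queue = [(0.0, 0.0)]  # SQ: (LVT, state) snapshots
        self.send_queue = []             # QS: sent messages, in TS order
        self.processed = []              # messages already consumed
        self._seq = itertools.count()    # tie-breaker for equal TRs

    def receive(self, msg: VirtualMessage) -> list:
        """Accept a message; return anti-messages produced by any rollback."""
        antis = []
        if msg.anti:
            positive = replace(msg, anti=False)
            if positive in self.processed:  # false message already consumed
                antis = self.rollback(msg.tr)
            # Annihilation: remove the matching positive message if queued.
            for i, (_, _, queued) in enumerate(self.receive_queue):
                if queued == positive:
                    self.receive_queue.pop(i)
                    heapq.heapify(self.receive_queue)
                    break
            return antis
        if msg.tr < self.lvt:  # out-of-order arrival (straggler)
            antis = self.rollback(msg.tr)
        heapq.heappush(self.receive_queue, (msg.tr, next(self._seq), msg))
        return antis

    def process_next(self) -> VirtualMessage:
        """Consume the lowest-TR message, advance LVT, save state, forward."""
        tr, _, msg = heapq.heappop(self.receive_queue)
        self.lvt = tr
        self.state += msg.payload
        self.processed.append(msg)
        self.state_queue.append((self.lvt, self.state))
        out = VirtualMessage(ts=self.lvt, tr=self.lvt + self.FORWARD_DELAY,
                             payload=msg.payload)
        self.send_queue.append(out)
        return out

    def rollback(self, t: float) -> list:
        """Restore the last state saved before time t; cancel later sends."""
        while len(self.state_queue) > 1 and self.state_queue[-1][0] >= t:
            self.state_queue.pop()
        self.lvt, self.state = self.state_queue[-1]
        # Messages consumed after the restored LVT must be processed again.
        redo = [m for m in self.processed if m.tr > self.lvt]
        self.processed = [m for m in self.processed if m.tr <= self.lvt]
        for m in redo:
            heapq.heappush(self.receive_queue, (m.tr, next(self._seq), m))
        # Messages sent after the restored LVT are cancelled by anti-messages.
        antis = [replace(m, anti=True) for m in self.send_queue if m.ts > self.lvt]
        self.send_queue = [m for m in self.send_queue if m.ts <= self.lvt]
        return antis
```

In a full system, the returned anti-messages would be transmitted to the destination LP, where annihilation takes place. The further ahead the LP had run, the more entries the QS holds and the more anti-messages a single rollback produces.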


TABLE I ATROPOS PARAMETERS

Fig. 4. Experimental configuration.

An important part of the architecture for network management is the fact that the SQ is the SNMP management information base. Unlike legacy SNMP values, these values are expected to occur in the future. The current version of SNMP has no standard mechanism to indicate that a managed object is reporting its future state; in current SNMP operation, all results are reported with a timestamp that indicates the current time. Predictive active network management, however, requires managed entities to report their state information at times in the future, and these predicted times are unknown to the requester. A simple means to request and respond with future time information is to append the future time to the object identifiers of all predicted management information base objects.

The Magician [5] execution environment is used in the implementation described in this work. Atropos messages are encapsulated inside Magician SmartPackets following the active network encapsulation protocol (ANEP) [1] format. Several alternative active network architectures were considered before proceeding. At the time of our platform evaluation, Magician appeared to offer the most programming conveniences, including support for SmallState and resource control options that would ensure that active packets neither interfere with one another across users or applications nor exceed per-node processing requirements.

IV. ATROPOS PERFORMANCE RESULTS

The Atropos experimental validation configuration for the initial test in this section is a feed-forward network consisting of a host containing the driving process and four intermediate active network nodes containing logical processes, as shown in Fig. 4. AH-1 and AH-2 are host nodes and AN-1–AN-5 are active network nodes. The edges between the nodes represent links between the labeled ports on each node. All nodes are Sun SPARC machines running the Magician active network execution environment.
The Atropos system parameters were configured as shown in Table I. In this experiment, the model predicts the packet rate at each node of an audio application generating packets from AH-1. All values plotted in the following graphs were obtained by sampling the appropriate Atropos-MIB and Atropos-LOADPRED-MIB [2]. The SQ surface plot in Fig. 1 shows the predicted traffic load values cached in the SQ of AN-1 as a function of LVT and wallclock. As wallclock approaches any given LVT, the predicted load values converge toward the actual load.

General operation is illustrated in the next four graphs, where all measurements, unless otherwise indicated, are from node AN-4. Fig. 5 shows the reduction in tolerance versus time that is preprogrammed into each logical process. The vertical axis is the tolerance demanded between the predicted value and the actual value of an SNMP packet counter. This value is purposely decreased in this experiment to create a growing demand for accuracy over time and, thus, a challenging validation of the Atropos system under gradually increasing stress. In Fig. 6, the proportion of out-of-tolerance messages is shown as a function of wallclock. The vertical axis is the proportion of messages that arrived at a specific node out of tolerance, that is, for which the actual value exceeded the predicted value by more than the tolerance setting. As wallclock progresses, the tolerance is purposely reduced, making it more likely that messages exceed tolerance. Fig. 7 shows the prediction error as a function of wallclock. The vertical axis is the difference between the number of packets received and the number of packets predicted to have been received. This graph verifies that the system produces more accurate predictions as the demand for accuracy increases. However, the vertical axis of Fig. 8 shows the lookahead decreasing versus wallclock. The expected lookahead time is the difference between wallclock and the LVT at a particular node; Fig. 8 thus shows the maximum distance into the future that the Atropos system was able to predict, as a function of wallclock. The demand for greater accuracy has reduced the distance into the future that the system can predict. Finally, in Fig. 9, speedup, the ratio of virtual time to wallclock of the real system, is shown as a function of wallclock. The speedup is reduced as the demand for accuracy is increased. As previously mentioned, for the purposes of this experiment only, the tolerance is reduced as wallclock progresses, causing accuracy to increase while performance is lost in terms of speedup and lookahead. When the out-of-tolerance proportion is 0.2, lookahead and speedup have decreased due to the tightened tolerance. While it is true that speedup is only slightly greater than one, that is, Atropos is running only slightly faster than wallclock speed, it is also 150 s ahead of real time after 1000 s; this is important because it allows the management system to peer 150 s into the future to look for potential problems.

Fig. 5. Tolerance setting decreases as wallclock increases, thus demanding greater accuracy.

Fig. 6. Demand for greater accuracy causes the proportion of out-of-tolerance messages to increase.

Fig. 7. Predictions become more accurate.

Measured results are presented with no explicit application or direct fitness evaluation, and all measured values are reset to zero after each change in the tolerance setting shown in Fig. 5. For example, a load prediction error of 200 packets in 500 s, shown in Fig. 7, may be large for one application and small for another. A context-independent means of determining the predictive benefits of local short-term predictions is needed. Such a fundamental metric appears to be missing from network management in general; this question needs to be examined at a more fundamental level and in the context of a general-purpose feature available to all management variables (e.g., SNMP MIB variables), rather than having individual applications reinvent their own prediction/lookahead metrics and mechanisms. MDL principles, namely minimizing the sum of model size and model error, appear highly relevant in this case. Fig. 10 shows input model accuracy (MA), the tolerance demanded by specific application requirements (TD), model message overhead (MO), and lookahead (LA). In general, one would like LA/MO to be as large as possible for any given TD and MA. One would also like MA/TD, determined by the inputs to Atropos, to be large.
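The speedup and lookahead figures quoted above follow directly from their definitions. The short Python sketch below recomputes them for the case of being 150 s ahead after 1000 s, assuming that both LVT and wallclock are measured from the start of the experiment. The `lookahead_per_message` helper and the overhead count of 300 messages are hypothetical, included only to show how an LA/MO ratio could be tracked.

```python
def speedup(lvt: float, wallclock: float) -> float:
    """Ratio of virtual time to wallclock (both measured from t = 0)."""
    return lvt / wallclock


def lookahead(lvt: float, wallclock: float) -> float:
    """How far (in seconds) the LP's virtual time runs ahead of wallclock."""
    return lvt - wallclock


def lookahead_per_message(lookahead_s: float, overhead_msgs: int) -> float:
    """Hypothetical LA/MO figure of merit: seconds of lookahead per message."""
    return lookahead_s / overhead_msgs


wallclock = 1000.0       # seconds since the experiment started
lvt = wallclock + 150.0  # Atropos is 150 s ahead of real time
print(round(speedup(lvt, wallclock), 2))  # 1.15
print(lookahead(lvt, wallclock))          # 150.0
print(lookahead_per_message(150.0, 300))  # 0.5 (hypothetical overhead)
```

The example makes the point in the text concrete: a speedup only 15% above one still translates into a substantial absolute lookahead once wallclock has advanced far enough.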

Fig. 8. At the expense of lookahead.

Fig. 9. At the expense of speedup.

Speedup and lookahead are relative measures; in these experiments, both the actual and the predicted values used to determine speedup and lookahead come from the same implementation. A constant tolerance yielded a constant speedup and, thus, a relatively uninteresting plot; varying the tolerance allowed changes over time to be captured in a single, more informative plot. An out-of-tolerance proportion of 0.2 corresponds to a prediction error tolerance of between 250 and 31.25 packets. ANEP packets are 1000 bytes long. The expected speedup in this time period was 1.15, and the expected lookahead was 160 s. The question, then, is whether one is willing to tolerate between 500 and 250 packets of prediction error with no out-of-tolerance events, or whether one requires between 250 and 31.25 packets and accepts a 0.2 proportion of out-of-tolerance events. Even with a 0.2 proportion of out-of-tolerance packets, the actual audio output had good quality of service (QoS) and predictability when Atropos was enabled.

A. Load Prediction Model

The goal of these experiments is to examine the feasibility of a coupled network and general-purpose predictive mechanism capable of continuously projecting and correcting predictions for any enterprise management variable. There are almost certainly more efficient and accurate models for many potential traffic types, including audio and video; nothing would preclude such a model from being injected into this framework to enhance its operation. This section examines the load prediction model, which serves as a prototype Atropos model. The load prediction model injected into Atropos for this test is a simple linear regression model. In terms of Kolmogorov complexity, this is the estimated model hypothesis used to predict load data. The minimum description length principle seeks to minimize model size in combination with model error, both of which are significant overhead factors. As previously discussed, the model is encapsulated within the driving process. The driving process operates by sampling the application’s network traffic input, in this case an active audio application, generating a prediction based upon the model, and transmitting the prediction as an Atropos virtual message. The model had the prediction error shown in Fig. 10. Virtual messages transmitted in this experiment are representations of load from an application generating traffic along a single active packet flow. The Magician execution environment was running on a 100-Mb/s Ethernet LAN with minimal background traffic. The load models in the intermediate active nodes participating in the application flow forward virtual messages using simple estimates of predicted queuing delays based upon previously predicted traffic load at their respective nodes. The logical process at each node maintains the required accuracy tolerance, as explained earlier in this paper, by either continuing forward operation or rolling back. Two factors significantly affect performance: 1) the prediction tolerance setting, which controls the demand for prediction accuracy, and 2) the accuracy of the particular set of models injected into Atropos. The frequency of out-of-tolerance rollback will increase given less accurate models or tighter tolerances, and a more accurate model should allow tighter tolerances with fewer rollbacks. Thus, out-of-tolerance rollback frequency is a function of model accuracy and tolerance requirements, as shown by the tolerance setting in Fig. 5, the model error in Fig. 10, and the rollback frequency in Fig. 6.
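The driving process described above samples load, fits a model, and emits predictions as virtual messages. A minimal Python sketch of the prediction step, using ordinary least-squares linear regression over recent samples, is shown below. The function names, the once-per-second sampling, and the emitted (receive time, predicted value) pairs are illustrative assumptions rather than the Atropos driving process itself.

```python
from typing import List, Sequence, Tuple


def fit_line(times: Sequence[float],
             values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def predict_load(times: Sequence[float], values: Sequence[float],
                 horizon: float, step: float) -> List[Tuple[float, float]]:
    """Extrapolate the fitted line beyond the last sample.

    Returns (receive_time, predicted_value) pairs; each pair could become a
    virtual message whose TR is the future time it describes.
    """
    slope, intercept = fit_line(times, values)
    now = times[-1]
    predictions = []
    k = 1
    while k * step <= horizon:
        t = now + k * step
        predictions.append((t, slope * t + intercept))
        k += 1
    return predictions


# Hypothetical SNMP packet-counter samples taken once per second.
sample_times = [0.0, 1.0, 2.0, 3.0]
packet_counts = [0.0, 50.0, 100.0, 150.0]
print(predict_load(sample_times, packet_counts, horizon=2.0, step=1.0))
# [(4.0, 200.0), (5.0, 250.0)]
```

A two-parameter line is about as small as a model can be, which is the model-size side of the MDL tradeoff; its prediction error on less regular traffic is the other side.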
In [6], it is shown that a Kolmogorov complexity estimator can be used to quantify the relationship among the accuracy of the model, the size of an algorithmically compressed virtual message, and the rollback frequency.

Fig. 10. Model load error.

Fig. 11. Number of virtual messages versus wallclock.

Fig. 12. Number of anti-messages versus wallclock.

B. Atropos Overhead

The utility of Atropos to a particular user depends on its ability to predict faults weighed against the overhead required to perform the prediction. If the predicted results are within the user-specified error tolerance and the user fully uses the predicted results, then the utility of prediction is high. The question of overhead versus benefit therefore depends upon the perceived utility of predictive capability and, significantly, upon the manner and application in which it is used. When reliability and survival are at stake, prediction would likely have a high utility. If performance were the primary goal and reliability less important, then unless prediction is used to boost performance beyond the overhead it induces, prediction may be perceived to be of lower utility. With regard to performance, transmitting models efficiently, that is, as algorithmic representations of information, can be far more efficient than transmitting nonexecutable data [3]. Fig. 11 displays the number of virtual messages versus wallclock, and Fig. 12 displays the total number of anti-messages. These counts are reset every time the tolerance is tightened (every 5 min in this case). Figs. 11 and 12 indicate virtual message overhead, particularly that due to rollback. Focusing on the time period during which the proportion of rollbacks was 0.2, shown in Fig. 6, the system maintains a speedup slightly greater than one and a forward projection of approximately 2 min, but at the cost of precisely the additional messages counted in these figures. The messages counted in Fig. 11 are a superset of those in Fig. 12. The number of anti-messages is initially large because the system had successfully predicted relatively far into the future; as a result, the out-of-tolerance rollback required emptying a large QS in the form of anti-messages. The goal in this particular instance of the Atropos prototype is to project into the future the number of packets received at a node. A perfect model would have no out-of-tolerance error, while a poor model would yield higher error rates. It is the combination of model accuracy and the per-node prediction tolerance setting that determines the out-of-tolerance proportion.

V. CONCLUSION

This paper presented the Atropos architecture and the mechanism by which that architecture provides a network prediction

BUSH AND GOEL: AN ACTIVE MODEL-BASED PROTOTYPE FOR PREDICTIVE NETWORK MANAGEMENT

service utilizing the capability of active networks to inject fine-grained models into the communication network. The goal is to reduce the management burden caused by growing complexity by integrating management support into the inherent function of network operation. In this paper, management support is provided in the form of network components that, simultaneously with their network function, collaboratively project and continuously adjust projections of their future state based upon actual network state. The capability of Atropos models, injected into the network as an active application, was shown with regard to modeling load and propagating projected state information. Overhead was presented in terms of additional CPU and bandwidth. Greater demand for prediction accuracy was met at the cost of performance, that is, of the ability of Atropos to predict farther into the future. While this paper has focused on network traffic and load prediction, the goal is a general projection service for any set of management variables and supporting models.

REFERENCES

[1] Active Network Encapsulation Protocol (ANEP), D. S. Alexander, B. Braden, C. Gunter, A. Jackson, A. Keromytis, G. Minden, and D. Wetherall, Eds., Active Networks Group, 1997.
[2] Atropos-MIBs. [Online]. Available: http://www.research.ge.com/~bush sf/atropos-mibs.html
[3] S. F. Bush, “Islands of near-perfect prediction,” in Proc. Virtual Worlds Simulation Conf.’00 and 2000 Western MultiConf., 2000, pp. 1–6.
[4] S. F. Bush and S. C. Evans, “Kolmogorov complexity for information assurance,” GE Corporate Research and Development, Tech. Rep. 2001CRD148, 2001.
[5] S. F. Bush and A. B. Kulkarni, Active Networks and Active Network Management: A Proactive Management Framework. Norwell, MA: Kluwer, 2001.
[6] S. F. Bush, “Active virtual network management prediction: Complexity as a framework for prediction, optimization, and assurance,” in Proc. DARPA Active Netw. Conf. Exposition (DANCE 2002), San Francisco, CA, May 29–30, 2002, pp. 534–553.
[7] S. F. Bush, V. S. Frost, and J. B. Evans, “Network management of predictive mobile networks,” J. Netw. Syst. Manage., vol. 7, no. 2, pp. 225–246, Jun. 1999.
[8] T. De Witt, T. Gross, B. Lowekamp, N. Miller, P. Steenkiste, J. Subhlok, and D. Sutherland, “ReMoS: A resource monitoring system for network-aware applications,” School Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-CS-97-194, 1997.
[9] P. A. Dinda and D. R. O’Hallaron, “An extensible toolkit for resource prediction in distributed systems,” School Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-CS-99-138, 1999.
[10] P. A. Dinda and D. R. O’Hallaron, “Host load prediction using linear models,” Cluster Comput., vol. 3, no. 4, pp. 265–280, Jun. 2000.
[11] V. Galtier, K. L. Mills, Y. Carlinet, S. F. Bush, and A. Kulkarni, “Predicting resource demand in heterogeneous active networks,” in Proc. MILCOM 2001, McLean, VA, Oct. 28–31, 2001, pp. 1–5.
[12] V. Galtier, K. L. Mills, Y. Carlinet, S. F. Bush, and A. Kulkarni, “Predicting and controlling resource usage in a heterogeneous active network,” in Proc. 3rd Annu. Int. Workshop Active Middleware Serv., San Francisco, CA, 2001, pp. 35–44.
[13] Q. Gao, M. Li, and P. M. Vitanyi, “Applying MDL to learning best model granularity,” arXiv:physics/0005062, May 23, 2000.
[14] S. Goel and S. F. Bush, “Kolmogorov complexity estimates for detection of viruses in biologically inspired security systems: A comparison with traditional approaches,” Complexity J. (Special Issue: Resilient Adaptive Defense of Computing Networks), vol. 9, no. 2, pp. 54–74, Nov.–Dec. 2003.
[15] D. Jefferson et al., “Distributed simulation and the time warp operating system,” Comput. Sci. Dept., Univ. California, Los Angeles, CA, Tech. Rep. 870042, 1987.
[16] M. Li and P. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications. New York: Springer-Verlag, 1993.
[17] B. Lowekamp, N. Miller, R. Karrer, T. Gross, and P. Steenkiste, “Design, implementation, and evaluation of the Remos network monitoring system,” J. Grid Comput., vol. 1, pp. 75–93, 2003.
[18] M. T. Rose, The Simple Book, An Introduction to the Management of TCP/IP Based Internets. Englewood Cliffs, NJ: Prentice-Hall, 1991.
[19] M. Samadani and E. Kaltofen, “On distributed scheduling using load prediction from past information,” in Proc. 14th Annu. ACM Symp. Principles Distrib. Comput., 1996, p. 261.
[20] W. Stallings, SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, 3rd ed. Reading, MA: Addison-Wesley, 1999.
[21] P. Tinker and J. Agra, “Adaptive model prediction using time warp,” in SCS, 1990, pp. 1–13.
[22] C. S. Wallace and D. L. Dowe, “Minimum message length and Kolmogorov complexity,” Comput. J., vol. 42, no. 4, pp. 270–283, 1999.
[23] R. Wolski, “Dynamically forecasting network performance using the network weather service,” Cluster Comput., vol. 1, no. 1, pp. 119–132, 1998.
[24] R. Wolski, N. Spring, and J. Hayes, “The network weather service: A distributed resource performance forecasting service for metacomputing,” J. Future Generation Comput. Syst., vol. 15, no. 5–6, pp. 757–768, 1998.
[25] R. Wolski, N. Spring, and J. Hayes, “Predicting the CPU availability of time-shared UNIX systems,” in Proc. 8th IEEE Symp. High Perform. Distrib. Comput., 1999, pp. 105–112.

Stephen F. Bush received the B.S. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, the M.S. degree in computer science from Cleveland State University, Cleveland, OH, and the Ph.D. degree from Case Western Reserve University, Cleveland, OH. He has been the Principal Investigator for many DARPA and Lockheed Martin sponsored research projects, including Active Networking (DARPA/ITO), Information Assurance and Survivability Engineering Tools (DARPA/ISO), Fault Tolerant Networking (DARPA/ATO), and, most recently, Energy Aware Sensor Networks (DARPA/ATO Connectionless Networks Program). He coauthored a book on active network management titled Active Networks and Active Network Management: A Proactive Management Framework (Norwell, MA: Kluwer). Before joining GE Global Research, he was a Researcher at the Information and Telecommunications Technologies Center (ITTC), University of Kansas. He worked for many years in industry in the areas of computer integrated manufacturing and factory automation and control. He is an internationally recognized researcher in algorithmic communications network theory with over 30 peer-reviewed publications. He has implemented a toolkit capable of injecting predictive models into an active network; the toolkit has been downloaded and used by more than 600 institutions. He continues to explore novel concepts in complexity and algorithmic information theory and to refine the concepts and toolkit for applications ranging from network management and ad hoc networking to DNA sequence analysis for bioinformatics. Dr. Bush received the Achievement for Professional Initiative and Performance Award for his work as Technical Project Leader at GE Information Systems in the areas of network management and control while working toward his Ph.D. degree at Case Western Reserve University, and a Strobel Scholarship Award from the University of Kansas, where he completed his Ph.D. research.

Sanjay Goel received the Ph.D. degree in mechanical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1999. He is an Assistant Professor in the School of Business at the University at Albany, State University of New York (SUNY), Albany, NY. He is also the Director of Research at the Center for Information Forensics and Assurance, SUNY. Prior to joining SUNY, he worked at the General Electric Global Research Center. He has several publications in leading conferences and journals. He teaches several classes, including computer networking and security, information security risk analysis, enterprise application development, database design, and Java language programming. His current research interests include self-organized systems for modeling autonomous computer security systems using biological paradigms such as immune systems, epidemiology, and RNA interference. He also actively works on distributed service-based computing, network security, and active networks. His research includes the use of machine learning algorithms to develop self-learning adaptive optimization strategies and the use of information-theoretic approaches to classify data for applications such as portfolio analysis and information assurance.
