Design of the Netmate Network Management System Alexander Dupuy, Soumitra Sengupta, Ouri Wolfson and Yechiam Yemini* Computer Science Department, Columbia University, New York, NY 10027
Abstract The Network management, analysis, and testing environment (Netmate) project addresses research and experimental issues in distributed network management of large, heterogeneous networks. This paper describes the Netmate architecture, and its model for network management information, which emphasizes the definition of generic network objects and relationships chosen specifically for efficient network management. The model, with its powerful abstraction mechanisms, simplifies the development of Netmate visual, analytic, and auxiliary tools for monitoring, analysis, planning, and testing of complex computer networks.
1. INTRODUCTION
As computer networks become larger (with thousands of elements), more heterogeneous (supporting multiple protocols at many networking layers), and more complex (with subtle interactions and inter-relationships between protocols), network management has emerged as a crucial requirement. For the network to be the information highway for an enterprise, there must exist comprehensive management tools to ensure network integrity and smooth operation. The Network management, analysis, and testing environment (Netmate) project pursues, as its long-term goal, the development of fundamental enabling technologies and a comprehensive software environment for network and systems management. We focus on novel technologies and effective tools to support monitoring, interpretation and control of complex dynamic network behaviors. These tools are unified as parts of the Netmate environment (figure 1). Netmate includes software for agents capable of observation/control of networked devices, protocols, and systems (OCP); a Modeler that maintains a management information base (MIB) with a data model of complex networked entities, and collects and provides dynamic real-time network data to management analysis applications (Auxiliary Systems); a network simulation tool for planning and testing of protocols; and a visualization tool that provides human experts with navigation aids to simplify monitoring and control of complex network scenarios. In the Netmate architecture, we focus on experimental studies in order to develop a realistic understanding of the management issues and demonstrate the resulting technology concepts.
*Research supported by DARPA contract #F-29601-87-C-0074 and N.Y. State CAT contract #NYSSTF CAT(89)-5.
[Figure 1: Netmate Architecture. Components shown: Network Simulation, OCP (two instances), Modeler Graphical Interface, Modeler with MIB (two instances), Auxiliary System.]
The fundamental issues investigated by Netmate include:
1. Understanding network dynamic behaviors: Why do networks fail? What are the mechanisms by which faults evolve, and how can they be observed and controlled? Modeling and analysis of dynamic (vs. equilibrium) and transient behaviors of networks along different lines of approach (statistical, neural network-based, inference-based) are expected to yield analytical models that capture and analyze network fault scenarios.
2. Manager-Agent interaction protocol/architecture: How to build manageable networked systems? What language and protocol constructs should be used to specify and delegate management instructions from a manager to an agent? A decentralized management structure supporting a delegation protocol and development of generic agents are necessary for efficient manageability [2].
3. MIB organization: How can we collect/organize/present complex real-time network operational behavior and configuration data? This issue deals with the abstraction modeling of network information that is suitable for analysis and visualization. In this paper, we present the model proposed in Netmate with emphasis on the Netmate Structure for Management Information (SMI) as determined by the need for efficient network management functions.
4. Interpretation: How can we correlate/interpret observations of complex network behaviors to diagnose their causes and control their evolution? Due to inherent inconsistencies in the network data, an important requirement in analysis of network behavior is that of incremental diagnosis based on temporal and incomplete observations [10].
5. Visualization: How to provide visual navigation and control of complex scenarios? The visual model of network data is primarily concerned with providing good visual abstraction mechanisms in order to reduce the complexity and the large volume of collected information.
Section 2 presents the Netmate view on the requirements of network data modeling. These requirements determine the SMI for the network model in the Modeler component of Netmate, which is elaborated in Section 3. We discuss other components of Netmate in Section 4. The current status of the Netmate project and conclusions appear in Section 5.
2. MODELING NETWORK INFORMATION
Modeling is necessary for creating a common framework for collection, storage and retrieval of network information. The inherent purpose of these operations is to be able to analyze the information (by human experts or by automated analysis modules) in order to ensure safe and efficient operation of the real network. The current trend is towards very large and interconnected networks. This implies that the volume of information necessary to manage the network is very large. For example, network entities must be identified at all layers in order to manage network protocols; therefore, the number of such entities is several times larger than the number of physical devices. Furthermore, management needs to access the dynamic network information repeatedly, with the series of values over time available for detailed analysis. Real-life networks are also heterogeneous in all layers. This implies that there are many classes of information with different properties, requiring different treatment. For example, it is common for an enterprise to have both Ethernet and Token-Ring, with protocols such as TCP/IP and DECnet on Ethernet and TCP/IP and SNA on Token-Ring operating concurrently.
Thus, the primary function of the modeler is to be a repository of large volumes of both static and dynamic network information about a large class of network entities and protocols. Furthermore, the model must support several levels of abstraction for the collected information in order to allow efficient analysis procedures. The efficiency, size, and heterogeneity requirements dictate that the abstractions be realized as the definition of objects and relationships in the MIB. A modeler has additional responsibilities regarding coordination of the management entities themselves. For example, a user interface may query the modeler for information, and the modeler, as a result, may schedule a query to the real network. When a reply is received from the network, the modeler must correlate that reply with the original query from the user interface. The analysis modules may specify a trigger so that when new information arrives from the network about a specific device, a specific management module function is invoked. There are many crucial issues in the management of the management protocols, which, due to lack of space, are not discussed in this paper. The objectives and functions of a generic network management system are well described in the current OSI model [5]. Many proprietary management systems, through experience, have also defined the requirements for their individual systems [7, 9]. In order to support a consistent nomenclature, a common MIB is proposed as part of the standards specifications. The two standard data models are the Internet SMI [6] and the OSI SMI [4]. While these two standards provide a common structure for management data for heterogeneous networks, they do not sufficiently address the manageability needs of these networks as discussed in the following section.
3. NETMATE SMI
The central question in constructing the SMI is: What generic objects and generic relationships characterize the domain of network configurations and operations? These objects and relationships should abstract the common behaviors exhibited over a large number of network entities and protocols. Once such comprehensive objects are determined and implemented, it is then possible to construct common logical deductions that are uniformly applicable for different management functions. For example, a generic property called status may be associated with all network entities, with a set of possible values including up and down. A user interface module may then determine appropriate graphical rendering (colored green if up, red if down) based upon this property regardless of other specific properties of any object. A transition from the up state to the down state may automatically trigger an analysis module associated with the object. This property, common to all network entities, permits generic inferences to be specified and carried out.
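The status example can be illustrated with a minimal Python sketch. It is not the Netmate implementation; the names NetworkEntity, on_down, set_status and render_color are hypothetical and chosen only for illustration. It shows how one generic property can drive both rendering and an analysis trigger without reference to any class-specific property.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class NetworkEntity:
    """Any managed entity; `status` is the generic property shared by all."""
    name: str
    status: str = "up"
    # Hypothetical hook: callbacks invoked on an up -> down transition.
    on_down: List[Callable[["NetworkEntity"], None]] = field(default_factory=list)

    def set_status(self, new_status: str) -> None:
        old, self.status = self.status, new_status
        if old == "up" and new_status == "down":
            for callback in self.on_down:
                callback(self)


def render_color(entity: NetworkEntity) -> str:
    """Choose a display color from the generic status property alone."""
    return "green" if entity.status == "up" else "red"


triggered = []
bridge = NetworkEntity("C")
bridge.on_down.append(lambda e: triggered.append(e.name))
print(render_color(bridge))             # green
bridge.set_status("down")
print(render_color(bridge), triggered)  # red ['C']
```

Because render_color and the trigger depend only on status, the same code would apply unchanged to a node, a link, or a group.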
The generic object and relationship definitions should be capable of supporting a rich set of functions which are directly related to efficient network management. Determining such objects and relationships requires experimental dissemination of networking protocols and their relation to the existing, common-place management practices. It is, however, impractical to prescribe a closed set of definitions which do not allow future modifications to the SMI. In fact, extensibility of the SMI is an important requirement to model current network entities, most of which have additional specific properties. The basic idea of the Internet and ISO SMI standards is to allow a mechanism for the definition and naming of variables containing management information (essentially name-value pairs). Additional structuring is provided in the form of tables of variables which can be defined (although the Internet SMI doesn’t support nested tables). The ISO SMI has an object-oriented model, with variables specified as attributes of objects, an inheritance mechanism for defining object classes, as well as a single relationship between objects: containment.
3.1 Network Model
In discussing the Netmate SMI, we will use an example network (figure 2) to examine a number of network management scenarios and how the model would support management operations. The example shows a link layer network of an Ethernet (E) with two nodes (A and B) and a Token-Ring (T) with one node (D). E and T are connected by a bridge (C), which is also connected to a serial line (S). In the physical layer, E is divided into two segments (E’ and E’’) joined by a repeater R. In the application layer, an application client (D’) communicates with a server (B’). The Ethernet and the nodes connected to it belong to corporation G; the Token-Ring and the node connected to it belong to corporation H. Using this simple network as an example, we will elaborate a few non-trivial problems associated with network modeling, and present Netmate solutions with qualitative comparisons of the Netmate model with other standard models. Netmate uses the object-oriented paradigm [3] to represent network objects, data and relationships. The generic objects in this paradigm are class and object, which are used to represent all entities. The generic relationships are the is-a and is-a-kind-of relationships; the latter, applicable to class objects, constructs the usual class hierarchy. The class hierarchy is a proper beginning to model networks; however, by itself, it does not succinctly address all generic relationships in the network domain. In the following, we present our model with careful distinctions and examples. Netmate SMI (figure 3) currently defines four important object classes: Layer, Node, Link, and Group; and five relationships: is-in-layer, is-connected-to, is-member-of, is-part-of, and is-implemented-in-terms-of. The class Group, and the relationships is-part-of and is-member-of, have applicability in other domains such as real-time process control.
[Figure 2: Example Network. Three layers are shown (Physical, Link, Application), with elements A, B, C, D, E, T, S, R, E’, E’’, B’ and D’, and groups G and H.]
Layer. An instance of the Layer class represents a network protocol layer with clear functional distinctions and operational boundaries (e.g. the ISO Reference Model for networks). It is a generic class because almost all network protocols are implemented in layers. Netmate does not limit itself to any specific layering scheme; instead, it allows layers to be defined for any consistent set of objects that communicate using a common protocol. For example, an Internet network may have TCP, IP, and Link layers, and may coexist with an SNA network which has SNA-LLC and SNA-Session layers. The network in figure 2 is operating at three layers: application, link, and physical.
Node. An instance of the Node class, within a specific layer, represents a hardware, firmware, or software element (e.g. Ethernet interface in link layer, IP Gateway in IP layer, LU6.2 node in SNA-Session layer) which obeys the protocol rules in that layer. Nodes have the subjective semantics of being endpoints of communication. In figure 2, nodes A, B, C and D are in the link layer, R is in the physical layer, and B’ and D’ are in the application layer.
Link. An instance of the Link class, within a specific layer, represents the communication between nodes within the same layer, where the communication is governed by the protocol rules in that layer (e.g. Ethernet in link layer, Telnet session in application layer, SNA conversation in SNA-Application layer). E and T in figure 2 are examples of links. Unlike the Internet and ISO SMIs, links are given equal importance to nodes in Netmate, even when information about links is available only through nodes. This is so because the model distinguishes the different properties between nodes and links, and makes the distinction explicit through class definitions. For example, in Token-Ring Source Routing protocol, max_packet_size is a property of a connection (i.e., the link), and not of any specific node on that connection.
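The Layer, Node and Link definitions can be pictured with a small Python sketch of the example network. This is an illustration under assumptions rather than the Netmate class definitions: the dataclass layout is hypothetical, and the max_packet_size values are made-up sample numbers. The point it shows is that max_packet_size lives on the Link object, not on any Node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    name: str


@dataclass
class Node:
    name: str
    layer: Layer


@dataclass
class Link:
    name: str
    layer: Layer
    # A property of the connection itself, not of any node attached to it.
    max_packet_size: Optional[int] = None


physical = Layer("physical")
link_layer = Layer("link")
application = Layer("application")

# Elements of the example network, placed in the layers named in the text.
nodes = [Node(n, link_layer) for n in "ABCD"]
nodes += [Node("R", physical), Node("B'", application), Node("D'", application)]

# Sample packet sizes are illustrative assumptions only.
links = [
    Link("E", link_layer, max_packet_size=1500),
    Link("T", link_layer, max_packet_size=4472),
]

link_layer_nodes = [n.name for n in nodes if n.layer == link_layer]
print(link_layer_nodes)  # ['A', 'B', 'C', 'D']
```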
[Figure 3: Network Model Class Hierarchy. Classes and properties shown: Network Object (Name); Layer (Elements); Element (Layer, Groups, Mappings); Simple (Components); Group (Elements); Node (Connections); Token Ring Interface (Adapter Addr, Nearest Upstr); Link (Connections); TCP Virt. Circuit (Conn State, Loc Port, Rem Port).]
Group. An instance of the Group class, within a specific layer, is a collection of nodes and links (and other groups) which exhibit common operational semantics (e.g. Ethernet objects in NY office, Objects on regional T-1 network, All NFS servers). In figure 2, the corporations G and H form examples of groups. Groups serve many purposes: they may be used for various administrative purposes such as common maintenance contracts, for constructing a set of objects to investigate during a fault analysis period, for constructing a building-by-building network picture on the user interface, etc.
In Internet and ISO SMIs, all network entities are simply objects. The Netmate SMI enhances the classification hierarchy as described above, and then by similarly enhancing the relationships, permits efficient management analysis, as described next.
is-in-layer. A node, a link, or a group may belong to a single layer, which is represented by this relation. Using this relation, a common layer-specific behavior may be prescribed for the nodes and links, and some other behaviors disallowed. For example, it is a priori possible to assert that the link layer node A in figure 2 may not communicate with the physical layer node R using the
Ethernet (link layer) protocols. Such assertions may be assumed by analysis modules, for which the is-in-layer relation records which nodes, links, and groups belong to which layer. This relationship is realized in Netmate by the Layer property in the Element class (which is inherited by Group, Node, and Link), and its inverse, the Elements property in the Layer class (figure 3). is-connected-to. A node connects to a link in the same layer. This is a very basic relationship in a network. A node (link) may connect to more than one link (node). The Connections property of the Node and Link classes represents this relation. The Internet and ISO SMIs do not support direct objects for links, and they do not keep an explicit representation of the is-connected-to relationship in their model. Consider a network problem (figure 2) where B’ is unable to communicate with D’. An analysis process may attempt to construct all possible connection paths from B to D, and test components in each path to determine the problem. The program may discover that these nodes are on different types of links, and then attempt to find a bridge (among all bridges in the network) that may have both nodes in its forwarding table property. Using only the information from the nodes, it will have to algorithmically deduce which Ethernet and Token-Ring segments are part of the connection between B and D to construct the connection paths. This process becomes more complex if the nodes are separated by more than one bridge. Also note that in the absence of the Connections property, the analysis program has to understand individual nodes’ vendor- and protocol-specific properties to construct the paths. The process is simplified in Netmate SMI. The Connections property in the model is maintained as nodes get connected to the links in the network. Thus, the relation, at the beginning of the analysis, has the information that E is connected to A, B and C, and T to C and D.
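The Connections relation stated above (E connected to A, B and C; T connected to C and D) can be treated as a graph whose vertices are both nodes and links. The following Python sketch is one possible illustration, not the Netmate algorithm: the dictionary layout and the function names build_graph and find_path are assumptions, and breadth-first search is simply one convenient path-finding method.

```python
from collections import deque

# Connections property from the example: each link lists the nodes on it.
connections = {"E": ["A", "B", "C"], "T": ["C", "D"]}


def build_graph(connections):
    """Build an undirected graph in which nodes and links are both vertices."""
    graph = {}
    for link, nodes in connections.items():
        for node in nodes:
            graph.setdefault(link, set()).add(node)
            graph.setdefault(node, set()).add(link)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the first shortest path found, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = find_path(build_graph(connections), "B", "D")
print("-".join(path))  # B-E-C-T-D
```

The search reads only the Connections data; no vendor- or protocol-specific property of any node is consulted.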
It is then fairly easy to use a graph-topological algorithm to construct the connection path B-E-C-T-D before testing any individual component in the network. Also note that with this information, the path searching algorithm need only look at the Connections property in the MIB, and not any protocol-specific properties of the nodes. The same information may be used to create a display of the network as shown in figure 2. is-member-of. This relation represents the collection of groups, nodes and links into a group, and is realized by the Groups property of the Element class and the Elements property of the Group class. An element may belong to many groups. Semantically, deletion of a group does not always imply deletion of its members, but only the deletion of the membership relationship, implying the basic independence of element and group objects. Consider the case when many nodes in corporation H are using the application service at B’, and C is inoperative. When invoked as a result of a D to B communication failure, if the analysis process determines the cause, it need not perform the analysis again for any other node in H. This correlation is achieved by grouping all nodes in H under a common criterion: the nodes on the Token-Ring. In the absence of groups, as in the standard models, the analysis will have to be performed for each such D-B pair. is-part-of. This relation represents the collection of (sub)nodes into a node, or of (sub)links into a link, or of (sub)layers into a layer, and is realized by the Components property of the corresponding classes (figure 3). In contrast with groups, the existence of subnodes is entirely conditional upon the existence of the containing node, i.e., subnodes may not exist independent of the containing node. Furthermore, the values of the properties of the containing node are usually aggregations over the values of the properties of its subnodes. Consider in the example that the communication problem occurs at C due to unavailability of memory buffers for the bridging of E and T. Having detected the problem at C, the analysis process needs to examine the traffic behavior on all individual interfaces in C, including the one to S, since the sum-total use of buffers at C depends on all three interfaces. Using the Components property, it derives that C has three subnode components, each with an independent set of properties contributing to the sum-total behavior at C. Also note that if the interface to S fails, C may continue to bridge between E and T, but if C fails, none of the interfaces may function. The ISO SMI supports this relationship. is-implemented-in-terms-of. This relation represents the well-understood notion that elements in one layer use the services of elements in other layers, and therefore, are functionally dependent on the well-being of elements in the other layers. The relation is realized by the Mappings property of elements (figure 3). Assume that in the example network, E is found to be faulty. The analysis process, upon querying A, determines that there exists a path from A to D. The problem thus is in the segment that connects B, or at B itself.
To investigate further, the process uses the Mappings property of E to identify the physical layer elements (E’’ and R) which must be operational for E to function properly. The process then checks the hubs to detect cable faults and R. An important observation is that this investigation of the physical layer is not warranted until the fault has been isolated to the specific link layer object. In general, the fault diagnosis process may function within a single layer, identifying fault domains using a layer-specific tool, then use the mappings to investigate other layers using perhaps a totally separate set of tools. Owing to the absence of such a functional-dependency relation in the Internet and ISO SMIs, the process must construct its own dependency information for each individual problem. In addition to the support of generic object classes and relationships, Netmate SMI is extensible. It allows definition of specific properties for specific layers, nodes, links, and groups, taking advantage of the is-a relationship extensions to defined object classes. Indeed, without such definitions, no specific information may be stored about individual network entities. Figure 3 shows specific classes such as Token Ring Interface, which has specific properties such as Adapter Address and Nearest Upstream Adapter Address; the subnode of C in figure 2 connecting to T is an instance of this class. A significant number of network queries (model accesses) are likely to access the fundamental relations mentioned above. Thus, if the model fails to provide a sufficiently rich set of these fundamental relations, applications will incur the significant and unnecessary cost of constructing these relations themselves. The Internet and ISO SMI models lack such relationships, which are essential for efficient, automated network problem analysis.
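The is-part-of and is-implemented-in-terms-of relations described earlier in this section can be sketched in Python as plain lookups. This is an illustration under assumptions, not Netmate code: the subnode names (C-if-E, C-if-T, C-if-S), the buffer counts, and the function names aggregate and lower_layer_suspects are all hypothetical. The mapping of E onto E', R and E'' follows the example network, in which E is divided into two segments joined by a repeater.

```python
# Components (is-part-of): bridge C has three interface subnodes.
components = {"C": ["C-if-E", "C-if-T", "C-if-S"]}

# Buffers in use per interface (made-up sample values).
buffers_in_use = {"C-if-E": 40, "C-if-T": 25, "C-if-S": 10}

# Mappings (is-implemented-in-terms-of): link E rests on physical elements.
mappings = {"E": ["E'", "R", "E''"]}


def aggregate(node, values):
    """Sum a per-subnode property over all components of the containing node."""
    return sum(values[sub] for sub in components.get(node, []))


def lower_layer_suspects(element):
    """Elements in other layers that must work for `element` to function."""
    return list(mappings.get(element, []))


print(aggregate("C", buffers_in_use))  # 75
print(lower_layer_suspects("E"))       # ["E'", 'R', "E''"]
```

An analysis module following this pattern would stay within the link layer until a fault is isolated to E, and only then consult the mapping to choose physical layer elements to test.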
4. MODEL DRIVEN TOOLS
Each client of the Modeler services is considered a tool, capable of existing independent of the modeler. This is a design consideration to develop a modular management system. Nevertheless, since the management information ultimately resides in the Modeler, the tools are driven by the need to access and modify that information.
User Interface. The function of the User Interface is to construct a visual model of the network information and provide graphical techniques for navigating the model. It mirrors the abstractions realized in the modeler and converts them into graphical forms. The network layers are depicted as separate views (implemented as windows), and the mapping relation is used to navigate across views. The groups are represented as rectangular boxes containing the elements in the group, and may be iconized to reduce clutter on the screen. The connection relation is depicted as lines between node and link icons. A screen display of an SNA and a Token-Ring network at Columbia-Presbyterian Medical Center generated by the Netmate User Interface appears in figure 4.
[Figure 4: Example of User Interface]
Observation and Control Point. OCPs, or proxy agents, are the primary source of network information for the Modeler. As such, their design is critical for issues of heterogeneity and efficiency. One of their most important roles is to perform data reduction on the information which will be stored in the Modeler database. For example, status information which is basically static does not need to be updated in the database unless its value changes; additionally, for variable information, only changes beyond some threshold may need to be reported. Another useful transformation which can be done is to convert polling of devices into alerts, and vice-versa. The OCP-Modeler communication is explained in [2] in detail.
Simulation Tool. In order to support network planning and testing functions, Netmate supports a network simulation tool. An obvious use of a simulator in a network management system is to provide the capability for ‘‘what-if’’ scenarios; for example: take a snapshot of the current network state from the Modeler, load it into the simulator, alter the network configuration, and observe the effects in the simulated network. From a network protocol research perspective, integration of the simulation with a management system is also useful; real network information such as packet counts and rates, traffic levels and loads becomes readily available.
Auxiliary Systems. A complete management system should support network-related auxiliary functions, and Netmate maintains information which is useful to many other systems. The network configuration database can easily be used for Inventory Control, identifying specific hardware and software systems in the network. Similarly, if vendor and price information is added, such a comprehensive system may be used by the Purchasing department. The Netmate design also includes a Trouble Ticketing system, which is used for keeping track of reported network problems.
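The data reduction role described for the Observation and Control Point earlier in this section can be sketched in Python as a change filter. This is only one possible reading of that description, not the OCP implementation: the class name Reducer, the method should_report, the key naming scheme and the threshold value are hypothetical. Non-numeric readings such as status are forwarded only when they change; numeric readings are forwarded only when they differ from the last reported value by more than a threshold.

```python
class Reducer:
    """Forward a reading to the Modeler only when it carries new information."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.last_reported = {}

    def should_report(self, key: str, value) -> bool:
        if key not in self.last_reported:
            changed = True  # first observation is always reported
        else:
            previous = self.last_reported[key]
            numeric = isinstance(value, (int, float)) and isinstance(
                previous, (int, float)
            )
            if numeric:
                changed = abs(value - previous) > self.threshold
            else:
                changed = value != previous
        if changed:
            self.last_reported[key] = value
        return changed


reducer = Reducer(threshold=5.0)
readings = [
    ("C.status", "up"),
    ("C.status", "up"),    # unchanged status: suppressed
    ("C.load", 50),
    ("C.load", 53),        # within threshold: suppressed
    ("C.load", 58),        # 8 above last reported value: forwarded
    ("C.status", "down"),
]
reported = [(k, v) for k, v in readings if reducer.should_report(k, v)]
print(reported)
# [('C.status', 'up'), ('C.load', 50), ('C.load', 58), ('C.status', 'down')]
```

Comparing against the last reported value, rather than the last observed one, keeps slow drifts from going unreported indefinitely; this is a design choice of the sketch, not something the text specifies.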
5. CONCLUSIONS
Currently, a prototype Netmate system has been developed at Columbia, with a functional Modeler, User Interface, and OCP. The Modeler is implemented using an object-oriented database; the User Interface is an X window system application written in C++. One OCP has been developed, with a generic SNMP-based interface. A Modeler with object-oriented structures based on a relational database and a new OCP, written in C++, which will provide the data transformation and reduction capabilities mentioned above, are under development. The Nest network simulation tool developed at Columbia [1] will be used as the basis for the simulation component. Netmate addresses the management needs of network administrators by supporting a comprehensive set of distributed tools. The object-oriented Netmate data model is well suited to accommodate current as well as future heterogeneous networks and protocols. The model features generic object classes and relationships that are crucial for efficient management of large and complex networks. The modularity of the Netmate system lends itself to an elegant and efficient network management solution.
6. REFERENCES
1. A. Dupuy, J. Schwartz, Y. Yemini, D. Bacon. "NEST: a Network Simulation and Prototyping Testbed". Comm. ACM 33, 10 (October 1990), 63-74.
2. Y. Yemini, G. Goldszmidt, S. Yemini. "How to Build Manageable Systems: The Manager-Agent Delegation (MAD) Model". Proc. of the IFIP TC6/WG 6.6 Second Inter. Symp. on Integrated Network Management, Washington, DC, April 1991.
3. R. Gupta and E. Horowitz (Ed.). Object Oriented Databases with Applications to CASE, Networks, and VLSI CAD. Prentice-Hall, Englewood Cliffs, NJ, 1991.
4. International Organization for Standards. Information Processing Systems - Open Systems Interconnection - Structure of Management Information. International Organization for Standards, 1990.
5. S. M. Klerer. "The OSI management Architecture: an Overview". IEEE Network 2, 2 (March 1988), 20-29.
6. M.T. Rose and K. McCloghrie. Structure and identification of management information for TCP/IP-based internets. Network Information Center, SRI International, Menlo Park, CA, May 1990.
7. D.B. Rose and J.E. Munn. "SNA Network Management Directions". IBM Syst. J. 27, 1 (1988), 3-14.
8. S. Sengupta, A. Dupuy, J. Schwartz, Y. Yemini. "An Object-Oriented Model for Network Management". In Object Oriented Databases with Applications to CASE, Networks, and VLSI CAD, Prentice-Hall, Englewood Cliffs, NJ, 1991.
9. M. Sylor. "Managing Phase V DECnet Networks: the Entity Model". IEEE Network 2, 2 (March 1988), 30-36.
10. O. Wolfson, S. Sengupta, Y. Yemini. "Active Databases For Communication Network Management". Submitted to SIGMOD 1991.