Multi-dimensional Hashing for Fast Network Information Processing in SDN Min Luo*, Xiaorong Wu+, Yulong Zeng+, Jianfei Li+, Ke Lin+, ManBo+, Wu Chou* *Shannon Lab, Huawei Technologies, Inc., Santa Clara, USA + Shannon Lab, Huawei Technologies, Inc., Shenzhen, China Corresponding Author Email:
[email protected]
Abstract—To realize the potential benefits of SDN, a tremendous amount of network information has to be collected, stored, processed and retrieved in near real-time for central routing and resource allocation optimization. However, current SDN-OpenFlow controllers and management applications do not address this issue properly and cannot easily and efficiently retrieve and utilize the available analytics on time. Their “network information bases” are rudimentary and cannot provide the fast processing of, and access to, the intelligent information required for large networks. We present an innovative mechanism that architects such information into a multi-dimensional model, utilizing prevailing technologies in large-scale real-time data processing, the advances in in-memory SQL and NoSQL, and Online Analytical Processing (OLAP). Data and their hidden characteristics and inter-relationships can then be stored in memory or cached using key/values as in NoSQL, while more static information can be persisted into a local file system or SQL database, and globally in a centralized SQL database (or a data warehouse) that can store large amounts of historical data that can be easily processed with OLAP or data mining for intelligence. Extensive experiments and comparisons with a traditional relational database and an in-memory hash table clearly demonstrate the feasibility, capability and performance advantage of the proposed approach.
Keywords: Software defined networking, in-memory computing, Online Analytical Processing, Multi-dimensional model, Distributed Hashing, Network Analytics, Routing, and Resource Allocation.
I. INTRODUCTION
By decoupling the control and forwarding planes, software defined networking (SDN [1]) makes it possible to centrally and intelligently control a large-scale network, resulting in significantly improved resource utilization and reliability and reduced management complexity. However, such centralized control and management capability requires new algorithms and supporting information that can make full use of the available global network topology and state and, more importantly, of application, traffic or user behavior patterns related to type, locality, time, etc. Tremendous amounts of network information have to be collected, stored and efficiently processed in order to extract all the necessary and available, but sometimes hidden, characteristics or features that can guide routing and resource scheduling decisions. Except for the topological information, most of the required information is dynamic and can change frequently over time. There is a lack of research on how to architect and organize such information so that current SDN-OpenFlow controllers or management applications can easily and efficiently obtain the necessary analytics and intelligence. Almost all current designs and implementations of the “network information base (NIB)” are rudimentary and thus cannot facilitate fast processing and intelligence acquisition as required, a critical issue for large-scale networks.
The Adaptive and Dynamic Multi-path Computation Framework (ADMPCF) [2] is a recently published approach that can improve resource utilization and mitigate congestion with better balanced load while observing all practical constraints and guaranteeing QoS. However, such a framework needs efficient and effective access to the information and network analytics required for optimization, not only for prototypical studies but also for large-scale networks (500-1000 or more nodes) in near real-time, and it should be able to respond rapidly to multiple concurrent requests.
Though centralized multi-path routing has received a lot of attention and numerous studies have been reported [3, 4, 5], these approaches are still primitive and lack the solution scalability and the proven applicability and quality needed for the real world. Most of the algorithms proposed so far only consider traditional traffic flows as defined by the well-known 5-tuple (source and destination IP addresses, source and destination ports, and protocol), constrained only by the available bandwidth. Some pre-calculate the optimal paths, or provision a few typical sets of optimal paths for major traffic pattern shifts or changes. The rudimentary data structures used in those algorithms can also consume large amounts of CPU and memory resources, making them unsuitable for large-scale networks. As large-scale networks are complex, optimization even over some simplified scenarios has been proven to be NP-hard and would lead to poor performance [6, 7, 8, 9]. Facing this dilemma, some researchers proposed a path cache and search method [10], but it still used only the basic information mentioned above.
In addition, they all used either simple in-memory data models or relational databases for NIBs to store and retrieve necessary but basic information. To support near real-time routing and resource scheduling for a large-scale network, traditional relational databases [12] (such as Oracle, DB2, MySQL, etc.) would not work well, since their normalized data models make it inefficient to acquire data from multiple tables and then join the resulting data sets. Nor can they support the multi-dimensional (client, application, locality, mobility, time, network state, etc.) data analytics or mining needed for truly intelligent and adaptive decision making. Preparing and processing such multi-dimensional data inevitably requires a large number of data tables, with expensive indexing and joins performed on hefty datasets. Multi-dimensional data, if stored in the 2-dimensional relational model, will not only cause a massive increase in storage space but also lead to significant degradation of system performance [13]. Many existing controllers use a key/value storage structure directly, similar to NoSQL [14], but it is mostly suitable for storing and retrieving unstructured data, is good only for simple data manipulation, and does not support the required multi-dimensional data association, storage and searches [15].
In this paper, we present an innovative mechanism that organizes this information in a layered architecture, utilizing prevailing technologies in large-scale real-time data processing as well as developments in SQL, NoSQL and Distributed Hash Tables (DHT). With such an architecture, data needed to support real-time processing is stored in memory or cached using key/values as in NoSQL, while more static information can be persisted into a local file system or a SQL database, and globally in a centralized SQL database (or a data warehouse) that can store large amounts of historical data with related patterns and other analytics. To facilitate fast processing, we also use multi-dimensional models with the required association properties to bind many aspects of that information together. Following online analytical processing (OLAP), we established a set of dimensions (such as tenant, application, QoS, priority, etc.) hierarchically, then created “route facts” and measures (the so-called “scores”) that effectively tie those key/value pairs together. This approach enables typical dimensional analysis, such as feature discovery and slicing and dicing, and with it fast processing and retrieval: sorting, searching, slicing and dicing based on specified criteria or rules all become easy.
Globally unique identifiers are assigned to the network information with a hash algorithm, and are used to store and retrieve the dimensional information with all its associations in the hash table. In this way, we can correlate network information, especially various performance indicators, with many factors through set operations such as union, intersection and difference, and use such information from different dimensions and multiple perspectives. The approach incorporates a flexible management mechanism for dimensions, and enables adding or deleting dimensions, or recomposing them, dynamically without modifying the structure or the contents of the original data. It makes data modeling easier and greatly reduces the time for complex processing.
The rest of the paper is organized as follows. Section 2 briefly reviews how existing controllers design and implement the NIB. Section 3 covers the adaptive and dynamic multi-path computation framework (ADMPCF) [2]. Section 4 discusses the main concepts and models of the proposed multi-dimensional distributed hash table (MDDHT) approach. Section 5 presents experiments and comparisons against basic DHT and traditional relational database approaches. Finally, Section 6 summarizes our findings and gives some directions for future improvement and research.
II. REVIEW OF RELATED TECHNOLOGIES
In this section, we review several related technologies, including controllers and their data models, and online analytical processing.
A. Review of Existing Controllers and their Data Models
SDN-OpenFlow controllers, such as NOX [16], Floodlight [17], Maestro [18],[19], OpenDaylight [20] and ONIX [21], all use different mechanisms to manage the network information base and provide ways to access network information such as topology and link or port statistics. Though these controllers have their own merits and can be used in many experimental academic or industrial environments, they all have serious limitations in storing, processing and querying different network information quickly. In general, they are not well equipped to exploit the hidden patterns and characteristics in the data that would enable adaptive optimization of routing and resource allocation, especially for large-scale networks.
NOX [16] does not store the network state in a single database. Instead, a few components, such as Discovery, Topology, and Authenticator, are responsible for creating the network view and making it available to other components. APIs are provided for the interaction between those components and the network view. Keeping the network view consistent and available to all controller instances requires expensive write operations. To achieve the desired performance, NOX only stores slowly changing and static information, mainly the switch-level topology and the locations of users, hosts, middle-boxes, and other network elements and services. It does not include the current state of the network, for example the congestion state, which can change frequently due to fluctuations in traffic flows. Even when this dynamic information is obtained by extending NOX, there is still no mechanism to quickly retrieve data and exploit the valuable hidden patterns in network state or traffic flows.
ONIX [21] is a distributed network control platform. A copy of the network state tracked by ONIX is stored in the NIB as an expanded graph of all network entities. It offers two data stores: a replicated transactional database and a memory-based one-hop DHT that is used for volatile state that is more tolerant of inconsistencies. However, at its most generic level, the NIB holds a collection of network entities, each of which holds a set of key-value pairs and is identified by a flat, 128-bit, global identifier. These network entities are the base structure from which all types are derived. Apart from soft-state triggers, the NIB only provides get/put APIs. These simple APIs are only applicable to simple data manipulation, which makes it difficult to process complex relationship information.
Floodlight [17] does not use any database to store the network information either. It stores information in the TreeMap data structure (in Java) and provides some APIs to interact programmatically with the data in the TreeMap. It can therefore provide a key/value based information storage and retrieval mechanism similar to NoSQL. However, these APIs are only applicable to simple data manipulation on a single table, such as insert, delete, update or query, and cannot process or store relationship or association information among various tables. Operations that need such associations cost much more. For example, to query all port information of inactive switches, one needs to first get the IDs of all inactive switches from the SWITCH table, and then query the PORT table with those IDs to obtain the resulting port information. Thus, it is difficult to specify and process the complex relationships between paths and various associated factors effectively and efficiently using the TreeMap data structure.
Maestro [19] provides the view abstraction for grouping related network state into a subset, and for accessing the state in that subset. Each view is a Java class that can be dynamically defined and created by programmers. A view allows the use of an arbitrary data structure to represent a particular subset of network state. Every view instance consists of a serial number, a table structure holding the complete data of the view instance, and an optional list of updates that have been applied to it. Maestro also provides a “delta-view”: if an application remembers the serial number N of the view instance that it used last time, Maestro can generate the delta information between N and the current serial number M using the list of updates. However, Maestro does not keep update entries forever and removes old ones when necessary [19]. If the serial number N stored by an application is so old that the related update entries have been removed, the application has to use the complete view for its computations. Maestro does not provide a fine-grained attribute-value lookup and insertion interface. In multi-path routing, various features and indicators need to be dynamically combined. As these features change continually, they may cause numerous entry updates, resulting in unacceptable overhead.
OpenDaylight [20], as the most well publicized open source controller, stores all network information such as topology, switch, port, and basic statistics in a clustered key-value cache structure Map. It also provides clusterContainerService to manage those caches. Currently it uses an open source Infinispan for the NIB. Infinispan is a data grid platform with basic node level synchronization. Its basic structure is inherited from java.util.Map with such simple operations as addition, deletion and search. Using the simple MAP for data storage, OpenDaylight supports basic storage and retrieval of basic routing information, but it could not support more complex data manipulation. In addition, it could only consider bandwidth and hop counts to find shortest paths, and could not support the fast multidimensional data association or analytics required for more sophisticated multi-path routing utilizing other critical factors such as tenant, application, locality, time, QoS, and dynamic network state updates, etc. B. Online Analytical Processing (OLAP) OLAP has become the foundation for intelligent solutions including business performance management, planning, budgeting, forecasting, reporting, and knowledge discovery.
Most business activities are driven by decisions based on multiple variables. To analyze and report on the health of a business and plan future improvements, many variables or parameters must be grouped and tracked on a continuous basis, and these variables or parameters are the diverse “dimensions” in OLAP. For example, sales in units or dollars may be tracked over time (by year/quarter/month, etc.), over location (by country/region/city, etc.), and over products (by category/model, etc.). Sales “measures” might logically be aggregated and displayed along those dimensions or their combinations. In data networks, selecting a path from a source to a destination while satisfying a number of resource and QoS constraints can be considered such a business activity, where we use metrics such as “end-to-end delay”, “resource utilization” and “congestion” to measure the effectiveness and efficiency of routing and resource allocation algorithms under ever-changing operating environments.
Unlike relational databases, OLAP does not store individual transaction records in a two-dimensional, row-by-column format. It uses “cubes”, a multi-dimensional database structure, to store arrays of consolidated information. The data and formulas are stored in an optimized, pre-indexed multi-dimensional structure, and different views of the data and the calculated “measures” can then be created and presented on demand. OLAP can perform multi-dimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling. For example, it can be used to discern hidden patterns or relationships between data items by applying analytical operations (such as roll-up, drill-down, slicing and dicing) to a multi-dimensional cube. It enables business users and systems to gain better insight and understanding, which eventually leads to better decision making.
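To make the roll-up and slicing operations concrete, the following is a minimal sketch in Python of a toy cube with three dimensions (year, region, product) and one measure (units sold). The data, the dimension names and the helper functions `roll_up` and `slice_cube` are illustrative assumptions and are not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product) -> units sold.
SALES = {
    (2013, "EU", "router"): 10,
    (2013, "EU", "switch"): 5,
    (2013, "US", "router"): 7,
    (2014, "EU", "router"): 4,
    (2014, "US", "switch"): 9,
}
DIMENSIONS = ("year", "region", "product")


def roll_up(facts, keep):
    """Aggregate the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, units in facts.items():
        totals[tuple(coords[i] for i in idx)] += units
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts whose coordinate on `dimension` equals `value`."""
    i = DIMENSIONS.index(dimension)
    return {c: u for c, u in facts.items() if c[i] == value}


# Roll up to the year level, then slice on region "EU" and roll up by product.
by_year = roll_up(SALES, ["year"])
eu_by_product = roll_up(slice_cube(SALES, "region", "EU"), ["product"])
```

In the networking setting discussed in this paper, the coordinates would be factors such as tenant or application, and the measure would be a path-level metric rather than sales units.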
In SDN, the centralized management applications, with the support of the controller, need to make proper routing and resource allocation decisions subject to various constraints (e.g. delay, bandwidth) and real-time network states (e.g. link utilization, available nodes, link capacity, bandwidth). Mathematically, such problems have been proven to be NP-hard [22]. An adaptive and dynamic multi-path computation framework (ADMPCF) was proposed in [2]. In this paper, by taking advantage of OLAP, we propose a multi-dimensional distributed hash table model (MDDHT) that provides the fast data processing, querying and intelligence gathering needed by ADMPCF. Even though MDDHT was designed to support distributed control, this paper only discusses its use in a single server or a cluster of servers acting as a centralized controller [23, 24].
III. OVERVIEW OF THE ADAPTIVE MULTI-PATH COMPUTATION FRAMEWORK
The Adaptive Dynamic Multi-Path Computation Framework (ADMPCF) provides an integrated resource control and management platform, with an adequate set of integrated applications, for better routing and resource allocation in centrally controlled or loosely coupled distributed software defined networks, especially large ones [2]. As depicted in Figure 1, it was designed as an open and easily extensible solution framework that provides the necessary infrastructure and a set of complementary algorithms for data collection, analysis, and optimization, enabling adaptive solutions when the network topology, states and, more critically, application traffic change rapidly.
Figure 1. Adaptive dynamic multi-path computation framework
One of the foundational components in ADMPCF is SNOS_DW. It stores historical tenant, application, flow, or packet level information, and the various network-wide characteristics that can be extracted from it. In addition, the PathDB cube keeps a historical perspective on the various paths used or available under changing operating conditions. A small portion of the PathDB is synchronized with the “real-time” path DB in the Controller component. In the initialization phase, the Path & Resource Optimization component (PRO) pre-computes multiple paths and allocates resources for the projected traffic, comprehensively utilizing the traffic characteristics, network topology, resources (bandwidths), and states. Those paths, together with their dimensional information, are all saved into the MDDHT cube. To make ADMPCF adaptive to various changes in the network, some of the algorithms run in parallel in order to continuously find better paths, reassess the consequences when certain resources get assigned or reassigned, trigger necessary re-optimization based on the global network information, and then refresh the results back into the real-time path cube. The key concept behind ADMPCF, however, is to find good paths that satisfy valid constraints from the PathDB without invoking the time-consuming optimization algorithms. This is especially critical for managing large networks, with thousands of nodes for carrier networks or tens of thousands of nodes for large enterprise data centers.
In general, when ADMPCF receives a new flow forwarding request, it first identifies all relevant information, then queries the MDDHT cube for existing “good” route(s) that satisfy all the constraints. Only when it fails to find a proper one, or when the overall system performance metric starts to deviate from the projection by a certain threshold, does it trigger the re-optimization process. The Controller converts the identified “good” path into flow table entries for all the nodes in the selected route.
IV. THE MDDHT APPROACH
We propose an in-memory multi-dimensional distributed hash table (MDDHT) mechanism for storing, processing and querying flows, paths and their characteristics associated with various decision factors such as tenants, applications, time, locality, mobility, QoS requirements, priority, network topology and status (herein referred to collectively as “network information”). All those factors can play a significant role in better routing and resource allocation decisions. Their time-varying and mobility characteristics can help the system adaptively adjust such decisions to optimize resource utilization, and even help reduce unnecessary usage of some routes and devices for green computing or networking. Following the OLAP principles, we organize all the information in a hierarchical and dimensional structure. We create “route facts” with characteristics across those factors, and can then aggregate or drill down for detailed analysis in order to reveal hidden patterns in some of those factors and their combinations, which can be further leveraged in the routing and resource scheduling optimization algorithms. In this way, the MDDHT framework is very flexible and enables the effective use of network information from different dimensions and multiple perspectives. It can also provide very fast responses when retrieving such associated information, switch among various analytical perspectives dynamically, and analyze over multiple perspectives comprehensively. Once an MDDHT cube is established, ADMPCF can get “good” (or even optimal) forwarding routes satisfying different constraints just by matching various characteristics stored as dimensions in the cube, most of the time without the need to invoke the complex and time-consuming algorithms. In this way, the overall response time is decreased substantially.
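The lookup-first behaviour described above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the class `PathCache`, its method names and the placeholder optimizer are hypothetical, and an exact-match dictionary stands in for the dimensional matching performed by the MDDHT cube.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Constraints = FrozenSet[Tuple[str, str]]  # e.g. {("tenant", "A"), ("application", "video")}
Path = List[str]                          # ordered list of link names


class PathCache:
    """Toy stand-in for the path cube: constraints -> pre-computed path."""

    def __init__(self) -> None:
        self._paths: Dict[Constraints, Path] = {}
        self.hits = 0
        self.misses = 0

    def put(self, constraints: Constraints, path: Path) -> None:
        self._paths[constraints] = path

    def route(
        self,
        constraints: Constraints,
        optimize: Callable[[Constraints], Optional[Path]],
    ) -> Optional[Path]:
        """Return a stored path if one matches; otherwise fall back to `optimize`."""
        path = self._paths.get(constraints)
        if path is not None:
            self.hits += 1
            return path
        self.misses += 1
        path = optimize(constraints)      # the expensive multi-path computation
        if path is not None:
            self._paths[constraints] = path
        return path


def slow_optimizer(constraints: Constraints) -> Optional[Path]:
    # Placeholder for a real multi-path algorithm (assumption for illustration).
    return ["linkA", "linkB"]


cache = PathCache()
request = frozenset({("tenant", "A"), ("application", "video")})
first = cache.route(request, slow_optimizer)   # miss: the optimizer is invoked
second = cache.route(request, slow_optimizer)  # hit: served from the cache
```

The point of the sketch is only the control flow: the expensive computation runs on a miss, and its result is stored so that later requests with the same characteristics are answered by lookup.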
With the effective use of available multi-paths, the proposed ADMPCF with MDDHT inherently supports network load balancing, and achieves higher resource utilization, QoS guarantee with much improved manageability of the expensive network resources and better user experience. A. In-Memory Multi-dimensional DHT Information Model As illustrated in Figure 1, there are two major components in such a multi-dimensional model, namely facts and dimensions, where in MDDHT, the path represents the fact and various factors such as tenants, applications, time, and locality are the dimensions. By associating the causative factors and their values with the fact, we can view the “path” fact from
different perspectives, e.g., Tenant T’s path and application A’s path, or from the combined perspective of T and A.
Figure 1. In-memory multi-dimensional information model
Initially, in the network planning phase, a substantial set of path entries can be established by the path pre-computation process, as described in [2]. Each path entry is assigned a globally unique identifier by applying a hash algorithm. We extend key/value pairs by adding a measure attribute, named “score”, to store the predefined and preprocessed path weight, so that paths can be sorted according to these scores. We also use this (key, value, score) triplet to represent the name of a dimension, the value of the dimension and the corresponding path identifier, respectively. Thus, path information can be easily associated with a dimension just by adding an entry containing the path identifier to the related dimension table. Consider a case including the tenant, application, and priority dimensions, as shown in Figure 2. The left part shows the dimensional information, the right part shows the paths related to that dimensional information, and the connecting lines in the middle represent the relationships between the two. Specifically, the paths applicable to tenant A are L1, L6, L7 and L10, with link lists List<links1>, List<links6>, List<links7> and List<links10>, respectively. Path L1 can be used by Tenant A for voice applications or other high priority applications.
Figure 2. Data structure of multi-dimensional model
B. Application of MDDHT
Such a data structure provides a flexible management mechanism for dimensions, and enables easy addition, deletion or (re)composition of dimensions dynamically without modifying the original data. Consider adding the linkID dimension to the data structure shown in Figure 2: the only work needed is to add (key, value, score) entries that save the values of linkID and the corresponding path identifiers, as depicted in Figure 3, where linkA, linkB, linkC, linkD and their corresponding identifiers were added.
Figure 3 Add LinkID as dimension
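One possible reading of this data structure is sketched below in Python. It is an interpretation, not the authors' implementation: the class `DimensionStore`, the SHA-1 based `path_id` and the convention that a lower score is better are all assumptions made for illustration. The sketch shows that a new dimension (here linkID) can be indexed without touching the stored path entries.

```python
import hashlib
from collections import defaultdict


def path_id(links):
    """Derive a path identifier by hashing the ordered link list (assumed scheme)."""
    return hashlib.sha1("|".join(links).encode("utf-8")).hexdigest()[:12]


class DimensionStore:
    def __init__(self):
        self.paths = {}                  # path identifier -> list of links
        self.scores = {}                 # path identifier -> score (path weight)
        self.index = defaultdict(set)    # (dimension name, value) -> path identifiers

    def add_path(self, links, score, **dims):
        pid = path_id(links)
        self.paths[pid] = list(links)
        self.scores[pid] = score
        for name, value in dims.items():
            self.index[(name, value)].add(pid)
        return pid

    def add_dimension(self, name, values_of):
        """Index every stored path under a new dimension; self.paths is not modified."""
        for pid, links in self.paths.items():
            for value in values_of(links):
                self.index[(name, value)].add(pid)

    def lookup(self, name, value):
        """Path identifiers for one dimension value, best (lowest) score first."""
        pids = self.index.get((name, value), set())
        return sorted(pids, key=lambda p: self.scores[p])


store = DimensionStore()
p1 = store.add_path(["linkA", "linkB"], score=3, tenant="A", priority="high")
p2 = store.add_path(["linkB", "linkC"], score=1, tenant="A", priority="low")

# Add linkID as a dimension afterwards: each path is indexed under its own links.
store.add_dimension("linkID", lambda links: links)
on_link_b = store.lookup("linkID", "linkB")
```

Deleting a dimension is the reverse operation: removing the index entries for that dimension name leaves `paths` and `scores` unchanged.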
This mechanism can sort every dimension based on the required criteria and rules, and automatically reorders each dimension whenever a selected dimension value changes. Querying paths within a certain range of a dimension thus becomes very easy. Take the priority dimension as an example: MDDHT first sorts the priority information in descending order (see Figure 4), gets the path identifier sets for all values in the requested priority range (low to medium, for example), and then takes the union of those sets. In this way, all the paths corresponding to medium and low priority are obtained quickly, and the optimal path, L7, can then be selected according to the path weight.
Figure 4 Examples of query according to range of priority
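The range query on the priority dimension can be sketched as follows in Python. The index contents and weights are hypothetical (the figure data is not reproduced here), and a lower weight is assumed to be better; they are chosen only so that L7 is the best candidate, as in the example above.

```python
PRIORITY_ORDER = ["high", "medium", "low"]   # priority values in descending order

# Hypothetical index: priority value -> {path identifier: weight}.
PRIORITY_INDEX = {
    "high": {"L6": 4, "L7": 2, "L9": 6},
    "medium": {"L7": 2, "L10": 5},
    "low": {"L1": 7, "L4": 3},
}


def paths_in_range(index, order, lowest, highest):
    """Union of the path sets for every value between `highest` and `lowest`."""
    start, stop = order.index(highest), order.index(lowest)
    selected = {}
    for value in order[start:stop + 1]:
        selected.update(index.get(value, {}))
    return selected


def best_path(candidates):
    """Return the identifier with the smallest weight, or None if there is none."""
    return min(candidates, key=candidates.get) if candidates else None


candidates = paths_in_range(PRIORITY_INDEX, PRIORITY_ORDER, lowest="low", highest="medium")
best = best_path(candidates)
```

Because the dimension values are kept in sorted order, a range is just a contiguous slice of `PRIORITY_ORDER`, and the union is taken over the sets attached to the values in that slice.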
Once the flows, paths and various factors are structured with this data model, handling a search request on certain factors is also very easy and fast. Take the LinkID dimension as an example: first we get the path identifiers L7, L9 and L10 corresponding to the dimension name and value (linkB); then, based on the established relationships between forwarding paths and identifiers, we get all the paths related to linkB, namely List<links7>, List<links9> and List<links10> (see Figure 5).
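A single-dimension lookup such as the linkB query, and the combination of several such lookups through set operations, can be sketched in Python as below. The index holds the partial data of this section's four-dimension example (Tenant, Application, Priority, LinkID); the function names `paths_matching` and `paths_avoiding` are assumptions for illustration.

```python
# Partial dimension data from the four-dimension example in this section.
INDEX = {
    ("tenant", "A"): {"L1", "L6", "L7", "L10"},
    ("application", "video"): {"L4", "L6", "L7", "L10"},
    ("priority", "high"): {"L6", "L7", "L9"},
    ("linkID", "linkB"): {"L7", "L9", "L10"},
}


def paths_matching(index, *criteria):
    """Intersect the path-identifier sets of all (dimension, value) criteria."""
    sets = [index.get(c, set()) for c in criteria]
    return set.intersection(*sets) if sets else set()


def paths_avoiding(index, wanted, avoided):
    """Paths listed under `wanted` that do not appear under `avoided` (set difference)."""
    return index.get(wanted, set()) - index.get(avoided, set())


# Single-dimension query: every path that uses linkB.
on_link_b = INDEX[("linkID", "linkB")]

# Four-dimension query: Tenant A, video application, high priority, via linkB.
all_four = paths_matching(
    INDEX,
    ("tenant", "A"),
    ("application", "video"),
    ("priority", "high"),
    ("linkID", "linkB"),
)

# Difference: Tenant A's paths that do not traverse linkB.
tenant_a_without_link_b = paths_avoiding(INDEX, ("tenant", "A"), ("linkID", "linkB"))
```

With this data the four-way intersection leaves a single candidate path, so no optimization algorithm needs to be invoked for such a request.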
Figure 5 Query Path with LinkID
Next, we can perform set operations such as union, intersection and difference on different dimensions with specific dimension values, so that the resulting paths meet all the requirements concerning various factors and their combinations. Let us take four dimensions (Tenant, Application, Priority, and LinkID) as an example to illustrate how to use network information from different dimensions and multiple perspectives. The process is shown in Figure 6, with only partial data. We assume that Tenant A can use paths L1, L6, L7 and L10, that paths L4, L6, L7 and L10 correspond to the video application, that paths L6, L7 and L9 are used for high priority traffic, and that paths L7, L9 and L10 use linkB. As shown in Figure 6, we first get the path identifier sets according to the values of the dimensions, then apply union or intersection operations to the result sets, and eventually obtain the target path or paths that satisfy all the requirements related to those four dimensions.
Figure 6 Finding paths in four dimensions
C. MDDHT in a Clustered SDN-OpenFlow Controller
As presented in [24], MDDHT is an integral part of the SmartNet Openflow Controller (SOX) [23, 24]. A well-known critical issue for the fast adoption of recent SDN developments lies in the single point of failure and the performance bottleneck inherent in a centralized controller. As discussed in [23, 24], SOX was designed first to provide the highest possible performance, in terms of packet-ins per second, on a single server [24], and was then extended to a cluster of equal-mode servers for much needed fail-over and load balancing capabilities. MDDHT fits naturally with such a controller cluster.
V. PERFORMANCE EVALUATION
Extensive performance evaluation of MDDHT has been conducted. In this section, we present some interesting results and comparisons. First, we evaluate the performance of the multi-dimensional mechanism against a traditional relational database and a basic DHT (key, value), tested with a topology of 512 simulated nodes [2]. We randomly generated a traffic matrix of 9610 requests with attributes such as bandwidth requirements, QoS and priority. We would normally also specify the duration of each request. To simplify, here we let each new request continually consume the allocated bandwidth, in order to quickly drive the network with sufficient traffic load. The simulated traffic requests from this matrix were then taken one by one or in blocks and “pushed” into the network.
A. Simulated Experiments for a 512 Nodes Network
With the 512-node network and the randomly generated traffic matrices, a series of experiments were designed to study how MDDHT would perform. The user interface of the prototype is shown in Figure 7.
Figure 7 The User Interface of the Prototype
Specifically, the experiments were conducted as follows:
• With any given traffic matrix, pre-calculate all “good” paths and save them to the pathDB.
• Sequentially push new requests from the given traffic matrix. Requests can also be pushed in parallel, but sequential pushing makes it easy to observe the consequences of utilizing ADMPCF and MDDHT from the above user interface and key performance indicators.
• Search for paths in the pathDB in MDDHT.
• If there is a path match, use it; otherwise call the multi-path algorithm(s) in ADMPCF instead.
• Calculate the end-to-end time from when a request is injected to when a path is established or a failure notification is received.
With this UI, we can push flow requests from the traffic matrix one by one or in groups, or manually specify a flow with certain parameters. As those flows are injected into the network, links become more utilized, and the color-coded link utilization display in the UI shows the number of links in each utilization range. Underneath that, key performance indicators, such as average link utilization and the percentage and number of rejected requests, are updated and displayed continuously. At the bottom, the links with the highest and lowest utilization are displayed for management review and action. All the tests were conducted 10 times, and the average performance is reported. The results verified and confirmed the ability and effectiveness of the proposed MDDHT.
B. Comparison with Other Approaches
We built a simple multi-dimensional model with three dimensions using the proposed mechanism, as shown in Figure 8. We then evaluated the performance of MDDHT, an in-memory relational database using H2 [25], and a basic in-memory DHT (key, value) structure for the same scenarios.
Figure 8 A simple multi-dimensional model for performance evaluation
In the experiment, we used the two scenarios described in Table I, and inserted 40000 paths and other related information into the path model for each of MDDHT, H2 and the basic DHT.
TABLE I. SCENARIOS
Scenario 1: When a link is down, delete all the paths associated with the link.
Scenario 2: Search the path for the flow of an application with a certain priority.
More interestingly, Figure 9 also shows that the number of associated dimensions does not affect the performance of MDDHT.
C. Total Response Time vs. % PathDB Hit
Figure 10 illustrates the delay comparison with 9610 new requests in the traffic matrix, as the percentage of path hits from the pathDB varies. As the percentage of requests that must call the multi-path algorithm decreases, the delay falls almost linearly. The delay with only a 30% hit rate is nearly 9 times that with a 100% hit rate.
TEST CASES
| Approach | Scenario 1 (link down: delete the associated paths) | Scenario 2 (path search by application and priority) |
| --- | --- | --- |
| Relational database (H2) | Two tables need to be associated and queried: the pathID from the relationship table (path, link) according to the linkID, then the path from the path table using the pathID. | Three tables need to be associated and queried: the flowID from the flow table according to the application and its priority, then the pathID from the relationship table (path, flow) using the flowID, and finally the path from the path table using the pathID. |
| DHT | Two searches on the DHT are needed: the pathID from one columnFamily according to the linkID, then the path from another columnFamily using the pathID. | Three searches on the DHT are needed: the flowID according to the application and its priority, then the pathID using the flowID, and finally the path using the pathID. |
| Path model | Query with a single dimension. | Query with two dimensions. |
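The access patterns compared in the test cases can be illustrated with a minimal sketch. It is not the authors' MDDHT implementation: the class names `ChainedDHT` and `PathModel`, the dimension names, and the index layout are assumptions chosen only to contrast chained key/value searches with a direct query on dimensions.

```python
from collections import defaultdict


class ChainedDHT:
    """Basic key/value layout: one search per association (three for scenario 2)."""

    def __init__(self):
        self.flow_by_app = {}      # (application, priority) -> flow_id
        self.path_id_by_flow = {}  # flow_id -> path_id
        self.path_by_id = {}       # path_id -> tuple of link ids

    def insert(self, app, priority, flow_id, path_id, links):
        self.flow_by_app[(app, priority)] = flow_id
        self.path_id_by_flow[flow_id] = path_id
        self.path_by_id[path_id] = tuple(links)

    def find_path(self, app, priority):
        flow_id = self.flow_by_app.get((app, priority))  # search 1
        path_id = self.path_id_by_flow.get(flow_id)      # search 2
        return self.path_by_id.get(path_id)              # search 3


class PathModel:
    """Hypothetical multi-dimensional index: each (dimension, value) maps to path ids."""

    def __init__(self):
        self.paths = {}                # path_id -> tuple of link ids
        self.index = defaultdict(set)  # (dimension, value) -> {path_id}
        self.keys_of = {}              # path_id -> its index keys, for cheap deletion

    def insert(self, path_id, links, **dims):
        keys = [("link", link) for link in links]
        keys += list(dims.items())
        self.paths[path_id] = tuple(links)
        self.keys_of[path_id] = keys
        for key in keys:
            self.index[key].add(path_id)

    def query(self, **dims):
        """Return the paths matching all given dimension values (scenario 2)."""
        if not dims:
            return []
        buckets = [self.index.get(item, set()) for item in dims.items()]
        matched = set.intersection(*buckets)
        return [self.paths[pid] for pid in sorted(matched)]

    def delete_by_link(self, link):
        """Remove every path that uses `link` (scenario 1); return the count removed."""
        doomed = set(self.index.get(("link", link), ()))
        for pid in doomed:
            del self.paths[pid]
            for key in self.keys_of.pop(pid):
                self.index[key].discard(pid)
        return len(doomed)


model = PathModel()
model.insert("p1", ["l1", "l2"], app="video", priority=1)
model.insert("p2", ["l2", "l3"], app="video", priority=2)
model.insert("p3", ["l4"], app="voip", priority=1)
```

In this sketch, `model.query(app="video", priority=1)` answers scenario 2 with one intersection over two dimensions, and `model.delete_by_link("l2")` answers scenario 1 from a single dimension, whereas `ChainedDHT.find_path` must follow three dependent searches.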
Figure 9 shows the delay of MDDHT compared with the other approaches. The delay of the traditional relational database H2 is over 10 times longer than that of MDDHT. As the number of associated tables increases, such delays would grow exponentially. The DHT reduces the delay compared with the traditional relational database, but its delay is still over 6 times longer than that of MDDHT. As the number of associated column families increases, the DHT delay would also increase sharply.
Figure 9 Delay comparison of different approaches
Figure 10 Experimental result (9610 traffic matrix requests, total time)
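The measurement procedure behind this result (look up the pathDB, fall back to the multi-path algorithm on a miss, and time each request end to end) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SOX code: `serve_request`, `run_sequentially`, `modeled_total_delay`, the dictionary-based `path_db`, and the `compute_path` callback are all hypothetical names, and the linear model simply assumes a fixed cost per hit and per miss.

```python
import time


def serve_request(request, path_db, compute_path):
    """Serve one flow request; return (path, elapsed_seconds, hit)."""
    start = time.perf_counter()
    path = path_db.get(request)       # pathDB lookup
    hit = path is not None
    if not hit:
        path = compute_path(request)  # fall back to the multi-path algorithm
    return path, time.perf_counter() - start, hit


def run_sequentially(requests, path_db, compute_path):
    """Push requests one by one and collect simple performance indicators."""
    total_time, hits, rejected = 0.0, 0, 0
    for request in requests:
        path, elapsed, hit = serve_request(request, path_db, compute_path)
        total_time += elapsed
        hits += int(hit)
        rejected += int(path is None)  # no path found: failure notification
    count = len(requests)
    return {
        "total_time": total_time,
        "hit_ratio": hits / count if count else 0.0,
        "rejected": rejected,
    }


def modeled_total_delay(n_requests, hit_ratio, t_hit, t_miss):
    """Assumed linear model: every hit costs t_hit, every miss costs t_miss."""
    return n_requests * (hit_ratio * t_hit + (1.0 - hit_ratio) * t_miss)
```

Under this assumed model, the ratio between the total delay at a 30% hit rate and at a 100% hit rate is 0.3 + 0.7 (t_miss / t_hit). If the model held exactly, the reported factor of nearly 9 would correspond to a miss costing roughly 12 times as much as a hit.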
D. Overall Controller Performance
MDDHT was an integral part of the Smart OpenFlow Controller (SOX). As previously reported, with a cluster of four average servers, SOX was able to process over 5 million OpenFlow 1.3 packet-ins (with true multi-flow tables) per second on a more complicated network configuration, whereas most reported test results from other controllers were obtained using only one switch.
VI.
CONCLUSION AND FUTURE DIRECTIONS
This paper presents an innovative in-memory multi-dimensional DHT mechanism (MDDHT) that enables the Adaptive and Dynamic Multi-path Computation Framework (ADMPCF) for routing and resource allocation optimization. It aims to convert most dynamic path calculation into a query process that satisfies all valid constraints, through proper modeling and use of the network analytics. The proposed approach makes it possible to store, process and query globally “better” (or even optimized) forwarding paths for every new flow in near real-time, efficiently utilizing flow characteristics such as statistics and inherent but hidden patterns. Such fast response time is extremely important for large centrally controlled networks, say with 1000 or more nodes. MDDHT is also an effective extension of NoSQL with OLAP capabilities. The dimensions are derived from many specific network application scenarios with various analytics information, especially the characteristics or patterns of the traffic flows in terms of tenants, applications and their time-varying properties, locality, mobility, QoS, bandwidth requirements, etc. The values of the dimensions and the path scores are all defined and preprocessed, and can be dynamically updated when the network states or traffic flow patterns change significantly.
Experimental results confirmed our conjecture that MDDHT provides much faster response time than traditional in-memory relational databases and DHT. In addition, the results indicated that, because the pathDB provides instant path information without the need to invoke the expensive multi-path routing and resource allocation mechanism, the response time decreases almost linearly as the percentage of pathDB hits increases.
As there is a lack of real network operational data, especially from the Internet or large enterprise data centers, the applicability and effectiveness of MDDHT, together with ADMPCF, are yet to be further proven. Some internal mechanisms could also be improved for better scalability, reliability and performance. Furthermore, as we move toward distributed control and management over a geographically scattered network, the proper balance among synchronization, consistency and performance needs to be further studied, especially for large scale distributed data center networks.
Finally, as mentioned in Section I, this paper only discussed MDDHT in a single server and in a cluster of controller servers acting as ONE centralized controller. The true MDDHT was designed and implemented in a prototype of the distributed SOX (DSOX [26]). However, in a distributed environment, several other key issues still need thorough investigation: trading off performance against consistency, tolerating possible (very) limited but allowed packet losses, and partitioning a large scale network into autonomous domains that can be effectively and efficiently controlled by a centralized controller with limited inter-domain communication.
REFERENCES
[1] Open Networking Foundation, “Software-Defined Networking: The New Norm for Networks,” White paper, 2012.
[2] M. Luo, Y. Zeng, J. Li, and W. Chou, “An Adaptive Multi-path Computation Framework for Centrally Controlled Networks,” accepted by Journal of Computer Networks, Elsevier, February 2015.
[3] S. Iyer, S. Bhattacharyya, N. Taft, N. McKeown, and C. Diot, “A measurement based study of load balancing in an IP backbone,” Sprint ATL Technical Report TR02-ATL-051027, 2002.
[4] D. Xu, M. Chiang, and J. Rexford, “DEFT: Distributed exponentially-weighted flow splitting,” in Proc. of the IEEE International Conference on Computer Communications (INFOCOM), 2007.
[5] J. Zhang, K. Xi, L. Zhang, and H. J. Chao, “Optimizing network performance using weighted multipath routing,” in Proc. of the IEEE International Conference on Computer Communications and Networks (ICCCN), 2012.
[6] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, and G. Swallow, “RSVP-TE: Extensions to RSVP for LSP tunnels,” RFC 3209, 2001.
[7] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, and J. McManus, “Requirements for traffic engineering over MPLS,” RFC 2702, 1999.
[8] P. Aukia, M. Kodialam, P. V. Koppol, T. V. Lakshman, H. Sarin, and B. Suter, “RATES: A server for MPLS traffic engineering,” IEEE Network Magazine, vol. 14, no. 2, pp. 34-41, 2000.
[9] Y. Honma, A. Masaki, H. Shimonishi, and A. Iwata, “A new multipath routing methodology based on logit-type probability assignment,” IEICE Transactions on Communications, vol. 94, no. 8, pp. 2282-2291, 2011.
[10] M. Motiwala, N. Feamster, and S. Vempala, “Path splicing: Reliable connectivity with rapid recovery,” in Proc. of ACM Workshop on Hot Topics in Networks (HotNets), 2007.
[11] M. Tian, J. L. Lan, X. Zhu, and J. M. Huang, “A routing optimization algorithm of equal-cost-multi-paths based on link criticality,” in Proc. of the International Conference on Advanced Computer Control (ICACC), 2010.
[12] H. Garcia-Molina, J. Ullman, and J. Widom, “Database systems: The complete book,” Prentice Hall, 2002.
[13] T. B. Pedersen and C. S. Jensen, “Multidimensional database technology,” Computer, vol. 34, no. 12, pp. 40-46, 2001.
[14] M. Stonebraker, “SQL databases v. NoSQL databases,” Communications of the ACM, vol. 53, no. 4, pp. 10-11, 2010.
[15] J. A. O’Brien and G. M. Marakas, “Management Information Systems,” McGraw-Hill, 2011.
[16] N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, and S. Shenker, “NOX: towards an operating system for networks,” ACM SIGCOMM Computer Communication Review, vol. 38, no. 3, pp. 105-110, 2008.
[17] Floodlight, http://www.projectfloodlight.org/floodlight/.
[18] Z. Cai, “Using and Programming in Maestro,” Technical Report, Rice University.
[19] Z. Cai, “Design and implementation of the Maestro network control platform,” Master’s thesis, Rice University, Houston, 2009.
[20] OpenDaylight. [Online], http://www.opendaylight.org/.
[21] T. Koponen, M. Casado, N. Gude, et al., “Onix: A Distributed Control Platform for Large-scale Production Networks,” OSDI, vol. 10, Oct. 2010.
[22] Z. Wang and J. Crowcroft, “Quality of service routing for supporting multimedia applications,” IEEE Journal on Selected Areas in Communications, vol. 14, no. 7, 1996.
[23] M. Luo, Q. Li, K. Lin, C. Li, M. Bo, X. Wu, S. Lu, and W. Chou, “Design and Implementation of a Scalable Centralized SDN-OF Controller Cluster,” accepted by the Fifth International Conference on Advanced Communications and Computation, Brussels, Belgium, June 2015.
[24] M. Luo, Y. Tian, Q. Li, J. Wang, and W. Chou, “SOX - A Generalized and Extensible Smart Network Openflow Controller,” The First SDN World Summit, Germany, October 2012.
[25] H2, http://www.h2database.com/html/main.html.
[26] Q. Li, M. Luo, et al., “DSOX: Tech Report,” Huawei Shannon Lab, May 2013.