IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 12, NO. 2, JUNE 2015
A Hybrid Hierarchical Control Plane for Flow-Based Large-Scale Software-Defined Networks

Yonghong Fu, Jun Bi, Senior Member, IEEE, Ze Chen, Kai Gao, Baobao Zhang, Guangxu Chen, and Jianping Wu, Fellow, IEEE
Abstract—The decoupled architecture and the fine-grained flow-control feature limit the scalability of a flow-based software-defined network (SDN). In order to address this problem, some studies construct a flat control plane architecture; others build a hierarchical control plane architecture to improve the scalability of an SDN. However, the two kinds of structure still have unresolved issues: a flat control plane structure cannot solve the super-linear computational complexity growth of the control plane when the SDN scales to a large size, and the centralized abstracted hierarchical control plane structure brings a path stretch problem. To address these two issues, we propose Orion, a hybrid hierarchical control plane for large-scale networks. Orion can effectively reduce the computational complexity of an SDN control plane by several orders of magnitude. We also design an abstracted hierarchical routing method to solve the path stretch problem. Furthermore, we propose a hierarchical fast reroute method to illustrate how to achieve fast rerouting in the proposed hybrid hierarchical control plane. Orion is implemented to verify the feasibility of the hybrid hierarchical approach. Finally, we verify the effectiveness of Orion from both the theoretical and experimental aspects.

Index Terms—SDN, control plane architecture, hybrid hierarchical, abstracted hierarchical routing, fast reroute.

Manuscript received November 24, 2014; revised February 26, 2015 and May 8, 2015; accepted May 12, 2015. Date of publication May 18, 2015; date of current version June 12, 2015. This work was supported in part by the National High-Tech R&D Program ("863" Program) of China under Grant 2013AA013505 and in part by the National Natural Science Foundation of China under Grant 61472213. The associate editor coordinating the review of this paper and approving it for publication was F. De Turck. (Corresponding author: Jun Bi.)

The authors are with the Department of Computer Science and Technology, Institute for Network Sciences and Cyberspace, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China.

Digital Object Identifier 10.1109/TNSM.2015.2434612

1932-4537 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
I. INTRODUCTION
SDN decouples the network's control plane and data plane [1], and extracts complex control functions from network devices. It also supports fine-grained flow-based management based on the OpenFlow protocol [2] to enable highly programmable and flexible networks. Currently, almost all commercial switches support the OpenFlow protocol; other southbound interface protocols, such as the Cisco onePK API, also support the fine-grained flow control feature of SDN. We consider fine-grained flow control to be a very important feature for supporting innovative applications, but also an inherent problem of flow-based SDN, because it introduces a
great communication overhead between the data plane and the control plane, which limits scalability [3]–[5].

Many solutions to this problem have been explored. Some researchers design different control plane structures to extend the control plane's processing ability. For example, some studies construct a flat control plane architecture to improve control plane scalability and reduce the delay caused by geographical distance, as in HyperFlow [3], Onix [6] and ONOS [7]. Alternatively, some studies build a centralized hierarchical SDN control plane, in which the top-layer controller is responsible for global application services. For example, Kandoo has a two-layer hierarchical architecture, in which the bottom-layer controllers run local control applications based on the local network view, and the top-layer controller runs global applications based on the global network-wide view [8]. Logical xBar, on the other hand, introduces a recursive building block to construct a centralized logical hierarchical SDN network [5].

Though the above control plane architectures can improve the scalability of SDN networks, both the flat and the centralized hierarchical architecture have limitations. Since routing is an essential control operation of an SDN network, we take the Dijkstra algorithm as an example to illustrate the problem.

1) The flat control plane architecture cannot solve the super-linear computational complexity growth of the control plane when an SDN network scales to a large size. To illustrate this problem, we use the source IP address and the destination IP address together to identify a flow. Assume that an SDN controller manages M network devices and uses (SrcIP, DstIP) to identify a data flow; the controller adopts the Dijkstra algorithm to compute routing paths with computational complexity O(M^2). When the network size increases N times, there are N * M nodes, so the computational complexity of the routing algorithm increases to O(N^2 M^2). Thus, if we use N SDN controllers to share the workload of the control plane, the processing capacity increases N times, but the computational complexity increases N^2 times. A typical centralized hierarchical control plane, such as Kandoo, cannot solve this super-linear computational complexity growth either.

2) The centralized logical hierarchical control plane architecture brings a path stretch problem. In a network graph, Stretch(u, v) represents the stretch of the path from node u to node v. It is defined as Stretch(u, v) = Path(u, v) / ShortestPath(u, v), where Path(u, v) is the length of the path from node u to node v and ShortestPath(u, v) is the corresponding shortest path length. A centralized logical hierarchical control plane, such as Logical xBar, constructs abstracted hierarchical network views
to provide a scalable SDN control plane, but the method used brings a path stretch problem: the more layers it abstracts, the bigger the path stretch becomes.

In order to address the above two problems, we propose Orion, a hybrid distributed hierarchical control plane for large-scale networks. The proposed architecture combines the advantages of flat and centralized hierarchical control planes, and addresses the two unresolved problems discussed above. This paper makes four contributions. First, we design Orion, a hybrid hierarchical control plane which can reduce the computational complexity growth of the SDN control plane by constructing abstracted hierarchical network views. Second, we design an abstracted hierarchical routing method to address the path stretch problem by constructing abstract intra-area links and pre-calculating the hops of all intra-area abstracted links. Third, we propose a hierarchical fast reroute method to illustrate how to achieve fast rerouting in the proposed hybrid hierarchical control plane. Finally, we implement Orion to verify the feasibility of the hybrid hierarchical approach, and we verify its effectiveness from both the theoretical and experimental aspects.

The rest of this paper is organized as follows. We present the architecture and modules of Orion in Section II. Then, we introduce an abstracted hierarchical routing method used to address the path stretch problem in Section III. In Section IV, we design a hierarchical fast reroute method to illustrate how to achieve fast rerouting in the proposed hybrid hierarchical control plane when single link failures happen. The implementation and evaluation of Orion are presented in Section V. Related works are introduced in Section VI. The deployment of Orion is introduced in Section VII. Finally, conclusions are given in Section VIII.
Fig. 1. Orion’s hybrid hierarchical architecture.
Fig. 2. Network view of the hybrid hierarchical architecture.
II. DESIGN

We outline the design of Orion, a hybrid hierarchical control plane architecture for SDN, focusing on the intra-domain control and management of large-scale networks. Throughout this paper, a domain is a complete network which can be controlled and managed by one administrator. It can be divided into several areas, which are regions that can each be controlled by a single SDN controller.
A. Architecture

The hybrid hierarchical architecture of Orion is shown in Fig. 1. 1) The bottom layer of Orion is the area controller layer. It is connected to the physical switches, and is responsible for collecting physical device and link information, managing the intra-area topology, and processing intra-area routing requests and updates. It abstracts its local network view and sends it to the top layer. 2) The top layer of Orion is the domain controller layer, which treats area controllers as devices and synchronizes the global abstracted network view through a distributed database. The distributed domain controller is used to reduce the delay caused by long geographic distances.

Theoretically, Orion could be extended to a multi-layer control plane architecture. However, because of the communication overhead between different control plane layers, the detailed design and the prototype implementation of Orion are based on a two-tier architecture. We show the hierarchical network views of Orion in Fig. 2. The bottom layer's view is the view of the physical device layer. The top layer's view is the view of the lower abstracted area: every area is abstracted as a node, and the exports of the area are regarded as the ports of the node. By dividing areas and constructing abstracted hierarchical views, the hybrid hierarchical architecture of Orion can efficiently reduce the control plane's computational complexity growth by several orders of magnitude. The computational complexity is analyzed in detail in Section III-G.

B. Components

There are nine major modules in Orion, shown in Fig. 3. Among these, the Device Management, Topology Management, Routing, Fast Reroute, Storage and Vertical Communication Channel modules each have two sub-modules located separately in the area controller and the domain controller, which are responsible for intra-area and inter-area information processing respectively. Corresponding sub-modules have the same color, and the dashed lines show the communication between the modules.

OpenFlow Base Module: OpenFlow [2] is a widely used communication protocol, which is used for Orion's southbound interface. The OpenFlow Base Module is responsible for receiving Packet-In messages, and providing an interface
Fig. 3. The modules of Orion.
for other modules that listen to it or install rules on OpenFlow switches.

Device Management Module: This module has two parts, dealing with area device information and domain device information. 1) Area Device Management. This sub-module obtains host information from the ARP packets sent by hosts. When a host sends an ARP packet, the switch that connects to the host sends a Packet-In message to the area controller. The sub-module decapsulates the Packet-In message, acquires the host information and collects switch information. 2) Domain Device Management. In order to provide inter-area host information, this sub-module works as an ARP proxy in Orion. To prevent a broadcast storm, we use an algorithm similar to Spanning Tree to avoid broadcast loops. This sub-module also manages the global edge switch information.

Link Discovery Module: This module obtains intra-area and inter-area link information through the LLDP protocol. When there are multiple areas, the module needs to acquire LLDP messages from other areas to discover the links. To obtain inter-area link information, an area controller sends LLDP packets to all ports of its edge switches. Upon receipt of such a packet, the switch forwards it to the edge switch in another area through their physical link. When the packet reaches the edge switch in the other area, it is encapsulated into a Packet-In message and sent to that area's controller, which decapsulates the message and extracts the TLV fields from the LLDP packet. If the LocalControllerID in the LLDP message differs from the controller's own ControllerID, the area controller knows that this is an external link and that the switch sending the Packet-In message is an edge switch.
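The edge-link test above boils down to comparing controller IDs. Below is a minimal Python sketch of that check, assuming a hypothetical parse_lldp_controller_id() helper that extracts the LocalControllerID TLV from the Packet-In payload; the names and types are our own, not Orion's code.

    LOCAL_CONTROLLER_ID = "area-ctrl-2"   # this area controller's ID (example value)

    def classify_lldp_link(packet_in_payload, ingress_switch, parse_lldp_controller_id):
        """Return ('external', edge_switch_id) for an inter-area link,
        or ('internal', None) for a link inside this area."""
        sender_id = parse_lldp_controller_id(packet_in_payload)  # LocalControllerID TLV
        if sender_id != LOCAL_CONTROLLER_ID:
            # The LLDP packet was emitted under another controller's ID, so the
            # link crosses areas and the reporting switch is one of our edge switches.
            return "external", ingress_switch
        return "internal", None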
Topology Management Module: This module has two sub-modules: the Area Topology Management sub-module and the Domain Topology Management sub-module. 1) The Area Topology Management sub-module manages the physical topology information received from the Link Discovery Module; specifically, it is responsible for adding, updating and deleting topology information. 2) The Domain Topology Management sub-module is used to manage the abstract topology of the top layer.

Routing Management Module: This module is divided into two sub-modules: the Area Routing Management sub-module and the Domain Routing Management sub-module. The Area Routing Management sub-module is responsible for intra-area routing path calculation based on the local area network view. The Domain Routing Management sub-module calculates the inter-area routing path based on the lower layer's abstracted network view. However, as the domain controller cannot see the links inside an area, it cannot directly calculate the inter-area global shortest path. In order to address this problem, we design an abstracted hierarchical routing method (introduced in Section III).

Storage Module: This module stores host, switch and link information, the latter including both abstract and physical links. There are two kinds of abstract link, between inner switches and edge switches, and between edge switches; in both cases the link length is the corresponding shortest path. A physical link indicates a real physical inter-area link between different areas. The Area Storage sub-module stores the host, switch and real physical link information of its own area. The Domain Storage sub-module stores the host, switch and abstract link information of all areas, together with inter-area physical link information.

Horizontal Communication Module: This module is responsible for synchronizing global abstract network information within the domain controller cluster. In this part, we use a scalable NoSQL database which supports dynamic clustering to store global host information, global switch information and global abstract topology information. The distribution of routing rules is realized through a Publish/Subscribe mechanism. In the top layer of Orion, every domain controller creates a topic with its ControllerID. When two hosts communicate between two different sub-domains, a flow routing computation request is sent to the domain controller which is responsible for the source sub-domain. Since the domain controller has the global abstract topology, the source domain controller can determine the overall inter-area routing path. Then it acquires the ControllerIDs of all area controllers which own the edge switches on the routing path, as well as the ControllerIDs of the corresponding domain controllers. Next, it publishes the routing rules to all domain controllers on the routing path according to the ControllerIDs obtained above. Finally, each domain controller sends the routing rules to its area controllers according to the received area ControllerIDs.

Vertical Communication Module: The vertical communication channel between the area controllers and the domain controllers is established via a TCP connection. It is used to send requests and distribute rules. 1) Send requests. The vertical communication channel between the area controllers (clients) and the domain controllers (servers) is established via a TCP connection; the domain controller side is built with asynchronous sockets. 2) Distribute rules. The domain controller distributes rules (in reverse order along the path) and ARP replies through the established TCP connection with the area controller.
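To make the Publish/Subscribe distribution concrete, here is a small Python sketch with a trivial in-memory broker standing in for the clustered messaging layer. The topic naming (one topic per domain ControllerID) follows the text, but the broker API and message fields are our own assumptions.

    from collections import defaultdict

    class Broker:
        """Toy stand-in for the real pub/sub layer."""
        def __init__(self):
            self.subscribers = defaultdict(list)   # topic -> list of callbacks
        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)
        def publish(self, topic, message):
            for cb in self.subscribers[topic]:
                cb(message)

    broker = Broker()

    # Each domain controller subscribes to a topic named after its ControllerID.
    def make_domain_controller(controller_id, owned_areas):
        def on_rules(message):
            # Forward the per-area rules to the area controllers this domain owns.
            for area_id in message["areas"]:
                if area_id in owned_areas:
                    print(f"{controller_id} -> {area_id}: {message['rules'][area_id]}")
        broker.subscribe(controller_id, on_rules)

    make_domain_controller("domain-1", {"area-1", "area-2"})
    make_domain_controller("domain-2", {"area-3", "area-4"})

    # The source domain controller computes the inter-area path, then publishes
    # the rules to every domain controller on the path, keyed by ControllerID.
    rules = {"areas": ["area-2", "area-3"],
             "rules": {"area-2": ("S2-in", "S2-out"), "area-3": ("S3-in", "S3-out")}}
    for domain_id in ("domain-1", "domain-2"):
        broker.publish(domain_id, rules)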
Fast Reroute Module: This module has two sub-modules: the Area Fast Reroute sub-module and the Domain Fast Reroute sub-module. 1) Area fast reroute. To protect against intra-area single-link failures, the area controller employs Dijkstra's algorithm to calculate the shortest path for each node pair in its area as the main working path, and pre-computes the backup path for each node pair by excluding each link in its area. The Area Routing Management sub-module is responsible for the main working path calculation, and the Area Fast Reroute sub-module is responsible for the backup path calculation. When an intra-area link failure happens, the failed OpenFlow switch sends a link-down status message to the area controller. Then the Link Discovery Module in the area controller sends the intra-area link changes to the Area Topology Management sub-module. After that, the Area Topology Management sub-module updates the area topology and sends it to the Area Fast Reroute sub-module. Finally, the Area Fast Reroute sub-module installs new flow entries on the switches in the new routing path list. 2) Domain fast reroute. The top domain layer abstracts each area as a node, and regards the exports of the area as the ports of the node to carry out inter-area fast reroute. The Domain Routing sub-module is responsible for the inter-area main working path calculation, and the Domain Fast Reroute sub-module is responsible for the inter-area backup path calculation. We have designed a hierarchical fast reroute method to illustrate how to achieve fast rerouting in Orion (introduced in Section IV).

Failover: The fault tolerance and failover between switches and area controllers are guaranteed by the OpenFlow protocol. From OpenFlow v1.2 [9], all controllers have specific roles, such as Master, Slave or Equal. When a Master area controller fails, a Slave area controller can quickly take over all the switches that were connected to the failed Master controller.

III. ABSTRACTED HIERARCHICAL ROUTING METHOD (AHRM)

In order to address the path stretch problem of the abstracted hierarchical control plane architecture, we propose an abstracted hierarchical routing method (AHRM) based on Dijkstra's algorithm [10]. In AHRM, every area controller pre-computes the inner hops from every switch to all edge switches by Dijkstra's algorithm to construct the abstract link set, which is sent to the domain controller. Then the domain controller adds the inter-area path length and the intra-area abstract link length to calculate the global shortest path. Before giving further details of AHRM we introduce some necessary definitions.

A. Definitions

Definition 1: The network graph is G = (V, E), where V is the set of all physical network nodes and E is the set of all the edges or links.

Definition 2: A network graph Gj = (Vj, Ej) is a sub-graph of the physical network graph G = (V, E) when Vj ⊂ V and Ej ⊂ E. We distinguish between inner network nodes Vj,inner and edge
Fig. 4. Control plane multi-layer network graph.
network nodes Vj,edge, and between inner links Ej,inner and links connected with other sub-networks Ej,ext.

Definition 3: The control plane has L + 1 layers. The domain controller is at the top layer L. The area controller is at the bottom layer 0. Each domain controller can be seen as the root of a K-ary tree. Assume there are c domain controllers; then c * K^L = N.

Definition 4: A network graph Gh,j = (Vh,j, Eh,j) is the network graph stored in the j-th controller located on layer h. We distinguish between inner network nodes Vh,j,inner and edge network nodes Vh,j,edge, and between inner network links Eh,j,inner and external network links Eh,j,ext.

Definition 5: The layer-0 controller's network graph G0,j = (V0,j, E0,j) is mapped to the underlying physical sub-network graph G.

Definition 6: A network graph AbsGh,j = (AbsVh,j, AbsEh,j) is the abstract network view constructed by the j-th controller on layer h.

Definition 7: An abstract network view table AbsTh,j = (AbsTVh,j, AbsTEh,j) is used to store the abstracted network links from all inner switches to all edge switches Vh,j,edge. The inner switches include the controller's own inner switches and all its lower-layer children's inner switches: AbsTVh,j = Vh,j,inner + TinnVh,j.

Definition 8: An inner network view table Tinnh,j = (TinnVh,j, TinnEh,j) is used to store all the children's abstracted network links (from the children's inner switches to all their edge switches): Tinnh,j = AbsTh−1,(j−1)*K+1 ∪ ... ∪ AbsTh−1,(j−1)*K+K, where 1 ≤ h ≤ L, and Tinn0,j = ∅.

The control plane's multi-layer network graph is shown in Fig. 4. Meanwhile, we give an example network graph G0,j in Fig. 5 to illustrate the notation introduced above.

B. Assumptions

Assumption 1: The network graph G = (V, E) is a connected network graph.

Assumption 2: Each sub-network graph Gi = (Vi, Ei) of the network G is a connected network graph.

Assumption 3: Each physical network node and each controller of the network has a unique ID.

C. Multi-Layer Abstracted Network View Construction

In order to address the path stretch problem, the AHRM constructs a multi-layer abstracted network view to calculate the global shortest path, as shown in Algorithm 1. Between any two
Fig. 5. A notation example with network graph G0,j.

Fig. 6. Multi-layer network view example Gh,j.
layers in the control plane, we name the upper-layer controller the "parent" and the corresponding lower-layer controller the "child."

Algorithm 1 CalcAbstractLink
Input: Vh,j,edge, Gh,j, Tinnh,j. When h == 0, Tinnh,j = ∅.
Output: AbsGh,j, AbsTh,j.
 1: function CALCABSTRACTLINK(Vh,j,edge, Gh,j, Tinnh,j)
 2:   Vh,j, Eh,j = GetNodesLinks(Gh,j)
 3:   TinnVh,j, TinnEh,j = GetNodesLinks(Tinnh,j)
 4:   AbsTVh,j = Vh,j + TinnVh,j − Vh,j,edge
 5:   /* calculate the path hops for all (src, dst) pairs */
 6:   AllPathHop = Dijkstra(Vh,j + TinnVh,j, Eh,j + TinnEh,j)
 7:   AbsEh,j,edge = ∅, AbsTEh,j = ∅
 8:   for SrcSwitch ∈ Vh,j,edge + AbsTVh,j do
 9:     for DstSwitch ∈ Vh,j,edge do
10:       Hop = GetPathHop(AllPathHop, SrcSwitch, DstSwitch)
11:       abstractlink = (SrcSwitch, DstSwitch, Hop)
12:       if SrcSwitch ∈ Vh,j,edge then
13:         AbsEh,j,edge += abstractlink
14:       else if SrcSwitch ∈ AbsTVh,j then
15:         AbsTEh,j += abstractlink
16:       end if
17:     end for
18:   end for
19:   AbsGh,j = Graph(Vh,j,edge, AbsEh,j,edge, Eh,j,ext)
20:   AbsTh,j = StoreTable(AbsTVh,j, AbsTEh,j)
21:   return AbsGh,j, AbsTh,j
22: end function

Firstly, the area controller, which is the j-th controller on layer 0, calls the CalcAbstractLink function introduced in Algorithm 1 to pre-compute the inner hops between any two edge switches in its area by Dijkstra's algorithm to construct its
abstract link set AbsE0,j,edge, and to pre-compute the inner hops between any inner switch and any edge switch in its area to construct its inner abstract link set AbsTE0,j. Then the area controller constructs the abstract network graph AbsG0,j from V0,j,edge, AbsE0,j,edge and E0,j,ext, and stores all the abstract links in the abstract network view table AbsT0,j. Finally, the area controller sends AbsG0,j and AbsT0,j to its parent on layer 1.

Secondly, when a parent controller, the j-th controller on layer h (1 ≤ h ≤ L), receives all the abstracted network graphs from its children, it gets all the switches from its children's abstracted network views AbsGh−1,i. Then the parent controller checks all the external links of its children. If either the source switch or the destination switch of a link does not belong to its children, then the link is an external link connected with another parent controller, and the switch that connects with an external link is an edge switch. Next, the parent controller learns all edge switches Vh,j,edge and all inner switches Vh,j,inner. Further, it constructs its own network graph Gh,j = (Vh,j, Eh,j), where Vh,j = Vh,j,edge ∪ Vh,j,inner and Eh,j = Eh,j,inner ∪ Eh,j,ext. Finally, it calls the CalcAbstractLink function in Algorithm 1 to construct its own abstracted network view AbsGh,j and the abstract inner switch hop table AbsTh,j.

In the CalcAbstractLink function, all inner switches Vh,j − Vh,j,edge of Gh,j are combined with the lower-layer inner switches Tinnh,j, and the result is stored in the inner switch set AbsTVh,j of table AbsTh,j. Then the parent controller pre-computes the inner hops between any two edge switches in its network graph by Dijkstra's algorithm to construct its abstract link set AbsEh,j,edge, and pre-computes the inner hops from any inner switch stored in AbsTVh,j to any edge switch in Vh,j,edge to construct its inner abstract link set AbsTEh,j. Then the parent controller constructs the abstract network graph AbsGh,j from Vh,j,edge, AbsEh,j,edge and Eh,j,ext, and stores the abstract links from its inner switches in the abstract network view table AbsTh,j. If the parent controller has an upper-layer parent controller, it sends its abstract network view AbsGh,j and its abstract network view table AbsTh,j to its parent to construct the next layer's abstracted network view. We give a multi-layer network view example in Fig. 6 to illustrate the construction of Gh,j between a parent controller and its
children controllers. The multi-layer abstracted network view construction algorithm is shown in Algorithm 2.

Algorithm 2 Multi-layer abstract network view construction
Input: AbsGh−1,i = (Vh−1,i,edge, AbsEh−1,i,edge, Eh−1,i,ext), AbsTh−1,i = (AbsTVh−1,i, AbsTEh−1,i)
Output: Gh,j, Tinnh,j, AbsGh,j, AbsTh,j.
 1: Vh,j = ∅, TinnVh,j = ∅
 2: Eh,j,inner = ∅, Eh,j,ext = ∅, TinnEh,j = ∅
 3: Vh,j,edge = ∅
 4: for i ∈ child(Parentj) do
 5:   Vh−1,i,edge = GetSwitches(AbsGh−1,i)
 6:   Vh,j = Vh,j ∪ Vh−1,i,edge
 7: end for
 8: for i ∈ child(Parentj) do
 9:   AbsEh−1,i,edge = GetAbsLinks(AbsGh−1,i)
10:   Eh,j,inner = Eh,j,inner ∪ AbsEh−1,i,edge
11:   Eh−1,i,ext = GetExtLinks(AbsGh−1,i)
12:   for link ∈ Eh−1,i,ext do
13:     SrcSwitchId = GetSrcSwitch(link)
14:     DstSwitchId = GetDstSwitch(link)
15:     if (SrcSwitchId ∈ Vh,j) and (DstSwitchId ∈ Vh,j) then
16:       Eh,j,inner += link
17:     else
18:       Eh,j,ext += link
19:       if SrcSwitchId ∈ Vh,j then
20:         Vh,j,edge += SrcSwitchId
21:       else if DstSwitchId ∈ Vh,j then
22:         Vh,j,edge += DstSwitchId
23:       end if
24:     end if
25:   end for
26: end for
27: Eh,j = Eh,j,inner ∪ Eh,j,ext
28: Gh,j = Graph(Vh,j, Eh,j)
29: for i ∈ child(Parentj) do
30:   AbsTVh−1,i = GetSwitches(AbsTh−1,i)
31:   TinnVh,j = TinnVh,j ∪ AbsTVh−1,i
32:   AbsTEh−1,i = GetLinks(AbsTh−1,i)
33:   TinnEh,j = TinnEh,j ∪ AbsTEh−1,i
34: end for
35: Tinnh,j = StoreTable(TinnVh,j, TinnEh,j)
36: AbsGh,j, AbsTh,j = CALCABSTRACTLINK(Vh,j,edge, Gh,j, Tinnh,j)
37: return Gh,j, Tinnh,j, AbsGh,j, AbsTh,j
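As an illustration, the following Python sketch implements the layer-0 case of Algorithm 1 (where Tinn is empty): because intra-area links are counted in hops, the unit-weight Dijkstra step degenerates to a plain BFS. The adjacency encoding and names are our own, and the sketch omits external links.

    from collections import deque

    def hops_from(graph, src):
        """BFS hop counts from src; graph maps node -> set of neighbours."""
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def calc_abstract_links(graph, edge_switches):
        abs_edge = []   # edge switch  -> edge switch   (goes into AbsG)
        abs_inner = []  # inner switch -> edge switch   (goes into AbsT)
        for src in graph:
            dist = hops_from(graph, src)
            for dst in edge_switches:
                if dst == src:
                    continue
                link = (src, dst, dist[dst])
                (abs_edge if src in edge_switches else abs_inner).append(link)
        return abs_edge, abs_inner

    # Toy area: S1-S2-S3 in a line; S1 and S3 are the area's edge switches.
    area = {"S1": {"S2"}, "S2": {"S1", "S3"}, "S3": {"S2"}}
    print(calc_abstract_links(area, {"S1", "S3"}))
    # abs_edge holds (S1,S3,2) and (S3,S1,2); abs_inner holds (S2,S1,1), (S2,S3,1)
    # (ordering depends on dict/set iteration).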
D. Routing Algorithm

The routing algorithm of AHRM is divided into two parts, for the bottom layer and the upper layers.

Bottom Layer Routing Algorithm (Layer 0): This is responsible for calculating the intra-area routing path and constructing the intra-area abstract links to support the global shortest path calculation.
The area controller is responsible for calculating the intra-area routing path. The local routing algorithm is given in Algorithm 3. When a Packet-In message arrives, the area controller checks the source and destination IP addresses in the message. If the source host and the destination host of a data flow are both in its area, the area routing algorithm employs Dijkstra's algorithm to calculate the intra-area path for the flow. If the source host or the destination host is not in its area, then the data flow involves multiple areas. As the area controller only has its local network view, it encapsulates the source host IP and the destination host IP of the packet into a message and sends the message to its parent to calculate the inter-area routing path. When its parent sends the routing result message back, the area controller analyzes the message, acquires the (IngressEdgeSwitchID, EgressSwitchID) of the data flow in its area, and employs the CalcShortestPathList function (Algorithm 4) to calculate the intra-area routing path. Finally, it installs flow entries in all switches on the routing path and forwards the flow. A sketch of this dispatch follows Algorithm 3.

Algorithm 3 Layer 0 Routing Algorithm
Input: G0,j, host set Host0,j, source host IP, destination host IP
Output: Message sent to the upper controller, or flows installed on switches
 1: if SrcHostIP ∈ Host0,j and DstHostIP ∈ Host0,j then
 2:   /* calculate the intra-area routing path */
 3:   SrcSwitchID = DeviceManagement.getSwitch(SrcHostIP)
 4:   DstSwitchID = DeviceManagement.getSwitch(DstHostIP)
 5:   PathSwitchList = Dijkstra(G0,j, SrcSwitchID, DstSwitchID)
 6:   InstallFlowEntries(PathSwitchList, SrcHostIP, DstHostIP)
 7: else if SrcHostIP ∉ Host0,j or DstHostIP ∉ Host0,j then
 8:   /* send a path request message to the upper controller */
 9:   message = PathRequest(SrcHostIP, DstHostIP)
10:   WriteVerticalChannel(UpperController, message)
11: end if
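The sketch below condenses Algorithm 3's dispatch into a few lines of Python; the Orion modules are replaced by injected stand-in callables, so none of the names here is Orion's actual API.

    def handle_packet_in(src_ip, dst_ip, local_hosts, host_to_switch,
                         dijkstra_path, install_flow_entries, send_to_parent):
        """Route intra-area flows locally; escalate inter-area flows."""
        if src_ip in local_hosts and dst_ip in local_hosts:
            # Both endpoints are intra-area: compute and install the path here.
            path = dijkstra_path(host_to_switch[src_ip], host_to_switch[dst_ip])
            install_flow_entries(path, src_ip, dst_ip)
        else:
            # The flow crosses areas: ask the parent for the inter-area path.
            send_to_parent({"type": "PathRequest", "src": src_ip, "dst": dst_ip})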
Algorithm 4 CalcShortestPathList
Input: Gh,j, Tinnh,j, SrcSwitchID, DstSwitchID
Output: The shortest path list SP
 1: function CALCSHORTESTPATHLIST(SrcSwitchID, DstSwitchID, Gh,j, Tinnh,j)
 2:   Vh,j, Eh,j = Get(Gh,j)
 3:   V' = ∅, E' = ∅
 4:   if SrcSwitchID ∉ Vh,j and DstSwitchID ∉ Vh,j then
 5:     V' = Vh,j + SrcSwitchID + DstSwitchID
 6:     E' = Eh,j ∪ GetAbsLinks(Tinnh,j, SrcSwitchID) ∪ GetAbsLinks(Tinnh,j, DstSwitchID)
 7:   else if SrcSwitchID ∉ Vh,j and DstSwitchID ∈ Vh,j then
 8:     V' = Vh,j + SrcSwitchID
 9:     E' = Eh,j ∪ GetAbsLinks(Tinnh,j, SrcSwitchID)
10:   else if SrcSwitchID ∈ Vh,j and DstSwitchID ∉ Vh,j then
11:     V' = Vh,j + DstSwitchID
12:     E' = Eh,j ∪ GetAbsLinks(Tinnh,j, DstSwitchID)
13:   else if SrcSwitchID ∈ Vh,j and DstSwitchID ∈ Vh,j then
14:     V' = Vh,j
15:     E' = Eh,j
16:   end if
17:   G' = (V', E')
18:   SP = Dijkstra(G', SrcSwitchID, DstSwitchID)
19:   return SP
20: end function

CalcShortestPathList Function: This function is used to calculate the global shortest path list. When a controller calculates the path list for a routing request, it checks whether the source switch ID and the destination switch ID are both in its network graph Gh,j. If either of them does not belong to its switch set Vh,j, the controller checks its inner switch table Tinnh,j and gets all the abstracted links related to that switch. In the algorithm, the Tinnh,j table is a key-value hash table used to store the abstract links from every inner switch to every edge switch. The key of this hash table is the ID of an inner switch, and the value is the set of all edge switch IDs which connect with that inner switch, together with the corresponding hops. The Tinnh,j table is pre-calculated, so the query complexity of this table is O(1). The controller then constructs a new graph G' = (V', E') to calculate the global shortest path. The construction rules of G' are as follows; a Python sketch of this construction follows the complexity discussion after this list.

1) If the source switch and the destination switch are both inner switches in two different areas, the controller constructs the abstract network graph G' from its edge switch set Vh,j, the abstract links from the source switch to all edge switches in the source area, and the abstract links from the destination switch to all edge switches in the destination area. Then V' = Vh,j + SrcSwitchID + DstSwitchID, and the number of switches in V' equals K + 2.

2) If the source switch is an inner switch in one area and the destination switch is an edge switch in another area, then V' = Vh,j + SrcSwitchID. The number of switches in V' equals K + 1.

3) If the source switch is an edge switch in one area and the destination switch is an inner switch in another area, then V' = Vh,j + DstSwitchID. The number of switches in V' equals K + 1.

4) If the source switch and the destination switch are both edge switches, then V' equals the edge switch set Vh,j, and the number of switches in V' equals K.

Next, the routing algorithm employs Dijkstra's algorithm to calculate the shortest routing path SP on the graph G'. The computational complexity of the CalcShortestPathList function therefore equals Max(O((K + 2)^2), O((K + 1)^2), O(K^2)) + O(1) = O(K^2). The CalcShortestPathList function is shown in Algorithm 4.
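The following Python sketch shows the promised G' construction together with a standard Dijkstra; the Tinn table is modeled as a dict from an inner switch to its pre-computed (edge switch, hops) pairs. The toy topology at the end is invented for illustration; none of this is Orion's actual code.

    import heapq

    def dijkstra(nodes, weighted_links, src, dst):
        """Shortest path on an undirected weighted graph (assumes dst reachable)."""
        adj = {n: [] for n in nodes}
        for u, v, w in weighted_links:
            adj[u].append((v, w))
            adj[v].append((u, w))
        dist, prev, heap = {src: 0}, {}, [(0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                break
            if d > dist.get(u, float("inf")):
                continue
            for v, w in adj[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    prev[v] = u
                    heapq.heappush(heap, (d + w, v))
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return path[::-1]

    def calc_shortest_path_list(src, dst, edge_switches, edge_links, tinn):
        nodes = set(edge_switches)
        links = list(edge_links)
        for endpoint in (src, dst):
            if endpoint not in nodes:          # inner switch: O(1) Tinn lookup
                nodes.add(endpoint)
                links += [(endpoint, e, h) for e, h in tinn[endpoint]]
        return dijkstra(nodes, links, src, dst)

    # Toy domain: edge switches E1..E4 with hop-weighted abstract/inter-area
    # links, and inner switches A (source area) and B (destination area).
    edge_links = [("E1", "E2", 2), ("E2", "E3", 1), ("E3", "E4", 2), ("E1", "E4", 4)]
    tinn = {"A": [("E1", 1), ("E4", 3)], "B": [("E3", 1), ("E2", 2)]}
    print(calc_shortest_path_list("A", "B", {"E1", "E2", "E3", "E4"}, edge_links, tinn))
    # -> ['A', 'E1', 'E2', 'B'] (total 5 hops)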
Upper Layer Routing Algorithm: Suppose an area controller receives a request from one of its hosts in which the destination host is located in a different area. The area controller sends the routing request to its parent controller, which checks whether the source IP and destination IP addresses are in its host set. If they are, the parent controller calls the CalcShortestPathList function to calculate a routing path list and sends the result to all the lower-layer controllers on the routing path. If not, it sends the request to its own parent. Should a request reach the top-level domain controller, it will definitely be satisfied, since this controller has global information. The upper-layer (h > 0) controller's routing algorithm is shown in Algorithm 5.

Algorithm 5 Upper Layer Routing Algorithm (Layer h, h > 0)
Input: Gh,j, Tinnh,j, Hosth,j (host set of the j-th controller on layer h), SrcHostIP and DstHostIP in the request message
Output: Path request message to the upper controller, or path install message to the lower controllers
 1: if SrcHostIP ∈ Hosth,j and DstHostIP ∈ Hosth,j then
 2:   SrcSwitchID = DeviceManagement.getSwitch(SrcHostIP)
 3:   DstSwitchID = DeviceManagement.getSwitch(DstHostIP)
 4:   PathList = CalcShortestPathList(SrcSwitchID, DstSwitchID)
 5:   message = PathInstall(SrcHostIP, DstHostIP, PathList)
 6:   WriteVerticalChannel(LowerControllerSet, message)
 7:   /* if this is a top-layer controller, distribute the message to all top nodes */
 8:   if h == L then
 9:     DistributeMessage(TopControllerSet, message)
10:   end if
11: else if SrcHostIP ∉ Hosth,j or DstHostIP ∉ Hosth,j then
12:   message = PathRequest(SrcHostIP, DstHostIP)
13:   WriteVerticalChannel(UpperController, message)
14: end if

The core idea of the abstracted hierarchical routing method is similar to IS-IS [11] and OSPF [12]: by dividing the global network into multiple areas, the administration complexity and routing computational complexity are reduced. However, IS-IS and OSPF run on traditional distributed routers, whereas the abstracted hierarchical routing method runs on logically centralized area controllers and distributed domain controllers.

E. A Two-Layer AHRM Routing Example

Fig. 7. Inter-area routing example topology.

We give an example to illustrate how Orion carries out inter-area routing with a two-layer control plane. The routing example is based on the topology shown in Fig. 7. This example illustrates how hosts in different areas (C and D) communicate in Orion.
When C sends a data flow to D, the data flow reaches the switch to which C connects. Assume the switch has not installed a flow entry for the flow. Then the switch generates a Packet-In message for the first packet of the flow and sends the message to area controller 2. As D is not in area 2, when area controller 2 receives the message, it extracts the SrcHostIP and DstHostIP from the Packet-In message, encapsulates them into a simple request, and sends the request to domain controller 1. When domain controller 1 receives the request, it calculates the inter-area path according to the global abstracted network view. As the destination D is in area 4 (controlled by domain controller 2), domain controller 1 publishes the inter-area path routing rules through the horizontal communication channel to domain controller 2, which forwards the messages to the area controllers on the routing path. When area controller 3 and area controller 4 receive the messages, they know the (IngressSwitchID, EgressSwitchID) of the data flow, so they calculate the intra-area routing paths and install the routing rules on the related switches. Finally, when all the switches on the routing path have installed the routing rules, the data flow is forwarded from C to D.

F. Correctness Proof

In this subsection, we prove the correctness of the AHRM. The abstract network graph G' from the source switch to the destination switch used in this part is shown in Fig. 8.

Fig. 8. The abstract network graph from the source switch to the destination switch.

Theorem 1: Given a connected network graph G = (V, E) and the connected sub-networks Gi = (Vi, Ei) of G, construct an abstract network graph G' = (V', E') in which V' includes the source switch, the destination switch, and all edge switches Vi,edge, and E' contains the abstract links from the source switch and each edge switch to all edge switches in the source area, the abstract links from the destination switch and each edge switch to all edge switches in the destination area, and all inter-area links Eedge in G. Then the shortest path from the source switch to the destination switch in G' corresponds to the shortest path in G.

Proof: Suppose network graph G has a shortest path SP from the source switch to the destination switch in the physical network graph, and the abstract network graph G' has a shortest path SP' from the source switch to the destination switch in the abstract network graph. As network graph G has the abstract network graph G', we can find a corresponding abstract path for SP in G'; let it be SP''. Assume the path corresponding to SP' is not the shortest path in G; then SP' > SP''. But SP' is the minimal length over all paths from the source switch to the destination switch in the abstract network graph, so SP' ≤ SP'', which is a contradiction. Hence the shortest path from the source switch to the destination switch in G' corresponds to the shortest path in G.
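The contradiction can be written compactly as follows (|P| denotes path length, our notation rather than the paper's; s and t are the source and destination switches):

    % |P| denotes path length; s, t are the source and destination switches.
    \begin{align*}
      SP   &= \text{a shortest } s\text{--}t \text{ path in } G, &
      SP'  &= \text{a shortest } s\text{--}t \text{ path in } G',\\
      SP'' &= \text{the abstraction of } SP \text{ in } G',
            &&\text{so } |SP''| = |SP|.
    \end{align*}
    If the $G$-path realized by $SP'$ were longer than $SP$, then $|SP'| > |SP''|$;
    but $SP'$ is minimal in $G'$, so $|SP'| \le |SP''|$, a contradiction.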
G. Cost Analysis

In this part, the cost analysis is based on a two-layer hierarchical control plane architecture.

Computational Complexity: Assume there are N areas, and each area has Mi nodes. For simplicity, we assume that Mi = M. 1) Dijkstra algorithm computational complexity. In an area with M nodes, the SDN controller adopts the Dijkstra algorithm to compute routing paths with computational complexity O(M^2). When the network size increases N times, there are N * M nodes, so the computational complexity of the routing algorithm increases to O(N^2 M^2). 2) Orion's computational complexity. If there are K edge switches in a domain, the computational complexity from an edge switch in one area to an edge switch in another area is O(K^2). The computational complexity from an inner switch in one area to an edge switch in another area is O((K + 1)^2) = O(K^2), and the computational complexity from an inner switch in one area to an inner switch in another area is O((K + 2)^2) = O(K^2). Let MOrion denote the factor by which Orion reduces the complexity, and let 1/x (x > 1) be the average proportion of edge switches among all switches in an area, so that K = (1/x) * M * N. Then

MOrion = O(K^2) / O(M^2 N^2) = O(((1/x) * M * N)^2) / (M^2 N^2) = 1/x^2.

When x = 3, Orion reduces the computational complexity of a flow-based SDN control plane by about one order of magnitude; when x = 10, by two orders of magnitude. From this result we can see that Orion can reduce the computational complexity of a flow-based SDN control plane by several orders of magnitude.

Storage Space: The control plane stores host information, switch information and link information. 1) Host information (18 bytes) includes: MAC address (6 bytes), IP address (4 bytes) and the SwitchID (8 bytes) of the switch the host connects with. 2) Switch information (9 bytes) includes: the switch ID (8 bytes) and the connection status with the controller (1 byte). 3) Link information includes physical and abstract links. Each intra-area physical link (20 bytes) includes: SrcSwitchID (8 bytes), SrcPort (2 bytes), DstSwitchID (8 bytes), and DstPort (2 bytes). Each inter-area physical link (36 bytes) includes: SrcControllerID (8 bytes), SrcSwitchID (8 bytes), SrcPort (2 bytes), DstControllerID (8 bytes), DstSwitchID (8 bytes) and DstPort (2 bytes). Each abstract link (18 bytes) includes: SrcSwitchID (8 bytes), DstSwitchID (8 bytes) and hop (2 bytes).

The total storage in each area controller covers: intra-area host information, switch information, physical links, abstract links and inter-area physical links. Let Hnum denote the number of hosts in each area, M the number of intra-area switches, and K the total number of edge switches in the N areas (the whole domain). Assume the average degree of the switches in an area is Sdegree, where the degree of a switch is the number of its links to other switches. Then the number of intra-area physical links is Sdegree * M / 2; the number of intra-area abstract links is K * (M − 1) / (2 * N); and the average number of inter-area links per area is Sdegree * K / N. The total storage in the area controller is Hnum * 18 + M * 9 + (Sdegree * M / 2) * 20 + (K * (M − 1) / (2 * N)) * 18 + (Sdegree * K / N) * 36.

The total storage in the domain controller covers: host information, switch information, abstract links in all areas, and inter-area physical links, totaling N * Hnum * 18 + N * M * 9 + (K * (M − 1) / 2) * 18 + (Sdegree * K / 2) * 36. A worked instance of these formulas follows.
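As a worked instance of the storage formulas, the sketch below plugs example topology parameters into the per-record sizes listed above; all parameter values are made up for illustration.

    def area_controller_bytes(Hnum, M, K, N, Sdegree):
        hosts    = Hnum * 18                      # MAC + IP + SwitchID
        switches = M * 9                          # SwitchID + connection status
        phys     = (Sdegree * M / 2) * 20         # intra-area physical links
        abstract = (K * (M - 1) / (2 * N)) * 18   # intra-area abstract links
        inter    = (Sdegree * K / N) * 36         # inter-area physical links
        return hosts + switches + phys + abstract + inter

    def domain_controller_bytes(Hnum, M, K, N, Sdegree):
        return (N * Hnum * 18 + N * M * 9
                + (K * (M - 1) / 2) * 18          # abstract links, all areas
                + (Sdegree * K / 2) * 36)         # inter-area physical links

    # Example: 10 areas of 100 switches, 10 edge switches per area (K = 100),
    # 50 hosts per area, average switch degree 4.
    print(area_controller_bytes(Hnum=50, M=100, K=100, N=10, Sdegree=4))
    print(domain_controller_bytes(Hnum=50, M=100, K=100, N=10, Sdegree=4))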
Communication Cost: There are three kinds of communication cost. 1) Abstract network topology transmission cost: all area controllers send the abstract network topology to the domain controllers, costing K * (M − 1) / 2 abstract links in total. 2) Request cost: the area controller sends an inter-area routing request to the domain controller for each inter-area flow, including the source host IP address (4 bytes) and the destination host IP address (4 bytes). 3) Rule distribution cost: the domain controller distributes the routing path to all the area controllers on the routing path. The routing path message includes a path list: the source host IP address, the destination host IP address and the (IngressSwitchID, EgressSwitchID) for every area the path passes. If LengthRP denotes the length of the routing path and the routing path passes c areas, the distribution cost between the domain controller and the area controllers is c * LengthRP. We use TCP to send requests and distribute rules, with a 20-byte header.

IV. FAST REROUTE

In this section, we design a hierarchical fast reroute method to illustrate how to achieve fast rerouting in Orion's hybrid hierarchical control plane architecture. The modules which support the fast reroute function were introduced in the Fast Reroute Module in Section II.

In a traditional network, the IETF provides a framework called IPFRR [13] for IP fast-reroute mechanisms that protect against link failure by invoking locally determined repair paths. FatTire proposes a new language for writing fault-tolerant network programs [14]. Kamamura et al. [15] propose an IP fast rerouting method and a compression mechanism for it. Borokhovich et al. [16] present a fast failover mechanism for OpenFlow networks which provably guarantees data plane connectivity. However, none of the above studies introduces a way to carry out fast reroute under a logical hierarchical architecture.

A. Intra-Area Fast Reroute

To protect against intra-area single-link failures, the area controller employs Dijkstra's algorithm to calculate the shortest
path for each (SrcSwitch, DstSwitch) pair as the main working path. It also pre-computes the backup path for each (SrcSwitch, DstSwitch) pair by excluding each link in its area. When an intra-area link failure happens, the failed OpenFlow switch sends a link-down status message to the area controller. Through several modules (see the Fast Reroute Module in Section II), the Area Fast Reroute sub-module in the area controller receives the link-down status message. It then sends a flow table request message through the OpenFlow Base Module to the failed switch. When the failed switch receives the message, it sends a reply with the information in its flow table to the Area Fast Reroute sub-module through the OpenFlow Base Module. The Area Fast Reroute sub-module thus gets the (SrcHostIP, DstHostIP) of the flows that pass the failed link. Next, it looks up the backup path of the (SrcSwitch, DstSwitch) pair according to the pre-computed backup routing results, and installs flow entries for these flows, with a higher priority, on all the switches in the backup routing path list. Finally, the data flows are rerouted.

We give an example to illustrate the intra-area fast rerouting method; the topology used in this example is shown in Fig. 9.

Fig. 9. Intra-area link failure example topology.

Firstly, the area controller employs Dijkstra's algorithm to calculate the routing paths for all (SrcSwitch, DstSwitch) pairs as main working paths. Meanwhile, the area controller pre-computes the backup paths for each (SrcSwitch, DstSwitch) pair by excluding every intra-area link. We choose two (SrcSwitch, DstSwitch) pairs which pass link (S1, S2) as an example, (S1, S2) and (S1, S6), shown in Table I.

TABLE I
(S1, S2) MAIN WORKING PATH AND BACKUP PATH

When link (S1, S2) fails, the area controller first checks the flow table of S1, and finds that (HostA, HostB), (HostA, HostC) and (HostA, HostD) pass the failed link (S1, S2). Then the area controller checks the backup paths for the (S1, S2) link
(shown in Table I) and installs entries for the flows that passed the failed link, with higher priority, on the switches in the new path list. Finally, the data flows are rerouted. A sketch of the backup-path pre-computation follows.
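The pre-computation step can be sketched as follows: for every intra-area link, exclude it and recompute the path for every switch pair, so the backup table is ready before any failure occurs. BFS stands in for Dijkstra since intra-area links are unit-weight here, and the six-switch ring below is only a stand-in topology, not necessarily that of Fig. 9.

    from collections import deque
    from itertools import combinations

    def bfs_path(graph, src, dst, banned=None):
        """Shortest hop path avoiding the 'banned' set of links, or None."""
        banned = banned or frozenset()
        prev, queue = {src: None}, deque([src])
        while queue:
            u = queue.popleft()
            if u == dst:
                path = [dst]
                while prev[path[-1]] is not None:
                    path.append(prev[path[-1]])
                return path[::-1]
            for v in graph[u]:
                if v not in prev and frozenset((u, v)) not in banned:
                    prev[v] = u
                    queue.append(v)
        return None

    def precompute_backups(graph):
        links = {frozenset((u, v)) for u in graph for v in graph[u]}
        backups = {}   # (failed link, src, dst) -> backup path
        for link in links:
            for src, dst in combinations(sorted(graph), 2):
                backups[(tuple(sorted(link)), src, dst)] = \
                    bfs_path(graph, src, dst, banned={link})
        return backups

    ring = {"S1": {"S2", "S6"}, "S2": {"S1", "S3"}, "S3": {"S2", "S4"},
            "S4": {"S3", "S5"}, "S5": {"S4", "S6"}, "S6": {"S5", "S1"}}
    print(precompute_backups(ring)[(("S1", "S2"), "S1", "S2")])
    # -> ['S1', 'S6', 'S5', 'S4', 'S3', 'S2'] once link S1-S2 is excluded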
B. Inter-Area Fast Reroute

To protect against inter-area single-link failures in the hierarchical fast reroute method, the top domain layer abstracts each area as a node, and regards the exports of the area as the ports of the node to carry out inter-area fast reroute. The method mirrors the intra-area fast reroute method. The domain controller employs Dijkstra's algorithm to calculate the routing paths for all node pairs between the edge switches of two different areas as main working paths. To protect against inter-area single-link failures, it also pre-computes the backup path for all inter-area links based on the network graph over all areas' edge switches, Gedge = (V0,j,edge, AbsE0,j,edge), by excluding every inter-area link. When an inter-area link failure happens, the failed edge switch sends a link-down status message to the area controller. As in the intra-area fast reroute procedure, the Area Fast Reroute sub-module gets the (SrcHostIP, DstHostIP) pairs of the flows that pass the failed link. It then sends the IP pairs of the flows to the domain controller. Next, the domain controller checks the backup path for the failed link and gets the backup edge-switch path list. Further, it sends the backup routing path list to the corresponding area controllers that own these backup edge switches. When an area controller receives the backup routing path list, it calculates the intra-area routing path and installs new flow entries for the rerouted flows with higher priority. Finally, the data flows are rerouted. The procedure for both area and domain controllers is shown in Algorithm 6.

Algorithm 6 Inter-area fast reroute algorithm
Input: Source host IP, destination host IP, global edge switch topology Gedge, abstract link sets of all areas AbsE0,j, and area edge switch sets of all areas V0,j,edge
Output: Routing path list across multiple areas
 1: Process 1: Area Controller
 2: Begin
 3:   faultSwitchId, faultPort = GetMySwitchPort(downLink)
 4:   flowtableSet = GetAffectedFlows(faultSwitchId, faultPort)
 5:   IpPairSet = GetMatch(flowtableSet)
 6:   If HasBackupLink(downLink)
 7:     alterLink = GetBackupLink(downLink)
 8:     alterSwitchPort = GetMySwitchPort(alterLink)
 9:     extraPath = GetPath(faultSwitchId, alterSwitchPort)
10:     InstallFlows(IpPairSet, extraPath, priority)
11:   Else
12:     message = DownLinkMessage(downLink, IpPairSet)
13:     SendMessageToDomain(message)
14:   EndIf
15:   /* remove affected flow tables */
16:   RemoveFlows(faultSwitchId, flowtableSet)
17: End
18: Process 2: Domain Controller
19: Begin
20:   /* get backup path */
21:   srcSwitchPort, dstSwitchPort = GetPorts(downLink)
22:   backupPath = GetBackupPath(srcSwitchPort, dstSwitchPort)
23:   /* install extra path with higher priority */
24:   areaControllerIDSet = GetThroughController(backupPath)
25:   message = InstallPathMessage(IpPairSet, backupPath, priority)
26:   SendMessageToArea(areaControllerIDSet, message)
27: End

We give an example to illustrate the inter-area fast reroute method, using the topology in Fig. 10.

Fig. 10. Inter-area link failure example topology.

First, the domain controller employs Dijkstra's algorithm to calculate the routing paths for all node pairs between the edge switches of two different areas as main working paths. Table II shows the inter-area main working path list of S1 (from S1 to the other areas' edge switches).

TABLE II
MAIN WORKING PATH LIST OF S1

Second, in order to protect against inter-area link failure, the domain controller pre-computes the backup paths for all edge-switch pairs of two different areas by excluding every inter-area link. Table III shows the backup paths of the switch pairs between S1 and the other areas' edge switches for the inter-area link (S1, S4).

TABLE III
BACKUP PATH OF S1 FOR THE LINK (S1, S4)

When link (S1, S4) fails, the edge OpenFlow switch of area 1 sends a link-down status message to the area 1 controller. Third, area controller 1 checks the flow table of S1 and gets the (SrcHostIP, DstHostIP) pairs passing that link. In this example,
the area controller finds (HostA, HostB) passing the failed link (S1, S4). Then it sends the host IP pair to the domain controller. When the domain controller receives the IP pair, it checks the source switch and the destination switch which connect with the host IP pair, finding that S1 connects with host A and S6 connects with host B. Fourth, the domain controller checks the backup path for (S1, S4) and gets a new edge-switch path list (S1, S3), (S3, S9), (S9, S7), (S7, S6). Next, it sends the backup routing path list to the corresponding area controllers. Each area controller calculates the intra-area routing path and installs new flow entries for host A and host B with higher priority on the switches in the routing path list. Finally, the data flow from host A to host B is rerouted.

The hierarchical fast reroute method only considers single-link failures, because the backup paths are only used for a short time and protecting against multiple failures costs too much for the control plane. However, in virtualized network scenarios, a single physical link failure may result in multiple logical link failures. There are already some solutions to this problem. SRLG [17] provides a fast reroute bypass that minimizes the probability of fate sharing with the main working path. Xu et al. [18] introduce a fast and efficient trap avoidance algorithm for a layered network.
Fig. 11. Theoretical comparison of plain routing and AHRM.
V. IMPLEMENTATION AND EVALUATION

In this section, we present the implementation of Orion, and evaluate it both theoretically and experimentally.

A. Theoretical Evaluation

1) Computational Complexity: In order to distinguish among different algorithms, we call the Dijkstra algorithm "plain routing." We wrote a simple single-threaded plain routing algorithm to calculate the path from the source address to the destination address. The algorithm runs on a random topology. With N nodes in the topology, we estimate the number of edges as N^x, where x depends on the topology. We run the algorithm on a server with Intel E5645 processors (6 cores in total, 2.40 GHz) and 64 GB memory. We compare the proposed abstracted hierarchical routing method with the plain routing algorithm. The computational complexity of the abstracted hierarchical routing method is O(K^2), where K is the number of edge switches. In this experiment, we assume that the number of edge switches accounts for 10%, 20%, and 50% of all switches in the domain, and we choose x = 1.25 and x = 1.15. The total number of switches N ranges from 0 to 3000 and the number of switches in an area M is 100. The number of
Fig. 12. BAPRA’s path stretch CDF with different number of areas under 2-layer control plane.
switches used in this experiment follows the statistics reported by Sherry et al. [19]. From Fig. 11, we can observe that as the number of areas increases, the computing time of Orion grows slowly, much more slowly than that of the plain routing algorithm.

2) Path Stretch: We wrote a basic abstract partitioning routing algorithm (BAPRA) to simulate the most commonly used partitioning routing approach in a centralized logical hierarchical control plane. Here, all controllers are organized as a tree; the root controller does not know the detailed inner link situation of the lower controllers, having only the lower controllers' abstracted network views. The root controller calculates the inter-area shortest routing path based on Dijkstra's algorithm, while each lower controller also employs Dijkstra's algorithm to calculate the intra-area shortest path from the ingress switch to the egress switch. We calculate the stretch of every path based on a random topology and obtain the path stretch between any two nodes in the random topology based on BAPRA. We carry out three experiments, with a 2-layer control plane and a 3-layer control plane. A sketch of this measurement appears below.
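To illustrate how such a path-stretch CDF can be produced, the sketch below computes the stretch of a BAPRA-style route on a toy grid partitioned into four areas. It mimics BAPRA only in spirit: the area-level path is chosen on an unweighted area graph, and a single cut link per area pair is used. NetworkX and all names are our own choices, not the evaluation code used in the paper.

    import networkx as nx   # assumption: NetworkX is available

    def bapra_route_length(G, area_of, s, t):
        """Hop count of a BAPRA-style route: pick an area-level path first,
        then stitch shortest intra-area ingress->egress segments. Assumes
        each area's subgraph is connected (Assumption 2 in the text)."""
        A = nx.Graph()
        for u, v in G.edges():
            if area_of[u] != area_of[v]:
                A.add_edge(area_of[u], area_of[v], cut=(u, v))   # keep one cut link
        area_path = nx.shortest_path(A, area_of[s], area_of[t])
        hops, here = 0, s
        for a, b in zip(area_path, area_path[1:]):
            u, v = A[a][b]["cut"]
            if area_of[u] != a:
                u, v = v, u
            sub = G.subgraph(n for n in G if area_of[n] == a)
            hops += nx.shortest_path_length(sub, here, u) + 1    # +1 for the cut link
            here = v
        sub = G.subgraph(n for n in G if area_of[n] == area_of[t])
        return hops + nx.shortest_path_length(sub, here, t)

    G = nx.grid_2d_graph(8, 8)                                   # 64 switches
    area_of = {node: (node[0] // 4, node[1] // 4) for node in G} # four 4x4 areas
    s, t = (0, 0), (7, 0)
    stretch = bapra_route_length(G, area_of, s, t) / nx.shortest_path_length(G, s, t)
    print(stretch)   # >= 1.0; collecting this over many (s, t) pairs yields the CDF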
Fig. 13. BAPRA’s path stretch CDF with different number of switches under 2-layer control plane.
TABLE IV
AVERAGE PATH STRETCH OF BAPRA
Fig. 14. BAPRA’s path stretch CDF under 3-layer control plane.
The results for the 2-layer control plane are shown in Figs. 12 and 13. In Fig. 12, we compare the cumulative distribution function of the path stretch for different numbers of areas. In this example, there are 100 switches in each area, and the random degree of the switches in each area is selected with mean 22. The intra-domain switch degree follows the statistics of CAIDA [20]. The number of links in the random topology is the average switch degree multiplied by the number of switches, divided by 2. From Fig. 12, we can observe that the path stretch increases with the number of areas in the topology. In Fig. 13, we compare the cumulative distribution function of the path stretch for different numbers of switches in each area. In this example, there are 9 areas, and the random degree of the switches in each area is selected around 22. Fig. 13 shows that the path stretch increases with the number of switches in each area. In Table IV, we give the average path stretch results computed with different parameters. The path stretch increases with the number of areas, and, with a fixed number of areas (9), it also increases with the number of switches.

Secondly, we experiment on a 3-layer control plane, with one root controller in the top layer (layer 3). The number of middle-layer controllers ranges from 3 to 5. Each middle controller has two children in the bottom layer, each of which connects 200 switches. The random degree of the switches in each area is selected with mean 22. From Fig. 14, we can observe that the path stretch increases with the number of middle controllers (layer 2). Further, we compare the BAPRA and AHRM algorithms in terms of the largest path hop count with 3, 4, and 5 middle controllers. The largest hop values for BAPRA are 19, 26, and 32, whereas AHRM reduces them to 5, 5, and 5. From these results, we can see that the AHRM algorithm can effectively mitigate the path stretch problem.

B. Experimental Evaluation
B. Experimental Evaluation
In this part, we used Java to build a prototype system to verify the feasibility and effectiveness of Orion. Substantial parts of the area controller (the OpenFlow Base module, the intra-area part of the Link Discovery module, and the Storage module) were built on the Floodlight controller [21], which uses OpenFlow. The domain controller, however, does not use OpenFlow to communicate with the area controllers, which reduces the communication cost. We run Orion on a server with two Intel E5-2650 processors (16 cores in total, 2.00 GHz) and 128 GB of memory. Through virtualization, the server is divided into multiple virtual controllers; every controller in Orion has 2 cores and 8 GB of memory. We use Mininet [23] to simulate the data plane.
In the first experiment, we tested the flow set-up rate of a single Orion area controller. In this test, we simulated 200 switches in an area with a random network topology. The single area controller can handle 8114 new flows per second. Fernandez [22] tested the performance of the Floodlight controller with 200 switches in an area (also simulated by Mininet); in that test, Floodlight handles about 7500 new flows per second, so the flow set-up rate of Orion's area controller is slightly better.
In the second experiment, we tested the performance of Orion with multiple controllers.
Fig. 15. Flow set-up rate of Orion.
Fig. 16. The average delay time of Orion.
Because the data plane simulator Mininet uses a real kernel and real switch code to create the virtual network, it consumes substantial resources. Due to hardware and bandwidth limitations, we started only two domain controllers and six area controllers. In each area, we simulated 120 switches with a random network topology, and we increased the number of areas to test the flow set-up rate. The results with multiple controllers are shown in Fig. 15. Further, we calculated the flow set-up rate $R_i$ of each area controller, the average flow set-up rate

$$\mathit{AvgR} = \frac{1}{6}\sum_{i=1}^{6} R_i,$$

and the standard deviation

$$\mathit{Deviation} = \sqrt{\frac{1}{6}\sum_{i=1}^{6} \left(R_i - \mathit{AvgR}\right)^2} = 161.8.$$

Since the base flow set-up rate is 8126, this deviation shows that as the network scales, the set-up rate of each controller varies very little. Thus, with an increasing number of areas, the average flow set-up rate of the control plane remains stable, and the overall flow set-up rate of Orion is scalable.
In the third experiment, we tested the delay between areas. The number of switches in each area (M) ranged from 20 to 120, the number of domain controllers from 1 to 2, and the number of areas from 2 to 6. From Fig. 16, we can see that the delay gradually increases with the number of areas.
In the fourth experiment, we tested the fast reroute time when a link failure occurs on an intra-area link. As the open-source Floodlight controller has no existing fast reroute function, we extended it with an Area Fast Reroute sub-module to support intra-area single-link failures. When an intra-area link fails, the area controller looks up the backup path for the failed link and installs new flow entries with higher priority on the switches along the new routing path. In this experiment, there are 50 switches in the area; with 3 hops from the source host to the destination host, rerouting the data flow costs 8.6 ms. A sketch of this failover logic follows.
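A minimal sketch of that failover logic, with a hypothetical AreaFastReroute class and hypothetical switch names (Floodlight's actual extension API is not shown here):

```python
from dataclasses import dataclass, field

@dataclass
class AreaFastReroute:
    """Conceptual model of the Area Fast Reroute sub-module."""
    backup_paths: dict = field(default_factory=dict)   # failed link -> switch path
    installed: list = field(default_factory=list)      # log of emitted flow-mods

    def precompute(self, link, path):
        self.backup_paths[link] = path

    def on_link_down(self, link):
        path = self.backup_paths.get(link)
        if path is None:
            return False    # no local protection; escalate to the domain controller
        for switch in path:
            # Higher-priority entries shadow the stale ones, so the switchover
            # does not wait for the old entries to be removed.
            self.installed.append((switch, "flow-mod", "priority=HIGH"))
        return True

frr = AreaFastReroute()
frr.precompute(("s1", "s2"), ["s1", "s4", "s5", "s2"])
assert frr.on_link_down(("s1", "s2"))
```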
In the fifth experiment, we tested the fast reroute time when a link failure happens on an inter-area link. To support inter-area single-link failures, we added a Domain Fast Reroute sub-module to the domain controller. In this experiment, there are three areas, each with 50 switches. When an inter-area link fails, the area controller sends a message to the domain controller, which calculates the inter-area fast reroute path. We monitored the reaction time from the moment the area controller sends the message to the moment it receives the fast reroute result: 1.26 ms. The three area controllers then install flow entries on the switches in their areas; the total rerouting time to restore the original data flow across the three areas is 31.5 ms. Kempf et al. [24] propose a scalable fault management method in which the failure protection time between two ISP edge switches is 28 ms. Because Orion's architecture is hierarchical, there is transfer delay between the control plane layers; although Orion's inter-area fast reroute time is a little slower than 28 ms, we consider the result acceptable.
VI. DISCUSSION ON DEPLOYMENT
Orion targets large-scale intra-domain WAN networks, in which the whole network is split into areas according to the ISP's needs. The number of switches in a network is generally fixed, so when deploying Orion on a WAN, reducing propagation latency becomes critically important. Several solutions already address where to place controllers and how many are needed: Heller et al. [25] design a controller placement solution that minimizes propagation delays; Hock et al. [26] present a resilient Pareto-based optimal controller-placement method; and Lange et al. [27] present a framework for Pareto-optimal controller placement with respect to different performance metrics. Our next step is to deploy Orion on the China Education and Research Network (CERNET), a large-scale intra-domain network spanning 36 cities and managed by a single ISP; its geographic range is about 5500 kilometers north-south and 5000 kilometers east-west. CERNET has 5 super core nodes and 31 independent core nodes connected to them. We plan to deploy the domain controllers on the super core nodes and the area controllers on the other nodes. As the 31 independent core nodes are adjacent to multiple super core nodes, the domain controller each one attaches to can be chosen according to the propagation latencies, as sketched below.
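A minimal sketch of that latency-based assignment, with made-up node names and latencies (CERNET's real measurements are not reproduced here):

```python
# Hypothetical propagation latencies (ms) from independent core nodes to the
# super core nodes that host domain controllers.
latency_ms = {
    "core-A": {"super-1": 12.5, "super-2": 30.1},
    "core-B": {"super-2": 8.3, "super-3": 9.0, "super-5": 25.7},
}

def assign_domain_controller(node):
    """Attach an area controller to its lowest-latency adjacent domain controller."""
    candidates = latency_ms[node]
    return min(candidates, key=candidates.get)

for node in latency_ms:
    print(node, "->", assign_domain_controller(node))
```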
VII. RELATED WORK
Many researchers have addressed the scalability issue of SDN. Maestro exploits parallelism within a single controller [28]. Beacon employs multi-threaded techniques to improve the scalability of a single controller [29]. However, neither supports communication among multiple controllers. DevoFlow observes that the SDN controller handles too many micro-flows, which places excessive load on the controller and the switches [4]; it therefore proposes a way for the control plane to maintain a useful amount of visibility without imposing unnecessary costs. DIFANE employs authority switches to store the necessary rules and share the work load of the control plane [30]. However, both DevoFlow and DIFANE require modifications to the OpenFlow switch.
Some researchers design different control plane structures to extend the control plane's processing ability. On the one hand, some studies construct a flat control plane architecture: HyperFlow [3] presents a distributed event-based control plane for OpenFlow; Onix develops a distributed system that runs on a cluster of one or more physical servers [6]; ONOS is an experimental open-source distributed SDN OS providing a scale-out SDN control plane [7]. However, none of these solves the problem of computational complexity scalability. On the other hand, some studies, such as Kandoo and Logical xBar, build a centralized hierarchical control plane for SDN. Kandoo builds a two-layer hierarchical architecture, in which the bottom-layer controllers run local control applications with a local network view, and the top-layer controller runs global applications with a global network-wide view [8]. Logical xBar further introduces a recursive building block to construct logical hierarchical SDN networks [5]. ElastiCon proposes an elastic architecture in which the controller pool is dynamically grown or shrunk according to given threshold values [31]. However, neither Kandoo nor ElastiCon solves the computational scalability problem, while Logical xBar mitigates it without addressing path stretch. Ahmed and Boutaba [32] propose a mechanism to aggregate network topologies and to compute the route between pairs of nodes in a hierarchical SDN-based architecture. Unlike their proposal, the domain controller in this paper can calculate the global shortest path whether there is a single inter-area link or multiple ones.
VIII. CONCLUSION
In this paper, we design and implement Orion, a hybrid hierarchical control plane for large-scale networks. Orion can effectively reduce the computational complexity of a flow-based SDN control plane by several orders of magnitude and solves the path stretch problem introduced by the logical hierarchical control plane architecture. Further, we evaluate the effectiveness of Orion both theoretically and experimentally. Our results show the efficiency and feasibility of Orion.
ACKNOWLEDGMENT
We would like to thank Dr. John Davy of Peking University Health Science Center for his help in improving the English.
REFERENCES
[1] ONF White Paper, Software-Defined Networking: The New Norm for Networks, Open Networking Foundation, Palo Alto, CA, USA, 2012.
[2] N. McKeown et al., "OpenFlow: Enabling innovation in campus networks," ACM SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, pp. 69-74, Apr. 2008.
[3] A. Tootoonchian and Y. Ganjali, "HyperFlow: A distributed control plane for OpenFlow," in Proc. ACM INM/WREN, 2010, pp. 1-6.
[4] A. R. Curtis et al., "DevoFlow: Scaling flow management for high-performance networks," in Proc. ACM SIGCOMM, 2011, pp. 254-265.
[5] J. McCauley, A. Panda, M. Casado, T. Koponen, and S. Shenker, "Extending SDN to large-scale networks," in Proc. ONS, 2013, pp. 1-2.
[6] T. Koponen et al., "Onix: A distributed control platform for large-scale production networks," in Proc. OSDI, 2010, pp. 1-6.
[7] B. Lantz et al., "ONOS: Towards an open, distributed SDN OS," in Proc. ACM SIGCOMM HotSDN, 2014, pp. 1-6.
[8] S. H. Yeganeh and Y. Ganjali, "Kandoo: A framework for efficient and scalable offloading of control applications," in Proc. ACM SIGCOMM HotSDN, 2012, pp. 19-24.
[9] OpenFlow Specification v1.2, ONF, Palo Alto, CA, USA, Dec. 2011.
[10] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numer. Math., vol. 1, no. 1, pp. 269-271, 1959.
[11] IS-IS Protocol Specification, IETF RFC 1142. [Online]. Available: http://tools.ietf.org/html/rfc1142
[12] OSPF Version 2, IETF RFC 2328. [Online]. Available: https://tools.ietf.org/html/rfc2328
[13] M. Shand and S. Bryant, "IP fast reroute framework," IETF RFC 5714, 2010.
[14] M. Reitblatt, M. Canini, A. Guha, and N. Foster, "FatTire: Declarative fault tolerance for software-defined networks," in Proc. ACM SIGCOMM HotSDN, 2013, pp. 109-114.
[15] S. Kamamura, D. Shimazaki, A. Hiramatsu, and H. Nakazato, "Autonomous IP fast rerouting with compressed backup flow entries using OpenFlow," IEICE Trans. Inf. Syst., vol. E96-D, no. 2, pp. 184-192, Feb. 2013.
[16] M. Borokhovich, L. Schiff, and S. Schmid, "Provable data plane connectivity with local fast failover: Introducing OpenFlow graph algorithms," in Proc. ACM SIGCOMM HotSDN, 2014, pp. 121-126.
[17] Shared Risk Link Groups Encoding and Processing (SRLG). [Online]. Available: http://tools.ietf.org/html/draft-papadimitriou-ccamp-srlg-processing-01
[18] D. H. Xu, Y. Z. Xiong, C. M. Qiao, and G. Z. Li, "Failure protection in layered networks with shared risk link groups," IEEE Netw., vol. 18, no. 3, pp. 36-41, May/Jun. 2004.
[19] J. Sherry, S. Hasan, and C. Scott, "Making middleboxes someone else's problem: Network processing as a cloud service," in Proc. ACM SIGCOMM, 2012, pp. 13-24.
[20] CAIDA. [Online]. Available: http://www.caida.org/research/topology/generator/
[21] Floodlight. [Online]. Available: http://www.projectfloodlight.org/floodlight/
[22] M. P. Fernandez, "Evaluating OpenFlow controller paradigms," in Proc. 12th ICN, 2013, pp. 151-157.
[23] B. Lantz, B. Heller, and N. McKeown, "A network in a laptop: Rapid prototyping for software-defined networks," in Proc. ACM SIGCOMM HotNets Workshop, 2010, pp. 1-6.
[24] J. Kempf et al., "Scalable fault management for OpenFlow," in Proc. IEEE ICC, 2012, pp. 6606-6610.
[25] B. Heller, R. Sherwood, and N. McKeown, "The controller placement problem," in Proc. ACM SIGCOMM HotSDN, 2012, pp. 7-12.
[26] D. Hock et al., "Pareto-optimal resilient controller placement in SDN-based core networks," in Proc. ITC, 2013, pp. 1-9.
[27] S. Lange et al., "Heuristic approaches to the controller placement problem in large scale SDN networks," IEEE Trans. Netw. Serv. Manage., vol. 12, no. 1, pp. 4-17, Mar. 2015.
[28] Z. Cai, A. L. Cox, and T. S. E. Ng, "Maestro: A system for scalable OpenFlow control," Rice Univ., Houston, TX, USA, Tech. Rep., 2010.
[29] D. Erickson, "The Beacon OpenFlow controller," in Proc. ACM SIGCOMM HotSDN, 2013, pp. 13-18.
[30] M. Yu, J. Rexford, M. J. Freedman, and J. Wang, "Scalable flow-based networking with DIFANE," in Proc. ACM SIGCOMM, 2010, pp. 351-362.
[31] A. Dixit, F. Hao, S. Mukherjee, T. V. Lakshman, and R. Kompella, "Towards an elastic distributed SDN controller," in Proc. ACM SIGCOMM HotSDN, 2013, pp. 7-12.
[32] R. Ahmed and R. Boutaba, "Design considerations for managing wide area software defined networks," IEEE Commun. Mag., vol. 52, no. 7, pp. 116-123, Jul. 2014.
Yonghong Fu received the M.S. degree in computer science from Harbin Engineering University, Harbin, China. She is currently a Ph.D. candidate in computer science with Tsinghua University, doing research in the Network Architecture Laboratory, Institute for Network Sciences and Cyberspace. Her research is focused on the scalability of software-defined networks, network architecture design, distributed systems, routing scalability, queuing theory, and system modeling.
Jun Bi (M'00-SM'14) received the B.S., M.S., and Ph.D. degrees in computer science from Tsinghua University, Beijing, China. He was a Research Scientist with the Communications Sciences Division and the Advanced Communications Technology Center, Bell Laboratories, USA. Currently, he is a Full Professor and the Director of the Network Architecture Research Division, Institute for Network Sciences and Cyberspace, Tsinghua University, and a Key Member of Tsinghua National Laboratory for Information Science and Technology (TNList). He has successfully led many government-supported or international collaboration research projects and published more than 100 research papers and 20 Internet RFCs or drafts (four of them approved). His research interests include Internet architecture and protocols, future Internet (SDN and NDN), Internet routing, and source address validation and traceback. He is a Senior Member of the ACM and a Distinguished Member of the China Computer Federation. He is a Cochair of the AsiaFI Steering Group and a Cofounder of the China SDN Commission, where he serves as the Executive Chair. He served as a Cochair of workshops/tracks at INFOCOM, ICNP, Mobihoc, ICCCN, etc., and served on the organization or technical program committees at SIGCOMM, ICNP, CoNEXT, SOSR/HotSDN, etc. He was a recipient of national science and technology advancement prizes.
Ze Chen received the B.S. degree in computer science from Tsinghua University, Beijing, China, where he is currently working toward the master's degree with the Institute for Network Sciences and Cyberspace. His research is focused on the scalability of software-defined networks, distributed systems, routing scalability, and network programmability.
Kai Gao received the B.S. degree in computer science from Tsinghua University, Beijing, China, where he is currently working toward the Ph.D. degree with the Institute for Network Sciences and Cyberspace. He is interested in several topics in the field of software-defined networking such as automated management and virtualization, most of which are focused on improving the simplicity, flexibility, and performance of the network.
Baobao Zhang is currently a Ph.D. candidate with the Department of Computer Science and Technology, Tsinghua University, Beijing, China. His research interests include network architectures, traffic class routing, 2-D routing, traffic engineering, routing scalability, failure recovery, source address validation, Internet measurement, and large-scale network addressing.
Guangxu Chen is currently working toward the B.S. degree in computer science with Beihang University, Beijing, China. This work was carried out during his internship with Tsinghua University. His research interests include software-defined networking scalability, distributed systems, traffic engineering, routing scalability, network programmability, and WLAN-based location systems.
Jianping Wu (F'12) received the B.S., M.S., and Ph.D. degrees from Tsinghua University, Beijing, China. He is currently a Full Professor, the Director of the Network Research Center, and a Ph.D. Supervisor with the Department of Computer Science and Technology, Tsinghua University. Since 1994, he has been in charge of the China Education and Research Network (CERNET). His research interests include the next-generation Internet, IPv6 deployment and technologies, and Internet protocol design and engineering.