Overview and Design of VXLAN
Engr. Syed Measum Haider Naqvi ([email protected])
Senior Network Design Engineer (SCADA and Telecommunication Section), Power & Mechanical Division, NESPAK (Pvt.) Limited, Lahore
Abstract
Server virtualization has placed increased demands on the physical network infrastructure. A physical server now hosts multiple virtual machines (VMs), each with its own Media Access Control (MAC) address. This requires larger MAC address tables in the switched Ethernet network, because hundreds of thousands of VMs may be attached to it and communicate across it. Data centers are often required to host multiple tenants, each with its own isolated network domain. Since it is not economical to realize this with dedicated infrastructure, network administrators opt to implement isolation over a shared network. In such scenarios, a common problem is that each tenant may independently assign MAC addresses and VLAN IDs, leading to potential duplication of these on the physical network.
What is VXLAN?
VXLAN is designed to provide the same Ethernet Layer 2 network services as VLAN. Compared to VLAN, VXLAN offers the following benefits:
1. Flexible placement of multitenant segments throughout the data center
2. Higher scalability to address more Layer 2 segments
3. Better utilization of available network paths in the underlying infrastructure
Flexible placement of multitenant segments throughout the data center:
VXLAN provides a way to extend Layer 2 segments over the underlying shared network infrastructure, so that tenant workloads can be placed across physical pods in the data center.
Higher scalability to address more Layer 2 segments:
VLAN uses a 12-bit VLAN ID to address Layer 2 segments, which limits scalability to 4094 usable VLANs. VXLAN uses a 24-bit segment ID known as the VXLAN network identifier (VNID, also written VNI), which enables up to 16 million VXLAN segments to coexist in the same administrative domain.
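The two figures quoted above follow directly from the width of each identifier. The short calculation below is only an illustration; the variable names are made up for this example, and the subtraction of two VLAN IDs reflects the values 0 and 4095 that IEEE 802.1Q reserves.

```python
# Illustrative arithmetic: ID space of a 12-bit VLAN ID versus a 24-bit VXLAN segment ID.
VLAN_ID_BITS = 12
VXLAN_ID_BITS = 24

vlan_values = 2 ** VLAN_ID_BITS       # every possible 12-bit pattern
usable_vlans = vlan_values - 2        # IDs 0 and 4095 are reserved by IEEE 802.1Q
vxlan_segments = 2 ** VXLAN_ID_BITS   # every possible 24-bit pattern

print(usable_vlans)                   # 4094
print(vxlan_segments)                 # 16777216, the "16 million" in the text
print(vxlan_segments // vlan_values)  # 4096 times as many identifiers
```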
Better utilization of available network paths in the underlying infrastructure:
VLAN uses the Spanning Tree Protocol for loop prevention, which blocks redundant paths and can leave up to half of the links in a network unused. In contrast, VXLAN packets are forwarded through the underlying network based on their Layer 3 header and can take full advantage of Layer 3 routing, equal-cost multipath (ECMP) routing, and link aggregation protocols to use all available paths.
How Does VXLAN Work?
VXLAN is often described as an overlay technology because it allows you to stretch Layer 2 connections over an intervening Layer 3 network by encapsulating (tunneling) Ethernet frames in a VXLAN packet that includes IP addresses. Devices that support VXLAN are called VXLAN tunnel endpoints (VTEPs); they can be end hosts, network switches, or routers. VTEPs encapsulate traffic when it enters the VXLAN tunnel and de-encapsulate it when it leaves the tunnel. To encapsulate an Ethernet frame, VTEPs add a number of fields, including the following:
1. Outer MAC destination address (MAC address of the tunnel endpoint VTEP, or of the next-hop device when the two VTEPs are not on the same Layer 2 segment)
2. Outer MAC source address (MAC address of the tunnel source VTEP)
3. Outer IP destination address (IP address of the tunnel endpoint VTEP)
4. Outer IP source address (IP address of the tunnel source VTEP)
5. Outer UDP header
VXLAN Encapsulation and Packet Format
VXLAN Header: This is an 8-byte field that has:
Flags (8 bits): The I flag MUST be set to 1 for a valid VXLAN Network ID (VNI). The other 7 bits (designated "R") are reserved fields and MUST be set to zero on transmission and ignored on receipt.
VXLAN Segment ID/VXLAN Network Identifier (VNI): This is a 24-bit value used to designate the individual VXLAN overlay network on which the communicating VMs are situated. VMs in different VXLAN overlay networks cannot communicate with each other.
Reserved fields (24 bits and 8 bits): MUST be set to zero on transmission and ignored on receipt.
Outer UDP Header: This is the outer UDP header with a source port provided by the VTEP and the destination port being a well-known UDP port.
Destination Port: IANA has assigned the value 4789 for the VXLAN UDP port, and this value SHOULD be used by default as the destination UDP port. Some early implementations of VXLAN have used other values for the destination port. To enable interoperability with these implementations, the destination port SHOULD be configurable.
Source Port: It is recommended that the UDP source port number be calculated using a hash of fields from the inner packet, one example being a hash of the inner Ethernet frame's headers. This provides a level of entropy for the ECMP/load balancing of the VM-to-VM traffic across the VXLAN overlay. When calculating the UDP source port number in this manner, it is RECOMMENDED that the value be in the dynamic/private port range 49152-65535.
UDP Checksum: It SHOULD be transmitted as zero. When a packet is received with a UDP checksum of zero, it MUST be accepted for decapsulation. Optionally, if the encapsulating end point includes a non-zero UDP checksum, it MUST be correctly calculated across the entire packet, including the IP header, UDP header, VXLAN header, and encapsulated MAC frame. When a decapsulating end point receives a packet with a non-zero checksum, it MAY choose to verify the checksum value. If it chooses to perform such verification, and the verification fails, the packet MUST be dropped. If the decapsulating destination chooses not to perform the verification, or performs it successfully, the packet MUST be accepted for decapsulation.
Outer IP Header: This is the outer IP header with the source IP address indicating the IP address of the VTEP over which the communicating VM (as represented by the inner source MAC address) is running. The destination IP address can be a unicast or multicast IP address (see Sections 4.1 and 4.2 of [RFC7348]). When it is a unicast IP address, it represents the IP address of the VTEP connecting the communicating VM as represented by the inner destination MAC address.
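The header layout and the source port recommendation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, and the use of CRC-32 over the first 14 bytes of the inner frame (destination MAC, source MAC, EtherType) is an assumption chosen for the example, since the text only asks for some hash of inner packet fields.

```python
import struct
import zlib

VXLAN_UDP_PORT = 4789   # IANA-assigned destination port
VXLAN_FLAG_I = 0x08     # position of the I flag inside the 8-bit flags field
PORT_RANGE_START = 49152
PORT_RANGE_END = 65535


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits set to zero.
    # Word 2: VNI (24 bits) followed by 8 reserved bits set to zero.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> int:
    """Return the VNI from a VXLAN header, ignoring the reserved bits."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes long")
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag is not set, so the VNI is not valid")
    return word2 >> 8


def udp_source_port(inner_frame: bytes) -> int:
    """Derive a source port in 49152-65535 from the inner Ethernet header.

    CRC-32 is an assumption made for this sketch; any hash that spreads
    flows evenly would serve the same purpose.
    """
    digest = zlib.crc32(inner_frame[:14])
    return PORT_RANGE_START + digest % (PORT_RANGE_END - PORT_RANGE_START + 1)


print(build_vxlan_header(10).hex())                    # 0800000000000a00
print(parse_vxlan_header(build_vxlan_header(10)))      # 10
```

Because the port depends only on the inner headers, all frames of one VM-to-VM flow map to the same source port and therefore follow the same ECMP path, while different flows are likely to be spread across paths.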
VXLAN tunnels a Layer 2 network over a Layer 3 network. The VXLAN packet format is shown in this Figure.
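As a textual companion to the packet format, the sketch below lists the outer headers in the order they appear on the wire and computes the byte offset of each. It assumes an untagged outer Ethernet header and an outer IPv4 header without options; an outer VLAN tag or an IPv6 transport would change the numbers. The list and function names are hypothetical.

```python
# Outer headers in wire order, with sizes in bytes.
# Assumptions: untagged outer Ethernet, IPv4 without options, frame check sequence not counted.
OUTER_HEADERS = [
    ("Outer Ethernet header", 14),
    ("Outer IPv4 header", 20),
    ("Outer UDP header", 8),
    ("VXLAN header", 8),
]


def layout(headers):
    """Return (offset, size, name) rows and the total size of all headers."""
    offset = 0
    rows = []
    for name, size in headers:
        rows.append((offset, size, name))
        offset += size
    return rows, offset


rows, overhead = layout(OUTER_HEADERS)
for offset, size, name in rows:
    print(f"byte {offset:2d}: {name} ({size} bytes)")
print(f"byte {overhead}: original (inner) Ethernet frame")
```

Under these assumptions the original frame begins 50 bytes into the encapsulated packet.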
VXLAN Tunnel Endpoint
VXLAN uses VXLAN tunnel endpoint (VTEP) devices to map tenants' end devices to VXLAN segments and to perform VXLAN encapsulation and de-encapsulation. Each VTEP function has two interfaces: one is a switch interface on the local LAN segment, which supports local endpoint communication through bridging, and the other is an IP interface to the transport IP network. The IP interface has a unique IP address that identifies the VTEP device on the transport IP network known as the infrastructure VLAN. The VTEP device uses this IP address to encapsulate Ethernet frames and transmits the encapsulated packets to the transport network through the IP interface. A VTEP device also discovers the remote VTEPs for its VXLAN segments and learns remote MAC address-to-VTEP mappings through its IP interface. The functional components of VTEPs and the logical topology that is created for Layer 2 connectivity across the transport IP network are shown in this Figure.
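The remote MAC address-to-VTEP mappings mentioned above can be pictured as a small lookup table. The sketch below is a simplified model under stated assumptions: the class and method names are hypothetical, learning is done from received packets by pairing the inner source MAC address with the outer source IP address (the data-plane learning approach described in RFC 7348), and the table is keyed by VNI as well as MAC address so that tenants who reuse the same MAC address do not collide. The addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Vtep:
    """Toy model of the mapping state a VTEP keeps (hypothetical names)."""
    ip: str
    # (VNI, MAC address) -> IP address of the remote VTEP behind which that MAC sits
    mac_to_vtep: dict = field(default_factory=dict)

    def learn(self, vni: int, inner_src_mac: str, outer_src_ip: str) -> None:
        """Record which remote VTEP a MAC address was last seen behind."""
        self.mac_to_vtep[(vni, inner_src_mac)] = outer_src_ip

    def lookup(self, vni: int, dst_mac: str):
        """Return the remote VTEP IP for a destination MAC, or None if unknown."""
        return self.mac_to_vtep.get((vni, dst_mac))


vtep_1 = Vtep(ip="192.0.2.1")
# A packet for segment 10 arrives from 192.0.2.2 carrying a frame sent by 00:00:5e:00:53:0b.
vtep_1.learn(10, "00:00:5e:00:53:0b", "192.0.2.2")

print(vtep_1.lookup(10, "00:00:5e:00:53:0b"))  # 192.0.2.2
print(vtep_1.lookup(20, "00:00:5e:00:53:0b"))  # None: same MAC, different segment
```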
VXLAN Packet Forwarding Flow
VXLAN uses stateless tunnels between VTEPs to transmit traffic of the overlay Layer 2 network through the Layer 3 transport network. An example of a VXLAN packet forwarding flow is shown in this Figure.
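A forwarding flow of this kind can also be approximated in code. The sketch below models packets as nested dictionaries rather than real bytes, and it assumes that address learning has already taken place. All names and addresses are hypothetical, and the outer Ethernet header is left out because it is rewritten at every hop of the transport network.

```python
VXLAN_UDP_PORT = 4789


def encapsulate(inner_frame, vni, local_vtep_ip, mac_table):
    """Wrap an Ethernet frame in VXLAN, UDP, and outer IP headers."""
    remote_vtep_ip = mac_table[(vni, inner_frame["dst_mac"])]
    return {
        "outer_ip": {"src": local_vtep_ip, "dst": remote_vtep_ip},
        "udp": {"dst_port": VXLAN_UDP_PORT},
        "vxlan": {"vni": vni},
        "inner_frame": inner_frame,
    }


def decapsulate(packet, local_vtep_ip):
    """Strip the outer headers and return the VNI and the original frame."""
    if packet["outer_ip"]["dst"] != local_vtep_ip:
        raise ValueError("packet is not addressed to this VTEP")
    return packet["vxlan"]["vni"], packet["inner_frame"]


# Two hosts in VXLAN segment 10, one behind each VTEP (made-up addresses).
VTEP_1_IP, VTEP_2_IP = "192.0.2.1", "192.0.2.2"
MAC_A, MAC_B = "00:00:5e:00:53:0a", "00:00:5e:00:53:0b"
vtep_1_table = {(10, MAC_B): VTEP_2_IP}  # mapping already learned by the first VTEP

frame = {"src_mac": MAC_A, "dst_mac": MAC_B, "payload": "hello"}
packet = encapsulate(frame, 10, VTEP_1_IP, vtep_1_table)
print(packet["outer_ip"])  # {'src': '192.0.2.1', 'dst': '192.0.2.2'}

vni, delivered = decapsulate(packet, VTEP_2_IP)
print(vni, delivered == frame)  # 10 True
```

The transport network only ever inspects the outer IP header; the inner frame travels unchanged and is delivered according to its original destination MAC address.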
In this Figure, Host-A and Host-B in VXLAN segment 10 communicate with each other through the VXLAN tunnel between VTEP-1 and VTEP-2. This example assumes that address learning has been done on both sides and that the corresponding MAC-to-VTEP mappings exist on both VTEPs.
When Host-A sends traffic to Host-B, it forms Ethernet frames with MAC-B, the MAC address of Host-B, as the destination MAC address and sends them to VTEP-1. VTEP-1, which has a mapping of MAC-B to VTEP-2 in its mapping table, performs VXLAN encapsulation on the frames by adding VXLAN, UDP, and outer IP headers to them. In the outer IP header, the source IP address is the IP address of VTEP-1, and the destination IP address is the IP address of VTEP-2. VTEP-1 then performs an IP address lookup for the IP address of VTEP-2 to resolve the next hop in the transit network, and uses the MAC address of the next-hop device to further encapsulate the packets in an Ethernet frame that it sends to the next-hop device.
The packets are routed toward VTEP-2 through the transport network based on their outer IP header, which has the IP address of VTEP-2 as the destination address. After VTEP-2 receives the packets, it strips off the outer Ethernet, IP, UDP, and VXLAN headers and forwards the original frames to Host-B based on the original destination MAC address in the Ethernet frame.
References:
[RFC7348] Mahalingam, M., et al., "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, August 2014.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[802.1aq] IEEE, "Standard for Local and metropolitan area networks -- Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks -- Amendment 20: Shortest Path Bridging", IEEE P802.1aq-2012, 2012.
[802.1D] IEEE, "Draft Standard for Local and Metropolitan Area Networks/ Media Access Control (MAC) Bridges", IEEE P802.1D-2004, 2004.
[802.1X] IEEE, "IEEE Standard for Local and metropolitan area networks -- Port-Based Network Access Control", IEEE Std 802.1X-2010, February 2010.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.
[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996.
[RFC4541] Christensen, M., Kimball, K., and F. Solensky, "Considerations for Internet Group Management Protocol (IGMP) and Multicast Listener Discovery (MLD) Snooping Switches", RFC 4541, May 2006.
[RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", RFC 4601, August 2006.
[RFC5015] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano, "Bidirectional Protocol Independent Multicast (BIDIR-PIM)", RFC 5015, October 2007.
[RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A. Ghanwani, "Routing Bridges (RBridges): Base Protocol Specification", RFC 6325, July 2011.
[RFC6335] Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S. Cheshire, "Internet Assigned Numbers Authority (IANA) Procedures for the Management of the Service Name and Transport Protocol Port Number Registry", BCP 165, RFC 6335, August 2011.
https://www.juniper.net/techpubs/en_US/junos14.1/topics/topic-map/sdn-vxlan.html
http://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-729383.html