DOS - A Scalable Optical Switch for Datacenters

Xiaohui Ye, Paul Mejia, Yawei Yin, Roberto Proietti, S. J. B. Yoo, Venkatesh Akella

Department of Electrical and Computer Engineering University of California, Davis Davis, California, 95616 USA

ABSTRACT This paper discusses the architecture and performance studies of the Datacenter Optical Switch (DOS), designed for scalable and high-throughput interconnections within a data center. DOS exploits wavelength routing characteristics of a switch fabric based on an Arrayed Waveguide Grating Router (AWGR) that allows contention resolution in the wavelength domain. Simulation results indicate that DOS exhibits lower latency and higher throughput even at high input loads compared with electronic switches or previously proposed optical switch architectures such as OSMOSIS [4, 5] and Data Vortex [6, 7]. Such characteristics, together with a very high port count on a single switch fabric, make DOS attractive for data center applications where the traffic patterns are known to be bursty with high temporary peaks [13]. DOS exploits the unique characteristics of the AWGR fabric to reduce the delay and complexity of arbitration. We present a detailed analysis of DOS using a cycle-accurate network simulator. The results show that the latency of DOS is almost independent of the number of input ports and does not saturate even at very high (approximately 90%) input load. Furthermore, we show that even with 2 to 4 wavelengths, the performance of DOS is significantly better than that of an electrical switch network based on a state-of-the-art flattened butterfly topology.

Categories and Subject Descriptors C.2.6 [Internetworking]: Routers; C.2.1 [Network Architecture and Design]: Packet-switching networks;

General Terms Design, Performance, Management

Keywords Data Center Networks, AWGR, Low Latency Optical Switches

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ANCS’10, October 25-26, 2010, La Jolla, CA, USA. Copyright (c) 2010 ACM 978-1-4503-0379-8/10/10…$10.00.

1. INTRODUCTION Low-latency, low-power, scalable, and high-throughput interconnection is essential for future data centers. Data center switches based on electronic multistage interconnection topologies (e.g. Fat-Tree, Clos, Torus, Flattened Butterfly [1, 2]) result in large latencies (due to the multi-hop nature of these networks) and very high power consumption in the buffers and the switch fabric. On the other hand, optical interconnects can benefit from the inherent parallelism and high capacity of Wavelength Division Multiplexing (WDM). Furthermore, multiple WDM channels on an output can be used as multiple concurrent channels to avoid head-of-line blocking [3], which results in lower latency and higher performance.

Optical switching is being successfully deployed in traditional telecommunication networks. However, there are two key differences between telecom applications and data center applications. First, data center applications require a latency reduction of at least two orders of magnitude (100's of nanoseconds as opposed to 10's or 100's of microseconds). Second, data center switches need to connect many more nodes -- hundreds or thousands in large data centers -- than typical telecom switches.

Recent efforts have designed optical switching architectures that address the particular challenges faced by networks supporting data centers and high-performance computing. The Optical Shared Memory Supercomputer Interconnect System (OSMOSIS) [4, 5] and Data Vortex [6-8] are two pioneering projects in this area.

DOS (Datacenter Optical Switch) is based on an all-optical switching fabric called the Arrayed Waveguide Grating Router (AWGR), which has been proven in telecom applications to scale to petabit/second aggregate switching capacity [9-11]. The cyclic wavelength routing characteristic of the AWGR allows different inputs to reach the same output simultaneously by using different wavelengths.
Non-blocking switching from any input port to any output port can be achieved simply by tuning the wavelength at each input. In essence, the AWGR provides a fully connected topology. An AWGR-based switching fabric is power efficient because the signal is delivered only to the desired output port via the appropriate wavelength, instead of using a broadcast-and-select mechanism that incurs excessive power-splitting losses. Architecturally, the AWGR is a passive and loss-less optical interconnect element. The power consumed in tunable wavelength converters (TWC), the loopback shared buffer, and the control plane logic scales linearly with the number of ports, unlike other switches. Finally, DOS uses label switching [9] with the optical label transmitted on a different wavelength. This allows the control plane to operate at a significantly lower clock frequency than the data rate, though this introduces some restrictions in the minimum packet size requirements, which are quite reasonable for a data center application. Section 6 describes this in more detail.

The main contributions of this paper are as follows. The paper provides the details of the architecture of a switch designed for data center applications and evaluates its performance and scalability with cycle-accurate simulations. The paper compares the performance of DOS with other optical switches such as OSMOSIS and Data Vortex, and with an electrical switch based on flattened butterfly topology [12], which is being considered by Cray and other commercial vendors. We propose a simple arbitration scheme that takes advantage of the unique cyclic properties of the AWGR switch fabric and further enhances the scalability of DOS.

The main findings of this paper are as follows:

a) The proposed optical switching is attractive for data center applications because of its low latency and scalable effective bandwidth even under input loads as high as 90%,

b) DOS performs exceptionally well even with as few as 2 to 4 wavelengths per fiber, and

c) DOS performs well on bursty on-off traffic patterns such as those recently described by Wisconsin and Microsoft [13].

The remainder of this paper is organized as follows. Section 2 first reviews the related work. Section 3 presents the details of the DOS architecture. Section 4 provides the details of an arbiter that takes advantage of AWGR to reduce complexity. Section 5 describes how DOS was modeled in GARNET and the simulation infrastructure. Section 6 describes the performance analysis of DOS and comparison with related work.

2. RELATED WORK Conventional data center networks are built in a hierarchical manner, with a large number of cheap, low-speed, small-radix switches at the bottom level to connect with end nodes, and a few expensive, powerful, large-radix switches residing at the top to aggregate and distribute the traffic [14]. Recently, network architects have adopted fat tree and Clos topologies to provide high aggregate bandwidth. For example, Al-Fares [1] uses commodity off-the-shelf Ethernet switches while InfiniBand [15] utilizes special switches based on the InfiniBand protocol. Power consumption, latency, and throughput under high input loads are the key challenges with electrical switches. Farrington [16] suggests placing MEMS-based optical circuit switches in parallel with electrical switches in the core network to carry slowly changing inter-pod traffic, thus reducing power consumption and cost. Nevertheless, their design does not address the challenges of latency and throughput under high input loads. DOS employs optical switching with wavelength domain parallelism and a passive AWGR-based switching fabric to overcome these challenges. A simple 5-port optical switch has also been adopted recently in an on-chip optical mesh network [17], where wavelength parallelism is used to realize high-bandwidth transmission so that control bits and data bits can be transmitted from one core to another in one clock cycle; in our work, by contrast, wavelength parallelism is utilized to boost switching fabric performance.

As introduced before, Data Vortex and OSMOSIS represent the state-of-the-art in terms of optical switching in high performance computing applications. OSMOSIS utilizes semiconductor optical amplifiers (SOAs) to realize a synchronous optical crossbar switching fabric by using a broadcast-and-select data path combined with both space- and wavelength-division multiplexing [4, 5]. The OSMOSIS demonstrator switch has 8 broadcast units, each with 8 wavelengths on one fiber to connect with a total of 64 ingress adapters. Each wavelength on each fiber is duplicated and fed to 128 select units where SOAs are used to select the proper wavelength from the proper fiber according to the central scheduler's decision. Each of the 64 egress adapters connects with two select units, thus enabling the egress adapter to receive up to two concurrent cells. To reduce control latency, OSMOSIS allows packet transmission without getting a grant when the traffic load is light by using a speculative transmission (STX) scheme.

The Data Vortex is a distributed interconnection network architecture [6, 7] based on deflection routing. Its structure can be visualized as cyclic subgroups that allow for deflections without loss of routing progress. By leveraging optical parallelism and avoiding optical buffers through deflection routing, the Data Vortex architecture can achieve high aggregate bandwidth, low latency per hop due to short packet slot time, and high potential switching capacity due to transparent wavelength switching.

While OSMOSIS and Data Vortex provide significant improvements in capacity and latency compared with electrical switches, they both possess some drawbacks that cannot be avoided due to the architectures that they adopt.
The power requirements of OSMOSIS switches can be very high because of its broadcast-and-select architecture -- signals are delivered to every select unit even though, ultimately, only one unit selects the signal. The 16 SOAs in each select unit also consume power. In addition, STX can only effectively eliminate control path latency when the traffic load is light. Data Vortex also has some limitations. The banyan-style hierarchical multiple-stage structure becomes extremely complex when scaled to larger network sizes. As the number of nodes increases, packet reordering becomes more frequent due to the non-deterministic nature of the paths traversed by packets propagating through the Data Vortex network, and the end-to-end latency of each packet becomes large and non-deterministic. It was observed that the system saturates before the offered load exceeds 50% [7]. Furthermore, with respect to physical layer scalability [8], the optical gain saturation of the SOA limits the maximum number of payload channels, and the cascaded components' spectrum profiles will limit the functional bandwidth achieved. Section 6 describes the performance comparison of DOS with both OSMOSIS and Data Vortex. The simulation and analysis in this paper show that DOS achieves higher throughput and lower latency than both Data Vortex and OSMOSIS.

Figure 1. The system diagram of the proposed optical switch. OLG: Optical Label Generator; PE: Packet Encapsulation; LE: Label Extractor; FDL: Fiber Delay Line; PFC: Packet Format Converter; O/E: Optical-to-Electrical Converter; E/O: Electrical-to-Optical Converter; TX: Transmitter; RX: Receiver; L(i): Label from Node i.

AWGR-based optical switches and optical routers with packet switching capability have been investigated for a number of years [9, 18, 19]. Previous works [3, 9, 20-24] mainly focus on applying AWGR techniques in access networks and telecommunication / IP networks. The AWGR serves as the switching fabric in many switch architecture designs. Since there is no practical optical buffer available today, the store-and-forward scheme, which is commonly used in electrical switches, cannot be duplicated in the optical domain. A simple switch structure [20, 21] is adopted with no centralized control at the center switch; either a TDM-based MAC protocol [21] or input-side access control [20] is used to resolve contention. Many other designs require centralized control to negotiate resources among multiple requests [3, 9, 22, 23]. Fiber delay lines (FDLs) are widely used to handle burst traffic [3, 9, 22] and provide priority routing [24]. Researchers have investigated putting a set of delay lines with different lengths in either forwarding paths, loopback paths, or both. The authors in [22] further proposed adding the AWGR inside the FDL loopback path, thus providing another layer of loopback and delay. Using FDLs helps reduce the dropping probability significantly, but packet loss is still possible and cannot be eliminated. In addition, an FDL cannot provide arbitrary delay: a resource may be available while the delayed packet cannot access it because the packet is still traveling through the FDL.

Deflection routing [7, 25-27] is another way to handle contention in optical burst and packet switching networks. Contended packets are sent to an alternative next hop if the desired next hop is not available. Usually an FDL-based buffer is needed in optical burst switching in order to adjust the time difference between the label and the burst according to the new path [27].
Many works [7, 25, 26] have shown that deflection routing will help lower the end-to-end latency when the traffic load is light but will degrade the entire system performance under heavy load. Therefore, deflection routing will not be considered in DOS design.

3. DOS - ARCHITECTURE OVERVIEW Figure 1 presents a high-level overview of the DOS architecture. At the core of the switch is an optical switching fabric that includes tunable wavelength converters (TWC), a uniform loss and cyclic frequency (ULCF) AWGR, and a loopback shared buffer system. In addition, the switching system includes a control block that processes the label of the packet and then arbitrates each packet by checking resource availability on the output port side. The Optical Channel Adapter (OCA) serves as the medium interface between DOS and each end node.

3.1 The AWGR Switching Fabric Early-stage optical switch architectures adopt optical-to-electrical-to-optical (O/E/O) conversions and an electrical switching fabric [28]. An all-optical switching fabric eliminates the overhead of the O/E/O conversions and keeps the payload in the optical domain for better optical transparency [9]. There are numerous all-optical switching fabric architectures [29], most of which can be divided into two categories: space switching and wavelength switching. Space switching can have a configuration of broadcast-and-select [19, 28, 30] or a matrix of switching elements (1 x 2 switches [31], 2 x 2 switches [32], etc.). Wavelength switching utilizes optical wavelength converters, with an optical device that can support wavelength-to-space mapping [33-35]. The ULCF AWGR is a promising, compact candidate for achieving the wavelength-to-space mapping. The ULCF AWGR allows wavelength-routed interconnection in a scalable manner and provides path-independent loss and cyclic-routing characteristics.

The AWGR is able to route optical signals from any input AWGR port to any output AWGR port. The routing path for the signal inside the AWGR is determined by the wavelength that carries the signal. Since each output port interconnects with all input ports on separate and distinct wavelengths, the AWGR can easily achieve non-blocking switching by tuning the output wavelength of the TWC to an appropriate wavelength at each input, so that separate paths between inputs and their desired outputs are established. The AWGR can actually achieve concurrent contention-free optical switching, which is more than strictly non-blocking switching. Not only can it connect any idle input to any idle output, but it also allows any output to receive multiple concurrent signals that reside on separate and distinct wavelengths. By simply applying an optical DEMUX at the AWGR output, signals from different inputs can be separated and received independently. Ideally, an N × N AWGR is a switching fabric with a speedup of N, provided that a 1:N optical DEMUX and N receivers are available for each AWGR output. To reduce the cost, the speedup is typically less than N.

The scalability of a single-stage DOS depends on the scalability of the AWGR and the capability of the TWC. NTT researchers have demonstrated a 400-port AWGR that utilizes 400 channels with 25 GHz channel spacing covering a wavelength range from 1530 nm to 1610 nm [36]. A 512 x 512 AWGR with the same channel spacing will be slightly larger than the 400-port AWGR and can be fabricated on an 8-inch Si wafer if it does not fit on a 6-inch Si wafer. Takada [37] shows an effective way to reduce the size of the wafer required in fabrication by folding the slab waveguides with respect to the surface of reflection. The optical path from any input to any output of a 512 x 512 AWGR will be less than 1 meter in length, so it takes less than 5 ns for signals to propagate through the device. To minimize crosstalk and achieve any-port-to-any-port non-blocking switching, a 512-port AWGR requires at least 512 channels, which cover a wavelength range of 102.4 nm assuming a channel spacing of 25 GHz. To be able to tune between 512 channels rapidly, the TWC needs to accommodate a fast and wide-range tunable laser. A monolithic laser with a 114 nm tuning range has been demonstrated in [38]. Matsuo [39] shows a tunable laser that can cover 34 channels with a switching latency of less than 8 ns. An ultrafast interleaved rear reflector tunable laser with a switching time of less than 2 ns has been reported in [40].
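The channel-span and propagation figures quoted above can be sanity-checked with a short calculation. This is only a rough sketch: the 1550 nm center wavelength and the group index of 1.47 are assumptions not stated in the paper, the function names are hypothetical, and the linear conversion from frequency spacing to wavelength spacing is an approximation that loosens over a span this wide.

```python
# Back-of-the-envelope check of the 512-channel wavelength span and the
# propagation delay through the AWGR. All parameter defaults are assumptions.

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def wavelength_span_nm(n_channels: int, spacing_hz: float,
                       center_nm: float = 1550.0) -> float:
    """Approximate total span using d_lambda ~= lambda^2 / c * d_f per channel."""
    center_m = center_nm * 1e-9
    per_channel_m = center_m ** 2 / C_M_PER_S * spacing_hz
    return n_channels * per_channel_m * 1e9


def propagation_delay_ns(path_length_m: float, group_index: float = 1.47) -> float:
    """Time for light to traverse a waveguide of the given length."""
    return path_length_m * group_index / C_M_PER_S * 1e9


if __name__ == "__main__":
    span = wavelength_span_nm(512, 25e9)
    delay = propagation_delay_ns(1.0)
    print(f"512 channels at 25 GHz span roughly {span:.1f} nm")
    print(f"1 m optical path takes roughly {delay:.2f} ns")
```

Under these assumptions each 25 GHz channel occupies about 0.2 nm, which is in line with the 102.4 nm range and the sub-5 ns delay stated in the text.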
Instead of using a single wide-range fast tunable laser, an alternative is to place multiple fast tunable lasers in parallel, each covering a relatively smaller range. In addition, power attenuation in a short-distance data center network will be much smaller than in long-haul transmission, so an Erbium Doped Fiber Amplifier (EDFA) is not required. Also, many signal impairments that are considered in long-haul transmission, e.g. dispersion and non-linear effects, can be considered negligible in short-distance optical networks. Therefore, DOS can use a much wider range of wavelengths than a telecommunication DWDM (Dense WDM) system. Based on these enabling technologies, a 512-port AWGR-based optical switch is feasible. Thus, in this paper we assume that a single-stage DOS can scale to 512 ports and that the number of wavelengths used for DOS is equal to or larger than the fabric port count.
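The wavelength-to-port mapping described in this section can be illustrated with a minimal sketch. The paper does not give an explicit routing formula; the rule used below, output = (input + wavelength index) mod N, is one common way to model cyclic AWGR routing and is an assumption here, as are all function names. The sketch shows a TWC picking the wavelength for a destination and several inputs reaching the same output on distinct wavelengths.

```python
# Illustrative model of cyclic wavelength routing in an N x N AWGR.
# ASSUMPTION: a signal entering input i on wavelength index w exits
# output (i + w) mod N. Real devices may use a different port ordering.


def awgr_output_port(input_port: int, wavelength: int, n_ports: int) -> int:
    """Output port reached under the assumed cyclic routing rule."""
    return (input_port + wavelength) % n_ports


def twc_wavelength(input_port: int, dest_port: int, n_ports: int) -> int:
    """Wavelength index the TWC at input_port must tune to for dest_port."""
    return (dest_port - input_port) % n_ports


def route_requests(requests, n_ports):
    """Map each (input, destination) request to a wavelength.

    Returns {destination: [(input, wavelength), ...]} in request order.
    """
    arrivals = {}
    for input_port, dest_port in requests:
        w = twc_wavelength(input_port, dest_port, n_ports)
        # The chosen wavelength must land on the requested output.
        assert awgr_output_port(input_port, w, n_ports) == dest_port
        arrivals.setdefault(dest_port, []).append((input_port, w))
    return arrivals


if __name__ == "__main__":
    # Three inputs target output 5 at once; one input targets output 2.
    arrivals = route_requests([(0, 5), (3, 5), (6, 5), (1, 2)], n_ports=8)
    for dest, signals in sorted(arrivals.items()):
        wavelengths = [w for _, w in signals]
        # Distinct inputs always arrive on distinct wavelengths.
        assert len(set(wavelengths)) == len(wavelengths)
        print(f"output {dest}: (input, wavelength) = {signals}")
```

Because distinct inputs always map to distinct wavelengths at a given output, contention in this model arises only from the number of receivers behind each output DEMUX, not from the fabric itself.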

3.2 DOS Control Plane For an N-by-N AWGR switching fabric, a concurrent contention-free switching system can be realized if each OCA RX has a 1:N optical DEMUX and N receivers. To ensure every incoming packet goes to the desired output port, we simply need to first check the destination address and then set the proper wavelength at the TWC, so that after wavelength conversion, the packet travels to the desired AWGR output port. This simple control function can be implemented in a fully distributed way. But if each OCA RX only has k (k
