W3J.5.pdf
OFC 2016 © OSA 2016
Fine-Grained All-Optical Switching Based on Optical Time Slice Switching for Hybrid Packet-OCS Intra-Data Center Networks
Yao Li (1,2), Nan Hua (1,2), Xiaoping Zheng (1,2)
1. Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing 100084, China
2. Department of Electronic Engineering, Tsinghua University, Beijing 100084, P. R. China
{xpzheng,huan}@mail.tsinghua.edu.cn
Abstract: We propose a hybrid packet-OCS intra-data center network architecture with bufferless sub-wavelength all-optical time slice switching enabled by high-precision time synchronization. We experimentally demonstrate flex-capacity sub-wavelength circuit establishment among the top-of-rack (ToR) switches without buffers or contention.
OCIS codes: (060.4250) Networks; (060.4253) Networks, circuit-switched
1. Introduction
Data-intensive applications such as cloud computing, streaming video and social networking have driven rapid growth of data center networks (DCNs) over the past several years [1]. Conventional electrical packet switching creates bottlenecks in both bandwidth and latency [1], and its power-hungry devices limit the scalability of DCNs. To provide intra-DCNs with high throughput, low latency and low power consumption, the benefits of optical circuit switching (OCS) have been explored in the literature [1-3], and hybrid packet-OCS intra-DCN architectures have also been proposed [2,3]. A hybrid packet-OCS DCN can carry large persistent data flows on high-bandwidth optical circuits, thereby freeing up the packet network and removing the bandwidth bottleneck. It also offers ultra-low latency, which is very important to latency-sensitive applications. Although setting up an optical circuit in an intra-DCN usually takes tens of milliseconds, this is acceptable because most large flows persist for minutes or more, which makes the setup time negligible.

However, current fiber/λ switching-based OCS requires each lightpath to occupy the full bandwidth of a fiber or wavelength. Under this constraint, the switching granularity is too coarse to match the size of the majority of flows in data centers [2,4], leading to low bandwidth utilization. This paper presents, for the first time to our knowledge, a novel hybrid packet-OCS intra-DCN architecture with fine-grained bufferless optical time slice switching (OTSS) [5] enabled by high-precision time synchronization [6]. We also experimentally demonstrate successful flex-capacity all-optical sub-wavelength circuit establishment among the top-of-rack (ToR) switches without buffers or contention.

2. Network architecture
Fig. 1(a) shows the proposed intra-DCN architecture. The ToR switches are divided into different pods.
ToR switches in the same pod connect to their respective aggregate switches in each switching plane. To support flows with different characteristics, the switching planes are classified into three categories according to their switching modes: electrical packet switching, optical fiber/λ switching and optical time slice switching. The aggregate switches from different pods are connected to multiple core switches supporting the same switching paradigm, and together they form the switching planes for inter-pod communication. Practical implementations should have two or more switching planes of each switching paradigm for redundancy and over-provisioning; for simplicity, Fig. 1(a) shows only one. Core switches within the same switching plane are all connected to one or more edge switches for inter-data center communication.

Fig. 1: Diagram for intra-data center network with fine-grained all-optical switching: (a) Network architecture; (b) Concept of optical time slice switching (OTSS); (c) Intra-pod and inter-pod routing
(Labels in the figure: T = ToR switch, A = aggregate switch, C = core switch, E = edge switch.)

As mentioned above, workloads in data centers appear in various patterns [4] and can be supported by different switching paradigms according to their characteristics and requirements. Tab. 1 compares the representative switching paradigms.

Tab. 1: Comparison of the representative switching paradigms
Attribute | Electrical Packet Switching | Optical Fiber/λ Switching | Optical Time Slice Switching
Switching type | Packet switching | Circuit switching | Circuit switching
Granularity | Packet | Coarse | Fine
Delay-sensitive workloads | Suitable | Not suitable | Not suitable
Throughput-sensitive workloads | Not suitable | Suitable | Suitable
Bandwidth utilization | High | Low | High
Transfer guarantee | Not guaranteed | Guaranteed | Guaranteed
Power consumption | High | Low | Low
Time synchronization | Not required | Not required | Required

Electrical packet switching is suitable for delay-sensitive flows of relatively small size, such as query, coordination and control state messages. Optical fiber/λ switching, on the other hand, excels at delay-insensitive, bandwidth-hungry data transfer, for instance file backup and virtual machine migration. Optical time slice switching combines the capacity and energy efficiency of optical fiber/λ switching with the flexibility of electrical packet switching. This allows optical networks to carry small flows, which account for a very large proportion of total intra-data center workloads [4], without sacrificing bandwidth utilization, and to reduce power consumption.

3. Optical time slice switching (OTSS)
In our previous work [5], we proposed OTSS as an all-optical sub-wavelength solution. Fig. 1(b) illustrates the concept of OTSS. The optical transmission channels are organized in the time domain into repetitive OTSS frames with a period of T_FL. Each OTSS frame contains one or several variable-length time slices for data transmission, and each time slice occupies one time slot.
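As a rough illustration of the frame structure just described, the following Python sketch lays out variable-length time slices within one OTSS frame and computes the sub-wavelength capacity that each slice provides. The names `TimeSlice`, `pack_frame` and `capacity_gbps` are hypothetical. The numeric values (100 µs frame period, 100 ns guard interval, 10 Gb/s line rate, slice lengths of 1, 1, 10 and 25 µs) are the ones quoted in Section 4 and serve only as an example. Packing all four slices back to back on a single link is an assumption made for illustration, not the allocation used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeSlice:
    flow_id: int
    start_ns: int   # offset of the slice from the start of the frame
    length_ns: int  # payload duration of the slice


def pack_frame(lengths_ns, frame_ns, guard_ns):
    """Place slices back to back, each followed by one guard interval.

    Assumption for illustration: slices are packed in order on one link.
    """
    slices = []
    cursor = 0
    for flow_id, length in enumerate(lengths_ns):
        slices.append(TimeSlice(flow_id, cursor, length))
        cursor += length + guard_ns
    if cursor > frame_ns:
        raise ValueError("slices plus guard intervals exceed the frame period")
    return slices


def capacity_gbps(time_slice, frame_ns, line_rate_gbps):
    """Average capacity of a circuit that owns one slice per frame."""
    return line_rate_gbps * time_slice.length_ns / frame_ns


if __name__ == "__main__":
    FRAME_NS = 100_000   # 100 us frame period
    GUARD_NS = 100       # 100 ns guard interval
    LINE_RATE = 10.0     # 10 Gb/s
    for s in pack_frame([1_000, 1_000, 10_000, 25_000], FRAME_NS, GUARD_NS):
        print(s.flow_id, s.start_ns, capacity_gbps(s, FRAME_NS, LINE_RATE))
```

Under these assumptions, a circuit's capacity scales with the fraction of the frame that its slice occupies, which is what makes the capacity flexible at sub-wavelength granularity.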
When a time slice arrives, the switch controller sends control signals to the OTSS fabric at the precise time to direct the time slice to the expected output port. Such precise timing requires the OTSS nodes to be time-synchronized. Time synchronization is now a mature technology: a high-precision time synchronization network over commercial transport networks has been reported with an accuracy of 65 ns across 13 synchronization hops [6].

OTSS introduces a new problem of routing and time slot allocation (RTSA). Time slot allocation is restricted by the time slot continuity constraint, which means that a flow traversing multiple nodes must not change its time slot(s). Propagation delay between switching nodes makes the RTSA problem even harder, because the same time slot allocated on different links along a routing path must have different but closely related starting times, determined by the delay values. Fortunately, the structure of our proposed intra-DCN architecture makes the RTSA problem much simpler. As shown in Fig. 1(c), the routing path of an arbitrary intra-pod flow is uniquely determined by selecting an aggregate switch and always has two hops (“T-A-T” path). Similarly, once a core switch is selected, the routing path of an inter-pod flow is uniquely determined and always has four hops (“T-A-C-A-T” path). This reduces the routing problem to a path selection problem, and hence reduces the computational complexity of RTSA in intra-DCNs. Because the RTSA problem is solved with global time slot state information, resource contention can be avoided without optical buffers.

4. Experimental setup and results
We carry out an experiment to demonstrate flex-capacity all-optical sub-wavelength circuit establishment among ToR switches under the proposed intra-DCN architecture. Fig. 2 shows the experimental setup.
The OTSS frame period and the guard interval between adjacent time slices are 100 µs and 100 ns, respectively. Four data flows to be switched (Data #0-#3) are generated by small form-factor pluggable (SFP+) transceiver modules at 10 Gb/s on a wavelength of 1550 nm, with repetitive time slices of 1 µs, 1 µs, 10 µs and 25 µs in length. Four 100 m fibers are used to generate propagation delay. The central controller dispatches switching regulations, which contain the OTSS frame period and the starting time and guard interval of the time slices, to every PLZT switch controller for each data flow. Switching operations are triggered periodically by the PLZT switch controllers at the pre-set precise times.

Fig. 2: Experimental setup
(Labels in the figure: central controller, PLZT switch controllers, 2x2 PLZT switches, EDFAs, 100 m fibers, PDs, DSO, Ports 1-5 of the core and aggregate switches; Data #0,#2 from pod 1; Data #1,#3 from pod 2; Data #0,#1,#3 (pod 3); Data #2 to pod 4.)

Fig. 3(a) presents a segment of the waveforms at four ports of the core switch to illustrate flexible sub-wavelength switching. By precisely controlling the PLZT switches, data carried on variable-length time slices can be directed accurately to their expected ToR switches. Fig. 3(b) shows the data propagation delay and the control signal delay. The data propagation delay between the core switch and the aggregate switch is measured to be 520 ns, which includes the transmission delays of the 100 m fiber, the PLZT switch and the fiber pigtails. Correspondingly, the control signal of the aggregate switch is delayed by 520 ns relative to that of the core switch.
Fig. 3: Experimental results: (a) Flexible all-optical sub-wavelength switching at the core switch; (b) Data propagation delay and control signal delay
(Labels in the figure: (a) Ports 1-4 with time slices #0 (1 µs), #1 (1 µs), #2 (10 µs) and #3 (25 µs); (b) core switch and aggregate switch control signals, Port 3 and Port 5 data, guard interval 100 ns, data delay 520 ns, control signal delay 520 ns.)
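The 520 ns control signal offset reported above is one instance of the delay-shifted time slot allocation discussed in Section 3: a slot that starts at time t on one link must start at t plus the propagation delay on the next link. The Python sketch below illustrates this under stated assumptions. It shifts switching windows by a link delay, checks one port for contention against the guard interval, and performs a simple first-fit search for a slot that is free on two consecutive links. All function names, the first-fit strategy, the 100 ns search step, the busy windows in the demo and the fiber group index of 1.47 are hypothetical choices for illustration, not the authors' RTSA algorithm or measured values.

```python
def shift_windows(windows, delay_ns, frame_ns):
    """Shift (start_ns, length_ns) windows by a link delay, modulo the frame."""
    return [((start + delay_ns) % frame_ns, length) for start, length in windows]


def has_contention(windows, guard_ns, frame_ns):
    """True if two windows on one port are closer than the guard interval.

    Windows repeat every frame, so the last window is also checked
    against the first window of the next frame.
    """
    if not windows:
        return False
    ordered = sorted(windows)
    following = ordered[1:] + [(ordered[0][0] + frame_ns, 0)]
    for (start, length), (next_start, _) in zip(ordered, following):
        if start + length + guard_ns > next_start:
            return True
    return False


def find_slot(up_busy, down_busy, length_ns, delay_ns, frame_ns, guard_ns,
              step_ns=100):
    """First-fit start time (hypothetical strategy) that is free on the first
    link and, shifted by the propagation delay, also free on the second link."""
    for start in range(0, frame_ns, step_ns):
        up = up_busy + [(start, length_ns)]
        down = down_busy + shift_windows([(start, length_ns)], delay_ns, frame_ns)
        if (not has_contention(up, guard_ns, frame_ns)
                and not has_contention(down, guard_ns, frame_ns)):
            return start
    return None


def fiber_delay_ns(length_m, group_index=1.47):
    """Propagation delay of a fiber; the group index is an assumed typical value."""
    speed_of_light_m_per_ns = 0.299792458
    return length_m * group_index / speed_of_light_m_per_ns


if __name__ == "__main__":
    FRAME_NS, GUARD_NS, DELAY_NS = 100_000, 100, 520
    # Hypothetical busy windows on the first and second link of a path.
    start = find_slot([(0, 1_000)], [(1_620, 1_000)], 1_000,
                      DELAY_NS, FRAME_NS, GUARD_NS)
    print("first-fit start on first link (ns):", start)
    print("estimated delay of 100 m fiber (ns):", round(fiber_delay_ns(100)))
```

With the assumed group index, a 100 m fiber contributes roughly 490 ns, so most of the measured 520 ns is plausibly fiber propagation, with the remainder attributable to the PLZT switch and the pigtails. The exact split is not reported in the source.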
5. Conclusions
We propose a novel hybrid packet-OCS intra-DCN architecture with flexible bufferless all-optical sub-wavelength switching based on OTSS. The proposed architecture can support workloads with different characteristics and requirements by selecting a routing path in the switching plane that offers the appropriate switching paradigm, and may thereby reduce power consumption while maintaining high bandwidth utilization. The experiment based on the proposed architecture shows successful flex-capacity sub-wavelength circuit establishment among the ToR switches without buffers or contention.

6. References
[1] C. Kachris et al., “A survey on optical interconnects for data centers,” Communications Surveys & Tutorials, Vol. 14, no. 4, p. 1021 (2012).
[2] G. Wang et al., “c-Through: Part-time optics in data centers,” ACM SIGCOMM Computer Communication Review, Vol. 41, no. 4, p. 327 (2011).
[3] N. Farrington et al., “Helios: a hybrid electrical/optical switch architecture for modular data centers,” Proc. ACM SIGCOMM, pp. 339–350, New Delhi (2010).
[4] A. Greenberg et al., “VL2: a scalable and flexible data center network,” ACM SIGCOMM Computer Communication Review, Vol. 39, no. 4, p. 51 (2009).
[5] N. Hua et al., “Optical Time Slice Switching (OTSS): An All-Optical Sub-Wavelength Solution Based on Time Synchronization,” Proc. ACP, AW3H.3, Beijing (2013).
[6] L. Han et al., “First National High-Precision Time Synchronization Network with Sub-Microsecond Accuracy over Commercial Optical Networks for Wireless Applications,” Proc. ACP, PDP AF4B.6, Shanghai (2014).