
JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 28, NO. 9, MAY 1, 2010


Architectural Exploration of Chip-Scale Photonic Interconnection Network Designs Using Physical-Layer Analysis

Johnnie Chan, Student Member, IEEE, Gilbert Hendry, Student Member, IEEE, Aleksandr Biberman, Student Member, IEEE, and Keren Bergman, Fellow, IEEE

Abstract—Chip-scale photonic interconnection networks have emerged as a promising technology solution that can address many of the scalability challenges facing the communication networks in next-generation high-performance multicore processors. Photonic interconnects can offer significantly higher bandwidth density, lower latencies, and better energy efficiency. Even though photonics exhibits these inherent advantages over electronics, the network designs that can successfully leverage these benefits cannot be straightforwardly extracted from typical electronic network methodologies and must consider the many unique physical-layer constraints of optical technologies. We conduct an architectural exploration of four chip-scale photonic interconnection networks in a novel simulation environment, measuring insertion loss, crosstalk, and power. We also explain and demonstrate the impact of these physical-layer metrics on the scalability, performance, and realizability of each design. Index Terms—Multiprocessor interconnection, optical interconnects, optical switches, photonic switching systems, simulation.

Manuscript received November 08, 2009; revised January 26, 2010. First published March 04, 2010; current version published April 07, 2010. This work was supported in part by the Interconnect Focus Center, one of the five research centers funded under the Focus Center Research Program, by the Semiconductor Research Corporation and Defense Advanced Research Projects Agency (DARPA) Program, and by the DARPA Microsystems Technology Office under a subcontract with International Business Machines (Prime Contract HR0011-08-C-0102). The views, opinions, and findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense. The authors are with the Department of Electrical Engineering, Columbia University, New York, NY 10027 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JLT.2010.2044231

I. INTRODUCTION

THE networks used to interconnect the growing number of cores on chip multiprocessors (CMPs) and the many devices on board-level systems have emerged as a critical performance bottleneck that can inhibit a system from performing at its full potential. As these systems continue to scale in performance and size, it becomes increasingly difficult to maintain a network that can both accommodate the communication demands and stay within power-dissipation limits of the system package. Electronically enabled interconnects in CMPs already account for over 50% of the dynamic power dissipated in some high-performance chips [1]. The portion of dissipated power that comes from the interconnect is expected to continue to grow with time and is therefore a critical issue to the future scaling of CMP performance.

Recent advancements in silicon nanophotonic technology have opened the possibility of integrating photonics for chip-scale interconnection networks. Photonics has the potential to offer high-bandwidth connections by leveraging wavelength-division-multiplexed (WDM) transmission schemes. Additionally, energy dissipated in photonic signaling is practically distance independent, enabling greater energy efficiency than electronics for global chip- and board-scale communications.

Although these advantages exist, photonic interconnect devices are fundamentally different from electronics in function. Established electronic design methodologies are therefore not necessarily well suited for photonics. Optical signals are incapable of practical in-flight processing or buffering without optical-electronic-optical (O-E-O) conversion. While O-E-O conversion is typically allowed in large-scale optical networks, the additional power that would be dissipated in chip-scale systems would be a significant impediment to its viability. Also, signal regeneration in optics cannot be easily accomplished on the CMOS-compatible silicon photonic platform; therefore, all photonic transmissions must be able to propagate the length of the transmission path without accumulating significant optical loss.

With these constraints in mind, many photonic interconnect designs have been considered for enabling optical data transmission in the chip-scale domain [2]–[5]. In this paper, we conduct a detailed physical-layer analysis of four photonic interconnection networks and determine the resulting system performance implications.
Although many photonic topologies have been proposed in an effort to improve computing performance, less emphasis has been placed on understanding whether such designs are feasible from a physical-layer standpoint. Additionally, since it is currently impractical to test full network topologies in a laboratory environment due to fabrication yield limitations, we have implemented a physically accurate simulation model for this analysis. Section II gives a brief overview of chip-scale silicon photonic interconnection networks, presents two previously proposed topologies, the Torus and Non-Blocking Torus, and gives an overview of the simulation environment along with assumptions that are common to all the results presented later. Next, we present results and analyses of each network in terms of insertion loss in Section III, crosstalk in Section IV, and power dissipation in Section V. Lastly, we make our concluding statements in Section VII.

0733-8724/$26.00 © 2010 IEEE Authorized licensed use limited to: Columbia University. Downloaded on May 04,2010 at 19:37:44 UTC from IEEE Xplore. Restrictions apply.

II. SILICON PHOTONIC INTERCONNECTION NETWORKS

The use of photonic networks is currently being investigated as a potential method for interconnecting on- and off-chip components. Currently, networks can predominantly be identified as either using wavelength-selective routing [2]–[4] or leveraging space routing [5]. The primary difference lies in the method used to arbitrate the source-to-destination path in the photonic medium.

With wavelength-selective routing, switching functionality is implemented using wavelength filters throughout the network. The filters are tuned in such a way as to allow each source to route to each destination through the selection of specific wavelengths. This can be described as source routing since the selection of the wavelength at the transmitting node will determine the entire network path used and the destination node. This form of routing ensures low latency but is not able to leverage the full throughput that optics can provide.

Space routing focuses on the use of multiwavelength transmission to enable messages with high aggregate bandwidth. These networks are designed to use actively controlled broadband switches to route the entire spectrum of wavelength channels concurrently from source to destination. An electronic control plane, mirroring the photonic network layout, is used to control each broadband switch, using a circuit-switching protocol. Spatially routed networks can fully utilize the optical spectrum using WDM to create extremely high-throughput links; however, the required circuit-switching protocol will induce an overhead that creates longer latencies.

All the necessary devices needed to design photonic interconnection networks have been experimentally demonstrated.
These devices include waveguides [6], waveguide crossings [7], vertical and horizontal off-chip couplers [8], [9], modulators [10], photodetectors [11], filters [6], and 1 × 2 and 2 × 2 photonic switching elements (PSEs) [12], [13]. This collection of fundamental devices forms a catalog of building blocks from which higher order devices can be created. The nonblocking 4 × 4 switch, an important component for on-chip switching, is one such device that has been proposed with various designs [14], [15], fabricated, and tested [16], [17].

This paper focuses on the physical-layer analysis of space-switched photonic networks. Two previously proposed topologies are the Torus [5] and a Non-Blocking Torus [14], as shown in Figs. 1 and 2, respectively. We define a node (marked X) as the logical switching point on the network, whereas an access point (marked G) is a gateway where a network user (e.g., a processor node) can initiate or receive a transmission. The nodes are implemented with the nonblocking 4 × 4 switch. The primary folded-torus path in both networks is illustrated with thick lines to represent two waveguides forming a bidirectional link. The remaining thinner lines and labeled blocks indicate the location of additional waveguides and switches that compose the access network, which is needed to enter and exit the tori.

The primary difference between the two topologies is the manner in which access points are mapped to nodes. The Torus has an access point mapped to every node, while the Non-Blocking Torus is limited to two access points on each row and column of nodes in the Torus in order to achieve a strictly nonblocking network. For example, an 8 × 8 Torus would allow 64 access points in a normal configuration but would only allow 16 access points in a nonblocking configuration. Previous studies have shown that the nonblocking property can be advantageous in both throughput and latency compared to blocking networks [14], but performance improvements will be offset by physical-layer constraints that have not previously been considered.

Fig. 1. 4 × 4 photonic Torus with 16 access points. Switching points and access points are labeled X and G, respectively. I and E represent switches used to inject and eject messages into and out of the network, respectively.

Fig. 2. 4 × 4 Non-Blocking Torus with eight access points. S labels indicate combined injection–ejection switching points.

We simulate the networks using PhoenixSim, a physical-layer simulator that we have developed in the OMNeT++ simulation environment [18], which achieves a high level of accuracy by incorporating detailed physical models of the basic photonic building blocks such as the waveguides, modulators, photodetectors, and switches. More complex photonic circuits, such as the 4 × 4 switches or full topologies, can be created by properly arranging the building blocks. These composite structures can then be further analyzed within the simulator to determine the overall performance characteristics. Electronic power performance is based on the ORION electronic router model [19], which has also been integrated into PhoenixSim. The simulation topology models assume die sizes of 2.0 cm × 2.0 cm.
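The access-point counts quoted in this section for the two topologies can be reproduced with a short sketch. The function name `access_points` is hypothetical, and the nonblocking rule of `2 * n` access points for an n × n torus is an assumption inferred from the examples given in the text (eight access points for 4 × 4, 16 for 8 × 8), not a formula stated by the authors.

```python
def access_points(n: int, nonblocking: bool = False) -> int:
    """Number of access points on an n x n torus of switching nodes.

    Assumption: the blocking Torus maps one access point to every node
    (n * n), while the Non-Blocking Torus supports 2 * n access points,
    which matches the 4 x 4 and 8 x 8 examples quoted in the text.
    """
    if n < 1:
        raise ValueError("torus dimension must be a positive integer")
    return 2 * n if nonblocking else n * n


if __name__ == "__main__":
    for size in (4, 8):
        print(size, access_points(size), access_points(size, nonblocking=True))
```

The gap between the two counts widens with network size, which is why the nonblocking property comes at a significant cost in connectivity.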


TABLE I SIMULATION INSERTION LOSS PARAMETERS

III. INSERTION LOSS ANALYSIS

We first examine the insertion loss aspects of the interconnect fabric design. Closely related to insertion loss is the optical power budget of the interconnect, which is constrained by multiple physical aspects of the photonic devices. The first constraint is the aggregate optical power threshold, which considers the maximum power the network can sustain without incurring undesirable effects. We assume the threshold to be the limit in cumulative power across all wavelength channels of the WDM signal that the source access point can transmit while avoiding optical nonlinearities that may interfere with switches and introduce additional insertion loss. Second, as a photonic message propagates through the network, it is attenuated through waveguide scattering, ring resonator insertion losses, and waveguide crossing reflections, all of which accumulate into the network-level insertion loss. Lastly, the sensitivity of the photodetectors at the receiving node determines the lower limit at which the optical power of the message can be reliably detected. The difference between the upper limit in transmission power and the photodetector sensitivity equates to the network-level optical power budget. If we consider the fact that each individual wavelength channel only transmits at an equal fraction of the total allotted power, then the channel-specific optical power budget will be lower. This ultimately restricts the number of wavelength channels that can be employed and the number of access points that the network can accommodate. Also, since each wavelength channel can be independently modulated with a data signal, the throughput of a communication link will be proportional to the number of allowed wavelength channels in the switch, and it becomes desirable to maximize this number. These design constraints are summarized by the inequality

$$P_{\mathrm{inj}} - P_{\mathrm{det}} \geq IL_{\mathrm{max}} + 10\log_{10} n \qquad (1)$$

The left side of the inequality contains variables that relate to device performance. $P_{\mathrm{inj}}$ is the amount of power that is allowed to be injected, assuming a threshold for nonlinearities, and $P_{\mathrm{det}}$ is the photodetector sensitivity for the desired device and bit error rate (BER). $P_{\mathrm{inj}} - P_{\mathrm{det}}$ forms the network-level optical power budget. On the right-hand side are the two terms that relate to design aspects of the switch fabric. $IL_{\mathrm{max}}$ assumes the worst-case path for a signal in terms of loss and will play a role in determining the complexity and scalability of the network. $P_{\mathrm{inj}}$, $P_{\mathrm{det}}$, and $IL_{\mathrm{max}}$ are expressed in decibel units. Lastly, the term $10\log_{10} n$ accounts for the number of wavelengths $n$ that are used in the WDM signal.

Our study assumes loss parameters close to currently realizable values. Propagation loss in single-mode silicon waveguides has been measured at 1.7 dB/cm, while bending loss has been demonstrated at 0.005 dB per 90° bend with a 6.5-µm bending radius [6]. Bogaerts et al. produce waveguide crossings with elliptical mode expanders that obtain a 0.16-dB insertion loss [7]. Signal attenuation caused by traveling into a ring on-resonance has been measured at 0.6 dB, while the loss associated with passing by a ring off-resonance was negligible [12]. The near-term simulation parameters, which assume marginal improvements of demonstrated devices, are summarized in Table I. Note that ring resonators exhibit a strong thermal dependency, which could potentially cause additional losses, increased crosstalk, and disruptions in the network. Thermal management of ring resonator devices is currently an active research topic with proposed solutions that include integrated heaters for thermal compensation [16] and athermal devices [20]. For this simulation work, we assume an adequate mechanism for managing this issue.

The maximum possible loss (across all paths) that a message will incur from each type of component in the Torus and Non-Blocking Torus is shown in Fig. 3 for networks ranging from 4 × 4 to 18 × 18 nodes. Losses due to bending waveguides and passing a ring off-resonance are negligible and are not shown. As the photonic network topology scales to support more access points, signals will incur higher losses due to more waveguide crossings and switching elements. The waveguide crossings are shown to be the most significant component of optical losses, reaching as high as 68% for the Torus and 61% for the Non-Blocking Torus. The contribution of loss from dropping into a ring on-resonance for the Torus and Non-Blocking Torus, regardless of topology size, is approximately 17% and 20%, respectively, whereas propagation losses in the 4 × 4 configuration are as high as 43% and 49%, respectively, and gradually decrease in percentage as the topology size increases. The decreasing trend in percentage for propagation loss is due to the assumed fixed size of the die keeping the approximate maximum propagation distance equal, while other components continue to scale in number as the topology size increases. Consequently, the most beneficial improvements to these networks can be achieved through either a reduction of waveguide crossing losses or through the redesign of the switching fabric layout to reduce the number of crossings.

Fig. 3. Maximum possible network-level insertion loss by component for varying sizes of the Torus and Non-Blocking Torus, using the parameters listed in Table I. Labeled values represent the peak cumulative insertion loss (in dB) for the network.

A. Device Improvement

The previous analysis of network-level insertion loss of the Torus and Non-Blocking Torus suggests that research advancements in lower loss crossings will have the most impact in increasing system performance. In particular, two system parameters stand to gain with improvements in loss: the bandwidth available to each access point, which is specified by the number of wavelengths, and the number of access points available in the network. We examine in simulation a hypothetical improvement in crossing loss and use (1) to determine the impact it will have on network scalability. Fig. 4 shows the maximum number of wavelengths that are allowed for varying topology sizes and the change in performance when assuming a hypothetically better crossing loss of 0.05 dB (compared with 0.15 dB in the original case). The gains in system-level performance from the improved crossings are apparent from the network's support for more access points and greater numbers of wavelengths.

Fig. 4. Upper limits on the number of wavelength channels allowed for a given number of access points assuming various network-level optical power budgets. Solid lines assume all realistic parameters (original) and dashed lines assume a hypothetical improvement in crossing loss (improved).
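To make the sensitivity to crossing loss concrete, the following minimal sketch sums the per-component losses along a single path and compares the original and improved crossing values (0.15 dB and 0.05 dB). The names `LossParams` and `path_loss_db` are hypothetical, the propagation and ring-drop defaults are the measured values quoted in the text rather than the Table I entries, and the example path (4 cm, 100 crossings, 8 ring drops) is an invented illustration, not a path taken from any of the simulated networks.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LossParams:
    """Per-component insertion losses (assumed values, see lead-in)."""
    propagation_db_per_cm: float = 1.7   # measured waveguide loss quoted in the text
    crossing_db: float = 0.15            # "original" crossing loss
    ring_drop_db: float = 0.6            # dropping into a ring on-resonance


def path_loss_db(length_cm: float, crossings: int, ring_drops: int,
                 params: LossParams = LossParams()) -> float:
    """Total insertion loss of one path as a plain sum of component losses.

    Bends and off-resonance ring passes are ignored because the text
    describes their contribution as negligible.
    """
    return (length_cm * params.propagation_db_per_cm
            + crossings * params.crossing_db
            + ring_drops * params.ring_drop_db)


if __name__ == "__main__":
    original = LossParams()
    improved = replace(original, crossing_db=0.05)  # hypothetical better crossing
    # Hypothetical worst-case path, chosen only for illustration.
    length_cm, crossings, ring_drops = 4.0, 100, 8
    before = path_loss_db(length_cm, crossings, ring_drops, original)
    after = path_loss_db(length_cm, crossings, ring_drops, improved)
    print(f"original: {before:.1f} dB, improved: {after:.1f} dB, "
          f"saved: {before - after:.1f} dB")
```

Because the saving scales linearly with the number of crossings on the path, every decibel recovered this way is returned directly to the power budget in (1), where it can be spent on additional wavelength channels or a larger network.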
For instance, assuming a 30-dB allowed network-level optical power budget, the maximum connectivity supported on the Torus scales from 36 access points when using the original crossings to 196 access points when using the improved crossings (a more than fivefold increase). Similarly, the Non-Blocking Torus scales from 12 to 24 access points. On the other hand, we can fix the Torus topology to 36 access points and have a gain in the number of possible wavelength channels from 2 to 20 (tenfold increase in bandwidth), while a Non-Blocking Torus with 12 access points will increase from 2 to 15 wavelength channels. For the case of the Torus network operating with a 20-dB optical power budget and original parameter set, the network configuration is unable to support any wavelength channels since the worst-case insertion loss exceeds the optical budget.

B. Topology Exploration

Network performance improvement can also be achieved through design optimizations that decrease network-level insertion loss. As was shown, waveguide crossing losses are the dominant contribution to the total optical insertion loss. Therefore, designs that decrease the number of crossings will be advantageous. TorusNX and Square Root were designed with this objective in mind.

A significant amount of loss in the original Torus is attributed to two causes. First, the usage of the access network introduces an additional set of waveguide crossings, which produce a high insertion loss overhead. Second, the Torus (and also the Non-Blocking Torus) is designed using only the 1 × 2 and 2 × 2 PSEs, which both contain an embedded waveguide crossing [Fig. 5(a) and (b) shows the 1 × 2 case, Fig. 6(a) and (b) shows the 2 × 2 case]. These switch designs were suitable for prior investigations into photonic networks since the studies did not consider insertion loss, but our analysis shows that the overall system performance would be significantly impacted. In many circumstances, a designer can take advantage of alternative 1 × 2 [see Fig. 5(c) and (d)] and 2 × 2 [see Fig. 6(c) and (d)] PSE designs, which eliminate the crossing and reduce the insertion loss impact on off-resonance message traversal but keep similar switching functionality.

Fig. 5. Light propagation in 1 × 2 PSE. (a) Off-resonance propagation with crossing. (b) On-resonance propagation with crossing. (c) Off-resonance propagation without crossing. (d) On-resonance propagation without crossing.

Fig. 6. Light propagation in 2 × 2 PSE. (a) Off-resonance propagation with crossing. (b) On-resonance propagation with crossing. (c) Off-resonance propagation without crossing. (d) On-resonance propagation without crossing.

The TorusNX topology (see Fig. 7) is designed to preserve the connectivity and scalability of the original Torus topology while lowering the overall insertion loss. In contrast with the Torus, which required a complex access network to facilitate injection and ejection from the network, TorusNX uses a new gateway design (see Fig. 8), which splits the access point into two blocks for modulation and detection, and circumvents adding any additional crossings to the Torus through the use of the 1 × 2 PSE variant. The modulation block enables a message to be injected north or south, while the detection block can receive signals coming from the east or west direction. This scheme is well suited for dimension-ordered routing, which is the assumed routing for this topology. TorusNX also uses an optimized version of the 4 × 4 nonblocking switch, which was previously shown to perform better in dimension-order routed topologies [15].

Fig. 7. 4 × 4 TorusNX network with 16 access points.

Fig. 8. Design for a photonic gateway with an integrated bidirectional crossing.

The Square Root topology was also designed with fewer waveguide crossings and fewer switches in mind by simplifying the entire network into only using 4 × 4 nonblocking switches. In addition to the principles used to reduce insertion loss in the physical layer, the Square Root also uses hierarchical organization to simplify routing and path multiplicity between organizational units to increase performance. The Square Root is constructed recursively beginning with a 2 × 2 quad, as shown in Fig. 9(a), which has no waveguide crossings outside the 4 × 4 switches. A 4 × 4 Square Root is composed of four quads, as shown in Fig. 9(b), connected through central switches and interquad express lanes. In a similar fashion, an 8 × 8 Square Root can be constructed from four 4 × 4 Square Roots. This recursive construction can be used to build any size square topology with dimensions equal to any positive integer power of two.

Fig. 9. (a) Basic unit of the Square Root topology, 2 × 2 quad. (b) 4 × 4 Square Root.

Fig. 10. Maximum possible network-level insertion loss by component for varying sizes of TorusNX and Square Root, using the parameters listed in Table I. Labeled values represent the peak cumulative insertion loss in dB.

Fig. 11. Upper limits on the number of wavelength channels allowed for a given number of access points, assuming a particular optical power budget. Solid lines assume all realistic parameters (original) and dashed lines assume a hypothetical improvement in crossing loss (improved).

The insertion loss performances of TorusNX and Square Root assuming realistic loss parameters are shown in Fig. 10. For the radixes examined, TorusNX has between 23% and 29% lower network-level insertion loss in comparison to the original Torus, while Square Root has between 31% and 46% lower loss. In the case of 8 × 8 topologies, the Torus contains 3200 waveguide crossings, while TorusNX reduces this number to 1796, and Square Root further reduces it to 1080. As before, improved crossing loss can also be applied to these designs to further improve the scalability and performance (see Fig. 11).
Assuming the same 30-dB optical budget and improved crossing losses, both of the new networks are able to reach the maximum network size simulated in this study (324 access points for TorusNX, 256 access points for Square Root) and, with the remaining optical budget, transmit on seven wavelength channels.
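The wavelength limits discussed above follow from the power-budget inequality (1), which bounds the number of wavelength channels n by requiring the budget to cover the worst-case insertion loss plus 10 log10(n). The sketch below rearranges that relationship in both directions and also computes an ideal aggregate throughput. All function names are hypothetical, and the 10-Gb/s rate per wavelength used in the example is an assumption of this illustration.

```python
import math


def max_wavelengths(budget_db: float, worst_case_loss_db: float) -> int:
    """Largest n such that budget >= worst_case_loss + 10*log10(n).

    Returns 0 when the worst-case loss alone exceeds the budget.
    """
    margin_db = budget_db - worst_case_loss_db
    if margin_db < 0:
        return 0
    # Small epsilon guards against floating-point results such as 6.9999999.
    return math.floor(10 ** (margin_db / 10) + 1e-9)


def max_loss_db(budget_db: float, wavelengths: int) -> float:
    """Worst-case insertion loss that still leaves room for `wavelengths` channels."""
    if wavelengths < 1:
        raise ValueError("at least one wavelength channel is required")
    return budget_db - 10 * math.log10(wavelengths)


def total_throughput_tbps(access_points: int, wavelengths: int,
                          rate_gbps: float) -> float:
    """Ideal aggregate throughput: access points x wavelengths x rate per wavelength."""
    return access_points * wavelengths * rate_gbps / 1000.0


if __name__ == "__main__":
    budget_db = 30.0
    # Seven channels within a 30-dB budget imply this much tolerable path loss.
    print(f"loss ceiling for 7 channels: {max_loss_db(budget_db, 7):.2f} dB")
    # A loss above the budget leaves no usable channels.
    print("channels at 20-dB budget, 25-dB loss:", max_wavelengths(20.0, 25.0))
    # Assumed 10 Gb/s per wavelength for the 324-access-point case.
    print(f"throughput: {total_throughput_tbps(324, 7, 10.0):.2f} Tb/s")
```

The logarithmic term is the reason bandwidth and connectivity trade off so sharply: each tenfold increase in wavelength count costs 10 dB of budget that could otherwise absorb additional crossings in a larger network.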

The results of this insertion loss analysis clearly indicate that the newly developed networks are better at sustaining higher bandwidths and more access points for better overall system performance. However, for a fixed network design, optical power budget, and device performance, determining the optimal number of wavelengths and access points to use will largely depend on the specific system requirements being targeted. As an example, we can choose to maximize the total ideal network throughput (number of access points × number of wavelengths per access point × data rate per wavelength) of the TorusNX topology. We assume a 30-dB optical budget, the improved device parameters, and a 10-Gb/s modulation rate per wavelength. At one extreme, selecting the maximum number of access points (324) while using the seven wavelengths that the remaining budget allows achieves a throughput of 22.6 Tb/s. On the other hand, maximizing the number of wavelengths (70) would allow a total of 16 access points, which results in a throughput of 11.2 Tb/s. A balance of the two parameters, in fact, achieves the best throughput performance at 27.4 Tb/s when using 196 access points with 14 wavelengths.

IV. CROSSTALK ANALYSIS

For system performance, it is useful to report the signal-to-noise ratio (SNR), which is a measure of the integrity of the message being transmitted. More specifically, the optical SNR (OSNR) is the ratio of optical signal power to optical noise power at the point where the measurement is being made. The signal power is calculated based on the injected power and the network-level insertion loss, while the noise power is derived from several sources. These include laser intensity noise, intermessage crosstalk interference, and intramessage crosstalk interference. Laser intensity noise is a result of inherent fluctuations in the laser source due to mechanical vibrations and quantum noise. This noise produces an upper bound in the OSNR performance since it exists for all message transmissions.
The noise performance for lasers can be expressed in terms of relative-intensity noise (RIN), which measures the fluctuations in the laser relative to the mean optical power being emitted. For a continuous-wave quantum cascade laser, the RIN (in dB/Hz) has been measured for a 10-mW output [21]. The SNR can be derived using the following equation [22]:

$$\mathrm{SNR} = \frac{m^{2}}{2\,\mathrm{RIN}\,\Delta f} \qquad (2)$$

where $\Delta f$ is the noise bandwidth, which we assume to be an ideal 10 GHz (equal to the modulation rate), and $m$ is the modulation index, equal to $(r-1)/(r+1)$, where $r$ is the extinction ratio of the modulator. Silicon ring modulators have been demonstrated with extinction ratios of about 9 dB when modulated at 12.5 Gb/s [23]. Polysilicon ring modulators have also been demonstrated with extinction ratios of 16 dB during dc operation, and 10 dB with active signaling at 2.5 Gb/s [24]. From (2), we can solve for the noise power since the signal power is known.

TABLE II SIMULATION CROSSTALK AND NOISE PARAMETERS

We define intermessage crosstalk as the power that is unintentionally added to an optical message by a secondary optical message during transmission through the network fabric. This crosstalk predominantly occurs in two situations in the photonic networks that we are considering: at the waveguide crossings and at the PSEs. The crossings are regions where two waveguides physically intersect and are a result of the planar nature of the CMOS fabrication process. In situations where two perpendicularly propagating signals intersect at a crossing at the same time, a certain fraction of each signal will leak onto the other in the form of crosstalk. This leakage, expressed in dB below the signal power, has been measured in [7]. Similarly, the ability of a ring to resonate or pass a particular optical wavelength channel is also nonideal. A signal that is on-resonance with the ring will mostly drop through the ring; however, a small portion of the optical power will continue through in the off-resonance direction. The same is true in the case of an off-resonance signal, which will partially leak onto the on-resonance direction. This small fraction of the optical signal can interfere with other propagating messages as additional noise. This behavior is characterized by the extinction ratio, which has been measured experimentally to be 28.6 dB for the through port and 18.7 dB for the drop port [12].

Intramessage crosstalk also exists due to WDM.
Each photodetector requires a tuned filter, so that only a single wavelength channel is received at any single photodetector. Since the networks being examined assume ring filters before each photodetector, these filters also exhibit a finite extinction ratio, which results in other wavelength channels leaking through. This can be quite significant in the case of systems that support a large number of WDM channels. The OSNR computation is expressed as follows:

$$\mathrm{OSNR} = \frac{P_{\mathrm{sig}}}{N_{\mathrm{laser}} + N_{\mathrm{inter}} + N_{\mathrm{intra}}} \qquad (3)$$

where $P_{\mathrm{sig}}$ is the signal power just before the photodetector, $N_{\mathrm{laser}}$ is the laser intensity noise, $N_{\mathrm{inter}}$ is the intermessage crosstalk, and $N_{\mathrm{intra}}$ is the intramessage crosstalk. The crosstalk analysis we report here assumes non-WDM (single wavelength) transmission; therefore, we set $N_{\mathrm{intra}}$ equal to zero. For this reason, the presented results can be thought of as an upper bound in OSNR performance. Relevant parameters for the crosstalk analysis are listed in Table II.

The OSNR measurements for the four networks are reported in Fig. 12 for varying message sizes. Communications on space-routed topologies have varying ratios of photonic activity to electronic activity due to the separate electronic control and photonic planes. Network activity exclusively takes place on the control plane during the provisioning and release stages of a photonic path; therefore, no optical signal is injected during these periods. As the transmitted message size increases, the ratio of photonic to electronic activity increases and is reflected by increased optical crosstalk and lower OSNR. We assume maximal loading of the network with uniform random traffic. Each network assumes an 8 × 8 topology.

For short messages, the message transmissions are dominated by the electronic control messages; therefore, optical transmission is less frequent and crosstalk is less likely. In this limiting case, the OSNR is limited by the laser intensity noise. By solving (2) with the assumed parameters, we get an OSNR of about 47 dB, which corresponds well with the simulation results. For large messages, the electronic path-setup time is amortized by long data transmissions, and the optical network becomes saturated with the long optical messages. In this case, intermessage crosstalk is highly likely to occur, causing more significant signal degradation. The Square Root topology performs the best for large messages with an OSNR of about 16.0 dB. Torus, Non-Blocking Torus, and TorusNX achieve OSNRs of 11.3 dB, 13.2 dB, and 12.2 dB, respectively.

Lack of signal integrity ultimately results in erroneous bits detected. If we assume orthogonal signaling and an ideal optimal binary receiver, we can calculate the BER using the following function [25]:

$$\mathrm{BER} = Q\!\left(\sqrt{\frac{E_b}{N_0}}\right) \qquad (4)$$

where $E_b$ is the energy in each bit, and $N_0$ is the power spectral density of the noise. The term inside the radical is equivalent to the SNR of the signal. For a BER of $10^{-12}$, the network requires an SNR of 16.9 dB (indicated in Fig. 12 by a horizontal line). This indicates that in the large-message cases, none of the networks are able to achieve this level of signal integrity, so the BERs achieved by the Torus, Non-Blocking Torus, TorusNX, and Square Root for the largest messages all exceed this target. The high BERs can be lowered by using smaller messages, or mitigated through the use of a higher network-layer error correction scheme.
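The chain from laser noise to OSNR to BER described in this section can be sketched numerically. The forms used below are assumptions of this illustration rather than confirmed transcriptions of the authors' equations: the RIN-limited SNR is taken as m² / (2 · RIN · Δf), the modulation index as (r − 1)/(r + 1), the OSNR as signal power over the sum of noise terms, and the BER for orthogonal signaling as Q(√SNR). The RIN value of −150 dB/Hz in the usage example is a placeholder chosen for illustration and is not taken from the source.

```python
import math


def db_to_linear(value_db: float) -> float:
    return 10 ** (value_db / 10)


def linear_to_db(value: float) -> float:
    return 10 * math.log10(value)


def modulation_index(extinction_ratio_db: float) -> float:
    """Assumed definition: m = (r - 1) / (r + 1) with r as a linear power ratio."""
    r = db_to_linear(extinction_ratio_db)
    return (r - 1) / (r + 1)


def rin_limited_snr_db(rin_db_per_hz: float, noise_bandwidth_hz: float,
                       m: float = 1.0) -> float:
    """Assumed RIN-limited SNR: m^2 / (2 * RIN * bandwidth), returned in dB."""
    rin = db_to_linear(rin_db_per_hz)
    return linear_to_db(m ** 2 / (2 * rin * noise_bandwidth_hz))


def osnr_db(signal_w: float, laser_noise_w: float,
            inter_crosstalk_w: float, intra_crosstalk_w: float = 0.0) -> float:
    """OSNR as signal power over the sum of all noise powers, in dB."""
    return linear_to_db(
        signal_w / (laser_noise_w + inter_crosstalk_w + intra_crosstalk_w))


def ber_orthogonal(snr_db: float) -> float:
    """BER = Q(sqrt(SNR)), using Q(x) = 0.5 * erfc(x / sqrt(2))."""
    snr = db_to_linear(snr_db)
    return 0.5 * math.erfc(math.sqrt(snr / 2))


if __name__ == "__main__":
    # Placeholder RIN of -150 dB/Hz with a 10-GHz noise bandwidth.
    print(f"RIN-limited SNR: {rin_limited_snr_db(-150.0, 10e9):.1f} dB")
    # Large-message OSNRs reported in the text for the four networks.
    for name, value_db in [("Torus", 11.3), ("Non-Blocking Torus", 13.2),
                           ("TorusNX", 12.2), ("Square Root", 16.0)]:
        print(f"{name}: BER ~ {ber_orthogonal(value_db):.1e}")
```

Under these assumptions an SNR of 16.9 dB lands close to a BER of 10^-12, which agrees with the threshold quoted in the text. The per-network BERs printed by the sketch are estimates under the same assumptions and should not be read as the simulated results.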




TABLE III SIMULATION ENERGY DISSIPATION PARAMETERS

Fig. 12. Optical SNR performance for varying message sizes, assuming saturated network load, measured at the photodetectors. The line at OSNR = 16.9 dB is where a BER of $10^{-12}$ can be achieved, assuming an ideal binary receiver circuit and orthogonal signaling.

V. POWER ANALYSIS

The network-level power dissipation is a major factor limiting the performance scaling of chip-scale systems. Photonic on-chip networks have been shown to drastically outperform electronic networks in both performance and energy, especially for traffic patterns that require large data transmissions [26]. We conduct simulations to examine the power dissipation of the four photonic networks. Each network is assumed to use the maximum number of wavelengths allowed for the improved 8 × 8 topology, assuming a 30-dB optical power budget according to the results in Figs. 4 and 11.

The simulator uses the ORION model [19] for electronic router energy dissipation, which is configured for a 32-nm process with a normal voltage threshold transistor type and a supply voltage ($V_{DD}$) equal to 1.0 V. The electronic components in the network are clocked at 1.0 GHz. All electronic routers use a standard three-stage pipeline model with a 128-bit buffer on each input port and a flit size of 32 bits. All control messages are 32 bits in size. The routers in the torus-like networks use dimension-ordered routing, while Square Root uses a unique routing scheme that is optimized to distribute load equally and reduce propagation distance. All routers are modeled with credit-based flow control.

Photonic energy dissipation is computed using the PhoenixSim simulator, assuming various device parameters. The simulations assume integrated thermal tuners to manage thermal fluctuations in a chip, which will be strongly dependent on application activity. Thermal tuners integrated at each ring in the network assume approximately […] power dissipation, while the system is assumed to have a mean temperature deviation of 20 degrees. Modulators assume a dynamic dissipation of 85 fJ for every bit transmitted (bit edges) and an additional 30 µW of static power during periods when a constant signal is transmitted (hold periods). Switches exhibit higher dynamic and static dissipation than the ring modulators, at 375 fJ/bit and 400 µW, respectively, due to larger footprints. Photodetector energy is assumed to be 50 fJ/bit. The photonic power dissipation parameters used in this set of simulations are listed in Table III.
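The device parameters above combine a per-bit dynamic term with a static term that accrues over time. The following is a minimal sketch of that accounting, not the PhoenixSim implementation: the class and function names are hypothetical, the static figures are taken in microwatts, and the 1024-bit message and 10 Gb/s line rate in the usage lines are illustrative values that do not come from the text.

```python
from dataclasses import dataclass

FJ = 1e-15  # joules per femtojoule
UW = 1e-6   # watts per microwatt


@dataclass(frozen=True)
class DeviceEnergy:
    """Hypothetical container for one device's energy parameters."""
    dynamic_j_per_bit: float  # energy spent per transmitted bit
    static_w: float           # power drawn while a constant signal is held


# Values quoted in the text (static figures assumed to be microwatts).
MODULATOR = DeviceEnergy(dynamic_j_per_bit=85 * FJ, static_w=30 * UW)
SWITCH = DeviceEnergy(dynamic_j_per_bit=375 * FJ, static_w=400 * UW)
DETECTOR = DeviceEnergy(dynamic_j_per_bit=50 * FJ, static_w=0.0)


def device_energy_j(dev: DeviceEnergy, bits: int, hold_seconds: float) -> float:
    """Total energy in joules: dynamic energy per bit plus static power over the hold time."""
    return dev.dynamic_j_per_bit * bits + dev.static_w * hold_seconds


# Illustrative usage: a 1024-bit message at an assumed 10 Gb/s line rate,
# treating the whole message duration as hold time (an upper bound).
bits = 1024
duration_s = bits / 10e9
print(device_energy_j(MODULATOR, bits, duration_s))
print(device_energy_j(DETECTOR, bits, 0.0))
```

Keeping the hold time as an explicit argument leaves the caller free to decide how much of a transmission counts as a hold period, which the text does not specify.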

The power performance is reported for each of the four networks and assumes maximum loading with uniform random traffic on 8 × 8 topologies (see Fig. 13). In all four network designs, the electronic buffers, crossbar circuit, and clock tree dissipate a clear majority of the network power. This indicates that electronic power will remain a significant contributor to total network power dissipation even with photonic integration. Additional trends emerge when the power dissipated is related to the bandwidth performance of the networks. Fig. 14 shows the total network bandwidth of the four networks. As the message size grows, the network throughput also rises due to the amortization of the circuit-switching overhead. Congestion of optical traffic on the photonic network plane causes the eventual saturation of the networks. TorusNX achieves the best network bandwidth at 7.80 Tb/s, while Square Root, Torus, and Non-Blocking Torus obtain throughputs of 3.75 Tb/s, 2.45 Tb/s, and 669 Gb/s, respectively. Relating back to Fig. 13, we see that as the network achieves higher throughput with larger messages, the power dissipation shifts from high wire power and low photonic-device power to low wire power and high photonic-device power. This is evidence of the higher photonic network utilization and amortization of the electronic path-setup overhead. Furthermore, the total power dissipated by the electronic components in the network remains approximately constant regardless of network throughput since all the data is being sent optically. Fig. 15 combines the power and bandwidth results to plot the energy-per-bit efficiency of the networks. For the largest message size, TorusNX and Square Root achieve the best efficiencies at 585 and 681 fJ/bit, respectively.
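Energy per bit is total power divided by throughput, so the quoted efficiencies and bandwidths can be related with simple arithmetic. The following is a minimal sketch using the efficiency and saturation-throughput figures quoted in this section for all four networks. The function names are illustrative, and the implied power values are derived here under the assumption that both figures refer to the same operating point; they are not numbers stated by the authors.

```python
FJ = 1e-15  # joules per femtojoule

# name: (energy per bit in fJ/bit, saturation throughput in b/s), as quoted in this section
reported = {
    "TorusNX": (585.0, 7.80e12),
    "Square Root": (681.0, 3.75e12),
    "Torus": (2730.0, 2.45e12),
    "Non-Blocking Torus": (3620.0, 669e9),
}


def implied_power_w(fj_per_bit: float, throughput_bps: float) -> float:
    """Power implied by energy-per-bit times throughput (derived, not reported)."""
    return fj_per_bit * FJ * throughput_bps


def improvement(new_fj_per_bit: float, baseline_fj_per_bit: float) -> float:
    """Fractional reduction in energy per bit relative to a baseline."""
    return 1.0 - new_fj_per_bit / baseline_fj_per_bit


for name, (fj_per_bit, throughput_bps) in reported.items():
    print(f"{name}: implied power {implied_power_w(fj_per_bit, throughput_bps):.2f} W")

# The weaker of the two new designs (Square Root) sets the "at least" bounds.
print(f"vs Torus: {improvement(681.0, 2730.0):.1%}")
print(f"vs Non-Blocking Torus: {improvement(681.0, 3620.0):.1%}")
```

Under these assumptions, the Non-Blocking Torus has the lowest implied power of the four while also having the worst energy per bit, because its throughput is roughly an order of magnitude lower than that of TorusNX.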
Torus achieves an efficiency of 2.73 pJ/bit, while Non-Blocking Torus achieves an efficiency of 3.62 pJ/bit. The new network designs attain at least 75% better efficiency than the Torus, and at least 81% better efficiency than the Non-Blocking Torus. This dramatic improvement is attributed to the lower-loss network designs, which enable better bandwidth utilization and reductions in the number of required switches. Although the Non-Blocking Torus produces a comparatively reasonable absolute power dissipation, its efficiency for larger message sizes is the worst of the four networks. Although the Non-Blocking Torus has the advantage of being nonblocking, it supports fewer access points than the other three network designs, which results in a dramatic degradation in performance. Note that each network assumes the same topology size; however, the Non-Blocking Torus uses only 16 nodes due to layout constraints. While it may seem reasonable to assume a 32 × 32 Non-Blocking Torus so that each network can be normalized to the number of gateways, we can see from our original conclusions in Fig. 4 that a 64-gateway version is not possible. The insertion loss penalties outweigh the benefits of the nonblocking property, resulting in bandwidth degradation.

While larger message transmissions clearly perform better from an efficiency standpoint, the prior crosstalk simulations indicate that the OSNR also decreases with increased message size. This indicates that in order to maintain the high energy efficiency that these photonic topologies can provide, a scheme must be in place to either correct or mitigate these errors.

Fig. 13. Power-dissipation breakdown of different photonic topologies over varying message sizes.

Fig. 14. Total network bandwidth of each network at saturation.

Fig. 15. Transmission efficiency of each photonic network.

VI. CONCLUSION

An architectural exploration of chip-scale photonic interconnection networks was conducted in simulation and reported. Four photonic topologies were analyzed and compared using the physical-layer metrics of insertion loss, crosstalk, and power. Insertion loss measures the amount of attenuation an optical signal will incur as it propagates through the
photonic network and plays an integral part in the scalability and link throughput performance of the networks. Crosstalk considers the sources of noise and interference in the network and is critical to ensuring reliable, error-free data transmission. Power performance is an important consideration in today’s highly power-constrained systems. Various tradeoffs were shown to exist among the system configurations, which will be important in realizing final photonic interconnection network designs.

REFERENCES

[1] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, “Interconnect-power dissipation in a microprocessor,” in Proc. Int. Workshop Syst. Level Interconnect Prediction, Paris, France, 2004, pp. 7–13.
[2] C. Batten, A. Joshi, J. Orcutt, A. Khilo, B. Moss, C. W. Holzwarth, M. A. Popovic, H. Li, H. I. Smith, J. L. Hoyt, F. X. Kartner, R. J. Ram, V. Stojanovic, and K. Asanovic, “Building many-core processor-to-DRAM networks with monolithic CMOS silicon photonics,” IEEE Micro, vol. 29, no. 4, pp. 8–21, Jul./Aug. 2009.
[3] N. Kirman, M. Kirman, R. K. Dokania, J. F. Martinez, A. B. Apsel, M. A. Watkins, and D. H. Albonesi, “On-chip optical technology in future bus-based multicore designs,” IEEE Micro, vol. 27, no. 1, pp. 56–66, Jan./Feb. 2007.
[4] D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren, N. P. Jouppi, M. Fiorentino, A. Davis, N. Binkert, R. G. Beausoleil, and J. H. Ahn, “Corona: System implications of emerging nanophotonic technology,” in Proc. 35th Int. Symp. Comput. Archit., Beijing, China, Jun. 2008, pp. 153–164.
[5] A. Shacham, K. Bergman, and L. P. Carloni, “Photonic networks-on-chip for future generations of chip multiprocessors,” IEEE Trans. Comput., vol. 57, no. 9, pp. 1246–1260, Sep. 2008.
[6] F. Xia, L. Sekaric, and Y. Vlasov, “Ultracompact optical buffers on a silicon chip,” Nature Photon., vol. 1, pp. 65–71, 2006.
[7] W. Bogaerts, P. Dumon, D. V. Thourhout, and R. Baets, “Low-loss, low-cross-talk crossings for silicon-on-insulator nanophotonic waveguides,” OSA Opt. Lett., vol. 32, no. 19, pp. 2801–2803, 2007.
[8] V. R. Almeida, R. R. Panepucci, and M. Lipson, “Nanotaper for compact mode conversion,” OSA Opt. Lett., vol. 28, no. 15, pp. 1302–1304, Aug. 2003.
[9] J. Schrauwen, F. V. Laere, D. V. Thourhout, and R. Baets, “Focused-ion-beam fabrication of slanted grating couplers in silicon-on-insulator waveguides,” IEEE Photon. Technol. Lett., vol. 19, no. 11, pp. 816–818, Jun. 2007.
[10] M. R. Watts, D. C. Trotter, R. W. Young, and A. L. Lentine, “Ultralow power silicon microdisk modulators and switches,” presented at the IEEE Int. Conf. Group IV Photonics, Sorrento, Italy, Sep. 2008, Paper WA2.
[11] S. J. Koester, C. L. Schow, L. Schares, G. Dehlinger, J. D. Schaub, F. E. Doany, and R. A. John, “Ge-on-SOI-detector/Si-CMOS-amplifier receivers for high-performance optical-communication applications,” J. Lightw. Technol., vol. 25, no. 1, pp. 46–57, Jan. 2007.
[12] B. G. Lee, A. Biberman, P. Dong, M. Lipson, and K. Bergman, “All-optical comb switch for multiwavelength message routing in silicon photonic networks,” IEEE Photon. Technol. Lett., vol. 20, no. 10, pp. 767–769, May 2008.
[13] B. G. Lee, A. Biberman, N. Sherwood-Droz, C. B. Poitras, M. Lipson, and K. Bergman, “High-speed 2 × 2 switch for multi-wavelength silicon photonic networks on-chip,” J. Lightw. Technol., vol. 27, no. 14, pp. 2900–2907, Jul. 2009.
[14] H. Wang, M. Petracca, A. Biberman, B. G. Lee, L. P. Carloni, and K. Bergman, “Nanophotonic optical interconnection network architecture for on-chip and off-chip communications,” presented at the Optical Fiber Communications Conf., San Diego, CA, Feb. 2008, Paper JThA92.
[15] J. Chan, A. Biberman, B. G. Lee, and K. Bergman, “Insertion loss analysis in a photonic interconnection network for on-chip and off-chip communications,” presented at the Annual Meeting Lasers and Electro-Optics Society, Newport Beach, CA, 2008, Paper TuT3.
[16] N. Sherwood-Droz, H. Wang, L. Chen, B. G. Lee, A. Biberman, K. Bergman, and M. Lipson, “Optical 4 × 4 hitless silicon router for optical networks-on-chip (NoC),” OSA Opt. Exp., vol. 16, no. 20, pp. 15915–15922, Sep. 2008.
[17] B. G. Lee, A. Biberman, K. Bergman, N. Sherwood-Droz, and M. Lipson, “Multi-wavelength message routing in a non-blocking four-port bidirectional switch fabric for silicon photonic networks-on-chip,” presented at the Optical Fiber Communications Conf., San Diego, CA, Mar. 2009, Paper OMJ4.
[18] A. Varga, “OMNeT++ discrete event simulation system,” [Online]. Available: http://www.omnetpp.org
[19] A. Kahng, B. Li, L. Peh, and K. Samadi, “ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration,” presented at the Design, Automation and Test in Europe Conference and Exhibition, Nice, France, Apr. 2009.
[20] M. Uenuma and T. Motooka, “Temperature-independent silicon waveguide optical filter,” OSA Opt. Lett., vol. 34, no. 5, pp. 599–601, 2009.
[21] T. Gensty, W. Elsäßer, and C. Mann, “Intensity noise properties of quantum cascade lasers,” OSA Opt. Exp., vol. 13, no. 6, pp. 2032–2039, 2005.
[22] “Intensity modulation and noise characterization of optical signals,” in Fiber Optic Test and Measurement, D. Derickson, Ed. Upper Saddle River, NJ: Prentice-Hall, 1997, pp. 146–282.
[23] Q. Xu, S. Manipatruni, B. Schmidt, J. Shakya, and M. Lipson, “12.5 Gbit/s carrier-injection-based silicon micro-ring silicon modulators,” OSA Opt. Exp., vol. 15, no. 2, pp. 430–436, Jan. 2007.
[24] K. Preston, S. Manipatruni, A. Gondarenko, C. B. Poitras, and M. Lipson, “Deposited silicon high-speed integrated electro-optic modulator,” OSA Opt. Exp., vol. 17, no. 7, pp. 5118–5124, Mar. 2009.
[25] B. P. Lathi, “Behavior of digital communication systems in the presence of noise,” in Modern Digital and Analog Communication Systems, 3rd ed. New York: Oxford University Press, 1998, pp. 577–625.
[26] G. Hendry, A. Biberman, J. Chan, S. Kamil, B. G. Lee, M. Mohiyuddin, K. Bergman, L. P. Carloni, L. Oliker, and J. Shalf, “Analysis of photonic networks for a chip multi-processor using scientific applications,” in Proc. Int. Symp. Networks-on-Chip, San Diego, CA, May 2009, pp. 104–113.

Johnnie Chan (S’08) received the B.S. degree (high distinction) in computer and electrical engineering and the M.S. degree in electrical engineering from the University of Virginia, Charlottesville, in 2005 and 2007, respectively. He is currently working toward the Ph.D. degree at the Department of Electrical Engineering, Columbia University, New York. His current research interests include the design of chip-scale photonic interconnection networks for on-chip and interchip systems, and the modeling and simulation of nanophotonic devices.

Gilbert Hendry (S’08) received the B.S. and M.S. degrees in computer engineering from Rochester Institute of Technology, Rochester, NY, in 2007. He is currently working toward the Ph.D. degree at the Department of Electrical Engineering, Columbia University, New York. His current research interests include the design of computing systems, using silicon photonics, and the software tools used in this endeavor.


Aleksandr Biberman (S’05) received the B.S. degree in electrical and computer and systems engineering from Rensselaer Polytechnic Institute, Troy, NY, in 2006, and the M.S. degree in electrical engineering in 2008 from Columbia University, New York, where he is currently working toward the Ph.D. degree at the Department of Electrical Engineering. His current research interests include silicon nanophotonic devices for networks-on-chip and interchip communication, photonic interconnection networks for chip multiprocessor architectures, optical networking in high-performance computing systems, as well as silicon photonic devices for parametric optical processes and systems.



Keren Bergman (S’87–M’93–SM’07–F’09) received the B.S. degree from Bucknell University, Lewisburg, PA, in 1988, and the M.S. and Ph.D. degrees from Massachusetts Institute of Technology, Cambridge, in 1991 and 1994, respectively, all in electrical engineering. She is currently a Professor at the Department of Electrical Engineering, Columbia University, New York, where she also directs the Lightwave Research Laboratory. Her research programs involve optical interconnection networks for advanced computing systems, photonic packet switching, and nanophotonic networks-on-chip. Prof. Bergman is a Fellow of the Optical Society of America. She is the Coeditor-in-Chief of the OSA/IEEE Journal of Optical Communications and Networking.

