Silicon Photonic Interconnects for Large-Scale Computer Systems

Ron Ho, Philip Amberg, Eric Chang, Pranay Koka, Jon Lexau, Guoliang Li, Frankie Y. Liu, Herb Schwetman, Ivan Shubin, Hiren D. Thacker, Xuezhe Zheng, John E. Cunningham, and Ashok V. Krishnamoorthy, Oracle Labs

Optical interconnects play an integral role in large-scale digital computing, switching, and routing systems. The authors describe a path toward future many-chip modules based on silicon photonic interposers that stitch together tens of chips in a dense and efficient communication infrastructure. They review the guiding design principles for this "macrochip" and describe its canonical energy, loss, and area budgets.

Interconnects play an increasingly dominant role in large-scale computer systems' energy and performance. Continued transistor scaling decreases the incremental cost of computation by packing more transistors into compact functional units, making the design of optimal interconnects for connecting these compute blocks a growing challenge. Interconnects include both wires within a modern multicore processor and communication channels between chips. On-chip wires, constructed from micron-thick copper film, are dense, at one line per micron, and can reasonably span several millimeters. Their high packing density limits performance: small cross-sections make the wires' performance dominated by their resistance and capacitance (RC) characteristics, and with scaling their cross-sections continue to shrink, further lowering their speeds. Off-chip transmission lines are much coarser, with one line every 200 µm, and they connect to chips through large solder balls 150 µm in diameter. They span up to a few feet across printed circuit boards (PCBs) and carry data at the speed of light in the board material. The physical size mismatch between fine on-chip wires and coarse off-chip wires encourages overclocked and serialized off-chip wires, which incur added energy and complexity.

Much longer distances can employ optics, which have achieved remarkable commercial success when the product of throughput and reach exceeds 100 Gbps × meter.1 Most of these systems use coarse PCB copper wires, and often pluggable connectors, to connect functional silicon (processors, switches, and memories) to specialized optical modules lighting up optical fibers. In these systems, the optical channels serve to connect physically separate VLSI system packages.

In this article, we explore the technical challenges of pulling optical technologies deep into VLSI packages, sockets, and systems. Over the past several years we have explored system-level motivations for silicon nanophotonics, and we have fabricated a host of optical devices connected to custom analog and digital circuits. We believe our results demonstrate a viable path toward optics integrated within a many-chip VLSI package and will enable systems with unprecedented integration levels.


Large-scale VLSI systems

Today's highly integrated processors aggregate dozens of out-of-order execution threads, megabytes of cache, and several system-on-a-chip components such as networking, PCI Express, and memory controllers,2 in order to run business-scale database, financial transaction, and high-performance scientific workloads. Typical large-scale systems combine tens to hundreds of such processors with thousands of DRAM modules in blades and cards, with each processor transmitting and receiving tens of gigabits per second (Gbps) to its local DRAM and a similar amount of memory and cache-coherence traffic to the other processors. Such systems' cost and complexity require tradeoffs between memory footprint and capacity, distance between processors, channel bit rate, and energy per bit.

Partly in response, researchers are collapsing parts of these large-scale machines into compact subsystems organized around multichip sockets or packages. Leveraging projections for 3D memory chip stacking using through-silicon vias (TSVs), these multichip sockets can be as simple as a processor and a memory stack sitting atop a single shared silicon interposer. The interposer provides fine-pitch, short-reach interconnects between the processor and the memory stacks, as well as through-vias to provide power to both processor and memory from a second-level package. More complex multichip sockets can embed multiple processors and memory stacks, perhaps with chips soldered to both sides of the interposer.

Such multichip packages provide several advantages. When they combine chips that require high bandwidth, such as a processor and a stack of its local DRAM, they carry interchip traffic on short, low-loss interposer wires, allowing energy and performance optimizations on both chips' transceiver circuits. More importantly, they enable higher total bandwidth between those chips, in part because the tightly controlled environment allows for fine-pitch solder connections and hence more wired connections between chips. These chips can thus be far more tightly coupled, and offer much higher total performance, than separately packaged chips on the same PCB, coming close to the functional illusion of monolithically integrating them on the same silicon, Moore's law style. Tradeoffs include higher cost and complexity and an increase in socket-level power, though still within the limits of commercially available microchanneled water coolers.

A reliance on silicon interposer interconnects, however, does present scalability limits. Many-chip packages with 16 to 64 processors and memory stacks could require a spaghetti nest of interposer wires connecting the functional silicon chips; compared to multichip packages with just a few chips, the interposer wires required would increase substantially in both length and number. Yet silicon interposer wires can generally be engineered either for fine pitch, density, and low energy, or for coarse pitch and the ability to carry a high bit rate over long distances, but not for both. Can optics help?

Optics in many-chip packages

Commercially successful optical links have leveraged well-established, inexpensive vertical-cavity surface-emitting laser (VCSEL) and fiber technologies. VCSELs' modest bandwidth requirements, coupled with laser redundancy and failover, make up for the fact that their reliability falls superlinearly with bit rate;3 and long-haul route distances amortize the area costs of 250-µm pitch fiber connectors. For use in a high-bandwidth many-chip package scenario, however, optics would need to employ densely packed silicon-based waveguides, mass-patterned by exposure lithography, with modulated light sources of high per-channel bit rate and reliability.

As a result, many researchers, including us at Oracle, have focused on silicon nanophotonics: continuous-wave laser light at telecommunications wavelengths of 1,300 to 1,550 nm, modulated with transistor-scale structures, carried in silicon waveguides, and received by integrated photodetectors. We have employed a number of forcing functions to drive our research, captured in the design principles we describe here.

Optics are primarily an off-chip technology

As described previously,4 we don't consider optics a feasible replacement for strictly on-chip wires. Long routes on a chip rarely span more than 5 to 10 mm, and over that distance even an aggressively optimized optical link will incur higher energy costs and provide only minimal throughput and latency benefits over copper wires of comparable area. Our most aggressive projections of optical link energy aim for several hundred femtojoules (fJ) per bit, not counting laser wall-plug efficiency. By contrast, a 10-mm electrical wire with 2 picofarads of capacitance, driven with a 0.25-V swing from a 0.9-V supply at a 25-percent activity factor, will dynamically consume just over 100 fJ per bit. Moreover, optical waveguides, with a 5- to 10-µm pitch, are much coarser than on-chip wires, which need only a 1- to 2-µm pitch and can use several metal layers for cross-chip routes. Hence, even with multiple channels per waveguide, optics can offer only about the same throughput as wires for the same silicon area. Finally, while signals on repeated on-chip wires travel at a fraction of the speed of light, the latency advantage of optics over a 10-mm span is under a clock cycle for a modern 4-GHz processor.

For data transmission entering or exiting a chip, however, optical channels present an enormous benefit over copper wiring. Optical couplers from on-chip waveguides to silicon interposer waveguides are much smaller than soldered connectors5 and, with multiple channels per waveguide (wavelength division multiplexing, or WDM), offer much higher net bandwidth. This enables the tight coupling between chips, and the concomitant system performance speedup, in a many-chip package. By the same token, interposer waveguides with WDM support much higher bandwidth than either passive (unrepeated) RC wires or wide transmission lines. In addition, at a few hundred femtojoules per bit, optical links will likely beat the energy costs of comparable electrical links, which must either heavily equalize RC-limited interposer wires or match the impedance of interposer transmission lines.
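As a sanity check on that on-chip comparison, the wire's dynamic energy follows directly from its switched capacitance; a minimal sketch, using only the constants quoted above:

```python
# Dynamic energy per bit of the 10-mm on-chip wire described above:
# E = activity * C * V_swing * V_dd.
c_wire = 2e-12    # wire capacitance, F
v_swing = 0.25    # low-swing signaling, V
v_dd = 0.9        # supply voltage, V
activity = 0.25   # fraction of bits that toggle the wire

energy = activity * c_wire * v_swing * v_dd
print(f"{energy * 1e15:.1f} fJ/bit")   # 112.5 fJ/bit: "just over 100 fJ"
```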

Hybrid integration outperforms monolithic integration

CPU process technologies are somewhat incompatible with some of the requirements of optical device manufacturing. Silicon waveguides and modulators require isolation on all four sides, necessitating a buried oxide layer like that in a silicon-on-insulator (SOI) process. However, as SOI processes scale to support future CPU designs, the silicon layer and the buried oxide will soon be too thin to efficiently support an optical mode, making scaled SOI unsuitable for silicon photonics. In addition, industry trends toward tri-gate or finned transistors could eventually limit the use of SOI for processor chips. Furthermore, maintaining the thermal stability of resonant optical devices such as ring modulators requires silicon underetching to increase thermal tuning efficiency, but integrating a silicon underetching process into a mainstream CPU or application-specific integrated circuit (ASIC) technology would add unwelcome cost and complexity. Finally, optical receivers need a material such as germanium, which has a direct bandgap edge near 1,550 nm, for photodetection. Although germanium is employed today for straining the silicon lattice, its concentration and process flow are ill-suited for highly responsive integrated photodetectors.

As a result, we've worked to develop a hybrid integration platform, with photonic devices manufactured in a process technology optimized for low-loss optics, thermal isolation, and efficient photodetectors. We then bond these optical devices face to face with a processor chip manufactured in a mainline bulk CMOS process and optimized for integrated silicon logic. This lets us leverage the best process technologies for both optics and electronics. However, it also requires extremely fine solder, or else the area and parasitics of the face-to-face bonding will overwhelm the process technology advantages. We have developed a microsolder targeting fine arrays of bumps and have demonstrated 35-µm and 25-µm bumps, with 10-µm bumps forthcoming, all with sub-ohm resistance (see Figure 1).6,7 For 10-µm bumps we anticipate only about 10 fF (femtofarads) of bonding capacitance, a loading on par with the parasitics of the optical device itself, making the overhead of hybrid integration small. As described later, however, this face-to-face bonding does require rethinking how interposer systems are assembled.


Figure 1. Microsolder enables a hybrid bonding strategy. The left image, from a scanning electron microscope, shows 25-µm microsolder arrayed on a 40-nm CMOS chip with transceiver circuits (not visible under the passivation). Each bump comprises a flat aluminum pad, plated with under-bump metallization, topped with a "cuplike" microsolder bump. Chips are attached to other components using thermocompressive bonding with several pounds of pressure at modest temperatures. The right image shows a closeup of the microsolder.

Thermal tuning is a device and circuit codesign problem

Dealing with transistor and interconnect device variations is a well-known challenge for system architects. Not surprisingly, energy-efficient optical modulators present similar issues. High-finesse rings can modulate light efficiently and effectively,6,8 but with a Q in excess of 15,000, they must resonate at exactly their intended channel wavelength to be effective. However, manufacturing variations in silicon-layer thickness, waveguide width, and waveguide etch depth (each around 10 nm) can result in unintended shifts in a ring's resonant wavelength of up to 15 nm. Extensive fabrication data has shown these wavelength variations to be largely random and uniformly distributed across the ring's free spectral range (FSR), which is itself well-controlled across and between wafers; this makes post-fabrication tuning of each ring feasible.9

The key observation is that for N channels in a WDM link, we can tune each ring to its nearest channel. (Note that channels "wrap" in frequency space.) In that case, the average per-ring tuning distance to capture 6-sigma ring variation is approximately FSR/N. For example, a group of eight rings for 8-way WDM and a 12.5-nm FSR requires an average tuning per ring of about 1.6 nm. Such a strategy would require a static (hence, low-energy) digital shifter before and after the link.

A ring can be tuned with voltage, current, or heat. Voltage tuning through a reverse-biased p-n junction is inefficient due to silicon's weak electro-optic effect (only about 30 picometers/V). Tuning via carrier injection (current) into a forward-biased p-n junction can be more efficient, but it has a limited tuning range because the injected carriers introduce excess loss, significantly reducing ring Q. Tuning rings using heat offers a wide tuning range, but selectively applying and removing heat can itself be inefficient because heat spreads rapidly in the silicon substrate both above and below the buried oxide layer. To combat this, we developed a method to etch thermal isolation trenches in the silicon below and around the ring modulators, not only minimizing the thermal mass to be adjusted but also increasing the thermal impedance to outside noise sources.10,11
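To make the FSR/N argument concrete, here is a minimal Monte Carlo sketch. Its assumptions are ours, for illustration only: heaters shift a ring's resonance in one direction, and the static digital shifter remaps each ring to the nearest channel at or above its fabricated resonance, so no ring ever needs more than one channel spacing of tuning.

```python
import random

# Monte Carlo sketch of the FSR/N tuning budget (illustrative assumptions:
# red-shift-only heaters; channels "wrap" across the FSR via the shifter).
FSR = 12.5           # free spectral range, nm
N = 8                # rings / WDM channels
spacing = FSR / N    # channel spacing, ~1.6 nm

worst = 0.0
for _ in range(100_000):
    resonance = random.uniform(0.0, FSR)   # uniformly random, per ref. 9
    shift = -resonance % spacing           # red-shift up to the next channel
    worst = max(worst, shift)

print(f"channel spacing FSR/N: {spacing:.2f} nm")   # 1.56 nm
print(f"worst-case tuning:     {worst:.2f} nm")     # approaches FSR/N
```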


Figure 2. A ring modulator photographed from underneath an optical device chip after the silicon substrate has been locally removed from around the ring; the thinned material is backlit from a light source on the other side of the chip. This thinning dramatically improves the efficiency of thermally tuning resonant ring modulators, both by reducing the thermal mass that must be heated to tune the ring to its intended resonant wavelength and by increasing the thermal impedance to external heat sources.

This improves the static thermal tuning cost and lets a control circuit maintain a stable temperature in a fluctuating thermal environment. This silicon etching, made possible by hybrid integration and a process technology optimized for optical devices, represents half of the solution (see Figure 2).

The other half of the solution lies in transceiver circuits for controlling a resistive heater placed near the ring modulator. Our approach favors simplicity: each ring modulator uses a monitoring photodiode that persistently taps off a small percentage (about 1 percent) of the ring's light. During system bring-up, the ring transmits an AC data pattern, and local firmware controlling a digital-to-analog converter (DAC) adjusts the heater current to maximize the ring's optical signal amplitude. We store the monitor photodiode's output for later reference. During normal operation, a circuit compares the real-time monitor output to the stored "ideal" value and then drives the DAC and heater to modulate the heat. Because of the silicon etching, external thermal noise affects a ring fairly slowly, with a 1-ms time constant. Because our heater can generate a local temperature change with about a 1-ms time constant, this circuit can react to an outside stimulus with reasonable fidelity, eventually reaching a limit cycle and dithering between two successive DAC settings. A combination of measurement and simulation estimates the total cost of this tuning at about 1.6 mW (164 fJ/bit at 10 Gbps), including the power delivered to the ring, the power lost in the heater switch, and the power used to control the finite-state machines and DACs.

Thermal stability has another, often underappreciated, aspect. When the modulator absorbs light, locally dissipated optical power slightly heats the ring.12 This self-heating effect is limited, because as the ring heats and falls off resonance, it absorbs less optical power. However, the effect is strong enough to shift the logical 0 and 1 levels up and down by nearly 10 percent of the signal swing (1 sigma). Because the receiver uses a fixed threshold, this DC shift significantly reduces noise margin and increases bit error rate. The self-heating effect happens with a time constant equal to that of the local integrated heater, so the monitor solution described earlier cannot compensate for it. Instead, we employ a receiver-based decision feedback equalization (DFE) circuit. As the receiver decodes each bit into a logical 0 or 1, it shifts its own threshold accordingly to compensate, thus tracking the DC shift in the transmitted signal. We estimate that this DFE circuit adds about 40 percent more power to the base receiver.
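To illustrate the receiver-side compensation, here is a behavioral sketch of a sign-driven adaptive threshold. It is illustrative only, not our circuit: the step size and the modeling of self-heating as a per-bit threshold nudge are our simplifications.

```python
# Behavioral sketch of the DFE-style compensation described above: after
# each decision, the threshold is nudged in the direction of the decoded
# bit, tracking the slow DC wander that modulator self-heating imposes on
# the received 0/1 levels. Step size and model are illustrative.
def decode(samples, threshold=0.5, step=0.002):
    bits = []
    for v in samples:
        bit = 1 if v > threshold else 0
        bits.append(bit)
        # Long runs of one symbol shift the ring's temperature, and hence
        # the received levels; the threshold follows in the same direction.
        threshold += step if bit else -step
    return bits
```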

Calibration and support circuits are critical

Circuits for modulating a resonant ring and translating a photocurrent into a digital voltage are well-known and relatively simple. Perhaps unsurprisingly, we have found that the extra supporting circuits needed to make the full link operational consume much of the total complexity, energy, and area budget.13 These include thermal tuners on the transmitting side and self-heating compensators on the receiving side, as described earlier. They also include clock recovery and metastability circuits to ensure that the receiver's amplifiers always strobe at the right moment in each bit and that the resulting data can be clocked into a digital data path. We must also control circuit variations, using offset cancellation for our signal amplifiers. Finally, because we wish to avoid the overhead of guaranteed DC-balanced data codes (such as 8b/10b), we use a system that episodically recalibrates the DC input level to enable a fixed input threshold. All these side circuits employ a digitally assisted analog scheme, with finite-state machines driving DACs to set appropriate control voltages, and they contribute substantially to the operation of a complete link; hence, the numbers we present later include them and do not simply represent a transceiver "core." In other words, we count all the circuits from the transmitter's last data path flip-flop to the receiver's first data path flip-flop.
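As one example of this digitally assisted analog pattern, a finite-state machine can null an amplifier's input offset by binary search over a DAC code. The sketch below is illustrative; the comparator-probe interface is hypothetical, standing in for reading the amplifier's decision with its inputs shorted.

```python
# Sketch of the digitally assisted analog pattern described above: an FSM
# drives a DAC by successive approximation until a comparator's offset is
# nulled. `comparator_output` is a hypothetical stand-in for sampling the
# amplifier's decision during calibration.
def cancel_offset(comparator_output, bits=8):
    """Return the DAC code that best cancels the comparator's input offset."""
    code = 0
    for i in reversed(range(bits)):          # MSB-first binary search
        trial = code | (1 << i)
        # If the comparator still resolves high, the correction is not yet
        # large enough, so keep the larger trial code.
        if comparator_output(trial):
            code = trial
    return code

# Example with a modeled comparator whose offset corresponds to code 173:
probe = lambda dac_code: dac_code <= 173
print(cancel_offset(probe))  # -> 173
```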

An optical link with loss and energy budgets

With these design principles in mind, we can now describe a many-chip package that leverages silicon photonic links, starting from the concept of functional silicon chips ("islands") atop a silicon interposer. In our case, the islands are face-up, not face-down, and rather than sitting atop the interposer they lie in chip-size pits etched into it. Instead of carrying a rats' nest of copper wires between islands, the interposer routes WDM-compatible waveguides between sites; the waveguides are passive and thus allow large-scale interposers with minor yield and cost impact. Following our hybrid-integration strategy, the face-up island chips, optimized for transistors and logic, are bonded to small face-down optical chips ("bridges") optimized for photonic devices; the bonding employs microsolder as small as 10 µm to densely connect circuits on the islands to optical devices on the bridges. These bridges, using underetched resonant rings with thermal stability circuits, convert electrical data from the island chips into modulated light signals, which route along the bridge until they overhang the interposer. There, they couple into interposer waveguides using small mirror or grating couplers.5 Figure 3 shows a cartoon of the scheme, with an 8 × 8 array of islands (16 processor islands and 48 DRAM stacks, each atop a memory controller base chip) and waveguides drawn in the silicon interposer. We have traditionally called this a macrochip design.

Figure 3. A 64-chip macrochip package, supporting 16 CPUs and 48 DRAM stacks (each integrated atop a logical chip carrier). The CPUs and DRAM carriers, all face-up, use microsolder (see Figure 1) to connect to face-down optical chips. Laser light brought in through attached fibers (not shown) is modulated on the bridges using optical devices driven by circuits on the CPUs and DRAM carriers. Modulated light is then coupled into statically configured passive waveguides embedded in the silicon interposer.

The figure elides many of the packaging complexities inherent to a macrochip many-chip package. These include power delivery from above using nonsoldered connections, to avoid a "known-good-die" yield problem; a thermal solution; and methods of aligning chips to the silicon interposer. We addressed these concerns with springlike "claws" for power delivery, sintered-copper microchannel cold water plates, and sapphire balls locked into etched pyramids, although the details are too numerous to cover here.5 Attaching fibers to this package using multicore fiber arrays would allow further scalability by enabling multi-macrochip designs.4

This macrochip configuration provisions fully static routes from island to island with no in-route switching. For workloads with high-radix uniform traffic, this seems intuitively practical; but for highly nonuniform traffic patterns, it appears to waste unused bandwidth while underprovisioning where more is needed.



Figure 4. A full link.13 The shaded boxes indicate source and destination chip sites, each with an electrical chip—a CPU or memory carrier—and a photonic ‘‘bridge’’ chip. The two are attached through small microsolder hybrid bonds. The waveguide between the sites can either be a long-traversal waveguide in the interposer or a pair of short Manhattan waveguides with vertical couplers. This figure employs the former scheme.

Table 1. Loss budget projections from 2008 (ref. 4) and recent results.

Component | No. of components | Loss budget (dB) per component in 2008 | Demonstrated results
Modulator | 1 | 4 | 3 dB "on" loss, 7 dB extinction ratio6
1-cm waveguide at source | 1 | 1 | 0.036 dB/cm waveguides14
Bridge-interposer coupler | 2 | 1 | 2.8 dB loss through grating couplers15
Mux | 1 | 2.5 | 1 dB with moderate channel isolation16
40-cm waveguide on interposer | 1 | 2 | Research in progress
Interlayer coupler | 2 | 1.2 | 2.8 dB, but eliminated in flat routing design15
1-cm waveguide at destination | 1 | 1 | 0.036 dB/cm waveguides14
Demux (pass-through) | 7 | 0.1 | Research in progress
Demux (dropped) | 1 | 1.5 | Research in progress

However, a closer analysis focused on optical loss from switching elements shows that for any switched network to outperform this type of statically provisioned, nonswitched network, optical device loss characteristics would need to improve dramatically over today's state of the art. For optical devices in the foreseeable future, our static nonrouted network presents a wide range of workloads with the best performance and energy tradeoffs.17

The waveguide routes can be a two-layer grid of boulevards and avenues, or a single layer of long, nonintersecting, semiperimeter routes. The former requires vertical layer-to-layer couplers but has much shorter waveguides than the latter; the choice depends on the relative losses of vertical couplers and waveguides.


Although we've designed both kinds of macrochip interposers, we focus here on the former scheme, with two layers of waveguides. Figure 4 shows a cartoon of a full optical link, and Table 1 gives its link budget as we originally projected four years ago,4 along with some of our recent research results. Table 1's total budget of 17 dB (summing each component's loss times its count) means that a 0-dBm launch optical power results in a −17-dBm photodetector sensitivity requirement. A 0.8-A/W target responsivity and a 6-dB transmitted extinction ratio give a peak-to-peak photodetector output of 19.2 µA, or just about 10 µA of single-ended swing.
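The receiver-input numbers follow mechanically from the budget; here is a quick arithmetic check, a sketch using only figures quoted in the text (including the 4-kΩ transimpedance discussed after Table 2):

```python
# Arithmetic check of the link budget above: 0-dBm launch, 17-dB loss,
# 0.8-A/W responsivity, 6-dB extinction ratio, 4-kOhm transimpedance.
launch_dbm = 0.0
loss_db = 17.0
responsivity_a_per_w = 0.8
extinction_db = 6.0

p_avg_w = 1e-3 * 10 ** ((launch_dbm - loss_db) / 10)   # ~20 uW at detector
i_avg = responsivity_a_per_w * p_avg_w                 # ~16 uA average

er = 10 ** (extinction_db / 10)                        # P1/P0 ~= 4
i_pp = i_avg * 2 * (er - 1) / (er + 1)                 # peak-to-peak swing

print(f"peak-to-peak photocurrent: {i_pp * 1e6:.1f} uA")        # ~19.2 uA
print(f"single-ended swing:        {i_pp / 2 * 1e6:.1f} uA")    # ~9.6 uA
print(f"output of 4-kOhm TIA:      {4e3 * i_pp / 2 * 1e3:.0f} mV")  # ~38 mV
```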


Table 2. Optical energy projections from 2008 (ref. 4) and recent results.

Components | Energy budget (fJ/bit) in 2008 | Demonstrated energy (fJ/bit) | Demonstrated area (µm²)
Laser | 70 | RX sensitivity result (−17 dBm) compatible with 70 fJ/b13 | Research in progress
Transmitter | 80 | 66 fJ/b at 10 Gbps18 | 5,500, including 8 bondpads (4 per chip), each 25 µm × 25 µm6
Receiver | 120 | 270 fJ/b at 10 Gbps18 | 7,900, including 4 bondpads6
Mux/demux | 30 | 30 fJ/b at 10 Gbps based on 2.4 mW/FSR and 8-way WDM,11 or 49 fJ/b at 10 Gbps based on 3.9 mW/FSR19 | Demonstrated filters from 80 to 32,000;11,19 most recent result is 200 per ring filter20
Tuning | Not included | 164 fJ/b at 10 Gbps9,13,19 | Included in the above
Grating coupler | N/A | N/A | 675, shared among 8 channels16

(FSR = free spectral range.)

A TIA-based receiver presents a 4-kΩ transimpedance gain to convert this 10 µA into 40 mV of single-ended signal, which is then brought to full logic levels using memory-style sense amplifiers.13 Thermal noise analysis of the circuits shows that reasonable bit error rates (better than 10⁻¹²) would require at least half of this signal swing, which gives our design some sensitivity headroom; we are nonetheless still exploring several different topologies and alternatives for both optical devices and circuits. We project that a full system using these links would implement a higher-level error check and retry methodology, so that these wireline-quality bit error rates can be managed in a large server-class system.

We project the total link energy per bit by considering the major components: laser light, modulators and drivers, WDM muxes and demuxes, and photodetectors and receivers. Four years ago we projected a 2015 research target of 300 fJ/bit at 15 Gbps: 70 fJ/bit of laser light, 80 fJ/bit for the transmitter, 120 fJ/bit for the receiver, and 30 fJ/bit for the WDM multiplexers. We omitted both thermal tuning and laser wall-plug efficiency from these projections; for WDM sources, wall-plug efficiency is only a few percent today.

We're currently engaged in a research effort to push waveguide-coupled WDM laser wall-plug efficiency up to the 10 percent that we consider achievable at 1,550 nm, with up to 15 percent possible at 1,300 nm. If we retroactively apply a laser with 10 percent wall-plug efficiency to our original 2008 targets, the laser cost grows to 700 fJ/b, pushing the total toward 1 pJ/bit.

Clearly, the laser dominates the system energy. We can mitigate its effect by further lowering the loss budgets in Table 1, allowing us to use a lower launch power; however, these loss budgets are already aggressive. Another option is to reduce the signal presented to the receiver circuits. Because our current receiver design has headroom, we can cut its input swing in half and reduce the laser launch power to −3 dBm. At 10 percent wall-plug efficiency, this cuts the laser contribution down to 350 fJ/b, reducing total power significantly. Table 2 shows our 2008 projections alongside recent demonstrations from our ongoing research program; the several mux/demux filter-ring results in particular illustrate the large design space between ring diameter (and free spectral range) and tuning cost.
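The laser figures follow from dividing launch power by bit rate and wall-plug efficiency; a quick check against the 700-fJ/b and 350-fJ/b numbers above (a sketch, using the 15-Gbps rate of the 2008 target):

```python
# Wall-plug laser energy per bit = launch power / (bit rate * efficiency).
def laser_fj_per_bit(launch_dbm, gbps, wallplug_eff):
    watts = 1e-3 * 10 ** (launch_dbm / 10)
    return watts / (gbps * 1e9) / wallplug_eff * 1e15

print(laser_fj_per_bit( 0.0, 15, 0.10))   # ~667 fJ/b -> "700 fJ/b" case
print(laser_fj_per_bit(-3.0, 15, 0.10))   # ~334 fJ/b -> "350 fJ/b" case
```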


Another important characteristic is area. With optics and electronics each on their own chips, we count the total area cost as the sum of the area used on both chips for both devices and bonding pads; on the electronics chip we permit circuits under the pads. This gives us the area column in Table 2. Assuming 8-way WDM for sharing vertical couplers, dividing 10 Gbps by the total sum of the chip areas (transmitter, plus receiver, plus a ring filter, plus one-eighth of two grating couplers) gives an overall area efficiency of better than 700 Gbps/mm².
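For reference, the arithmetic behind that figure, using the areas in Table 2 (a check that assumes the most recent 200-µm² ring filter and 8-way coupler sharing):

```python
# Area-efficiency arithmetic from Table 2.
tx = 5_500            # transmitter area, um^2, incl. bondpads
rx = 7_900            # receiver area, um^2, incl. bondpads
ring_filter = 200     # most recent ring filter area, um^2
coupler = 675         # grating coupler area, um^2, shared by 8 WDM channels

total_um2 = tx + rx + ring_filter + 2 * coupler / 8   # per-channel total
gbps_per_mm2 = 10 / (total_um2 / 1e6)                 # 10 Gbps per channel

print(f"{total_um2:.0f} um^2 -> {gbps_per_mm2:.0f} Gbps/mm^2")  # ~726
```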

Optics for silicon VLSI systems is not only a viable technology but also a strong enabler for scaling up highly integrated many-chip packages organized around a silicon interposer. Although most projections of interposer-based systems envision a few chips or chip stacks connected with fine-pitch copper wires, we are pathfinding systems that scale to dozens or hundreds of such chips. At this scale, with wafer-sized interposers supporting the equivalent of a server-in-a-package, silicon photonics is a key ingredient. Our continued design, fabrication, and characterization of optical components to the specifications we've outlined will lead to their demonstration in full silicon photonic links running at tens of Gbps. As we scale data rates up to 20 Gbps, we anticipate having to solve many challenges in order to meet our loss targets. However, our results thus far indicate a roadmap to commercial deployment of silicon nanophotonics that is both feasible and sensible.

Acknowledgments
We gratefully acknowledge the support of Oracle's optics and packaging team under Kannan Raj, and the VLSI Research group. This work was supported in part by DARPA under Agreement HR0011-08-090001. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the US Government.

References
1. A. Krishnamoorthy et al., "Progress in Low-Power Switched Optical Interconnects," IEEE J. Selected Topics in Quantum Electronics, vol. 17, no. 2, 2011, pp. 357-376.
2. J. Shin et al., "The Next-Generation 64b SPARC Core in a T4 SoC Processor," Proc. IEEE Int'l Solid-State Circuits Conf. (ISSCC 12), IEEE, 2012, pp. 55-56.
3. J. Cunningham et al., "Scaling Vertical-Cavity Surface-Emitting Laser Reliability for Petascale Systems," Applied Optics, vol. 45, no. 25, 2006, pp. 6342-6348.
4. A. Krishnamoorthy et al., "Computer Systems Based on Silicon Photonic Interconnects," Proc. IEEE, vol. 97, no. 7, 2009, pp. 1337-1361.
5. J. Cunningham et al., "Integration and Packaging of a Macrochip with Silicon Nanophotonic Links," IEEE J. Selected Topics in Quantum Electronics, vol. 17, no. 3, 2011, pp. 546-558.
6. X. Zheng et al., "Ultra-Efficient 10 Gb/s Hybrid Integrated Silicon Photonic Transmitter and Receiver," Optics Express, vol. 19, no. 6, 2011, pp. 5172-5186.
7. H. Thacker et al., "Hybrid Integration of Silicon Nanophotonics with 40nm-CMOS VLSI Drivers and Receivers," Proc. 61st Ann. IEEE Electronic Components and Technology Conf., IEEE, 2011, pp. 829-835.
8. X. Zheng et al., "Ultralow Power 80 Gb/s Arrayed CMOS Silicon Photonic Transceivers for WDM Optical Links," IEEE J. Lightwave Technology, vol. 30, no. 4, 2012, pp. 641-650.
9. A. Krishnamoorthy et al., "Exploiting CMOS Manufacturing to Reduce Tuning Requirements for Resonant Optical Devices," IEEE Photonics J., vol. 3, no. 3, 2011, pp. 567-579.
10. J. Cunningham et al., "Compact, Thermally-Tuned Resonant Ring Muxes in CMOS with Integrated Backside Pyramidal Etch Pit," Proc. Optical Fiber Communication Conf. and Exposition (OFC/NFOEC 11), IEEE, 2011, pp. 1-3.
11. P. Dong et al., "Wavelength-Tunable Silicon Microring Modulator," Optics Express, vol. 18, no. 11, 2010, pp. 10,941-10,946.
12. X. Zheng et al., "Enhanced Optical Bistability from Self-Heating Due to Free Carrier Absorption in Substrate-Removed Silicon Ring Modulators," Optics Express, vol. 20, no. 10, 2012, pp. 11,478-11,486.
13. F. Liu et al., "10 Gbps, 5.3 mW Optical Transmitter and Receiver Circuits in 40 nm CMOS," IEEE J. Solid-State Circuits, vol. 47, no. 9, 2012, pp. 2049-2067.
14. G. Li et al., "Ultralow-Loss High Density SOI Optical Waveguide Routing for Macrochip Interconnects," Optics Express, vol. 20, no. 11, 2012, pp. 12,035-12,039.
15. J. Yao et al., "Grating-Coupler Based Low-Loss Optical Interlayer Coupling," Proc. 8th IEEE Int'l Conf. Group IV Photonics (GFP 11), IEEE, 2011, pp. 383-385.
16. X. Zheng et al., "A Tunable 1 × 4 Silicon CMOS Photonic Wavelength Multiplexer/Demultiplexer for Dense Optical Interconnects," Optics Express, vol. 18, no. 5, 2010, pp. 5151-5160.
17. P. Koka et al., "A Micro-Architectural Analysis of Switched Photonic Multi-Chip Interconnects," Proc. 39th Int'l Symp. Computer Architecture (ISCA 12), IEEE, 2012, pp. 153-164.
18. X. Zheng et al., "Ultra-Low Power Arrayed CMOS Silicon Photonic Transceivers for an 80 Gbps WDM Optical Link," Proc. Optical Fiber Communication Conf. and Exposition (OFC/NFOEC 11), IEEE, 2011, pp. 1-3.
19. J. Cunningham et al., "Highly-Efficient Thermally-Tuned Resonant Optical Filters," Optics Express, vol. 18, no. 18, 2010, pp. 19,055-19,063.
20. I. Shubin et al., "Integration, Processing and Performance of Low Power Thermally Tunable CMOS-SOI WDM Resonators," Optical and Quantum Electronics, vol. 44, nos. 12-13, 2012, pp. 589-604.

Ron Ho is an architect at Oracle Labs. His research interests include off-chip and on-chip communication, memory hierarchies and architectures, and database acceleration hardware. Ho has a PhD in electrical engineering from Stanford University. He is a senior member of IEEE.

Philip Amberg is a senior hardware engineer at Oracle Labs. His research interests include chip-to-chip I/O technologies and low-power database processing for large-scale analytics. Amberg has an MS in electrical engineering from Stanford University.

Eric Chang is a staff researcher at Oracle Labs. His research interests include efficient data communication, particularly high-speed optical link transceivers. Chang has a BS in electrical engineering and computer sciences from the University of California, Berkeley. He is a member of IEEE.

Pranay Koka is a principal engineer at Oracle Labs. His research interests include parallel architectures, system interconnects, workload analysis, and large-scale performance modeling. Koka has an MS in electrical engineering from the University of Wisconsin, Madison.

Jon Lexau is a senior consulting member of the technical staff at Oracle Labs. His research interests include electrical transceivers for optical interconnects and developing DDR memory interfaces for new DRAM architectures. Lexau has an MS in electrical engineering from Stanford University. He is a member of IEEE.

Guoliang Li is a consulting member of the technical staff at Oracle Labs. His research interests include photonic devices and systems for various communication needs, in particular, silicon photonics for optical interconnects in computer systems. Li has a PhD in electrical and computer engineering from the University of California at San Diego.

Frankie Y. Liu is a consulting member of the technical staff at Oracle Labs. His research interests include streaming applications, low-power clock recovery circuits, and optical transceivers. Liu has a PhD in electrical engineering from Stanford University. He is a member of IEEE.

Herb Schwetman is a consulting member of the technical staff with Oracle Labs. His research interests include systems modeling and performance evaluation, parallel processing, and computer architecture. Schwetman has a PhD in computer science from the University of Texas at Austin. He is a member of IEEE.

Ivan Shubin is a principal hardware engineer at Oracle Labs. His research interests include advanced packaging solutions and platforms for electronic, optoelectronic, and MEMS applications, wafer scale packaging, 3D integration, and novel silicon photonic components. Shubin has a PhD in electrical engineering from the University of Central Florida. He is a member of IEEE.

Hiren D. Thacker is a principal engineer at Oracle Labs. His research interests include the design, integration, and packaging of high-performance multichip 2.5D and 3D components deploying advanced electrical and photonic interconnects. Thacker has a PhD in electrical and computer engineering from the Georgia Institute of Technology. He is a member of IEEE.

Xuezhe Zheng is a consulting member of the technical staff at Oracle Labs. His research interests include WDM Si photonics for advanced interchip and intrachip interconnects. Zheng has a PhD in optical instruments from Tsinghua University. He is a senior member of IEEE.

John E. Cunningham is a senior consulting member of the technical staff at Oracle Labs. His research interests include advanced packaging initiatives in interchip I/O and silicon nanophotonic solutions for data communications. Cunningham has a PhD in physics from the University of Illinois at Urbana-Champaign.

Ashok V. Krishnamoorthy is an architect at Oracle Labs. His research is focused on the use of silicon photonic communication links within high-performance engineered computing systems. Krishnamoorthy has a PhD in applied physics and electrical engineering from the University of California, San Diego. He is a fellow of the OSA and the IEEE.

Direct questions and comments about this article to Ron Ho, Oracle Labs, Mailstop 5IP2, 500 Oracle Parkway, Redwood Shores, CA 94065; [email protected].