© Copyright by
John W. Lockwood, 1993
UILU-ENG-93-0401
The iPOINT Testbed for Optoelectronic ATM Networking
John W. Lockwood
The University of Illinois at Urbana-Champaign Department of Electrical and Computer Engineering NSF/ERC Center for Compound Semiconductor Microelectronics
Beckman Institute, VLSI Group 405 North Mathews Ave Urbana, IL 61801
[email protected]
(217) 244-1565 Fax: (217) 244-8371
May, 1993
ABSTRACT

This document presents the Illinois Pulsar-based Optical INTerconnect (iPOINT) testbed, a collaborative research effort between the Microelectronics Center and the Computer Science Department at the University of Illinois investigating the use of Optical Electronic Integrated Circuits (OEICs) for high-bandwidth computer networking. Optoelectronic components are the enabling technology for multigigabit networks. This document discusses the system requirements and the device specifications of the individual optical and electronic components of these networks. The packet switching technology required for computer network and variable bit rate traffic patterns is investigated. Architectures are discussed for multigigabit packet switches. The iPOINT Asynchronous Transfer Mode (ATM) hardware prototype packet switch is presented. The network software and protocols needed to efficiently transport messages between the workstation's internal memory and the high-bandwidth fiber interface are examined. The iPOINT user-space network software and kernel-space device driver that were developed to provide simultaneous audio, image, and file transfers among Sun SPARCstation 10s using ATM cells are introduced. Memory-to-fiber bandwidth benchmark data are presented. Suggestions for future optoelectronic device research and the topics of investigation for the next phase of the iPOINT testbed are given in Chapter 5.
ACKNOWLEDGEMENTS

I would like to thank Professors S. M. Kang and S. G. Bishop for their support and guidance throughout the iPOINT project. I would like to thank Professor R. H. Campbell, who made the Xunet interaction at this university possible. I would like to acknowledge Chao Cheong for his work in programming the user-space application programs, and Haoran Duan for his work on the queue module. I would like to thank Ben Cox, an undergraduate CCSM intern, for his work in the development of the STREAMS device driver and in gathering the performance data of the Fore device. Finally, thanks are due to Ben Sander, an undergraduate, for his work on the FPGA Pulsar switch.
TABLE OF CONTENTS

1 INTRODUCTION
  1.1 The Role of Optics in Networking
    1.1.1 Integrated receivers
    1.1.2 Optical laser sources
    1.1.3 Integrated laser drivers
    1.1.4 Distributed feedback laser arrays
    1.1.5 External cavity semiconductor lasers
    1.1.6 Multichip optical/electronic modules
    1.1.7 Optical fiber
  1.2 Packet Switch Networking
    1.2.1 Nomenclatures
    1.2.2 Figure of merit for a packet switch
    1.2.3 The bus-based switch
    1.2.4 Output port contention
    1.2.5 The full crossbar switch
    1.2.6 The Knockout switch
    1.2.7 The Pulsar switch
    1.2.8 The Starlite switch
  1.3 ATM Networking
    1.3.1 The ATM cell
    1.3.2 Adaptation layers
    1.3.3 Signalling
  1.4 Optical Computer Network Testbeds
    1.4.1 IBM's Rainbow
    1.4.2 Columbia University's TeraNet
    1.4.3 Colorado's WDM multiprocessor interconnect

2 THE iPOINT PROTOTYPE ATM SWITCH
  2.1 The iPOINT Hardware Design Testbed
    2.1.1 Design capture and simulation
    2.1.2 Field programmable gate arrays
    2.1.3 The iPOINT prototype FPGA ATM switch
  2.2 The Physical and Data-link Layers
    2.2.1 The optical fiber
    2.2.2 Semiconductor devices
    2.2.3 Information coding
    2.2.4 Clock recovery
    2.2.5 Serialization/deserialization
    2.2.6 The Taxi interface
  2.3 The Queue Module
  2.4 The Core Switch
    2.4.1 Cell scheduling
    2.4.2 Prototype scheduling
    2.4.3 Scheduling improvements

3 THE iPOINT WORKSTATION SOFTWARE
  3.1 The Hardware Environment
    3.1.1 The Sun SPARCstation 10
    3.1.2 The Fore SBA-100 host adapter
  3.2 User-space Networking Program
    3.2.1 Virtual paths
    3.2.2 Software initialization
    3.2.3 Digital audio
    3.2.4 Signalling
    3.2.5 Example client and server
  3.3 Kernel-space ATM Networking

4 PERFORMANCE BENCHMARKS
  4.1 Workstation Bandwidth
    4.1.1 Overhead limitations
    4.1.2 Latency limitations
  4.2 Benchmark Test Conditions
    4.2.1 User-space performance
    4.2.2 UIUC STREAMS device driver performance
    4.2.3 TCP/IP performance

5 SUMMARY AND FUTURE RESEARCH
  5.1 Summary
  5.2 Future Research
    5.2.1 Microelectronic device research
    5.2.2 Wide-area interoperable ATM networking
    5.2.3 Multigigabit packet switching
    5.2.4 Network software development

REFERENCES
LIST OF FIGURES

1.1 Present and future high-speed networking
1.2 Four-channel receiver array
1.3 Four-wavelength receiver array
1.4 Integrated laser array, photodetector, and transistors
1.5 Optical/electronic multichip module
1.6 The Knockout switch
1.7 The Pulsar switch
1.8 ATM cell structure
1.9 IBM's Rainbow WDM network
1.10 Columbia University's WDM TeraNet
1.11 Eight-node Shufflenet
2.1 Design steps
2.2 The prototype iPOINT switch
2.3 4B/5B symbol coding
2.4 Deserialization circuit
2.5 Queue module and Taxi interface detail
2.6 Control word from queue module
3.1 Host interface software layers
3.2 UIUC user-space ATM networking program
3.3 STREAMS ATM software
4.1 Point-to-point workstation link
4.2 SBA-100 throughput using AAL3/4
4.3 SBA-100 throughput using AAL5
4.4 SBA-100 throughput using UIUC STREAMS module
4.5 Script to generate TCP/IP performance data
4.6 SBA-100 throughput vs. batch size for Fore's TCP/IP driver
5.1 Beckman-DCL Pulsar/XUNET optical link
CHAPTER 1 INTRODUCTION

Optical devices and fiber-optic communications links are the enabling technology for multigigabit networking. Currently, optics are extensively used for long-distance telecommunication services. Using conventional technology, one single-mode fiber pair is currently capable of simultaneously transmitting over 30,000 Time Division Multiplexed (TDM) telephone conversations. Faster optical/electronic devices and Wavelength Division Multiplexing (WDM) techniques can increase the bandwidth of one fiber beyond 1 terabit/sec.

Optical devices can be employed in computer networks to provide high-bandwidth networking resources for desktop workstations. Improved network resources will enhance the performance of conventional network applications such as Network File System (NFS), Remote login (Rlogin), and File Transfer Protocol (FTP). By networking clusters (or farms) of workstations, computations can be distributed to run in parallel. Multimedia applications have also emerged for workstations, providing image, audio, and video services to the desktop [1].

Because of the burstiness of computer traffic, packet switching, rather than simple circuit switching, is better suited for data networking. A workstation's network interface often remains idle between times of peak network usage. Fiber Distributed Data Interface (FDDI) was introduced to provide workstations with an optical, industry-standard, low-cost, packet interface. Asynchronous Transfer Mode (ATM) provides these same benefits, as well as high-bandwidth, low-latency, scalable, local and wide-area network resources.

The goals of the iPOINT project are to explore computer networking systems which can benefit through the use of high-bandwidth optical devices, to develop a testbed where new ideas can be efficiently prototyped, and to demonstrate the feasibility of the above ideas on high-performance desktop workstations [2].
Figure 1.1 Present and future high-speed networking
1.1 The Role of Optics in Networking

Optical devices serve multiple roles in future communication networks, as illustrated in Figure 1.1. For long-haul links, long wavelengths, at or near λ = 1.55 µm, can be transmitted with minimal dispersion and loss over single-mode, dispersion-shifted fiber. Through Wavelength Division Multiplexing (WDM), several signals can be transmitted on different wavelengths within a single fiber. Erbium-doped amplifiers with short-wavelength pump lasers can be used to maximize the distance between repeaters. Optoelectronic Integrated Circuits (OEICs) can provide an interface between the optical and electronic signals. Time-division multiplexors hierarchically combine several lower
bit-rate streams into a single faster stream, which forms the basis for the Synchronous Optical NETwork (SONET). ATM switches allow bursty traffic sources to statistically multiplex several data streams onto a single long-haul transmission link. Multimode fiber and short-wavelength optic components can be used for short-haul links between the packet switch and the individual workstations.

Integrated optoelectronic devices serve an important role in future optical networks. Compound semiconductor devices are currently used as the laser sources, modulators, receivers, amplifiers, pump lasers, and high-speed logic components of the system described above. Integration of these devices can reduce the packaging cost and improve performance.
1.1.1 Integrated receivers

A prime candidate for integration is the optical receiver module. In [3], a GaAs MESFET integrated optical receiver was demonstrated for λ = 0.85 µm with a 1 Gbps Non-Return-to-Zero (NRZ) input. This receiver consisted of approximately 2500 devices, including the MSM photodetector, transimpedance amplifier, decision circuit, bias controller, and clock recovery circuit. A 4 GHz MSM detector integrated with an AlGaAs/InGaAs/GaAs MODFET transimpedance amplifier has been built within the microelectronics center at this university [4]. An array of n such devices could potentially serve in one of two configurations. In one configuration, each receiver could feed one port of an n-port switch, as shown in Figure 1.2. With the addition of a diffraction grating, the integrated receiver array could be used to receive parallel data channels on multiple wavelengths, as shown in Figure 1.3.
1.1.2 Optical laser sources

Laser diodes are also candidates for integration. Typical laser diodes have a p-n junction quantum-well heterostructure. Spontaneous emission occurs along the length of the device, with the cavity's mirrors typically formed by cleaved edges of the crystal. The wavelength is determined by the material properties and quantum-well thickness.
Figure 1.2 Four-channel receiver array

Figure 1.3 Four-wavelength receiver array
Figure 1.4 Integrated laser array, photodetector, and transistors (Wada, et al.)

Variations in the wavelength and intensity occur as a function of the drive current and temperature. A thermoelectric cooler is typically required to maintain stable laser operation. To maintain a constant output power, a monitor photodiode is often incorporated into a feedback loop of a current control circuit.
1.1.3 Integrated laser drivers

By etching multiple stripes, it is possible to build a laser array. Through additional processing techniques, it is possible to integrate optical detectors and drive electronics on the same substrate as the laser array. A four-channel AlGaAs/GaAs single quantum-well laser array operating at λ = 834 nm has been monolithically integrated with photodetectors and MESFET transistors, as shown in Figure 1.4. A microcleaved facet was used for the rear mirror of the laser cavity, allowing the integration of additional devices behind the laser. The photodiodes allowed monitoring and controlling the optical output power of each laser independently. Each laser had a threshold current of approximately 20 mA and was directly modulated. The drive electronics for each circuit consisted of three GaAs MESFET transistors, arranged in a differential pair configuration. Each laser module was separated by 1 mm and had a total length of 2 mm, for a complete geometry of 4 mm × 2 mm. The circuit was demonstrated to operate at 1.5 Gbps from an ECL input signal. The crosstalk between channels was measured to be -28 dB at 500 MHz and -14 dB at 1 GHz [5].
1.1.4 Distributed feedback laser arrays

The Distributed Feedback Laser (DFB) uses an etched periodic grating along the ridge waveguide for distributed feedback. The DFB lasers efficiently produce a narrow linewidth, with the wavelength determined by the period of the grating. A multiwavelength array can be fabricated by varying the grating period for each device. Because of the fine resolution of the grating, e-beam lithography is typically employed to pattern the grating [6]. Because the laser's feedback is due to the grating rather than the facets, the DFB laser does not require microcleaved facets, which have yield problems in production.
1.1.5 External cavity semiconductor lasers

A laser array with individually addressable and wavelength-separated sources can be fabricated using an external cavity. By coating one laser facet with an Anti-Reflective (AR) material, the laser cavity can be formed between the far facet of the device and an external curved mirror. The wavelength separation is achieved by placing an external diffraction grating between the laser array and the mirror. The advantage of this configuration over other external cavity multiwavelength systems is the elimination of the optical crosstalk due to a shared gain medium [7].
1.1.6 Multichip optical/electronic modules

While monolithic integration of optical detectors, lasers, and electronic modules would be ideal, incompatibilities with material systems and processing techniques make this goal difficult to achieve. Enhancement of one device's performance often degrades the performance of another. Thus, tradeoffs are required to optimize the performance of the entire system. Currently, multichip optical/electronic modules offer the ability to combine laser arrays, detector arrays, and control electronics into one compact module.

A module consisting of an integrated four-channel receiver array, an integrated four-channel transmitter array (as described in Section 1.1.3), and a 4 × 4 electronic GaAs switch has been demonstrated to operate at over 560 Mbps per channel [8], as shown in Figure 1.5.
Figure 1.5 Optical/electronic multichip module

Four optical fibers are coupled to the receiver array, which in turn electrically drives the switch module through a capacitive coupling. After permuting the signals, the switch next electrically drives the laser array using another capacitive coupling. Finally, the laser array transmits the optical signals through the output fibers. The switch used in this module consisted of a simple electronic matrix switch. This document investigates integrated packet switch modules which could potentially replace the above module.
1.1.7 Optical fiber

Before concluding this section on optical/electronic devices, it is important to examine the characteristics of the fiber through which optical signals are transmitted. For short-haul communications (less than 1 km), multimode fiber is often favored because of its large diameter and numerical aperture, allowing most of the optical power to couple to the fiber even with relaxed geometric tolerances. Because each mode has a slightly different velocity, modal dispersion limits the maximum bandwidth-distance product. By grading the index of refraction across the fiber, this effect can be minimized. For a graded-index 50 µm core diameter fiber, the modal dispersion ranges from 400 MHz·km to 600 MHz·km at λ = 850 nm and from 400 MHz·km to 1500 MHz·km at λ = 1300 nm [9]. At λ = 834 nm, the fiber attenuation typically ranges between 2 and 3 dB/km, primarily due to Rayleigh scattering.
For long-haul communication links, single-mode fiber is required. The bandwidth-distance-linewidth product is limited by chromatic dispersion. Narrow-linewidth laser sources maximize the bandwidth-distance product. Dispersion-shifted fiber minimizes dispersion at λ = 1.55 µm, where loss is approximately 0.2 dB/km. Erbium-doped fiber amplifiers allow optical boosting of signals with wavelengths in the vicinity of 1.55 µm, without the need for electronic repeaters. Soliton transmission systems allow dispersion-free pulse propagation by balancing fiber nonlinearities with chromatic dispersion. Single-mode, long-wavelength, WDM, soliton transmission systems are currently finding their way into undersea transmission system experiments [10].
1.2 Packet Switch Networking

In computer networks, the data traffic is typically bursty, consisting of short packets of data with various destinations. Unlike circuit switching, which can dedicate fibers or reserve time slots for each channel, packet switching requires queuing, deflecting, or dropping packets when contention for output ports or internal blocking occurs. According to Paul Green, the architect of IBM's Rainbow optical computer network, "circuit-switching is uninteresting; packet-switching is a must" [11].
1.2.1 Nomenclatures

For the packet switches discussed in this document, the following notation will be used. It will be assumed that the switches have an equal number of input and output ports, n. Each port operates at a bit rate of b bits per second (bps). The packet length is defined to be L bits, which for ATM cells, as will be discussed in Section 1.3, is 424 bits. The distribution of the input traffic to the switch as a function of time will be denoted as T(i, t), where i = 1, 2, ..., n-1, n is the traffic at input port i as a function of time. T(t) will be used to denote the traffic across all input ports. The latency between sending and receiving a message will be denoted by τ(T(t)), which is the sum of the propagation time, τ_p, and the queuing time, τ_q(T(t)). The probability of dropping a cell will be denoted by p_d(T(t)).
1.2.2 Figure of merit for a packet switch

The figure of merit for a packet switch can be partially characterized as

    FOM = f(n, b, L, T(t), τ_q(T(t)), p_d(T(t)))

For central-office packet switching, a scalable and modular switch supporting n > 100 ports is desirable, while for local area networking and multistage interconnection networks, switches with at least three ports can be quite useful. The aggregate bandwidth of the switch, n × b, is useful for high-bandwidth applications, such as high-definition television and visualization applications. The number of packets per second, b/L, is a critical parameter for computing applications, as numerous short messages are required to synchronize computation. Currently, HIPPI switches easily achieve multigigabit aggregate bandwidths, but compare rather poorly in terms of their ability to switch short packets.

A minimum queuing latency, τ_q, is important, especially for distributed memory applications, where the latency should not greatly exceed the memory access time. Real-time multimedia applications can tolerate a slightly relaxed value for τ_q, but demand a strict upper bound for the worst-case delay. The dropping of a packet, with probability p_d, can severely degrade the performance of an application. Typically, the loss of a single packet requires a higher-layer network protocol to retransmit an entire message. The design of the Knockout switch, as will be described in Section 1.2.6, chooses p_d to approximate the probability of dropping a packet due to a transmission error.

The performance of a switch also depends on the distribution of the traffic, T(t). While bus-based switches provide good performance even for worst-case traffic patterns, the performance of many multistage switches severely degrades for certain correlated input traffic patterns. Finally, it should be noted that an interconnection network that reorders cells is unacceptable for ATM networking because cells do not, in general, carry a sequence number.
1.2.3 The bus-based switch

Bus-based packet switches use time-division multiplexing of a shared medium to provide full connectivity between n switch ports. The Xunet II switch, for example, uses a 32-bit backplane bus operating at 18.5 MHz to switch packets of data from input to output ports [12]. Upon arriving at an input port, a packet contends for a time slot on the shared bus. A prioritized round-robin bus-contention scheme can ensure that a high-priority message is not delayed by low-priority messages and provide fair usage of the bus for messages of equal priority. Once the appropriate time slot has been determined, the message is broadcast to all output ports on the shared bus. The destination port, upon detecting its own address, receives the packet from the bus.

Because all messages are broadcast on the shared bus, the aggregate throughput of the switch is determined by the throughput of the backplane bus. To build a nonblocking, n = 32-port, b = 1 Gbps/port switch would require a backplane bus capable of providing a throughput of n × b = 32 Gbps. Using off-the-shelf, single-ended, bus-interface logic operating at 20 MHz, the backplane would require 32 × 10^9 / 20 × 10^6 = 1,600 wires.
1.2.4 Output port contention

In the worst case, if all n input ports have packets destined for the same output port, a nonblocking switch would deliver packets to the output port at a rate of n × b bps. A high-performance queue which can accept data at the full bus rate has been implemented for the Xunet II switch. The 32-bit wide data from the bus are first shifted into a cell-wide holding register. A large pool of Dynamic Random-Access Memory (DRAM) is used to queue the ATM cells. The memory operates on words of a width equal to one half the size of an ATM cell. Linked lists are used to implement queues. A prioritized round-robin service discipline is used to determine the order in which cells are transmitted from the line card [13].

A nonblocking output-buffered shared-memory queue operating at the full bus rate of n × b bits/sec can provide optimal performance. To achieve the theoretical maximum throughput, the width of the memory would be equal to the entire width of one cell,
allowing a cell to be transferred to and from the memory using only a total of two memory operations. Double buffering allows overlapping of bus transactions with memory operations. Neglecting all delays except memory access time, and assuming that the memory has an access time of t_a, the maximum throughput is limited to b = L/(2 × t_a). For an ATM switch using a fast, high-capacity, commercially available DRAM memory with an access time of 60 ns, the maximum throughput is limited to 424 bits/(2 × 60 ns) = 3.53 Gbps. From this analysis, it is evident that the current Xunet queue, as described above, provides near-optimal performance.
1.2.5 The full crossbar switch

A full crossbar switch uses an n × n array with switched crosspoints to connect any input to any output. A crossbar can be implemented electrically or optically by providing a transmission gate between each input and output pair. The amount of space required for the mesh of transmission gates is O(n^2). The crossbar switch is internally nonblocking. Unlike the bus-based switch, the data rate on any internal link of the crossbar is no more than the incoming line rate of b bps. A mechanism is required to resolve output-port contention.
Figure 1.6 The Knockout switch

1.2.6 The Knockout switch

The Knockout switch, as shown in Figure 1.6, uses packet filters, packet concentrators, and a small output buffer to resolve output-port contention. The packet filters
pass only those packets whose destination address matches the output port address. The concentrator uses a tournament-like algorithm to funnel as many as K packets from the N inputs to the output buffer. Each output buffer, much like the one described in Section 1.2.4, schedules packets for transmission. If more than K packets contend for the same output port at any given time, the extra packets will be dropped.

The value of K is typically chosen such that the probability of dropping a packet due to output contention is approximately equal to the probability of dropping a packet due to transmission line errors. If bit errors occur during the transmission of a packet, they are usually detected as a Cyclic Redundancy Check (CRC) error, causing the entire message to be discarded. For a message of length L with probability p_e for each bit error, the probability of a message with one or more errors is 1 - (1 - p_e)^L ≈ L × p_e. Thus, for a total bit-error rate of p_e = 10^-9, a message that is exactly one ATM cell long may be dropped with a probability of p_d = 4.24 × 10^-7. For random traffic, it was determined that a K = 8 Knockout switch with an output buffer capacity of 40 packets, for an offered load of 84%, would achieve p_d < 10^-6 [14]. For correlated traffic, larger values of K are necessary.
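Expanding the approximation explicitly for a single cell, with L = 424 bits and p_e = 10^-9:

    \[
    1-(1-p_e)^{L} \;\approx\; L\,p_e \;=\; 424 \times 10^{-9} \;=\; 4.24 \times 10^{-7}
    \]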
1.2.7 The Pulsar switch

The Pulsar switch uses a fast word-parallel rotating shift-register ring to provide nonblocking packet switching [15]. Figure 1.7 shows a complete shift-register ring with a block diagram for one of the n ring slices. Input data are first deserialized into a word of size w bits/word. The input to each shift register may come from either the previous stage or from the holding register. Each slice of the rotating ring is reserved for one I/O port. When the outgoing data in the holding register see the proper phase of the ring and the token indicates the slot is available, the data are loaded from the holding register. To handle packet priorities, a priority register is also rotated in the ring. When the ring reaches its home position, the data are latched from the ring register to the transmit register. The transmit register then serializes and transmits the data.
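One plausible reading of this slotted-ring operation can be captured in a short, word-level software model. The sketch below is illustrative only: it assumes one ring slot per output port with a token bit marking availability, and it omits priorities, serialization, and all electrical detail.

    #include <stdio.h>

    #define N 4                     /* number of switch ports          */

    struct slot {                   /* one slice of the rotating ring  */
        int      token_free;        /* 1 when the slot carries no word */
        unsigned word;              /* w-bit data word in transit      */
    };

    struct port {
        int      hold_valid;        /* holding register occupied?      */
        int      hold_dest;         /* destination port of held word   */
        unsigned hold_word;
    };

    /* One word-clock tick of the ring at rotation 'phase'.  Slot
     * (i + phase) mod N is the slot currently passing port i. */
    static void ring_cycle(struct slot ring[N], struct port p[N], int phase)
    {
        for (int i = 0; i < N; i++) {
            int s = (i + phase) % N;

            /* Home position: latch the slot's word to port i's TX. */
            if (s == i && !ring[s].token_free) {
                printf("port %d transmits word %#x\n", i, ring[s].word);
                ring[s].token_free = 1;
            }

            /* Load from the holding register when the passing slot
             * belongs to the destination and its token is free. */
            if (p[i].hold_valid && s == p[i].hold_dest && ring[s].token_free) {
                ring[s].word       = p[i].hold_word;
                ring[s].token_free = 0;
                p[i].hold_valid    = 0;
            }
        }
    }

A full rotation corresponds to calling ring_cycle for phase = 0 through N-1. Two inputs holding cells for the same output are serialized naturally, because only the first port the slot passes finds the token free.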
Figure 1.7 The Pulsar switch

1.2.8 The Starlite switch

The Starlite switch uses a multistage Banyan switch to transfer packets from the n inputs to the n outputs using less than O(n^2) interconnection points. Unlike the crossbar switch, the Banyan switch can suffer from internal blocking for certain input patterns. By introducing a Batcher sorting stage before the Banyan switch, it is possible to prevent internal blocking. A copy network can also be included in the switch to support multicast and broadcast operations. A trap stage is used to recirculate packets that contend for the same output port. The size of the switch is O(n (log n)^2) [16].
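Banyan networks are self-routing: at each stage, a 2 × 2 switching element forwards a packet up or down according to one bit of the destination address, so no routing tables are consulted. The sketch below, an illustration rather than a model of Starlite itself, traces the port positions a packet visits through an omega-style 8-port (three-stage) banyan.

    #include <stdio.h>

    #define STAGES 3               /* log2(ports) for an 8-port network */
    #define PORTS  (1 << STAGES)

    /* Each stage performs a perfect shuffle and then an exchange
     * controlled by one destination-address bit, taken MSB first.
     * After STAGES steps the packet sits on port 'dest'. */
    static void trace(unsigned src, unsigned dest)
    {
        unsigned pos = src;
        printf("stage 0: port %u\n", pos);
        for (int k = STAGES - 1; k >= 0; k--) {
            unsigned bit = (dest >> k) & 1u;          /* routing bit */
            pos = ((pos << 1) & (PORTS - 1u)) | bit;  /* shuffle+set */
            printf("stage %d: port %u\n", STAGES - k, pos);
        }
    }

    int main(void)
    {
        trace(2, 5);   /* route a packet from input 2 to output 5 */
        return 0;
    }

Internal blocking corresponds to two packets requiring the same intermediate position at the same stage; sorting the packets by destination in a preceding Batcher stage removes such collisions.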
1.3 ATM Networking

The Asynchronous Transfer Mode (ATM) format has evolved to be a standard for fixed-length packet switching [17]. Within the specifications are the length and format of the ATM cell, adaptation layer functions, and (in the near future) signalling. Cells from multiple sources and multiple destinations are asynchronously multiplexed between multiple packet switches.
Figure 1.8 ATM cell structure. Fields: GFC, Generic Flow Control (4 bits, UNI only); VPI, Virtual Path Identifier (8/12 bits); VCI, Virtual Circuit Identifier (16 bits); PT, Payload Type (3 bits); CLP, Cell Loss Priority (1 bit); HEC, Header Error Check (8 bits); Payload (48 bytes); total cell length 53 bytes.

Every circuit on each link of the network is identified by unique integer fields called the Virtual Path Identifier (VPI) and Virtual Circuit Identifier (VCI). ATM switches are responsible for switching cells between ports, buffering cells, translating VPI/VCIs, guaranteeing Quality of Service (QOS), connection set-up, and connection tear-down.
1.3.1 The ATM cell

The structure of the ATM cell is depicted in Figure 1.8. Each cell is 53 bytes long, with 5 bytes reserved for the packet header and 48 bytes reserved for the payload. The GFC field exists only for the User-Network Interface (UNI), and can be used for primitive cell flow control at the endpoints of the network. The VPI field identifies multiple circuits destined for the same endpoint, greatly reducing the number of entries in the translation table of each intermediate switch and minimizing the call-setup delay. The length of the VPI field is eight bits for UNI and twelve bits for a Network-Network Interface (NNI). The combination of the VPI and VCI fields uniquely identifies each of the possible 2^(12+16) = 268 million channels which may be asynchronously transmitted across a shared link. The PT field can be used by higher network layers to determine if a cell is the last one in a message. The CLP field can be interpreted by the switch when it is necessary to drop cells due to congestion. The HEC uses a CRC to ensure that an error has not corrupted the header. If the header is corrupted, the cell is immediately dropped. The user's actual data are transmitted in the payload.
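Because the header fields are bit-packed, a receiver typically extracts them with shifts and masks. The sketch below unpacks the five header bytes of a UNI cell according to the widths given above; the struct and function names are illustrative, not part of any particular API.

    #include <stdint.h>

    /* Decoded fields of a 5-byte ATM UNI cell header. */
    struct atm_header {
        uint8_t  gfc;   /* Generic Flow Control, 4 bits        */
        uint16_t vpi;   /* Virtual Path Identifier, 8 bits UNI */
        uint32_t vci;   /* Virtual Circuit Identifier, 16 bits */
        uint8_t  pt;    /* Payload Type, 3 bits                */
        uint8_t  clp;   /* Cell Loss Priority, 1 bit           */
        uint8_t  hec;   /* Header Error Check, 8 bits          */
    };

    /* Unpack the header from the first 5 bytes of a 53-byte cell. */
    static struct atm_header atm_parse_uni(const uint8_t h[5])
    {
        struct atm_header a;
        a.gfc = h[0] >> 4;
        a.vpi = (uint16_t)(((h[0] & 0x0F) << 4) | (h[1] >> 4));
        a.vci = ((uint32_t)(h[1] & 0x0F) << 12) |
                ((uint32_t)h[2] << 4) | (h[3] >> 4);
        a.pt  = (h[3] >> 1) & 0x07;
        a.clp = h[3] & 0x01;
        a.hec = h[4];
        return a;
    }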
1.3.2 Adaptation layers

The ATM Adaptation Layer (AAL) builds upon the underlying ATM network to provide message services required by higher network layers. Multiple adaptation layers are available, and can be selected depending on the type of service required by the application. Real-time services require a strict bound on τ_q, but may allow cells to be dropped. Reliable frame transfers, which are needed for file transfer services and signalling messages, use a CRC on the entire contents of the frame, message length, and sequence number to guarantee that large frames are delivered correctly, even in the presence of lost cells or bit errors. If any error is detected, the entire frame is discarded and higher network layers are responsible for its eventual retransmission.
1.3.3 Signalling

Unlike Internet-Protocol (IP) messages, typical ATM messages do not carry the source and destination addresses. With the exception of Permanent Virtual Circuits (PVCs), a connection must be established between endpoints before the switches can forward the cells. To set up and tear down a connection, a signalling mechanism is required that can resolve a host address, create a translation-table entry in intermediate switches, track cell usage, and (optionally) reserve sufficient network bandwidth. Currently, ATM signalling is proprietary to each vendor. For interoperable ATM networking, it is necessary to agree upon a common signalling protocol. Such a standard is slowly evolving.
1.4 Optical Computer Network Testbeds

Research has been conducted to investigate techniques using pure optical- and mixed optical-electronic devices for computer networking. This section describes a few prototypes that have been built.
Figure 1.9 IBM's Rainbow WDM network

1.4.1 IBM's Rainbow

IBM's Rainbow project employed wavelength-division multiplexing to form a local area network using a passive optical star coupler. The eventual goal of the project, Rainbow III, was targeted to provide 1000 nodes at 1 Gbit/sec/node, for an aggregate bandwidth of 1 terabit per second. Each node in the network transmitted to a centralized optical star coupler on a separate wavelength around λ = 1.55 µm, but separated by a fraction of a nanometer, as shown in Figure 1.9. In Rainbow I, the coupler merged the signals and divided the combined optical signal among the receivers. Each receiver used a tuned filter to select the source wavelength.

The Rainbow I prototype employed a centralized 32 × 32 optical star coupler and a MicroChannel-controlled interface at each node. The prototype host interface was used to stabilize the source laser and control the tunable detector. An external FDDI adapter was used to transfer data between the network and the host's memory. Each source used a distributed feedback laser diode to transmit data at a few hundred megabits per second on a fixed, stabilized wavelength. Each receiver was equipped with a fiber Fabry-Perot piezo-tuned filter to select the incoming wavelength [18].

To send a message from a source to a destination, a protocol is required to inform the destination of the wavelength on which it should receive. In the Rainbow I demonstration, the source repeatedly broadcast a CALL REQUEST message with the destination's address. On the destination side, when the node was not actively receiving a message, it scanned all wavelengths until finding a transmitter broadcasting a CALL REQUEST
with its own address. Upon detecting that a message was ready, it tuned to the source wavelength and issued a CONNECT CONFIRM using the receiver's wavelength. After the connection was established, the actual message was sent. A DISCONNECT message was sent at the end of the message, allowing the wavelength scan to resume [11].

To implement packet switching in a pure WDM network, it is necessary to efficiently tune to a source, receive a message, and be prepared to receive the next message from another source. Because of the mechanical inertia of the fiber Fabry-Perot piezo-tuned filter, however, wavelength tuning in Rainbow I required at least 10 msec. While the receiver is tuning, data cannot be received. For circuit switching, when the time to transmit a message is typically much longer than the time to establish a connection, the lost bandwidth and the latency between connections are not critical. The added requirement of efficiently switching data between multiple destinations makes packet switching inherently more challenging than the circuit switching that was readily supported on Rainbow I [11].
1.4.2 Columbia University's TeraNet

In many ways, the TeraNet project at Columbia University resembles the Rainbow project discussed above. Both networks employ WDM technology and a passive optical star coupler. Both networks transmit on fixed wavelengths and use tunable fiber-optic Fabry-Perot filters for wavelength selection. Rather than placing a host at the network endpoint, however, TeraNet places an electronic packet switch called the Network Interface Unit (NIU) at the interface to the passive optics, as shown in Figure 1.10. Each NIU has two network interfaces and one host interface. The NIU can transmit a message to the passive optical star by broadcasting the message on either of the outgoing wavelengths, receive data on either of the incoming wavelengths, and communicate with the host via a bidirectional link. Each NIU in TeraNet consists of a 3 × 3 electronic packet switch operating at b = 1 Gbps/port. The internal architecture of the NIU resembles the Knockout switch, using three internal, 40-bit, 25 MHz broadcast buses.
Figure 1.10 Columbia University's WDM TeraNet

The NIUs in TeraNet are logically arranged as a Shufflenet. An eight-node perfect shuffle is illustrated in Figure 1.11 [19]. Note that the graph is wrapped around like a cylinder, and that the nodes in the second column connect to the nodes in the first column. Packets generally make multiple hops between NIUs before reaching their final destination. For host 0 to send a message to host 2, it first transmits to host 5, which in turn forwards the message to node 2. Unlike Rainbow, where wavelengths were tuned for each message, TeraNet only tunes wavelengths on an occasional basis, to permute nodes in the Shufflenet to optimize performance. If the traffic pattern were to show heavy usage between host 0 and host 2, the network control software could decide to adjust the wavelengths to provide a single-hop connection between the NIUs attached to host 0 and host 2.
1.4.3 Colorado's WDM multiprocessor interconnect

A multiwavelength-encoded, self-routing, photonic, multiprocessor interconnect has been prototyped at the University of Colorado at Boulder. This network uses the same Shufflenet topology described above, except that the Network Interface Units (NIUs) in this project have no electronic buffering capability.
Figure 1.11 Eight-node Shufflenet

Packets are transmitted optically through the network. The packet header is transmitted at a different wavelength than the packet payload. The packet header is regenerated at each intermediate node in the network, while the payload remains unchanged. For both the header and payload, multiple wavelengths are used to encode multiple bits of the message. In an early prototype, the bits of the header were transmitted at λ = 831.6 nm, 828.0 nm, and 824.4 nm, while a single payload bit was transmitted at λ = 1300 nm [20].

When a packet arrives at a node, it is first split into the header and payload components. The header is illuminated on a diffraction grating to spatially separate the wavelengths. The individual optical beams then illuminate an array of optical detectors. The electrical signals from the detectors then address an EPROM, which then drives LiNbO3 switches to route the optical data signals. The header is regenerated using a series of laser sources centered around λ = 830 nm, then recombined with the data.

To resolve contention for the same output port, "hot-potato" routing is employed. If two packets request the same output port, one packet is permitted to use the correct outgoing port, while the other is deflected. With deflection, buffers are not required, and thus the switch lends itself well to an all-optical implementation. Because packets can be
deflected, however, higher-level protocols are left with the burden of packet reordering. As described in Section 1.2.2, reordering is not appropriate for ATM networking.
CHAPTER 2 THE iPOINT PROTOTYPE ATM SWITCH

To explore the systems and device requirements of high-bandwidth optoelectronic packet switch networking, a hardware testbed has been established. This chapter first discusses the hardware environment for the iPOINT testbed and the reprogrammable technology used to implement the prototype switch. The next section describes the optical and electronic devices used to implement the prototype switch. The final section provides details of the prototype four-port, 400 Mbps aggregate-bandwidth ATM packet switch that has been designed within the iPOINT testbed.
2.1 The iPOINT Hardware Design Testbed

The main goal of the iPOINT testbed is to provide a hardware design environment, using optical and electronic devices, for experimental research in high-bandwidth digital switch architectures and interconnects. Through the extensive use of Computer Aided Design (CAD) tools and Field Programmable Gate Arrays (FPGAs), an efficient environment has been created for the design, simulation, and implementation of prototype networking hardware.
2.1.1 Design capture and simulation

In the iPOINT project, design capture, functional simulation, and timing simulation are performed using Mentor Graphics (v.8) running on a Sun SPARCstation 10. The Mentor Graphics environment was chosen for a number of reasons. First, it is an industry-standard computer aided design tool. Second, it is readily available through a university site license, complete with documentation and software updates. Third, it is a familiar environment for the undergraduate students who have contributed to the design of the Pulsar switch.
Design entry was done using Mentor Graphics' Design Architect utility. Components from the generic library were often used, as this provides the most flexibility when porting designs to other technologies. Components specifically required for the Pulsar switch were built up from the components in this library. Quicksim was employed for functional and timing simulation after the hardware generation and design modifications.
2.1.2 Field programmable gate arrays

The design of complex hardware systems involves multiple iterations and design modifications. To provide an experimental hardware development environment, a prototype Pulsar switch has been designed using a Field Programmable Gate Array (FPGA). For the iPOINT testbed, a Xilinx XC4000-series gate array, socketed on a prototype demonstration board, was employed. The initial design process for FPGAs is remarkably similar to the design process for the CMOS or GaAs gate array. In fact, the same design environment is often used for all three technologies. In the same manner in which a computer program written in C can be compiled across a broad range of hardware platforms, a gate-level hardware design can be implemented across a broad range of underlying technologies.

The Xilinx FPGA is capable of implementing arbitrary Boolean logic functions, logic stages, and interconnection routing. Current FPGAs provide equivalent Large Scale Integration (LSI) functionality through the use of pass transistors, generic interconnection meshes, Configurable Logic Blocks (CLBs), and Input/Output Blocks (IOBs). In a similar manner in which a gate array achieves its custom functionality by final stages of metallization, a field programmable gate array achieves its custom functionality by downloading a serial bit stream that enables pass transistors and configures the logic blocks.

To program the FPGA device, men2xnf8 was used to create a Xilinx-compatible netlist from the native Mentor Graphics netlist. The real work in the hardware synthesis was done by running xmake, which in turn runs a number of Xilinx utilities. First, xnfmap was run to map the logic functions into Configurable Logic Blocks (CLBs) and Input/Output
Blocks (IOBs). Next, map2lca was run, which prepares the logic elements for placement in the FPGA. Then, ppr was run to partition, place, and route the circuit in the FPGA. Once the cell placement and routing were finalized, the makebits program generated a downloadable bit stream. Finally, the bit stream was downloaded to the FPGA using the workstation's RS-232 serial port. The design steps required to transform the Pulsar switch schematic design to an operational FPGA are illustrated in Figure 2.1.

Figure 2.1 Design steps
2.1.3 The iPOINT prototype FPGA ATM switch

Using the computer aided design tools described above, a prototype ATM packet switch has been designed for the iPOINT testbed. This prototype switch has four ports, each with a 100 Mbps bidirectional line interface. A single Xilinx FPGA device implements the core of the switch. Each port of the switch interfaces to a queue module consisting of a First-In-First-Out (FIFO) memory buffer and supporting logic. Each queue module, in turn, connects to an optical interface which is responsible for the data-link interface, and for optically transmitting and receiving the data to a Fore SBA-100 ATM host interface on a Sun SPARCstation. A block diagram of this switch is illustrated in Figure 2.2.
Figure 2.2 The prototype iPOINT switch

The following sections detail the operation of this prototype switch, beginning with the incoming optical signal.
2.2 The Physical and Data-link Layers

The following sections discuss the physical components and information coding techniques used in the current testbed to provide point-to-point digital data transmission.
2.2.1 The optical fiber

The optical fiber used for this prototype has a 62.5 µm graded-index core diameter and an outer diameter of 125 µm. The fiber is terminated with standard ST-type connectors. The center wavelength of the optical signal is at λ = 1.3 µm. The optic components used for this prototype are compatible with the PHY (physical sublayer) specifications
of FDDI (Fiber Distributed Data Interface), an ANSI (American National Standards Institute) standard for 100 Mbps local area networking.

Figure 2.3 4B/5B symbol coding

    Data codes                 Control codes       Violations
    Data  Code   Data  Code    Command  Code       Code
    0000  11110  1000  10010   Quiet    00000      00001
    0001  01001  1001  10011   Idle     11111      00010
    0010  10100  1010  10110   Halt     00100      00011
    0011  10101  1011  10111   Start    11000      00110
    0100  01010  1100  11010   "        10001      01000
    0101  01011  1101  11011   "        00101      01100
    0110  01110  1110  11100   End      01101      10000
    0111  01111  1111  11101   Reset    00111
                               Set      11001
2.2.2 Semiconductor devices

The transmitter is packaged as a single module, consisting of an LED and a drive amplifier. Signals enter the module electronically, as a differential signal. Light exits the module from an ST-type connector. The receiver is also packaged as a single module, consisting of the PIN detector, amplifier, and differential drive circuitry.
2.2.3 Information coding

Data are transmitted at a rate of 125 MHz using an NRZI (Non-Return to Zero with Inversion) encoding. With NRZ encoding, the duration of each bit is 1/f = 1/(125 MHz) = 8 ns. With inversion, the signal is toggled (from off to on, or vice versa) only when a logical "1" is to be transmitted. The 4B/5B encoding translates four data bits to a five-bit symbol, as shown in Figure 2.3 [21]. Note that for the 16 possible permutations of the data, there are 32 symbols. By carefully choosing the symbols, it is possible to guarantee at least one signal transition within any three-bit period. Some of the remaining 16 symbols are reserved for control signals, while the other symbols are detected as coding violations.
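The encoding side of this scheme reduces to a 16-entry table lookup. The sketch below simply transcribes the data codes of Figure 2.3; the function name is illustrative.

    #include <stdint.h>

    /* 4B/5B data codes from Figure 2.3, indexed by the 4-bit nibble. */
    static const uint8_t enc4b5b[16] = {
        0x1E, 0x09, 0x14, 0x15,   /* 0000..0011 -> 11110 01001 10100 10101 */
        0x0A, 0x0B, 0x0E, 0x0F,   /* 0100..0111 -> 01010 01011 01110 01111 */
        0x12, 0x13, 0x16, 0x17,   /* 1000..1011 -> 10010 10011 10110 10111 */
        0x1A, 0x1B, 0x1C, 0x1D    /* 1100..1111 -> 11010 11011 11100 11101 */
    };

    /* Encode one byte into two 5-bit symbols (high nibble first),
     * returning a 10-bit value ready for NRZI serialization. */
    static uint16_t encode_byte(uint8_t b)
    {
        return (uint16_t)((uint16_t)enc4b5b[b >> 4] << 5) | enc4b5b[b & 0x0F];
    }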
2.2.4 Clock recovery

A serial stream of data has meaning only if both ends of the connection use the same time base. For a long-haul communication link, it is not feasible to transmit both a data signal and a clock signal. The clock signal, which is purely harmonic, can be recovered from the data provided that the data make transitions. The 4B/5B encoding, as discussed above, ensures these transitions.

Clock recovery circuits generally use a voltage-controlled oscillator. The control voltage can be generated using feedback from the first derivative of the data and the first derivative of the clock. If the clock signal leads the data signal, the control voltage can be decreased to slow down the oscillator. If the clock signal trails the data signal, the control voltage can be increased to speed up the oscillator. "Jitter" refers to changes in frequency and phase. "Lock-up time" refers to the time required for the Phase Locked Loop (PLL) to become synchronized with the data. If a clock recovery circuit responds slowly, the lock-up time will be long, since it will take a while for the clock frequency to adjust to the data frequency. If a clock recovery circuit responds too quickly, however, jitter will cause a loss of synchronization. A good clock recovery circuit must trade the jitter off against the lock-up time.
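The lead/lag adjustment described above can be illustrated with a toy bang-bang (early-late) control loop. This is a sketch of the feedback principle only, not a model of any particular circuit, and the normalized-phase representation is an assumption.

    /* Toy bang-bang phase detector: one vote per data transition.
     * 'edge_phase' is where the edge landed within the recovered bit
     * period, normalized to [0,1); edges land at 0.5 when the
     * recovered clock is centered on the data. */
    struct cdr {
        double freq;   /* recovered clock frequency (arbitrary units)  */
        double gain;   /* loop gain: small = slow lock but low jitter  */
    };

    static void cdr_on_edge(struct cdr *c, double edge_phase)
    {
        if (edge_phase > 0.5)
            c->freq -= c->gain;  /* edge came late: clock leads, slow down */
        else
            c->freq += c->gain;  /* edge came early: clock trails, speed up */
    }

The 'gain' term embodies the tradeoff in the text: a large gain shortens lock-up time but lets jitter perturb the recovered clock, while a small gain does the opposite.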
2.2.5 Serialization/deserialization

Computers operate on words, not bits. One method of converting serial data to parallel data is to use a shift register, as shown in Figure 2.4. For an n-bit shift register, input data, d_in, enter at the serial bit rate of b bps. As soon as the first bit has shifted completely across the register, all n bits are loaded off the register in parallel. As was described in Section 2.2.3, groups of bits are typically transmitted as codewords. The detection of coding violations allows regenerating word alignment. Another method of deserialization uses a sample-and-hold technique, where n elements sample d_in with a duration of a single-bit cycle. Serialization and deserialization are done using the AMD Taxi chip sets [22].
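In software terms, the shift-register method amounts to the following sketch, which assumes word alignment has already been established from the coding violations mentioned above.

    #include <stdint.h>

    /* Shift-register deserializer: clock in one bit per call and return
     * 1 when n = 8 bits have accumulated, leaving the word in *out. */
    static int deserialize_bit(int bit, uint8_t *out)
    {
        static uint8_t shift;   /* the shift register */
        static int     count;   /* bits accumulated   */

        shift = (uint8_t)((shift << 1) | (bit & 1));
        if (++count == 8) {     /* parallel load: full word ready */
            *out  = shift;
            count = 0;
            return 1;
        }
        return 0;
    }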
Figure 2.4 Deserialization circuit

2.2.6 The Taxi interface

The AMD Taxi prototype board (DC-CAB AmTAXICRC/F) is currently used in the iPOINT testbed [23]. One half of this board consists of the transmitter module, oscillator, and Taxi AM-7968, which performs serialization and encoding. The data are transmitted after latching an 8-bit byte to the transmit module. The other half of the board consists of the receiver module and Taxi AM-7969, which performs clock recovery, decoding, and deserialization. When a valid byte has been received, the Taxi board latches the 8 bits of parallel data to the queue module.
2.3 The Queue Module

The queue module is responsible for receiving bytes from the line interface, decoding the virtual circuit, determining the proper outgoing destination port, and then notifying the switch that a packet has arrived. In the iPOINT testbed, an input queue is used to buffer the packets in the event of contention for an output port. In the current implementation, it is not possible for a noncontending cell to bypass a contending cell at the head of the queue.

In the current prototype, bytes are strobed from the line interface and then immediately stored in the FIFO memory buffer. The FIFO consists of a byte-wide, 2-kbyte-deep packaged device.
Figure 2.5 Queue module and Taxi interface detail
Figure 2.6 Control word from queue module (fields: presence flag, destination port, unused)

At the output of the FIFO, the Virtual Path Identifier (VPI) and Virtual Circuit Identifier (VCI) of the cell are examined and translated to an outgoing destination port. In the present version of the queue module, which does not yet have a translation table, the VPI field is directly mapped to an outgoing destination port. A block diagram of the queue module appears in Figure 2.5.

The queue module generates a control word that is periodically strobed by the Pulsar switch. This 8-bit field contains a 1-bit Boolean flag that indicates whether or not the queue module has a cell ready to be transferred to the switch, and a 2-bit binary-encoded destination port address, as shown in Figure 2.6. To support prioritized and real-time data, an extra priority field can be added. To support multicast, the binary-encoded destination address can be replaced by a destination bit vector.
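Packing and unpacking the control word takes a few shifts and masks. The exact bit positions within the 8-bit word are not specified here, so the layout below (presence flag in bit 0, destination port in bits 2:1) is an assumption made for illustration.

    #include <stdint.h>

    /* Assumed layout of the 8-bit control word: bit 0 = cell-present
     * flag, bits 2:1 = destination port, remaining bits unused. */
    #define CW_PRESENT(cw)   ((cw) & 0x01u)
    #define CW_DEST(cw)      (((cw) >> 1) & 0x03u)

    static uint8_t cw_pack(int present, unsigned dest_port)
    {
        return (uint8_t)((present ? 1u : 0u) | ((dest_port & 0x03u) << 1));
    }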
2.4 The Core Switch

The core Pulsar switch is implemented in a single-chip Field Programmable Gate Array. The Pulsar switch is clocked at a rate equal to the line rate of b bps divided by the word size of w bits. In the current implementation, the clock frequency of the Pulsar switch is f = b/w = 100 Mbps / 8 bits = 12.5 MHz.
The core switch is responsible for moving the 53-byte ATM cells to the destination port and for resolving contention when multiple cells request to be transmitted to the same destination port. If two or more cells request the same output port, only one cell will be switched, while the others must remain in their respective input queue. Because the contention-resolution circuit and input buffers have prevented output-port contention, the output ports need not be clocked any faster than the input ports. With the entire core of the Pulsar switch implemented in a single chip, every flip-flop can be clocked at a uniform frequency equal to b/w.
2.4.1 Cell scheduling

With an optimum algorithm for switching cells to their respective destination ports, the performance of an input-queued switch can rival that of the output-queued or bus-based counterpart. Finding an optimal schedule, however, has been determined to be an NP-complete problem [24]. One can, nevertheless, find a good heuristic for scheduling time slots on the switch. In one extreme, an algorithm examines only the cells at the front of each queue, which causes the so-called "head-of-line blocking." It has been shown that for random traffic, such a switch can provide a 58% throughput [25].
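The 58% figure is the classical saturation throughput of FIFO input queuing under uniform random traffic, which for a large number of ports tends to

    \[
    \rho_{\max} \;=\; 2 - \sqrt{2} \;\approx\; 0.586 ,
    \]

so roughly 41% of the slot capacity is lost to head-of-line blocking.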
2.4.2 Prototype scheduling
The current implementation of the Pulsar switch examines only those cells at the head of the queues. If there is no contention, all cells are immediately switched to their destination ports, achieving optimal throughput. When there is contention for the same output port, the cell to be transmitted is deterministically chosen by the conflict-resolution circuitry in the switch. In the current FPGA prototype, the core switch performs three steps. First, the control word is read from every queue module by strobing the LATCH-C signal. Next, the hardware decodes the destination addresses and uses a priority encoder to select the "winning" input queue module. For the "winning" ports, the LATCH-D signal is held active for the next 53 cycles as the cell is switched to the output port. For the "losing"
ports, the LATCH-D signal remains inactive, causing the cells to remain queued and ready to contend in the next cycle. At the output ports, the VALID signal is used to drive the transmit side of the Taxi interface, optically sending the cell to the next switch or endpoint workstation.
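The three-step arbitration just described can be sketched in C, using the control-word helpers from the sketch in Section 2.3. NPORTS and the latch_d array are illustrative names; the real logic is combinational hardware in the FPGA, not software.

#define NPORTS 4

/* One arbitration cycle: read the control words (LATCH-C), then,
 * for every output port, grant the lowest-numbered requesting
 * input (a fixed priority encoder). Winners get LATCH-D held
 * active for the 53 cycles of the cell transfer. */
void arbitrate(const uint8_t cw[NPORTS], int latch_d[NPORTS])
{
    int in, out;

    for (in = 0; in < NPORTS; in++)
        latch_d[in] = 0;                  /* default: cell stays queued */

    for (out = 0; out < NPORTS; out++) {
        for (in = 0; in < NPORTS; in++) {
            if (cw_present(cw[in]) && cw_dest(cw[in]) == (unsigned)out) {
                latch_d[in] = 1;          /* this input wins the port */
                break;                    /* other contenders lose    */
            }
        }
    }
}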
2.4.3 Scheduling improvements
A simple improvement to the cell scheduling algorithm would be to provide a round-robin service discipline for contending ports; a sketch of this discipline appears below. This would provide a uniform service discipline to all ports, but would still suffer from head-of-line blocking. A further improvement would implement a time-slot scheduler in hardware. A centralized scheduler would receive a request from each queue module whenever a cell arrives and would return a future slot number indicating when the cell should be switched. Such an approach has been implemented for a 32×32 switch using an LSI CMOS device [26]. A throughput of 19.44 M cells/second was achieved for random packet arrivals. Further improvement to input scheduling can be achieved by maintaining parallel controllers at each output port and pipelining the requests [27].
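A minimal sketch of the round-robin variant, reusing the names from the previous sketches; the per-output pointer is an assumed implementation detail.

/* Per-output pointer to the input served most recently; the search
 * for the next winner starts just past it, so no input starves. */
static unsigned rr_last[NPORTS];

int rr_pick(const uint8_t cw[NPORTS], int out)
{
    unsigned k, in;

    for (k = 1; k <= NPORTS; k++) {
        in = (rr_last[out] + k) % NPORTS;
        if (cw_present(cw[in]) && cw_dest(cw[in]) == (unsigned)out) {
            rr_last[out] = in;            /* rotate priority */
            return (int)in;
        }
    }
    return -1;                            /* no request for this port */
}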
CHAPTER 3 THE IPOINT WORKSTATION SOFTWARE
For the iPOINT project, ATM network software has been developed at both the user and kernel levels of the UNIX environment. Section 3.1 describes the hardware environment in which the software was written. Section 3.2 details a user-level network daemon process that has been demonstrated to provide simultaneous audio, image, and file transfers over the ATM interface. Finally, Section 3.3 describes an SVR4 STREAMS module that was written to support modular networking software within the UNIX kernel.
3.1 The Hardware Environment
The Sun SPARCStation 10 and the Fore SBA-100 were employed as the workstation and ATM host interface adapter for network software development in the iPOINT testbed.
3.1.1 The Sun SPARCstation 10
The Sun SPARCstation 10 is an upgradable workstation that can accommodate up to four superscalar SPARC processors for symmetric multiprocessing. The shared memory bus (mbus) is 144 bits wide, capable of transferring 128 bits (16 bytes) in one memory cycle while using the extra 16 bits for error correction. Other relevant technical details of this workstation are discussed in Section 4.1.
3.1.2 The Fore SBA-100 host adapter
The Fore SBA-100 provides a 100 Mbps bidirectional ATM link over a pair of multimode optical fibers. Fore provides a series of host adapters for various workstation vendors, allowing internetworking within a heterogeneous environment of Sun, DEC, and Silicon Graphics workstations.
The SBA-100 attaches to the SBus of a Sun SPARCStation. The device is memory mapped (via mmap) into the workstation's address space, allowing input-output operations to use standard memory read and write operations. To control the device, data are sent to control registers. Status information is retrieved by reading from status registers. An ATM cell is written to the device by writing 14 consecutive 32-bit words to the memory area reserved as the transmit FIFO. Likewise, a cell is read by reading 14 consecutive words from the receive FIFO memory region (a sketch of this access pattern appears at the end of this subsection). The device is capable of generating interrupts after a timeout period or when the receive queue reaches a specified threshold [28].
A typical user of the SBA-100 does not access the card in the manner described above. Rather, the user would usually install Fore's device driver and use UNIX sockets or run applications built upon TCP/IP. Both of these functions implicitly employ the mechanisms described above. Because the Fore software is not publicly available, the iPOINT research project embarked on developing networking software that could be customized and modified to support experimental signalling and adaptation layer services.
A typical cross section of present and future workstation applications is illustrated in Figure 3.1. The networking software should be capable of providing high bandwidth, as required for applications such as visualization and multimedia. The software should also provide real-time services, as needed for audio or video applications. Meanwhile, for distributed computing applications, it is important to minimize latency. Finally, the network software should be backward-compatible with existing applications, such as FTP or TELNET, that use TCP/IP as an underlying protocol.
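The raw access pattern described above can be sketched as follows. The device path, mapping length, and FIFO offset are invented placeholders; only the cell-equals-14-words convention comes from the SBA-100 documentation cited above.

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define SBA_MAP_LEN 0x10000   /* assumed length of the mapped region */
#define TX_FIFO_OFF 0x0       /* assumed offset of the transmit FIFO */
#define CELL_WORDS  14        /* one ATM cell = 14 consecutive words */

/* Map the adapter into the process address space (device path is
 * an invented placeholder). */
volatile uint32_t *map_sba(void)
{
    int fd = open("/dev/sba0", O_RDWR);
    void *p;

    if (fd < 0)
        return 0;
    p = mmap(0, SBA_MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return (p == MAP_FAILED) ? 0 : (volatile uint32_t *)p;
}

/* Push one cell by writing 14 consecutive 32-bit words into the
 * transmit FIFO region, as the text describes. */
void send_cell(volatile uint32_t *sba, const uint32_t cell[CELL_WORDS])
{
    volatile uint32_t *tx = sba + TX_FIFO_OFF / sizeof(uint32_t);
    int i;

    for (i = 0; i < CELL_WORDS; i++)
        tx[i] = cell[i];
}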
3.2 User-space Networking Program
This section describes a user-level network daemon process that has been demonstrated to provide simultaneous audio, image, and file transfers over the ATM interface. This program memory maps the device and sends and receives ATM cells by directly reading from and writing to the ATM device. The software consists of a daemon process, atm_d, an audio client, speak, and multiple client and server processes, pr_cli and pr_serv, as shown in Figure 3.2.
[Figure 3.1 Host interface software layers: application programs (ftp, visualization, xphone, distributed computing) sit above adaptation layers (AAL), a multiplexor, and ATM packet assembly (VPI, VCI, PT, HEC, 48-byte data) on each workstation; physical transmission links the two workstation software/interface stacks through a packet switch.]
3.2.1 Virtual paths
These programs assume that a unique virtual path is available between every host pair. In the current implementation of our experimental system with three hosts, the source host identification is encoded in the two most significant bits of the VPI, and the destination host identification is encoded in the next two bits of the VPI field, creating a fully connected graph of virtual paths.
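Assuming an 8-bit VPI field (an assumption for illustration), the convention above can be sketched as:

#include <stdint.h>

/* Two most significant bits: source host; next two: destination. */
static inline uint8_t vpi_make(unsigned src, unsigned dst)
{
    return (uint8_t)(((src & 0x3u) << 6) | ((dst & 0x3u) << 4));
}

static inline unsigned vpi_src(uint8_t vpi) { return (vpi >> 6) & 0x3u; }
static inline unsigned vpi_dst(uint8_t vpi) { return (vpi >> 4) & 0x3u; }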
3.2.2 Software initialization
The atm_d process is first started as a background process on every workstation that wishes to participate. This program initializes the Fore SBA-100, then monitors a known address for service requests on the host machine.
[Figure 3.2 UIUC user-space ATM networking program: atm_d (with handlers read_fore_atm, handle_audio, handle_new_client, and handle_cli_req) connects pr_cli, pr_serv, and speak (via /dev/audio) to the Fore SBA-100.]

3.2.3 Digital audio
To support audio, the workstation's built-in digital audio device is employed. To initiate audio, the program speak is run, which takes as command-line arguments the name of the machine to which the audio data should be sent and (optionally) the audio source (microphone or line-in). This program first opens a connection to atm_d on the local machine using a UNIX socket, then sends across the address of the destination machine. After atm_d has accepted the connection, speak initializes the workstation's built-in audio device, /dev/audio, then begins collecting the digitally sampled audio data and passing the data to atm_d, which in turn packetizes the data and sends the ATM cells over the fiber with the endpoint-to-endpoint VPI and the reserved audio VCI of 0x000fffe0.
3.2.4 Signalling
To communicate between endpoints and switches, as is necessary for dynamically allocating and deallocating virtual circuits, a signalling mechanism is required. In the current implementation, the VCI of 0x000fff00 is reserved for signalling messages, which are in the format defined by the structure a_msg. The supported signalling messages include CONN_REQ, CONND, DISCONND, DROPPED, and CREFUSED. Switched Virtual Channels (SVCs) are supported, and a virtual circuit table is maintained by atm_d.
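The fields of a_msg are not reproduced in this document, so the following layout is purely illustrative; only the structure name, the message types, and the reserved signalling VCI come from the text.

#include <stdint.h>

#define SIG_VCI 0x000fff00u   /* reserved signalling VCI from the text */

enum a_msg_type { CONN_REQ, CONND, DISCONND, DROPPED, CREFUSED };

/* Illustrative layout only; the real a_msg fields are not given here. */
struct a_msg {
    enum a_msg_type type;   /* one of the message types above      */
    uint32_t        vpi;    /* path between the two endpoint hosts */
    uint32_t        vci;    /* circuit being set up or torn down   */
};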
[Figure 3.3 STREAMS ATM software: in user space, the user application and its data stream; in kernel space, the stream head, one or more Adaptation Layer (AAL) modules emitting ATM cells, a multiplexor/scheduler emitting mixed ATM cells, and the Fore device driver.]
3.2.5 Example client and server
Sample client and server programs, called pr_cli and pr_serv, have been written to transmit and receive a file over the ATM network. The server process first connects to atm_d over a UNIX socket, registers a service by name (in our case, "PRINT SERVER"), then waits until a remote client makes a request for the service. On a separate machine, one can run the client program, pr_cli. This program takes as command-line arguments the name of the destination machine and the name of the file to transfer. It contacts its local atm_d over a UNIX socket, sends a connection request message, waits for the connect response, then sends the contents of the file. In the current implementation, a stop-and-wait protocol is used for flow control, as sketched below.
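A minimal sketch of the stop-and-wait discipline, with hypothetical helpers send_block() and recv_ack() standing in for the socket exchange with atm_d:

extern void send_block(int conn, const char *buf, long n);  /* hypothetical */
extern void recv_ack(int conn);                             /* hypothetical */

/* Send the file in blocks, with exactly one block outstanding. */
void send_file(int conn, const char *buf, long len, long blk)
{
    long off = 0, n;

    while (off < len) {
        n = (len - off < blk) ? (len - off) : blk;
        send_block(conn, buf + off, n);
        recv_ack(conn);          /* wait before sending the next block */
        off += n;
    }
}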
3.3 Kernel-space ATM Networking
To support flexible ATM networking within the kernel of the UNIX workstation, an SVR4 STREAMS device driver was written [29]. The model for this software is illustrated in Figure 3.3. At the bottom of the diagram resides the software that directly interacts with the Fore SBA-100, which is solely responsible for transferring ATM cells between the host adapter and internal memory buffers. Above the device driver would exist a multiplexor, which is responsible for demultiplexing incoming ATM cells based on their VCIs and multiplexing outgoing cells based on a scheduling algorithm of choice. Adaptation layer modules would fit between the user application and the multiplexor,
performing the necessary protocols to transform the user's data stream into a series of ATM cells, and vice versa. At the top of the diagram are the user-space applications.
This model provides an elegant solution for developing network software. An application is able to open a network connection by first opening the stream, then attaching the desired adaptation layer module. Adaptation layers can easily be implemented as modular units with well-defined boundaries. Cell transmission scheduling algorithms can be developed and tested by modifying the multiplexor module. Since STREAMS modules can be dynamically loaded and removed, they can be changed and modified without rebooting the machine.
A STREAMS device driver for the Fore SBA-100 has been developed within the iPOINT testbed [30]. This module is added to the device list and compiled into the UNIX kernel. During the workstation's boot process, the kernel calls upon the device driver to initialize the hardware. The current device driver supports the putmsg and getmsg functions to transfer cells to and from the device, as well as a single ioctl, called FIB_GETSTATS, which returns statistics internally maintained by the driver.
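A sketch of user-space access through the driver follows. The device node /dev/fib0 and the 56-byte message size are assumptions; putmsg/getmsg and FIB_GETSTATS are the interfaces named above.

#include <fcntl.h>
#include <string.h>
#include <stropts.h>

#define CELL_BYTES 56          /* assumed per-cell message size */

int main(void)
{
    char cell[CELL_BYTES];
    struct strbuf data;
    int fd;

    fd = open("/dev/fib0", O_RDWR);   /* assumed device node */
    if (fd < 0)
        return 1;

    memset(cell, 0, sizeof cell);
    data.len = sizeof cell;
    data.buf = cell;
    putmsg(fd, (struct strbuf *)0, &data, 0);   /* send one cell */

    /* ioctl(fd, FIB_GETSTATS, &stats) would return the counters
     * that the driver maintains internally. */
    return 0;
}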
CHAPTER 4 PERFORMANCE BENCHMARKS
This chapter discusses the Input/Output (I/O) and memory bandwidth characteristics of desktop workstations and presents the performance benchmarks for the workstations and host adapters currently used in the iPOINT testbed. A series of benchmarks was run on the Sun SPARCStation 10 with the Fore SBA-100 host adapter to evaluate the memory-to-fiber bandwidth that could be attained for typical UNIX applications. This chapter begins with a general discussion of workstation I/O.
4.1 Workstation Bandwidth
Desktop workstations achieve high memory-to-CPU bandwidths by moving blocks of parallel data over the internal bus. Currently, the access time for the Dynamic Random-Access Memory (DRAM) found in most workstations and personal computers is on the order of 60 ns. To achieve a high memory bandwidth, workstations typically read and write entire 32-bit or 64-bit words in a single memory cycle. Newer workstations are capable of transferring an entire cache line, or a fraction thereof, in one memory cycle. For a memory width of 128 bits, a memory bandwidth of 128 bits/60 ns, or 2.13 Gbps, can be achieved. To transfer data between the CPU and the main memory, a word-wide bus is typically employed. For the 64-bit mbus operating at 36 MHz, as found on the Sun SPARCstation 10/30, a memory-to-cache peak bandwidth of 2.30 Gbps can be achieved [31].
4.1.1 Overhead limitations
In current desktop networking, protocol overhead and CPU context switching usually limit the bandwidth. Protocols are necessary to segment and reassemble packets of data and to perform Cyclic Redundancy Checks (CRCs) to guarantee data integrity.
These operations often add an extra burden to the host processor. The effect of protocol overhead can be minimized by adding custom hardware logic, adding special-purpose protocol processors, or making effective use of a multiprocessor system. The latter approach is used in the Silicon Graphics FDDI interface and is the most viable solution for the Sun SPARCstation 10's currently used in the iPOINT testbed. CPU context switching also degrades performance.
4.1.2 Latency limitations
Message latency can decrease the effectiveness of a high-performance network. Time-of-flight latency, store-and-forward delay, and buffering latency add to the time that it takes to send a request message and receive the result. The time-of-flight latency refers to the time spent as the message propagates down the transmission medium. For glass fiber, with an index of refraction n ≈ 1.5, signals propagate at approximately 200,000 km/s. To put this in perspective, at a line rate of 1 Gbps, a single bit occupies approximately one foot in space. Store-and-forward delays occur as messages are deserialized and serialized at intermediate switches; the short length of the ATM cell minimizes this delay. Buffering latency naturally occurs as a byproduct of statistical multiplexing. Priority queues and intelligent queue service disciplines are used to provide the Quality of Service (QOS) needed by the application. By pipelining requests and allowing multiple outstanding messages, the effect of latency on throughput can be minimized.
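As a worked check of these figures, using only numbers already quoted in this document:

\[
v = \frac{c}{n} \approx \frac{3\times10^{5}\ \mathrm{km/s}}{1.5}
  = 2\times10^{5}\ \mathrm{km/s},
\qquad
\ell_{\mathrm{bit}} = \frac{v}{R}
  = \frac{2\times10^{8}\ \mathrm{m/s}}{10^{9}\ \mathrm{bits/s}}
  = 0.2\ \mathrm{m},
\]

on the order of the foot quoted above. For the fiber spans used in the testbed (Section 4.2), a 3-m link contributes roughly 15 ns of time-of-flight latency, while a 2-km link contributes roughly 10 µs, which exceeds one 53-byte cell time at 100 Mbps (about 4.2 µs).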
4.2 Benchmark Test Conditions
For all benchmarks in this chapter, two workstations equipped with Fore SBA-100 host adapters were connected in a point-to-point manner, as shown in Figure 4.1. Each workstation is a Sun SPARCstation 10/30, with the SBA-100 attached to the workstation's SBus, a 32-bit general-purpose input/output bus operating at half of the 36 MHz CPU clock rate. The operating system on these machines was SunOS 4.1.3.
[Figure 4.1 Point-to-point workstation link: two Sun SS10s with Fore SBA-100 adapters joined by 62.5/125 multimode fiber (wavelength 1300 nm, 100 Mbps) over spans of 3 m to 2 km, with a separate Ethernet connection between the hosts.]
For all tests, the workstation was run in single-user mode. Each machine was equipped with 32 Mbytes of physical memory and a 1.3-Gbyte SCSI disk.
4.2.1 User-space performance
The following performance benchmarks characterize the workstation's ability to transfer data from user-space memory to the fiber. Each data point in Figure 4.2 represents the actual bandwidth achieved while transmitting 1,000,000 ATM cells (almost half a billion bits) through the host interface. A memory buffer holding Batch Size ATM cells was allocated, and the workstation cycled through this buffer until the full number of cells had been sent. To characterize the workstation's ability to perform extra data copies, as may be needed for segmentation and reassembly protocols, extra memory copies of the data were made; a sketch of this measurement loop appears below.
The maximum bandwidth occurred when no extra memory copies were made and the batch size was very small. In this case, the source data resided in the CPU's cache. From the upper-leftmost point of Figure 4.2, a bandwidth of almost 85 Mbps was achieved. With a batch size of a thousand cells or greater, the working set no longer fit in the cache. We observed constant performance for batch sizes up to the size of the physical memory (in this case, 32 Mbytes).
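A sketch of this measurement loop, assuming the raw send_cell() routine sketched in Section 3.1.2; buffer management and timing code are omitted.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOTAL_CELLS 1000000L
#define CELL_BYTES  56

extern void send_cell(volatile uint32_t *sba, const uint32_t *cell);

void run_benchmark(volatile uint32_t *sba, long batch, int copies)
{
    char *buf = malloc((size_t)batch * CELL_BYTES);
    char *tmp = malloc((size_t)batch * CELL_BYTES);
    long sent = 0, i;
    int c;

    if (buf == 0 || tmp == 0) {
        free(buf); free(tmp);
        return;
    }
    memset(buf, 0, (size_t)batch * CELL_BYTES);

    while (sent < TOTAL_CELLS) {
        for (c = 0; c < copies; c++)      /* the extra copies under test */
            memcpy(tmp, buf, (size_t)batch * CELL_BYTES);
        for (i = 0; i < batch && sent < TOTAL_CELLS; i++, sent++)
            send_cell(sba, (const uint32_t *)(buf + i * CELL_BYTES));
    }
    free(buf);
    free(tmp);
}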
[Figure 4.2 SBA-100 throughput using AAL3/4: bandwidth (Mbps) versus batch size (cells), with one curve for each number of extra copies from 0 through 7.]
For batch sizes exceeding the size of physical memory, the workstation began swapping to disk, causing the performance to degrade severely. From the series of curves, one should appreciate the importance of efficient protocols that minimize extra copies in memory.
The AAL3/4 protocol reserves part of the ATM cell's 48-byte payload for the sequence number, message identifier, message length, and cell CRC. Because of these fields, only 44 bytes are available for user data. An advantage of using AAL3/4 is that the data CRC can easily be calculated in hardware as the cell arrives. The disadvantages of AAL3/4 are wasted bandwidth, due to sending extra header data, and reduced message integrity, as compared to AAL5. The AAL5 protocol uses the full 48-byte payload to send data and a trailer cell to indicate the message length and the data CRC. Because each cell is capable of carrying 9% more data, one would expect an increased throughput. The SBA-100, however, was optimized for AAL3/4 cells. Because the AAL3/4 protocol uses 16 bits of the payload as a reserved field at the beginning and end of the message, the data in the workstation's buffer must be bit-rotated before they can be written to the SBA-100.
[Figure 4.3 SBA-100 throughput using AAL5: bandwidth (Mbps) versus batch size (cells), with one curve for each number of extra copies from 0 through 7.]
This rotation resulted in only a slight decrease in performance because of the efficiency with which a RISC processor can perform a single extra operation. In the case of the SBA-100, the benefits of the larger payload seemed to be offset by the penalty of the extra rotation operation, as shown in Figure 4.3. In this experiment, the calculation of the payload CRC was not performed. If this operation were done in software by the workstation's processor, one would expect a more dramatic performance penalty.
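To make the AAL3/4 overhead concrete, the SAR cell payload can be pictured as follows. The 2/4/10 and 6/10 bit splits are the standard AAL3/4 field widths; the struct is illustrative rather than a bit-accurate layout.

#include <stdint.h>

/* Illustrative AAL3/4 SAR cell payload (not bit-accurate packing). */
struct aal34_payload {
    uint16_t header;          /* segment type (2), sequence number (4),
                                 message identifier (10) */
    uint8_t  user_data[44];   /* only 44 of the 48 payload bytes */
    uint16_t trailer;         /* length indicator (6), payload CRC-10 (10) */
};
/* AAL5 carries the full 48 bytes, i.e., 48/44, about 9% more per cell. */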
4.2.2 UIUC STREAMS device driver performance
The performance of the UIUC kernel-level STREAMS device driver was characterized as a function of the block size of the data sent to the device, as shown in Figure 4.4. For this experiment, the device driver was installed, then a user-space application opened the stream and wrote messages of Block Size cells. After a message has been written to a queue at the head of the stream, a context switch from the user-space program to the kernel occurs, and the UNIX kernel schedules a time to process the stream.
[Figure 4.4 SBA-100 throughput using the UIUC STREAMS module: bandwidth (Mbps) versus block size (cells).]
We observed that this overhead severely limited the bandwidth, especially for small (single-cell) messages. For messages larger than about 500 bytes, the performance remained relatively constant, indicating that the performance bottleneck was no longer the context switching.
4.2.3 TCP/IP performance
The Internet Protocol (IP) is currently the most commonly used network protocol. TCP and UDP applications have made email, the Network File System (NFS), Telnet, and FTP available on almost every workstation and personal computer. The success of a network will be influenced by its ability to efficiently transport IP data [32]. For these experiments, Fore's TCP/IP device driver was installed. A simple shell script, shown in Figure 4.5, executed ttcp to gather performance results. The TCP/IP performance results are plotted in Figure 4.6. For each data point, a constant amount of data was transmitted (in this case, 8 Mbytes). As the length of the buffers was increased through {8, 16, ..., 8192}, the number of buffers was decreased through {1M, 512k, ..., 1024}. As a comparison, the same tests were run on an unloaded Ethernet.
#!/bin/sh
# Run from pan to soliton. First parameter is {soliton,solitonf}
netstat -r
echo "--------------------------"
BSIZE=8
NBUF=1048576
MAXBSIZE=8000000
while [ $BSIZE -le $MAXBSIZE ]
do
    echo "Batch size is: $BSIZE"
    echo "Number of buffers is: $NBUF"
    rsh soliton "/home/ra/lockwood/ipoint/ttcp/ttcp -r -v -p5003 -n$NBUF -l$BSIZE " &
    sleep 5
    ttcp -t -l$BSIZE -v -p5003 -n$NBUF $1
    echo "------------------"
    BSIZE=`expr $BSIZE '*' 2`
    NBUF=`expr $NBUF / 2`
done
Figure 4.5 Script to generate TCP/IP performance data
[Figure 4.6 SBA-100 throughput versus batch size for Fore's TCP/IP driver: throughput in KB/s and KB/cpu-s versus batch size in bytes, for both the Ethernet and the Fore interface.]
Note that the bandwidth is measured in bytes/s rather than bits/s. We observed that while the Ethernet performance leveled off at approximately 8 Mbps for messages larger than 50 bytes, the Fore driver was able to provide approximately twice that performance.
CHAPTER 5 SUMMARY AND FUTURE RESEARCH
5.1 Summary
This document presented the iPOINT testbed and examined the role of optoelectronics in high-bandwidth computer networking.
Chapter 1 began by describing critical OEIC devices, which are the enabling technologies for future high-speed networks. The following sections explained why computer networks require packet-switched services and described the key packet switch architectures that have been suggested and prototyped. Next, Asynchronous Transfer Mode (ATM) was described as a useful standard that will eventually allow interoperable high-speed networking. The introductory material concluded by describing three optoelectronic computer networking testbeds. From the examples presented, it was made clear that high-performance networks require a mix of optical and electronic devices.
Chapter 2 began by describing the iPOINT testbed environment and the CAD tools and FPGA devices that were used to quickly prototype a four-port, 400 Mbps aggregate bandwidth ATM switch. As a detailed example of the device requirements for a practical optoelectronic computer network, the functions of the physical and data-link layer devices used in the prototype were explained. Next, the operations of the prototype iPOINT Pulsar switch and queue module were described. Finally, the cell scheduling problem was stated, and methods for resolving output contention were discussed.
Chapter 3 described the user-space network software and kernel-space device drivers that were developed within the iPOINT research project using the Sun SPARCStation 10 and the Fore SBA-100 ATM host interface adapter. Using this network software, simultaneous voice, image, and file transfers over the optical fiber using ATM cells have been demonstrated. The software model, signalling protocol, and sample application programs were described.
Chapter 4 presented benchmarks of the iPOINT hardware and software. An initial calculation indicated that the current generation of workstations internally operates on data at gigabit-per-second rates. Protocol overhead and inefficient networking software, however, severely limit the I/O bandwidth. Using the iPOINT user-space software, a memory-to-fiber bandwidth of over 75 Mbps was demonstrated between SPARCStation 10's using the ATM host adapter.
5.2 Future Research
High-bandwidth optoelectronic networking is an active research area. This section lists some of the current and future hardware and software research topics related to the iPOINT project.
5.2.1 Microelectronic device research
Specifications have been generated for integrated transmitter and receiver OEIC modules suitable for short-haul λ = 0.85 µm gigabit networking [33]. The specifications for the first-generation integrated receiver call for a sensitivity of −20 dBm at a bit rate of 1.25 Gbps into differential ECL outputs. The next-generation integrated receiver should strive to incorporate Automatic Gain Control (AGC), clock recovery, and deserialization. The specifications for the transmitter module call for direct modulation of a laser diode with an extinction ratio of six from an ECL input. The next-generation integrated transmitter should strive to incorporate a monitor photodetector and automatic bias controls to maintain a regulated optical output power. Long-term microelectronic research efforts should aim to implement the same functionality at λ = 1.55 µm, where single-mode dispersion-shifted fiber can easily transport multigigabit signals over long distances.
[Figure 5.1 Beckman-DCL Pulsar/XUNET optical link: at the Beckman Institute, the Pulsar switch, Fore and Taxi interfaces, and a Sun SPARCStation 10; at the Digital Computer Lab, the Xunet switch, Fore/Xunet adapter (FXA), HIPPI/Xunet adapter (HXA), IP router, and file server, with a link to Chicago; the sites are joined by 62.5/125 optical fiber.]
5.2.2 Wide-area interoperable ATM networking
While local-area networking of desktop workstations at gigabit rates is useful in its own right, the driving force behind ATM networks is their ability to provide high-bandwidth, low-latency, real-time, interoperable, wide-area networking services. A dedicated 62.5/125 fiber pair connecting the iPOINT research laboratory at the Beckman Institute to the Digital Computer Laboratory (DCL) is currently operational, as shown in Figure 5.1. Planning is underway for the hardware and software required to interface the Fore transmission protocol (as currently used in the iPOINT prototype) to the XUNET switch in DCL. This Fore-XUNET Adapter (FXA) would allow hosts within the iPOINT laboratory to exchange messages with the routers and supercomputers attached to this nationwide network. Work is in progress to implement a compatible signalling protocol and adaptation layer for the XUNET switch.
5.2.3 Multigigabit packet switching
Developing the prototype 400 Mbps Pulsar switch uncovered the system requirements, system bottlenecks, device requirements, and device limitations of such designs. The next phase of the iPOINT project involves the investigation of multigigabit ATM packet switches. The design of a 32-port ATM packet switch with a line rate of 600 Mbps, for an aggregate bandwidth of 18 Gbps, has been proposed. Future research may investigate the implementation of such a switch using the queue module, switch controller, and line interfaces developed for the XUNET switch.
5.2.4 Network software development
User-space and kernel-space ATM network software has been developed within the iPOINT testbed. This software can provide high-bandwidth networking services for multimedia and distributed computing applications in the UNIX environment. Future software development may include the kernel-space multiplexor and adaptation layer STREAMS modules. Future work may include evaluating the performance of real-time video applications over ATM networks. Long-term research may investigate the integration of ATM network interfaces into an object-oriented distributed operating system.
REFERENCES
[1] J. B. Lyles and D. C. Swinehart, "The emerging gigabit environment and the role of local ATM," IEEE Communications, pp. 52-58, Apr. 1992.
[2] J. W. Lockwood, C. Cheong, S. Ho, B. Cox, S. M. Kang, S. G. Bishop, and R. H. Campbell, "The iPOINT testbed for optoelectronic ATM networking," in Conference on Lasers and Electro-Optics, pp. 370-371, 1993.
[3] J. Crow, C. J. Anderson, S. Bermon, A. Callegari, J. F. Ewen, J. D. Feder, J. H. Greiner, E. P. Harris, P. D. Hoh, H. J. Hovel, J. H. Magerlein, T. E. McKoy, A. T. S. Pomerene, D. L. Rogers, G. J. Scott, M. Thomas, G. W. Mulvey, B. K. Ko, T. Ohashi, M. Scontras, and D. Widiger, "A GaAs MESFET IC for optical multiprocessor networks," IEEE Transactions on Electron Devices, vol. 36, pp. 263-268, Feb. 1989.
[4] A. A. Ketterson, M. Tong, J.-W. Seo, K. Nummila, J. J. Morikuni, S.-M. Kang, and I. Adesida, "A high-performance AlGaAs/InGaAs/GaAs pseudomorphic MODFET-based monolithic optoelectronic receiver," IEEE Photonics Technology Letters, pp. 73-76, Jan. 1992.
[5] O. Wada, H. Nobuhara, T. Sanada, M. Kunu, M. Makiuchi, T. Fujii, and T. Sakurai, "Optoelectronic integrated four-channel transmitter array incorporating AlGaAs/GaAs quantum-well lasers," IEEE Journal of Lightwave Technology, pp. 186-196, Jan. 1989.
[6] L. M. Miller, K. J. Beernink, J. T. Verdeyen, J. J. Coleman, J. S. Hughes, G. M. Smith, J. Honig, and T. Cockerill, "Characterization of an InGaAs-GaAs-AlGaAs strained-layer distributed-feedback ridge-waveguide quantum-well heterostructure laser," IEEE Photonics Technology Letters, vol. 4, pp. 296-299, Apr. 1992.
[7] G. C. Papen, G. M. Murphy, and D. C. Brady, "Multiple wavelength operation of a diode array coupled to an external cavity," in Conference on Lasers and Electro-Optics, 1993. Postdeadline paper.
[8] T. Iwama, T. Horimatsu, Y. Oikawa, K. Yamaguchi, M. Sasake, T. Touge, M. Makiuchi, H. Hamaguchi, and O. Wada, "4 by 4 OEIC switch module using GaAs substrate," IEEE Journal of Lightwave Technology, pp. 772-778, June 1988.
[9] Corning Incorporated, Telecommunications Products Division, Corning 50/125 CPC3 Multimode Optical Fiber, July 1990.
[10] N. S. Bergano, "Undersea lightwave transmission systems using Er-doped fiber amplifiers," Optics & Photonics News, pp. 8-14, Jan. 1993.
[11] P. E. Green, "An all-optical computer network: Lessons learned," IEEE Network Magazine, pp. 56-60, Mar. 1992.
[12] A. G. Fraser, C. R. Kalmanek, A. E. Kaplan, W. T. Marshall, and R. C. Restrick, "Xunet 2: A nationwide testbed in high-speed networking," in INFOCOM, pp. 582-589, 1992.
[13] C. R. Kalmanek, S. P. Morgan, and R. C. Restrick, "A high-performance queueing engine for ATM networks," in ISS, 1992.
[14] Y. Yeh, M. G. Hluchyj, and A. S. Acampora, "The Knockout switch: A simple, modular architecture for high-performance packet switching," IEEE Journal on Selected Areas in Communications, vol. SAC-5, pp. 1274-1283, Oct. 1987.
[15] J. Murakami, "Non-blocking packet switching with shift-register rings," Ph.D. dissertation, University of Illinois at Urbana-Champaign, 1991.
[16] A. Huang and S. Knauer, "Starlite: A wideband digital switch," in GLOBECOM, pp. 121-125, 1984.
[17] R. Handel and M. N. Huber, Integrated Broadband Networks: An Introduction to ATM-Based Networks. Reading, Massachusetts: Addison-Wesley, 1991.
[18] G. Miller, "IBM building all-lightwave network," Lightwave, pp. 1-23, Mar. 1991.
[19] A. S. Acampora and M. J. Karol, "An overview of lightwave packet networks," IEEE Network Magazine, pp. 29-41, Jan. 1989.
[20] D. J. Blumenthal, K. Y. Chen, J. Ma, R. J. Feuerstein, and J. R. Sauer, "Demonstration of a deflection routing 2×2 photonic switch for computer interconnects," IEEE Photonics Technology Letters, vol. 4, pp. 169-173, Feb. 1992.
[21] F. E. Ross, "An overview of FDDI: The fiber distributed data interface," IEEE Journal on Selected Areas in Communications, vol. 7, pp. 1043-1051, Sept. 1989.
[22] AMD, TAXIchip Integrated Circuits, rev. 1.3 ed., 1989.
[23] AMD, TAXIchip (DC-CAB) Data Checker Board User's Manual, 1991.
[24] Y. Oie, T. Suda, M. Murata, D. Kolson, and H. Miyahara, "Survey of switching techniques in high-speed networks and their performance," in INFOCOM, pp. 1242-1251, 1990.
[25] M. J. Karol, M. G. Hluchyj, and S. P. Morgan, "Input vs. output queueing in space division packet switching," IEEE Transactions on Communications, vol. COM-35, pp. 1347-1356, Dec. 1987.
[26] M. Akata, S. Karube, and S. Yoshida, "An input buffering ATM switch using a time-slot scheduling engine," NEC Research and Development, vol. 33, pp. 64-72, Jan. 1992.
[27] H. Obara and Y. Hamazumi, "Parallel contention resolution control for input queueing ATM switches," Electronics Letters, pp. 838-839, Apr. 1992.
[28] Fore Systems, Inc., SBA-100 SBus ATM Computer Interface User's Manual, version 1.2 ed., 1992.
[29] AT&T UNIX System Laboratories, Inc., UNIX System V Release 4 Programmer's Guide: STREAMS, 1990.
[30] B. Cox, "A STREAMS device driver for the Fore SBA-100." Available via anonymous ftp from ipoint.vlsi.uiuc.edu as /pub/ipoint/Documents/device-driver.ps, Apr. 1993.
[31] Sun Microsystems, Mountain View, CA, SPARCstation 10 System Architecture: Technical White Paper, 1992.
[32] R. Caceres, "Efficiency of ATM networks in transporting wide-area data traffic," Tech. Rep., University of California at Berkeley, 1991.
[33] J. W. Lockwood, "iPOINT gigabit OEIC specifications." Available via anonymous ftp from ipoint.vlsi.uiuc.edu as /pub/ipoint/Documents/specs.ps, Mar. 1993.