Optical design of programmable logic arrays - OSA Publishing

Optical design of programmable logic arrays

Miles J. Murdocca, Alan Huang, Jurgen Jahns, and Norbert Streibl

Regular free-space interconnects such as the perfect shuffle and banyan provided by beam splitters, lenses, and mirrors connect optical logic gates arranged in 2-D arrays. An algorithmic design technique transforms arbitrary logic equations into a near-optimal depth circuit. Analysis shows that an arbitrary interconnect makes little or no improvement in circuit depth and can even reduce throughput. Gate count is normally higher with a regular interconnect, and we show cost bounds. We conclude that regularly interconnected circuits will have a higher gate count compared with arbitrarily interconnected circuits using the design techniques presented here and that regular free-space interconnects are comparable with arbitrary interconnects in terms of circuit depth and are preferred to arbitrary interconnects for maximizing throughput.

I. Introduction

All-optical digital computers have the potential for high speed, cheap communications, and massive parallelism. Logic gates based on nonlinear dielectric constants were investigated theoretically in the early 1960s by von Neumann.¹ In the last few years optical bistable devices,²,³ nonlinear Fabry-Perots,⁴,⁵ and hybrid electrooptic devices⁶ have been studied experimentally. These results encourage development of architectures suitable for optics.

Historically, two architectural approaches have dominated the field. One approach uses integrated optics to interconnect optical logic devices. A system designed with this approach is architecturally similar to a conventional computer, with logic gates connected in arbitrary configurations. This similarity means that an optical computer designed with this approach is worth building only if it can be made more cheaply or more powerful. An alternative approach makes use of 2-D arrays of devices interconnected in free space. This approach uses space-variant interconnects (provided by holograms) or space-invariant regular interconnects (provided by beam splitters). We prefer the space-invariant regular interconnect approach for simplicity, extensibility, and high throughput. To take advantage of the capabilities of this approach without suffering from the forced regularity of the interconnect, novel computing techniques are needed.⁷⁻¹¹

Regular space-invariant interconnects are the most notable feature of the architecture presented here and are discussed in Sec. II. In Sec. III, we provide a brief introduction to digital circuit design and show how Boolean equations can be generated from a description of a finite state machine. Section IV introduces a design technique for mapping Boolean equations onto regularly interconnected arrays of logic gates. Further details and a discussion of complexity are given in Sec. V.

II. Architecture

The architecture we propose consists of customizable space-invariant interconnects and optically nonlinear arrays arranged in an AND stage and an OR stage (Fig. 1). In the AND stage, a 2-D input pattern is split into two identical images. Each image is perfect shuffled optically¹²,¹³ in the x dimension and is passed through a mask before being combined onto an array of AND gates. The AND array is fed back to the AND stage and also to a similar OR stage. The output is produced at the OR stage. The feedback paths are imaged back onto the system one row lower as shown in the crosshatched areas of the input arrays, so that each row is exposed to a different part of the mask. The entire image from the AND stage is copied to both the AND stage and to the OR stage so that a number of cycles can be made through the AND stage before the OR stage is used. The AND-OR combination is modeled after the traditional AND-OR form of logic design. A more complete description of how this architecture is used is presented in Sec. IV.

Norbert Streibl is with University of Erlangen-Nuremberg, D8520 Erlangen, Federal Republic of Germany; the other authors are with AT&T Bell Laboratories, Holmdel, New Jersey 07733. Received 14 July 1987. 0003-6935/88/091651-10$02.00/0. ©1988 Optical Society of America.

1 May 1988 / Vol. 27, No. 9 / APPLIED OPTICS

Fig. 1. Schematic of a digital optical computer. The input array is split into two identical images which are each perfect shuffled (marked as P-S) and recombined onto an optically nonlinear array of AND gates. The AND array is imaged onto a similar setup with an OR array. Feedback is at two places as shown, and masks customize the interconnects so that only selected sites will be enabled.

Fig. 2. Sixteen-bit 1-D perfect shuffles. The rightmost shuffle is obtained from the leftmost shuffle by swapping elements within pairs.

Fig. 3. Exchange (a) and bypass (b) operations for a shuffle-exchange network.

Fig. 4. Optical implementation of the perfect shuffle.

A. Perfect Shuffle and Butterfly Interconnects

The perfect shuffle is a special permutation of the string $(a_0, \ldots, a_{N-1})$ of length $N = 2^n$. It can be expressed as a cyclic left rotation of the binary addresses of the elements: $a_x \rightarrow a_{\mathrm{ROL}(x)}$. Figure 2 illustrates two forms of a 1-D perfect shuffle interconnect for N = 16. The perfect shuffle¹⁴ is commonly used for interconnecting array processors,¹⁵ permutation networks,¹⁶ sorting,¹⁷ and for special algorithms such as the fast Fourier transformation (FFT).¹⁸ The perfect shuffle can be combined with pairwise exchange/bypass operations (Fig. 3) to achieve any permutation of N inputs in $3 \log_2 N - 1$ levels.¹⁶

Figure 4 shows an optical implementation of the perfect shuffle.¹² The input array is split into two copies, each copy is magnified by a factor of 2, shifted, and interlaced. The magnification step may introduce special problems for diffraction-limited devices, so we also consider a butterfly interconnection pattern shown in Fig. 5. A butterfly on a string of length $N = 2^n$ can be described as an exchange of the least and the most significant bits (bit 0 and bit n - 1) in the binary address x of each element $a_x$ of the string: $a_x \rightarrow a_{\mathrm{exchange}(x)}$. The following acronyms are used: least significant bit (LSB) and most significant bit (MSB). There are three angles of connections in the butterfly: a copy operation (for all elements of the string having a binary address with LSB = MSB), a shift to the right by N/2 - 1 (for all elements of the string having a binary address with LSB = 1 and MSB = 0), and a shift to the left by N/2 - 1 (for the rest of the elements). An optical implementation of the butterfly can be made using split/shift/combine operations in the style of Huang's symbolic substitution.⁷ The input is split into three imaging systems, the unwanted bits are masked out in each system, and the three resulting images are recombined with appropriate shifts. Unlike the perfect shuffle, magnification is not needed, which makes the butterfly attractive if small optical devices (near the diffraction limit) are used, or if the devices are densely packed.

Butterflies can be arranged in $\log_2 N$ levels of increasing granularity to form a banyan network as shown in Fig. 6(a). Lettered boxes can be any two-input two-output function, such as exchange/bypass operations in the case of permutation networks. For the technique we describe here, the boxes are AND and OR gates. The banyan and the perfect shuffle networks are isomorphic when mappings are made in multiples of $\log_2 N$ levels. The banyan network can be mapped onto the perfect shuffle network by shuffling the boxes once for each level of depth and replacing butterflies with perfect shuffles. The first level has depth 0 so the boxes are not shuffled. The next level is shuffled once, the next level is shuffled twice, and so on. The operations inside the boxes remain unchanged. The mapping from shuffle network to banyan network can be made in a similar fashion using inverse perfect shuffles.

Fig. 5. Sixteen-bit 1-D butterfly.

Fig. 6. Isomorphism of banyan network and perfect shuffle network for N = 16.

Fig. 8. State transition diagram (a) and state transition table (b) for a serial adder.
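The two address permutations defined in this section can be tried out with a few lines of Python. This is only an illustrative sketch: the function names (`rotate_left`, `exchange`, `permute`) are hypothetical and not from the paper, and the code models the index permutation only, not the optical split, magnify, and interlace steps.

```python
def rotate_left(x, n):
    """Cyclically rotate the n-bit address x left by one bit (ROL)."""
    msb = (x >> (n - 1)) & 1
    return ((x << 1) & ((1 << n) - 1)) | msb


def exchange(x, n):
    """Swap bit 0 (LSB) and bit n-1 (MSB) of the n-bit address x."""
    lsb = x & 1
    msb = (x >> (n - 1)) & 1
    if lsb != msb:
        x ^= 1 | (1 << (n - 1))  # flip both bits, which swaps them
    return x


def permute(items, address_map):
    """Move the element at address x to address address_map(x, n)."""
    n = len(items).bit_length() - 1
    assert len(items) >= 2 and len(items) == 1 << n, "length must be 2**n, n >= 1"
    out = [None] * len(items)
    for x, item in enumerate(items):
        out[address_map(x, n)] = item
    return out


def perfect_shuffle(items):
    return permute(items, rotate_left)


def butterfly(items):
    return permute(items, exchange)


if __name__ == "__main__":
    data = list("ABCDEFGHabcdefgh")  # same 16 labels as the figures
    print("".join(perfect_shuffle(data)))
    print("".join(butterfly(data)))
```

For the 16-element input, addresses with LSB = MSB stay in place under `exchange`, while address 1 (LSB = 1, MSB = 0) moves to address 8, a shift of N/2 - 1 = 7, in line with the three connection angles described in the text.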

The proposed implementation of the butterfly is interesting because masks can be placed on each path to customize the interconnect. In the next few sections, we present an algorithmic design technique in the form of customized butterflies, although the target machine can use butterflies, perfect shuffles, or any of a number of similar interconnects.

III. Primer in Two-Level Logic Design

In this section we show how a simple digital design problem can be transformed into a set of Boolean equations through existing computer design techniques. This is the first step of a two-step process in generating masks that customize the interconnects.

A. Finite State Machines

Any computable function can be described by a finite state machine.¹⁹ A finite state machine can be represented as a black box with a finite number of internal variables that is connected to the rest of the system through input and output lines. The output at any given time is completely determined by the current input and internal state. A computer can be thought of as a finite state machine because it responds to an input in a deterministic way depending only on the current input and content of its memory. Similarly, a computer program is a software state machine because the response is completely determined by the current input and internal variables. A complete representation of a finite state machine can be given by a state transition diagram which shows all possible states and the transitions from each state for every possible input. The state transition diagram for a large computer can be very complex (the state space grows exponentially for a linear growth in the number of variables) so it is common to decompose large state machines into smaller state machines such as adders and registers.

Figure 7 shows the model for a sequential adder that we will use as an example. There are two input lines on which two binary numbers enter, least significant bit first. The result appears on the output line, the least significant bit first. For two N-bit numbers, N time steps are needed to complete the addition. Figure 8(a) shows the state transition diagram for this adder. The state machine has two internal states, A = no carry and B = carry. The machine will be in state A whenever there is no carry from the previous step and in state B whenever there is a carry from the previous step. The state transition diagram is read as follows: when both inputs are 0 and the machine is in state A (i.e., there is no carry), 0 is produced at the output and no carry is produced (so remain in state A). This is shown as the arc labeled 00/0. The seven other transitions are interpreted similarly. A mathematically equivalent state transition table shown in Fig. 8(b) can be derived from the state transition diagram. State minimization is not discussed here, but a discussion on the subject can be found in Ref. 20.

State transition table of Fig. 8(b). Entries are next state/current output.

Current state    Inputs (x y):  00     01     10     11
A                               A/0    A/1    A/1    B/0
B                               A/1    B/0    B/0    B/1

Fig. 7. Serial adder. Two numbers enter the adder least significant bit first. Two bits are added on each time step, and one more bit of the result is produced. The carry is stored within the adder as its internal state.

B. State Assignment

State variables A and B can be coded in binary, and the state transition table can be translated into this new form as shown in Fig. 9. We assign to state A the value 0, and to state B the value 1. The state assignment we choose here is arbitrary, although it does influence gate count. We refer the reader to Ref. 20 for further discussion on state assignment. The next-state function and current-output function have been separated to make the translation to Boolean equations in the next section more obvious.
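The serial adder described in this section is small enough to simulate directly. The sketch below is an illustration only: it encodes the two-state transition table for the adder (states A = no carry, B = carry) as a Python dictionary and steps through the input bits least significant bit first. The names `TRANSITIONS`, `serial_add`, `to_bits`, and `from_bits` are hypothetical helpers, not part of the paper.

```python
# (current state, x, y) -> (next state, current output)
TRANSITIONS = {
    ("A", 0, 0): ("A", 0), ("A", 0, 1): ("A", 1),
    ("A", 1, 0): ("A", 1), ("A", 1, 1): ("B", 0),
    ("B", 0, 0): ("A", 1), ("B", 0, 1): ("B", 0),
    ("B", 1, 0): ("B", 0), ("B", 1, 1): ("B", 1),
}


def serial_add(x_bits, y_bits):
    """Feed two equal-length bit lists (LSB first) through the state machine.

    Returns the output bits (LSB first) and the final state; a final
    state of "B" means a carry is still held inside the adder.
    """
    state = "A"  # start with no carry
    out = []
    for x, y in zip(x_bits, y_bits):
        state, z = TRANSITIONS[(state, x, y)]
        out.append(z)
    return out, state


def to_bits(value, width):
    """Encode a non-negative integer as `width` bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Decode an LSB-first bit list back into an integer."""
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    bits, final_state = serial_add(to_bits(6, 4), to_bits(7, 4))
    print(from_bits(bits), final_state)
```

One output bit is produced per time step, so two N-bit operands take N steps, and the only memory the machine needs is the one-bit carry held as its state.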

State assignment: A = 0, B = 1.

                 Next state s_{t+1}          Current output z_{t+1}
s_t    x_t y_t:  00   01   10   11           00   01   10   11
A = 0            0    0    0    1            0    1    1    0
B = 1            0    1    1    1            1    0    0    1

Fig. 9. State assignment (a) and state transition table with assignments (b) for a serial adder.

C. Canonical Sum-of-Products Form of Boolean Equations

Four functions must be implemented to realize the serial adder. We need to implement the next state function $s_{t+1}(x_t, y_t, s_t)$, the output function $z_{t+1}(x_t, y_t, s_t)$, and the complementary functions $\bar{s}_{t+1}(x_t, y_t, s_t)$ and $\bar{z}_{t+1}(x_t, y_t, s_t)$. We use dual-rail logic because it simplifies optical setups by avoiding a relative inversion between the arrays. The cost of using dual-rail logic over single-rail logic in terms of gate count is a factor of ~2, since the complementary functions are implemented explicitly. As shown in Fig. 9(b), $s_{t+1}$ will produce an output of 1 when $x_t$, $y_t$, and $s_t$ are 011, 101, 110, or 111. The corresponding Boolean expression is

$$s_{t+1} = \bar{x} y s_t + x \bar{y} s_t + x y \bar{s}_t + x y s_t. \tag{1}$$

Subscripts have been removed from x and y for readability. The equation is said to be in canonical sum-of-products form when variables are logically ANDed into groups (called minterms) that are logically ORed to form a function. Every variable appears in every minterm once in the canonical sum-of-products form. The complement of $s_{t+1}$ will be 1 whenever x, y, and $s_t$ are 000, 001, 010, or 100. The corresponding Boolean equation is

$$\bar{s}_{t+1} = \bar{x} \bar{y} \bar{s}_t + \bar{x} \bar{y} s_t + \bar{x} y \bar{s}_t + x \bar{y} \bar{s}_t. \tag{2}$$

In a similar manner, the output function and its complement can be obtained:

$$z_{t+1} = \bar{x} \bar{y} s_t + \bar{x} y \bar{s}_t + x \bar{y} \bar{s}_t + x y s_t; \tag{3}$$

$$\bar{z}_{t+1} = \bar{x} \bar{y} \bar{s}_t + \bar{x} y s_t + x \bar{y} s_t + x y \bar{s}_t. \tag{4}$$
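The minterm lists behind Eqs. (1)-(4) can be checked by enumerating the eight input combinations. The following sketch assumes the usual full-adder behavior for the serial adder (carry out when at least two of x, y, s are 1; sum bit equal to their parity); the helper names and the use of a trailing apostrophe for a complemented variable are conventions chosen here for illustration, not notation from the paper.

```python
from itertools import product

VARS = ("x", "y", "s")


def next_state(x, y, s):
    """Carry out of a one-bit addition: 1 when at least two inputs are 1."""
    return int(x + y + s >= 2)


def output(x, y, s):
    """Sum bit of a one-bit addition: parity of the three inputs."""
    return (x + y + s) % 2


def complement(f):
    """Return the complementary function used in dual-rail logic."""
    return lambda x, y, s: 1 - f(x, y, s)


def minterms(f):
    """List the input codes (as 'xys' bit strings) for which f is 1."""
    return ["".join(str(b) for b in bits)
            for bits in product((0, 1), repeat=3) if f(*bits)]


def sum_of_products(f):
    """Render the canonical sum-of-products form; v' means NOT v."""
    terms = []
    for code in minterms(f):
        terms.append("".join(v if bit == "1" else v + "'"
                             for v, bit in zip(VARS, code)))
    return " + ".join(terms)


if __name__ == "__main__":
    for name, f in [("s(t+1)", next_state), ("NOT s(t+1)", complement(next_state)),
                    ("z(t+1)", output), ("NOT z(t+1)", complement(output))]:
        print(name, "=", sum_of_products(f))
```

Each function and its complement split the eight minterms between them, which is why the dual-rail implementation roughly doubles the gate count.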

Minimization of logic functions is not discussed here, but a discussion can be found in Ref. 20.

IV. Design Technique Based on Programmable Logic Arrays (PLAs)

In this section we show in detail how to map Eqs. (1)-(4) for a serial adder onto the architecture shown in Fig. 1. The general approach is to first generate all possible unminimized minterms and then to select and combine the minterms that are needed to implement the functions. This technique is similar to the way functions are generated with electronic programmable logic arrays. Normally, both the AND matrix and the OR matrix are programmable. That is the case here as well, but we only discuss how to program the OR matrix in this section in the interest of keeping the algorithm simple. Section V shows how the AND matrix can be programmed.

Using the technique described in this section, all $2^m$ unminimized minterms of m variables can be generated in m + 1 levels, and the minterms can be combined into arbitrary functions in m + 1 additional levels, giving a maximum depth of 2(m + 1) to implement any function of m variables. The depth is near-optimal (optimal depth for generating any function of m variables would be 2m for AND and OR gates with fan-in and fan-out of 2), but gate count will generally be higher than if an arbitrary interconnect is used. There are three input variables ($x_t$, $y_t$, and $s_t$) in Eqs. (1)-(4) so 2(3 + 1) = 8 levels are needed to implement the functions.

Fig. 10. Network for generating minterms of one variable. Flow of information is from the top to the bottom. Unused connections are marked with dashed lines and are masked out via masks as shown in Fig. 1.

A. Generating Minterms in the AND Stage

A recursive formulation for implementing all unminimized minterms of any number of variables is presented in this section. For a network with m = 1, the connection pattern shown in Fig. 10 should be used. Networks for generating minterms of m > 1 variables are constructed recursively. For a network with m = n, two n - 1 networks should be placed side by side. A $2^{m+1}$ butterfly should be added to the top, and the mth variable and its complement should be added to the $2^m - 2$ and $2^m - 1$ positions, respectively (counting from the left, starting with 0). Connections should be added in the top level and the next level to introduce the uncomplemented variable into every minterm on the left and the complemented variable into every minterm on the right. At the top level, the uncomplemented variable should be passed to the left n - 1 network, and the complemented variable should be passed to the right n - 1 network. At the next level, the uncomplemented variable should be connected to the most recently added uncomplemented variable. The complemented variable should be connected to the most recently added complemented variable. Finally, connections are added as needed to introduce the new variable or its complement to every minterm as needed, and the remaining variables should simply be passed through to both sides of the network.

For example, to create a network with m = 2, place two m - 1 networks side by side and add a butterfly of width $2^{m+1}$ to the top as shown in Fig. 11. Add the mth variable and its complement to the top row in positions $2^m - 2 = 2$ and $2^m - 1 = 3$ as shown. Connections are made in the top two levels of the network so that the new variable or its complement appears in every minterm. The other input variables are passed from the top stage through to both sides of the network. An m = 3 network can be constructed from two m = 2 networks as shown in Fig. 12.
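The logical effect of this recursion (not its optical wiring) can be sketched in a few lines. The code below only tracks which minterms come out of the construction: two copies of the smaller network are placed side by side, the new variable is ANDed into every minterm of the left copy, and its complement into every minterm of the right copy. The function name `minterm_network`, the tuple representation, the left-to-right order of the base case, and the apostrophe for a complemented variable are all assumptions made for this illustration; gate positions, butterflies, and masks are not modeled.

```python
def minterm_network(variables):
    """Return the minterms produced by the recursive construction.

    Each minterm is a tuple of literals; "v'" denotes the complement of v.
    Only the logical content is modeled, not the interconnect itself.
    """
    if len(variables) == 1:
        v = variables[0]
        return [(v,), (v + "'",)]  # base case: one variable (assumed order)
    *rest, new = variables
    sub = minterm_network(rest)                # the smaller (m - 1) network
    left = [t + (new,) for t in sub]           # uncomplemented on the left
    right = [t + (new + "'",) for t in sub]    # complemented on the right
    return left + right


if __name__ == "__main__":
    for term in minterm_network(["x", "y", "s"]):
        print("".join(term))
```

Each added variable doubles the number of minterms, so m variables give $2^m$ minterms, each containing every variable exactly once in either true or complemented form.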

For a network with m input variables (and the m corresponding complements) the network will be $2^{m+1}$ gates wide and m + 1 levels deep. In Fig. 12, the input variables are x, y, and s, and the complements are $\bar{x}$, $\bar{y}$, and $\bar{s}$. Here m = 3, so the network is $2^{3+1} = 16$ gates wide and 3 + 1 = 4 levels deep. The eight unminimized minterms are shown at the bottom of the figure. Unused paths that are masked out are marked with dashed lines.

Fig. 11. Network for generating minterms of two variables.

B. Generating Functions in the OR Stage

For a network of width N and logic gates with a fan-in and fan-out of 2, a path can always be made from any input to any output in $\log_2 N$ levels. The process of generating a path can be viewed as a traversal through a binary tree, whose root is the desired input and whose leaves are the outputs.
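The $\log_2 N$ bound can be checked with a small model. The sketch below assumes a simplified network in which, at level i, a signal either continues straight down (address unchanged) or takes an angled link that flips bit i of its address; this is the common textbook model of a butterfly network and is an assumption here, not a description of the exact optical layout. Under that model, the bits of the XOR of the input and output addresses mark the levels at which the angled link is needed. The function name `route` is hypothetical.

```python
def route(src, dst, n_levels):
    """Return the addresses visited when routing from src to dst.

    Assumes the angled link at level i flips bit i of the address
    (a simplified model, not the paper's exact wiring).
    """
    diff = src ^ dst          # bits that must change along the way
    addr = src
    path = [addr]
    for level in range(n_levels):
        if (diff >> level) & 1:
            addr ^= 1 << level  # angled path at this level
        # otherwise the straight path keeps the address unchanged
        path.append(addr)
    return path


if __name__ == "__main__":
    # N = 16 gates per row, so log2(16) = 4 levels
    print(route(5, 12, 4))
```

Because every address bit can be corrected at exactly one level, any input reaches any output after the final level, which is the tree traversal described above seen from the input side.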

Each gate is assigned a binary address from 0 to N - 1 according to its position from the left. The logical XOR is taken between each input address and output address. To make connections between inputs and outputs, the ith bit is observed on the ith level. If the ith bit is 0, the straight path is taken, otherwise the angled path is taken. To combine minterms to create a function, a separate path is found from each needed minterm to the selected output. The paths are then combined to form the final circuit. Any two combinations can always be
