Faculty of Computing and Information Technology
Department of Robotics and Digital Technology
Technical Report 94-9

A VHDL Implementation of a CORDIC Arithmetic Processor Chip

Grant Hampson, Student Member, IEEE
Andrew Paplinski, Member, IEEE

October 10, 1994

Enquiries: Technical Report Coordinator, Robotics and Digital Technology, Monash University, Clayton VIC 3168, Australia
[email protected]
+61 3 905 3402

Contents

Abstract and Keywords
Preface
1 The CORDIC Algorithm
2 CORDIC Hardware Implementations
  2.1 CORDIC Processor Architecture
    2.1.1 A Word-Serial CORDIC Architecture
    2.1.2 A Word-Parallel CORDIC Architecture
3 Improving CORDIC Accuracy
  3.1 Estimation of CORDIC Accuracy
  3.2 The Lower Bound of CORDIC Accuracy
  3.3 Reducing the z update error
  3.4 Unexpected Truncation Errors
4 VHDL Implementation
  4.1 The Basic CORDIC Unit
  4.2 VHDL Describes Structure and Behaviour
    4.2.1 Hierarchical vs Flat Designs
    4.2.2 The Viewlogic Synthesiser
  4.3 VHDL Design of the CORDIC Unit
    4.3.1 The Rounding Unit
  4.4 Combining the CORDIC Units
    4.4.1 A Solution
  4.5 Improvements
Conclusion
A CORDIC Functions
B Upper Bound of CORDIC Error
References

List of Tables

1.1 Elementary angles of $\alpha_i$
1.2 Various values of $K_n$
4.1 Some CORDIC hardware statistics.
A.1 The six CORDIC modes.

List of Figures

1.1 Rotation of a point in 2-D space.
2.1 Generic Processor Architecture.
2.2 An Optimised Word-Serial CORDIC Architecture.
2.3 Word-Parallel CORDIC architecture with possible data pipelining.
3.1 Numerical accuracy of the CORDIC processor.
3.2 Predicted and Actual accuracy of a CORDIC processor with a 12 bit internal datapath.
3.3 A plot showing bits of error for a typical test vector rotated through all possible angles.
3.4 A 12 bit, 8 stage CORDIC processor produces 9 bit accurate results.
3.5 An 8 bit, 8 stage CORDIC processor produces 7 bit accurate results.
3.6 Simulation results from a CORDIC processor illustrating the effects of the normalisation scheme.
3.7 An 8 bit, 8 stage CORDIC processor (a) without rounding, (b) with rounding.
4.1 The basic CORDIC unit.
4.2 A Hierarchical Design of the Adder/Subtracter for n = 4.
4.3 A Flat Design of the Adder/Subtracter for n = 4.
4.4 A Behavioural Design of the Adder/Subtracter for n = 4.
4.5 The structure of CORDIC unit showing the various entities.
4.6 The top level schematic of a 4 stage CORDIC processor with Increased Convergence Range and Rounding components.

Abstract

This report describes the fundamentals of the CORDIC (Co-ordinate Rotations Digital Computer) algorithm and a possible implementation using the VHDL hardware description language. The errors associated with a fixed point implementation of CORDIC are analysed, and methods for reducing these errors are discussed. One such method is a normalisation scheme which reduces error and requires no extra hardware. Various CORDIC structures and possible VHDL implementations are described in detail, including design and language issues. Finally, a parallel hardware implementation is described and simulated. CORDIC has many applications, some of which can be used in array imaging techniques.

Keywords

CORDIC, VHDL

Preface

CORDIC is an acronym for Coordinate Rotations Digital Computer and was derived by Volder [1] in the late 1950's for the purpose of calculating trigonometric functions. Its popularity came about nearly twenty years later when VLSI solutions became a reality. The original algorithm describes the rotation of a 2-D vector, which can be applied in applications such as Digital Signal Processing [2] (Fourier Transforms, Digital Filters), Computer Graphics [3] and Robotics [4].

CORDIC processing offers high computational rates, making it attractive to applications such as computer graphics where a combination of scaling and rotations is required in real time. CORDIC is also attractive to Robotics since the fundamental operation is coordinate transformation; however, it could also be used for more computationally intensive processes such as motion planning and collision detection.

Array Imaging typically involves complex signal processing which may require many computationally intensive matrix operations. Increasing the complexity of the imaging model places greater demands on accuracy. Solutions to such complex systems require better, and hence more complex, algorithms. Most of these algorithms are based on matrix factorisation (decomposition) techniques, of which Singular Value Decomposition (SVD) is the most robust method. The SVD factorisation requires a two-sided transformation which involves several trigonometric operations and rotations ideally suited to dedicated VLSI hardware (CORDIC processing) for real time calculations.

CORDIC has also been applied to phase correction for dynamic range focusing when Digital Baseband Demodulation [5] techniques are employed in Interpolation Beamforming [6]. A complex signal is represented by its in-phase, I, and quadrature, Q, components, and is phase corrected by rotating the complex signal.

Haviland and Tuszynski designed and built a CORDIC processor [7] in 1980 which used an iterative process to calculate circular, linear and hyperbolic functions. A more recent implementation (1993) by Duprat and Muller [8] discusses the possibility of using a redundant number system for the representation of a signed digit.

This report is broken into four logical sections, namely, CORDIC Theory, Hardware Implementations, Improving CORDIC Accuracy and finally a VHDL Implementation.

Chapter 1
The CORDIC Algorithm

Consider a 2-D vector $(x, y)$ represented by a point $v = x + jy$ in the complex plane. If the vector is rotated by an angle $\theta$, the new co-ordinate vector is given by

$$\tilde{v} = v \cdot e^{j\theta} \tag{1.1}$$

and is shown in Figure (1.1).

[Figure 1.1: Rotation of a point in 2-D space. The point $v = x + jy$ is rotated by the angle $\theta$ to $\tilde{v} = \tilde{x} + j\tilde{y}$.]

The angle $\theta$ can be expanded into a set of elementary angles $\alpha_i$ with pseudo-digits $q_i \in \{-1, +1\}$, and angle expansion error $z_n$, such that

$$\theta = \sum_{i=-1}^{n-1} q_i \alpha_i + z_n \tag{1.2}$$

and the sub-rotation angles $\alpha_i$ take on the following values:

$$\alpha_i = \begin{cases} \pi/2 & \text{for } i = -1 \\ \arctan(2^{-i}) & \text{for } i = 0, 1, \ldots, n-1 \end{cases} \tag{1.3}$$

Note that $\alpha_i$ is approximately equal to but less than $2^{-i}$, and the resulting angular expansion error is therefore $|z_n| \lesssim 2^{-(n-1)}$.

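Substitution of Equation (1.2) into Equation (1.1) gives:

$$\begin{aligned}
\tilde{v} &= v \cdot \prod_{i=-1}^{n-1} e^{j q_i \alpha_i} \cdot e^{j z_n} \\
          &= v \cdot (j q_{-1}) \cdot \prod_{i=0}^{n-1} e^{j q_i \alpha_i} \cdot e^{j z_n}
\end{aligned} \tag{1.4}$$

and expanding $e^{j q_i \alpha_i}$,

$$\begin{aligned}
e^{j q_i \alpha_i} &= \cos q_i \alpha_i + j \sin q_i \alpha_i \\
                   &= \cos q_i \alpha_i \, (1 + j \tan q_i \alpha_i) \\
                   &= \cos \alpha_i \, (1 + j q_i 2^{-i})
\end{aligned}$$

Finally

$$\tilde{v} = v \cdot \left( \prod_{i=0}^{n-1} \cos \alpha_i \right) \cdot (j q_{-1}) \cdot \left( \prod_{i=0}^{n-1} \left( 1 + j q_i 2^{-i} \right) \right) \cdot e^{j z_n} \tag{1.5}$$

The range of rotation angles which can be represented by Equation (1.2) is $\pm\theta_{\max}$, where

$$\theta_{\max} = \sum_{i=-1}^{n-1} \alpha_i \approx 190^\circ \tag{1.6}$$

and some values of $\alpha_i$ are given in Table (1.1). If the expected range of rotation angles is $\pm 90^\circ$, then the initial rotation by $\pm 90^\circ$, that is, $e^{j q_{-1} \pi/2} = j q_{-1}$, does not have to be performed and the initial rotation is by $\pm 45^\circ$.

The second term in Equation (1.5), the product of cosines, is a constant scaling factor. For a given value of $n$ it can be pre-evaluated using Equation (1.7), and values for $n$ up to 15 are listed in Table (1.2).

$$K_n = \prod_{i=0}^{n-1} \cos \alpha_i = \prod_{i=0}^{n-1} \left( 1 + 2^{-2i} \right)^{-\frac{1}{2}} = \prod_{i=0}^{n-1} \frac{1}{\sqrt{1 + \frac{1}{4^i}}} \tag{1.7}$$

The basic CORDIC algorithm, which describes the rotation of a unity length vector $v = x + jy$ by an angle $\theta$, can be derived from Equation (1.5) using the initial conditions

$$v_{-1} = v \cdot K_n, \qquad z_{-1} = \theta$$

where $z_i$ is the accumulated angular residue. Proceeding with $i = -1, 0, \ldots, n-1$:

$$q_i = \begin{cases} -1 & \text{if } z_i < 0 \\ +1 & \text{if } z_i \ge 0 \end{cases} \tag{1.8}$$

$$v_{i+1} = \begin{cases} v_i \cdot j q_i & \text{if } i = -1 \\ v_i \cdot (1 + j q_i 2^{-i}) & \text{if } i \ge 0 \end{cases} \tag{1.9}$$

$$z_{i+1} = z_i - q_i \alpha_i \tag{1.10}$$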

 i    Angle            Angle (degrees)   16-bit value (hex)   16-bit value (binary)
 0    arctan(2^0)      45.0000           B400                 101101.0000000000
 1    arctan(2^-1)     26.5651           6A43                 011010.1001000011
 2    arctan(2^-2)     14.0362           3825                 001110.0000100101
 3    arctan(2^-3)      7.1250           1C80                 000111.0010000000
 4    arctan(2^-4)      3.5763           0E40                 000011.1001000000
 5    arctan(2^-5)      1.7899           0729                 000001.1100101001
 6    arctan(2^-6)      0.8952           0395                 000000.1110010101
 7    arctan(2^-7)      0.4476           01CA                 000000.0111001010
 8    arctan(2^-8)      0.2238           00E5                 000000.0011100101
 9    arctan(2^-9)      0.1119           0073                 000000.0001110011
10    arctan(2^-10)     0.0560           0039                 000000.0000111001
11    arctan(2^-11)     0.0280           001D                 000000.0000011101
12    arctan(2^-12)     0.0140           000E                 000000.0000001110
13    arctan(2^-13)     0.0070           0007                 000000.0000000111
14    arctan(2^-14)     0.0035           0004                 000000.0000000100
15    arctan(2^-15)     0.0017           0002                 000000.0000000010
16    arctan(2^-16)     0.0008           0001                 000000.0000000001

Table 1.1: Elementary angles of $\alpha_i$

 n    K_n
 0    0.70710678118655
 1    0.63245553203368
 2    0.61357199107790
 3    0.60883391251775
 4    0.60764825625617
 5    0.60735177014130
 6    0.60727764409353
 7    0.60725911229889
 8    0.60725447933256
 9    0.60725332108988
10    0.60725303152913
11    0.60725295913894
12    0.60725294104140
13    0.60725293651701
14    0.60725293538591
15    0.60725293510314

Table 1.2: Various values of $K_n$
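The entries of Tables (1.1) and (1.2) can be checked numerically. The following is a minimal Python sketch, not part of the original report; the function names are hypothetical. It evaluates Equations (1.3), (1.6) and (1.7) directly. Note that the first row of Table (1.2), 0.7071..., is the product containing only the i = 0 term, which is `scale_factor(1)` in this sketch.

```python
import math

def elementary_angles(n):
    """alpha_i = arctan(2**-i) for i = 0 .. n-1, in radians (Equation 1.3)."""
    return [math.atan(2.0 ** -i) for i in range(n)]

def scale_factor(n):
    """K_n = product of cos(alpha_i) for i = 0 .. n-1 (Equation 1.7)."""
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 4.0 ** -i)
    return k

def max_rotation_deg(n):
    """theta_max = 90 degrees (the i = -1 term) plus the sum of alpha_i (Equation 1.6)."""
    return 90.0 + math.degrees(sum(elementary_angles(n)))

# Print the first few angles in degrees, then the scaling factor and range.
for i, alpha in enumerate(elementary_angles(4)):
    print(i, round(math.degrees(alpha), 4))
print(scale_factor(16), max_rotation_deg(16))
```

With 16 elementary angles the reachable range comes out just under 190 degrees, in line with Equation (1.6).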

The final rotated vector is $v_n$, with angle expansion error $z_n$:

$$v_n = \tilde{v} \cdot e^{-j z_n} = v \cdot e^{j\theta} \cdot e^{-j z_n} \tag{1.11}$$

$$z_n = \theta - \sum_{i=-1}^{n-1} q_i \alpha_i \tag{1.12}$$

One complex operation on $v_i$ is equivalent to two operations on real numbers. For $i = -1$:

$$x_0 + j y_0 = j q_{-1} (x_{-1} + j y_{-1}) \tag{1.13}$$

Hence

$$x_0 = -q_{-1} y_{-1}, \qquad y_0 = q_{-1} x_{-1} \tag{1.14}$$

For $i = 0, 1, \ldots, n-1$:

$$x_{i+1} + j y_{i+1} = (x_i + j y_i)(1 + j q_i 2^{-i}) \tag{1.15}$$

Hence

$$x_{i+1} = x_i - q_i y_i 2^{-i}, \qquad y_{i+1} = y_i + q_i x_i 2^{-i} \tag{1.16}$$

The CORDIC algorithm reduces to an iterative set of operations consisting of a binary shift and an accumulator for each of x, y and z. Refer to Appendix A for a list of transcendental functions.
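The iteration of Equations (1.8) to (1.16) can be sketched in a few lines of floating point Python. This is an illustrative model only, not the report's implementation: the function name and the default of 16 iterations are assumptions, and the multiplication by `2.0 ** -i` stands in for the binary shift.

```python
import math

def cordic_rotate(x, y, theta, n=16):
    """Rotate (x, y) by theta radians using n CORDIC micro-rotations.

    Returns (x_n, y_n, z_n), where z_n is the angle expansion error.
    """
    # Pre-scale by K_n (Equation 1.7) and load the angle accumulator.
    k_n = 1.0
    for i in range(n):
        k_n /= math.sqrt(1.0 + 4.0 ** -i)
    x, y, z = x * k_n, y * k_n, theta

    # i = -1: rotation by q * pi/2 (Equations 1.13 and 1.14).
    q = 1 if z >= 0 else -1
    x, y = -q * y, q * x
    z -= q * math.pi / 2

    # i = 0 .. n-1: shift-and-add steps (Equations 1.15 and 1.16).
    for i in range(n):
        q = 1 if z >= 0 else -1
        x, y = x - q * y * 2.0 ** -i, y + q * x * 2.0 ** -i
        z -= q * math.atan(2.0 ** -i)
    return x, y, z

x, y, z = cordic_rotate(1.0, 0.0, math.radians(30.0))
print(x, y, z)  # close to cos(30 deg), sin(30 deg), and a small residue
```

The residue returned as the third value is the $z_n$ of Equation (1.12); the rotated vector differs from the ideal one by a rotation of that size.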

Chapter 2
CORDIC Hardware Implementations

A hardware implementation of a CORDIC processor depends on the number of functions required and the computational speed. If all functions are to be computed, then there will be a necessary overhead for selecting each function. However, a small, fast design will result if only a small number of functions is required. This chapter presents possible solutions to a mixture of design problems.

2.1 CORDIC Processor Architecture

A CORDIC processor can take one of two primary architectures, namely word-serial or word-parallel. A word-serial processor minimises hardware requirements by utilising a single CORDIC unit repeatedly. However, iterative algorithms which are controlled by a small number of variables can be expanded over a two-dimensional area, i.e., instead of executing a certain set of instructions n times using a single element (e.g., a CORDIC unit), n duplicated elementary cells are used in successive steps of an iteration [9]. This flattened structure can now perform many operations in parallel and is called a word-parallel CORDIC processor.

A word-parallel architecture has the advantage of being up to n times faster, but due to the expansion it requires, at worst, n times more hardware. However, the word-serial architecture requires complex controlling hardware and a variable shifter, which decreases the hardware saving ratio.

2.1.1 A Word-Serial CORDIC Architecture

The CORDIC algorithm has the advantage of not requiring any special hardware other than an accumulator and a variable shifter, which are generally available in most microcontrollers. A multi-function word-serial CORDIC processor architecture could be realised using a basic micro structure consisting of a two-port register file and a variable shifter combined with an ALU, interconnected by several data paths as shown in Figure (2.1). A generic controller could consist of microcode instructions for the ALU and register file, and would execute an iterative algorithm. This structure is similar to that of a microprocessor or DSP and allows many variations of the CORDIC algorithm, as the order of operations and the expanded instruction set increase flexibility. This type of structure illustrates that it would be possible to implement the CORDIC algorithm on any microprocessor or DSP.

[Figure 2.1: Generic Processor Architecture. Blocks: a ROM of $K_n$ values addressed by n, a ROM of $\alpha_i$ values addressed by i, a register file, a variable shifter producing $2^{-i} \cdot y_i$ or $2^{-i} \cdot x_i$, an ALU, a CC register and controlling micro-code; input data buses $x_i$, $y_i$, $z_i$; result bus $x_{i+1}$, $y_{i+1}$, $z_{i+1}$.]

Optimising the generic processor structure for a word-serial CORDIC processor is achieved by reducing the functionality to only those operations required by the CORDIC algorithm. A possible word-serial architecture is shown in Figure (2.2), where the ALU now contains three adders and dedicated registers. The microcode controller has been replaced by faster Combinational Control Logic dedicated to the CORDIC operation sequence.
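To illustrate how a word-serial datapath reuses one shifter and one set of adders, the following Python sketch models the loop with integer shift-and-add operations. It is a sketch under stated assumptions rather than the architecture of Figure (2.2): the 14 fractional bit format, the names `FRAC_BITS`, `to_fixed` and `cordic_word_serial`, the pre-scaling by a multiply, and the omission of the initial 90 degree step (so inputs are limited to roughly 99 degrees either side of zero) are all choices made here for brevity.

```python
import math

FRAC_BITS = 14  # assumed fixed-point format: real value = integer / 2**FRAC_BITS

def to_fixed(value):
    """Quantise a real number to the assumed fixed-point format."""
    return int(round(value * (1 << FRAC_BITS)))

def cordic_word_serial(x, y, z, n=12):
    """Rotate fixed-point (x, y) by the fixed-point angle z (radians)."""
    # Look-up tables that would be held in ROM.
    alpha_rom = [to_fixed(math.atan(2.0 ** -i)) for i in range(n)]
    k_n = 1.0
    for i in range(n):
        k_n /= math.sqrt(1.0 + 4.0 ** -i)
    k_fixed = to_fixed(k_n)

    # Pre-scale by K_n (done with a multiply here, purely for brevity).
    x = (x * k_fixed) >> FRAC_BITS
    y = (y * k_fixed) >> FRAC_BITS

    # One datapath reused n times; the counter i drives the variable shifter.
    for i in range(n):
        q = 1 if z >= 0 else -1
        x, y = x - q * (y >> i), y + q * (x >> i)
        z -= q * alpha_rom[i]
    return x, y, z

x, y, z = cordic_word_serial(to_fixed(1.0), 0, to_fixed(math.radians(30.0)))
print(x / 2 ** FRAC_BITS, y / 2 ** FRAC_BITS)
```

Because the shift amount changes every iteration, the shifter must be variable, which is the hardware cost of the word-serial approach noted above.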

2.1.2 A Word-Parallel CORDIC Architecture

The word-parallel method expands a single-dimensional algorithm into a two-dimensional structure and results in shorter computation times. Greater speeds of computation can be obtained by pipelining between stages so that many partial results can be calculated in parallel. A pipelined word-parallel architecture is shown in Figure (2.3), where each iteration is represented by a separate CORDIC block and a latch is placed after each iteration, or after several iterations. The following chapters will develop, implement, and simulate such a parallel CORDIC structure using the VHDL hardware description language.

[Figure 2.2: An Optimised Word-Serial CORDIC Architecture. Blocks: Combinational Control Logic with Load, Precision, Reset and Clock inputs and Next State, Select, Increment and Zero signals; a counter for i; m-bit and q-bit registers; initial inputs $x_0$, $y_0$, $z_0$; three adders; a look-up table of $\alpha_i$ values; three n-bit registers holding $x_{i+1}$, $y_{i+1}$, $z_{i+1}$; a Finished Flag output.]

[Figure 2.3: Word-Parallel CORDIC architecture with possible data pipelining. Cells #0 to #n are connected in series. Cell #i takes $x_i$, $y_i$, $z_i$ and the constant $\alpha_i$, forms $q_i = \mathrm{sign}[z_i]$, and uses three adders to produce $x_{i+1}$, $y_{i+1}$, $z_{i+1}$. A clocked latch for pipelining of data may follow each cell. The final outputs are $x_n$, $y_n$, $z_n$.]
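The pipelined word-parallel arrangement can be modelled in software as a chain of cells, each with a fixed shift and its own constant angle, separated by latches. The sketch below is a behavioural illustration only; the names `make_stage` and `run_pipeline` are hypothetical, floating point values are used instead of fixed point, and the inputs are assumed to be already scaled by $K_n$.

```python
import math

def make_stage(i):
    """Cell #i: the shift by i and the angle alpha_i are fixed for this cell."""
    alpha = math.atan(2.0 ** -i)
    shift = 2.0 ** -i
    def stage(x, y, z):
        q = 1 if z >= 0 else -1
        return x - q * y * shift, y + q * x * shift, z - q * alpha
    return stage

def run_pipeline(inputs, n):
    """Clock a stream of (x, y, z) triples through n cells, one latch per cell."""
    stages = [make_stage(i) for i in range(n)]
    latches = [None] * n                      # latches[i] holds the output of cell i
    outputs = []
    for item in list(inputs) + [None] * n:    # extra clocks flush the pipeline
        if latches[-1] is not None:
            outputs.append(latches[-1])
        # Update from the last cell backwards so each cell sees last cycle's data.
        for i in range(n - 1, 0, -1):
            prev = latches[i - 1]
            latches[i] = stages[i](*prev) if prev is not None else None
        latches[0] = stages[0](*item) if item is not None else None
    return outputs
```

Once the pipeline is full, one result emerges on every clock even though each individual rotation still takes n clocks, which is where the speed advantage over the word-serial structure comes from.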

Chapter 3
Improving CORDIC Accuracy

As expected, iterative algorithms calculate results by approximation and the solution will contain errors. CORDIC is no exception: errors are introduced by a combination of quantisation and approximation. The accuracy of a CORDIC processor depends on the word length used for the three input variables x, y and z, as well as the number of iterations or steps performed. This chapter describes the errors associated with a fixed point implementation and a means of reducing these errors.

3.1 Estimation of CORDIC Accuracy

The fundamental operation performed by a CORDIC processor is the shift-and-add process, in which fixed point arithmetic will introduce errors. For example, consider the binary scaling of the vector $v_i = (x_i, y_i)$ at the ith stage:

- if $i \le m$ then $v_{i+1}$ is updated with the truncated value $v_i \cdot 2^{-i}$;
- if $i > m$ then $v_{i+1} = v_i$, and the update will be 0;

where m is the internal bus width of v, which limits the maximum number of useful iterations. Peak accuracy could be achieved after m iterations since all accuracy has been exhausted in v. However, truncation errors may exceed the accuracy gained by more iterations, and it is desirable to find the optimal number of iterations.

The accuracy of the rotation will be determined by how closely the input rotation angle is approximated by the summation of sub-rotation angles $\alpha_i$. The error in v after n iterations will be proportional to the error in z. An increase in the z datapath width will increase the accuracy of the z update and hence the v update.

The numerical accuracy of the CORDIC algorithm can be calculated by examining truncation and approximation errors. Truncation errors are due to the finite word length and approximation errors are due to the finite number of iterations. Walther [10] analysed the x and y iterations independently of the z iterations and concluded that log n extra bits in the data paths can provide n bits of accuracy. This work was re-calculated by Kota and Cavallaro [11] in a non-independent manner, and they concluded that log n + 2 extra bits are required to achieve n bits of accuracy after n iterations. This solution represents an upper bound of error in the CORDIC processor. A graph of this function appears in Figure (3.1), from which it can be seen that to achieve 8 or 16 bit accuracy, the internal datapaths need to be 13 and 22 bits wide respectively.

[Figure 3.1: Numerical accuracy of the CORDIC processor. Plot title: Datapath resolution vs Output Resolution. Horizontal axis: internal datapath width (n + log(n) + 2); vertical axis: output resolution, (n) bits with (n) iterations.]
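The quoted widths of 13 and 22 bits follow from n + log n + 2 when the logarithm is taken to base 2. A small Python check is given below; the function name is hypothetical, and rounding the logarithm up for values of n that are not powers of two is an assumption of this sketch, not something stated in the report.

```python
import math

def datapath_width(n):
    """Internal width n + log2(n) + 2 for n-bit results after n iterations."""
    return n + math.ceil(math.log2(n)) + 2

print(datapath_width(8), datapath_width(16))
```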

3.2 The Lower Bound of CORDIC Accuracy

A CORDIC processor can be presented with all possible input combinations to find the lower bound of error. Simulation results are shown in Figure (3.2), where a 12 bit CORDIC processor with a variable number of stages is presented with all possible rotation angles in the range $-\pi \le z_{-1} \le \pi$ and the resulting accuracy in bits is calculated. Kota and Cavallaro's upper bound of error (as defined by their maximum error equation in Appendix (B)) is also shown in Figure (3.2). The upper bound of error has a well defined peak of accuracy; however, the simulation results indicate that accuracy will improve if more iterations are performed.

[Figure 3.2: Predicted and Actual accuracy of a CORDIC processor with a 12 bit internal datapath. Horizontal axis: number of stages n; vertical axis: output accuracy. Solid line: predicted accuracy; dashed line: actual accuracy.]

Figure (3.3) illustrates, by simulation, the accuracy of a 12 bit, 12 stage processor and the resulting bits of error produced. About 0.3% of results have more than 2 bits of error, which indicates that the error bound of a CORDIC processor is positioned between the upper and lower bounds of error.

[Figure 3.3: A plot showing bits of error for a typical test vector rotated through all possible angles. Polar plot: angle in degrees; radial axis: bits of error.]

The simulation results indicate that n + log n + 2 is an overestimate of the datapath width required, and a reduction in datapath width is possible if the number of iterations is increased. Simulation results of two 8 stage CORDIC processors, with 12 bit and 8 bit datapaths, are shown for comparison in Figure (3.4) and Figure (3.5) respectively. The simulation results were obtained by varying the magnitude of v and the rotation angle $\theta$ in uniform steps. The difference in resolution obtained is two bits, indicating that the lower bound of error is closer to the error bound of CORDIC.

[Figure 3.4: A 12 bit, 8 stage CORDIC processor produces 9 bit accurate results. Polar plot: angle in degrees; radial axis: vector magnitude from 0.17 to 1.0.]

[Figure 3.5: An 8 bit, 8 stage CORDIC processor produces 7 bit accurate results. Polar plot: angle in degrees; radial axis: vector magnitude from 0.1 to 1.0.]

3.3 Reducing the z update error

In the rotational mode of CORDIC, $z_i$ converges towards zero by adding/subtracting sub-rotation angles, and the final iterations of the $z_i$ update will result in numbers approaching zero. More precisely, the angular error $z_i$ is approximately equal to $2^{-i}$; thus, for a bus width m, only (m - i) bits are used to represent the error.

To reduce the $z_i$ error a floating point system could be used, but its complex hardware implementation is not suited to word-parallel structures. A simpler method to improve accuracy, i.e., to utilise all m bits, is a quasi-floating point scheme or normalisation scheme, which could be implemented by scaling the existing sequence by $2^i$, i.e.,

$$\hat{z}_i = 2^i \cdot z_i$$

Therefore, the new sequence becomes

$$\begin{aligned}
\hat{z}_{i+1} &= 2^{i+1} \cdot z_{i+1} \\
              &= 2 \cdot 2^i \cdot (z_i - q_i \alpha_i) \\
              &= 2 \cdot (2^i \cdot z_i - q_i \cdot 2^i \cdot \alpha_i) \\
              &= 2 (\hat{z}_i - q_i \hat{\alpha}_i)
\end{aligned} \tag{3.1}$$

which requires a shift left at each iteration, and requires no extra hardware for a word-parallel structure. A new sequence of sub-rotation angles can be defined as:

$$\hat{\alpha}_i = 2^i \alpha_i = 2^i \arctan(2^{-i}) \tag{3.2}$$

where $\hat{\alpha}_i$ approaches a finite value of 1 for increasing values of i, and will utilise most of the bus width.

Since the scaling system results in full use of the databus width, overflow may occur if the bus width is too small. Using Equation (3.1), the maximum value $\hat{z}_{i+1}$ can have is when $\hat{z}_i$ approaches zero, giving

$$\max[\hat{z}_{i+1}] \approx 2 \cdot \max[\hat{\alpha}_i] \tag{3.3}$$

Calculating the increase in accuracy is beyond the scope of this report; however, simulation indicates that there is a direct improvement in accuracy. The simulation results indicated that, using the traditional scheme, the accuracy of the rotation is

$$\text{accuracy} \propto \log(z_i \text{ datapath width}) + \log(\text{number of stages}) \tag{3.4}$$

whereas the normalisation scheme has the advantage of

$$\text{accuracy} \propto \log(\text{number of stages}) \tag{3.5}$$

since the z datapath is always in a semi-normalised state. Using the traditional scheme, $\alpha_i \to 0$, limiting the number of useful stages. However, when normalised, there is no limit on the number of stages and a significant reduction in hardware is possible by reducing the bus width of z.

Figure (3.6) illustrates the error dependencies on the number of stages and bits for the scaled and unscaled CORDIC processors. Figure (3.6(a)) and Figure (3.6(b)) show the angular expansion error. Figure (3.6(c)) and Figure (3.6(d)) show the dependence of the v error on the angular expansion error.
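The relationship between the plain and the normalised z recurrences can be checked numerically. The Python sketch below runs Equation (1.10) and Equation (3.1) side by side in floating point and returns both residues; the function name is hypothetical, the i = -1 step is omitted, and no finite bus width is modelled, so it only illustrates the algebra, not the accuracy gain.

```python
import math

def z_updates(theta, n):
    """Return (z_n, zhat_n) from the plain and the normalised z recurrences."""
    z = theta        # z_0
    z_hat = theta    # zhat_0 = 2**0 * z_0
    for i in range(n):
        alpha = math.atan(2.0 ** -i)
        alpha_hat = (2.0 ** i) * alpha           # Equation (3.2)
        q = 1 if z >= 0 else -1                  # z and z_hat share the same sign
        z = z - q * alpha                        # Equation (1.10)
        z_hat = 2.0 * (z_hat - q * alpha_hat)    # Equation (3.1): subtract, shift left
    return z, z_hat

z, z_hat = z_updates(0.5, 12)
print(z, z_hat, z_hat / 2 ** 12)
```

The normalised residue stays of order one while the plain residue shrinks towards zero, which is why the normalised form keeps using the full width of the z datapath.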

[Figure 3.6: Simulation results from a CORDIC processor illustrating the effects of the normalisation scheme. Four surface plots, each quantity shown with no alpha scaling and with alpha scaling: angle expansion error against stages and bits, and relative v error against stages/bits in v and bits in z.]

3.4 Unexpected Truncation Errors

Using fixed point arithmetic in a CORDIC processor introduces an unexpected truncation error. The error occurs when the vector (x, y) has a negative component. Consider the final iterations, where the update of vector v approaches 0 since a larger number of right shifts is performed at each iteration. This is not the case if x or y is negative. For example, let $x_{i \to N}$ equal the hexadecimal number X"2D", or positive 45. The right shifted value of $x_{i \to N}$ approaches zero. However, the negative of X"2D" in twos-complement form is X"D3", and its right shifted value approaches X"FF", or -1, not the expected zero. This is a significant problem in the CORDIC processor, since the addition of extra iterations will only increase the error.

A simple method of removing this error is to round the shifted value instead of forcing truncation. A simple method for rounding is to add the bit that was last shifted out to the shifted value. The rounder could be implemented using a half-adder and typically requires three logic gates per bit. Minimal extra hardware is required in the word-serial architecture; however, a word-parallel structure requires two half-adders per stage. The additional delay will have a direct effect on the performance of the processor.

Figure (3.7) shows the simulation results of two CORDIC processors, with and without rounding units. The test vector was rotated in steps of 5 degrees through 360 degrees, and the rounded results are significantly more accurate. The rounding maintains monotonicity in the actual angle of rotation as well as uniform magnitude.

[Figure 3.7: An 8 bit, 8 stage CORDIC processor (a) without rounding, (b) with rounding. Two polar plots: angle in degrees; radial axis marked at 32.95.]
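The X"2D" example can be reproduced with Python integers, whose right shift on negative values behaves like a twos-complement arithmetic shift. The sketch below compares plain truncation with the rounding rule described above (add the last bit shifted out). The function names are hypothetical and the shift amount is assumed to be at least 1 for the rounding version.

```python
def shift_truncate(value, k):
    """Arithmetic right shift by k bits; negative values drift towards -1."""
    return value >> k

def shift_round(value, k):
    """Right shift by k >= 1 bits, then add the last bit that was shifted out."""
    return (value >> k) + ((value >> (k - 1)) & 1)

for k in (3, 7):
    print(k, shift_truncate(45, k), shift_truncate(-45, k),
          shift_round(45, k), shift_round(-45, k))
```

For a shift of 7 bits, truncation gives 0 for 45 but -1 for -45, whereas the rounded shift gives 0 in both cases, so later iterations no longer accumulate a bias on negative components.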

Chapter 4
VHDL Implementation

Various tools can be used to implement the CORDIC processor; however, a standardised approach to this problem would unify the solution for further development in various applications. VHDL (VHSIC Hardware Description Language) has been used here to describe the structural and behavioural characteristics of a Word-Parallel CORDIC processor. VHDL has become the standard hardware description language and has its own IEEE standard [12].

4.1 The Basic CORDIC Unit

Any CORDIC structure will involve a basic unit containing three adders/subtracters, as shown in Figure (4.1). The binary scaler would be variable in the case of a Word-Serial device; however, it is much simpler in the Word-Parallel device, as a shift translates directly to a misalignment of the data bus.

[Figure 4.1: The basic CORDIC unit. Cell i takes $x_i$, $y_i$, $z_i$ and $\alpha_i$ and produces $x_{i+1}$, $y_{i+1}$, $z_{i+1}$.]

This unit, together with a suitable FSM and registers, could form a word-serial structure. A word-parallel implementation can be obtained by linking n CORDIC units. The rest of this chapter deals with the development of a Word-Parallel unit and the interconnection of these devices using the VHDL language. This should be a relatively trivial task, but unfortunately the Viewlogic VHDL Synthesiser has many bugs and supports only a subset of the full VHDL standard.

The main aim of the project was to describe a CORDIC processor using the VHDL language and to allow the application designer to change the size of the structure easily. This flexibility could include fundamental changes such as variable datapath widths and a variable number of stages. Other options such as rounding intermediate nodes and pipelining could also be easily integrated.

Currently, Viewlogic's VHDL is a partial implementation of the 1987 IEEE Standard VHDL, and many constructs are missing from their implementation. Most of the useful constructs are present, but they are accompanied by ambiguous messages indicating that they are only partially supported. This made it very difficult to work with.

4.2 VHDL Describes Structure and Behaviour

VHDL has the ability to describe a design in two ways:

- in terms of its component structure,
- in terms of the behavioural functionality of the design,

and there is also the possibility of integrating the two streams. A requirement for structural descriptions is that the lowest level description will be a behavioural description, to ensure portability between different synthesis libraries. An example of a lowest level operator is the logical operator AND (behavioural), used to describe the ANDing of two operands. This may be synthesised as an AND standard cell from the library. In this way, a component is never accessed directly from a cell library, which would limit portability.

Consider a slightly more complex design of an n-bit adder/subtracter, which could be described by the following behavioural description:

    addsub : PROCESS(a,b,sel)
      VARIABLE res : VLBIT_VECTOR(n DOWNTO 0);
    BEGIN
      res := zero(n DOWNTO 0);  -- needs to be initialised

      IF sel = '1' THEN
        res := add2c(a,b);
      ELSE
        res := sub2c(a,b);
      END IF;
      s
