Design and Simulation of an 8-bit Dedicated Processor for ... - arXiv

Proceedings of the 2011 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) IEEE Xplore: CFB1120J-ART; ISBN: 978-1-61284-694-1; Print Version: CFB1120J-PRT; ISBN: 978-1-61284-766-5

Design and Simulation of an 8-bit Dedicated Processor for calculating the Sine and Cosine of an Angle using the CORDIC Algorithm 1,2

3

Aman Chadha1,a, Divya Jyoti2,b and M. G. Bhatia3,c

Thadomal Shahani Engineering College, Bandra (W), Mumbai, INDIA Ameya Centre for Robotics and Embedded Technology, Andheri (W), Mumbai, INDIA a [email protected], b [email protected], c [email protected] these technologies into a tool for exploring and evaluating micro-architectural designs [4]. Because of their advantage of real-time in-circuit reconfigurability, FPGAs based processors are flexible, programmable and reliable [1]. Thus, higher speeds can be achieved by these customized hardware solutions at competitive costs. Also, various simple and hardware-efficient algorithms exist which map well onto these chips and can be used to enhance speed and flexibility while performing the desired signal processing tasks [1],[2],[3]. One such simple and hardware-efficient algorithm is COordinate Rotation DIgital Computer (CORDIC) [5]. Primarily developed for real-time airborne computations, it uses a unique computing technique highly suitable for solving the trigonometric relationships involved in plane co-ordinate rotation and conversion from rectangular to polar form. John Walther extended the basic CORDIC theory to provide solution to and implement a diverse range of functions [7]. It comprises a special serial arithmetic unit having three shift registers, three adders/subtractors, Look-Up Table (LUT) and special interconnections. Using a prescribed sequence of conditional additions or subtractions, the CORDIC arithmetic unit can be designed to solve either of the following equations:

Abstract - This paper describes the design and simulation of an 8-bit dedicated processor for calculating the Sine and Cosine of an Angle using CORDIC Algorithm (COordinate Rotation DIgital Computer), a simple and efficient algorithm to calculate hyperbolic and trigonometric functions. We have proposed a dedicated processor system, modeled by writing appropriate programs in VHDL, for calculating the Sine and Cosine of an angle. System simulation was carried out using ModelSim 6.3f and Xilinx ISE Design Suite 12.3. A maximum frequency of 81.353 MHz was reached with a minimum period of 12.292 ns. 126 (3%) slices were used. This paper attempts to survey the existing CORDIC algorithm with an eye towards implementation in Field Programmable Gate Arrays (FPGAs). A brief description of the theory behind the algorithm and the derivation of the Sine and Cosine of an angle using the CORDIC algorithm has been presented. The system can be implemented using Spartan3 XC3S400 with Xilinx ISE 12.3 and VHDL. Keywords - CORDIC, VHDL, dedicated processor, datapath, finite state machine.

I. INTRODUCTION Over the years, the field of Digital Signal Processing (DSP) has been essentially dominated by Microprocessors. This is mainly because of the fact that they provide designers with the advantages of single cycle multiply-accumulate instruction as well as special addressing modes [4]. Although these processors are cheap and flexible, they are relatively less time-efficient when it comes to performing certain resource-intensive signal processing tasks, e.g., Image Compression, Digital Communication and Video Processing. However as a direct consequence of rapid advancements in the field of VLSI and IC design, special purpose processors with custom-architectures are designed to perform certain specific tasks. They need fewer resources and are less complex than their general purpose counterparts. Instructions for performing a task are hardwired into the processor itself, i.e., the program is built right into the microprocessor circuit itself [2]. Due to this, the execution time of the program is considerably less than that if the instructions are stored in memory. Emerging high level hardware description and synthesis technologies in conjunction with Field Programmable Gate Arrays (FPGAs) have significantly lowered the threshold for hardware development as opportunities exist to integrate

= Y’

K ( Ycos λ + Xsin λ )

= X’

K ( Xcos λ − Ysin λ )

(1)

Where, K is a constant. By making slight adjustments to the initial conditions and the LUT values, it can be used to efficiently implement trigonometric, hyperbolic, exponential functions, coordinate transformations etc. using the same hardware. Since it uses only shift-add arithmetic, the VLSI implementation of such an algorithm is easily achievable [4]. II. CORDIC ALGORITHM The CORDIC algorithm is an iterative technique based on the rotation of a vector which allows many transcendental and trigonometric functions to be calculated. The key aspect of this method is that it is achieved using only shifts, additions/subtractions and table look-ups which map well into hardware and are ideal for FPGA implementation. The CORDIC algorithms presented in this paper are well known in the research and super-computing circles.

1


A. Algorithm Fundamentals

system or treated as part of a system processing gain. That product approaches 0.6073 as the number of iterations reaches infinity. Therefore, the rotation algorithm has a gain, An ≈ 1.65. The exact gain depends on the number of iterations, and follows the following equation:

Vector rotation is the first step to obtain the trigonometric functions. It can also be used for polar to rectangular and vice-versa conversions, for vector magnitude, and as a building block in certain transforms such as the Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT). The algorithm is derived from Givens [6] rotation as follows:

A n = ∏ 1 + 2-2i

The angle of a composite rotation is realized by the sequence of the directions of the elementary rotations. That sequence can be represented by a decision vector. The set of all possible decision vectors is an angular measurement system based on binary arctangents. Conversions between this angular system and any other can easily be accomplished using a LUT. A better conversion method uses an additional adder-subtractor that accumulates the elementary rotation angles post iteration. The angle accumulator adds a third difference equation to the CORDIC algorithm:

Fig. 1. Illustration of the CORDIC algorithm

z i + 1 = z i − d i ⋅ tan −1 ( 2− i )

In Fig. 1, the diagonal blue line is at an angle φ1 above the horizontal. The diagonal red line is actually the blue line rotated anti-clockwise by an angle φ . The new X and Y values are related to the old X and Y values as follows:

x cos φ − y sin φ y cos φ + x sin φ

x' = y' =

(2)

B. Rotation Mode

For CORDIC, the final angle φ2 the angle whose sine

The first mode of operation, called rotation by Volder [5],[4], rotates the input vector by a specified angle (given as an argument). Here, the angle accumulator is initialized with the desired rotation angle. The rotation decision based on the sign of the residual angle is made to diminish the magnitude of the residual angle in the angle accumulator. If the input angle is already expressed in the binary arctangent base, the angle accumulator is not needed [4],[1]. The equations for this are:

to φ2 in one full sweep, we move in steps with careful choice of step values. Rearranging (2) gives us:

cosφ ⋅ [ x − y tanφ ]

(3)

y' = cosφ ⋅ [ y + x tanφ ]

Restricting the rotation angles such that tan φ = ± 2-i, transforms the multiplication by the tangent term to a simple shift operation [1]. Arbitrary angles of rotation are obtained by successively performing smaller elementary rotations. If i, the decision at each iteration, is which direction to rotate rather than whether to rotate or not, then cos(δi) is constant as cos(δi) = cos(-δi). Then the iterative rotation can be expressed as:

x= i +1

K i  x i − yi ⋅ d i ⋅ 2-i 

y= i +1

K i  yi + x i ⋅ d i ⋅ 2 

1 −1 -i 2 )= Where, K i cos(tan = = 1+2-i

(

1+2-i

)

x i +1 =

x i − yi ⋅ d i ⋅ 2-i

yi +1 =

yi + x i ⋅ d i ⋅ 2-i

z i + 1 = z i − d i ⋅ tan

−1

(2 )

(7)

−i

 −1 if z i < 0 Where, di =  +1 otherwise

(4)

-i

(6)

As discussed above, when the angle is in the arctangent base, this extra element is not needed. The CORDIC rotator is normally operated in one of two modes, i.e., the Rotation mode and the Vectoring mode.

or cosine we want to calculate and initial angle φ1 is set to a convenient value such as 0. Rather than rotating from φ1

x'=

(5)

n

= xn

A n [x 0 cos z 0 − y 0 sin z 0 ]

= yn

A n [y 0 cos z 0 + x 0 sin z 0 ]

zn = 0

(8)

A n = ∏ 1 + 2-2i

−1

n

C. Vectoring Mode

di = ± 1

In the vectoring mode, the CORDIC rotator rotates the input vector through whatever angle is necessary to align the result vector with the x axis. The result of the

Removing the scale constant from the iterative equations yields a shift-add algorithm for vector rotation. The product of the Ki's can be applied elsewhere in the

2


D. Evaluation of Sine and Cosine using CORDIC

vectoring operation is a rotation angle and the scaled magnitude i.e. the x component of the original vector. The vectoring function works by seeking to minimize the y component of the residual vector at each rotation. The sign of the residual y component is used to determine which direction to rotate next. When initialized with zero, accumulator contains the traversed angle at the end of the iterations [4]. The equations in this mode are:

In rotational mode the sine and cosine of the input angle can be computed simultaneously. Setting the y component of the input vector to zero reduces the rotation mode result to:

-i

x i +1 =

x i − yi ⋅ d i ⋅ 2

yi +1 =

yi + x i ⋅ d i ⋅ 2-i

z i + 1 = z i − d i ⋅ tan −1 ( 2− i )

Then:

x n = A n x 02 + y 02 yn = 0

An = ∏ 1 + 2

A n ⋅ x 0 cos z 0

y= n

A n ⋅ x 0 sin z 0

(13)

If x0 is equal to 1/An, the rotation produces the unscaled sine and cosine of the angle argument, z0. Very often, the sine and cosine values modulate a magnitude value. Using other techniques (e.g., a LUT) requires a pair of multipliers to obtain the required modulation. The algorithm performs the multiply as part of the rotation operation, and therefore eliminates the need for a pair of explicit multipliers. The output of the CORDIC rotator is scaled by the rotator gain. If the gain is not acceptable, a single multiply by the reciprocal of the gain constant placed before the CORDIC rotator will yield unscaled results [1].

(9)

+1 if yi < 0 Where, di =   -1 otherwise

y  z n = z 0 + tan -1  0   x0 

x= n

(10)

E. Advantages

-2i

• Number of gates required in hardware implementation on an FPGA, are minimum. Thus, hardware complexity is greatly reduced compared to other processors such as DSP multipliers. Hence, it is relatively simple in design. • Due to reduced hardware requirement, cost of a CORDIC hardware implementation is less as only shift registers, adders and look-up table (ROM) are required. • Delay involved during processing is comparable to that of a division or square-rooting operation. • No multiplication and only addition, subtraction and bit-shifting operation ensures simple VLSI implementation. • Either if there is an absence of a hardware multiplier (e.g. microcontroller, microprocessor) or there is a necessity to optimize the number of logic gates (e.g. FPGA), CORDIC is the preferred choice [4].

n

The CORDIC rotation and vectoring algorithms as stated are limited to rotation angles between -π/2 and π/2. For composite rotation angles larger than π/2, an additional rotation is required [1]. Volder [4] describes an initial rotation of ± π/2. This gives the correction iteration:

x' = − d ⋅ y y' = d ⋅ x π z' = z + d ⋅ 2

(11)

 +1 if y < 0 Where, di =  -1 otherwise There is no growth for this initial rotation. Alternatively, an initial rotation of either π or 0 can be made, avoiding the reassignment of the x and y components to the rotator elements. Again, there is no growth due to the initial rotation:

x'= d ⋅ x y' = d ⋅ y if d = 1 z z' =  z - π if d = − 1

F. Applications (12)

• The algorithm was basically developed to offer digital solutions to the problems of real-time navigation in B-58 bomber [5]. • This algorithm finds use in 8087 Math coprocessor, the HP-35 calculator [8], radar signal processors [8] and robotics. • CORDIC algorithm has also been described for the calculation of DFT, DHT, Chirp Z-transforms, filtering, Singular value decomposition and solving linear systems [4].

 -1 if x < 0 Where, di =  +1 otherwise Both reduction forms assume a modulo 2π representation of the input angle. The second reduction may be more convenient when wiring is restricted, as is often the case with FPGAs.

3


• Most calculators, especially the ones built by Texas Instruments and Hewlett-Packard use CORDIC algorithm for calculation of transcendental functions. III. SYSTEM ARCHITECTURE In this paper, the FPGA implementation of simple 8bit dedicated processor for calculating the sine and cosine of an angle using CORDIC Algorithm is presented. The processor was implemented by using Xilinx ISE Design Suite 12.3 and VHDL. Fig. 2 shows functional block diagram of our 8-bit processor. It mainly consists of an 8bit multiplexers, registers, arithmetic logic unit (ALU), tri-state buffer, comparator, and control unit. The logic circuit for dedicated microprocessor is divided into two parts: the datapath unit and control unit [9]. Input of two registers can be either from an external data input or from the output of ALU unit. Two control signals ln_X and ln_Y select which of two sources are to be loaded into registers. Two control signals XLoad and YLoad load a value into respective registers. Bottom multiplier determines the source of two operands of ALU. This allows the selection of one of the two subtraction operations X-Y or Y-X. A comparator unit is used to test condition of equal to or greater than and it accordingly generates status signals. Tristate buffer is used for outputting result from register X.

Fig. 3. A simple, general datapath circuit for the dedicated microprocessor

B. Control Unit Fig. 4 shows the block diagram of control unit and Fig. 5 shows the corresponding state diagram.

Fig. 3. Block diagram of the control unit

Fig. 2. Functional block diagram of the 8-bit processor

A. Datapath Unit

Fig. 4. State diagram of the control unit

Datapath is responsible for the actual execution of all data operations performed by the dedicated processor [9]. Fig. 3 shows the datapath unit for the 8-bit dedicated processor.

Control signals are generated by the control unit which is modelled as a finite state machine with 6 states say S0-S5. There are 9 control signals which form control word and control the operation of datapath, as per the following table:

4


TABLE I CONTROL SIGNAL STATUS DURING DIFFERENT STATES ln_X

ln_Y

XLoad

1 0 0 0 0 1

1 0 0 0 0 1

1 0 1 0 0 0

ln_X

ln_Y

ALU(0,1,2)

1 0 0 0 0 1

1 0 0 0 0 1

101 101 101 101 101 101

YLoad 1 0 0 1 0 0

OE 0 0 0 0 1 0

XY

Clear

0 0 1 0 0 0

0 0 0 0 0 1

Done

State

0 0 0 0 1 0

S0 S1 S2 S3 S4 S5

Fig. 7. Datapath unit simulation

The following table shows the synthesis report of the datapath unit: TABLE II SYNTHESIS REPORT OF THE DATAPATH UNIT Number of Slices Maximum Frequency Minimum Period

61 (31%) 114.05 MHz 8.76 ns

B. Control Unit

If Reset = ‘1’ then state S5 occurs. In this state registers are initialized to ‘0’ by asserting the Clear signal. If Reset = ’0’ then at rising edge of clock, the state is upgraded from S5 to state S0. During the state S0 two inputs are loaded in two registers. After completion of state S0, state S1 is reached. In this state output of registers is checked in comparator for equality and greater than conditions. If both values are same, state S4 occurs else S2 or S3 will continue depending on status of signal neq1. State S1 is repeated again. Default state is S5.

Simulation result of the control unit is shown in Fig. 8.

Fig. 8. Control unit simulation

The following table shows the synthesis report of the control unit:

IV. IMPLEMENTATION AND VERIFICATION All the units in dedicated processor were designed. These units were described in VHDL-modules and synthesized using ISE Design Suite 12.3. ModelSim simulator was used to verify the functionalities of each unit. Finally all the units were combined together and once again tested by using ModelSim simulator. Fig. 6 shows the RTL schematic of the CORDIC processor generated from Xilinx ISE.

TABLE III SYNTHESIS REPORT OF THE CONTROL UNIT Number of Slices Maximum Frequency Minimum Period

4 (2%) 264.34 MHz 3.78 ns

C. Dedicated CORDIC Processor Once the datapath unit and control unit were simulated, they were combined and a dedicated processor was constructed. Simulation shows the CORDIC calculation operation of the Sine and Cosine of an angle. Simulation result for the dedicated processor is as shown in Fig. 9.

Fig. 6. RTL schematic of the CORDIC processor

A. Datapath Unit Fig. 9. Test-Bench waveforms indicating the Sine and Cosine output, obtained after the CORDIC Core Simulation

Simulation result of datapath is shown in Fig. 7.

5


The following table shows the synthesis report of the 8-bit dedicated processor:

V. CONCLUSION We have successfully simulated an 8-bit dedicated processor for calculating the Sine and Cosine of an angle, on ModelSim simulator using the VHDL language. Our processor has six main components namely, control unit, multiplexer unit, ALU unit, register unit, tristate buffer unit and comparator. Our dedicated processor has a maximum frequency of 81.353 MHz was reached with a minimum period of 12.292 ns. 126 (3%) slices were used. Our System can be implemented on Xilinx Spartan 3 XC3S400 using ISE Design Suite 12.3 and VHDL language. Our dedicated processor has a distinct advantage over a general purpose processor, since it repeatedly performs same task its design is more efficient and consumes less resources and is less time intensive.

TABLE IV

SYNTHESIS REPORT OF THE 8-BIT DEDICATED PROCESSOR Logic Utilization

Used

Available

Utilization

Number of Slices Number of Slice Flip Flops Number of 4 input LUTs Number of bonded IOBs Number of GCLKs

126 58 238 28 1

3584 7168 7168 141 8

3% 0% 3% 19% 12%

For a Speed Grade of -5, the minimum period required is 12.292 ns, which corresponds to a maximum frequency of 81.353 MHz. The minimum input arrival time before clock and maximum output required time after clock are 9.657 ns and 8.133 ns respectively.

REFERENCES

The following table shows the comparison between the actual Sine and Cosine values and the ones obtained from Test-Bench analysis in VHDL.

[1]

TABLE V COMPARISON OF SUCCESSIVE ANGLE ROTATION VALUES Angle (A)

Rotation

0.000000 0.000000 0.000000 0.000000 0.523599 0.523599 0.523599 0.523599 1.000000 1.000000 1.000000 1.000000 3.141593 3.141593 3.141593 3.141593

5 10 15 20 5 10 15 20 5 10 15 20 5 10 15 20

Angle (A)

Rotation

0.000000 0.000000 0.000000 0.000000 0.523599 0.523599 0.523599 0.523599 1.000000 1.000000 1.000000 1.000000 3.141593 3.141593 3.141593 3.141593

5 10 15 20 5 10 15 20 5 10 15 20 5 10 15 20

sin(A) (Actual) 0.00000000 0.00000000 0.00000000 0.00000000 0.50000000 0.50000000 0.50000000 0.50000000 0.84147098 0.84147098 0.84147098 0.84147098 0.00000000 0.00000000 0.00000000 0.00000000

sin(A) (Test-Bench) 0.01483516 0.00117259 0.00001292 -0.00000043 0.48362630 0.49892865 0.50003905 0.50000106 0.80881306 0.84080033 0.84149350 0.84147186 -0.01483516 -0.00117259 -0.00001292 0.00000043

cos(A) (Actual) 1.00000000 1.00000000 1.00000000 1.00000000 0.86602540 0.86602540 0.86602540 0.86602540 0.54030231 0.54030231 0.54030231 0.54030231 -1.0000000 -1.0000000 -1.0000000 -1.00000000

cos(A) (Test-Bench) 0.99988995 0.99999931 1.00000000 1.00000000 0.87527459 0.86664307 0.86600286 0.86602479 0.58806584 0.54134537 0.54026724 0.54030094 -0.99988995 -0.99999931 -1.00000000 -1.00000000

[2]

Error

[3]

-1.4835e-002 -1.1725e-003 -1.2922e-005 4.2874e-007 1.6373e-002 1.0713e-003 -3.9047e-005 -1.0561e-006 3.2657e-002 6.7065e-004 -2.2515e-005 -8.7478e-007 1.4835e-002 1.1725e-003 1.2922e-005 -4.2874e-007

[4] [5] [6] [7] [8] [9]

Error 1.1004e-004 6.8748e-007 8.3498e-011 9.2037e-014 -9.2491e-003 -6.1766e-004 2.2545e-005 6.0975e-007 -4.7763e-002 -1.0430e-003 3.5067e-005 1.3623e-006 -1.1004e-004 -6.8748e-007 -8.3498e-011 -9.2037e-014

6

R. Andraka, “A survey of CORDIC algorithms for FPGA based computers,” Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays, pp. 191 – 200. V. Sharma, FPGA Implementation of EEAS CORDIC based Sine and Cosine Generator, M. Tech Thesis, Dept. of Electronics and Communication Engineering, Thapar University, Patiala, 2009. S. Panda, Performance Analysis and Design of a Discrete Cosine Transform Processor using CORDIC Algorithm, M. Tech Thesis, Dept. of Electronics and Communication Engineering, NIT Rourkela, Rourkela, Orissa, 2010. R. K. Jain, B. Tech Thesis, Design and FPGA Implementation of CORDIC-based 8-point 1D DCT Processor, NIT Rourkela, Rourkela, Orissa, 2011. J. Volder, "The CORDIC Trigonometric Computing Technique," IRE Transactions on Electronic Computing, Vol EC-8, Sept 1959, pp. 330-334. F. Ling, “Givens rotation based least squares lattice and related algorithms,” IEEE Transactions on Signal Processing, Jul 1991, pp. 1541 – 1551. J. S. Walther, "A unified algorithm for elementary functions," Proceedings of the Spring Joint Computer Conference, 1971, pp. 379-385. R. Andraka. "Building a High Performance Bit-Serial Processor in an FPGA," Proceedings of Design SuperCon, Jan 1996, pp. 1-2. E. O. Hwang, Digital Logic and Microprocessor Design with VHDL, Thomson/Nelson, 2006, pp. 379-413, pp. 290-311.

Design and Simulation of an 8-bit Dedicated Processor for ... - arXiv

Design and Simulation of an 8-bit Dedicated Processor for ... - arXiv

Suggest Documents

Intelligent Implementation Processor Design for Oracle ... - arXiv

New Software Dedicated to the Design and Simulation of ... - gests.org

The Design of a Specialised Processor for the Simulation ... - CiteSeerX

Design and Simulation of 32-Bit Block Processor - IJMETMR

APRON: A Cellular Processor Array Simulation and Hardware Design ...

APRON: A Cellular Processor Array Simulation and Hardware Design ...

Design of an Elliptic Curve Cryptography Processor for RFID ... - MDPI

Processor Design for Portable Systems

Design of an Elliptic Curve Cryptography Processor for RFID ... - MDPI

Design of laser-electron storage ring lattice dedicated to ... - arXiv

Non-linear Digital Audio Processor for dedicated ... - CiteSeerX

performance study and simulation of an anycast protocol for ... - arXiv

Direct Processor Access for Non Dedicated Server using Multi Core ...

Dedicated object processor for mobile augmented ... - Springer Link

Design and Simulation of an Independent Solar

Design and Evaluation of an FFT Processor Utilizing ... - Wsimg.com

design and evaluation of an auto-memoization processor - TsumuLab.

Design of High Performance MIPS Cryptography Processor ... - arXiv

design and evaluation of an auto-memoization processor - CiteSeerX

Design and Implementation of an OS-CFAR Processor ... - IEEE Xplore

A Dedicated Image Processor Exploiting Both Spatial And Instruction ...

design and simulation analysis of an electrostatic actuator for

An Integrated Simulation and Visualisation Platform for the Design of ...

Design Space Exploration for an Embedded Processor with Flexible ...