FPGA Implementation of Frequency Output and ... - Semantic Scholar

FPGA Implementation of Frequency Output and Input Using Handel-C

Mayela Zamora, Manus Henry
Invensys University Technology Centre for Advanced Instrumentation
Department of Engineering Science, University of Oxford
Parks Road, Oxford OX1 3PJ, UK
[email protected]

Abstract – The use of digital frequency input and output for data transmission remains common in the design of many embedded applications. Conventional methods of frequency generation, based on counting clock cycles, have a precision which is inversely proportional to the frequency to be generated. This paper describes a simple frequency generation technique which, when implemented in low-cost FPGA hardware, provides a precision of $5 \times 10^{-6}$ % or better for all frequencies. The method represents an intermediate nonavailable frequency by dithering between two exact frequencies. Laboratory measurements show that, averaged over 2 s, the desired frequency is generated to the required precision. This application is used to illustrate the high level of abstraction in the Handel-C language for describing FPGA functionality.

I. INTRODUCTION

The use of digital frequency or pulse output is well established as a means of communicating a quantity, such as a measurement or desired output level, in embedded applications. Either the frequency or the duty cycle of a rectangular wave can provide quantitative information. The technique offers several advantages over the 4-20 mA signalling standard that remains widespread within industry: these include simpler conditioning circuitry, computer interface and measurand-to-code conversion, higher noise immunity, accuracy, resolution, and output power, as well as wider dynamic range [1]. Various integrated frequency converters are available commercially, based on classical conversion methods. Techniques have been developed for simultaneous transmission of multiple channels of data. More recently, smart transducers have been developed which use frequency output to communicate data. However, increasing sensor accuracy, of up to $10^{-5}$ %, demands more advanced techniques of frequency conversion [2].

Fig. 1 illustrates the use of frequency output in an instrumentation application. The transmitter maps the current measurement onto an equivalent frequency. A square wave of this frequency is generated and sent out over a dedicated pair of wires. At the control system, the signal is received and the frequency estimated, and the inverse of the mapping function is applied to recover the measurement value.

The purpose of this paper is to present very simple techniques for digital implementation of frequency output and input, and to illustrate how, with the use of modern digital technology and hardware compilation tools, very high precision (i.e. 1 part in 10,000,000 or better) frequency generation and collection can be achieved. Complete descriptions of frequency transmitter hardware are provided in the form of C-like software (Handel-C), which is readily adapted to the requirements of particular applications.

Figure 1. The transmitter converts the measurement to frequency; the controller recovers the measurement.

II. HANDEL-C PHILOSOPHY

An embedded design usually has a processor and a hardware interface, with tasks distributed between software and hardware. The quality of an embedded design largely depends on the algorithm defining the system functionality and on how fast that algorithm is executed. Typically these algorithms are modeled and refined in C code. This is convenient for programming an embedded processor, but performance requirements will often call for custom hardware to serve as a coprocessor. C-based design flows to programmable logic overcome this issue.

Low-cost hardware such as a Field Programmable Gate Array (FPGA) consists of arrays of logic that can implement the custom hardware portions of the design. These devices are very flexible and share a property with processors: FPGAs can be programmed and reprogrammed as the algorithm is designed and refined. The drawback of using FPGAs is the difficulty of implementing algorithms in hardware using Register Transfer Level (RTL) design flows. Traditional FPGA design flows require early hardware/software partitioning decisions, and translation to a Hardware Description Language (HDL) with its associated tools. More recent design flows are C-based, making the programming of FPGA coprocessors as easy as compiling code for a processor and allowing a common coding language for both software and hardware.

There are several C-based languages available for hardware design, with a level of abstraction above the RTL level. The choice of language depends on the tool sets available for a particular application. Celoxica [3], for example, provides tools for Handel-C, a high-level hardware compilation language which maps programs into hardware at netlist level, for example in Electronic Design Interchange Format (EDIF).

Handel-C has well-defined semantics, based ultimately on the CSP computational model [4], which guarantee the action and timing of every program statement. Language constructs are provided to capture parallelism and synchronization, without reference to gate-level hardware, libraries or macros. For example, “seq” and “par” statements specify which functions are to be executed sequentially, and which are to be operated in parallel.

The use of a common language base for both software and hardware design in a system is a huge advantage for productivity. C-based design makes it easier to control the tradeoff between hardware size, performance and functional accuracy. Partitioning and testing can be done and reviewed until the required performance is achieved. Celoxica’s tools for Handel-C include simulation to allow the design to be tested and refined before its transfer to an FPGA. More specifically, because of the well-defined language semantics, simulation can take place at the C-language level, rather than at the gate level, leading to an enormous increase in simulation speed.

The work described in this paper uses a hardware/software platform previously developed for rapid prototyping [5], which includes an off-the-shelf PC104 processor board and a bespoke FPGA board supporting a Xilinx Spartan IIE. Fig. 2 shows the design path for a Xilinx FPGA device using Handel-C and Celoxica’s development environment DK, with EDIF files as output. Xilinx’s tools for mapping and for place and route (PAR) are integrated in ISE, which translates and maps the EDIF file and performs the PAR to produce a bitstream (BIT) format file. IMPACT, the tool that converts the BIT file into an EXO file (a Motorola hexadecimal S-Record format for PROM files) for FPGA configuration, can be invoked from ISE. The design can be simulated in DK, allowing quick and easy debugging and re-design at the C level, without the need for a further round of PAR followed by slow, gate-level simulation.

Fig. 2. Design path for a Xilinx FPGA using Handel-C code.

III. GENERATING FREQUENCY-TIME SIGNALS

Frequency generation and capture are typical functions of embedded applications where the use of dedicated hardware is advantageous. Relatively simple logical and counting steps must be carried out continuously, for which processors are relatively inefficient. In this discussion it is assumed that an embedded system with a processor and dedicated hardware is required to generate a frequency f, where the clock frequency of the hardware subsystem is $F_{CLK}$.

Fig. 3. Simple frequency generation and capture: (a) frequency generation, (b) frequency measurement.

Figure 3 shows a basic hardware implementation of digital frequency generation (a) and capture (b). In the frequency generator, a timing register (or timer) counts clock ticks. When the timer reaches the specified target value, the output line is inverted and the timer is reset to zero. The regular inversion of the output signal generates the square wave at the desired frequency. In the most common form of frequency capture, a register counts the number of input edges over a finite time interval, as measured by a timer. When the timer reaches the set interval tsample, the edge count is stored for transmission to the processor (by polling or interrupt), and the counter and timer are reset.

The frequency range used for mapping the measurement value is constrained at both extremes. The lowest frequency that can be generated is limited by the size of the counter register. If this is of length k bits, then the maximum count is $2^k - 1$, and so the minimum frequency that can be generated is $0.5\,F_{CLK}/(2^k - 1)$. Theoretically, the highest frequency that can be generated is simply $0.5\,F_{CLK}$, corresponding to a target count value of 1. In most practical applications, however, the determining factor is the ability of the communication medium to transmit square wave pulses without significant distortion, and typical maximum frequency values are in the range 1-100 kHz.
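As a quick check of the range limits stated above, the following Python sketch evaluates the minimum and maximum frequencies of a simple k-bit counter. The 40 MHz clock and the 24-bit register are illustrative assumptions, and the function name is hypothetical.

```python
from fractions import Fraction

def conventional_range(f_clk_hz, k_bits):
    """Return (f_min, f_max) in Hz for a k-bit half-period counter."""
    max_count = 2**k_bits - 1                   # largest count the register can hold
    f_min = Fraction(f_clk_hz, 2 * max_count)   # output inverted every max_count ticks
    f_max = Fraction(f_clk_hz, 2)               # output inverted every tick (target = 1)
    return f_min, f_max

# Assumed values for illustration: 40 MHz clock, 24-bit counter.
f_min, f_max = conventional_range(40_000_000, 24)
print(f"f_min = {float(f_min):.4f} Hz, f_max = {float(f_max):.0f} Hz")
```

With these assumed values the lower limit is a little above 1 Hz, so a wider register would be needed to reach lower frequencies.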

Many methods have been developed over the last three decades for accurate or easy frequency measurement [6], [7], but there is less in the literature describing improved frequency generation. Commercial sensors and transmitters typically offer frequency output precision of 0.001%, at best [1]. Zhou et al. [8] developed a method of deleting clock ticks to achieve precisions of the order of $10^{-8}$ %, but at the expense of very complicated circuitry. This paper describes and provides the code for a simple frequency generation technique which, when implemented on low-cost hardware, provides a precision of $10^{-6}$ % [2],[9].

A. Conventional method of frequency generation

In the most common approach to frequency generation, the number of clock ticks target_count (or N in Fig. 3a) needed to complete half a period of the desired frequency f is computed. The processor calculates the value of target_count, given by

$$ \mathrm{target\_count} = \mathrm{round}\left(\frac{F_{CLK}}{2f}\right), \tag{1} $$

where round(x) is the nearest integer to x, and sends this value to the dedicated frequency output hardware. A counter is incremented each clock tick and compared with target_count. When the target is reached, the pulse output is inverted and the counter is reset. The following pseudo-code shows the procedure implemented in the dedicated hardware:

    while TRUE {
        count = 0;
        while (count < target_count)
            count++;
        invert_pulse_output();
    }
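A minimal Python sketch of Eq. 1 follows, assuming exact rational arithmetic and a hypothetical function name. Because the output is inverted every target_count ticks, one full period lasts 2·target_count ticks, which gives the frequency actually produced.

```python
from fractions import Fraction

def conventional(f_hz, f_clk_hz):
    """Return (target_count, generated frequency in Hz) for a requested f_hz."""
    ideal = Fraction(f_clk_hz) / (2 * Fraction(f_hz))
    # Eq. 1. Python rounds exact ties to the even integer, which only
    # matters when F_CLK / 2f falls exactly halfway between two integers.
    target_count = round(ideal)
    # One half period lasts target_count ticks, so a full period lasts twice that.
    f_out = Fraction(f_clk_hz, 2 * target_count)
    return target_count, f_out

# Illustrative values: 40 MHz clock, requests near 1 Hz and 10 kHz.
for f in (1, 10_000, 10_002):
    n, f_out = conventional(f, 40_000_000)
    print(f, n, float(f_out))
```

A request for 10,002 Hz yields the same target_count as a request for 10,000 Hz, so the integer parameter cannot distinguish between them.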

The accuracy of this algorithm is primarily a function of how accurately the true clock frequency is known; the issue of crystal frequency calibration and compensation for temperature variation is well understood and not discussed further. Of greater interest is the precision of the algorithm, i.e. assuming the true clock frequency is known, how precisely can an arbitrary frequency be generated? The conventional method has a single parameter, target_count, which is an integer. The only frequency values that can be generated are thus:

$$ \mathrm{exact\_f_{out}}(\mathrm{target\_count}) = \frac{F_{CLK}}{2 \cdot \mathrm{target\_count}}, \tag{2} $$

For low output frequencies, target_count is high, and so the precision error is small. For higher output frequencies, the precision error grows rapidly. For example, with a 40 MHz clock, to generate a 1 Hz signal, the number of clock ticks to complete half a cycle, target_count, is 20,000,000, so the frequency precision is 1 part in 20 million; for a 10 kHz signal, however, target_count is 2,000 and the resolution is only 1 part in 2 thousand. Adjacent obtainable exact frequencies are 9,995 Hz and 10,005 Hz, and so the precision error may be as high as 2.5 Hz.

B. Dithering technique

For a given clock frequency, each square wave pulse takes place over an integer number of clock ticks, and therefore can take one of only a few discrete frequencies. To represent an intermediate frequency, dithering can be used to provide a sequence of pulses which, averaged over some reasonably short timescale, provides the desired average frequency to high precision. This results in a more complex dedicated hardware design, but much improved performance, especially at high frequencies.

Two parameters are used: a variable step with which to increment the counter, along with the conventional target value. Modulo arithmetic is used to restart the counter instead of a simple reset. The counter accumulator accum, initially set at zero, is increased on every clock tick by a quantity step. When the accumulator reaches or exceeds the value of target_count, target_count is subtracted from the accumulator (equivalent to a modulo operation for step < target_count). This may leave a small remainder in the accumulator, as shown in the following pseudo-code:

    accum_0 = 0;
    while TRUE {
        while (accum_t < TARGET_COUNT)
            accum_t = accum_(t-1) + step;
        invert_pulse_output();
        accum_t = accum_t - TARGET_COUNT;
    }
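The pseudo-code above can be modelled tick by tick in Python. This is a sketch under one stated assumption: the subtraction of target_count happens in the same clock tick as the addition that reaches the target, whereas the pseudo-code does not specify the timing of that step. The function name and the small parameter values are illustrative only.

```python
def dither_toggles(step, target_count, n_ticks):
    """Return the clock ticks (1-based) at which the output is inverted."""
    if not 0 < step < target_count:
        raise ValueError("expected 0 < step < target_count")
    accum = 0
    toggles = []
    for tick in range(1, n_ticks + 1):
        accum += step                  # one addition per clock tick
        if accum >= target_count:      # target reached or exceeded
            accum -= target_count      # keep the remainder (modulo restart)
            toggles.append(tick)       # stands in for invert_pulse_output()
    return toggles

toggles = dither_toggles(step=5, target_count=13, n_ticks=13)
print(toggles)                                          # ticks where the output flips
print([b - a for a, b in zip(toggles, toggles[1:])])    # half-period lengths in ticks
```

With step = 5 and target_count = 13 the half periods alternate between 2 and 3 ticks, i.e. the output dithers between the two nearest exact frequencies, and the accumulator is back at zero after 13 ticks.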

It can be shown that the accumulator returns to zero every target_count clock cycles, and that the number of output inversions (half cycles) in this time window is equal to step [2]. Thus the output sequence repeats every target_count clock cycles, generating an exact number of output inversions equal to step. The two integer values, step and target_count, are calculated by the processor such that their ratio is as close as possible to $2f / F_{CLK}$:

$$ \frac{\mathrm{step}}{\mathrm{target\_count}} \approx \frac{2f}{F_{CLK}}, \tag{3} $$

The maximum error in f is inversely proportional to target_count [2]; step and target_count are therefore further selected such that target_count is as large as possible for the given hardware. This suggests a simple algorithm for their selection, which also provides expressions for the maximum frequency precision error and ensures that this error is kept small. For example, suppose that the clock frequency is 40 MHz and the desired value of f is 10 kHz. If target_count can be as high as 40,000,000, a step value of 20,000 generates an exact frequency of 10 kHz at the output. Increasing target_count to 40,000,001 generates a sequence of pulses with an average frequency over 40,000,001 clock cycles (i.e. just over 1 second) of 9,999.999,75 Hz. The available precision in f is thus 1 part in 40 million, i.e. 0.000,25 Hz.
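The worked example above can be reproduced with exact rational arithmetic. The sketch below rearranges Eq. 3 to give the average output frequency for a given pair of integers; the function name is hypothetical.

```python
from fractions import Fraction

def average_frequency(step, target_count, f_clk_hz):
    """Average output frequency implied by Eq. 3: f = step * F_CLK / (2 * target_count)."""
    return Fraction(step * f_clk_hz, 2 * target_count)

F_CLK = 40_000_000
f_a = average_frequency(20_000, 40_000_000, F_CLK)   # exact 10 kHz
f_b = average_frequency(20_000, 40_000_001, F_CLK)   # next smaller obtainable value

print(float(f_a))
print(f"{float(f_b):.8f}")
print(f"frequency spacing = {float(f_a - f_b):.8f} Hz")
```

The spacing between the two neighbouring values is about 0.000,25 Hz, in agreement with the figure of 1 part in 40 million quoted in the text.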

The algorithm can be implemented in silicon more efficiently by assuming target_count will be as close as possible to the largest power of two representable in the accumulation register, MAX_VAL. Their difference is denoted offset. Comparing the accumulated value with MAX_VAL is equivalent to testing its top bit; subtracting MAX_VAL is equivalent to resetting the top bit of the accumulator. The modified algorithm is:

    accum_0 = 0;
    while TRUE {
        while (accum_top_bit_not_set())
            accum_t = accum_(t-1) + step;
        invert_pulse_output();
        reset_accum_top_bit();
        accum_t = accum_t + offset;
    }
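The relationship between the top-bit form and the modulo form can be checked with a small Python model. This is an idealised sketch that assumes the wrap is completed within the same clock tick as the addition; the function names, the 4-bit MAX_VAL and the parameter values are illustrative assumptions, not values from the paper.

```python
def modulo_toggles(step, target_count, n_ticks):
    """Modulo form: compare with target_count and subtract it on a wrap."""
    accum, toggles = 0, []
    for tick in range(1, n_ticks + 1):
        accum += step
        if accum >= target_count:
            accum -= target_count
            toggles.append(tick)
    return toggles

def top_bit_toggles(step, offset, top_bit, n_ticks, start=0):
    """Top-bit form: test bit `top_bit`, clear it, then add offset."""
    max_val = 1 << top_bit                  # MAX_VAL, the weight of the top bit
    if step + offset > max_val:
        raise ValueError("step + offset must not exceed MAX_VAL in this model")
    accum, toggles = start, []
    for tick in range(1, n_ticks + 1):
        accum += step
        if accum & max_val:                 # top bit set, i.e. accum >= MAX_VAL
            accum &= max_val - 1            # reset the top bit (subtract MAX_VAL)
            accum += offset                 # net effect: subtract MAX_VAL - offset
            toggles.append(tick)
    return toggles

# MAX_VAL = 16, offset = 3, hence target_count = 13; step = 5.
print(modulo_toggles(5, 13, 13))
print(top_bit_toggles(5, 3, 4, 13, start=3))   # accumulator preloaded with offset
print(top_bit_toggles(5, 3, 4, 13))            # starting from zero, as in the pseudo-code
```

Preloading the accumulator with offset makes the two forms toggle on identical ticks. Starting from zero, as in the pseudo-code, only shifts the phase of the sequence and leaves the long-run inversion rate unchanged.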

The processor is required to calculate two integer values, step and offset, for transmission to the hardware subsystem. A simple, but not necessarily optimal, method for calculating their values for any particular input frequency f is as follows. The value of step is calculated using MAX_VAL as a rough approximation of target_count in Eq. 3:

$$ \mathrm{step} = \mathrm{floor}\left(\frac{\mathrm{MAX\_VAL} \cdot 2f}{F_{CLK}}\right), \tag{4} $$

where floor(x) denotes the integer part of x. Once step is calculated, the best value of target_count to approximate the output frequency fout to the desired frequency f is:

$$ \mathrm{target\_count} = \mathrm{round}\left(\frac{\mathrm{step} \cdot F_{CLK}}{2f}\right), \tag{5} $$

Therefore, offset is:

$$ \mathrm{offset} = \mathrm{round}\left(\mathrm{MAX\_VAL} - \frac{\mathrm{step} \cdot F_{CLK}}{2f}\right), \tag{6} $$

In any real-time implementation, the value of offset may need augmenting by a multiple of step in order to compensate for the clock cycles used to carry out the instructions for when the top bit is set.

MAX_VAL limits the accuracy as well as the lower bound of f. Its value must be such that the quantity (MAX_VAL·2f) is greater than or equal to $F_{CLK}$ for the lowest frequency f, so that step, as calculated in Eq. 4, is greater than zero for all frequencies in the range. The maximum frequency is obviously limited by $F_{CLK}$. MAX_VAL also dictates the maximum sequence length and hence the minimum time period over which the theoretical frequency precision can be achieved, this being MAX_VAL/$F_{CLK}$ seconds.

IV. HANDEL-C CODE

Implementation in Handel-C code of the algorithm described above starts with a construct defining the frequency output interface, specifying the FPGA pin connected to the output circuitry.

    interface bus_out() dig_out_pin(dig_output) with {data={"P167"}};

This statement implicitly declares a one-bit variable dig_output, the value of which will be mapped to the output pin every clock cycle. The function generating the frequency output is:

    void generate_freq(void) {
        while (1) {                              // infinite loop
            if (accum[24] != 0) {                // if target reached
                par {
                    accum &= 0xFFFFFF;           // switch off top bit
                    dig_output = ~dig_output;    // invert output
                }
                accum += offset_plus_two_steps;  // add offset plus 2 steps
            }
            else
                accum += step;                   // else just add one step
        }
    }

Here, MAX_VAL is $2^{24}$ and the accumulator is a 25-bit integer. The function continuously checks the accumulator’s top bit (bit index goes from 0 to 24) in an if-else statement. Handel-C has a simple rule for timing: assignment, input and output, and delay statements are all executed in exactly one clock cycle; all other statements and expressions require no clock cycles. It can be seen that while the top bit of the accumulator is not set, it is incremented by step every clock cycle. The other branch of the if statement, when the top bit has been set, requires two clock cycles: one for the two parallel assignments (resetting the top bit of the accumulator, which is equivalent to subtracting MAX_VAL, and inverting the output), and the second to add the constant offset to the accumulator. As this branch of the if statement takes two clock cycles, the value added is offset as defined in Eq. 6, plus twice the value of step, defined in Eq. 4.

Corresponding code for reading an input frequency is as follows:

    interface bus_clock_in (int 1) dig_in_pin() with {data={"P33"}};

    void read_freq(void) {
        while (1) {
            if (set_frequency_input_data_req) {  // if processor request
                par {
                    // send counts
                    tick_per_cycle_sent = tick_per_cycle;
                    cycle_count_sent = cycle_count;
                    // indicate task completed
                    set_frequency_input_data_req = 0;
                    // reset all counters
                    tick_per_cycle = 0;
                    cycle_count = 0;
                    // except current-cycle ticks counter
                    current_cycle_tick_count++;
                }
            }
            else {                               // check for input rising edge
                par {
                    if ((last_dig_in_pin == 0) & (dig_in_pin.in == 1)) {  // rising edge
                        par {
                            // increment cycle counter
                            cycle_count++;
                            tick_per_cycle += current_cycle_tick_count;
                            current_cycle_tick_count = 1;
                        }
                    }
                    else
                        current_cycle_tick_count++;
                }
            }
        }
    }

The key idea here is that instead of simply giving the number of edges since the last enquiry, the FPGA also reports the exact number of clock ticks between these edges, thus giving a more precise estimate of the input frequency. This function runs continuously, checking for a rising edge in the frequency input. When an edge is detected, the cycle counter is incremented, while the ticks-per-cycle counter is updated, resetting the current-cycle-ticks counter to 1. Otherwise, the current-cycle-ticks counter is incremented. The ticks counters are 32-bit unsigned integers, while the cycle counter is a 16-bit unsigned integer. To prevent overflow, the processor should regularly read the counters, sending a request to get the counters from the FPGA. When this request is received, the values of the cycle and tick-per-cycle counters are transferred to the processor and the counters are reset; all these operations are performed in a parallel statement. This is possible because in an assignment statement, the variables on the right hand side take their values at the end of the last clock cycle. The current-cycle-tick counter is updated while attending a processor request, as the parallel block also takes one clock cycle. At the processor level, the value of the input frequency is calculated as:

$$ f_{in} = \frac{F_{CLK} \cdot \mathrm{cycle\_count}}{\mathrm{tick\_per\_cycle}}, \tag{7} $$

The frequency input range is defined by the FPGA clock frequency and the size of the counters. In this case, the maximum frequency is $0.5\,F_{CLK}$, while the minimum value is in theory $2^{-32} F_{CLK}$. The algorithm presented above relies on the processor to prevent the counters from overflowing. A practical range of 1 Hz – 100 kHz requires an update rate of at least 1.5 Hz to prevent the overflow of the cycle counter when the input frequency is at its maximum value, but it will read zero every other update when the input frequency is at its minimum value. In order to prevent zero readings at low frequency, the minimum value should be greater than the processor’s update rate.

Functions to communicate with the processor and to deal with its requests complete the design. Initialisation of variables is done prior to the execution of the rest of the functions, which run in parallel in the design’s main function.

    void main(void) {
        do_initialise();           // initialize variables

        // main loops
        par {
            record_last_input();
            read_freq();
            generate_freq();
            do_control();          // respond to PC requests
            do_pc104();            // interface with PC
        }
    }
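The timing rule described for generate_freq in this section (one clock cycle per assignment, so the top-bit branch costs two cycles) can be checked with a cycle-level Python model. This is a sketch rather than a translation of the Handel-C: the function name, the `compensate` switch and the small 8-bit MAX_VAL are illustrative assumptions.

```python
def generate_freq_model(step, offset, top_bit, n_clocks, compensate=True):
    """Count output inversions produced in about n_clocks clock cycles."""
    max_val = 1 << top_bit
    # Value added on the second clock of the wrap branch.
    wrap_add = offset + 2 * step if compensate else offset
    accum, inversions, clock = 0, 0, 0
    while clock < n_clocks:
        if accum & max_val:          # accum[top_bit] != 0
            accum &= max_val - 1     # clock 1: clear top bit ...
            inversions += 1          # ... and invert output, in parallel
            accum += wrap_add        # clock 2: add offset (plus two steps)
            clock += 2
        else:
            accum += step            # single-clock branch
            clock += 1
    return inversions

# MAX_VAL = 256, offset = 6 (target_count = 250), step = 5.
# The ideal rate is step inversions per target_count clocks.
n_clocks = 250 * 40
print("ideal         :", 5 * 40)
print("compensated   :", generate_freq_model(5, 6, 8, n_clocks))
print("uncompensated :", generate_freq_model(5, 6, 8, n_clocks, compensate=False))
```

In this model the compensated count stays within one inversion of the ideal value, whereas adding only offset loses two steps on every wrap and lowers the output frequency.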

When the code presented in this section is compiled with Celoxica’s DK, the estimated number of NAND gates for the frequency functions is 2,715 and of flip-flops is 90, with no memory usage. The total design including the interface with the processor uses an equivalent of about 6,000 gates when targeting a Xilinx Spartan IIE FPGA, with a clock speed up to 50 MHz. With relatively minor changes to the code, multiple frequency input and output channels can be implemented while maintaining high clock speeds.

V. EXPERIMENTAL RESULTS

The dithering approach for frequency generation was tested for several frequencies in the range 2 Hz – 20 kHz. The output frequency was measured using a digital frequency meter (Agilent Universal Counter 53131A) with 10-digit precision, a gate time of 2 s, and 10 measurements for the statistics. Simulation and experimental results for 10 kHz are presented in this section. Small variations in the desired frequency, of the order of 1 part in 10 million, were introduced to test the precision of the method. Table I shows the values of the optimal parameters, step, target_count and offset for the desired frequency, using a 40 MHz clock and with MAX_VAL chosen to be $2^{24}$. The theoretical values for the frequency output are given, computed as:

$$ f_{out} = \frac{\mathrm{step} \times F_{CLK}}{2 \times \mathrm{target\_count}}, \tag{8} $$

along with the value generated during simulation as the average over a window with step number of output cycles.

TABLE I. OPTIMAL PARAMETERS FOR DESIRED FREQUENCY f, AND THEORETICAL AND SIMULATION VALUES OF FREQUENCY OUTPUT fout

Required f (Hz)   step   target_count   offset   Theoretical fout (Hz)   Simulated fout (Hz)   Quantization error (µHz)
9,999.997         8388   16776005       1211     9,999.997,020           9,999.997,020         -20
9,999.998         8388   16776003       1213     9,999.998,212           9,999.998,212         -212
9,999.999         8388   16776001       1214     9,999.998,808           9,999.998,808         192
10,000.000        8388   16775999       1216     10,000.000,000          10,000.000,000        0
10,000.001        8388   16775997       1218     10,000.001,192          10,000.001,192        -192
10,000.002        8388   16775996       1219     10,000.001,788          10,000.001,789        212
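The parameter selection of Eqs. 4 to 6 and the theoretical output of Eq. 8 can be sketched in Python with exact rational arithmetic. The function name and the use of `fractions.Fraction` are assumptions of this sketch, not the processor code used by the authors; the defaults are the 40 MHz clock and MAX_VAL = 2^24 used for Table I.

```python
from fractions import Fraction
from math import floor

def dither_parameters(f_hz, f_clk_hz=40_000_000, max_val=2**24):
    """Return (step, target_count, offset, f_out) for a requested frequency."""
    f = Fraction(f_hz)                                    # accepts e.g. "9999.997"
    step = floor(Fraction(max_val) * 2 * f / f_clk_hz)    # Eq. 4
    target_count = round(step * Fraction(f_clk_hz) / (2 * f))   # Eq. 5
    # Eq. 6: MAX_VAL is an integer, so rounding before or after the
    # subtraction differs only for exact ties.
    offset = max_val - target_count
    f_out = Fraction(step * f_clk_hz, 2 * target_count)   # Eq. 8
    return step, target_count, offset, f_out

for f in ("9999.997", "10000.000", "10000.002"):
    step, target_count, offset, f_out = dither_parameters(f)
    print(f, step, offset, f"{float(f_out):.6f}")
```

For a request of 10 kHz the sketch returns step = 8388 and offset = 1216, and for 9,999.997 Hz it returns offset = 1211 with an output of about 9,999.997,020 Hz.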

TABLE II. EXPERIMENTAL RESULTS

Required f (Hz)   Average fout (Hz)   Standard deviation (µHz)   Frequency error (µHz)   Frequency error (%)
9,999.997         9,999.996,822,04    49.16                      177.96 ± 98.32          (1.8 ± 1.0)×10^-6
9,999.998         9,999.998,076,21    38.25                      -76.21 ± 76.50          (7.6 ± 0.8)×10^-7
9,999.999         9,999.999,379,71    25.23                      -379.71 ± 50.46         (-3.8 ± 0.5)×10^-6
10,000.000        10,000.000,203,8    21.21                      -203.8 ± 42.4           (-2.0 ± 0.4)×10^-6
10,000.001        10,000.001,297,0    27.59                      -297.0 ± 55.2           (-3.0 ± 0.6)×10^-6
10,000.002        10,000.002,430,7    34.65                      -430.7 ± 69.3           (-4.3 ± 0.7)×10^-6

Table I also shows the precision error due to the selection of integer values for step and offset. It can be seen that this error is always less than half of the quantization step, which at 10 kHz and for the chosen value of MAX_VAL is ±0.0006 Hz. For the experimental tests, the instrument error (typically < ±0.0720 Hz) and the FPGA crystal frequency error, mainly caused by slow temperature drift, were compensated for by adjusting the assumed value of $F_{CLK}$ such that fout is as close as possible to f at 10 kHz, which in theory has a zero quantization error. The results can be found in Table II, including the mean and standard deviation of the real frequency output fout, as well as the absolute and the relative error with respect to f, computed with 2σ confidence. For the given value of MAX_VAL, the theoretical precision is $6 \times 10^{-6}$ % (∆f / f ≈ 1 / MAX_VAL = 0.000,000,06 = $6 \times 10^{-8}$), which is achieved in the experimental data.

VI. CONCLUSIONS

Measurements taken under normal laboratory conditions show that the dithering technique for digital frequency generation achieves a high precision, of the order of $10^{-7}$. This precision, constant over the whole frequency range, depends in theory only on MAX_VAL; in practice, however, the true accuracy is limited by the stability of the frequency reference. Temperature fluctuations are the main contributors to crystal clock instabilities.

VII. REFERENCES

[1] Yurish, S.Y., Kirianaki, N.V., Pallàs-Areny, R., “Universal frequency-to-digital converter for quasidigital and smart sensors: specifications and applications”, Sensor Review, 25(2), 2005, pp. 92-99.
[2] Zamora, M., Henry, M., Peter, C., “Generation of frequency output for instrumentation applications using digital hardware”, Sensor Review, 23(2), 2003, pp. 143-9.
[3] Celoxica, Data sheet: “Handel-C for hardware design”, available: http://www.celoxica.com, May 9, 2006.
[4] Henry, M.P., “Keynote Paper: Hardware compilation - a new technique for rapid prototyping of digital systems applied to sensor validation”, Control Engineering Practice, 3(7), 1995, pp. 907-924.
[5] Tombs, M.S., Henry, M.P., Peter, C., “From research to product using a common development platform”, Control Engineering Practice, 12(4), 2004, pp. 503-510.
[6] Stein, S.R., “Frequency and time, their measurement and characterization”, in Gerber, E.A. and Ballato, A. (Eds.), Precision frequency control, Academic Press, New York: 1985, pp. 191-416.
[7] Kirianaki, N.V., Yurish S.Y., Shpak N.O., “Methods of dependent count for frequency measurements”, Measurement, 29(1), 2001, pp. 31-50.
[8] Zhou, W., Li, Z., Zhou, H., “A practical method to process time and frequency signal”, IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control, 47(2), 2000, pp. 480-3.
[9] Henry, M., “Frequency output generation through alternating between selected frequencies”, US Patent 6,914,463, Jul. 5 2005.
