REAL TIME. DIGITAL. SIGNAL. PROCESSING www.electron.frba.utn.edu.ar/
dplab ..... RG Lyons, Understanding Digital Signal Processing. 2nd ed. Prentice
Hall ...
Eng. Julian S. Bruno
Introduction
REAL TIME DIGITAL SIGNAL PROCESSING
Why Digital? A brief comparison with analog.
www.electron.frba.utn.edu.ar/dplab
UTN - FRBA 2011
Simposio Argentino de Sistemas Embebidos UTN - FRBA 2011
Advantages
Eng. Julian S. Bruno
The BIG picture
Flexibility. Easily modifiable and upgradeable. Reproducibility. p y Don’t depend p on components tolerance. Exactly reproduced from one unit to other. Reliability. No age or environmental drift. Comple it Allows Complexity. Allo s sophisticated applications in only one chip.
D t Data Real time algorithms Results FAST Real Time DSP System
UTN - FRBA 2011
Eng. Julian S. Bruno
UTN - FRBA 2011
Eng. Julian S. Bruno
Sampling signals: A very important fifirst step.
Sampling low-pass low pass signals (CT) The sampling theorem indicates that a continuous signal can be properly sampled, only if it does not contain frequency components above one-half of the sampling rate. t
Fc
N
LF
Fs MIPS MFLOPS
Real Time DSP System UTN - FRBA 2011
Nyquist sampling theorem Eng. Julian S. Bruno
Aliasing and frequency ambiguity
UTN - FRBA 2011
IF sampling Harmonic sampling Sub-Nyquist sampling p g Undersampling
2 fc B 2f B fs c m 1 m
Eng. Julian S. Bruno
Eng. Julian S. Bruno
Sampling band-pass band pass signals
Fs = 6KHz
UTN - FRBA 2011
fS 2 fN
UTN - FRBA 2011
for any positive integer m m, where fs ≥ 2B is accomplished. Eng. Julian S. Bruno
Sampling band-pass band pass signals m
(2Fc-B)/m
(2Fc-B)/(m+1)
Optimum Fs
1
35.0 MHz
22.5 MHz
22.5 MHz
2
17.5 MHz
15.0 MHz
17.5 MHz
3
11.66 MHz
11.25 MHz
11.25 MHz
4
8 75 MH 8.75 MHz
9 0 MH 9.0 MHz
-
5
7.0 MHz
7.5 MHz
-
Reconstruction signals
Optimum Fs is defined here p frequency q y as that optimum where spectral replications do no butt up against each other except at zero Hz
Real Time DSP System
UTN - FRBA 2011
Eng. Julian S. Bruno
Reconstruction signals
Analog signal X can be reconstructed from its samples by using the f ll i formula: following f l The reconstruction is based on the interpolation of shifted sinc functions. functions. It is very difficult to generate sinc i functions f ti by b electronic l t i circuitry. An approximation of a sinc function is a pulse. Sample and hold circuit performs this approximation approximation.
UTN - FRBA 2011
Eng. Julian S. Bruno
Reconstruction Errors
t kTS X (t ) X (kTS ) sinc k TS
UTN - FRBA 2011
Sample and hold circuits
The gain in the desired central band is not constant The are high-frequency replica of the signal spectrum Eng. Julian S. Bruno
UTN - FRBA 2011
Eng. Julian S. Bruno
Reconstruction Solutions
Reconstruction Errors Example
The gain in the desired central band is not constant It is p possible to compensate p for this non-ideality y byy using an inverse filter as part of the DSP component
The are high-frequency replica of the signal spectrum which can be removed by using a lowpass filter UTN - FRBA 2011
Eng. Julian S. Bruno
Real Time constraints
UTN - FRBA 2011
Eng. Julian S. Bruno
Real time constraints
Signal Path
Algorithms time (tA) MUST fit between two consecutive sampling periods (tS). Thus tA limits the maximum frequency that a system can work. The definition of real time is VERY application y ) dependant ((faster speed of evolution of the system).
Real Time DSP System UTN - FRBA 2011
Eng. Julian S. Bruno
UTN - FRBA 2011
Eng. Julian S. Bruno
Real time constraints
DSP hardware
Block Bl k Processing P i Mode M d 4 memory buffers of length N are required f double-buffering for ff method. 2 memoryy buffers (in ( and out) are needed for internal processing by the processor.
A delay of 2NTs is incurred in block processing. More complicated programming is needed to manage the switching between buffers. Can be configured the ADC and DAC to transfer data samples into the internal memory of processor using the serial ports and the DMA.
UTN - FRBA 2011
Real Time DSP System Eng. Julian S. Bruno
UTN - FRBA 2011
What can we do with a DSP?
Almost any linear and nonlinear system (PID controller). Digital filters (FIR-IIR). Adaptive systems (LMS algorithm) algorithm). Modulators and demodulators. Any mathematical intensive algorithm (FFTDCT-WT). Image compression algorithms Real time video processing
UTN - FRBA 2011
Eng. Julian S. Bruno
Eng. Julian S. Bruno
Linear systems implementation
Being x(n) and h(n) are arrays of numbers. If we want to compute y(n) we have to multiply and sum the last M samples being M the length of h(n) samples, h(n). This repeated for every new sample received from de ADC. As you can see, any linear system uses multiplications, accumulations ( (sums), ) and d lloops iintensively. t i l
UTN - FRBA 2011
Eng. Julian S. Bruno
Summary of desirable features of a DSP
Fast Fourier Transform FFT
UTN - FRBA 2011
Eng. Julian S. Bruno
So those are DSP math features So,
Multiply M lti l and dA Accumulators l t (MAC’s) units. ALU’s ALU s (fixed and floating point). Barrel shifters. Depending on DSP application, more than one unit it are presentt in i modern d DSP’s, allowing parallelism. Harvard (modified) architecture provide multiple operations per cycle.
UTN - FRBA 2011
Fastt in F i mathematics th ti operations, ti and d combinations of them (multiply and sum specially). i ll ) Flexible addressing modes (bit reversal, circular buffers, ff zero overhead loops)) DSP specific instruction set ((arithmetic shifting, g saturating arithmetic, rounding, normalization) Minimum overhead p peripherals p ((communications devices specially) DSP instructions for specific applications (Video, Control, Audio)
UTN - FRBA 2011
Architectural Features for Efficient P Programming i
Eng. Julian S. Bruno
Eng. Julian S. Bruno
Specialized p addressing g modes Hardware Loop Constructs Cacheable memories Multiple operations per cycle Interlocked pipeline Another important features
UTN - FRBA 2011
Eng. Julian S. Bruno
Architectural Features for Efficient P Programming i
Specialized Addressing Modes Circular buffering
Specialized p addressing g modes Hardware Loop Constructs Cacheable memories Multiple operations per cycle Interlocked pipeline Another important features
UTN - FRBA 2011
B0 = 0x00; L0 = 0; // Base and length
B0 = 0x00; L0 = 44; // Base and length I0 = 0x00; M0 = 16; // Index and increment R0 = [I0++M0]; // R0=1 & I0=0x10 R0 = [I0++M0]; // R0=5 & I0=0x20 R0 = [I0++M0]; // R0=9 & I0=0x04 R0 = [I0++M0]; [I0 M0] // R0 R0=2 2 & I0=0x14 I0 0 14 R0 = [I0++M0]; // R0=6 & I0=0x24 Eng. Julian S. Bruno
Architectural Features for Efficient P Programming i
Specialized p addressing g modes Hardware Loop Constructs Cacheable memories Multiple operations per cycle Interlocked pipeline Another important features
UTN - FRBA 2011
I0=0; M0=1; // Index and increment I2=256; P0 = 8; LOOP(start, end) LC0 = P0; start: // I0 automatically incremented in B-R progression R0 = [I0] || I0 += M0 (BREV); end: // I2 point to bit bit-riversed riversed buffer [I2++] = R0;
UTN - FRBA 2011
Eng. Julian S. Bruno
Hardware Loop Constructs
Bit-Riversal
Looping is a critical feature in communications processing algorithms. There are two key looping-related features that can improve p p performance on a wide variety y of algorithms: “zero-overhead zero overhead
hardware loop” loop “hardware loop buffers”
Eng. Julian S. Bruno
UTN - FRBA 2011
Eng. Julian S. Bruno
Architectural Features for Efficient P Programming i
Cacheable memories
Specialized p addressing g modes Hardware Loop Constructs Cacheable memories Multiple operations per cycle Interlocked pipeline Another important features
UTN - FRBA 2011
Eng. Julian S. Bruno
UTN - FRBA 2011
Architectural Features for Efficient P Programming i
Specialized p addressing g modes Hardware Loop Constructs Cacheable memories Multiple operations per cycle Interlocked pipeline Another important features
Eng. Julian S. Bruno
Multiple operations per cycle
Today’s high-speed processors would effectivelyy run at much slower speeds because larger applications would only fit in slower external memory. Programmers would be forced to manuallyy move key code in and out of internal SRAM. Adding g data and instruction caches into the architecture, external memory becomes much more manageable. bl
In addition to performing multiple ALU/MAC operations each core processor cycle, additional data loads and stores can also be completed in the same cycle. y The memory is typically portioned into sub-banks sub banks that can be dualaccessed by the core and optionally p y by a DMA controller. There are two multi-issue architectures: VLIW and superscalar
UTN - FRBA 2011
Eng. Julian S. Bruno
UTN - FRBA 2011
Eng. Julian S. Bruno
Architectural Features for Efficient P Programming i
Specialized p addressing g modes Hardware Loop Constructs Cacheable memories Multiple operations per cycle Interlocked pipeline Another important features
Interlocked pipeline
UTN - FRBA 2011
Eng. Julian S. Bruno
Architectural Features for Efficient P Programming i
UTN - FRBA 2011
Specialized p addressing g modes Hardware Loop Constructs Cacheable memories Multiple operations per cycle Interlocked pipeline Another important features
UTN - FRBA 2011
Eng. Julian S. Bruno
Another important features
I order In d tto increase i th throughput, h t DSPs DSP are d designed i d tto be pipelined When assembly programming is required required, the pipeline can make programming more challenging. The processor automatically handles stalls and bubbles.
RISC lik like registers i t and d iinstruction t ti sett Multiple data/program buses. DMA controller for handling peripherals In traditional fixed fixed-point point DSPs, word sizes are usually fixed. However, there is an advantage to having data registers that can be treated as: One
64-bit word Two 32 32-bit bit word Four 16-bit word Eight 8-bit word Eng. Julian S. Bruno
UTN - FRBA 2011
Eng. Julian S. Bruno
DSP clasification
Why DSP hardware?
Fixed or Floating point arithmetic. Millions of multiply–accumulate py operations p p per second, MMACs. Millions of floating floating-point point operations per second, MFLOPS. Application specific feat features res ((video, ideo a audio, dio control, communications). Memory
UTN - FRBA 2011
Eng. Julian S. Bruno
TI Processors
Special-purpose (custom) chips such as application-specific integrated circuits (ASIC). Field-programmable Field programmable gate arrays (FPGA). General-purpose microprocessors or microcontrollers (μP/μC). General-purpose digital signal processors (DSP processors). DSP processors with application-specific hardware (HW) accelerators.
UTN - FRBA 2011
Eng. Julian S. Bruno
C5000™ DSP Platform Roadmap C5000
C6000™ High Hi h Performance P f DSP DSPs Ideal for imaging, broadband infrastructure and performance audio applications.
C6000™ Performance Value DSPs C6000 Ideal for broadband infrastructure and performance audio applications. Lower cost.
C6000™ Floating-point DSPs Ideal for professional audio products, biometrics, medical, industrial, digital imaging, speech recognition, conference phones and voice voice-over over packet
C5000™ Power-Efficient DSPs Optimized for power- and cost-efficient embedded signal processing solutions
C2000™ 32-bit Real-time MCUs Optimized core can run multiple complex control algorithms at speeds necessary for f demanding d di control t l applications li ti
UTN - FRBA 2011
Eng. Julian S. Bruno
UTN - FRBA 2011
Eng. Julian S. Bruno
TI’ss ARM Processor TI Processor-Based Based
C6000™ DSP Platform Roadmap C6000
Sitara™ ARM Microprocessors
Stellaris® MCU
Eng. Julian S. Bruno
ARM9-based devices. LP, general-purpose, multimedia and graphics processing ARM® Cortex™-A8 Cortex™ A8 core C64x+™ DSP and Video Accelerators 3-D Graphics Acceleration
DaVinci™ Digital Media Processors
UTN - FRBA 2011
ARM Cortex-M3 Cl k Speed: Clock S d Up U to t 100 MHz MH Up to 125 MIPS (at 100 MHz) Advanced integration: Serial interfaces, motion control, system, analog
OMAP™ Applications A li ti P Processors
Cortex™-A8 and ARM9™-based embedded microprocessors Clock Speed: 300 MHz to 1.5 GHz 3D Graphics Accelerator and Power technology (OMAP)
Optimized for digital video systems only, ARM9 + DSP and DSP only.
ARM9 UTN - FRBA 2011
Eng. Julian S. Bruno
ADI Processors
TI’ss ARM Processor TI Processor-Based Based Products
TigerSHARC® Processors
SHARC® Processors
32-Bit floating-point Clock Speed: 150MHz to 400MHz / 2.4 GFLOPs. Accelerator Architecture: FIR, IIR, FFT.
Blackfin® Processors
32-bit fixed-point as well as floating-point Clock Speed: 250MHz to 600MHz 4.8 GMACs of 16-bit 16 bit performance / 3.6 GFLOPs 24 Mbits of on- chip memory 5 Gbytes of I/O bandwidth
16/32-bit fixed point Clock Speed: p 200MHz to 756MHz / 1.5 GMACs Very low power consumption: 0.23mW/Mhz RTOS supported. Multicore 600MHz / 2.4 GMACs.
ADSP-21xx Processors
16/32-bit fixed point Clock Speed: 75MHz to 160MHz Analog Devices brought first programmable processor to market in 1986 UTN - FRBA 2011
UTN - FRBA 2011
Eng. Julian S. Bruno
Eng. Julian S. Bruno
ADSP-21xx Processors
Blackfin Processors
ADSP-2191 BLOCK DIAGRAM
ADSP-BF536/ADSP-BF537 BLOCK DIAGRAM
UTN - FRBA 2011
Eng. Julian S. Bruno
UTN - FRBA 2011
SHARC Processors
TigerSHARC g Processor
ADSP-2146x BLOCK DIAGRAM
ADSP-TS201S BLOCK DIAGRAM
UTN - FRBA 2011
Eng. Julian S. Bruno
UTN - FRBA 2011
Eng. Julian S. Bruno
Eng. Julian S. Bruno
Markets and Applications
Recommended bibliography
RG L Lyons, U Understanding d t di Di Digital it l Si Signall P Processing i 2nd ed. Prentice Hall 2004.
SW Smith, The Scientist and Engineer’s guide to DSP. California Tech. Pub. 1997.
UTN - FRBA 2011
Eng. Julian S. Bruno
Questions? Thank you! Eng. Julian S. Bruno
UTN - FRBA 2011
Ch1: The Breadth and Depth of DSP Ch3: ADC and DAC
SM Kuo, K BH L Lee. R Real-Time l Ti Digital Di it l Si Signall P Processing i 2nd ed. John Wiley and Sons. 2006
Ch2: Periodic Sampling
Ch1:Introduction to Real Real-Time Time Digital Signal Processing
NOTE: Many images used in this presentation were extracted from the recommended bibliography.
UTN - FRBA 2011
Eng. Julian S. Bruno