Performance Enhancement on Digital Signal Processors with

IEICE TRANS. FUNDAMENTALS, VOL.E82–A, NO.2 FEBRUARY 1999

238

PAPER

Special Section on VLSI for Digital Signal Processing

Performance Enhancement on Digital Signal Processors with Complex Arithmetic Capability

Yoshimasa NEGISHI†, Student Member, Eiji WATANABE†, Akinori NISHIHARA††, and Takeshi YANAGISAWA†, Members

SUMMARY Digital Signal Processors with complex arithmetic capability (DSP-C) are useful for various applications. In this paper, we propose a method for the efficient implementation of specific circuits with real coefficients on a DSP-C. A DSP-C has special hardware, such as a complex multiplier, so that a complex calculation can be performed with a single instruction. First, we show that nodes with two real-coefficient input branches can be implemented by complex multiplications. We apply this implementation to 2D circuits and transversal circuits with real coefficients. Next, to process these circuits efficiently, we introduce a new computational mode (advanced mode) and a new multiplier into PSI, a previously proposed DSP-C. Finally, the effectiveness of the proposed method is shown by simulation.

key words: digital signal processor, digital signal processing, complex signal processing, digital filter, complex multiplier

1. Introduction

Recently, many application-specific digital signal processing systems have been designed for multimedia applications. They can be classified into two major categories: Application Specific Integrated Circuit (ASIC) based [1] and Digital Signal Processor (DSP) based [2]–[4], [6], [10]. ASIC-based solutions provide the most highly optimized systems but require a higher initial cost. When a lower initial cost is demanded, DSPs are often adopted in such systems. The problem with DSP-based systems is that their processing capability is lower than that of ASIC-based systems. To improve it, the following approaches have been proposed: designing a DSP with a new architecture [2]–[4], using multiprocessors [6], and adding special functions to a conventional DSP [10].

Application-specific digital signal processing systems sometimes need complex arithmetic computations, for instance, in systems for fractal problems [8], complex digital filters [9], and the FFT algorithm. If complex arithmetic computations are performed on conventional DSPs (DSP-R), the program becomes complicated because the real and imaginary parts must be manipulated separately. Although an effective method of processing complex arithmetic on a DSP-R has been proposed in [5], it does not ultimately simplify the program. In such cases, a DSP with complex arithmetic capability (DSP-C) is useful. A DSP-C has a dedicated adder and multiplier for complex arithmetic, so it can perform complex operations in the same way that a DSP-R performs real calculations. Despite this advantage, DSP-Cs have not yet been widely used. For wider use, functions other than complex arithmetic are desirable. In this paper, we propose a new function that is useful for 2D circuits [7] and transversal circuits with real coefficients.

PSI, one such DSP-C, has been proposed in [10]. It has a double-clocked real multiplier based on Booth's algorithm for efficient complex multiplication. It also has two computational modes, called real mode and complex mode: the multiplier works as a real multiplier in real mode and as a complex multiplier in complex mode. However, due to the lack of bus bandwidth, the multiplier is not fully utilized in real mode. Generally speaking, most DSP-Cs have much greater computational capability than DSP-Rs, because complex arithmetic requires more calculations than real arithmetic. However, this capability is not fully used in real arithmetic because of difficulties such as a lack of bus bandwidth. If the DSP-C multiplier could be fully used in real arithmetic, higher performance in real arithmetic could be obtained.

In this paper, we describe a new method of implementing real circuits on a DSP-C and an improved multiplier for PSI that obtains higher performance in real arithmetic. First, we show a method to implement real circuits, such as the 2D type and the transversal type, using complex arithmetic. With this method, a node with two input branches can be calculated by one complex multiplication. However, if the original complex multiplier of PSI is used, such a complex multiplication does not lead to any performance enhancement. Next, we point out the timing problem of the PSI multiplier [10] and propose an improved multiplier suitable for realizing our method. Finally, we confirm by simulation that a 2D circuit and a transversal circuit can be implemented up to twice as fast.

Manuscript received July 6, 1998. Manuscript revised September 7, 1998.
† The authors are with the Faculty of System Engineering, Shibaura Institute of Technology, Omiya-shi, 330–8570 Japan.
†† The author is with the Center for Research and Development of Educational Technology, Tokyo Institute of Technology, Tokyo, 152–8552 Japan.

NEGISHI et al: PERFORMANCE ENHANCEMENT ON DIGITAL SIGNAL PROCESSORS


2. The Features of the PSI

A complex number must be divided into two operands on a DSP-R, one for the real part and the other for the imaginary part, because a DSP-R cannot handle a complex number as one operand. When complex additions are performed on a DSP-R, the real and imaginary parts are added separately with special address modification. A complex multiplication requires four real multiplications and two real additions with special address modification. On the other hand, the PSI has both a real computational mode and a complex computational mode [10]. In complex mode, the PSI can handle a complex number as a single operand, just as a DSP-R handles a real number. This means that both additions and multiplications are processed in the same way, without special address modification. Address generation is controlled automatically by the address computation unit (ACU): the address is modified by ±1 in real mode and by ±2 in complex mode.

There are three types of instruction cycles. One real-mode instruction cycle is equal to one system instruction cycle (160 ns), which is the base cycle of the PSI. One complex-mode instruction cycle consists of two system instruction cycles (320 ns). The multiplier is double-clocked so that two real multiplications can be completed in a system instruction cycle; that is, one system instruction cycle contains two multiplier cycles (80 ns each). In complex mode, every operation for complex arithmetic is completed in one complex-mode instruction cycle (320 ns). For complex addition, there is no difference in execution time between real mode (160 ns × 2) and complex mode (320 ns). For complex multiplication, however, the execution time in complex mode is half that in real mode owing to the double-clocked multiplier. Since complex addition and complex multiplication then take the same time, complex multiply-accumulate operations can proceed continuously.

3. Using Complex Arithmetic in Real Circuits

In this section, we describe two examples of using complex arithmetic to implement real circuits. The method applies to circuits having nodes with two or more input branches.

3.1 2D Circuit

The product of two complex numbers can be written as

$$(X_R + jX_I)(Y_R + jY_I) = (X_R Y_R - X_I Y_I) + j\,(X_R Y_I + X_I Y_R) \tag{1}$$

where $X_R$, $Y_R$ are the real parts and $X_I$, $Y_I$ are the imaginary parts of the operands.

Fig. 1 2D circuit with real coefficients.

On the other hand, the 2D circuit shown in Fig. 1 is represented by the following equations:

$$y(n) = a_0 x(n) + C_1(n-1) \tag{2}$$

$$C_1(n) = a_1 x(n) - b_1 y(n) + C_2(n-1) \tag{3}$$

$$C_2(n) = a_2 x(n) - b_2 y(n) \tag{4}$$

where $x(n)$ is the input, $y(n)$ is the output, and $C_i(n)$ are temporary variables. Since Eq. (4) has the same form as the real part of Eq. (1), Eq. (4) can be calculated as the real part of the product of two complex numbers, $a_2 + jb_2$ and $x(n) + jy(n)$. Equation (3) can be obtained in the same way: $a_1 x(n) - b_1 y(n)$ is calculated as the real part of a complex product, and the result of Eq. (3) is then generated by adding $C_2(n-1)$ to it. This process can be achieved by a multiply-accumulate operation. Equation (2) can be calculated as the real part of the product of $a_0 + jC_1(n-1)$ and $x(n) - j$, since $\mathrm{Re}[(a_0 + jC_1(n-1))(x(n) - j)] = a_0 x(n) + C_1(n-1)$. Using this method, a 2D circuit can be implemented in the following steps:

i. $y(n) \leftarrow \mathrm{Re}[(a_0 + jC_1(n-1)) \cdot (x(n) - j)]$
ii. $C_1(n) \leftarrow \mathrm{Re}[(a_1 + jb_1) \cdot (x(n) + jy(n))]$
iii. $C_1(n) \leftarrow C_1(n) + C_2(n-1)$, $C_2(n) \leftarrow \mathrm{Re}[(a_2 + jb_2) \cdot (x(n) + jy(n))]$

where Re[·] denotes the real part of the product.

3.2 Transversal Circuit

Fig. 2 Transversal circuit with real coefficients.

The transversal circuit shown in Fig. 2 is represented by the following equation:

$$y(n) = \sum_{i=0}^{M} h_i\, x(n-i) \tag{5}$$

If $M$ is odd, Eq. (5) can be rewritten as

$$\begin{aligned} y(n) &= h_0 x(n) + h_1 x(n-1) + \cdots + h_{M-1} x(n-M+1) + h_M x(n-M) \\ &= h_0 x(n) - H_1 x(n-1) + \cdots + h_{M-1} x(n-M+1) - H_M x(n-M) \end{aligned} \tag{6}$$

where $H_1 = -h_1$, $H_3 = -h_3$, $\ldots$. Every pair of terms in Eq. (6) has the same form as the real part of Eq. (1), with the odd-indexed coefficients ($h_1, h_3, \ldots$) replaced by their sign-reversed values ($H_1, H_3, \ldots$); for example, $h_0 x(n) - H_1 x(n-1) = \mathrm{Re}[(h_0 + jH_1)(x(n) + jx(n-1))]$. Hence Eq. (6) can also be calculated as a sum of the real parts of complex products. If $M$ is even, Eq. (5) can be rewritten as

$$y(n) = h_0 x(n) + h_1 x(n-1) + \cdots + h_{M-2} x(n-M+2) + h_{M-1} x(n-M+1) + h_M x(n-M) \tag{7}$$

Although $h_M x(n-M)$ is left unpaired at the end of Eq. (7), it can be calculated by

$$h_M x(n-M) = h_M x(n-M) + 0 \cdot x(n-M-1) \tag{8}$$

where $x(n-M-1)$ can be any value. Thus, Eq. (5) can be calculated by complex multiplications regardless of the value of $M$. Using this method, the transversal circuit is implemented in the following steps, with $y(n)$ initially cleared to zero:

I. $P_0 \leftarrow \mathrm{Re}[(h_0 + jH_1)(x(n) + jx(n-1))]$
II. Repeat $(M-1)/2$ times ($i = 1, 2, \ldots, (M-1)/2$):
   $P_i \leftarrow \mathrm{Re}[(h_{2i} + jH_{2i+1})(x(n-2i) + jx(n-2i-1))]$
   $y(n) \leftarrow y(n) + P_{i-1}$
III. $y(n) \leftarrow y(n) + P_{(M-1)/2}$

where Re[·] denotes the real part of the product.

4. Timing Problem of the Conventional PSI Multiplier

We now compare the number of system instruction cycles needed for 2D and transversal circuits implemented by the conventional method and by the proposed one. With the conventional method, i.e., implementation on a DSP-R (or real-mode PSI), N stages of the 2D circuit are completed in 17 + 6(N − 1) system instruction cycles, and N stages of the transversal circuit in 11 + (N − 1) system instruction cycles [14]. All cycle counts are written in the α + β · f(N) format. The first term α accounts for initialization and for the operations required by the first stage of the circuit. The coefficient β of the second term is the minimum number of cycles per stage, which is achieved when the pipeline works fully. When the circuit has many stages, the total number of cycles is dominated by β; that is, a smaller β is better.

On the other hand, the proposed method for the 2D circuit, given as steps (i)–(iii), requires three complex-mode instruction cycles per stage when the pipeline works fully. Since a complex-mode instruction cycle is twice as long as a system instruction cycle, this amounts to 3 × 2 = 6 system instruction cycles, so the processing time in the pipeline is the same for the proposed and conventional methods. However, if each step were completed in one system instruction cycle, higher throughput could be obtained. The same holds for the transversal circuit.

Note that steps (i)–(iii) require only the real part of each product. As previously indicated, the multiplier works four times in a complex-mode instruction cycle (Table 1), because a complex multiplication requires four real multiplications: two for the real part and two for the imaginary part. Since our method does not use the imaginary part, these operations are wasted (shaded cells in Table 1). If other operations are computed in these slots, the real part of a product can be obtained in every system instruction cycle (Table 2). We introduce this mechanism as "advanced mode." To realize this mode, at least the following instructions must be introduced:
• an instruction for changing into advanced mode
• instructions for advanced-mode calculations
  – advmul *X0, *Y0, dist
  – advmulc *X0, *Y0, dist
Both advmul and advmulc calculate the real part of the product of the operands pointed to by X0 and Y0, but the two instructions differ slightly. advmul treats each operand pair as (a + jb), where a and b are the given operands.
advmulc treats the operand pair pointed to by X0 as (a + jb) and the operand pair pointed to by Y0 as (a − jb). To illustrate, suppose that operands A and B are pointed to by X0, and C and D by Y0. The instruction

advmul *X0, *Y0, dist

produces AC − BD and stores the result in dist, while the instruction

advmulc *X0, *Y0, dist

produces AC + BD and stores the result in dist. These instructions are very useful for circuits such as 2D circuits and transversal circuits. For example, in a 2D circuit, Eqs. (3) and (4) are calculated using advmul instructions, and Eq. (2) is calculated using an advmulc instruction.
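The behaviour of advmul and advmulc, and the claim that one advmulc plus two advmul per sample reproduce Eqs. (2)–(4), can be illustrated with a small Python sketch. It is a behavioural model under stated assumptions, not the PSI instruction set or the benchmark program of Fig. 5: the Python signatures, the tuple operand pairs (A, B) and (C, D), the zero initial state, and the coefficient values in the example are all hypothetical.

```python
# Behavioural sketch only: hypothetical Python signatures, not PSI machine code.

def advmul(x_pair, y_pair):
    """Real part of (A + jB)(C + jD), i.e. AC - BD."""
    a, b = x_pair
    c, d = y_pair
    return a * c - b * d


def advmulc(x_pair, y_pair):
    """Real part of (A + jB)(C - jD), i.e. AC + BD."""
    a, b = x_pair
    c, d = y_pair
    return a * c + b * d


def iir_2d_direct(x, a0, a1, a2, b1, b2):
    """Reference: Eqs. (2)-(4) evaluated with ordinary real arithmetic."""
    c1 = c2 = 0.0  # C1(n-1), C2(n-1); zero initial state assumed
    y = []
    for xn in x:
        yn = a0 * xn + c1              # Eq. (2)
        c1 = a1 * xn - b1 * yn + c2    # Eq. (3); c2 still holds C2(n-1)
        c2 = a2 * xn - b2 * yn         # Eq. (4)
        y.append(yn)
    return y


def iir_2d_advanced(x, a0, a1, a2, b1, b2):
    """One advmulc and two advmul per input sample."""
    c1 = c2 = 0.0
    y = []
    for xn in x:
        yn = advmulc((a0, c1), (xn, 1.0))    # a0*x(n) + C1(n-1), D = 1
        new_c1 = advmul((a1, b1), (xn, yn))  # a1*x(n) - b1*y(n)
        new_c1 += c2                         # + C2(n-1)
        c2 = advmul((a2, b2), (xn, yn))      # C2(n) = a2*x(n) - b2*y(n)
        c1 = new_c1
        y.append(yn)
    return y


if __name__ == "__main__":
    x = [1.0, 0.0, 0.0, 0.5, -1.0]
    coeffs = (0.5, 0.3, 0.2, -0.4, 0.1)  # a0, a1, a2, b1, b2 (arbitrary values)
    print(iir_2d_direct(x, *coeffs))
    print(iir_2d_advanced(x, *coeffs))
```

Both functions produce the same output sequence. The advmulc call realizes Eq. (2) by supplying the constant D = 1, which corresponds to the factor x(n) − j used in Section 3.1.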


Table 1 Timing of the conventional multiplier in complex mode.
(Four-stage pipelined Booth multiplier, Levels 1–4, one real product per 80 ns multiplier cycle. Operands XR, YR and XI, YI arrive over the L-bus and R-bus, and the pipeline computes XR·YR, XI·YI, XI·YR, XR·YI in turn, giving the real part RP = XR·YR − XI·YI and the imaginary part IP = XI·YR + XR·YI.)

Table 2 Timing of the proposed multiplier in advanced mode.
(The same four-stage pipeline computes XR·YR, XI·YI, xR·yR, xI·yI in turn, so two real-part results, such as XR·YR − XI·YI and xR·yR + xI·yI, are produced in the slots where Table 1 produces one real part and one imaginary part.)
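The slot counting behind Tables 1 and 2 can be sketched as follows. This is a simplified Python model, not a cycle-accurate description of the PSI multiplier: it assumes that one real product enters the four-stage pipeline per 80 ns multiplier cycle and that the last product leaves Level 4 three multiplier cycles after it is issued; the function name and the operand layout are hypothetical. In complex mode every operand pair costs four slots, two of which feed the unused imaginary part, whereas advanced mode issues only the two products needed for the real part.

```python
MULT_CYCLE_NS = 80    # two multiplier cycles per 160 ns system instruction cycle
PIPELINE_STAGES = 4   # Levels 1-4 of the Booth multiplier


def real_parts_with_timing(pairs, mode):
    """Compute Re[(XR + jXI)(YR + jYI)] for each operand pair and estimate
    the elapsed time under a simplified (assumed) pipeline model.

    pairs: list of ((XR, XI), (YR, YI)) tuples.
    mode:  "complex"  -> issue XR*YR, XI*YI, XI*YR, XR*YI per pair (Table 1)
           "advanced" -> issue only XR*YR, XI*YI per pair (Table 2)
    Returns (real_parts, products_issued, elapsed_ns).
    """
    if mode not in ("complex", "advanced"):
        raise ValueError("mode must be 'complex' or 'advanced'")
    issued = 0
    real_parts = []
    for (xr, xi), (yr, yi) in pairs:
        products = [xr * yr, xi * yi]
        if mode == "complex":
            # Imaginary-part products: computed but unused when only Re[] is needed.
            products += [xi * yr, xr * yi]
        issued += len(products)
        real_parts.append(products[0] - products[1])  # RP = XR*YR - XI*YI
    # Assumed fill latency: the final product exits PIPELINE_STAGES - 1 cycles later.
    cycles = issued + PIPELINE_STAGES - 1 if issued else 0
    return real_parts, issued, cycles * MULT_CYCLE_NS


if __name__ == "__main__":
    pairs = [((1, 2), (3, 4)), ((5, 6), (7, 8))]
    for mode in ("complex", "advanced"):
        print(mode, real_parts_with_timing(pairs, mode))
```

For a long sequence of K products the model issues 4K versus 2K slots, i.e., one real part per 320 ns complex-mode instruction cycle versus one per 160 ns system instruction cycle, which is the factor of two discussed above.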

In the case of a transversal circuit, Eq. (6) can be calculated using only advmulc instructions, so there is no need to store the sign-reversed coefficients described previously.

5. Restructuring the Multiplier

The advanced mode can be realized by an arithmetic unit using the Redundant Complex Number System [11], [12]. However, such a unit must be reconfigured among the computational modes, namely the real mode, the complex mode, and the advanced mode. Because of the reconfiguration, the unit requires its own input/output data format for each mode, which implies that the operand data format in the programs would have to change with each computational mode. Avoiding this requires additional circuits. An arithmetic unit based on Booth's algorithm is free from such problems, and this is a major reason why we adopt it. However, the advanced mode cannot be realized by the conventional multiplier because of the lack of bus bandwidth. There are two solutions to this problem. One is to widen the L-bus and the R-bus to 32 bits. As a by-product, this solution improves the precision of real computation. However, all components, such as registers, RAMs, the multiplier, and the ALU, must also be widened to 32 bits. The other solution is to introduce additional dedicated 16-bit buses into the multiplier. In this solution, no conventional components need to be changed except the multiplier and the RAMs.

We adopt the latter solution and propose the new multiplier shown in Fig. 3. Like the conventional one, it is based on a four-stage pipelined Booth's algorithm multiplier. Two dedicated 16-bit buses from XRAM and YRAM are introduced. They are called advanced buses and work only in advanced mode. While the conventional multiplier can fetch only two operands in a system instruction cycle, the proposed one fetches real operands through the conventional buses (L-bus and R-bus) and imaginary operands through the advanced buses simultaneously. Figure 4 shows a reconstructed PSI architecture with the proposed multiplier. When PSI enters advanced mode, the ACU modifies an address register by ±2. The data pointed to by *X0 is sent to XR through the L-bus, while the data pointed to by *(X0+1) is sent to XI through the advanced bus. The result is stored in dist. The instructions advmul and advmulc are easily realized because the multiplier has its own dedicated adder/subtracter. Table 2 shows the process that produces the result (AC + BD). First, AC and BD are formed. Next, these two values are added or subtracted by the adder/subtracter shown in Fig. 3.
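The operand routing described above can be mimicked for the transversal circuit of Eq. (5). In the Python sketch below, two lists stand in for XRAM (coefficients h0, ..., hM) and YRAM (the delay line x(n), ..., x(n − M)); this memory layout, the zero initial delay line, and the helper names are hypothetical and not taken from Fig. 5. Each advmulc reads *X0 and *(X0+1) as A and B and *Y0 and *(Y0+1) as C and D, after which both address registers advance by 2, as the ACU does in advanced mode. Because advmulc forms AC + BD directly, the coefficients hi are used without sign reversal, and an odd number of taps (M even) is padded with a zero coefficient as in Eq. (8).

```python
def advmulc(xram, x0, yram, y0):
    """AC + BD with A = *X0, B = *(X0+1), C = *Y0, D = *(Y0+1)."""
    a, b = xram[x0], xram[x0 + 1]   # L-bus and advanced bus from XRAM
    c, d = yram[y0], yram[y0 + 1]   # R-bus and advanced bus from YRAM
    return a * c + b * d


def transversal_output(h, delay_line):
    """y(n) = sum_{i=0}^{M} h[i] * x(n - i), computed with advmulc on pairs.

    h:          coefficients h0, ..., hM (hypothetical XRAM contents)
    delay_line: x(n), x(n-1), ..., x(n-M) (hypothetical YRAM contents)
    """
    if len(h) != len(delay_line):
        raise ValueError("need one delayed input per coefficient")
    xram, yram = list(h), list(delay_line)
    if len(xram) % 2:        # M even: pad with 0 * x(n-M-1), as in Eq. (8)
        xram.append(0.0)
        yram.append(0.0)     # x(n-M-1) may be any value; 0.0 is used here
    acc = 0.0
    x0 = y0 = 0
    while x0 < len(xram):
        acc += advmulc(xram, x0, yram, y0)
        x0 += 2              # address registers advance by 2 per pair
        y0 += 2
    return acc


def transversal_filter(h, samples):
    """Run the circuit over an input sequence, starting from a zero delay line."""
    delay_line = [0.0] * len(h)
    out = []
    for xn in samples:
        delay_line = [xn] + delay_line[:-1]   # shift in x(n)
        out.append(transversal_output(h, delay_line))
    return out


if __name__ == "__main__":
    print(transversal_filter([1, 2, 3], [1, 0, 0, 2]))     # [1.0, 2.0, 3.0, 2.0]
    print(transversal_filter([1, 2, 3, 4], [1, 1, 1, 1]))  # [1.0, 3.0, 6.0, 10.0]
```

The loop issues one advmulc per pair of taps, which is the source of the halved per-stage cost (β = 0.5) reported for the proposed PSI in Section 6.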


Fig. 3 Proposed multiplier.

Fig. 4 Proposed PSI structure.

Fig. 5 Programs for the simulation.

6. Simulation

6.1 Simulator Specifications

We developed an architecture-level behavioral simulator for PSI to evaluate the effectiveness of advanced mode. It counts the system instruction cycles of a benchmark program while detecting pipeline and resource conflicts. The following instructions are implemented in the simulator:
• Syntax: store src, dist
  Operation: src → dist
  Operands: src = any register/indirect, dist = any register/indirect
  System cycles: 1
• Syntax: add src1, src2, dist
  Operation: src1 + src2 → dist
  Operands: src1, src2 = any register/indirect, dist = any register/indirect
  System cycles: 1
• Syntax: cm mode
  Operation: change the computational mode
  Operands: mode = real, complex, advanced
  System cycles: 1
• Syntax: advmulc *X, *Y, dist
  Operation: AC + BD
  Operands: *X = A, *(X+1) = B, *Y = C, *(Y+1) = D
  System cycles: 1 + 4 latency
• Syntax: advmul *X, *Y, dist
  Operation: AC − BD
  Operands: *X = A, *(X+1) = B, *Y = C, *(Y+1) = D
  System cycles: 1 + 4 latency

6.2 Results

We simulated a 2D circuit and a transversal circuit on both the original and the proposed PSI with the architecture-level behavioral simulator. In this simulation, the 2D circuit has 5 stages and the transversal circuit has 100 stages. In our simulation of the original PSI, both circuits are executed in complex mode using complex multiplications. The major parts of the benchmark programs are shown in Fig. 5. The programs for the PSI in advanced mode are written in a TMS320C30-like operation code format.

Table 3 Simulation results (system instruction cycles).

architecture   transversal   2D
DSP-R          11 + 99       17 + 6 × 4
orig. PSI      11 + 99       17 + 6 × 4
proposal       11 + 49       17 + 3 × 4

Table 3 shows that both DSP-R and the original PSI implement the transversal circuit in (11 + 99) system instruction cycles, which implies that the original PSI structure achieves no performance enhancement. On the other hand, the proposed PSI completes the transversal circuit in (11 + 49) system instruction cycles, so it is 110/60 ≈ 1.83 times faster. As previously indicated, the total number of system instruction cycles can be represented in the "α + β · f(N)" format. In this case, the initial cycle count α is 11 and the per-stage pipeline cycle count β is 0.5. Since the total depends mainly on β, up to twice the performance can be obtained when the number of circuit stages N is large enough.

In the case of 2D circuits, DSP-R and the original PSI spend (17 + 6 × 4) system instruction cycles (β = 6), while the proposed PSI spends (17 + 3 × 4) system instruction cycles (β = 3). For DSP-R, β is determined by lines 10–21 of the "IIR 2D (TMS320C30)" program shown in Fig. 5. For advanced mode, lines 17–23 of the "IIR 2D (PSI in Advanced mode)" program shown in Fig. 5 are the corresponding lines, and they determine β. Since β is improved from 6 to 3, the performance would be twice as fast if N were large enough. In this case, however, the speedup is only about 1.4 ((17 + 6 × 4)/(17 + 3 × 4) = 41/29) because N is only 5.

If circuits such as wave digital filters are executed in advanced mode, higher performance cannot be obtained because they have many adder nodes compared with multiplier nodes. To obtain higher performance in advanced mode, the circuits need adder nodes with two input branches from multipliers rather than simple adder nodes.

7. Conclusion

In this paper, we have proposed a new method of implementing real circuits on a DSP-C, and we have introduced an advanced mode and advanced buses into PSI. As a result, in the pipelined part of the computation, 2D circuits and transversal circuits can be processed twice as fast as with the conventional method. However, the initial cycles prevent higher overall performance when a circuit has only a few stages. Since the number of initial cycles depends on the number of pipeline stages in the multiplier, reducing those pipeline stages would be one solution; this remains to be studied.

References
[1] S. Morikawa, K. Okada, S. Takeuchi, and I. Shirakawa, “A design of high-performance FIR filter for digital video transmission,” The 9th Workshop on Circuits and Syst. in Karuizawa, pp.359–364, April 1996.
[2] K. Nakamura, K. Sakai, and T. Ae, “Multimedia data processing using VLIW hardware stack processor,” IEICE Trans., vol.J81-D-I, no.1, pp.21–27, Jan. 1998.
[3] T. Ae, K. Nishimura, R. Aibara, K. Sakai, and K. Nakamura, “Real-time multimedia network system using VLIW hardware stack processor,” Proc. IEEE Workshop on Parallel and Distributed Real-Time Systems, Santa Barbara, USA, pp.84–89, April 1995.
[4] K. Toriumi, K. Otani, K. Nakamura, K. Sakai, and T. Ae, “A performance estimation of the VLIW processor for a multimedia data processing,” IEICE Technical Report, CAS97-32, June 1997.
[5] R. Leupers and P. Marwedel, “Time-constrained code compaction for DSP’s,” IEEE Trans. VLSI Syst., vol.5, no.1, pp.112–122, March 1997.
[6] M. Yoshida, K. Yoshii, H. Matsumoto, J. Onuki, and M. Sone, “A simulation for structure decision on multi-DSP system considering data transfer time,” IEICE Trans., vol.J80-D-I, no.1, pp.11–20, Jan. 1997.
[7] L.B. Jackson, “Roundoff-noise analysis for fixed-point digital filters realized in cascade or parallel form,” IEEE Trans. Audio Electroacoust., vol.AU-18, no.2, pp.107–122, 1970.
[8] T. Abiko, M. Kawamata, and T. Higuchi, “An algorithm for solving inverse problems of fractal images using the complex moment method,” IEICE Soc. Conf. ’97, SA-4-3, pp.217–218, Sept. 1997.
[9] N. Murakoshi, E. Watanabe, and A. Nishihara, “A synthesis of low-sensitivity second-order real digital filters using complex two-port adapters,” IEICE Trans., vol.J75-A, no.5, pp.889–895, May 1992.
[10] B. Barazesh, J.C. Michalina, and A. Picco, “A VLSI signal processor with complex arithmetic capability,” IEEE Trans. Circuits & Syst., vol.35, no.5, pp.495–505, May 1988.
[11] T. Aoki, H. Amada, and T. Higuchi, “Design of real/complex reconfigurable arithmetic circuits using redundant complex number systems,” IEICE Trans., vol.J80-D-I, no.8, pp.674–682, Aug. 1997.
[12] Y. Harata, Y. Nakamura, H. Nagase, M. Takigawa, and N. Takagi, “A high-speed multiplier using a redundant binary adder tree,” IEEE J. Solid-State Circuits, vol.SC-22, no.1, pp.28–33, Feb. 1987.
[13] V.K. Madisetti, “VLSI Digital Signal Processors,” IEEE Press, 1995.
[14] Texas Instruments, “TMS320C3x User’s manual,” 1996.

Yoshimasa Negishi received the B.E. and M.E. degrees in electronic information systems from Shibaura Institute of Technology in 1995 and 1997, respectively. His research interest is in VLSI digital signal processing.


Eiji Watanabe was born in Ehime Prefecture, Japan, on August 20, 1958. He received the B.E. and M.E. degrees in radio communication from the University of Electro-Communications in Chofu-shi, Japan, in 1981 and 1983, respectively, and the Dr.E. degree in physical electronics from Tokyo Institute of Technology, Tokyo, Japan, in 1986. From 1986 to 1991, he was a research associate in the Department of Information Processing of Tokyo Institute of Technology in Yokohama-shi, Japan. From 1991 to 1995, he was a lecturer in the Department of Electronic Information Systems at Shibaura Institute of Technology in Ohmiya-shi, Japan. He is currently an associate professor in the same department. His research interests are in circuit theory and digital signal processing. Dr. Watanabe is a member of IEEE.

Akinori Nishihara received the B.E., M.E. and Dr.Eng. degrees in electronics from Tokyo Institute of Technology in 1973, 1975 and 1978, respectively. Since 1978 he has been with Tokyo Institute of Technology, where he is now Professor of the Center for Research and Development of Educational Technology. His main research interests are in one- and multi-dimensional signal processing, and its application to educational technology. He is now serving as Editor-in-Chief of the Transactions of IEICE (A) and Treasurer of IEEE Region 10 (Asia Pacific Region). Dr. Nishihara is a member of IEEE, EURASIP, ECS and JET.

Takeshi Yanagisawa was born in Nagasaki Prefecture, Japan, on March 14, 1931. He received the B.E., M.E., and Dr.Eng. degrees in electronic engineering from Tokyo Institute of Technology (T.I.T.), Tokyo, in 1953, 1955, and 1958, respectively. He joined T.I.T. in 1958 as a Research Associate and was a Professor at T.I.T. from 1970 to 1991. He is currently a Professor at Shibaura Institute of Technology. His research interests are in the fields of circuit theory, electronic circuits, active filters, and digital signal processing. He is a recipient of the 1986 Achievement Award of the IEICE of Japan.
