Fully-Programmable Computing Architecture for ...

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

TITB-00200-2007


Fully-Programmable Computing Architecture for Medical Ultrasound Machines

Fabio Kurt Schneider, Member, IEEE, Anup Agarwal, Member, IEEE, Yang Mo Yoo, Member, IEEE, Tetsuya Fukuoka, and Yongmin Kim, Fellow, IEEE

Abstract—Application-specific integrated circuits (ASICs) have been traditionally used to support the high computational and data rate requirements in medical ultrasound systems, particularly in receive beamforming. Utilizing the previously-developed efficient front-end algorithms, we present in this paper a simple programmable computing architecture, consisting of a field-programmable gate array (FPGA) and a digital signal processor (DSP), to support core ultrasound signal processing. It was found that 97.3% and 51.8% of the FPGA and DSP resources are needed, respectively, to support all the front-end and back-end processing for B-mode imaging with 64 channels and 120 scanlines per frame at 30 frames per second. These results indicate that this programmable architecture can meet the requirements of low- and medium-level ultrasound machines while providing a flexible platform for supporting the development and deployment of new algorithms and emerging clinical applications.

Index Terms—Ultrasound systems, Biomedical imaging.

I. INTRODUCTION

To meet the high data rate and demanding computational requirements in ultrasound imaging, application-specific integrated circuits (ASICs) have been typically used in medical ultrasound machines. With the recent technological advances in digital signal processors (DSPs) and multi-core processors, programmable computing architectures have been introduced and are now commonly used to support back-end processing in commercial ultrasound systems [1-3]. However, ASICs are still used in many ultrasound systems to support more challenging front-end processing.

Several approaches were proposed and evaluated for supporting front-end processing with programmable off-the-shelf devices [3-6]. For example, Hazard and Lockwood [3] proposed a fully-programmable architecture for ultrasound machines by utilizing a network of programmable processors. However, their architecture required 132 DSP chips, making it impractical for commercial systems. On the other hand, Pelissier [4] utilized high-end field programmable gate arrays (FPGAs) for front-end processing in a PC-based architecture. While these FPGAs can deliver high computational performance, they are currently 5 to 10 times more expensive than high-end DSPs. Tomov’s proposal [6] on a beamformer architecture based on sparse sampling (i.e., 512 points) and 1-bit oversampled A/D converters utilizes only one low-cost FPGA. However, developing sigma-delta modulators that can provide the required sensitivity (in terms of signal-to-quantization noise ratio) for supporting various medical ultrasound imaging modes remains challenging [7].

As an alternative to Tomov’s work, the goal of our research has been to develop a cost-efficient fully-programmable architecture based on conventional multi-bit A/D converters. We have previously developed new front-end algorithms, e.g., multi-stage uniform coefficient (MSUC) filter [8] and two-stage demodulation (TSD) [7], to reduce the computational and data rate requirements. In this paper, we present a hybrid cost-efficient programmable computing architecture for ultrasound machines.

Manuscript received August 2007; accepted June 4, 2009. F. K. Schneider was with the Department of Electrical Engineering, University of Washington, Seattle, WA 98195-5061 USA. He is now with the Department of Electronics and the Grad School of Electrical Engineering and Applied Computer Science, Federal University of Technology - Paraná, Brazil (e-mail: [email protected]). A. Agarwal was with the Department of Electrical Engineering, University of Washington, Seattle, WA 98195-5061 USA. He is now at Philips Healthcare, Bothell, WA 98021 (e-mail: [email protected]). Y. Yoo was with the Department of Bioengineering, University of Washington, Seattle, WA 98195-5061 USA. He is now with the Department of Electronic Engineering and the Interdisciplinary Program of Integrated Biotechnology, Sogang University, Seoul, Korea (e-mail: [email protected]). T. Fukuoka was with the Department of Electrical Engineering, University of Washington, Seattle, WA 98195-5061 USA. He is now with Hitachi Ltd., Tokyo, Japan (e-mail: [email protected]). Y. Kim is with the Department of Bioengineering, University of Washington, Seattle, WA 98195-5061 USA (e-mail: [email protected]).

Copyright © 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. Authorized licensed use limited to: Sogang University Loyola Library. Downloaded on February 3, 2010 at 03:48 from IEEE Xplore. Restrictions apply.

II. METHODS AND MATERIALS

In this section, we present the computational and data rate requirements of a typical medical ultrasound imaging system. In addition, a simple programmable architecture that can support both front-end and back-end processing is presented.

A. Computational and Data Rate Requirements in an Ultrasound System

Fig. 1. Block diagram of a typical ultrasound system. [Block labels. Front-end: transmitter, transducer, LNA, TGC, ADC, front-end demodulation (mixer, I filter, Q filter), phase rotation beamforming (delay, phase compensation and summation). Back-end: B-mode (envelope detection, log compression, image enhancement), color Doppler (clutter filter, power estimation, velocity estimation, image enhancement), scan conversion.]

Figure 1 shows the functional block diagram of a modern ultrasound imaging system based on the commonly-used phase rotation beamformer (PRBF). The received ultrasound echoes
are first amplified in proportion to depth in order to compensate for signal attenuation (i.e., time-gain compensation, TGC). After TGC, these echo signals are digitized by analog-to-digital converters (ADCs) for digital processing. Quadrature demodulation (QD), which consists of mixing and lowpass filtering, is utilized to shift the RF data originally centered at the transducer center frequency (f0) into complex baseband data. The signal-to-noise ratio (SNR) and spatial resolution are improved in receive beamforming through the coherent summation (i.e., delay, phase rotation and summation) of complex baseband data. Subsequently, various back-end modules are applied to the demodulated data. For example, B-mode processing is performed to generate the gray-scale image that shows anatomic information while color Doppler produces blood flow information. The scan conversion maps the data from all modes to be displayed on a raster monitor.

Table 1 summarizes the computational and data transfer rate requirements for a B-mode imaging system with 64 channels, 120 scanlines per frame, 4096 ADC samples per channel, 1024 samples per scanline, and 30 frames per second with 600 × 420 output display. As listed in Table 1, demodulation filtering dominates the computational requirements (i.e., 34 billion operations per second (BOPS) out of a total 42.1 BOPS). A symmetric 16-tap directly-implemented FIR (DI-FIR) filter is assumed for this analysis [8]. Furthermore, very high data rates, 2.85 Gbytes/s, are needed during demodulation and phase rotation beamforming compared to B-mode processing (90 Mbytes/s). These computational and data transfer requirements are still challenging even with the advances in modern DSPs. For example, supporting B-mode back-end processing alone would require all the resources of a commercially-available DSP [1]. Thus, it is critical to reduce the computational requirements of front-end processing.

TABLE I
COMPUTATIONAL AND DATA TRANSFER RATE REQUIREMENTS OF AN ULTRASOUND IMAGING SYSTEM WITH 64 CHANNELS, 120 SCANLINES PER FRAME, 4096 ADC SAMPLES PER CHANNEL, 1024 SAMPLES PER SCANLINE, AND 30 FRAMES PER SECOND.

Function                            Computational requirement    Data transfer rate requirement
                                    (billion operations/s)       (Mbytes/s)
Mixing                              2.5                          2,123
Demodulation filtering              34.0
Phase compensation and summation    4.2                          730
Back-end processing                 1.4                          90
Total                               42.1                         2,944

To reduce the computational requirements of demodulation filtering in PRBF, we previously developed a MSUC filter [8] where uniform coefficients are utilized in each of the filter stages so that the multiplications can be replaced by additions. For example, for a 5-stage MSUC filter, only 12 operations (2 additions per stage, 1 load in the first stage, and 1 store in the last stage) are needed compared to 48 operations in DI-FIR, enabling a significant reduction in computing [8].
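The idea of replacing multiplications with additions can be illustrated with a minimal Python sketch. It assumes that each uniform-coefficient stage is a moving sum updated with a running accumulator (one addition for the entering sample, one subtraction for the leaving sample), which is one plausible reading of "2 additions per stage". The stage lengths and all function names below are hypothetical and are not taken from [8]; gain normalization and decimation are left out.

```python
def moving_sum(samples, length):
    """One uniform-coefficient stage: y[n] = x[n] + x[n-1] + ... + x[n-length+1]."""
    out = []
    acc = 0
    for n, value in enumerate(samples):
        acc += value                    # addition 1: newest sample enters the window
        if n >= length:
            acc -= samples[n - length]  # addition 2: oldest sample leaves the window
        out.append(acc)
    return out


def msuc_filter(samples, stage_lengths):
    """Cascade of uniform-coefficient stages; no multiplications are used."""
    data = list(samples)
    for length in stage_lengths:
        data = moving_sum(data, length)
    return data


def equivalent_taps(stage_lengths):
    """Taps of the single FIR filter that the cascade is equivalent to."""
    taps = [1]
    for length in stage_lengths:
        new = [0] * (len(taps) + length - 1)
        for i, a in enumerate(taps):
            for j in range(length):     # convolve with a box of ones
                new[i + j] += a
        taps = new
    return taps


def direct_fir(samples, taps):
    """Directly implemented FIR: one multiply-accumulate per tap per output."""
    return [
        sum(t * samples[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(samples))
    ]


def msuc_ops_per_output(num_stages):
    """Count quoted in the text: 2 additions per stage, plus 1 load and 1 store."""
    return 2 * num_stages + 2


HYPOTHETICAL_STAGES = [2, 3, 4]  # illustrative lengths only, not from [8]
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
assert msuc_filter(x, HYPOTHETICAL_STAGES) == direct_fir(x, equivalent_taps(HYPOTHETICAL_STAGES))
```

With five stages, `msuc_ops_per_output(5)` gives the 12 operations quoted above, a fourfold reduction relative to the 48 operations of the DI-FIR filter.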

B. Proposed System Architecture

Figure 2 shows the block diagram of the proposed programmable architecture where ultrasound signal processing is divided into two stages. The first stage, which requires a high data rate, handles demodulation and dynamic focusing delay. In the second stage, phase compensation and summation and subsequent back-end processing are performed at a lower data rate.

In the first stage, low voltage differential signaling (LVDS) is used to transfer data into the FPGA to reduce the power consumption and number of pins. The pre-beamforming module selects the appropriate data samples based on dynamic focusing delay. Since computing the sample addresses on-the-fly is either very demanding or requires approximations that lower the time delay accuracy [6], lookup tables (LUTs) are utilized to reduce the computational burden without sacrificing image quality.
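A sample-selection LUT of this kind could be precomputed offline, as in the following Python sketch. It is only an illustration under stated assumptions: a linear array with the scanline along the array normal through the array center, a speed of sound of 1540 m/s, a hypothetical 20 MHz sampling rate, and a hypothetical 0.3 mm element pitch. None of these values, nor the function names, come from the paper; only the 64-channel count does.

```python
import math

SPEED_OF_SOUND_M_S = 1540.0  # assumed soft-tissue average
SAMPLE_RATE_HZ = 20e6        # hypothetical ADC sampling rate
ELEMENT_PITCH_M = 0.3e-3     # hypothetical element spacing
NUM_CHANNELS = 64            # channel count used in the text


def element_positions(num_channels, pitch_m):
    """Lateral element positions, centered on the array midpoint."""
    center = (num_channels - 1) / 2.0
    return [(i - center) * pitch_m for i in range(num_channels)]


def build_sample_selection_lut(depths_m, positions_m, fs_hz, c_m_s):
    """lut[ch][k] = index of the ADC sample holding the echo from depth k on channel ch.

    Transmit path: array center to the focal point (length z).
    Receive path: focal point back to the element at lateral offset x.
    """
    lut = []
    for x in positions_m:
        row = []
        for z in depths_m:
            round_trip_s = (z + math.hypot(z, x)) / c_m_s
            row.append(round(round_trip_s * fs_hz))
        lut.append(row)
    return lut


def select_samples(channel_data, lut_row):
    """Pre-beamforming by table lookup: one stored sample per focal depth."""
    return [channel_data[index] for index in lut_row]


depths = [0.01 * k for k in range(1, 16)]  # focal depths from 10 mm to 150 mm
lut = build_sample_selection_lut(
    depths,
    element_positions(NUM_CHANNELS, ELEMENT_PITCH_M),
    SAMPLE_RATE_HZ,
    SPEED_OF_SOUND_M_S,
)
```

At run time the per-sample work reduces to an indexed read such as `select_samples(channel_data, lut[ch])`, so no square root or division is evaluated on the fly. The cost is table storage, which grows with the number of channels and focal depths.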

[Fig. 2 block labels: ADCs; low-cost FPGA; ADC to 1st-stage interface; MSUC filter-based demodulation; pre-beamforming; sample selection LUT; 1st to 2nd-stage interface; DSP; phase compensation LUT; phase compensation and summation; back-end; system memory.]

Fig. 2. Block diagram of a fully programmable architecture.

C. Experimental Setup

The proposed architecture has been evaluated on a DSP starter kit (TMS320C6416 1 GHz DSK, Spectrum Digital, Stafford, TX) and an FPGA board (Cyclone II DSP development board, Altera, San Jose, CA), communicating through a 32-bit connector. The FPGA usage is summarized in terms of the percent use of logic elements (LEs) and 4096-bit memory blocks (M4k), while the DSP usage is expressed in terms of the number of clock cycles per frame [1].

III. RESULTS AND DISCUSSION

The FPGA resource usage is summarized in Table 2. To support the first stage processing, 88.4% of LEs and 97.3% of M4k blocks were used. As listed in Table 2, demodulation filtering dominates LE usage in the FPGA, while coarse delay selection and FPGA-DSP interfacing dominate the memory utilization. It should be noted that these FPGA numbers do not depend on the mode (B-mode or color Doppler) and/or other system parameters (e.g., the number of frames per second), but only on the number of receive channels.

TABLE II
EP2C35 FPGA RESOURCE USAGE.

Function                       LEs (%)    M4k (%)
ADC-to-FPGA stage interface    3.6        --
Demodulation                   69.4       2.0
Pre-beamforming                15.2       64.8
FPGA-to-DSP interface          0.2        30.5
Total                          88.4       97.3

The data transfer and computational burden to support various modules in DSP is summarized in Table 3. To eliminate the unnecessary data transfers to and from the DSP, multiple tasks were integrated together (e.g., phase compensation and summation, envelope detection, range compression and axial filtering). As summarized in Table 3, 17.29 Mcycles per frame are needed to support the various processing tasks. Since the DSP runs at 1 GHz, this results in DSP’s resource utilization of 51.9% to support 30 frames per second. Alternately, about 57 frames per second could be achieved if all the computing resources are utilized.

TABLE III
EXECUTION TIME IN TERMS OF MILLIONS OF CYCLES PER FRAME IN A DSP.

Functions listed: Phase compensation and summation; Envelope detection; Dynamic range compression; Axial filtering; Data transposition; Lateral filtering; Scan conversion
Execution times listed (Mcycles/frame): 15.63; 0.39; 1.27
Total execution time (Mcycles/frame): 17.29
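The utilization and maximum frame rate quoted for B-mode follow directly from the cycle count and the clock rate. The short Python sketch below reproduces that arithmetic; the function names are illustrative, and the inputs are the figures reported in the text and Table 3.

```python
DSP_CLOCK_HZ = 1.0e9                # 1 GHz DSP, as stated in the text
BMODE_MCYCLES_PER_FRAME = 17.29     # total execution time from Table 3
TARGET_FPS = 30                     # frame rate targeted in the text
TABLE3_ENTRIES = [15.63, 0.39, 1.27]  # individual execution times listed in Table 3


def dsp_utilization(mcycles_per_frame, fps, clock_hz=DSP_CLOCK_HZ):
    """Fraction of the DSP cycle budget consumed at a given frame rate."""
    return mcycles_per_frame * 1e6 * fps / clock_hz


def max_processing_fps(mcycles_per_frame, clock_hz=DSP_CLOCK_HZ):
    """Frame rate reached when every DSP cycle is spent on processing."""
    return clock_hz / (mcycles_per_frame * 1e6)


utilization = dsp_utilization(BMODE_MCYCLES_PER_FRAME, TARGET_FPS)
peak_fps = max_processing_fps(BMODE_MCYCLES_PER_FRAME)

print(f"{utilization:.1%}")  # prints 51.9%
print(f"{peak_fps:.1f}")     # prints 57.8
```

The three listed entries sum to the 17.29 Mcycles total, and the peak of 57.8 frames per second corresponds to the "about 57 frames per second" stated above.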

For a given number of scanlines per frame, the maximum frame rate that can be displayed depends on the acquisition frame rate (facq) in addition to the maximum DSP processing frame rate (fpro) [1]. Figure 3 shows the maximum acquisition and processing frame rates as a function of the number of scanlines per frame when the imaging depth is 150 mm. As seen in Fig. 3, the maximum frame rate is limited by the acquisition frame rate. Thus, the developed architecture can meet the requirements of a 64-channel B-mode ultrasound system when the imaging depth is 150 mm.

To support the processing for color Doppler imaging with a region of interest (ROI) with 90 scanlines and an ensemble size of 10, beamforming requires an additional 61.8 Mcycles on the DSP, while the back-end processing requires an additional 5.3 Mcycles. Therefore, a total of 84.4 (61.8 + 5.3 + 17.3) Mcycles are needed for processing one color Doppler frame, resulting in a maximum of 11.8 fps for this configuration, which is higher than the acquisition frame rate.
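The comparison between facq and fpro can be approximated with a simple round-trip model, sketched below in Python. The model assumes a speed of sound of 1540 m/s, one transmit event per scanline with no dead time between firings, and, for color Doppler, one firing per ensemble member per ROI scanline in addition to the 120 B-mode firings. These are simplifying assumptions for illustration and not the authors' stated acquisition scheme. The cycle counts are the ones reported in the text.

```python
SPEED_OF_SOUND_M_S = 1540.0  # assumed soft-tissue average
DSP_CLOCK_HZ = 1.0e9         # 1 GHz DSP, as stated in the text
IMAGING_DEPTH_M = 0.150      # 150 mm imaging depth from the text


def acquisition_fps(firings_per_frame, depth_m, c_m_s=SPEED_OF_SOUND_M_S):
    """Upper bound on f_acq: each firing waits for the round trip to full depth."""
    round_trip_s = 2.0 * depth_m / c_m_s
    return 1.0 / (firings_per_frame * round_trip_s)


def processing_fps(mcycles_per_frame, clock_hz=DSP_CLOCK_HZ):
    """Upper bound on f_pro when the whole DSP is available."""
    return clock_hz / (mcycles_per_frame * 1e6)


def displayed_fps(f_acq, f_pro):
    """The displayed frame rate is limited by the slower of the two."""
    return min(f_acq, f_pro)


# B-mode: 120 scanlines, 17.29 Mcycles per frame
b_acq = acquisition_fps(120, IMAGING_DEPTH_M)
b_pro = processing_fps(17.29)

# Color Doppler: 61.8 (beamforming) + 5.3 (back-end) + 17.3 (B-mode) Mcycles
cd_mcycles = 61.8 + 5.3 + 17.3
cd_pro = processing_fps(cd_mcycles)
cd_firings = 120 + 90 * 10  # assumed firing count: B-mode lines plus ROI lines x ensemble
cd_acq = acquisition_fps(cd_firings, IMAGING_DEPTH_M)

print(f"B-mode: f_acq = {b_acq:.1f} fps, f_pro = {b_pro:.1f} fps")
print(f"Color Doppler: f_acq = {cd_acq:.1f} fps, f_pro = {cd_pro:.1f} fps")
```

Under these assumptions the acquisition rate is the lower of the two in both modes, which agrees with the statement that the displayed frame rate is acquisition-limited.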


Phase compensation and summation is the most demanding task performed in the second stage. Although its computational requirement (multiplications and additions) is high, the multiply-and-accumulate (MAC) or inner-product instructions available in modern DSPs enable an efficient implementation of these operations. For B-mode imaging, envelope detection, dynamic range compression, image enhancement filtering (i.e., axial and lateral filtering), and scan conversion are performed as discussed by Sikdar et al. [1].
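The structure of phase compensation and summation as a chain of complex multiply-accumulate operations can be sketched in Python as follows. The sketch assumes the coarsely delayed complex baseband sample of each channel has already been selected, and that the residual delay is corrected by a rotation factor exp(jφ) read from a precomputed table. The sign convention, the phase values, and the function names are hypothetical.

```python
import cmath


def build_phase_lut(phases_rad):
    """Precompute the complex rotation factors exp(j*phi), one per channel."""
    return [cmath.exp(1j * phi) for phi in phases_rad]


def phase_compensate_and_sum(baseband_samples, rotation_factors):
    """Beamformed output for one focal point: sum over channels of x_k * exp(j*phi_k)."""
    acc = 0j
    for sample, rotation in zip(baseband_samples, rotation_factors):
        acc += sample * rotation  # one complex multiply-accumulate per channel
    return acc


def complex_mac_real(acc_i, acc_q, i, q, cos_phi, sin_phi):
    """The same MAC in real arithmetic: 4 multiplications and 4 additions."""
    return (acc_i + i * cos_phi - q * sin_phi,
            acc_q + i * sin_phi + q * cos_phi)


# Illustration: 64 channels observe the same echo with different residual phases.
phases = [0.1 * k for k in range(64)]  # hypothetical residual phases in radians
echo = 0.5 + 0.25j
channels = [echo * cmath.exp(-1j * phi) for phi in phases]
beamformed = phase_compensate_and_sum(channels, build_phase_lut(phases))
```

After rotation the 64 contributions add coherently, so `beamformed` is close to 64 times `echo`. The real-arithmetic form in `complex_mac_real` shows why inner-product instructions fit this step: each channel contributes two short dot products.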

Fig. 3. The maximum acquisition and processing frame rates as a function of scanlines per frame when the imaging depth is 150 mm. [Plot: maximum frame rate (0 to 120) versus scanlines per frame (60 to 360), with Processing and Acquisition curves.]

Thus, we have demonstrated that the developed architecture can meet the requirements of both B-mode and color Doppler imaging for a 64-channel system. We believe that this architecture could help in cost reduction and miniaturization of current portable ultrasound machines. Such low-cost handheld ultrasound machines could facilitate the expansion of ultrasound from the current settings (e.g., radiology, cardiology and OB/GYN) to more decentralized settings (e.g., patients’ bedsides in hospitals, general practitioners’ offices, skilled nursing facilities, and patients’ homes) [9, 10]. Finally, the miniaturization of portable ultrasound machines can help realize the vision of an “ultrasound stethoscope” [11]. This ultrasound stethoscope could fit in the pockets of clinicians and substantially increase the availability of medical ultrasound in the primary care setting.

IV. CONCLUSION

We have developed a simple programmable computing architecture for ultrasound machines. In the developed architecture, a low-cost FPGA and a DSP are utilized to support all the front-end and back-end processing needed for B-mode and color Doppler imaging. From the feasibility study, we have found that the developed architecture can support various system configurations. This architecture can lead to the miniaturization and cost reduction of ultrasound machines for use in more diverse healthcare settings. Finally, this architecture will continue to benefit from the advances in semiconductor electronics and algorithms enabling even further miniaturization and sophistication.

REFERENCES

[1] S. Sikdar, R. Managuli, L. Gong, V. Shamdasani, T. Mitake, T. Hayashi, and Y. Kim, “A single mediaprocessor-based programmable ultrasound system,” IEEE Trans. Inform. Technol. Biomed., vol. 7, pp. 64-70, 2003.
[2] U. Bae, M. Dighe, T. Dubinsky, S. Minoshima, V. Shamdasani, and Y. Kim, “Ultrasound thyroid elastography using carotid artery pulsation: preliminary study,” J. Ultrasound Med., vol. 26, pp. 797-805, 2007.
[3] C. R. Hazard and G. R. Lockwood, “Theoretical assessment of a synthetic aperture beamformer for real-time 3-D imaging,” IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 46, pp. 972-980, 1999.
[4] L. Pelissier, “Ultrasound imaging system,” U.S. Patent No. 6,558,326, 2002.
[5] L. Y. L. Mo, J. Ting-Lan, C. Ching-Hua, D. Napolitano, G. W. McLaughlin, and D. DeBusschere, “Zone-based color-flow imaging,” in Proc. IEEE Ultrasonics Symposium, pp. 29-32, 2003.
[6] B. G. Tomov and J. A. Jensen, “Compact FPGA-based beamformer using oversampled 1-bit A/D converters,” IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 52, pp. 870-880, 2005.
[7] A. Agarwal, Y. M. Yoo, F. K. Schneider, C. Gao, L. M. Koh, and Y. Kim, “New demodulation method for efficient phase rotation beamforming,” IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 54, pp. 1656-1668, 2007.
[8] F. K. Schneider, Y. M. Yoo, A. Agarwal, L. M. Koh, and Y. Kim, “New demodulation filter in digital phase rotation beamforming,” Ultrasonics, vol. 44, pp. 265-271, 2006.
[9] E. H. Kim, J. J. Kim, F. A. Matsen III, and Y. Kim, “Distributed diagnosis and home healthcare (D2H2) and patient-centered electronic medical record,” Proc. of 1st Int. Bioeng. Conf., Singapore, pp. 461-468, 2004.
[10] C. M. Ligtvoet, H. Rijsterborgh, L. Kappen, and N. Bom, “Real time ultrasound imaging with a hand-held scanner. Part I - technical description,” Ultrasound Med. Biol., vol. 4, pp. 91-92, 1978.
[11] F. Xie, M. S. Breese, M. Nanna, G. S. Lichtenberg, M. N. Allen, and R. Meltzer, “Blinded comparison of an ‘ultrasound stethoscope’ and standard echocardiographic instrument,” Chest, vol. 94, pp. 270-274, 1988.

