ieee transactions on ultrasonics, ferroelectrics, and frequency control, vol. 49, no. 8, august 2002


Spatial Resolution Enhancement of Ultrasound Images Using Neural Networks

Riccardo Carotenuto, Member, IEEE, Gabriele Sabbi, and Massimo Pappalardo, Member, IEEE

Abstract—Spatial resolution in modern ultrasound imaging systems is limited by the high cost of large aperture transducer arrays, which require a large number of transducer elements and electronic channels. A new technique to enhance the spatial resolution of pulse-echo imaging systems is presented. The method attempts to build an image that could be obtained with a transducer array aperture larger than that physically available. We consider two images of the same object obtained with two different apertures, the full aperture and a subaperture, of the same transducer. A suitable artificial neural network (ANN) is trained to reproduce the relationship between the image obtained with the transducer full aperture and the image obtained with a subaperture. The inputs of the neural network are portions of the image obtained with the subaperture (low resolution image), and the target outputs are the corresponding portions of the image produced by the full aperture (high resolution image). After the network is trained, it can produce images with almost the same resolution as the full aperture transducer, but using a reduced number of real transducer elements. All computations are carried out on envelope-detected decimated images; for this reason, the computational cost is low and the method is suitable for real-time applications. The proposed method was applied to experimental data obtained with the ultrasound synthetic aperture focusing technique (SAFT), giving quite promising results. Real-time implementation on a modern, fully digital echographic system is currently being developed.

I. Introduction

The resolution of the image generated by an echographic B-scan system is severely limited by the finite aperture and the finite overall bandwidth of the ultrasonic transducer array used. Assuming space invariance and linearity, the resolution capabilities of the system can be expressed in terms of the point spread function (PSF) (i.e., the image of a point reflector) by the following relation:

m(x, z) = h(x, z) ∗∗ f(x, z) + n(x, z),   (1)

where f(x, z) is the spatial reflectance distribution of internal organs of the human body to be imaged, m(x, z) is the degraded echographic image of the object f(x, z), h(x, z) is the blurring degradation function (i.e., the PSF), which accounts for the finite aperture and bandwidth of

Manuscript received September 10, 2001; accepted February 11, 2002. The authors are with the Dipartimento di Ingegneria Elettronica, Università degli Studi Roma Tre, Via della Vasca Navale, 84, 00146 Roma, Italy (e-mail: [email protected]).

the transducer, and n(x, z) accounts for both the additive electronic noise and the physical effects not included in the convolution model. In the above equation, the symbol ∗∗ denotes the convolution operator with respect to the variables x and z, representing the lateral and the axial coordinates. The z axis is related to the two-way transit time t = 2z/c, where c is the average sound velocity in the human body [1], [2].

In recent years, several attempts have been made to improve the image resolution a posteriori by means of the digitally implemented Fourier-based deconvolution approach [3]–[6]. This restoration technique applies the inverse process to the image m(x, z) to retrieve the original information f(x, z). Two main problems arise with this inverse technique. The first is to have sufficiently good and reliable knowledge of the PSF, which depends not only on the geometrical characteristics of the transducer array, but also on the tissue under investigation. The second is related to the mathematical operation involved in Fourier-based deconvolution, which requires the ratio between the transformed image and the transformed PSF. Because this operation involves large amplification factors at frequencies where the transformed PSF approaches zero, small deviations from the assumed model can create very large errors and greatly amplify the noise. Recently, many authors have developed the idea of estimating the PSF on-line during the scanning process; the so-called blind deconvolution performs the estimation of the PSF and the deconvolution at the same time, in both the lateral and axial directions [7]–[12]. In general, deconvolution methods are analytic and based on complex numerical algorithms that deal with the impulse response of the imaging system. However, these techniques require a large computational effort and must be improved and fully understood before practical application.

In past years, ANNs have been extensively used in image processing.
Traditionally, ANNs have been applied to image compression or image restoration. Several reported applications are direct replacements of conventional image processing techniques such as eigenvalue extraction, vector quantization, and two-dimensional (2-D) filtering [13]–[17]. Other approaches treat images as 2-D surfaces instead of considering them as a series of individual pixels [18]. ANNs have also been used extensively for quantitative analysis in medical imaging [19]. In this paper, resolution enhancement is approached from a different point of view. We do not attempt to clean the image by removing the blurring effect of the PSF; instead, we try to produce an image that would be obtained with

0885–3010/$10.00 © 2002 IEEE


a better PSF, i.e., with a larger transducer aperture. We try to build the mapping from a blurred low resolution image to a high resolution image, assuming that the main component of the blurring is due to the transducer aperture limit, i.e., to the echographic imaging system, and not to the tissue under investigation. The proposed method works in the time domain in a nonlinear way because it is applied, after decimation, to the envelope-detected signals. The effectiveness of the proposed nonlinear processing is demonstrated for a typical medical echographic phased-array system by means of experimental data, obtaining a significant improvement of the spatial resolution. Section II explains the operating principle of the proposed method. Section III briefly introduces the neural networks used. In Section IV, experimental results are presented.

II. Direct Mapping for Resolution Enhancement

The luminance of each pixel is a complicated function of many different factors: the reflectance of the corresponding region of the explored field, the attenuation suffered by the acoustic pulse in the round trip, the amplification of the system and the bandwidth of the transducer array, and the effective shape assumed by the PSF in the examined region. The PSF depends on many factors, including the tissues under investigation, but it mainly depends on the array aperture, i.e., on the number N of active elements. The final resolution of the image therefore depends on the PSF shape (i.e., on the number of elements): the higher the number of active elements, the higher the image resolution. The basic idea of the proposed method to enhance image resolution is to identify and reproduce the underlying nonlinear mapping existing between a low resolution image obtained with a small array of N elements and a high resolution image obtained with a larger array of N′ (with N′ > N) elements. We can write:

m1(x, z) = h1(x, z) ∗∗ f(x, z) + n1(x, z)
m2(x, z) = h2(x, z) ∗∗ f(x, z) + n2(x, z),   (2)

where m1 is the low resolution radio frequency (RF) image obtained with a wide PSF h1, m2 is the high resolution RF image obtained with a narrow PSF h2, and n1 and n2 are additive noise. Using the Fourier transform we obtain:

M1 = H1 F + N1
M2 = H2 F + N2
M2 = H2 H1^{-1} M1 − H2 H1^{-1} N1 + N2 = K M1 + NT,   (3)

where K = H2 H1^{-1} is the desired relationship between the low resolution image and the high resolution image, and NT = N2 − K N1 accounts for the total additive noise. K is image independent and mainly depends on the imaging system aperture; the power level of the noise term NT is

expected to be well below the signal power level in a real ultrasound imaging system. If F and M are small subregions of the whole image, K can be seen as a local relationship. A direct computation of K starting from the two images is practically unsuitable due to the required inversion of the matrix H1, which is very sensitive to noise.

The proposed method is based on a suitable ANN that reproduces the mapping between the two images: during the training phase, the low resolution image is scanned by shifting a rectangular window of nx × nz pixels one pixel at a time, so that the image is completely covered. We assume that the luminance of a given pixel of the high resolution image is related to the luminance of a neighborhood of the corresponding pixel in the low resolution image. The 2-D size of the neighborhood of the given pixel, here assumed to be a rectangular window, is not larger than the size of the local PSF. In fact, it is reasonable to assume that the farthest scatterer of the scene related to the luminance of the central pixel of the rectangular window is placed at a distance equal to the half width of the local PSF. The luminance values of the pixels belonging to the window are fed as inputs to the ANN. Each input window is associated with the corresponding central pixel of the window in the high resolution image. The luminance value of this pixel is the desired output of the mapping that the ANN is trained to reproduce (see Fig. 1). After the training is completed, by providing as input a new low resolution image obtained with the same echographic system, the ANN can produce an image that shows a resolution very similar to that of the images used during the training step.

The obtained mapping takes into account all the deterministic phenomena, such as diffraction, tissue absorption, the effective shape assumed by the PSF in the examined region (which depends on the scatterer distribution), and the effects due to the acquisition and measuring chain. Shadowing and PSF shape aberration due to the scatterer pattern can also be represented if these phenomena are included in the training data set. After the mapping is built, we can use an echographic probe with the same characteristics as that used to generate the high resolution images, but actually having only a subset of the active transmit-receive channels, obtaining a substantial hardware cost reduction both for the probe and for the transmit-receive electronics. It is worth noting that the relationship between the two images is highly space variant in practice, i.e., the PSF and the ratio K can vary significantly in different regions of the image. This problem can be overcome by training different ANNs on limited regions of the image, or by including in the input data some information about the current position of the input window.
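The construction of the training pairs described above can be sketched as follows: each nx × nz window of the low resolution image becomes an input vector, and the target is the central pixel of the corresponding window in the high resolution image. The helper name and the toy images are illustrative assumptions, not the paper's code.

```python
import numpy as np

def make_training_pairs(low, high, nx, nz):
    """Slide an nx-by-nz window over the low resolution image, one pixel at
    a time; pair each window with the co-located central pixel of the
    high resolution image."""
    inputs, targets = [], []
    H, W = low.shape
    for i in range(H - nx + 1):
        for j in range(W - nz + 1):
            window = low[i:i + nx, j:j + nz]
            center = high[i + nx // 2, j + nz // 2]
            inputs.append(window.ravel())   # window pixels -> ANN input vector
            targets.append(center)          # central pixel -> desired output
    return np.array(inputs), np.array(targets)

low = np.arange(36, dtype=float).reshape(6, 6)   # stand-in low resolution image
high = low * 2.0                                 # stand-in high resolution image
X, y = make_training_pairs(low, high, nx=3, nz=3)
print(X.shape, y.shape)  # (16, 9) (16,)
```

After training, the same windowing is applied to a new low resolution image and the network output at each window position becomes one pixel of the enhanced image.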


Fig. 1. Schematics of the proposed method: (a) Training on a suitable set of example data. Here the same image provides both the learning and the validation data sets. The neural network output is compared to the target pixel of the high resolution image, and the error is fed back to the weight adjustment algorithm. (b) Output generation during normal operation: the neural network generates the output pixel value associated with the input window.

III. Mapping Representation with Neural Networks

A static mapping is a function f: R^n → R from the input domain to the output co-domain; in this paper, n is the number of pixels of the input window, whose luminance values are related to the corresponding pixel luminance of the high resolution image. Let us define an input and output range and a quantization level M for the luminance values. This can reasonably be assumed, if we consider that typical experimental values are the output of an analog-to-digital converter. Such a quantization implies that the mapping f is considered only within an n-dimensional box consisting of M^n discrete points. A direct mapping representation is a very troublesome problem because of the so-called "curse of dimensionality", which arises with any nonparametric technique for mapping representation. In general, the memory requirements grow exponentially with the number of inputs. Even for small M and n, this exponential growth makes the direct approach impracticable [21]. For example, with M = 256 and n = 10, 2^80 (about 1.2 × 10^24) memory locations should be allocated. Moreover, the acquisition effort of the input-output pairs also grows exponentially. At present, many different methods have been reported in the literature to compactly represent a given data set, from classical interpolation functions to n-dimensional splines [22]–[25]. Several theorems justify a great variety of computational structures, called artificial neural networks, which are currently used as mapping approximators (see, for example, [26]–[28]). Among the best neural network architectures for such applications, the multilayered perceptron (MLP) provides a good approximation in the case of smooth f, but it requires


time-consuming learning sessions, and its convergence is not guaranteed, despite the modern learning algorithms used today [29]. In this work, we use the sum decomposition (SD) net proposed by one of the authors and co-workers [30], [31], and we compare the results with those obtained by a properly sized MLP. The MLP and the SD net are briefly summarized below.

The MLP net is composed of different layers, whose identical units are linked exclusively with units of the adjacent layers; no links are present between units of the same layer. It has been demonstrated (see [28]) that the minimal net capable of representing an arbitrary continuous mapping y = f(x1, x2, ..., xn) is composed of three layers. The output of an n-input net can be computed as follows:

y ≈ f̃(x1, x2, ..., xn) = Σ_{i=1}^{N} w2i ϕ( Σ_{j=1}^{n} w1ij xj − ϑi ),   (4)

where f̃ is the approximation of f, w1ij is an element of the weight matrix W1, w2i is an element of the weight vector W2, and ϑi is a suitable constant. Usually, the function ϕ is a sigmoid:

ϕ(x) = 1 / (1 + e^{−λx}),   (5)

where the parameter λ modifies the slope of the transition from 0 to 1, and it is usually chosen independently of the current application. The value of N and the weights in W1 and W2 are computed using a training algorithm such as the back-propagation or the Levenberg-Marquardt algorithm [29].

The SD net belongs to the class of nonparametric mapping techniques; the SD net avoids the "curse of dimensionality" of direct mapping representations by approximating the n-dimensional mapping with a proper set of 1-D mappings [30], [31]. Consider N (where N ≥ n) auxiliary variables W = [w1, ..., wN] generated by N suitable affine transformations:

wq = Aq X + bq,   (6)

where X = [x1, ..., xn]^T, Aq = [aq1, ..., aqn], and q = 1, 2, ..., N. Next, the approximation of f at suitable discrete points can be computed as:

f̃ = Σ_{q=1}^{N} gq⌊Aq X + bq⌋,   (7)

where the gq are 1-D functions with discrete inputs, and the ⌊·⌋ brackets represent a quantization transformation (from real number to integer index). For simplicity, we assume that each gq is a 1-D array of M memory cells. The values in the arrays gq are the weights of the SD net.

All the N neurons of an MLP net have the same activation function, independent of the mapping f, and take real

numbers as input. On the contrary, each array gq is different from the others; its content depends on the mapping f, and it is addressed by integer numbers. Noticeable advantages of the SD net compared to the MLP are the much faster convergence during the learning phase and the lower computational complexity: from (4), the computational complexity of the output generation is O(N^2) for the MLP; from (7), it is O(N) for the SD net. A currently open issue, common to the MLP and to the SD net, is how to determine the value of the parameter N; unfortunately, at present only approximate techniques are available [29]–[31]. In this paper, we make the assumption that the mapping can be satisfactorily approximated by:

f̃ = Σ_{q=1}^{n} gq⌊xq⌋,   (8)

i.e., the sampled mapping f of n variables can be perfectly decomposed into, or satisfactorily approximated by, a summation of n 1-D arrays gq [i.e., each Aq vector of (7) has a single 1 in position q, and zeros elsewhere]. This assumption is verified a posteriori, i.e., we carry out the learning process; and, if the error decreases below a given threshold, the desired mapping belongs to the class of decomposable mappings or, at least, it can be satisfactorily approximated in this way. In order to compute the content of the n arrays gq from a sequence of experimental input-output pairs (i.e., the mapping samples), the following iterative technique, or learning rule, has been proposed:

gq^{i+1}⌊wq⌋ = gq^{i}⌊wq⌋ + α{y − [g1^{i}⌊w1⌋ + g2^{i}⌊w2⌋ + ... + gn^{i}⌊wn⌋]},   (9)

where i is the iteration index, q = 1, 2, ..., n, and gq^{i} is the array computed at the ith iteration. The following steps are needed for the practical application of the method:

• n one-dimensional arrays gq, of M elements each, are allocated; the initial value of all the cells is set to zero.
• The n inputs are quantized with M levels (i.e., each input value becomes an integer number ranging from 0 to M − 1); each integer input value is used to address a different cell in each gq.
• The contents of the n addressed cells, one for each gq, are read and summed; the result is the current approximation yp = g1^{i}⌊w1⌋ + g2^{i}⌊w2⌋ + ... + gn^{i}⌊wn⌋ of the desired output y.
• The representation error is computed as ep = y − yp, where y is the desired mapping output. The quantity αep is accumulated by the iteration of (9) in all memory cells involved in the current approximated output; note that this value is accumulated in a different location of each gq, depending on the current input pattern.
• The second, third, and fourth steps are repeated for all mapping input-output pairs.
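The steps above can be sketched in a few lines, a minimal stand-in for the SD net under the decomposability assumption of (8); the class name, the toy mapping, and the α value used here are illustrative choices, not the paper's.

```python
import numpy as np

class SDNet:
    """Sum-decomposition net: one 1-D weight array g_q per quantized input."""

    def __init__(self, n_inputs, n_levels, alpha):
        self.g = np.zeros((n_inputs, n_levels))  # the arrays g_q, zeroed at start
        self.alpha = alpha

    def predict(self, idx):
        # Eq. (8): sum the addressed cell of each array g_q
        return sum(self.g[q, idx[q]] for q in range(len(idx)))

    def learn(self, idx, y):
        # Eq. (9): accumulate alpha * error into every addressed cell
        e = y - self.predict(idx)
        for q in range(len(idx)):
            self.g[q, idx[q]] += self.alpha * e

# Toy decomposable mapping y = x0 + 2*x1 on inputs already quantized to 8 levels
rng = np.random.default_rng(0)
net = SDNet(n_inputs=2, n_levels=8, alpha=0.5)
for _ in range(2000):
    x = rng.integers(0, 8, size=2)
    net.learn(x, x[0] + 2 * x[1])
print(round(net.predict([3, 5]), 2))  # should approach 3 + 2*5 = 13
```

Because the toy target is exactly decomposable, the representation error decreases toward zero, which is the a posteriori verification the text describes.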


The convergence of the iteration, or learning algorithm, of (9) depends on α; it is possible to show that, by properly choosing the parameter α, the representation error decreases; and, if the mapping is decomposable into a sum of 1-D mappings, it decreases to zero [31]. It is worth noting that, in general, cells in the same position in different gq arrays are addressed at different moments by the learning algorithm, depending on the flow and on the pattern of the input data values; as a consequence, after convergence of the learning algorithm, each gq will be different from the others. Decomposable mapping functions can be represented with enormous savings in memory requirements, without loss of information content; in fact, only n 1-D arrays gq have to be computed and stored in memory, with a maximum of n · M memory locations, in place of the M^n locations of the direct mapping representation.

One of the main problems in building a mapping representation is the high number of experimental samples and the long computation time required to obtain the gq arrays when M is considerably high. A remedy for this problem is related to the concept of generalization. Generalization allows an associative memory (i.e., a mapping representation) to handle associations never learned before, under the assumption that these associations are neighbors of those previously learned. Generalization techniques are very effective when a complete training is not allowed by the constraints of the current problem, and generalization can easily be added to the proposed algorithm [31].

The learning procedure (i.e., the weight adjustment process to produce the correct input-output association) can be divided into two steps regardless of the net used: the weight adjustment and the validation. For this purpose, the available data set, which is composed of input-output pairs, is divided into two subsets. The first subset is fed to the learning algorithm during the weight adjustment. After the adjustment is completed, the second subset, never shown to the net before, is used to verify that the net can effectively represent the mapping with the desired degree of approximation.
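The memory saving claimed above is easy to check with the example figures quoted earlier in this section (M = 256 levels, n = 10 inputs):

```python
# Direct n-dimensional table vs sum-decomposition storage.
M, n = 256, 10
direct = M ** n        # one cell per point of the n-dimensional input box
decomposed = n * M     # n one-dimensional arrays of M cells each
print(direct == 2 ** 80, decomposed)  # True 2560
```

A direct table needs 2^80 cells, while the decomposed representation needs only 2,560.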

IV. Experimental Results The proposed algorithm has been applied to three experimental ultrasound images acquired at the Biomedical Ultrasonics Laboratory (BUL), University of Michigan (http://bul.eecs.umich.edu) using SAFT [20]. The first image is a cyst phantom composed of a uniform parenchyma and some circular cylinders of absorbing material, plus one wire on the right side. Imaged by an echographic system, the absorbing cylinders appear as black circular regions, mimicking cysts in the human body; the wire appears as a single white point. By properly processing the data sets with the SAFT technique, we obtained two images of the same phantom at different resolutions, i.e., using two different effective apertures. The low resolution image was obtained simply by


discarding a subset of the available data set. After SAFT processing, envelope detection, decimation, and log-compression to 8-bit resolution, we obtained two matrices of luminance data for the same phantom, the low and the high resolution matrices, respectively, where each column of the two matrices corresponds to a single line of view. Subsequently, conventional scan conversion generated the usual sector scan images by properly deflecting each line of view and by filling the resulting spaces between the lines with interpolated values. The SAFT echographic data were acquired with a 3.5 MHz center frequency probe, 128 channels, pitch 0.22 mm, sampled at about 13 MHz, 4 bytes per sample, 2048 samples per channel. According to the SAFT technique, the data set is composed of 16,384 sampled RF tracks, spanning a range of about 230 mm in depth. We processed the data to obtain a phased-array scanning image, composed of 200 lines of view, with 90 degree beam deflection; after envelope detection, decimation, log-compression, 8-bit quantization, and scan conversion, we have a 512 × 512-pixel gray-level image. In particular, dynamic focusing both in transmission and in reception was performed. The proposed algorithm is applied just before the scan conversion, on the log-compressed luminance matrices.

Figs. 2(a) and (b) show the subarray image obtained with N = 64 elements and the large aperture image obtained with N = 128 elements. The 128-element image will be the target image, and the 64-element image will be the input image to be processed. Fig. 2(c) shows the normalized difference image between the 128- and the 64-element images, where the gray levels are proportional to the differences and white pixels correspond to zero difference. The differences between these two images are mainly located in the lateral direction, around the single high reflectance scatterer and the cysts.
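The per-line preprocessing chain described above can be sketched as follows. This is a simplified stand-in: the FFT-based analytic signal, the plain subsampling used in place of a proper decimation filter, the 60 dB dynamic range, and the synthetic RF pulse are illustrative assumptions, not the paper's exact processing.

```python
import numpy as np

def envelope(rf):
    """Envelope detection via the analytic signal (negative frequencies zeroed)."""
    n = len(rf)
    F = np.fft.fft(rf)
    h = np.zeros(n)
    h[0] = 1
    h[1:n // 2] = 2
    h[n // 2] = 1          # n assumed even
    return np.abs(np.fft.ifft(F * h))

def log_compress(env, dynamic_range_db=60):
    """Log-compress a normalized envelope to 8-bit luminance."""
    env = env / env.max()
    db = 20 * np.log10(np.maximum(env, 10 ** (-dynamic_range_db / 20)))
    return np.round((db + dynamic_range_db) / dynamic_range_db * 255).astype(np.uint8)

t = np.arange(2048) / 13e6                            # ~13 MHz sampling, as in the text
rf = np.sin(2 * np.pi * 3.5e6 * t) * np.exp(-((t - 7e-5) / 1e-5) ** 2)
lum = log_compress(envelope(rf)[::4])                 # subsample by 4 (illustrative)
print(lum.dtype, lum.shape)  # uint8 (512,)
```

One such luminance column per line of view, stacked side by side, gives the matrices on which the proposed algorithm operates before scan conversion.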
Comparing the target image with the subarray image, it can be seen that the target image shows a wider black region (absorbent cyst) and a narrower white region (reflecting cyst) than the subarray image. In order to apply the proposed method, we have to choose the size (i.e., the pixel number) of the input window. The window size is related to the width of the PSF of the system. As a matter of fact, the size of the PSF changes across the image: in regions very near or very far from the transducer, and in the side zones, the PSF is larger than in central regions, due to diffraction phenomena. We have to find the minimum size of the input window capable of achieving the desired level of approximation, because the computational effort grows very rapidly with the number of inputs. The available image (i.e., the subarray image) was first divided into three horizontal subregions [Fig. 1(a)], A, B, and C. In each subregion we assume a constant PSF, and we compute the best size of the input window. A different mapping is then computed for each subregion. Each subregion was further halved to form two data sets: the learning set and the validation set.


Fig. 2. Image 1. Input and target data sets: (a) SAFT echographic image of the cyst phantom with transducer aperture of 64 elements, dynamic focusing in transmission and reception. (b) SAFT echographic image of the synthetic phantom with transducer aperture of 128 elements, dynamic focusing in transmission and reception. (c) Difference image between the 64- and the 128-element images; white pixels correspond to zero error.

As mentioned above, the complete training procedure consists of a learning phase, in which a subregion of the image is scanned with the input window and the values of the pixels belonging to the window are fed as current inputs to the neural network learning process. The net weights are adjusted by the learning rule until the desired error in reproducing the mapping is achieved. Each complete subregion scan and weight adjustment is called an epoch. The convergence parameter α of the SD net learning iteration [see (9)] was chosen to be 0.1, in order to minimize the oscillations of the learning curve and to keep the learning time acceptable. The resolution enhancement due to the increased aperture of the transducer only slightly affects the axial resolution. We found that each pixel of the high resolution image is related mainly to pixels along the lateral direction, so that the input window was reduced to a single row of pixels along the lateral direction. We applied different input windows to the three axial subregions—A, B, and C—whose widths were evaluated by a trial-and-error procedure. Using this procedure, the learning phase took 5,000 epochs for each window width trial. During the validation, the learning parameter α was set to zero, and the neural network was tested on the validation set, which is composed of pixels not used in the learning phase. The learning curves are shown in Fig. 3(a). In particular, the curves show the normalized root-mean-square (RMS) error computed on the whole image versus epochs. The RMS error is a global quality index commonly used to easily quantify the performance of learning algorithms; however, it should be noticed that, in image processing, a low RMS error does not guarantee that the results are sufficiently good for each pixel of the image. We computed the validation error as a function of the width w of the input windows, and we chose the knee value of the descending

Fig. 3. Input window sizing: (a) Normalized RMS mapping representation error versus epochs for different values of the input window width w, using the proposed method. (b) Normalized RMS mapping representation plateau error versus w.


TABLE I
Image 1. Lateral Resolution of the Low Resolution, High Resolution, and Computed Images at Lines 100, 323, and 420 (Before Scan Conversion).

Line   64 elements   128 elements   Computed
100    2.61          1.23           2.27
323    2.31          1.22           1.48
420    2.02          0.86           1.32

Fig. 4. Mapping representation results: (a) The image computed with the proposed method. (b) The error image between the high resolution image and the computed image.

curve, which is a good trade-off between low error and computational effort [Fig. 3(b)]. With the echographic system used, we found the best compromise between mapping representation accuracy and computational effort by setting the size of the input window to w = 13 for region A, w = 11 for region B, and w = 9 for region C. By processing the whole input image, the final output image is computed [Fig. 4(a)]. Comparing this image with the true 128-element image [Fig. 2(b)], the general impression is very satisfying, i.e., the two images appear to be very similar as far as the global sharpness and the correspondence of each detail are concerned. In the computed image, the absorbent and the reflecting cysts appear almost as large as in the true 128-element image. The improvement obtained with respect to the 64-element image [Fig. 2(a)] is also evident. Fig. 4(b) shows the difference between the true 128-element image [Fig. 2(b)] and the computed image [Fig. 4(a)]. Comparing the difference between the 128- and the 64-element images [Fig. 2(c)] with the difference between the true 128-element image and the computed image [Fig. 4(b)], it is possible to see that, with the computed image, the errors around the single high reflectance scatterer and the absorbent cyst are greatly reduced; for the reflecting cyst as well, there is a significant reduction. The absorbent cysts in Figs. 2(b) and 4(a) are practically identical. The same holds for the single scatterer: only a few isolated point errors are visible. In particular, no artifacts in geometry and/or intensity were generated, even though the processing is nonlinear. In order to quantify the resolution increase, we computed three quality indexes for the 64-element, 128-element, and computed images [Figs. 2(a) and (b) and 4(a)]. The first index is a measure of the resolution based on the autocovariance function of sample horizontal lines of the image, as defined in [7]. The autocovariance function of the image was normalized to 1.00.
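This resolution index can be sketched as follows: the autocovariance of an image line is normalized to 1.00 and its width is measured l dB below the peak. The amplitude threshold convention (10^(−l/20)) and the two synthetic test lines are illustrative assumptions.

```python
import numpy as np

def lateral_resolution(line, l_db=6.0):
    """Width of the normalized autocovariance of a line, l dB below its peak."""
    x = line - line.mean()
    ac = np.correlate(x, x, mode="full")
    ac = ac / ac.max()                  # normalize the peak to 1.00
    thresh = 10 ** (-l_db / 20)         # l dB below the peak (amplitude convention)
    above = np.where(ac >= thresh)[0]
    return above[-1] - above[0]         # main lobe width in samples

narrow = np.exp(-0.5 * (np.arange(-50, 51) / 2.0) ** 2)  # sharp line response
wide = np.exp(-0.5 * (np.arange(-50, 51) / 6.0) ** 2)    # blurred line response
print(lateral_resolution(narrow) < lateral_resolution(wide))  # True
```

A sharper line response yields a narrower autocovariance main lobe and hence a smaller (better) resolution index, which is the sense in which the tabulated values are compared below.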
The lateral resolution Rl was defined as the width of the autocovariance function at l dB below the peak; we used l = 6.00. The computed resolutions have been summarized in Table I for three dif-

1046

ieee transactions on ultrasonics, ferroelectrics, and frequency control, vol. 49, no. 8, august 2002 TABLE II SNR and CNR of the Low Resolution, High Resolution and Computed Images1 . Image 1

Image 2

Image 3

Transducer aperture

SNR

CNR

SNR

CNR

Rl

SNR

CNR

Rl

64 elem. 128 elem. computed

4.96 6.98 5.21

0.36 0.41 0.39

4.02 5.86 4.71

0.28 0.35 0.30

2.64 1.52 1.87

3.98 5.55 4.32

0.24 0.33 0.29

2.91 1.82 2.01

1 The

CNR is computed on two 10 by 10 pixels squared regions, on the background and at the center of the reflecting cyst (lesion). The lateral resolution Rl of the images 2 and 3 was computed on the line corresponding to the most central scatter.

ferent image lines (100, 323, and 420). The second index is the signal-to-noise ratio (SNR), defined as SNR = (µ/σ), where µ is the mean and σ2 is the variance of the image [7] (see Table II). As third index, the contrast-to-noise ratio (CNR), de| µl − µb | fined as CNR =    , was computed on two 1 2 2 + σ σ l b 2 squared regions 10 by 10 pixels belonging to the background and to the reflecting cyst (i.e., the lesion), where µ is the mean and σ2 is the variance of each image squared region, and subindexes l and b stand for lesion and background, respectively [32] (see Table II). The quality indexes confirm the general visual impression that the computed image is in good agreement with the true 128-element image. In particular, the resolution indexes of the true 128-element image [Fig. 2(b)] and of the image computed using neural network [Fig. 4(a)] are quite similar; the main lobe width of the autocovariance function of the line through the single scatterer placed in the left side of the image (line 323) decreases from 2.31 (physical 64-element line) down to 1.48 (computed line), very close to the value 1.22 of the true 128-element line. The SNR computed over the whole image is 6.98 for the true 128-element image and 5.21 for the computed image: the proposed algorithm improves the resolution but does not significantly worsen the noise level. The final test on the SD neural network is carried out by processing other images with the same gq s (i.e., with the same weights) computed during the training on the first image. We processed the other two images acquired with the 128-element transducer array currently available from the BUL web site. Figs. 5(a)–(c) show the low (64 elements) and high (128 elements) resolution images of a wire phantom (image 2), obtained by processing the SAFT data set in the same way as the image used for the training, and the image computed by the proposed neural network, respectively. Figs. 
6(a)–(c) show the high and low resolution images of a cyst phantom (image 3), obtained by processing the SAFT data set as previously described, and the image generated by the net, respectively. Both computed images, of the wire and of the cyst phantom, look very similar to the corresponding true high resolution images, and they show improvement in all the quality indexes (lateral resolution, SNR, and CNR) with respect to the starting low resolution images, as summarized in Table II.

The performance of the proposed neural network was compared with that of a standard MLP net. In particular, because of the heavy computational load of the MLP training procedure, we considered only the central region of image 1; the input window size, which depends on the PSF width, is the same as that used for the SD net. The number of neurons in the hidden layer was chosen by carefully applying a standard dimensioning technique, as described in [29]; we found the best results using nine hidden neurons. In Fig. 7, the learning curve is shown as a function of Nhl, the number of hidden neurons. For each trial, the learning phase of the MLP took 100,000 epochs, against the 5,000 epochs of the SD net. Fig. 8(a) shows the starting 64-element aperture image, Fig. 8(b) the target 128-element image, Fig. 8(c) the image provided by the proposed neural network method, and Fig. 8(d) the computed MLP image. The results of the two neural networks are quite similar; however, the proposed network achieves the goal with a substantially lower computational effort than the MLP net, both in the learning phase and in output generation.

V. Conclusions

A new technique for echographic image resolution enhancement is presented. After a learning phase, in which we built an approximator of the mapping between low and high resolution images, the technique is able to compute images that could be obtained with a larger transducer aperture, using information contained both in the image to be enhanced and in the mapping itself. The ANN used to represent the underlying mapping between the low and the high resolution images shows good computational performance in comparison with a standard MLP net, both in training and in output generation.
The proposed learning algorithm builds the approximator of the desired mapping from experimental data; the approximator mainly depends on the system used to produce the image, not on the particular image under investigation, as demonstrated by the test procedure carried out successfully on images not used during the training procedure.


Fig. 5. Image 2. Test of the neural net: (a) SAFT echographic image of the wire phantom with a transducer aperture of 64 elements, dynamic focusing in transmission and reception. (b) SAFT echographic image of the wire phantom with a transducer aperture of 128 elements, dynamic focusing in transmission and reception. (c) Computed image, starting from the low resolution image and using the net weights obtained during the training on image 1.

Fig. 6. Image 3. Test of the neural net: (a) SAFT echographic image of the cyst phantom with a transducer aperture of 64 elements, dynamic focusing in transmission and reception. (b) SAFT echographic image of the cyst phantom with a transducer aperture of 128 elements, dynamic focusing in transmission and reception. (c) Computed image, starting from the low resolution image and using the net weights obtained during the training on image 1.

The obtained results show that the proposed algorithm provides a good resolution enhancement with no geometrical or intensity artifacts, which are a well-known problem of many nonlinear algorithms. The presented technique requires no hardware modification of standard echographic systems, and it could be directly implemented in modern digital systems using linear, convex, and phased arrays. The computational cost is low compared with blind, Fourier-based deconvolution approaches: only multiplications and sums on real numbers are required. Moreover, contrary to Fourier-based methods, the entire processing is performed on envelope-detected and decimated images, further reducing the total amount of data to be processed. A real-time echographic system implementation is currently being developed.

Fig. 7. MLP internal structure sizing: normalized RMS mapping representation plateau error versus Nhl.
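A toy sketch of such an output stage illustrates why the cost is low: each enhanced pixel is a weighted sum, i.e., only real multiply-adds over a small neighborhood of the envelope-detected, decimated image. The window size, weights, and function name below are illustrative assumptions, not the paper's SD net.

```python
import numpy as np

def enhance(image_lo, weights, window=(5, 5)):
    """Hypothetical sliding-window output stage: each output pixel is a
    weighted sum (real multiplications and additions only) of a small
    neighborhood of the low-resolution, envelope-detected image.

    `weights` stands in for the mapping learned during training; its
    shape and values here are purely illustrative.
    """
    wr, wc = window
    pr, pc = wr // 2, wc // 2
    padded = np.pad(image_lo, ((pr, pr), (pc, pc)), mode="edge")
    out = np.empty_like(image_lo, dtype=float)
    for i in range(image_lo.shape[0]):
        for j in range(image_lo.shape[1]):
            patch = padded[i:i + wr, j:j + wc]
            out[i, j] = float(np.sum(patch * weights))  # wr*wc multiply-adds
    return out

# With an averaging kernel the stage reduces to a plain moving average.
img = np.arange(36, dtype=float).reshape(6, 6)
w = np.full((5, 5), 1.0 / 25.0)
print(enhance(img, w)[3, 3])
```

With a learned, possibly spatially varying mapping in place of the fixed kernel, the per-pixel cost remains proportional to the window area, which is consistent with the real-time claim above.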

The neural network-based resolution enhancement could represent a good trade-off between the quality of the computed image and the computational effort. We hope in the near future to highlight the limitations and trade-offs of the presented method and to place it within a solid theoretical framework; our current efforts are devoted to this direction.

Fig. 8. Echographic images obtained with different methods: (a) SAFT image with an aperture of 64 elements. (b) SAFT image with an aperture of 128 elements. (c) Image computed by using the SD net. (d) Image computed by using an MLP net.

Acknowledgments

The authors wish to thank Prof. M. Karaman for his valuable hints in processing the SAFT image data.

References

[1] G. S. Kino, Acoustic Waves. New York: Prentice-Hall, 1987.
[2] J. U. Quistgaard, "Signal acquisition and processing in medical diagnostic ultrasound," IEEE Signal Processing Mag., pp. 67–74, Jan. 1997.
[3] M. Fatemi and A. C. Kak, "Ultrasonic B-scan imaging: Theory of image formation and technique for restoration," Ultrason. Imaging, vol. 2, no. 1, 1980.

[4] D. Iracà, L. Landini, and L. Verrazzani, "Power spectrum equalization for ultrasonic image restoration," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 36, pp. 216–222, Mar. 1989.
[5] T. Loupas, S. D. Pye, and W. N. McDicken, "Deconvolution in medical ultrasonics: Practical considerations," Phys. Med. Biol., vol. 34, pp. 1691–1700, 1989.
[6] J. P. Ardouin and A. N. Venetsanopoulos, "Modelling and restoration of ultrasonic phased-array B-scan images," Ultrason. Imaging, vol. 7, pp. 321–344, 1985.
[7] U. R. Abeyratne, A. P. Petropulu, and J. M. Reed, "Higher order spectra based deconvolution of ultrasound images," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 42, pp. 1064–1075, Nov. 1995.
[8] T. Taxt, "Restoration of medical ultrasound images using two-dimensional homomorphic deconvolution," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 42, pp. 543–554, 1995.
[9] T. Taxt and G. V. Frolova, "Noise robust one-dimensional blind deconvolution of medical ultrasound images," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 46, pp. 291–299, 1999.
[10] R. J. Dickinson, "Reduction of speckle in ultrasound B-scans by digital signal processing," Acoust. Imaging, vol. 12, pp. 213–224, 1982.
[11] A. K. Nandi, D. Mämpel, and B. Roscher, "Blind deconvolution of ultrasonic signals in nondestructive testing applications," IEEE Trans. Signal Processing, vol. 45, pp. 1382–1390, 1997.
[12] D. E. Robinson and M. Wing, "Lateral deconvolution of ultrasonic beams," Ultrason. Imaging, vol. 6, pp. 1–12, 1984.
[13] L. Cheng-Chang and S. Yong Ho, "A neural network based image compression system," IEEE Trans. Consumer Electron., vol. 38, pp. 25–29, 1992.
[14] C. N. Manikopoulos, "Neural network approach to DCPM system design for image coding," IEEE Proc.-I, vol. 139, no. 5, pp. 501–507, 1992.
[15] G. Cottrell, "Principal components analysis of images via back propagation," SPIE Visual Commun. Image Processing, vol. 1001, pp. 1070–1076, 1988.
[16] D. T. Pham and E. J. Bayro-Corrochano, "Neural computing for noise filtering, edge detection and signature extraction," J. Syst. Eng., vol. 2, pp. 111–122, 1992.
[17] J. G. Daugman, "Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression," IEEE Trans. Acoust. Speech Signal Processing, vol. 36, pp. 1169–1179, 1988.
[18] E. S. Dunstone, "Image processing using an image approximation neural network," Proc. IEEE ICIP-94, vol. 3, pp. 912–916, 1994.
[19] C. S. Pattichis and A. G. Constantinides, "Medical imaging with neural networks," in Proc. IEEE Workshop on Neural Networks for Signal Processing, 1994, pp. 431–440.
[20] M. Karaman, P.-C. Li, and M. O'Donnell, "Synthetic aperture imaging from small scale systems," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 39, pp. 429–442, 1995.
[21] C. G. Atkeson and D. J. Reinkensmeyer, "Using associative content-addressable memories to control robots," in Neural Networks for Control, W. T. Miller, III, R. S. Sutton, and P. J. Werbos, Eds. London, England: MIT Press, 1990, pp. 255–286.
[22] C. De Boor, "Bicubic spline interpolation," J. Math. Physics, vol. 41, pp. 212–218, 1962.
[23] J. Sard and S. Weintraub, A Book of Splines. New York: Wiley, 1970.
[24] K. Stokbro and K. Umberger, "Forecasting with weighted maps," in Proc. Conf. Nonlinear Modeling and Forecasting, SFI Studies in the Sciences of Complexity, M. Casdagli and S. Eubank, Eds., 1992, pp. 73–93.
[25] A. Bowyer, "Computing Dirichlet tessellations," Computer J., vol. 2, pp. 162–166, 1981.
[26] A. N. Kolmogorov, "On the representation of continuous functions of several variables by superpositions of continuous functions of one variable and addition," Dokl. Akad. Nauk SSSR, vol. 108, pp. 179–182, 1957 (English transl.: Amer. Math. Soc. Transl., vol. 28, pp. 5–59, 1965).
[27] D. A. Sprecher, "On the structure of continuous functions of several variables," Trans. Amer. Math. Soc., vol. 115, pp. 340–355, 1965.
[28] K. I. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Networks, vol. 2, pp. 183–192, 1989.
[29] M. Nørgaard, "Neural network based system identification toolbox," Technical Report 95-E-773, Inst. of Automation, Technical University of Denmark, 1995.
[30] R. Carotenuto, L. Franchina, and M. Coli, "Nonlinear system process prediction using neural networks," in Proc. IEEE Int. Conf. on Neural Networks (ICNN'96), 1996, pp. 184–189.
[31] R. Carotenuto, L. Franchina, and M. Coli, "Multidimensional mapping representation by multiple 1-dimensional decompositions for complex system modelling," in Proc. Int. Conf. Fractal and Chaos in Chem. Eng., Singapore: World Scientific, 1996, pp. 518–529.
[32] M. Karaman and M. O'Donnell, "Subaperture processing for ultrasonic imaging," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 45, pp. 126–135, Jan. 1998.


Riccardo Carotenuto (M'00) was born in Rome, Italy. In 1992 he received the Dr. Sc. degree in electronic engineering from the University of Rome "La Sapienza", Rome, Italy. He served as a postgraduate fellow at the Department of Electronic Engineering of the University of Rome "La Sapienza", working on complex and partially known dynamical systems. Beginning in 1993, he worked toward the Ph.D. degree at the University of Rome "La Sapienza" on a new method for the extraction of robust system representations from time series data, earning the Ph.D. degree in 1997. Since 1997, he has been with the Acustoelectronics Laboratory (ACULAB), Department of Electronic Engineering, University Roma Tre, working on ultrasonic micromotors and on resolution enhancement of ultrasound imaging systems. His main interests include time series prediction, nonlinear system identification and control, and neural network theory and applications.

Gabriele Sabbi was born in Dolo (Venice), Italy, in 1973. He received the Dr. Sc. degree in electronic engineering from the University Roma Tre, Rome, Italy, in 2001 with a thesis on ultrasound image processing. His current interests are in the design and development of Java-based networking and wireless applications.

Massimo Pappalardo (M'98) received the Dr. Sc. degree in electrical engineering from the University of Naples in 1967; he started his research activity with a scholarship at the Istituto di Acustica C.N.R., where, in 1968, he joined the staff as a researcher. In 1978 he became scientific manager of the Department of Ultrasound and Acoustic Technology, and in 1981 he became a member of the Scientific Committee. During the period 1968–1985 he carried out research at the University of Birmingham, was responsible for several national and international research programs, and taught as a contract professor at the Universities of Calabria and Salerno. In 1985 he became a full professor at the Department of Electronics of the University of Salerno, where he was the director for many years. Since 1995 he has been a full professor at the Department of Electronic Engineering of the University Roma Tre. Besides the basic course in electronics, he teaches the specialized courses "Sensors and Actuators" and "Electronic Instrumentation and Measurements", the first devoted to piezoelectric sensors and actuators and the second to acustoelectronic systems. Massimo Pappalardo has worked mainly in the field of ultrasound applications and acoustical imaging for biomedical and underwater prospecting. His current research activity is on transducer modeling, piezoelectric devices, and echographic image processing. He is the author of more than one hundred papers in these fields, published in international journals and conference proceedings.