Computer-aided methods for quantitative assessment of longitudinal changes in retinal images presenting with maculopathy

Peter Soliz, Mark Wilson, Sheila Nemeth, and Phong Nguyen
Kestrel Corporation, Albuquerque, NM 87109 USA

ABSTRACT

This paper presents the results from applying a computer-based methodology for making precise measurements of longitudinal changes in a patient’s digital retinal images presenting with age-related macular degeneration. The digital retinal image analysis system applies recognized principles in automatic image segmentation and integrates the automation with a graphical user interface. Drusen, retinal lesions associated with age-related macular degeneration (ARMD), were segmented using a region-growing algorithm. The algorithm calculates the 76th percentile intensity in a region to provide seed points for the neighborhood-growing algorithm. Twenty-one cases were analyzed. Agreement statistics (kappa) were determined by comparing the automated results with those derived from manual measurements. Agreement statistics ranged from 0.49 to 0.71 for different regions of the retina. The manual analysis “ground truth” was performed by trained graders from the University of Wisconsin Reading Center using guidelines found in the Wisconsin Age-Related Maculopathy Degeneration Grading Scheme (WARMGS). Because of the time required, the ophthalmic graders can only grade (size, area, type) the most prominent drusen in specific regions, resulting in a small sampling of drusen lesions in the retina. The computer-based approach allows one to efficiently and comprehensively “grade” all of the lesions for larger numbers of images. An additional advantage is the precision and total area that can be graded with the computer-aided technology. Computer-registered longitudinal images produced a precise determination of the temporal changes in the individual lesions. This study has demonstrated a robust segmentation and registration methodology for automatic and semiautomatic detection and measurement of abnormal regions in longitudinal retinal images.

Keywords: digital retinal images, age-related macular degeneration, drusen, segmentation, computer-aided, feature descriptors, longitudinal morphology.

1. INTRODUCTION

This study focused on the segmentation of lesions called drusen, which are lipid deposits in the outer retina that are associated with age-related macular degeneration (ARMD). ARMD is the leading cause of irreversible visual loss among the elderly in the US and Europe. The U.S. Census Bureau projects the U.S. population over 65 will increase from 34 million in 1997 to approximately 69 million by the year 2030. As a result, ARMD is expected to increase in importance as a public health concern.

Estimates of ARMD in population-based prevalence studies vary considerably. One of the reasons for the differences among them is the variation in the methods used to detect and classify ARMD. Standardized protocols for detecting and classifying ARMD have been developed. Comparisons have been made between three large population studies: the Beaver Dam Eye Study, the Blue Mountains Eye Study, and the Rotterdam Study. The current grading techniques for the measurement of pathologies require grader estimates of the area covered. Consequently, there is a high degree of subjectivity, leading to inconsistencies in inter-study comparisons.

Currently, stereo pairs of fundus images are “graded” by trained specialists to determine the nature of the pathologies in the subject’s retina. For research studies and clinical trials, retinal images continue to be evaluated and characterized (graded) using techniques that depend heavily on the human observer’s judgment and experience. In these studies, it is important to quantify the pathologies to the greatest extent possible. Unfortunately, performing this task with entirely manual techniques, using light boxes or 35 mm slide stereo viewers, limits the precision and comprehensiveness of the grading. A computer-based system that allows a fully automated segmentation of lesions in

Medical Imaging 2002: Visualization, Image-Guided Procedures, and Display, Seong K. Mun, Editor, Proceedings of SPIE Vol. 4681 (2002) © 2002 SPIE · 1605-7422/02/$15.00


the retinal image has been developed. The system presents a stereo rendition of the images on a computer monitor using shuttered glasses technology. It is designed to exploit the strengths of the trained specialist, such as classification, while making available the attributes of computer algorithms, such as precision and speed. The computer can with relative ease segment the drusen, measure their sizes, calculate a total area, and record their precise locations, while with the aid of the human analyst the drusen can be classified by type (reticular, hard, soft, etc.). Measurement of individual lesions or calculation of total area can be a daunting task for the human, but is nearly instantaneous and automatic for the computer.

Fundus photography is rapid, non-invasive, and a well-accepted integral part of clinical diagnosis of ocular diseases. Digital imaging technology, a more convenient form of record keeping, has improved in both spatial resolution and cost and now approaches photography on both counts. Digital fundus imaging offers the potential for a great number of quantitative measurements and computer-aided diagnoses (CAD) not otherwise possible with photographic images. More complex image processing, such as automatic feature detection and identification, is now an integral part of numerous medical machine vision applications.

One related research project in computer-based grading of ARMD patients is by Berger and Shin. Berger and Shin have demonstrated and validated for their purposes a software package, “To quantitate the area subtended by drusen in color fundus photographs for the conduct of efficient, accurate clinical trials in age-related macular degeneration.” Their philosophy of integrating the medically trained user through interactive image processing parallels our own approach. Their system capitalizes on the power and technology now available in computers by giving the computer the role of counting, measuring, and storing drusen data. The grader supervises the analysis by providing the detection and classification functions. Their system does not presently use stereo images.

2. MATERIALS AND METHODS

To validate the segmentation algorithm, the study used a data set of eighteen 35 mm color slide stereo-pair fundus images from the WARMGS. Additionally, 3 longitudinal cases were analyzed. The longitudinal cases each contained three time periods, five years apart. The WARMGS contains 136 graded stereo retinal image pairs with examples of the various lesions and stages of age-related macular degeneration (ARMD). Eighteen image pairs were selected from the WARMGS data set for the validation step. The fundus images were digitized with great care and study to ensure the best possible quality data were presented to both the human interpreter and the image processing algorithms [2]. Before applying the segmentation, the images were pre-processed to reduce lighting artifacts and inter-subject pigmentation differences. Next, using ground truth that was based on manual grading of lesion sizes, the automatic segmentation system was validated. Finally, the algorithm was applied to a longitudinal data set consisting of three cases each spanning a period of 15 years. Each of these steps is described.

2.1 WARMGS

Currently, ARMD is detected and quantified in drug and epidemiological studies by the grading of color stereoscopic fundus photographs and/or the analysis of fluorescein angiograms. Describing the natural history of age-related macular degeneration is important in understanding its pathogenesis and developing approaches to prevent it. The WARMGS developed by Klein et al. [’I describes a system for the grading of ARMD. This system has been used in large epidemiological studies and large multi-centered clinical trials, and to develop an international classification system. The set of photographs contained in the WARMGS provides an introduction to lesion recognition, training graders for the decision-making process they will undergo when classifying lesions. The subtle


Figure 1. WARMGS grid. The center circle has a radius of 500 µm, followed by 1500 µm for the inner circle and 3000 µm for the outer circle. The graders use the three small circles of 63 µm, 125 µm, and 250 µm to measure lesions.

and salient nature of drusen requires explicit attention to such features as margin, boundary, sharpness, and thickness (i.e., substance or density of the lesion). Lesions specific to ARMD are measured by a grader from 30 degree field-of-view color fundus photographs using a grid centered over the fovea to define subfields (Figure 1), standard circles printed on plastic to assess size and area of specific lesions, and a specially designed light box to allow better distinction of subtle drusen. The subfields are defined depending on the eye (OD or OS): center circle (CC), inner and outer superior (IS, OS), inner and outer nasal (IN, ON), inner and outer inferior (II, OI), and inner and outer temporal (IT, OT). The degrees of exact agreement achieved between two trained graders across a variety of lesions ranged from 67% for drusen size to 99% for geographic atrophy [4]. Time constrains these graders to measure only the most salient pathological feature, for example the largest drusen, in each region of the grid. A comprehensive approach that locates and measures every lesion is not feasible for the human given the large volume of images that must be analyzed.

2.2 Selection of Validation Data Set

To validate the segmentation algorithm before applying it to the longitudinal image set, a test was performed using the WARMGS images, for which “ground truth” information on lesion sizes is available. The “ground truth” was provided by the University of Wisconsin through rigorous manual analysis and group consensus. A subset of 18 cases from the WARMGS was selected for validation. A total of 150 out of a possible 162 lesions were used in the validation of the segmentation algorithm. The size grades assigned by the reading center are one of the following: none, questionable, less than circle C0, less than circle C1, less than circle C2, greater than or equal to circle C2, reticular, cannot grade because of reticular, cannot grade because of photo quality. For the test, only the four circle measurement criteria were used, and an image with any of the other grades was excluded. From this subset the average grade from all 9 regions for a patient was calculated and sorted into one of four groups. The top five from each group were selected for validation. This process gave a good distribution of sizes for each of the nine regions. The drusen type was not considered as a selection criterion, so soft and hard distinct and soft indistinct drusen are presented randomly.

2.3 Longitudinal Data Set

The Beaver Dam Eye Study, conducted by the Department of Ophthalmology and Visual Sciences at the University of Wisconsin Madison, is an extensive longitudinal natural history study of drusen progression. The study completed its tenth year in 2000 and contains hundreds of subjects. Three subjects were analyzed for drusen progression from a baseline exam (year 0) to 5 years (year 1) and then 10 years (year 2). The three subjects selected differed in both drusen progression and the type of drusen manifested on exam. The images from the three exams were segmented and the change in area from exam to exam was calculated. A discussion of the results and a chart of the area changes per grid region are presented in section 3.2.

2.4 Preprocessing

The WARMGS image media are color 35 mm slides. The slides were digitized with the Nikon LS-2000 scanner with customized settings at 35 pixels per millimeter. The settings were found through a series of qualitative and quantitative assessments of visual quality geared towards the macula [2]. The scanner settings allow the green channel to present the greatest amount of contrast between the drusen and macular background. The blue channel was tuned to amplify the optic disc and any illumination artifacts. Qualitative tests for evaluating color information showed that the green channel contained drusen while the red and blue channels have little or no drusen information present. This can be seen in Figure 2, where the red channel (a) is predominantly choroid, the green channel (b) visualizes veins, arteries, optic disc, and drusen, and the blue channel (c) mainly has disc and crescent information.


Figure 2. (a) Red channel. (b) Green channel. (c) Blue channel.


Once digitized, the images are filtered and enhanced to assist the segmentation. The drusen segmentation algorithm begins by processing the green channel for artifacts due to illumination, pigmentation, and eyelid anomalies. The filtering process first removes any crescent effects that show up around the image border due to an eyelid, eyelash, or photographer misalignment. This is accomplished by looking in the blue color plane for intensities greater than 99.5% of the total intensity range. For an 8-bit image the cutoff would be a gray level of 254, and all pixels with greater intensities are set to zero in the green plane. Another source of noise in the nasal region of the fundus is the optic disc and the peripapillary crescent that surrounds it. The disc is also located in the blue color plane by searching the outer third of the nasal region. Within this region a grid of 60x60 non-overlapping blocks is laid out, the intensities in each block are summed, and the block with the largest sum is considered the center of the disc. To segment the disc and the peripapillary crescent as well, a circle with a radius of 1.5 disc diameters is centered on this block location and the intensities inside it are set to zero in the green plane. Figure 3 shows two examples of crescent and disc removal. In the first example the disc and peripapillary crescent were located and removed cleanly, while in the second an illumination crescent was removed cleanly.
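The artifact-removal steps described above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the list-of-lists image representation, the block size parameter, and the radius given in pixels are all hypothetical choices, and the search ranges stand in for "the outer third of the nasal region".

```python
def remove_bright_crescent(green, blue, fraction=0.995, max_level=255):
    """Zero green-plane pixels whose blue value exceeds the cutoff.

    For an 8-bit image, round(0.995 * 255) gives the gray level 254
    quoted in the text.
    """
    cutoff = round(fraction * max_level)
    return [[0 if b > cutoff else g for g, b in zip(g_row, b_row)]
            for g_row, b_row in zip(green, blue)]


def locate_disc(blue, block, row_range, col_range):
    """Return the (row, col) centre of the brightest block-by-block tile.

    row_range and col_range are range objects limiting the search area
    (an assumption standing in for the nasal search region).
    """
    best_sum, best_center = -1, None
    for r0 in range(row_range.start, row_range.stop - block + 1, block):
        for c0 in range(col_range.start, col_range.stop - block + 1, block):
            total = sum(blue[r][c]
                        for r in range(r0, r0 + block)
                        for c in range(c0, c0 + block))
            if total > best_sum:
                best_sum = total
                best_center = (r0 + block // 2, c0 + block // 2)
    return best_center


def mask_circle(green, center, radius):
    """Set green-plane pixels within `radius` pixels of `center` to zero."""
    cr, cc = center
    return [[0 if (r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2 else v
             for c, v in enumerate(row)]
            for r, row in enumerate(green)]


# Toy 6x6 example: a bright 2x2 patch in the blue plane plays the disc.
blue = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (4, 5):
        blue[r][c] = 200
green = [[9] * 6 for _ in range(6)]

center = locate_disc(blue, block=2, row_range=range(0, 6), col_range=range(0, 6))
masked = mask_circle(green, center, radius=1)
```

In a real pipeline the radius would be derived from the disc diameter in pixels, which the text does not give, so it is left as a parameter here.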

Figure 3. Examples of illumination crescent and disc with peripapillary crescent artifact removal. (a,c) Original image. (b,d) Artifacts removed.

Further filtering includes removal of inherent speckle noise due to digitization. The speckle is removed with a 3x3 median filter. The final pre-processing step removes any illumination artifacts caused by the curvature of the reflecting surfaces of the eye and/or light absorption non-linearity. To do this, the green plane is blurred using a Gaussian low-pass filter $G(x, y) = \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$, with a sigma of 50. The sigma value was based on a frequency range of features present in the typical macular fundus. Once all of the features have been blurred out, only the illumination is left, as seen in Figure 4 (b,e). This illumination is then subtracted from the original image to dampen the non-linear illumination while sharpening the high frequency component edges. To help visualize the filtered images shown in Figure 4, the values were linearly scaled for more brightness.
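The illumination-flattening step (blur with a wide Gaussian, then subtract the blurred image from the original) can be illustrated with the following pure-Python sketch. It is written for clarity rather than speed, and several details are assumptions not stated in the text: the kernel is truncated at three sigma, image borders are handled by replicating edge pixels, and the function names are hypothetical. A production version would normally use an array library.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights, truncated at 3 sigma (assumption)."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one line of pixels, replicating the edge pixels (assumption)."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [sum(k * values[min(max(i + j - radius, 0), last)]
                for j, k in enumerate(kernel))
            for i in range(len(values))]


def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def flatten_illumination(image, sigma=50):
    """Subtract the low-pass (illumination) estimate from the image."""
    background = gaussian_blur(image, sigma)
    return [[p - b for p, b in zip(row, b_row)]
            for row, b_row in zip(image, background)]


# Toy example: a left-to-right illumination ramp with one bright spot.
image = [[float(c) for c in range(21)] for _ in range(21)]
image[10][10] += 50.0
flat = flatten_illumination(image, sigma=2)
```

In the flattened result the ramp is largely cancelled while the bright spot keeps most of its height, which is the separation between features and illumination that the text describes.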


Figure 4. (a,d) Original green channel. (b,e) Gaussian filtered image. (c,f) Flattened image.

2.5 Segmentation Algorithm

With the image filtered of artifacts and noise, the segmentation process is quite straightforward. The filtering has increased the separation between the drusen and background histograms; an example is shown in Figure 5 (a,b). This separation allows a simple threshold, within a region of interest, to divide the pixels that belong to a drusen from those which do not. To further localize intensity statistics and aid the segmentation by minimizing the effects of any illumination artifacts still present, the 9 regions of the WARMGS grid previously described were divided into thirds, creating 27 regions. An individual threshold is calculated for each of these 27 regions. For a region of interest the filtered gray level histogram is formed, $p(r_k) = n_k / n$, where $r_k$ is the kth gray level, $n_k$ is the number of pixels in the image with that gray level, and $n$ is the total number of pixels in the image. All of the gray levels that contain at least one pixel are sorted into an array of length $l$. The gray level which corresponds to element ($l \times 0.76$) in the array is chosen as the threshold. This process is explained in Figure 6. The threshold percentage of 0.76 was found experimentally through trial and error.
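The threshold rule can be written out as a short sketch. The function name and the flat list of pixel values are hypothetical, and truncating $l \times 0.76$ to an integer index is an assumption taken from the Figure 6 example, where 0.76 * 6 = 4.56 is truncated to 4 and the 5th gray level is selected.

```python
import math


def select_threshold(region_pixels, fraction=0.76):
    """Pick the gray level at position floor(l * fraction) among the
    sorted distinct gray levels present in the region."""
    levels = sorted(set(region_pixels))
    if not levels:
        raise ValueError("region contains no pixels")
    index = min(math.floor(len(levels) * fraction), len(levels) - 1)
    return levels[index]


# Gray levels from the Figure 6 example; the pixel counts are made up,
# since only the set of occupied gray levels affects the result.
pixels = [10] * 500 + [20] * 300 + [30] * 200 + [50] * 100 + [60] * 50 + [80] * 10
threshold = select_threshold(pixels)
seed_candidates = [p for p in pixels if p > threshold]
```

Note that the rule counts occupied gray levels rather than pixels, so the threshold depends on which intensities occur in the region and not on how many pixels share each one.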

Figure 5. (a) Original drusen (red) and background (blue) histograms. (b) Histograms after filtering.


The pixels that fall above the threshold are considered to have a high probability of belonging to a drusen and are fed into a region-growing algorithm. The diagram in Figure 7 is used to explain the growing algorithm, which is based on Savol’s adaptive object growing algorithm [’I. The diagram number for a routine is placed at the end of each appropriate sentence. Within a region, the 8-neighbor local maxima that fall above the threshold are tagged as seed points (1). The seed with the highest intensity is considered first and is assigned to drusen cluster one (2). The seed’s 8 neighboring pixels are searched for the maximum intensity, and that pixel is added to the cluster (3). Taking these two intensities, the contrast integral is calculated as $ICI[0] = I_{Seed} - I_{Candidate}$. Now the pixels neighboring these two pixels are searched for the maximum, and the contrast integral $ICI[1]$ is calculated from the summed cluster intensities $\sum_{j \in Cluster} I_j$ and the candidate intensity $I_{Candidate}$ (4). If $ICI[1] - ICI[0] > 0$, then the candidate is added to the cluster and $ICI[0] = ICI[1]$. This process continues until the stopping condition $ICI[1] - ICI[0] < 0$ is satisfied. This will restart the algorithm at step 3 with the next largest seed as cluster 2. The cycle stops when there are no more seeds left. The stopping condition is only satisfied when the contrast between the pixels within the cluster and the candidate has decreased from the previous iteration. This condition is consistent with the makeup of a typical drusen, shown topographically in Figure 8, where there is a peak and a steady drop-off. Once the bottom of the drusen has been reached, the contrast within the cluster is no longer increasing with each new candidate and the drusen has been segmented.

Figure 6. Threshold selection explanation. From the histogram only gray levels 10, 20, 30, 50, 60, and 80 are represented in the image, giving a total of 6 distinct levels. The threshold would be 0.76 * 6 = 4.56 ≈ 4, so the 5th gray level will be the cutoff. In this case the threshold would be 60.
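The growing loop can be illustrated with a small pure-Python sketch. It should be read with caution: the exact expression for $ICI[1]$ is not legible in the text, so this sketch assumes the contrast is the mean intensity of the cluster minus the candidate intensity, which reduces to $I_{Seed} - I_{Candidate}$ for a one-pixel cluster. That choice, the function names, and the decision to stop when the contrast fails to increase (including ties) are assumptions, and this is not Savol's published algorithm.

```python
def neighbors(r, c, rows, cols):
    """Yield the 8-connected neighbours of (r, c) inside the image."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc


def find_seeds(image, threshold):
    """8-neighbour local maxima above the threshold, brightest first."""
    rows, cols = len(image), len(image[0])
    seeds = [(r, c) for r in range(rows) for c in range(cols)
             if image[r][c] > threshold
             and all(image[r][c] >= image[rr][cc]
                     for rr, cc in neighbors(r, c, rows, cols))]
    return sorted(seeds, key=lambda p: image[p[0]][p[1]], reverse=True)


def grow_clusters(image, threshold):
    """Return a label map: 0 = background, 1..k = drusen clusters."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    cluster_id = 0
    for seed in find_seeds(image, threshold):
        if labels[seed[0]][seed[1]]:
            continue  # seed already absorbed by an earlier cluster
        cluster_id += 1
        labels[seed[0]][seed[1]] = cluster_id
        cluster = [seed]
        total = image[seed[0]][seed[1]]
        previous_ici = None
        while True:
            frontier = {n for p in cluster
                        for n in neighbors(p[0], p[1], rows, cols)
                        if labels[n[0]][n[1]] == 0}
            if not frontier:
                break
            cand = max(frontier, key=lambda p: image[p[0]][p[1]])
            # ASSUMPTION: contrast = cluster mean minus candidate intensity.
            ici = total / len(cluster) - image[cand[0]][cand[1]]
            # The first candidate is always accepted; afterwards growth
            # stops as soon as the contrast no longer increases.
            if previous_ici is not None and ici - previous_ici <= 0:
                break
            labels[cand[0]][cand[1]] = cluster_id
            cluster.append(cand)
            total += image[cand[0]][cand[1]]
            previous_ici = ici
    return labels


# One-row toy profile: a single peak on a nearly flat background.
profile = [[11, 12, 13, 40, 70, 100, 90, 60, 30, 14, 15, 16]]
labels = grow_clusters(profile, threshold=50)
```

On this profile the cluster grows down both flanks of the peak and halts one pixel into the background, where the assumed contrast first drops, mirroring the "peak and steady drop-off" behaviour described for a typical drusen.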

Figure 7. Explanation of growing algorithm.


Figure 8. Topographical plot of a drusen.

2.6 Manual Controls for User Amended Segmentation

Once the automatic segmentation has finished, the grader can improve the results with a series of tools. One of the tools allows the user to change the initial threshold percentage from 0.76 to a lower or higher value. This enlarges or reduces the amount of drusen considered by the growing algorithm, which is needed if there are unusually high amounts of drusen or if undetected artifacts exist. Point-and-click options allow the user to remove segmented areas not associated with a drusen, such as a false positive from the choroid, or to add a subtle drusen not found by the algorithm. Editing a result takes 5 minutes on average and can drastically improve the segmentation result.

3. RESULTS

3.1 Comparison of Computer-Aided Segmentation to Manual Analysis of WARMGS Images

The eighteen WARMGS images were segmented with the default automatic setting. An ophthalmic specialist then used the edit tools to further amend the results. We will discuss the results of the comparison between the automatic and the user-amended segmentations to the ground truth analysis in the WARMGS grade database. The size measure for the largest drusen in each of the nine grid regions will be compared. The resulting lesions from the segmentation program were manually measured according to the WARMGS protocol. While the protocol uses a distinct set of measuring circles to determine the size as discussed in section 2.1, the program yields an actual size in microns. The measured value will fall into a range of the circle diameter size categories (-43 micron, 63 µm
