
IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 7, NO. 1, MARCH 2003

Histological Image Retrieval Based on Semantic Content Analysis H. Lilian Tang, Rudolf Hanka, Member, IEEE, and Horace H. S. Ip, Member, IEEE

Abstract—The demand for automatically recognizing and retrieving medical images for screening, reference, and management is growing faster than ever. In this paper, we present an intelligent content-based image retrieval system called I-Browse, which integrates both iconic and semantic content for histological image analysis. The I-Browse system combines low-level image processing technology with high-level semantic analysis of medical image content through different processing modules in the proposed system architecture. Similarity measures are proposed and their performance is evaluated. Furthermore, as a byproduct of semantic analysis, I-Browse allows textual annotations to be generated for unknown images. As an image browser, apart from retrieving images by image example, it also supports query by natural language.

Index Terms—Histological image analysis, image annotation, medical image database, semantic analysis.

Manuscript received July 2, 2001. This work was supported by the Hong Kong Jockey Club Charities Trust. H. L. Tang is with the Department of Computing, University of Surrey, Guildford, Surrey GU2 7XH, U.K. (e-mail: [email protected]). R. Hanka was with the Medical Informatics Unit, University of Cambridge, Cambridge CB1 2ES, U.K. (e-mail: [email protected]). H. H. S. Ip is with the Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong (e-mail: [email protected]). Digital Object Identifier 10.1109/TITB.2003.808500

I. INTRODUCTION

Medical images play a central role in patient diagnosis, therapy, surgical planning, medical reference, and training. The development of systems for diagnosing, screening, archiving, and annotating based on automatic analysis of medical images is a recurring research topic. In particular, with the advent of digital imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and single-photon emission computerized tomography (SPECT), as well as images digitized from conventional media such as histological slides and X-rays, collections of medical images are increasingly being held in digital form. How to build medical databases and effectively use these sophisticated data for efficient clinical applications is a challenging research issue.

In a conventional medical image database, most indexing and retrieval operations have been based on patient identity, date and type of examination, image number, or other information contained in the image record. However, the higher-level information inherent in the images is far different from the kinds of representations that are suitable for textual information. Moreover, as the use of multimedia in healthcare extends, more information could be utilized if image databases could be organized and retrieved based on image content, especially at the semantic level. This would

also make possible the fusion and referencing of medical information extracted from different media sources.

Research in content-based image retrieval today is a lively discipline, expanding in breadth [1], as access to visual information is performed not only at a conceptual level, using keywords as in the textual domain, but also at a perceptual level, using objective measurements of visual content [2]. In recent years, many leading journals have dedicated special issues to content-based indexing and retrieval. Content-based image indexing and retrieval have commonly been based upon nonsemantic approaches employing primitive image information such as texture [3], shape [4]–[6], color [7], spatial relationships [8], or mixtures of these features [9]–[11] to facilitate the retrieval process. However, in many domain-specific applications, such as medical image databases, semantic content is more desirable because it facilitates high-level application tasks [12]. One key issue is the identification and extraction of semantic information from the visual data. One approach to this problem is to associate high-level semantic information with low-level visual data. Several systems that attempt to bridge this information gap can be found in [13]–[17]. A comprehensive survey by Smeulders et al. on content-based retrieval, the role of the semantic gap, and the challenges for such research can be found in [1]. The work presented in this paper contributes toward semantic content analysis of medical images.

Research on analyzing images from modalities such as X-ray, CT, and MRI is numerous and active. For example, the ASSERT system under development at Purdue University [18] focuses mainly on CT images of the lung.
It is based on a human-in-the-loop design in which a physician delineates the pathology-bearing regions and a set of anatomical landmarks in the image for the computer to extract attributes. A review of content-based indexing for medical images can be found in [19].

Compared with other types of medical images, histological image analysis is relatively rare. This is partly because the digital format of such data is relatively difficult to obtain, but mostly because histological images are far more complicated and diverse than other types of images in terms of, for example, the variations of colors, object appearances, and semantic interpretations at different magnifications. By focusing on a particular feature, some successful results were reported by Schnorrenberg et al. [20] for breast cancer nuclei detection, as well as by Hamilton et al. [21] on distinguishing normal and abnormal histological tissues. Schnorrenberg reported an 83% nucleus detection sensitivity compared with the experts' result, while Hamilton claimed that about 83% of test images were correctly classified as normal or abnormal tissue. Further research on a

1089-7771/03$17.00 © 2003 IEEE

TANG et al.: HISTOLOGICAL IMAGE RETRIEVAL BASED ON SEMANTIC CONTENT ANALYSIS

wider range of images and problems is still needed, and fully automated systems to assist clinical work are still a long way off. Recently, an ongoing project [22] at the University of Pittsburgh School of Medicine has been building a tissue banking information system, and QinetiQ in the U.K. has been developing a visual hospital project based on breast histological image analysis [23].

This paper describes an approach and techniques for semantic content-based retrieval of histological images. The resulting prototype system, called I-Browse, not only enables a user, e.g., a physician, to search image archives through a combination of iconic and semantic content, but also automatically generates textual annotations for input images. In summary, I-Browse satisfies the following objectives:

1) to find similar images in the archive by image example in terms of visual similarity;
2) to interpret visual properties as histological features in a similar way to doctors;
3) to generate textual annotations for unknown images;
4) to find similar images in the archive by image example in terms of histological or semantic similarity;
5) to retrieve images using natural language queries;
6) to act as a postman or classifier, i.e., to put an unknown image into the correct "pigeon hole" by telling where along the gastrointestinal (GI) tract the image was taken.

Histological images, like other types of medical images, frequently give rise to ambiguity in interpretation and in diagnosis. Medical images derived from a specific organ are visually similar and usually differ only in small details, but such subtle differences may be of pathological significance. Our claim is that current content-based image retrieval techniques using primitive image characteristics such as texture, colors, and shapes are insufficient for medical images.
A major goal of the work reported in this paper is to demonstrate that semantic analysis of the image content plays a critical role in the understanding and retrieval of medical images. The techniques presented in this work can potentially be generalized for analyzing different types of complex images through the integration of both low-level image analysis and high-level semantic reasoning. Unlike other systems that normally look at several features in a single organ, in this work we focused on a range of histological images originating from six organs along the GI tract. The GI tract is essentially a muscular tract lined by a mucous membrane, which exhibits regional variations in structure reflecting the changing functions of the system from the mouth to the anus, as shown in Fig. 1, which is adapted from [24, p. 248]. We focus mainly on six areas along the tract: esophagus, stomach, small intestine (small bowel), large intestine (large bowel), appendix, and anus.

The rest of this paper is organized as follows. In Section II, we introduce the definition of the semantic features and histological labeling. In Section III, we review the I-Browse architecture and its major functional components. The processing cycle for iconic and semantic analysis is presented in Sections IV and V, respectively. In Section VI, retrieval based on semantic content and the associated similarity measures are introduced, together with an evaluation of the approach. Finally, we conclude the paper in Section VII.


Fig. 1. Diagram of the GI tract. (a) Six organs along the GI tract. (b) General structure of the tract.

II. DEFINING THE SEMANTIC FEATURES OF HISTOLOGICAL IMAGES

The semantic interpretation ability of I-Browse means that the system is able to achieve objectives 2–6 stated in Section I. For this reason, we first define what semantic units, or salient histological features, need to be automatically extracted from the image through a series of image analysis operations in the retrieval system. In consultation with histopathologists, we defined two sets of relevant histological features in GI tract images, also called histological labels or semantic labels. These labels become the basic semantic units for producing high-level information in the system. In this research, we focus on images digitized at a magnification of ×50; a human expert can identify most of the useful histological features at this resolution. More than 1500 images were digitized in our collection.

Under a given magnification, the histopathologist may see several levels of features. As shown in the schematic diagram in Fig. 1(b) and the actual histological images in Fig. 2(a)–(c), the GI canal is identified by its tubular nature and the division of its wall into five distinct layers, namely, Lumen (L), Mucosa (M) including Muscularis Mucosae (MM), Submucosa (S, or SubM), Muscularis


TABLE I (a) COARSE FEATURE LABELS (b) PART OF THE 63 FINE FEATURE LABELS

Fig. 2. Image examples from the GI tract. (a) Small intestine. (b) Stomach. (c) Small intestine. (d) Examples of image features at coarse and fine levels.

Externa (E or ME), and Serosa or Adventitia (A). For example, the images in Fig. 2(a) and (b), taken from the small intestine and stomach, respectively, demonstrate such structure. We call these features Coarse Features; they apply to all the organs along the GI tract. As seen in the diagram and the sample images in Fig. 2, even though there may be significant visual differences between images from the GI tract, whether from the same or different organs and the same or different specimens, they all share similar color and cross proportion with respect to other coarse divisions. The perceived spatial arrangement of the coarse feature regions provides an overall structural description of the image content that is important for the later reasoning procedures. We define ten basic coarse regions, as given in Table I(a), which include the five regions mentioned above and the junctions between their outer boundaries. There are five extra coarse features in Table I(a), derived from some of these ten coarse regions; e.g., X is a combination of A and S, and Z denotes a feature that appears everywhere. This is to facilitate the definition of the fine level of semantic features.

The fine-level features are used to distinguish different visual appearances within each coarse region. In particular, we focused on the following distinctive differences.

1) The difference between fine features appearing in different organs, such as the villi in the small intestine in Fig. 2(a) and the fundus glands in the stomach in Fig. 2(b). In both images, these fine features (villi and fundus glands) belong to the same coarse region, "Mucosa."

2) The different appearance of fine features in the same coarse region from the same organ, in the same or different images, e.g., the intestinal glands in Fig. 2(a) (the area covered by the name itself) and the villi region (in the oval area).
They both belong to the coarse region "Mucosa."

3) The varied appearances within the same histological feature, e.g., the villi region in Fig. 2(a) and the villi region in Fig. 2(c). The reason the villi details in (a) and (c) look so different, even though both images come from the same organ (small intestine), is mainly due to the cutting angle
or by a twist of the tissues when preparing the slide. In this case, either their coarse structure arrangement [note the coarse feature distribution in Fig. 2(c)] or the fine details in each region are significantly different.

As a result of such analysis, two levels of histologically meaningful interpretations of the images are defined: the coarse feature level and the fine feature level. The principle in defining fine-level features is not just to discriminate objects in the images in terms of histological meaning, but also to discriminate visual variations of the same object. For example, the villi in Fig. 2(a) and (c) are regarded as two kinds of fine features to improve the performance of the feature classifiers. Such features are regrouped later in the semantic analysis for the purpose of generating the semantic content of the image. A fine feature can be a common feature appearing in many coarse regions in any organ, such as blood, or a specific feature that appears only in certain regions of an organ, like the intestinal glands of the small intestine.

Fig. 4. System architecture.

Fig. 3. Process flow for feature sample selection and semantic labeling.

Currently, we have defined 15 coarse feature labels and 76 fine feature labels. Among the 76 fine feature labels, in practice it is sufficient to adopt 63 of them in the system. The coarse feature definitions are shown in Table I(a) and examples of the 63 fine features are given in Table I(b). Examples of the mapping between visual appearances and their fine and coarse features are given in Fig. 2(d). In brief, a visual feature can be mapped to different levels of semantic meanings, terms, and labels. In this system, each visual feature is mapped to two levels of terms, coarse and fine. Any fine-level feature can be grouped, or subsumed, under one of the classes at the coarse level.

To facilitate the collection of sufficient salient visual feature samples relating to the histological meanings at both coarse and fine levels, a specially designed knowledge elicitation subsystem was developed to enable a histopathologist to interactively assign, from a computer interface, these histological feature labels (semantic labels) to a large set of subimages randomly selected from the GI tract image collection (Fig. 3, the computer interface on the left). Fig. 2(d) shows some of the subimage samples selected by a histopathologist, mapped to the corresponding semantic labels. These associations of histological labels with subimages depicting the various visual characteristics of the labels were then stored as the ground truth, forming the initial set of training samples and testing data for designing the feature extraction algorithms and the semantic label classifiers used in the subsequent image recognition processes.
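The two-level labeling scheme can be sketched as a simple mapping in which every fine label is subsumed under one coarse class; the label names below are illustrative placeholders, not the actual 15-coarse/63-fine vocabulary used in I-Browse:

```python
# Sketch of the two-level semantic label scheme: each fine label is
# subsumed under exactly one coarse label. Label names are illustrative;
# the real system uses 15 coarse and 63 fine labels (Table I).
COARSE_LABELS = {"L", "M", "MM", "S", "E", "A", "Z"}  # subset of Table I(a)

# Hypothetical fine -> coarse mapping (cf. Table I(b) and Fig. 2(d)).
FINE_TO_COARSE = {
    "villi": "M",
    "fundus_glands": "M",
    "intestinal_glands": "M",
    "blood": "Z",            # a common feature that may appear everywhere
    "adipose_tissue": "S",
}

def coarse_of(fine_label: str) -> str:
    """Group a fine-level label under its coarse-level class."""
    return FINE_TO_COARSE[fine_label]
```

This grouping is what later lets the semifine detectors' fine-level output be mapped to coarse-level labels during semantic analysis.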
There were in total 9049 subimage samples extracted from 274 images randomly selected from about 1500 images in the collection. Among these subimage samples, 2737 were used for training; the rest were divided into two sets for testing. Fig. 3 shows the knowledge elicitation module and the process flow for analyzing the visual and semantic features in unknown images. Details will be presented later.

III. I-BROWSE ARCHITECTURE AND PROCESSING CYCLE

To achieve the objectives of I-Browse, in particular to support the extraction and fusion of iconic and semantic information from histological images, we proposed the I-Browse architecture. It is composed of a set of disparate but complementary building blocks, as shown in Fig. 4: the visual feature detector, domain knowledge base (KB), index constructor, image database, semantic analyzer (SA), annotation generator, and free text analyzer. The semantic analyzer and the visual feature detectors are the most essential and computationally intensive blocks in the system. The relationships and workflow between them are shown in Fig. 3.

Once histological meanings at the two levels have been associated with the visual features found in a subimage, two corresponding types of visual detectors were designed to extract similar visual features from unknown images; we call them coarse detectors and semifine detectors. In the visual feature detector building block of the I-Browse architecture, there are actually three types of detectors: 1) coarse detectors; 2) semifine detectors; 3) fine detectors. The first two are general purpose and are used to identify all possible features in an unknown image at either a coarse or a fine level. Presently, the semifine feature detectors implemented in I-Browse mainly capture the textural content of the images and classify the subimages of unknown images into the fine histological labels defined in Section II. Although these detectors are designed to recognize the fine-level histological features in the images, we call them only "semifine" detectors because they serve to produce an initial semantic labeling of subimages that may need to be further confirmed by a subsequent semantic analysis process. As mentioned in Section II, since any fine feature can be grouped under a coarse-level feature, the result from the semifine detectors can be mapped to histological labels at the coarse level. On the other hand, for coarse features, as seen in Fig.
2, since color characteristics in stained tissue images are prominent within these coarse structures, color histogram measurements were also used in conjunction with texture measurements to differentiate distinctive coarse regions. Thus, at the coarse feature level there are two independent sets of feature classification results, based on subimage texture and color characteristics, respectively.

The semantic analyzer serves to improve the accuracy of the semantic labeling by identifying potentially incorrect assignments of histological labels to subimages by the visual detectors. This is achieved through a contextual analysis of these labels in concert with the relevant histological knowledge. There is an iteration loop, with feedback information, between the visual feature detector and the semantic analyzer.

TABLE II LIST OF SPECIALIZED FINE FEATURE DETECTORS THAT CAN BE INVOKED THROUGH THE SEMANTIC ANALYZER

Fig. 5. Initial label map superimposed on a tissue image.

Furthermore, the semantic analyzer also triggers specialized fine feature detectors (the third type of visual detector) designed to confirm or refute uncertain labels. Currently, 20 such primitive fine feature detectors are implemented in the system, as listed in Table II. Explanations of the use of these fine detectors are given in Section V-B.

When a query image is submitted, or during the population of the image database, the input image is first partitioned into a two-dimensional (2-D) array of subimages, and the coarse and fine features of each subimage are recognized by the coarse and semifine detectors. The initial result for the analyzed image is three 2-D arrays of semantic labels, two at the coarse level and one at the fine level. The parallel results from the different detectors provide the semantic analyzer with cues for finding potentially erroneous detections. An example of these semantic labels superimposed on an input tissue image can be found in Fig. 5. With these label arrays, the semantic analyzer iteratively analyzes and corrects the labels according to the histological context in the knowledge base, and may produce a set of hypotheses on the labels associated with subimages if those labels are deemed erroneously detected by the coarse and semifine detectors. Based on the hypotheses, a number of fine feature detectors are invoked

to extract and confirm the visual features within the suspected regions. This analysis-and-detection cycle iterates until the semantic analyzer finds a coherent result and no further change is needed. Details of the visual feature detectors and the semantic analyzer are described in Sections IV and V.

The final label map is then used to construct the semantic content representation structure, Papillon, which is used to generate the textual annotation for the image in the database. "Papillon" is the codename for the internal semantic content representation used in I-Browse. It bridges information from different media (image and text), linking together the semantic analyzer, the Free Text Analyzer, and the Annotation Generator. In this system, the semantic content of an analyzed image is represented in Papillon. Conversely, when a query is made in natural language, the Free Text Analyzer extracts the information in the query and converts it into Papillon. Therefore, whether the query is made by image or by text, its semantic content in Papillon is used for the retrieval. Details about the Papillon content, the Free Text Analyzer, and the Annotation Generator are outside the scope of this paper and can be found in [25].

The system is written in the C++ programming language with a five-tier architecture [26] that allows modules in different tiers to be developed independently. It is integrated with an existing patient diagnosis and documentation system, Pathos, which is extensively used in the pathology departments of our medical collaborators' hospitals. Pathos is an electronic patient record system for recording patients' pathological examination results, data, and diagnoses. I-Browse has two front-end interfaces, called PIMS and Retrieval, for inputting and retrieving images, respectively. The generated annotation of the input image is stored in a DB2 database management system through the PIMS interface.
The index constructor creates iconic and semantic index content from the Visual Feature Detector and Papillon, respectively. The two types of index content serve different kinds of queries: the user may retrieve a desired image by inputting a query image or query text and selecting either the semantic or the iconic similarity measurement.
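The dual-index dispatch described above can be sketched as follows; the `Index` class, feature vectors, and L1 distance are simplifications assumed for illustration, not the actual I-Browse index structures or similarity measures (which are introduced in Section VI):

```python
# Sketch of the dual-index retrieval dispatch: a query (image or text)
# is reduced to a feature representation and matched against either the
# iconic or the semantic index. All names are illustrative placeholders.

class Index:
    """Toy index: stores (image_id, feature_vector) pairs."""
    def __init__(self):
        self.entries = []

    def add(self, image_id, vec):
        self.entries.append((image_id, vec))

    def nearest(self, vec, top_k):
        # Rank stored entries by a simple L1 distance to the query vector.
        dist = lambda entry: sum(abs(a - b) for a, b in zip(entry[1], vec))
        return [image_id for image_id, _ in sorted(self.entries, key=dist)[:top_k]]

def retrieve(query_vec, mode, iconic_index, semantic_index, top_k=10):
    """Dispatch a query representation to the selected index."""
    index = {"iconic": iconic_index, "semantic": semantic_index}[mode]
    return index.nearest(query_vec, top_k)
```

In the real system, the "vector" on the semantic side would be the Papillon representation produced by the semantic analyzer (for an image query) or the Free Text Analyzer (for a text query).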


IV. ICONIC ANALYSIS OF VISUAL APPEARANCES

The histological slides were obtained from the past eight years of patient records at a local hospital. From these slides, the histological images were captured under a high-resolution Leica microscope at ×50 magnification. The image resolution was set to the microscope's maximum of 4491 × 3480 pixels during the capture process and then sampled down to 1123 × 870 pixels. Through experiment, we found that down-sampling the original image by a factor of four still retains sufficient pixels for window-based color and texture analysis while significantly reducing the computational load. When the 1123 × 870 down-sampled image is divided into 64 × 64 subimages, we get 17 × 13 complete subimages. In other words, we take only those pixels that belong to complete subimage squares in the subsequent analysis and ignore the small strips around the four boundaries of the image. However, this does not mean that the analysis will miss the features around the boundaries. Since the images were taken from patients' specimens, the specimen on one glass slide normally produces many images, and the digitization procedure allows some overlap between images; the area around the boundaries of one image may therefore appear in other images and be analyzed. We chose 64 × 64 subimages as the basic processing unit because, through experimentation, we found this size appropriate for the coarse and semifine feature detections based on Gabor filters.

For each subimage, the coarse feature detectors examine the normalized color and gray-level histograms of the subimage. The rotation-invariant histogram features of each subimage are passed to a three-layer neural network, which assigns coarse feature labels to the subimages. These can be any one of the 15 coarse histological features defined in Table I.
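The tiling arithmetic above can be checked with a short sketch (pure bookkeeping; the 64-pixel tile size and 1123 × 870 image size are taken from the text):

```python
# Sketch of the subimage partitioning used in I-Browse: a down-sampled
# 1123 x 870 image is divided into complete 64 x 64 subimages, ignoring
# the partial tiles along the image boundary.
def tile_grid(width, height, tile=64):
    """Return the number of complete tiles across and down, plus their
    top-left pixel origins, for an image of the given size."""
    cols, rows = width // tile, height // tile   # partial tiles are dropped
    origins = [(c * tile, r * tile) for r in range(rows) for c in range(cols)]
    return cols, rows, origins

cols, rows, origins = tile_grid(1123, 870)
# Yields a 17 x 13 grid of complete subimages, matching the paper.
```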
In addition, another coarse label result can be obtained from the semifine detectors, whose fine feature detection results can be mapped into one of the coarse features. The semifine detector extracts texture measurements based on a set of Gabor features, as well as the gray-level mean and standard deviation of the histogram-normalized subimage. These semifine Gabor and statistical feature detections are based on multiple-size windows [27]. Except for the subimages corresponding to the feature boundaries, whose semifine features are computed from the original 64 × 64 window size, the features of the other subimages are computed with two different window sizes, 64 × 64 and 128 × 128. In our research, this multiple-window approach has been demonstrated to increase the accuracy of the histological feature classification compared with the single-window approach [27]. The accuracy of this initial assignment of histological labels to the test subimage sets ranges from 40% to 82%, depending on the type of histological feature. Details of the design of these feature detectors can be found in [27].

Fig. 5 shows an example of the histological labels assigned to the subimages in an analyzed image. We call this labeling result a label map. The label map defines the semantic content and the spatial relationships of the various histological features found in the image. The two letters and the number in each subimage represent, respectively, the two coarse feature results and the fine feature result for that subimage. In Fig. 5, the subimages manually marked "*" have erroneously assigned labels at the fine feature level, and the subimages marked "$" are examples where both coarse detectors assigned incorrect labels. Such mistakes are understandable, as the feature detectors have been designed to discriminate many types of features, and each feature can have many varied visual appearances. The variations are mainly caused by factors such as tissue cutting angle, tissue thickness, tissue deterioration, tissue orientation on the slide, slide preparation defects, patient disease, individual differences, digitization setup, staining, etc. However, many of these erroneous labels can be corrected subsequently through the cycle of analysis and detection by the semantic analyzer, with the help of the knowledge base and the fine feature detectors.

V. SEMANTIC ANALYSIS OF HISTOLOGICAL LABELS

A. Semantic Reasoning, Confusion Matrix, and Contextual Knowledge

The semantic analyzer is initially presented with three matrices (the label map) produced by the coarse and semifine feature detectors, and has to identify potentially erroneous labels and correct them. The most obvious cue for the semantic analyzer is that the labels obtained from different visual detectors may conflict with each other, and the labels produced by the same detector may also conflict with each other. In Fig. 5, the fine detector identifies some regions as appendix mucosa (represented as 13), stomach fundus glands (55), small intestine intestinal glands (46), anus epithelium (2), etc., which we know cannot histologically appear together in the same image. In another area, the color histogram suggests mucosa while the Gabor filter identifies lumen. The semantic analyzer must, therefore, reconcile mutually conflicting classifications. Given the label maps, which may contain erroneous labels, the semantic analyzer aims to do the following.
1) Improve the accuracy of the recognition results using high-level histological and contextual knowledge. It may also invoke fine detectors to confirm its reasoning hypotheses.

2) Analyze the semantic content of the whole image and record the semantic content in Papillon, which forms the basis for generating a textual annotation for the image.

The analysis, or reasoning, process in the semantic analyzer is closely tied to the content of the knowledge base, which includes prior knowledge of all semantic features at the two levels, namely their legitimate visual and histological contextual attributes. In particular, the information generated from the confusion matrices is also recorded and updated in the knowledge base. Confusion matrices recording performance statistics are computed from the training and testing procedures as well as from previous performance of the detectors on data. They tell us how similar any two detected features are to each other. For instance, among 90 detection cases for adipose tissue, the detector recognized it as adipose tissue only three times. In the remaining cases, the detector recognized it as hair follicle once, as connective tissue twice, as lumen 19 times, as muscularis mucosae three times,


and ten times as the junction between lumen and stomach foveolae, and so on. Such information is summarized in the knowledge base, at the coarse or fine level, as a set of possible confusion candidates for each feature, ordered by descending similarity percentage, which describes how similar any two features are and how strongly other features resemble the given feature. The system refines the knowledge base through the classification process as the confusion matrices are updated.

When there is more than one detector for recognizing one type of feature (in our system, this is the case for the coarse-level features), the semantic analyzer forms its own primitive label matrix according to the relative accuracy of the different detectors for a given class, as recorded in the confusion matrices for the detectors. The analysis starts from subimages that have a high probability of being correct, such as Lumen, and gradually expands the scope of analysis based on the previously analyzed labels. When reasoning with contextual knowledge, the confusion candidates are compared, and one is chosen to correct a wrong label if it is more coherent with the context. The analyzer estimates which organ along the GI tract the image originated from; this estimate gives a list of features that are histologically consistent with the organ. If a feature label inconsistent with this organ is found on the label map, another feature consistent with the organ is chosen from its confusion candidates. If this fails, the analyzer chooses the next most similar confusion candidate that can appear in any organ and is coherent with the context. If both of these fail, i.e., neither kind of similar feature can be found, the label is changed to the majority label among its neighbors.
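The fallback order just described can be sketched as follows; the confusion-candidate table, organ-feature table, and feature names are toy placeholders standing in for the knowledge base:

```python
# Sketch of the label-correction rule: given a suspect label, prefer
# (1) a confusion candidate consistent with the estimated organ, then
# (2) a candidate that may appear in any organ, and finally (3) the
# majority label among the subimage's neighbors. All tables and feature
# names here are illustrative, not the real knowledge base.
from collections import Counter

def correct_label(label, organ, confusion, organ_features, any_organ, neighbors):
    # confusion[label]: candidates in descending order of similarity.
    for cand in confusion.get(label, []):
        if cand in organ_features[organ]:
            return cand                                  # consistent with this organ
    for cand in confusion.get(label, []):
        if cand in any_organ:
            return cand                                  # legal in any organ
    return Counter(neighbors).most_common(1)[0][0]       # majority neighbor label
```

Applied to the paper's example: if the image is estimated to come from the esophagus and a subimage is labeled appendix muscularis externa, the rule would select esophagus epithelium from the candidate list, since it is consistent with that organ.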
There are many different operations involved in the analysis procedure, e.g.: 1) neighbor analysis and region analysis; 2) region searching and grouping; 3) finding a canonical axis of the image content, which defines how the image should be viewed — this information forms the basic orientation of the image and gives the reference for describing spatial relationships; 4) finding boundaries of histological objects; and 5) spatial relationship analysis. This is an iterative process in which the subimages interact with, support, or refute each other, finally arriving at a more coherent and consistent label matrix. We call the above procedure semantic reasoning. B. Analysis and Detection During the reasoning, when the semantic analyzer needs to confirm certain features in a region, it invokes one or more fine detectors to carry out specific analysis. To do this, the semantic analyzer generates a list of feature hypotheses, which happens under the following conditions. 1) A confusion candidate is chosen from the knowledge base. For example, the semantic analyzer may have identified that the image comes from the esophagus, but in one area the subimage was detected as appendix muscularis externa. In appendix muscularis externa's confusion candidate list, esophagus epithelium is the next most similar feature to appendix muscularis externa and it is coherent with the context. The semantic analyzer will change the label

to esophagus epithelium tentatively and then trigger the fine detector Oesophagus: epithelium (no. 10 in Table II). 2) No suitable confusion candidate is available from the knowledge base. 3) The semantic analyzer has very weak confidence in the confusion candidates. Whether a candidate is weak is judged on the similarity percentage calculated from the confusion matrices. In cases 2) and 3), the semantic analyzer triggers more fine detectors according to the context, including the fine detector for the analyzed feature itself. 4) The semantic analyzer expects certain features to be present, but they have not been recognized within the label map. This happens when two features normally appear together while only one has been confirmed. The fine feature detectors are a set of detectors specially designed to examine the visual properties of particular fine features that may need to be further confirmed. The design principle of these detectors is based on spatial features such as shape, structure, and spatial relationship. A set of morphological parameters such as shape, contour, distance to the lumen, and neighbor configuration is used in these fine detectors. The accuracy rates of such detectors are high, but they require much more computation than the statistical coarse and semifine feature detections and, hence, should only be invoked on demand. At present, 12 main fine feature detectors, subdivided into 20 fine feature detectors as listed in Table II, have been implemented. When invoking a specialized feature detector, the semantic analyzer passes at the same time a region of interest (ROI) to the detector, which subsequently returns a confidence value to the semantic analyzer. Finally, the semantic analyzer compares the returned confidence values from the different detectors to verify the hypotheses. This process of analysis and detection may go through several iterations before arriving at a stable result.
As an example, when the semantic analyzer hypothesizes that a region is probably anus epithelium, it asks the anus epithelium detector to compute the confidence value of the hypothesis. The detector starts from a Gaussian-filtered image and combines it with the ROI to form a binary image according to a preset threshold value. Based on the binary image, the detector extracts every isolated island (group of connected pixels) and examines its color content, neighbor intensity, size, and distance to the lumen, as well as its boundary with the lumen. These parameters carry different weights in the confidence value. After the computation, the detector passes the confidence value back to the semantic analyzer. Since the semantic analyzer may request several detectors to verify hypotheses at the same time, it selects the one with the highest confidence value. The fine feature detectors were developed using pattern recognition techniques suited to the particular image features. As shown in Fig. 6, the fine detector can extract most of the Anus Epithelium, which is indicated with letter "A" (light color). The amount of missing Anus Epithelium area, indicated by letter "B" (dark color), is acceptable when compared to the whole epithelium layer. On the other hand, the detector rules out those areas not belonging to the anus epithelium.
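The island-extraction step described above can be sketched as follows, assuming a simple global threshold and 4-connectivity; the function name and the toy grayscale image are hypothetical, and the real detector would go on to score each island by color, size, and distance to the lumen.

```python
from collections import deque

def extract_islands(gray, threshold):
    """Binarize a grayscale image and return each island (4-connected
    group of above-threshold pixels) as a list of (row, col) coordinates."""
    h, w = len(gray), len(gray[0])
    binary = [[px >= threshold for px in row] for row in gray]
    seen = [[False] * w for _ in range(h)]
    islands = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                # Breadth-first flood fill collects one island.
                queue, island = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    island.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                islands.append(island)
    return islands

gray = [
    [200, 210,   0,   0],
    [190,   0,   0, 220],
    [  0,   0, 205, 215],
]
islands = extract_islands(gray, threshold=128)
# Two islands: three bright pixels at top-left, three at bottom-right.
```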


Fig. 6. Extracted Anus Keratinised Squamous Epithelium (A) and missing area (B) by fine feature detector.

This detector was designed using five characteristics: 1) the distance of the epithelium layer to the Lumen; 2) the epithelium intensity; 3) the intensity of the epithelium layer at the Lumen boundary; 4) the tissue intensity near the epithelium layer (excluding the Lumen area); and 5) the edge magnitude at the boundary of the epithelium layer with tissues other than the Lumen. To demonstrate the efficiency of the Anus Epithelium detector, we ran an experiment on a small database containing 90 images; the accuracy rate for identifying anus epithelium was 76.9%. It should be noted that these specialized detectors are not applied at the start of analyzing an image. This is not only because they are normally time-consuming, but also because it is strategically unwise to apply many specialized detectors at the outset to an unknown image, given that images in a large collection may vary greatly.
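The way the five characteristics combine into a single confidence value can be illustrated with a weighted average. Both the weights and the per-characteristic scores below are made-up placeholders, since the paper does not publish the actual weighting.

```python
def detector_confidence(scores, weights):
    """Combine per-characteristic scores (each in [0, 1]) into one
    confidence value via a weighted average, so the result stays in [0, 1]."""
    assert len(scores) == len(weights)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Five characteristics in the order listed in the text; values illustrative:
# distance to Lumen, epithelium intensity, intensity at Lumen boundary,
# nearby tissue intensity, edge magnitude at the non-Lumen boundary.
scores  = [0.9, 0.8, 0.7, 0.6, 0.85]
weights = [2.0, 1.0, 1.5, 1.0, 1.5]
conf = detector_confidence(scores, weights)
```

The semantic analyzer would then compare such values returned by several candidate detectors and keep the hypothesis with the highest one.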

Fig. 7. Final label map for the image in Fig. 5 after semantic analysis.

Fig. 8. Sample image to illustrate some key histological features.

C. Final Label Map and Semantic Content The semantic analyzer finally delivers two types of output: one is the final label map, and the other is the semantic content as represented in Papillon. Fig. 7 shows the final label map superimposed on the same image as in Fig. 5. Fig. 9 shows a section of the automatically generated Papillon structure for the semantic content of the image in Fig. 8, where the key features recognized in Papillon are illustrated. In this example, the semantic content is represented in a forest structure holding the relationships between features and information about the features, such as where the image comes from (the appendix in this example), the color, shape, size, quantity, spatial relationships, and locations of the features, the axis, how to view the image, and so on. The semantic content of the whole image is much more exhaustive than the sample shown here. D. Evaluation of the Semantic Reasoning Approach To evaluate the advantage of using semantic reasoning to supplement visual content analysis in tissue image interpretation, 2957 histological subimages (feature units) were selected at random from the images, and we examined how these subimages were classified into the 63 fine features. Comparing the subimages before and after reasoning, 1042 of the 2957 features were successfully corrected by the semantic analyzer, which means the accuracy of whole-image feature extraction improved by 35.2%; 1663 features (about 56.2%) remained correct; 205 features (about 6.9%) remained wrong; and 47 features (about 1.6%) changed from correct to incorrect.
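The reported breakdown can be checked arithmetically; the four outcome groups sum to the 2957 examined feature units and reproduce the quoted percentages:

```python
total = 2957
corrected, stayed_correct, stayed_wrong, became_wrong = 1042, 1663, 205, 47

# The four outcome groups account for every examined feature unit.
assert corrected + stayed_correct + stayed_wrong + became_wrong == total

percentages = {name: round(100 * n / total, 1)
               for name, n in [("corrected", corrected),
                               ("stayed correct", stayed_correct),
                               ("stayed wrong", stayed_wrong),
                               ("became wrong", became_wrong)]}
# → {'corrected': 35.2, 'stayed correct': 56.2,
#    'stayed wrong': 6.9, 'became wrong': 1.6}
```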

Fig. 9. Part of the Papillon structure generated by the semantic analyzer for the sample image shown in Fig. 8.

Since the semantic analysis is based on the results from other engines, the performance of the semantic analyzer can be improved incrementally by: 1) implementing more fine feature detectors; 2) improving the performance of the existing visual feature detectors; and 3) improving the descriptive power of the knowledge base by adding more knowledge details for certain features. VI. USING SEMANTIC CONTENT IN IMAGE RETRIEVAL Three associated similarity measures were designed to compare the most frequent semantic labels, the local neighbor pattern of semantic labels, and the semantic label frequency distribution. Most frequent semantic label (MFSL) is based on the 15 coarse
histological regions of the image, which roughly describe how coarse features such as Lumen, Mucosa, Submucosa, and junctions are distributed in the image. MFSL is defined by the following procedure, where k is a constant equal to 0.75 in this experiment:

    Similarity <- 0
    For all windows do
        winSimilarity <- 0
        Sort the coarse labels of the retrieved image by frequency
        For each coarse label of the retrieved image
            if the coarse label is the same as the most frequent
               coarse region of the query image
                winSimilarity <- winSimilarity + k
        End For Loop
        Similarity <- Similarity + winSimilarity
    End For Loop
    MFSL <- Similarity / number of windows

Neighborhood similarity (NS) uses a matrix to record the co-occurrence frequencies of the 63 histological labels of the eight nearest neighbors against those of the center subimage. Each element m(i, j) of the matrix records how many times the center subimage carries histological label i while the histological label of any one of its neighbors is j. Since there are 63 histological labels employed in the system, the matrix size is 63 x 63. The NS between a query image and a retrieved image is computed from their co-occurrence matrices over the fine histological labels, normalized by the number of subimages that have eight connected neighbors. Scaling factors for the query and retrieved images are included in the measure in order to eliminate the influence of the number of Lumen subimages in each image, since Lumen is a noncharacteristic feature. Semantic label frequency distribution similarity (SLDS) directly counts the frequency with which each of the 63 fine histological labels occurs in the image. For each pair of images, the system only needs to compare the 63 entries; therefore, the computation time is much shorter than that of NS.
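An SLDS-style comparison can be sketched with normalized histogram intersection over per-label frequencies. This is an assumed instantiation for illustration, not necessarily the paper's exact formula; the function name and toy five-label histograms are hypothetical.

```python
def slds(query_counts, retrieved_counts):
    """Compare two per-label frequency histograms via normalized histogram
    intersection: identical distributions score 1.0, disjoint ones 0.0."""
    q_total = sum(query_counts) or 1
    r_total = sum(retrieved_counts) or 1
    return sum(min(q / q_total, r / r_total)
               for q, r in zip(query_counts, retrieved_counts))

# Two toy 5-label histograms standing in for the full 63 fine labels:
query     = [10, 5, 0, 3, 2]
retrieved = [8,  6, 1, 3, 2]
sim = slds(query, retrieved)
```

Because only one entry per label is compared, the cost is linear in the number of labels, which matches the observation that SLDS is much cheaper than the neighborhood measure.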

Fig. 10. Retrieval interface of the I-Browse system.

In the SLDS measure, the frequency of each fine histological label in the query image is compared directly with the corresponding frequency in the retrieved image; the scaling factors have the same meaning as in the NS measure. An experiment was carried out to test the accuracy of these measures. The accuracy rates using NS and SLDS are, respectively, 78% and 80%, higher than that of MFSL, which uses less semantic information in the measurement. The retrieval accuracy rate depends on the accuracy of the histological label detection as well as on the design of the similarity functions. Another important factor in image retrieval is computation time. On a dual Intel PII-450 computer, the computation time of label analysis is 5.32 s. The similarity comparison time between two images is 6 ms, 0.23 s, and 4.3 ms for MFSL, NS, and SLDS, respectively. SLDS is thus so far the best choice among the current similarity measures. In addition to the analyses discussed in the paper, in order to demonstrate the advantage of integrating semantics into the system, we also compared I-Browse with the QBIC system developed by IBM [9], which was designed as a general content-based retrieval system. We used a collection of 19 890 subimages, with 2873 from the anus, 3536 from the appendix, 3094 from the large intestine, 3536 from the esophagus, 3757 from the small intestine, and 3094 from the stomach. In I-Browse, the NS and SLDS measures were used, while in QBIC the Color, Texture, and Spatial measures were chosen for this experiment. The accuracy rate of the similarity measures in I-Browse is on average up to 34.3% higher than that of the three measures in QBIC within the first five retrieved images. I-Browse performed better than QBIC in this benchmark test, and we attribute this primarily to the use of domain knowledge to improve the semantic interpretation. The retrieval interface is shown in Fig. 10, where a query image of stomach is submitted and displayed at the top left corner of the figure.
The top five images most similar to the query image are displayed in the right-hand column of the window. The most similar image other than the query image itself is placed at the bottom left corner of Fig. 10. Moreover, if the user moves the mouse over any subimage, a small yellow textual annotation tag is shown next to the subimage describing it (not displayed in Fig. 10). If the right mouse button is clicked, a new textual annotation window that
describes the content of the whole image will be displayed in the interface; it is placed at the top right corner of Fig. 10. VII. CONCLUSION The major contributions of the I-Browse prototype are: the association of semantic meanings and visual properties; the integration of syntactic visual analysis and semantic reasoning in connection with the contextual knowledge; and the design of Papillon, which fuses information in different media to facilitate intelligent application tasks. As a long-term target, we aim to further develop this approach so that we can automatically classify a wider range of histological images, identify normal and abnormal images, and ultimately use it as an automatic screening tool. The consultant pathologists have also recognized that the annotation functionality provided in I-Browse is already useful as a doctors' reference and reminder. Although I-Browse is a specialized system for GI tract images, the architecture and the analysis engines in the system have been designed so that they can be generalized, and not only to other types of histological images. As the general system architecture of I-Browse is domain independent, to apply the approach to other applications we can simply follow the design procedure of identifying a set of meaningful semantic labels for the particular domain and then train an appropriate set of feature detectors to identify those semantic labels in images. Semantic analysis is then carried out according to a set of domain-specific rules. We are currently planning to extend the approach to a diversity of other application domains, including an art gallery of oriental paintings and a library of herbarium specimens. ACKNOWLEDGMENT The authors thank their medical collaborators: Dr. K. C. Lee, Consultant Pathologist and Chief of Service, Princess Margaret Hospital, Hong Kong; Dr. E. Sims, Consultant Pathologist, Royal Bolton Hospital, Bolton, U.K.; and Dr. J.
Rashbass, Consultant Pathologist, Addenbrooke's Hospital, Cambridge, U.K. The authors also thank Dr. R. Lam and Dr. K. Cheung for their cooperation in developing the visual feature detectors and the software framework for this work. REFERENCES [1] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 1348–1381, Dec. 2000. [2] C. Colombo, A. Del Bimbo, and P. Pala, "Semantics in visual information retrieval," IEEE Multimedia, pp. 38–53, July–Sept. 1999. [3] S. R. Fountain and T. N. Tan, "Efficient rotation invariant texture features for content-based image retrieval," Pattern Recogn., vol. 31, no. 11, pp. 1725–1732, 1998. [4] R. Singh and N. P. Papanikolopoulos, "Planar shape recognition by shape morphing," Pattern Recogn., vol. 33, pp. 1683–1699, 2000. [5] A. Del Bimbo, M. De Marsico, S. Levialdi, and G. Peritore, "Query by dialog: An interactive approach to pictorial querying," Image Vision Comput., vol. 16, pp. 557–569, 1998. [6] D.-H. Lee and H.-J. Kim, "Fast content-based indexing and retrieval technique by the shape information in large image database," J. Syst. Soft., vol. 56, no. 2, pp. 165–182, Mar. 2001. [7] T. Gevers and A. W. M. Smeulders, "Content-based image retrieval by viewpoint-invariant color indexing," Image Vision Comput., vol. 17, pp. 475–488, 1999.
[8] A. F. Abate, M. Nappi, G. Tortora, and M. Tucci, "IME: An image management environment with content-based access," Image Vision Comput., vol. 17, pp. 967–980, 1999. [9] IBM Ultimedia Manager System [Online]. Available: http://wwwqbic.almaden.ibm.com/~qbic/qbic.html [10] J. Laaksonen, M. Koskela, S. Laakso, and E. Oja, "PicSOM: Content-based image retrieval with self-organizing maps," Pattern Recogn. Lett., vol. 21, pp. 1199–1207, 2000. [11] M. S. Kankanhalli, B. M. Mehtre, and H. Y. Huang, "Color and spatial feature for content-based image retrieval," Pattern Recogn. Lett., vol. 20, pp. 109–118, 1999. [12] A. Martinez and J. R. Serra, "Semantic access to a database of images: An approach to object-related image retrieval," in Proc. 1999 6th Int. Conf. Multimedia Comput. Syst., vol. 1, 1999, pp. 624–629. [13] J. M. Corridoni, A. Del Bimbo, and E. Vicario, "Image retrieval by color semantics with incomplete knowledge," J. Amer. Soc. Inform. Sci., vol. 49, no. 3, pp. 267–282, Mar. 1998. [14] G.-H. Cha and C.-W. Chung, "Indexing and retrieval mechanism for complex similarity queries in image databases," J. Visual Commun. Image Representation, vol. 10, no. 3, pp. 268–290, 1999. [15] A. Jaimes and S.-F. Chang, "Conceptual framework for indexing visual information at multiple levels," in Proc. SPIE, vol. 3964, San Jose, CA, Jan. 28, 2000, pp. 2–15. [16] A. Vailaya, A. Jain, and H. Zhang, "On image classification: City images vs. landscapes," Pattern Recogn., vol. 31, no. 12, pp. 1921–1935, 1998. [17] W. Al-Khatib, Y. F. Day, A. Ghafoor, and P. B. Bruce, "Semantic modeling and knowledge representation in multimedia databases," IEEE Trans. Knowledge Data Eng., vol. 11, pp. 64–80, Jan.–Feb. 1999. [18] C. R. Shyu, C. E. Brodley, A. C. Kak, A. Kosaka, A. Aisen, and L. Broderick, "Local versus global features for content-based image retrieval," in Proc. IEEE Workshop Content-Based Access Image Video Libraries, 1998, pp. 30–34. [19] L. H. Y. Tang, R. Hanka, and H. H. S.
Ip, "A review of intelligent content-based indexing and browsing of medical images," Health Inform. J., vol. 5, no. 1, pp. 40–49, Mar. 1999. [20] F. Schnorrenberg, C. S. Pattichis, K. C. Kyriacou, and C. N. Schizas, "Computer-aided detection of breast cancer nuclei," IEEE Trans. Inform. Technol. Biomed., vol. 1, pp. 128–140, June 1997. [21] P. W. Hamilton, P. H. Bartels, D. Thompson, N. H. Anderson, R. Montironi, and J. M. Sloan, "Automated location of dysplastic fields in colorectal histology using image texture analysis," J. Pathology, vol. 182, pp. 68–75, 1997. [22] (2002, July 31). [Online]. Available: http://path.upmc.edu/cpi/cpires.html#TIS [23] M. J. Varga and P. G. Ducksbury, "Application of content-based image compression to telepathology," in Proc. SPIE, vol. 4681, San Diego, CA, Feb. 23–28, 2002. [24] P. R. Wheater, H. G. Burkitt, and V. G. Daniels, Functional Histology. London, U.K.: Churchill Livingstone, 1993, sec. 14. [25] L. H. Tang, "Semantic Analysis of Image Content for Intelligent Retrieval and Automatic Annotation of Medical Images," Ph.D. dissertation, Univ. Cambridge, Cambridge, U.K. [26] K. K. T. Cheung, R. W. K. Lam, H. H. S. Ip, R. Hanka, L. H. Y. Tang, and G. Fuller, "An object-oriented framework for content-based image retrieval based on 5-tier architecture," in Proc. Asia-Pacific Software Eng. Conf. 99, Takamatsu, Japan, Dec. 7–10, 1999, pp. 174–177. [27] R. W. K. Lam, H. H. S. Ip, K. K. T. Cheung, L. H. Y. Tang, and R. Hanka, "A multi-window approach to classify histological features," in Proc. Int. Conf. Pattern Recognition, vol. 2, Barcelona, Spain, Sept. 2000, pp. 259–262.

H. Lilian Tang received the B.Eng. and M.Eng. degrees in computer science from Northeastern University, China, in 1989 and 1992, respectively, and the Ph.D. degree in medical informatics from the University of Cambridge, Cambridge, U.K., in 2000. She is currently a Lecturer in the Department of Computing, University of Surrey, Surrey, U.K. Her research interests include intelligent multimedia information retrieval, natural language processing, machine translation, medical informatics, and content-based image retrieval.


Rudolf Hanka (M’70) was born in Prague, Czechoslovakia. He received the M.Sc. degree in electrical engineering from the Czech Technical University (CVUT), Prague, Czech Republic, in 1961, the M.A. degree from the University of Cambridge, Cambridge, U.K., in 1972, and the Ph.D. degree in statistical pattern recognition from Strathclyde University, U.K., in 1976. He is currently a Visiting Professor of Knowledge Management at the Faculty of Management, University of Economics (VSE), Prague, Czech Republic, and Head of the Medical Informatics Unit and Director of the Centre for Clinical Informatics, University of Cambridge. He has been a Visiting Professor of the First Military Medical University, Guangzhou, China, since 1999. His research interests lie in applications of statistics and computing to medicine, with an additional interest in knowledge management. Given this broad range of interests, he has published in a wide cross section of disciplines and application areas including black-box modeling, statistical pattern recognition, image processing, medical diagnosis, and knowledge management. Dr. Hanka is a Member of the British Computer Society and the New York Academy of Sciences and a Fellow of the Royal Statistical Society and the Royal Society of Medicine.

Horace H. S. Ip (M’91) received the B.Sc. degree (with first-class honors) in applied physics and the Ph.D. degree in image processing from University College, London, U.K., in 1980 and 1983, respectively. Currently, he is the Chair Professor and Head of the Computer Science Department and the founding Director of the AIMtech Centre (Centre for Innovative Applications of Internet and Multimedia Technologies) at the City University of Hong Kong, Kowloon, Hong Kong. His research interests include image processing and analysis, pattern recognition, hypermedia computing systems, and computer graphics. He is a Member of the Editorial Boards of the Pattern Recognition Journal, The Visual Computer, the International Journal of Multimedia Tools and Applications, and the Chinese Journal of CAD and Computer Graphics, and a Guest Editor for the international journal Real-Time Imaging and Real-Time Systems. He has published more than 120 papers in international journals and conference proceedings. Dr. Ip is a Fellow of the Hong Kong Institution of Engineers (HKIE) and a Fellow of the Institution of Electrical Engineers (IEE), United Kingdom. He serves on the International Association for Pattern Recognition (IAPR) Governing Board and served as founding Co-Chair of its Technical Committee on Multimedia Systems. He is currently the Vice-Chairman of the China Computer Federation Technical Committee on CAD and Computer Graphics. He was the Chairman of the IEEE (Hong Kong Section) Computer Chapter, a Council Member of the Hong Kong Computer Society, and the Founding President of the Hong Kong Society for Multimedia and Image Computing.