How to Add Content-based Image Retrieval Capability in a PACS

Josiane M. Bueno, Fabio Chino, Agma J.M. Traina, Caetano Traina Jr.

Computer Science Department - University of Sao Paulo at Sao Carlos - Brazil [josiane | chino | agma | caetano]@icmc.sc.usp.br

Paulo M. Azevedo-Marques Science of Image and Medical Physics Center of the Medical School of Ribeirao Preto University of Sao Paulo at Ribeirao Preto - Brazil [email protected]

Abstract

This paper presents a new Picture Archiving and Communication System (PACS), called cbPACS, which has content-based image retrieval resources. The cbPACS answers similarity queries (range and nearest-neighbor) by taking advantage of a metric access method embedded in the image database manager. The images are compared through their features, which are extracted by the image processing system module. Currently, the system works with features based on the color distribution of the images, using normalized histograms as well as metric histograms. Metric histograms are invariant to scaling, translation and rotation of images, and also to brightness transformations. The cbPACS is prepared to integrate new image features based on the texture and shape of the main objects in the image.
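As an illustration of the simpler of the two features mentioned above, the following minimal sketch computes a normalized gray-level histogram. It is not the cbPACS implementation: the function name, the flattened-list image representation and the toy images are assumptions made for this example, and the metric histogram is not shown.

```python
from collections import Counter
from typing import List, Sequence


def normalized_histogram(pixels: Sequence[int], levels: int = 256) -> List[float]:
    """Return the fraction of pixels at each gray level (the bins sum to 1)."""
    if not pixels:
        raise ValueError("image has no pixels")
    if any(not 0 <= p < levels for p in pixels):
        raise ValueError("pixel value outside the range [0, levels)")
    counts = Counter(pixels)
    total = len(pixels)
    # Dividing by the pixel count makes images of different sizes comparable.
    return [counts.get(level, 0) / total for level in range(levels)]


# Hypothetical 2x2 image with 4 gray levels, flattened row by row.
small = [0, 1, 1, 3]
# The same image enlarged 2x in each direction: every pixel occurs four times.
large = [p for p in small for _ in range(4)]

hist_small = normalized_histogram(small, levels=4)
hist_large = normalized_histogram(large, levels=4)
print(hist_small)                 # [0.25, 0.5, 0.0, 0.25]
print(hist_small == hist_large)   # True
```

Because each bin is divided by the total number of pixels, the enlarged copy yields the same feature vector as the original, which is why a normalized histogram does not depend on image size.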

1. Introduction

The analysis of images obtained by computerized tomography (CT), magnetic resonance imaging (MRI), and ultrasound, among others, is a valuable tool for medical diagnosis. Depending on the hospital facility, the volume of data generated daily in such exams can reach several gigabytes. Therefore, the medical community has long sought fast and effective methods to organize and retrieve patient information, including images. The design of Picture Archiving and Communication Systems (PACS) [6] has been pursued over the last years with the aim of fulfilling these objectives.

Traditionally, images of patients' exams are described textually, and the images are stored and organized based on such descriptions. However, this approach depends on the particular aspect the radiologist is concerned with at that time. Moreover, the description does not carry all the information kept in an image. Considering that, many hospitals and health centers are seeking the capability of searching images based on their content, as a tool for improving diagnosis. Additionally, it is worthwhile to make the PACS available through the WWW, so that physicians who are geographically apart can also cooperate and contribute to the diagnosis [3].

This paper focuses on describing the computational modules needed to build a PACS that includes content-based image retrieval (CBIR) functionality over the internet. We will discuss the main image processing techniques that must be implemented in order to allow image comparison. The aim is to support the usual comparisons that physicians make on images. For example, given an image database, a simple question could be: "return the three most similar images to the Rx-Thorax of John Doe". One way to answer it is to compare the Rx-Thorax image of John Doe to all the images in the image database. However, this could take long, as the time spent to get the answer is proportional to the number of images in the database.

In order to speed up search operations over data, index structures have been proposed in the literature. The well-known B-tree index structure is part of any commercial database manager. However, B-trees are not suitable for multidimensional data, as is the case of images. If the dimensionality of the data is not large, the so-called spatial access methods (SAMs) can be used. A widely known SAM is the R-tree [4], along with its successors, the R+-tree [5] and the R*-tree [2]. However, none of these methods can be used when the dimensionality of the data grows. Considering similarity queries, the most suitable structures to deal with images are the so-called metric access methods (MAMs). These methods can handle similarity queries internally and require a distance function to be defined. The distance function measures the dissimilarity between the indexed data.

A PACS with CBIR should be developed in a modular way, keeping the data in servers that can be accessed by client software. A PACS with this ability is being developed jointly by the Database and Image Group of the Computer Science Department at Sao Carlos and the Image Center of the Clinical Hospital at Ribeirao Preto, both from the University of Sao Paulo, Brazil. This PACS, called cbPACS, answers similarity queries over the images stored in the image database.

The rest of the paper is organized as follows: Section 2 provides an overview of concepts and previous work related to this paper. Section 3 presents the architecture of the proposed system, and Section 4 describes the experiments developed for validating the system. Section 5 gives the conclusions of the paper.
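The exhaustive comparison described in the introduction, whose cost grows linearly with the number of stored images, can be sketched as follows for both query types that cbPACS answers. This is only a baseline under stated assumptions: the function names, the choice of the L1 (city-block) distance, the image identifiers and the four-bin feature vectors are all hypothetical, and the code does not reproduce the metric access method used by cbPACS.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Feature = Sequence[float]
Distance = Callable[[Feature, Feature], float]


def l1_distance(a: Feature, b: Feature) -> float:
    """City-block distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_query(db: Dict[str, Feature], query: Feature, k: int,
              dist: Distance = l1_distance) -> List[Tuple[float, str]]:
    """Return the k stored images closest to the query (one distance per image)."""
    scored = ((dist(query, feature), image_id) for image_id, feature in db.items())
    return heapq.nsmallest(k, scored)


def range_query(db: Dict[str, Feature], query: Feature, radius: float,
                dist: Distance = l1_distance) -> List[Tuple[float, str]]:
    """Return every stored image within the given distance of the query."""
    scored = ((dist(query, feature), image_id) for image_id, feature in db.items())
    return sorted(pair for pair in scored if pair[0] <= radius)


# Hypothetical database: image identifier -> 4-bin normalized histogram.
database = {
    "img_a": [0.5, 0.5, 0.0, 0.0],
    "img_b": [0.25, 0.25, 0.25, 0.25],
    "img_c": [0.0, 0.0, 0.5, 0.5],
    "img_d": [0.5, 0.25, 0.25, 0.0],
}
query_features = [0.5, 0.5, 0.0, 0.0]

nearest = knn_query(database, query_features, k=3)
within = range_query(database, query_features, radius=0.5)
print([image_id for _, image_id in nearest])  # ['img_a', 'img_d', 'img_b']
print([image_id for _, image_id in within])   # ['img_a', 'img_d']
```

Both functions evaluate the distance function once per stored image, which is the cost that an index structure is meant to reduce.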

2. Background

The importance of similarity queries in medical systems comes from the fact that what physicians usually seek are images that are alike but not identical to a given image. Finding an identical image corresponds to obtaining total similarity, but such a situation is not usual, because the given image is already under the physician's analysis. Similarity queries use techniques from content-based image retrieval. That is, the images are indexed and compared through the feature vectors extracted from them. The dissimilarity between images is measured by a distance function applied to the images. Thus, in order to answer similarity queries, it is necessary to have the distance function and the objects to be compared. The following definition formalizes these ideas.

Definition 1 (metric distance function): Given a set of objects S = {s1, s2, ..., sn} of a domain S, a function d() that has the following properties:
1. Symmetry: d(s1, s2) = d(s2, s1)
2. Non-negativity: 0