A Framework for Interactive Content-Based Image Retrieval

S. M. Ghazanfar Monir1, S. K. Hasnain2
1Fast-National University, Karachi, Pakistan; 2Pakistan Navy Engineering College (NUST), Karachi, Pakistan
1ghazanfar.monir@nu.edu.pk, 2hasnain@pnec.edu.pk
Abstract

With the exponential increase in the size of digital image databases in the past few years, the traditional approach of manually annotating images with text and then using text-based queries for image retrieval has been giving way to Content-Based Image Retrieval (CBIR) systems, which use the visual contents of images to automatically index and retrieve them. However, there is always a gap between the high-level human perception of an image and the low-level image features used to describe its contents. This gap between low-level image features and semantic image content is the major bottleneck faced by traditional CBIR systems. Modern CBIR systems overcome this problem by using interactive learning, bringing the user into the loop. Such systems learn from feedback given by the user about the relevance or irrelevance of the current retrieval results. This paper presents a framework for interactive content-based image retrieval. Considering relevance feedback as a learning problem, a learning machine based on Radial Basis Function (RBF) Neural Networks (NN) is implemented, and the system has been tested for effectiveness using a database of 10,000 images.
1. Introduction

In the past few years, we have seen an exponential increase in the size of digital image collections. Images are being generated at an ever-increasing rate. Defence and civilian satellites, military reconnaissance and surveillance flights, fingerprinting and mug-shot capturing devices, scientific experiments, biomedical imaging, and home-entertainment systems are some examples of imaging devices generating gigabytes of images every day. However, this information cannot be accessed and used properly unless it is organized in a way that allows efficient browsing, searching and retrieval.
In the early days, a popular framework for image retrieval was to first annotate the images with text and then use text-based database management systems for retrieval. [1] is an example of a system using text annotation for indexing images. For very large image collections containing tens or hundreds of thousands of images, manual annotation becomes extremely tedious and time-consuming. Another problem faced by such systems is that perceptual subjectivity and annotation impreciseness may cause undesirable mismatches in the retrieval process. Content-Based Image Retrieval (CBIR) systems try to solve these problems by indexing images by their own visual contents, of which the most widely used are colour, texture, and shape. QBIC (Query by Image and video Content) [2], MARS (Multimedia Analysis and Retrieval System) [3], Photobook [4], and VisualSEEK [5] are some examples of available CBIR systems.

Human perception of image similarity is subjective, task-dependent, and based on high-level concepts, whereas the image features used for content description are low-level. The gap between the high-level human perception of an image and the low-level image features used to describe its contents is called the Semantic Gap, and it is the major bottleneck faced by traditional CBIR systems. The problem is further compounded by the fact that different users may have different semantic interpretations of the same image. Modern CBIR systems overcome this problem by using interactive learning, bringing the user into the loop. [6, 7] are examples of CBIR systems that learn from feedback given by the user about the relevance or irrelevance of the current retrieval results. Relevance feedback techniques have been shown to provide a significant performance boost in retrieval systems. Relevance feedback can be considered as a learning problem.
The system acquires knowledge by learning from the user's feedback and uses it to improve retrieval performance. Several different relevance feedback algorithms have been adopted in CBIR systems, among which query refinement [8, 3] and feature re-weighting [7, 9] are the most widely used. In this work we have implemented a CBIR system that uses an RBF neural network-based learning machine. Two different design methods have been adopted and their performance compared for efficiency. The first method, exact learning, produces a network with zero error on the training vectors. The second method, approximate learning, creates neurons one at a time: at each iteration, the input vector that lowers the network error the most is used to create a new neuron.
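The exact learning design can be illustrated with a minimal Python sketch. The toy 2-D feedback vectors, the Gaussian basis, and the unit spread are illustrative assumptions, not details fixed by the paper: one RBF centre is placed on every training sample and the output weights are found by solving the resulting N x N interpolation system, which by construction gives zero error on the training vectors.

```python
import math

def gaussian(r, spread=1.0):
    """Gaussian RBF response for a centre-to-input distance r."""
    return math.exp(-r ** 2 / (2 * spread ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def exact_rbf_train(samples, targets, spread=1.0):
    """Exact learning: one RBF centre per training vector; the output
    weights solve the N x N interpolation system, so the network has
    zero error on the training set."""
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    phi = [[gaussian(dist(s, c), spread) for c in samples] for s in samples]
    weights = solve(phi, list(targets))
    def predict(x):
        return sum(w * gaussian(dist(x, c), spread)
                   for w, c in zip(weights, samples))
    return predict

# Toy feedback set: relevant samples get target 1, irrelevant target 0.
train = [(0.1, 0.2), (0.9, 0.8), (0.5, 0.4)]
labels = [1.0, 1.0, 0.0]
f = exact_rbf_train(train, labels)
```

Approximate learning, by contrast, would add one such centre at a time, each time choosing the training vector that reduces the remaining error the most, and stop before every sample becomes a centre.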
Figure 1: Framework for CBIR System
2. System Framework

The image retrieval process in the implemented CBIR system can be divided into two parts. The first is an offline process comprising:
* Feature extraction from the image database,
* Development of a feature database containing the feature vectors of the database images.

The second is the online image retrieval process, during which:
* The user selects a query image from the image database and passes it to the system.
* The system performs a similarity comparison between the feature vector of the query image and those from the image database to retrieve 25 images ranked in order of decreasing similarity.
* The user provides feedback according to his/her information need.
* The system learns from the feedback and tries to retrieve more relevant images in the next iteration.

Figure 1 describes the steps involved in our image retrieval system.

2.1 Offline Processing
This process is performed once: all 10,000 images in our database are presented to the system, which extracts the desired features and creates a feature matrix containing the feature vectors of the images as row vectors. A feature vector of 170 dimensions has been used; hence the feature matrix has dimensions 10,000 × 170. This matrix is then saved to the hard disk and used afterwards during the online image retrieval process.
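The offline step can be sketched as below. `extract_features` here is a hypothetical stand-in (a normalised 8-bin intensity histogram) for the paper's 170-dimensional descriptor, which is not specified in this section:

```python
def extract_features(image):
    """Hypothetical stand-in for the paper's 170-dimensional descriptor:
    a normalised 8-bin intensity histogram of a greyscale image given
    as a flat list of 0-255 pixel values."""
    bins = [0] * 8
    for p in image:
        bins[min(p // 32, 7)] += 1
    total = float(len(image)) or 1.0
    return [b / total for b in bins]

def build_feature_matrix(images):
    """Offline step: one feature vector per database image, stored as
    row vectors so that row i indexes image i during online retrieval."""
    return [extract_features(img) for img in images]

# Three tiny fake "images": all-black, all-white, and an intensity ramp.
database = [[0] * 100, [255] * 100, list(range(100))]
feature_matrix = build_feature_matrix(database)
```

In the real system the resulting 10,000 × 170 matrix would be saved to disk once and loaded at query time.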
2.2 Online Processing

The online processing phase includes the following steps:

2.2.1 Query Processing. The user browses through the images in the image database and selects an example image for querying. The system obtains the feature vector of the query image from the feature matrix for the initial search.

2.2.2 Similarity Search. The initial search applies the k-nearest neighbour algorithm to find the images similar to the query image based on the Euclidean distances between them. In subsequent iterations, the top-ranked K relevant images are used to train the RBF network, and the images are then ranked according to the output of the RBF network.

2.2.3 Image Retrieval and Display. A ranked list of similar images is obtained from the database at each iteration, and the top 25 images are displayed to the user. The user can then continue the search by providing feedback about the relevance/irrelevance of the retrieved images.
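The initial similarity search can be sketched as follows, here with toy 2-D feature vectors in place of the system's 170-dimensional ones and k=2 in place of 25:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def knn_retrieve(query_vec, feature_matrix, k=25):
    """Rank database images by increasing Euclidean distance to the
    query feature vector and return the indices of the top k."""
    ranked = sorted(range(len(feature_matrix)),
                    key=lambda i: euclidean(query_vec, feature_matrix[i]))
    return ranked[:k]

# Toy 2-D feature vectors; images 0 and 2 lie closest to the query.
features = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]]
top = knn_retrieve([0.0, 0.1], features, k=2)  # -> [0, 2]
```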
2.2.4 Feedback. The user provides the system with feedback by labelling the retrieved images as relevant or irrelevant. The system learns from this feedback by training the RBF network and retrieves another set of images for the user.
3. The RBF Network

Figure 2 shows the architecture of the implemented RBF network. The input to the network is an R-dimensional feature vector, x = [x1, x2, ..., xR]^T, connected to a hidden layer constructed from relevant and irrelevant samples. The output layer consists of a single unit whose output value F(x) is a linear combination of the responses of all RBF units. Two sets of output connection weights, {w_Pi} for i = 1, ..., I_P and {w_Ni} for i = 1, ..., I_N, are associated with the relevant and irrelevant samples respectively, where I_P and I_N are the numbers of RBF units for relevant and irrelevant samples.
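A minimal sketch of the output computation, assuming Gaussian RBF units and purely illustrative weights: the actual weights are learned from feedback, and assigning positive weights to relevant centres and negative weights to irrelevant ones is one common convention, not a detail fixed by the text.

```python
import math

def rbf(x, centre, spread=1.0):
    """Gaussian RBF unit response to input x."""
    r2 = sum((u - v) ** 2 for u, v in zip(x, centre))
    return math.exp(-r2 / (2 * spread ** 2))

def network_output(x, rel_centres, irr_centres, w_rel, w_irr, spread=1.0):
    """F(x): linear combination of the responses of all RBF units, with
    one weight set for relevant centres and one for irrelevant ones."""
    f = sum(w * rbf(x, c, spread) for w, c in zip(w_rel, rel_centres))
    f += sum(w * rbf(x, c, spread) for w, c in zip(w_irr, irr_centres))
    return f

# One relevant and one irrelevant centre with illustrative weights +1/-1:
rel, irr = [(0.0, 0.0)], [(3.0, 3.0)]
score_near_rel = network_output((0.1, 0.0), rel, irr, [1.0], [-1.0])
score_near_irr = network_output((2.9, 3.0), rel, irr, [1.0], [-1.0])
```

Inputs near a relevant centre score high, inputs near an irrelevant centre score low, which is what allows F(x) to re-rank the database at each feedback iteration.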
Figure 2: Architecture of the RBF Network

The size of the network has been reduced by clustering the relevant and irrelevant images and taking the means of the clusters as the RBF centres, rather than associating each RBF centre with one sample. The accumulated feedback samples are separated into two groups, relevant and irrelevant, and each group is clustered individually to initialise the hidden neurons of the RBF network. Hierarchical clustering, a well-known unsupervised clustering scheme, is adopted for this purpose. Single linkage clustering [10], also known as the nearest neighbour technique, is used in our system. Let there be N samples to be clustered; single linkage clustering proceeds by fusing the samples into groups, with the distance between two clusters defined as the smallest distance between samples in the respective clusters. The clustering algorithm can be summarized as follows:
1. Initialization. The N samples are placed in N singleton clusters.
2. Hierarchical cluster tree creation. The closest pairs of sample clusters are linked together. These newly formed clusters are then linked to other clusters to create bigger clusters, until all the samples are linked together into a hierarchical cluster tree.
3. Cluster creation. Clusters are constructed by cutting the hierarchical tree: clusters are formed where inconsistency values are greater than a predefined threshold. The inconsistency value of each link compares its height with the average height of other links at the same level of the hierarchy; the height of a link represents the distance between the two clusters it connects.

4. Simulations and Results
Our dataset consists of 10,000 JPEG images divided into 100 classes, with each class containing 100 images. 60 images from 20 different classes have been used as query images in the simulations. A retrieved image is considered relevant if it belongs to the same class as the query image, and irrelevant otherwise. In all the simulations performed during the study, the top-ranked 25 images were displayed to the user, and machine learning was performed based upon the feedback provided in the current iteration as well as the previous iterations.
The Mean Retrieval Accuracy (MRA) is calculated using the following formula:

MRA = (1/S) * sum_{j=1}^{S} (R_j / D) * 100

where S is the number of simulations performed, R_j is the number of retrieved images in simulation j that belong to the same class as the query image, and D is the number of images displayed to the user in each iteration. Based on the results, shown in Figure 3, it is observed that the retrieval accuracy of the approximate learning-based network increases quickly in the initial stage. This is a desirable feature, since the user obtains significant improvement in the retrieval results quickly. After six iterations, the retrieval accuracy obtained is 88.4%, compared to 82% for the exact learning-based network. It is further observed that, to achieve a specific retrieval accuracy, the approximate learning-based method requires fewer iterations than the exact learning method.
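Under these definitions, the MRA computation reduces to a few lines; the counts below are made-up examples, not results from the paper:

```python
def mean_retrieval_accuracy(relevant_counts, displayed=25):
    """MRA = (1/S) * sum over j of (R_j / D) * 100, where S is the number
    of simulations, R_j the number of relevant images retrieved in
    simulation j, and D the number of images displayed per iteration."""
    s = len(relevant_counts)
    return sum(r / displayed for r in relevant_counts) * 100.0 / s

# Made-up counts for three simulated queries (not results from the paper):
mra = mean_retrieval_accuracy([20, 25, 21])  # close to 88.0
```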
[4] A. Pentland, R. Picard, and S. Sclaroff, "Photobook: tools for content-based manipulation of image databases," Proc. SPIE, vol. 2185, pp. 34-47, 1994.
Figure 3: Performance analysis. Mean Retrieval Accuracy (%) versus iteration for the exact learning and approximate learning based networks.