Spatial and Color Clustering on an FPGA-based Computer System
Miriam Leeser, Natasha Kitaryeva and Jill Crisman
Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115
ABSTRACT
We are mapping an image clustering algorithm onto an FPGA-based computer system. Our approach processes raw pixel data in the red, green, blue color space and generates an output image in which every pixel is assigned to a class. A class is a group of pixels with similar color and location. These classes are then used as the basis of further processing to generate tags. The tags, in turn, are used to generate queries for searching libraries of digital images. We run our image tagging approach on an FPGA-based computing machine: the image clustering algorithm runs on an FPGA board, only the classified image is communicated to the host PC, and further processing runs on the host. Our experimental system consists of an Annapolis Wildforce board with four Xilinx XC4000 chips and a PCI connection to a host PC. Our implementation allows the raw image data to stay local to the FPGAs; only the class image is communicated to the host PC, where the classified pixels are used to generate tags for searching a digital library. This approach allows us to parallelize the image processing on the FPGA board and to minimize the data handled by the PC. FPGA platforms are ideally suited to this sort of initial processing of images. The large amount of image data can be preprocessed by exploiting the inherent parallelism available in FPGA architectures, keeping unnecessary data off the host processor. The result of our algorithm is a reduction by up to a factor of six in the number of bits required to represent each pixel. The output data is passed to the host PC, reducing the processing and memory resources needed compared to handling the raw data on the PC. The process of generating image tags is simplified by first classifying pixels on an FPGA-based system, and digital library search is accelerated.
Keywords: Image Processing, Computer Vision, Digital Library Search, Image Analysis, FPGA Computing Systems, Configurable Hardware
1. INTRODUCTION
A computer's workload is increasingly made up of images, whether still or video. Accessing digital images based on their content is becoming increasingly important to computer users. Many researchers are working on methods for finding images in digital library databases [1-3]. The research on digital libraries focuses on many areas, including tagging each image with information about its content, and using those tags as an index into a database of digital images in order to retrieve the images relevant to a search. We concentrate on generating tagging information from still color images that can later be used for digital library search. Our approach starts with a still color image, where each pixel is stored as its red, green and blue (R,G,B) components. Each pixel is then classified on an FPGA-based computing platform, and the pixel classes are communicated to the host PC for further processing. The processing on the host PC consists of region analysis and tag generation. The tags are then used as the basis of searches in digital libraries. For example, to find all images in the database similar to the image presented in a query, the tags generated from the query image are compared to those stored with the images in the database. One of the advantages of our approach is that the amount of data the host PC must handle is reduced by as much as a factor of six from the raw image. Our initial images are stored as raw pixels, where each pixel is represented by three bytes, one for each of the R,G,B color bands. Using this format, a still image contains a huge amount of raw pixel data: for a 480 x 512 image, the data required per image is over 700K bytes. We
Send correspondence to: Miriam Leeser. E-mail: mel@ece.neu.edu
classify the pixels in an image and represent each pixel as a pointer into a class data structure. With sixteen classes, each pixel can be represented with four bits of data instead of the original 24 bits, a six-fold reduction in data. Pixel classes are formed by clustering pixels with similar color and location in the source image. Our technique is similar to color histogramming, in which the number of pixels of similar colors in a given image is computed. The main advantage of color histogramming is its low computational overhead; the major disadvantage is that all spatial information in the image is lost. In addition, the resulting classification of pixels cannot be used to represent the image: a color histogram is additional information about the image that must be communicated to the host PC. We take a different approach to tagging color images. Our approach uses the ISODATA algorithm (also called k-means clustering) to cluster pixels of similar color in nearby locations, incorporating spatial information as well as color distribution information. Our approach is simpler than other approaches using spatial correlation. In addition, the original image can be represented with the class labels instead of the raw pixels, while still maintaining a high quality image. Our approach is applicable to different types of scenes, such as indoors and outdoors, as well as different weather and lighting conditions. Spatial clustering requires more processing than color histogramming. Most other approaches to spatial and color clustering run on the host processor and attempt to iterate over the pixel data only once, due to the high overhead of dealing with such large amounts of data. Our approach requires several iterations over the pixel data of an image. The processing is over byte-width data, exhibits a great deal of parallelism, and is particularly well suited to implementation on an FPGA-based computing machine.
The classified pixel data is passed to the host processor for further processing. This allows us to exploit the inherent parallelism in the approach, reduces the amount of raw data the host PC must deal with, and speeds the digital library tagging process. Our implementation is on an Annapolis Microsystems board with a fast PCI interface to the host PC. Each board contains a Xilinx XC4013 for communication with the host PC and four Xilinx XC4028EX FPGAs which we use for image processing. Each of the four XC4028 chips has 500K bytes of local memory which we use for image storage. In Section 2, we discuss our approach in more detail. In Section 3, we discuss our implementation of the algorithm and how it can be mapped to FPGAs. We show the results of classifying the pixels of an image with the ISODATA algorithm in Section 4, and discuss our results and future work. First we discuss related work.
1.1. Related Work
Computer vision and image processing algorithms have been popular applications for both general purpose processors and FPGA-based computing machines. In the past, these algorithms have often used only gray-level or intensity images as input. Digital images are increasingly represented in full color, so processing of color images is imperative. In addition, we have shown that for many applications, color is an important cue for identifying the objects in images [5,6]. Most computer vision researchers do not work with three dimensional color algorithms because the basic computer vision algorithms are not straightforward to adapt. Consider edge detection. Edges can easily be detected in each of the three (red, green, blue) color bands. How to combine the edge data, however, is still under debate. Assuming an edge exists if it appears in any of the color bands keeps all of the noise generated by edge detection in each band. However, if the programmer only uses edges that exist in all bands of the image, the algorithm will miss an edge between a light red and a dark red surface, for example. An increasingly important area of research is searching for images in databases of color images. Often, color histogramming is used as an initial classification of these images. In color histogramming, some preprocessing of an image is done, usually involving discretizing the color space, and then the number of pixels in each discretized color is computed. The main advantage of color histogramming is the simplicity of the calculation. The main disadvantage is that all spatial information in an image is lost. Recently, researchers have begun to use more complex techniques that use spatial content as well as color content to extract features from color images. For example, Zabih et al. have used histogram refinement, a technique that combines spatial coherence with histogramming [7]. In a more recent paper, the researchers use color correlation to take spatial information into account [4].
These techniques iterate over the pixels in an image only once, and do a complicated calculation to estimate both the spatial and the color content of an image. This approach is better suited to
processing on a workstation, where data handling and low-level operations are expensive. Our approach uses simple operations and iterates over the pixel image. As a result, our approach is more straightforward, well adapted to an FPGA-based computing machine, and easy to parallelize. Image processing and computer vision are popular application areas for FPGA-based computing machines [8,9]. Most of the applications and implementations discussed either use intensity images as input, or implement algorithms best suited to one band of information. For example, one image segmentation algorithm [9] is similar to our approach. However, its input is an intensity image, and the classification is into one of three categories: background, text or halftone. The basic computation is the mean and variance in a limited spatial domain around a pixel. Our approach classifies pixels with no a priori determination of the final classes, and no limit on the spatial domain processed. Researchers at USC present a technique for parallel object recognition accelerated on an FPGA-based computing system [10]. They start with features extracted from images, and use geometric hashing to compare feature points. Our approach is the first step in a system that does feature extraction, and thus can be used as a pre-processing step for their technique. Processing of three dimensional color images on FPGA-based computing machines has largely been limited to color histogramming. Our approach differs in that it involves true three dimensional computer vision. In addition, we expect it to map well onto an FPGA-based architecture.
2. THE ISODATA ALGORITHM
Our goal is to develop a set of tags to represent an image for searching an image database. We first classify each pixel into one of a set of classes; this classification is done on FPGAs. The output is a class image, a set of pixels where each pixel is represented as a pointer to the information about its class. This class image is passed to the host PC for further processing, which includes grouping pixels into regions and extracting features, or tags, from those regions. These tags are stored with the image in the database and used for searching the database; note that the tags are computed only once per image. In the rest of this paper, we concentrate on the pixel classification algorithm. We have developed a method of using both color and spatial information to classify pixels in a color image. Our method is based on a standard ISODATA clustering algorithm [11] (other researchers refer to this as k-means clustering). We cluster pixels by color (R,G,B) and position (X,Y). The clustering step groups pixels in the image that have similar color and position, and represents each pixel by the number of the class to which it is closest. The statistics of each class are also generated: the means of the color and position for each group are calculated, and these form the values used to represent the pixels. In other words, each pixel can be reduced to a class number, compressing the number of bits required to represent the image. The algorithm starts with a fixed number of classes, K, with mean values for the classes distributed around the color space. Each pixel in the image is represented by a vector of information which includes the color of the pixel (R, G, B) and its spatial position (X, Y), i.e. v = [R G B X Y]. Each pixel is assigned to the class k whose mean value vector m(k) is closest to its value.
Next, for each of the K classes, the mean value m(k) of all the pixels v assigned to that class is computed. The classification of each pixel is then removed, and the pixels v are reassigned to the class k with the closest mean value m(k). This "compute means" and "classify pixels" loop is repeated until the number of pixels changing classes in a given iteration falls below a threshold. Pseudo-code for this algorithm is given in Figure 1. We have shown that this method of clustering is successful at clustering pixels, and that the mean colors and positions of the classes can be used to recreate a realistic representation of the original image [12]. While our approach is more computationally complex than color histogramming, it has the distinct advantage of retaining spatial information. It differs from other approaches that incorporate spatial and color information in that the individual computations are simple, and we iterate over the pixel data. The operations consist of many additions, with a few divisions in each iteration through the image. Our approach can be implemented on the PC alone, but the result is very slow, and it requires a large amount of data to be handled by the PC. We are currently processing images of size 256x240; such an image contains over sixty thousand pixels. This approach is better suited to an FPGA implementation, which can handle the low-level data operations efficiently. Another advantage of this algorithm is that it is highly parallelizable. There are many operations that
/* Initialize: */
For class k = 1 to K
    randomly assign meanr[k], meang[k], meanb[k] so colors are distributed;
    assign meanx[k], meany[k] so positions are distributed;
End for;

/* Iterate until convergence is reached */
Repeat
    /* Initialize values */
    For each class k
        numpixels[k] = 0;
        redsum[k], greensum[k], bluesum[k], xsum[k], ysum[k] = 0;
    End for;

    /* Classify Pixels Loop: assign pixels to class with closest mean */
    For each pixel p
        /* Find class k whose mean is closest to p[R],p[G],p[B],p[X],p[Y] */
        mindist = large number;
        For each class k
            dist = |p[R]-meanr[k]| + |p[G]-meang[k]| + |p[B]-meanb[k]|
                 + |p[X]-meanx[k]| + |p[Y]-meany[k]|;
            if dist < mindist then
                this_k = k;
                mindist = dist;
            end if;
        End for;
        /* Assign pixel p to this class */
        class[p] = this_k;
        redsum[this_k] = redsum[this_k] + p[R];
        greensum[this_k] = greensum[this_k] + p[G];
        bluesum[this_k] = bluesum[this_k] + p[B];
        xsum[this_k] = xsum[this_k] + p[X];
        ysum[this_k] = ysum[this_k] + p[Y];
        numpixels[this_k] = numpixels[this_k] + 1;
    End for;

    /* Calculate Means for each class */
    For each class k
        meanr[k] = redsum[k]/numpixels[k];
        meang[k] = greensum[k]/numpixels[k];
        meanb[k] = bluesum[k]/numpixels[k];
        meanx[k] = xsum[k]/numpixels[k];
        meany[k] = ysum[k]/numpixels[k];
    End for;
Until number of pixels changing classes < Threshold;
Figure 1. Pseudo-code for the ISODATA algorithm
can be done simultaneously, and the data can be divided over multiple processors. Our initial experiments implement the processing of one image on one FPGA, with the pixel data stored in memory local to the processor. We expect this implementation to speed up the pixel classification process considerably over running on a PC alone.
3. IMPLEMENTATION OF SPATIAL CLUSTERING
3.1. The Annapolis Wildforce Board
We are mapping the ISODATA clustering algorithm to a Wildforce board from Annapolis Microsystems. The Wildforce board is based on the Splash-2 architecture. Our Wildforce board has a PCI interface, one Xilinx XC4013 for control, and four Xilinx XC4028EX-3 FPGAs for data processing. The four XC4028 chips each have 1024 CLBs, and together provide approximately 100,000 gate equivalents. They are interconnected using a crossbar switch. In addition, each XC4028 has 500K bytes of external RAM on a mezzanine card, local to the chip; this RAM is used for local image storage during processing. The control FPGA, a Xilinx XC4013, implements the PCI bus interface and is used to communicate the class image to the host PC. We use the software functions and VHDL models provided with the Wildforce board. Code that runs on the PC is written in C. Hardware to be run on the Wildforce board is described in behavioral VHDL, synthesized with Synopsys synthesis tools, and mapped to FPGAs using the Xilinx M1 design tools.
3.2. Mapping the ISODATA algorithm onto the Wildforce board
Our initial experiments map a small image onto one of the FPGAs on the Wildforce board, so that all data from one image is local to that FPGA. To accomplish this, we use images of 256x240 pixels; these require 180K bytes of local storage. In addition, we store the class means table, which requires 320 bytes, and a 60K byte data structure holding the class of each pixel. This last data structure could be packed into fewer bytes if we used fewer than one byte per pixel. The algorithm was given in Figure 1. There are two main loops in the computation: assigning pixels to classes and computing means for the classes. The calculation for assigning pixels to classes requires only absolute value, addition and subtraction operations, and is straightforward to implement. Calculating means is moderately more complicated; it requires division as well as addition. All of these operations fit easily into an FPGA implementation. Our design process starts with C code for the algorithm. The C code does not rely on any system calls, since it is a specification for the final hardware to be implemented. We implement all the arithmetic operations with fixed point operators. An important aspect of the design is to minimize the number of bits required, since this minimizes the size of the components in the FPGA specification. We have found that 24 bits are sufficient to represent the means and class data. Our design flow proceeds by translating the C description to behavioral, synthesizable VHDL for use with Synopsys Behavioral Compiler (BC). We use Synopsys BC, along with the libraries provided by Xilinx and the Synopsys Designware libraries, to generate an RTL level design. We then use Synopsys FPGA Compiler to generate a design mapped to CLBs, and the Xilinx Foundation tools to do the final place and route onto the Xilinx 4028EX-3 chips we have available. The design flow is illustrated in Figure 2.
Since we map one instance of our algorithm to each of the four chips, there is no need to partition the design; the same design is mapped onto each chip. We have experience with a similar design flow [13], and find that translation from C to behavioral VHDL is straightforward, and that BC allows us to do a significant amount of design space exploration at the RTL level. By far the most complex component in this design is the divider needed to calculate the mean values for each of the classes. We serialize all the division operations onto one divider functional unit, which requires a 24 bit dividend and a 12 bit divisor. Our first, naive implementation used the combinational divider available as a Synopsys Designware component. Mapped to the Xilinx 4000 family, this component requires 217 CLBs and has a delay of approximately 1350 nanoseconds. The extremely slow speed is due to the size and purely combinational nature of the implementation. In addition, the 217 CLBs take up over 20% of the area available on an XC4028, which has 1024 CLBs. Clearly this is not a good implementation for FPGAs. Instead, we are implementing a subtractive divider [14]. The implementation uses a small lookup table and a carry save adder, plus a few registers. It is much smaller than the combinational divider, and will allow a much faster clock speed for the entire design. Each division operation, however, will take several clock cycles.
Figure 2. Design flow. Behavioral VHDL code is simulated until it behaves as desired, then compiled to RTL VHDL with Behavioral Compiler and simulated against the behavioral model (modifying the VHDL or reconstraining as needed). FPGA Compiler then produces gate level VHDL, which is checked against constraints and again simulated against the behavioral model. Finally the design is placed and routed, simulated at the device level, and used to configure and test the chip.
4. DISCUSSION
In this section we present some preliminary results, and discuss future directions of our research.
4.1. Results
Our initial results are based on the C simulation with our fixed point library. We use 24 bit words for the mean data structure, 12 bits for the number of pixels per class, and 16 classes per image. Results are shown in Figures 3 and 4 for three different images. For each image we show the input image and the output image after pixel classification. In the final image, each pixel is represented by the mean red, green and blue values of the class it belongs to. The images here are in black and white; color versions of these images can be found at: http://www.ece.neu.edu/groups/rpl/visionapp/images.html
Figure 3. Input Images
Figure 4. Output Images
The fixed point C code running on a Sun workstation is much slower than the floating point version, and the floating point version on a Sun Ultrasparc takes several minutes per image. This is too slow to be a viable approach for large image databases. We expect to see significant speedup from the FPGA version. In addition, our Annapolis Wildforce implementation can process four images in parallel, giving us an automatic speedup of a factor of four.
4.2. Future Directions
We have presented preliminary results on a technique that uses spatial as well as color clustering to find tags in three banded color images. Our immediate plans are to finalize the FPGA version, and to use the output to generate tags and search digital libraries. Our current approach clusters on X,Y coordinates and R,G,B color. This allows us to use data directly from an image database, or from a video camera. However, the technique can be applied to a broad range of pixel features, such as the hue, saturation, intensity (H,S,I) domain. We plan to experiment with different features to determine which allow us to generate the best image tags. We will use our approach to generate tags for images in digital libraries by providing a hypothesis for the content of each image. For example, a good deal of green and earth-tone colors throughout the image is a good indication that the image is of a natural outdoor scene. Seascapes can be characterized by large amounts of blue near the top of the image and large amounts of blue-green at the bottom. We plan to use these and other clues to help us automatically categorize the images in a digital database. In addition, these tags will help us answer image queries such as "Find all the portraits in the database". First we translate this into the query "Find all images where at least one third of all pixels are flesh colored", then search the tags of the stored database to find the appropriate images. Similarly, a query such as "Find all images in the database in which most of the bottom half of the image is a textured green region" can find all meadow images in the database. We have discussed classifying pixels on the FPGA board and doing further processing of the images on the host PC. In the future we will look at the best place to run each of our computer vision algorithms to get the best performance out of our setup: an FPGA-based computing system consisting of a parallel FPGA board and a host PC.
Ultimately, we expect this approach to accelerate the task of searching for images in digital databases.
5. CONCLUSIONS
We have presented a spatial and color clustering approach implemented on an Annapolis Microsystems Wildforce board connected to a host PC. Our approach differs from other work in that it processes three banded color images (red, green and blue) and incorporates spatial as well as color information about the input image. The output is an image of classified pixels; it represents a significant compression of the input image, as well as the basis for computing tags for searching digital libraries. Running this approach on an FPGA board significantly speeds up the processing of images compared to running it in C on a host processor.
ACKNOWLEDGMENTS
We would like to thank Annapolis Microsystems, Sun Microsystems, Synopsys and Xilinx Corporations for their support.
REFERENCES
1. M. Flickner et al., "Query by image and video content: The QBIC system," IEEE Computer 28(9), pp. 23-32, 1995.
2. V. Ogle and M. Stonebraker, "Chabot: Retrieval from a relational database of images," IEEE Computer 28(9), pp. 40-48, 1995.
3. A. Gupta and R. Jain, "Visual information retrieval," Communications of the ACM 40, pp. 71-79, May 1997.
4. J. Huang, S. R. Kumar, et al., "Image indexing using color correlograms," in Computer Vision and Pattern Recognition, IEEE, pp. 762-768, June 1997.
5. N. Zeng and J. Crisman, "Evaluation of color categorization for representing vehicle colors," in Conference on Transportation Sensors and Controls: Collision Avoidance, Traffic Management, Proc. SPIE, November 1996.
6. J. Crisman and C. Thorpe, "SCARF: a color vision system that tracks roads and intersections," IEEE Transactions on Robotics and Automation, pp. 49-58, 1993.
7. G. Pass and R. Zabih, "Histogram refinement for content-based image retrieval," in IEEE Workshop on Applications of Computer Vision, pp. 96-102, 1996.
8. P. Athanas and L. Abbott, "Real-time image processing on a custom computing platform," IEEE Computer 28(2), pp. 16-24, 1995.
9. N. K. Ratha and A. K. Jain, "FPGA-based computing in computer vision," in International Workshop on Computer Architecture for Machine Perception, IEEE, pp. 128-137, October 1997.
10. Y. Chung, S. Choi, and V. K. Prasanna, "Parallel object recognition on an FPGA-based configurable computing platform," in International Workshop on Computer Architecture for Machine Perception, IEEE, pp. 143-152, October 1997.
11. R. Duda and P. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, Inc., 1973.
12. J. Crisman and C. Thorpe, "UNSCARF: a color vision system for the detection of unstructured roads," in Conference on Robotics and Automation, Proc. IEEE, pp. 2496-2501, April 1991.
13. G. Doncev, M. Leeser, and S. Tarafdar, "High level synthesis for designing custom computing hardware," in Symposium on Field-Programmable Custom Computing Machines, IEEE, April 1998.
14. P. Soderquist and M. Leeser, "Area and performance tradeoffs in floating-point division and square root implementations," ACM Computing Surveys 28, pp. 518-564, September 1996.