BESSI: an Experimentation System for Vision Module Evaluation

Carlo de Boer

Arnold W.M. Smeulders

Department of Computer Systems, University of Amsterdam
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Tel: ++31 20 525 7515, Fax: ++31 20 525 7490
e-mail: [email protected]

Keywords: computer vision, experimentation, module validation, performance measurement

Abstract

In past years, the complexity of computer vision software systems has grown considerably. As in other software development areas, there is a need for accurate descriptions of system behavior in practical situations. These descriptions are essential for maintainability and successful application of software in the long term. A useful way to specify software behavior is to describe a software system in terms of the performance of its components, so as to assure the survivability of the components for other applications. In this paper, we present an experimentation system for the evaluation of computer vision modules, to support the process of getting a grip on the performance characteristics of a vision module. Such a system will, in our belief, strongly promote the construction of large image processing systems by composition of thoroughly tested modules. In this way, the chances of a software module surviving beyond the current application are much better than in most current practices.

1 Introduction

In past years, the complexity of computer vision software systems has grown considerably. As in other software development areas, there is a need for accurate descriptions of system behavior in practical situations. These descriptions are essential for maintainability and successful application of software in the long term. A useful way to specify software behavior is to describe a software system in terms of the performance of its components, so as to assure the survivability of the components for other applications. In this paper, we present an experimentation system for the evaluation of computer vision modules, to support the process of getting a grip on the performance characteristics of a vision module. Performance has two aspects: computational performance, related to algorithm efficiency and hardware speed, and methodological performance, providing an answer to the question whether the algorithm performs as anticipated. In this paper, we take the second view only. (Sponsored by TopSpin: "Knowledge based conversion of utility maps.")

To carry out an effective evaluation of each module, we demand that it be seen as a detector module, where the outcome is either true or false, or at least a variable of correctness (e.g. a classification attached with a measure for its a posteriori probability). For such modules, hit and miss ratios can be evaluated on a test set of exemplary sample images of which the ground truth is known. With the statistics on the hit and miss ratios, the performance of the algorithm with respect to its tuning parameters can be validated.

In general, there are two ways to describe the performance of a computer vision algorithm. Firstly, the characteristics can be given analytically, by defining the performance in terms of a mathematical description of the algorithm and its stability against perturbation parameters (see [8, 9]). Secondly, the algorithm can be characterized in statistical terms. Although the first method is generally preferred, it is not always achievable in practical situations (see [23]). What is more, even if a mathematical description applies, the second, statistical option is still needed to test the effectiveness in practice. Hence, in both cases, extensive experimentation is needed to validate the formulated hypotheses. Our current system concentrates on statistical evaluation on the basis of sample data with known truth.

As a consequence of the above, it is our observation that a growing portion of current daily work in image processing goes into carrying out the experimentation part of the design-code-experiment loop. With the BESSI system, we aim to support the developer of vision modules during the experimentation phase of the vision module design loop. Tools to help the experimentation process are often built on an ad-hoc basis, for specific testing applications. Also, in most systems for computer vision research, such as Khoros [19], IUE [15] and KBVision [6], support for data analysis and/or data management is integrated. Other symptoms of the growing need for generic tools and standardization of performance measurement are benchmark databases ([10]) and the development of larger scale test environments, such as the Radius testbed [11, 7].

In other areas of computer science, performance measurement systems are more common. In general, these tools concentrate on computational performance. In database research, for example, there are several standard benchmarks [5, 16, 4] to measure time efficiency for different application domains, and generally applicable test environments, such as the Software Testpilot [2], have also been developed. In the area of parallel systems and programming languages, large scale computational performance measurement systems are even more common [1, 20, 17] and much effort is spent on the development of these tools. Another system for carrying out experiments in all computer science research areas is the Desktop Experiment Management system [12].

The context of the presented system is the TopSpin project, which involves the building of a system for the conversion of utility maps from paper to electronic GIS files on the basis of a complete interpretation of the graphic symbols on the map. The vision tasks of this system are centred in the so-called detector-reasoner loop. A detector is an algorithm for the recognition of a specific graphic symbol in the digitized image of the map. The system will contain one detector for each specific recognition task, of which there are 10-50 in a practical system.
For each detection, a detector reports presence or absence of the symbol, augmented with a certainty of observation, to the reasoning module. An example of a detector is the arrow detector, which decides whether an arrow symbol is present in a given map area and with what certainty it is indeed an arrow (and not a mistaken coincidence of two crossing lines). Based on the results of the detectors and domain specific knowledge, the reasoning module decides whether enough evidence has been collected during the detection phase. Further information on the ROCKI project can be found in [21]. A detailed description of the arrow detector is given in [13]. In the TopSpin project, BESSI is particularly useful for experimentation on the detector modules. Apart from measurement of the performance characteristics, these experiments involve optimal parameter tuning and the analysis of mis-detections.

The organization of the paper is as follows. First, our vision on the experimentation process is given, together with the role of the BESSI system during the process. Next, the architecture of the system is explained, followed by an example application of the system. Finally, some concluding remarks are discussed, together with possible future extensions.

2 The Experimentation Process

In general, an experiment in computer vision contains the following steps:

1. Data collection
2. The actual execution of the experiment
3. Interpretation of results

In the preparation phase of the experiment, the researcher will consider a few things, such as: which algorithms are to be tested? What are the interesting parameters to tune? Which perturbations or perturbation models can be expected in practice? Which hypotheses are tested? What kind of test data is to be used: synthetic or real data? An answer to all of the above questions is needed in the BESSI system to set up an experimental run (see section 4 for an example).

In the next step, test data are collected. If the test data are real data, a set of images has to be defined and annotated with truth values. If the experiment is done with synthetic data, the synthetic images are generated and the ground truth can be derived directly from the parameters by which the synthetic images were generated. In both cases, the result of the test data collection consists of sample images with ground truth values attached to them.

After the definition of the test set, the experiment is composed: different modules are put into the BESSI framework, appropriate connections between the modules are established and the variables of the experiment are set. For example, a typical experiment set-up in computer vision consists of a perturbation module that adds noise to the test images, a module containing the algorithm to be tested, and a module that compares the values calculated by the algorithm to the ground truth and produces a measure of correctness for the tested algorithm.

When the experiment has been composed in this way, we get to executing the modules. In a run, the module under test is applied to all test sample images and for each combination of variable tuning and perturbation parameters. Hence, for every point in the parameter space of the experiment, an evaluation value of the performance of the module is generated; a minimal sketch of such a run is given at the end of this section.

In the last step, the generated data are analysed by visualization and/or statistical analysis. The hypotheses are refuted or accepted. Another possibility is that more evidence is needed; in that case, additional experiments have to be performed by iterating through the experiment loop once more.

In the BESSI system, all three steps of the experimentation process are supported. Each step is represented by a specific system component, as will be discussed in the following section.
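To make the structure of such a run concrete, the following sketch drives the perturbation-detection-evaluation sequence over a test set and a grid of parameter values. It is an illustrative, minimal sketch only; the function names (perturb, algorithm_under_test, criterion) and the dictionary-based records are our own assumptions and do not correspond to the actual BESSI interfaces.

```python
from itertools import product

def run_experiment(test_set, perturb, algorithm_under_test, criterion,
                   perturbation_params, tuning_params, criterion_params):
    """Apply the module under test to every test sample for every
    combination of perturbation and tuning parameters (hypothetical sketch)."""
    results = []
    for pert_par, alg_par in product(perturbation_params, tuning_params):
        for sample in test_set:
            # 1. corrupt the input image according to the perturbation model
            noisy_image = perturb(sample["image"], **pert_par)
            # 2. run the algorithm under test with the current tuning parameters
            output = algorithm_under_test(noisy_image, **alg_par)
            # 3. compare the output against the ground truth
            evaluation = criterion(output, sample["truth"], **criterion_params)
            results.append({"pert": pert_par, "alg": alg_par,
                            "sample": sample["id"], "eval": evaluation})
    return results  # one evaluation value per point in the parameter space
```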

3 The System Architecture

3.1 Design Considerations

The BESSI system consists of a set of loosely connected components. Each component performs a specific task, and the communication between the system parts runs via well-defined interfaces. Most of the components are implemented by existing, dedicated software packages, such as a system for computer vision, a database management system, and software for statistics and visualization. The advantages of using existing software are the improved quality and the reduced software development time for the full system. The main disadvantage is that the communication software has to be adapted to the interfaces provided by the packages. But as modern component software technology matures, standard communication protocols are being developed [18, 3] and the latter disadvantage becomes less important. By applying these well-defined, preferably internationally standardized interfaces between the packages, a flexible environment can be built. The software modules are easily replaceable and the dependence on the software used remains limited.

An example of the software used is the database management system. A DBMS is good at storing large amounts of data and carrying out efficient searches on the data. Instead of building a tool to manage the data ourselves, an interface to an existing DBMS was created, according to the ODMG standard. In this paper, we do not discuss the interfaces between the components in detail, but concentrate on the architecture and the way experiments are carried out using BESSI.

3.2 The System Components

The system architecture of BESSI is illustrated in figure 1. The central layer of the system consists of three functional components, each representing a step in the experimentation process, as mentioned in section 2. The successive functional components are described in more detail below.

Figure 1: The general architecture of BESSI. A user interface sits on top of three functional components (BESSI/JOEP for data collection, BESSI/SHEETS for experiment execution and BESSI/ANALYSIS for analysis), which communicate with the DBMS through a database interface.

Data collection: BESSI/JOEP. The data collection component is currently implemented in the JOEP package. It is designed for the extraction and annotation of interesting image regions from large digitized images. In the annotation phase, the user attaches ground truth values to the image, such as a classification and the position of interesting objects. For other application domains and for experiments with synthetic data, other data collection modules will be developed, or existing tools will be integrated in BESSI.

Experiment execution: BESSI/sheets. For the design and execution of experiments in BESSI, a visual-programming-like module, called BESSI/sheets, has been designed. Using the tool, researchers can gradually and interactively construct their experiments from a set of predefined so-called experiment sheets.

A predefined sheet represents the data flow during an experiment at a generic level, meaning that no instantiations of the data are defined. In figure 2, an example of the layout of a generic experiment sheet is given.

Figure 2: An example of a generic experiment sheet. A database query selects a test set of images; each image passes through a perturbation module (with perturbation parameters), the algorithm to test (with algorithm parameters), and a criterion function (with criterion parameters) that compares the computed values with the truth values and produces an evaluation measure; this is repeated N times and the results are stored as a result set in the database.

Initially, the modules in the sheet are empty. By filling in the modules, the researcher adjusts the experiment sheet to his specific needs in a stepwise manner. First, the application specific routines, such as the algorithm to test, the perturbation function and the evaluation function, are added. Next, the parameters of the experiment are set. The value of an experiment parameter can be either a constant or a series of values. The researcher also defines which results should be stored in the database after execution of the experiment. Finally, the test set is defined by a database query (in the DB-Query box). Once the experiment sheet is fully instantiated, the experiment is executed and the generated data are stored in the database. In the following section, the development of an experiment with BESSI/sheets is illustrated by an example; a small sketch of one possible in-code representation of such a sheet is given below.
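To make the notion of a generic sheet and its stepwise instantiation more tangible, the following sketch shows one possible representation of a sheet, in which each parameter is either a constant or a series of values. The class and field names are our own illustrative assumptions and are not part of the actual BESSI/sheets implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Union

# a parameter value is either a single constant or a series of values to sweep over
ParamValue = Union[Any, List[Any]]

@dataclass
class ExperimentSheet:
    """Hypothetical representation of a (partially) instantiated experiment sheet."""
    db_query: str = ""                                 # selects the test set
    perturbation: Optional[Callable] = None            # e.g. a noise generator
    algorithm: Optional[Callable] = None               # the module under test
    criterion: Optional[Callable] = None               # compares output with ground truth
    perturbation_params: Dict[str, ParamValue] = field(default_factory=dict)
    algorithm_params: Dict[str, ParamValue] = field(default_factory=dict)
    criterion_params: Dict[str, ParamValue] = field(default_factory=dict)
    stored_results: List[str] = field(default_factory=list)

    def is_instantiated(self) -> bool:
        # the sheet can only be executed once all modules and the query are filled in
        return all([self.db_query, self.perturbation, self.algorithm, self.criterion])
```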

Data analysis: BESSI/analysis. The analysis of the data is carried out by an external package for statistics and visualization. This component is directly connected to and controlled by the BESSI system. The details of the interface with the external package are not discussed in this paper.

Apart from the functional components in the central system layer, the system consists of three additional internal components:

Graphical user interface. The BESSI system is controlled via a graphical user interface. The graphical user interface is not discussed in this paper.

Database interface: ODMG. The interface with the database is developed according to the ODMG model. The ODMG model is a standard in development for communication with object-oriented databases. For further information, we refer to [3].

Database: Illustra. The database management system we use is Illustra. Illustra is a so-called object-relational database system, specifically aimed at the support of multimedia software systems. It is highly extensible, and with the 'datablade' paradigm, large classes of functionality can be added quickly. Further information on Illustra can be found in [22].

4 An Example Application of BESSI

4.1 Description

The example presented in this section is a fictitious experiment with a realistic detection module [13] and realistic data. It illustrates some of the capabilities of the BESSI system. The module at hand is an arrow detector. It reports presence or absence and location of an arrow in an image digitized from a paper map, and the certainty of the detection. On the input side, the detector receives a grey-valued image, a parameter defining the image resolution, a parameter 'pensize' that defines the thickness of the pen used in the drawing process, and a threshold parameter that determines the amount of evidence needed to classify an image as an image of an arrow.

For the experiment, a test set of 500 samples is used. Each sample consists of a grey-valued image containing no arrow or one arrow, with ground truth values (the presence of an arrow and the location of the arrow in the image). Further, a noise generator is used, corrupting the images on input in order to evaluate the robustness of the module. In the example, we apply a Gaussian noise function with a parameter sigma.

To be able to establish the performance of the detector, a criterion function is needed for evaluating the discrepancy between the truth value and the value as estimated by the module. Such a criterion can be the mean square error for continuous features, or the miss and hit count for modules performing a classification task. The criterion function in the example is defined as follows. On the input side, the function receives the output of the detector, the ground truth and a threshold value on the difference between the detected arrow position and the ground truth position. On the output side, the criterion function gives a measure for the quality of the detection, consisting of an evaluation class (good, mis-detection or false alarm), the certainty of the evaluation measure and a measure for the difference between the calculated and the ground truth position. A mis-detection occurs when the detector does not find an arrow while one is present, or when the detector finds an arrow at a wrong position. A false alarm is defined as the detection of an arrow while there is none. An interesting description of the performance is given by the plot of the mis-detection ratio against the false alarm ratio, for varying signal-to-noise ratios and varying values of the detector's evidence threshold. This method is part of the data analysis methodology proposed in [14].
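The decision logic of the criterion function described above can be illustrated with a small sketch. This is only an illustration; the record layout and the field names (detected, position, present) as well as the parameter max_pos_diff are assumptions of ours, not the actual BESSI or arrow-detector interfaces.

```python
import math

def detector_eval(detector_out, truth, max_pos_diff):
    """Classify one detection as 'good', 'mis-detection' or 'false alarm'
    (hypothetical sketch of the criterion function described in the text)."""
    pos_diff = None
    if detector_out["detected"] and truth["present"]:
        # arrow found where one exists: check whether the position is close enough
        pos_diff = math.dist(detector_out["position"], truth["position"])
        eval_class = "good" if pos_diff <= max_pos_diff else "mis-detection"
    elif truth["present"]:
        eval_class = "mis-detection"      # arrow present but not found
    elif detector_out["detected"]:
        eval_class = "false alarm"        # arrow reported although none is present
    else:
        eval_class = "good"               # correctly reported absence
    return {"class": eval_class,
            "certainty": detector_out.get("certainty"),
            "position_difference": pos_diff}
```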

4.2 The Experiment in BESSI

In figure 3, a screendump of the object extraction tool JOEP is given. With JOEP, interesting parts of a digitized paper map of 10,000 x 6,000 pixels are extracted, annotated and classified. The selected data is converted to the database format and stored.

After the collection and annotation of the 500 test samples, the experiment continues with the instantiation of the experiment sheet. Starting from the generic visual sheet in figure 2, the user can create the sheet in figure 4 interactively. First, the functions (perturbation function, algorithm to test and criterion function) are plugged into the data flow schema, together with the names of the input and output fields. The user also specifies which results should be stored in the database (in the Results box). Next, the values of the various parameters are set. The perturbation parameter sigma has three values (1.5, 2.0 and 2.5), and for the threshold value of the detector 10 values are defined (0.64, 0.68, ..., 1.0). The test set is generated by querying the database for the 500 test samples.
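Expressed in terms of the earlier run_experiment sketch, the instantiation of this example could look roughly as follows. The parameter values (sigma, resolution, pensize, threshold range, position threshold) are taken from the example itself, but the query string, the function objects and the database helper query_database are purely illustrative assumptions.

```python
# Hypothetical instantiation of the arrow-detector experiment of section 4.2.
perturbation_params = [{"avg": 0.0, "sigma": s} for s in (1.5, 2.0, 2.5)]
tuning_params = [{"resolution": 300, "pensize": 0.05,
                  "threshold": round(0.64 + 0.04 * i, 2)} for i in range(10)]
criterion_params = {"max_pos_diff": 0.01}

# test_set = query_database("select ... 500 annotated arrow samples ...")
# results = run_experiment(test_set, gauss_noise, arrow_detector, detector_eval,
#                          perturbation_params, tuning_params, criterion_params)
# len(results) == 500 * 3 * 10 == 15000
```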

Figure 3: The screen layout of the object extraction tool JOEP. The arrow in the centre of the image, with two arrowheads, is annotated with its length, width and position (defined as the centre of the length axis). The image is classified as an arrow with the buttons on the right.

When all variables are set, the experiment is executed. The sequence of perturbation, detection and evaluation is carried out for each sample and each combination of variable input parameters. In total, 15,000 results are generated, converted to the database format and stored.

The last step in the experiment is the analysis of the generated data. In the example, the analysis is illustrated by the mis-detection/false alarm ratio curves for different perturbation levels in figure 5. Typically, the curves in the upper left region of the diagram are caused by low evidence criterion values, while the high values can be found in the lower right part. In this paper, the data and control flow during the analysis phase is not described in detail. The general idea is that the information is retrieved from the database by a query and sent to an external package for visualization. From the data analysis phase, the operator can go back to the previous phases by adjusting the experiment variables or the contents of the test set.
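How such curves could be derived from the stored results is sketched below. In BESSI, this aggregation would be done through a database query and an external statistics package; the in-memory grouping shown here is an assumed stand-in, operating on result records shaped like those in the earlier run_experiment sketch, and the choice of denominator is ours rather than the paper's.

```python
from collections import defaultdict

def misdetection_false_alarm_curves(results):
    """Group results by (sigma, threshold) and compute the mis-detection and
    false alarm percentages for each setting (illustrative aggregation only)."""
    groups = defaultdict(list)
    for r in results:
        key = (r["pert"]["sigma"], r["alg"]["threshold"])
        groups[key].append(r["eval"]["class"])
    curves = {}
    for (sigma, threshold), classes in groups.items():
        n = len(classes)
        curves.setdefault(sigma, []).append({
            "threshold": threshold,
            "% mis-detection": 100.0 * classes.count("mis-detection") / n,
            "% false alarm": 100.0 * classes.count("false alarm") / n,
        })
    # each curve (one per sigma) can now be plotted: % false alarm vs. % mis-detection
    return curves
```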

4.3 Conclusions

The example illustrates that BESSI is a fruitful tool for carrying out experiments in the computer vision application domain. Especially when experiments on data sets of over 1000 images are considered, evaluation of the results without a tool such as BESSI is a tedious and difficult job. Also, mis-classified objects can be retrieved easily from the database, whereas in traditional experimentation a very precise manual administration is required. We believe BESSI is a tool which enhances productivity in the tuning and verification of image processing modules.


Figure 4: An instantiated visual experiment sheet. The test set (500 samples) is selected by a database query; the perturbation module gauss_noise (avg = 0.0, sigma = 1.5, 2.0, 2.5) feeds noisy images to the algorithm under test, arrow_detector (resolution = 300, pensize = 0.05, threshold = 0.64..1.0 in steps of 0.04), whose classification, position and certainty are compared with the true class and position by the criterion function detector_eval (max_pos_diff = 0.01); the stored results (noise image, classification, certainty, position, evaluation class, evaluation certainty, evaluation difference) form a result set of 15,000 elements.

5 Discussion

A useful extension of the BESSI system is the addition of a module for automatic parameter optimization. Such a module connects the results of the evaluation module to the setting of the tuning parameters and tries to find optimal results by applying analytical, statistical or heuristic techniques. This extension will help the process of parameter tuning and provides a tool to explore the parameter space more effectively. Especially in very large scale experiments (with large test sets and many experiment variables), such a module is needed to reduce the number of results in order to keep the amount of data manageable.

With the BESSI system, a useful tool for the measurement of performance characteristics of computer vision algorithms has been presented. Due to its modular design, the system is extensible and applicable to large problem domains in computer vision. Such a system will, in our belief, strongly promote the construction of large image processing systems by composition of thoroughly tested components. In this way, the chances of a software module surviving beyond the current application are much better than in most current practices, where components are generally tightly knitted into the code of an application. In fact, building systems from encapsulated, self-contained components (the design of which we support by BESSI) is along the lines of the OO development process. In this case we refer to OO not as the use of a programming language but as a general design strategy and guideline, gaining ground in many engineering sciences.
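As an illustration of what such an optimization module could do in its simplest form, the sketch below performs a greedy search over the evaluation curves for the tuning parameter that minimizes a combined error. This formulation is entirely our own assumption (the weighting of mis-detections against false alarms and the function name are not taken from the paper); a real module might use analytical or statistical techniques instead.

```python
def select_best_tuning(curves, false_alarm_weight=1.0):
    """Pick, per perturbation level, the threshold with the lowest combined
    mis-detection / false-alarm cost (hypothetical heuristic sketch)."""
    best = {}
    for sigma, points in curves.items():
        best[sigma] = min(
            points,
            key=lambda p: p["% mis-detection"] + false_alarm_weight * p["% false alarm"],
        )
    return best  # e.g. {1.5: {"threshold": 0.8, "% mis-detection": ..., ...}, ...}
```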

Figure 5: The mis-detection/false alarm curves for different perturbation levels (percentage false alarm plotted against percentage mis-detection, one curve for each of sigma = 1.5, 2.0 and 2.5).

References

[1] Vikram S. Adve, John Mellor-Crummey, Mark Anderson, Jhy-chun Wang, and Daniel A. Reed. An integrated compilation and performance analysis environment for data parallel programs. In ACM Conference on Principles of Parallel Programming, 1994. Submitted for publication.

[2] F. Andres, F. Kwakkel, and M.L. Kersten. Calibration of a DBMS using the Software Testpilot. Technical report, Bull, 1993.

[3] Tom Atwood, Joshua Duhl, Guy Ferran, Mary Loomis, and Drew Wade. The Object Database Design Standard: ODMG-93. Morgan Kaufmann, San Mateo, 1994.

[4] D. Bitton and C. Turbyfill. A retrospective on the Wisconsin benchmark. In M. Stonebraker, editor, Readings in Database Systems, 2nd ed. Morgan Kaufmann, San Mateo, CA, 1994.

[5] M. Carey, D.J. DeWitt, and J.F. Naughton. The DEC OO7 benchmark. In Proc. ACM SIGMOD Conf., page 12, Washington, DC, May 1993.

[6] Peter Eggleston. General support tools for algorithm development and scientific research in computer vision. Technical report, Amerinex A.I. Inc., 1992.

[7] D.J. Gerson and S.E. Wood Jr. Radius phase II: the Radius testbed system. In Proceedings, ARPA Image Understanding Workshop (Monterey, CA, November 13-16, 1994), pages 231-237. Morgan Kaufmann, San Francisco, CA, 1994.

[8] R.M. Haralick. Performance characterization in computer vision. Image Understanding, 60:245-249, 264-265, 1994. (With comments by L. Cinque, C. Guerra, and S. Levialdi; J. Weng and T.S. Huang; P. Meer; Y. Shirai; and B.A. Draper and J.R. Beveridge.)

[9] R.M. Haralick. Performance characterization protocol in computer vision. In Proceedings, ARPA Image Understanding Workshop (Monterey, CA, November 13-16, 1994), pages 667-673. Morgan Kaufmann, San Francisco, CA, 1994.

[10] A.J. Heller and J.L. Mundy. Benchmark evaluation of a model-based object recognition system. In Proceedings, Image Understanding Workshop (Pittsburgh, PA, September 11-13, 1990), pages 727-741. Morgan Kaufmann, San Mateo, CA, 1990.

[11] A. Hoogs and B. Knin. The Radius testbed database: issues and design. In Proceedings, ARPA Image Understanding Workshop (Monterey, CA, November 13-16, 1994), pages 269-276. Morgan Kaufmann, San Francisco, CA, 1994.

[12] Y. Ioannidis, M. Livny, E. Haber, R. Miller, O. Tsatalos, and J. Wiener. Desktop experiment management. IEEE Data Engineering Bulletin, 16(1):19-23, March 1993.

[13] Arnold Jonk, Rein van den Boomgaard, and Arnold W.M. Smeulders. An arrow-detector. In preparation, 1995.

[14] T. Kanungo, M.Y. Jasimha, J. Palmer, and R.M. Haralick. A quantitative methodology for analyzing the performance of detection algorithms. In Proceedings, Fourth International Conference on Computer Vision (Berlin, Germany, May 11-14, 1993), pages 247-252. IEEE Computer Society Press, Los Alamitos, CA, 1993.

[15] Kohl and Mundy. The development of the image understanding environment. In Proceedings of the Conference on Computer Vision and Pattern Recognition, 1994.

[16] A.R. Lebeck and D.A. Wood. Cache profiling and the SPEC benchmarks: a case study. IEEE Computer, 27(10):15-26, October 1994.

[17] A. Marconi, M.R. Nazzarelli, S. Sabina, E.N. Houstis, K.N. Pantazopoulos, and M.A. Tsoukarellas. Monitoring and performance analysis tools requirement analysis. Technical report, PEPS, ESPRIT, 1994.

[18] OMG. Common Object Request Broker: Architecture and Specification. Object Management Group, 1995. Available at ftp: omg.org/pub/CORBA.

[19] Rasure and Kubica. The Khoros application development environment. Experimental Environments for Computer Vision and Image Processing, 1994.

[20] Daniel A. Reed, Ruth A. Aydt, Roger J. Noe, Phillip C. Roth, Keith A. Shields, Bradley Schwartz, and Luis F. Tavera. Scalable performance analysis: the Pablo performance analysis environment. In Anthony Skjellum, editor, Proceedings of the Scalable Parallel Libraries Conference. IEEE Computer Society, 1993.

[21] A.W.M. Smeulders and T.K. ten Kate. Systems for paper map interpretation: methods engineering. In preparation, 1995.

[22] M. Stonebraker. The Miro DBMS. In Proc. ACM-SIGMOD International Conference on Management of Data, Washington, D.C., May 1993.

[23] J. Weng and T.S. Huang. Performance characterization in computer vision, reply. Image Understanding, 60:253-256, 1994. (Reply to R.M. Haralick: Performance characterization in computer vision.)

