Data Mining for Generalised Object Recognition

Huub H C Bakker, Rory C Flemmer
School of Engineering and Advanced Technology, Massey University, Palmerston North, New Zealand

Abstract—Object recognition from machine vision is a complex task that can involve comparing image data with hundreds of thousands of templates. Using brightness contours instead of edges, together with the corresponding contour profile diagram, or fingerprint, allows comparisons that are mathematically inexpensive and can be performed efficiently in a database. This paper, the second of two, deals with data mining a library of known objects for matches and with the final confirmation or rejection of an object's identity. The first paper [1] outlines the general problem and the current solution, and provides preliminary results.

Keywords: data mining; machine vision; object recognition; SQL; database

I. INTRODUCTION

Recognising general objects is a very complex process that humans and animals do well but that machines do not yet do well. One approach is to detect edges and use them, together with pixel data, to reconstruct the three-dimensional geometry of objects. This has not been fruitful. Machine vision has nevertheless been industrially successful because it deals with objects whose orientation is constrained, by lying flat on a conveyor or by some other stratagem. The two-dimensional problem is solved relatively easily, but the three-dimensional problem is intransigent. This is exemplified by the bin-picking problem [2]: only one object, in many possible orientations, is considered, yet even so the problem is not generally tractable.

A recently developed approach [3] to this problem matches brightness contours in the image against an object library. In particular, each contour is represented by its second derivative plotted against distance along the contour. This representation allows a convenient and unique interpretation of the data, which makes it particularly amenable to searches in an SQL database. The implementation is discussed in Sections IV and V, while results of early trials are shown in Section VI.

II. FEATURE EXTRACTION

A brightness contour in a monochrome image is a line along which the grey level is constant. For each image, contours are determined at grey levels from 20 to 220 in increments of 10. This produces hundreds of contours, some of them overlying others. The population is reduced by removing contours that are shorter than, and coincident along their full length with, other contours. This leaves 50-100 contours per image. The second derivative of these contours, the contour profile diagram or fingerprint, is then analysed to find the 60-100 most representative and distinctive features.

A. Objects, Views, Features

To describe an object, 20 views of it are acquired, each normal to one face of a regular icosahedron. The faces of an icosahedron are rotated 36 degrees relative to their neighbours, and it has been found possible to interpolate across this interval to obtain intermediate views of the object. Each object therefore has 20 views, and each view has 50-100 features, which are found on its contours.
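The contour-reduction step described at the start of Section II can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each contour has already been traced (for example by a marching-squares routine, not shown) as a list of integer pixel coordinates, and it takes "coincident" to mean that every pixel of the shorter contour also lies on the longer one. The names `LEVELS` and `prune_coincident` are hypothetical.

```python
# Grey levels at which brightness contours are traced: 20, 30, ..., 220.
LEVELS = list(range(20, 221, 10))


def prune_coincident(contours):
    """Remove contours that lie entirely on a longer contour.

    contours: list of contours, each a list of (x, y) integer pixel coordinates.
    Assumption: "coincident" means every pixel of the shorter contour is also a
    pixel of the longer one. Of two identical contours, only the first is kept.
    """
    pixel_sets = [set(c) for c in contours]
    kept = []
    for i, contour in enumerate(contours):
        covered = any(
            (len(pixel_sets[j]) > len(pixel_sets[i]) and pixel_sets[i] <= pixel_sets[j])
            or (j < i and pixel_sets[j] == pixel_sets[i])
            for j in range(len(contours))
            if j != i
        )
        if not covered:
            kept.append(contour)
    return kept
```

The pairwise comparison is quadratic in the number of contours, which should be acceptable for the few hundred contours produced per image.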

Figure 1. Features on a contour.

B. Fingerprints

A useful representation of a contour is a plot of the angle through which the contour turns from pixel to pixel, that is, the second derivative of the contour curve with respect to distance along the contour. Figure 2 shows this plot for the circumscribing contour of the image greyed out in Figure 1. The contour starts at the lower left corner of the mug and runs vertically until it turns the first corner to cross the top of the mug; this turn appears as the first spike (or lobe) in Figure 2. The other features can be related to Figure 1 by inspection. For convenience, we call this abstracted curve the fingerprint of the contour.
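A minimal sketch of a fingerprint and of the lobe discriminators used in Section III follows. It assumes the contour is an ordered list of pixel coordinates and omits any smoothing; raw pixel contours produce staircase turns of ±45° on diagonals, so a real implementation would need one. "Area" is taken as the summed turning angle over the lobe, and the skewness and kurtosis treat the absolute turn as a weight over position along the lobe. These definitions are our assumptions, since the paper does not spell them out, and all function names are hypothetical.

```python
import math


def fingerprint(points):
    """Signed turning angle (radians) at each interior pixel of an open contour.

    points: (x, y) pixel coordinates ordered along the contour.
    Entry i is the change of direction at points[i + 1]; turns that are
    counter-clockwise in a y-up frame are positive.
    """
    turns = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        d = math.atan2(y2 - y1, x2 - x1) - math.atan2(y1 - y0, x1 - x0)
        turns.append((d + math.pi) % (2 * math.pi) - math.pi)  # wrap into [-pi, pi)
    return turns


def lobes(fp, eps=1e-6):
    """Maximal runs [start, end) where the fingerprint departs from zero."""
    runs, start = [], None
    for i, v in enumerate(fp):
        if abs(v) > eps and start is None:
            start = i
        elif abs(v) <= eps and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(fp)))
    return runs


def lobe_moments(fp, start, end):
    """Area, skewness and kurtosis of one lobe.

    Area is the summed turn, which does not change when the contour is scaled.
    Skewness and kurtosis treat |turn| as a weight over position (an assumption).
    """
    seg = fp[start:end]
    area = sum(seg)
    w = [abs(v) for v in seg]
    total = sum(w)
    mean = sum(i * wi for i, wi in enumerate(w)) / total
    var = sum(wi * (i - mean) ** 2 for i, wi in enumerate(w)) / total
    if var == 0:  # a one-pixel lobe has no shape to describe
        return area, 0.0, 0.0
    sd = math.sqrt(var)
    skew = sum(wi * ((i - mean) / sd) ** 3 for i, wi in enumerate(w)) / total
    kurt = sum(wi * ((i - mean) / sd) ** 4 for i, wi in enumerate(w)) / total
    return area, skew, kurt
```

For an L-shaped path that runs up and then turns right, `fingerprint` returns a single lobe of -π/2 at the corner, with zeros along the straight runs (the line features).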

Figure 2. Change of contour angle per pixel vs. contour length.

III. DATA MINING

From the observation that an educated English speaker knows about 60,000 words, of which about a quarter are nouns [4], we can assume that there are about 15,000 kinds of object in the world. (This ignores adjectives such as 'tall' and 'short'.) Each object will have 20 views and each view 50-100 features, so the library must hold about 15 to 30 million features. A database is the natural choice for such a library: database engines are designed to handle large collections of data and have highly optimised search algorithms. Some characteristics of the problem also make the search less daunting. The final search of the database can use three pre-classifiers (area, skewness and kurtosis [1]) to select a relatively small number of objects from a database of candidates. These are described in the following subsections.

A. Features

Several major features can be extracted from the fingerprint. Those used in this paper we call lobes, arcs and lines; they are described below.

The lobes of the fingerprint are sections where the curve makes an excursion from zero for some distance and then returns to zero. They correspond to curves or corners in the image and can be extracted easily and with reasonable precision. The area under a lobe, as well as its skewness and kurtosis, are scale and rotation invariant (within limits imposed by pixelation). Characteristics with these properties can be developed for other possible features, for instance the eccentricity of an ellipse. We refer to such characteristics as discriminators.

The line features in a fingerprint are those sections where the curve stays at zero; they correspond to straight lines in the original image. While it may be difficult to determine the length of a line feature accurately (since it may be truncated), its identification as a line is much more robust. The area, skewness and kurtosis of a line are not useful measures, since they are the same for all lines. However, the orientation and position of a line or lobe can provide useful information.

The arc features in a fingerprint are those sections where the curve maintains a constant, non-zero value. Their centre of rotation and radius are the only useful characteristics.

The measurement of feature characteristics, averaged over a very large set of features, can be used to determine the probability that a feature with a particular combination of characteristics occurs. The inverse of this probability can be used as a weighting: it indicates that a feature is uncommon and should be given more notice when discovered. In a similar manner, the importance of a group of features can be determined from the aggregate of their individual weights.

B. Binning

Mining data from the database relies on choosing a suitable key, or index, that can be used to extract useful candidate records very quickly from tens of millions. Retrieval speed is the main factor, with a secondary requirement that retrieval should not exclude a significant number of valid records. As noted in the previous subsection, a feature can be characterised by discriminators, three of which we work with in this paper: area, skewness and kurtosis. Simply searching the database for records whose discriminator values lie within a suitable interval of the target would be inefficient. A better approach is to sort the values into bins and search for an integer representing membership of a particular bin. A search for candidate features then amounts to looking up matching bin numbers in an index, which can reduce the search time by orders of magnitude.

The choice of bin width is important. Bins that are too narrow will let measurement errors place characteristics in the wrong bin; bins that are too wide will lose discrimination and return too many candidates. The optimum width would appear to be the same as the measurement error. Where the measurement error is a percentage of the reading rather than of full scale, the bin width naturally grows with the size of the discriminator, so that the width is constant on a logarithmic scale. We refer to these as log bins. With log bins there are infinitely many bins between any given value and zero, so one 'catch-all' bin can be included to span the range from 0 to the smallest log bin, with the remaining bins spanning the range up to the largest bin.

There is a further concern, however. Regardless of the size of a bin, a discriminator value close to the bin's edge may be placed in the wrong bin. This can be overcome by considering membership to include not only the given bin but also its nearest neighbour. Provided the bins are no narrower than the measurement error, a search of the database is then guaranteed to turn up all relevant candidates, albeit with reduced discrimination.

Databases are capable of concatenating search indices, which provides a neat way of searching on all three discriminators simultaneously. The disadvantage is that including the nearest-neighbour bin requires eight separate searches, one for each of the 2^3 = 8 combinations of two possible bins (the bin itself and its nearest neighbour) for each of the three discriminators. Owing to the way the search engine works, the eight separate searches are nevertheless completed in far less time than with any more obvious or simpler approach.

This scheme can be extended to handle different kinds of discriminator simultaneously by concatenating the feature type onto the discriminator index. The eight separate searches can then look for, say, lobes that use the combined area/skewness/kurtosis (ask) discriminator as well as ellipses that use eccentricity as a discriminator.
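The log bins, the nearest-neighbour rule and the eight concatenated-key searches described above can be sketched as follows. This is a minimal illustration under stated assumptions: signed discriminators (skewness can be negative) are binned on their magnitude with the sign carried on the bin number, "nearest neighbour" is read as the adjacent bin whose edge the value lies closer to on the log scale, and the defaults `smallest=0.01` and `ratio=1.1` are hypothetical placeholders that should be set from the measured discriminator error.

```python
import itertools
import math


def log_bin(value, smallest=0.01, ratio=1.1):
    """Return (bin, nearest_neighbour_bin) for a signed discriminator value.

    Bin 0 is the catch-all for |value| < smallest. Bin +/-k (k >= 1) covers
    smallest * ratio**(k - 1) <= |value| < smallest * ratio**k, so every bin has
    the same width on a log scale (a constant relative width of ratio - 1).
    """
    sign = -1 if value < 0 else 1
    mag = abs(value)
    if mag < smallest:
        return 0, sign  # neighbour of the catch-all bin on the value's side
    pos = math.log(mag / smallest) / math.log(ratio)  # position in bin widths
    k = int(pos) + 1
    # The neighbour is the adjacent bin whose edge the value lies closer to.
    neighbour = k - 1 if pos - int(pos) < 0.5 else k + 1
    return sign * k, sign * neighbour


def search_keys(feature_type, discriminators, **bin_args):
    """All composite keys to look up: 2**n of them, i.e. 8 for (area, skewness, kurtosis).

    Each key is (feature_type, bin_1, ..., bin_n), mirroring an index that
    concatenates the feature type with the discriminator bins.
    """
    choices = [log_bin(v, **bin_args) for v in discriminators]
    return [(feature_type,) + combo for combo in itertools.product(*choices)]
```

Each of the eight tuples would drive one equality lookup on a composite index, for example over hypothetical columns `(feature_type, area_bin, skew_bin, kurt_bin)`. Equality lookups on an index are what make the search fast; a range query on raw discriminator values could not use the index as effectively.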

Another possibility would be to choose bin widths so that a discriminator is equally likely to fall into any given bin. This would maximise discrimination over the entire range of the characteristic. It conflicts with the idea of making bins the same size as the measurement error but, provided the bins are always wider than the measurement error, it can reduce the number of bins and, therefore, the storage space.

C. Geometry

There are a number of geometric relationships between features in a group that can be used to exclude incorrect objects from the list of candidates. The relationships considered here use the orientation and position of features, but other possibilities include the radius (of arcs) or the order of features in a fingerprint. The simplest geometric relationship is the orientation of corresponding brass and gold features (brass features come from the image under test, gold features from the library). To use it, one must first know the rotation angle between the brass object and the gold object, but this can be estimated in a number of ways, including clustering (see the next section). Two other relationships involve the angles and the distances between features in a group. These require knowledge of the rotation angle and the scale factor, respectively, between the objects but, again, these can be estimated by the clustering or enumeration methods. Distances to line features are harder to handle, since the midpoint of a line is hard to establish. However, given a point on a line and the line's orientation, a unique distance can be calculated as the minimum (perpendicular) distance between a feature and the line.

IV. DATA MINING ALGORITHM

With these ideas, the following algorithm is suggested for returning a small number of possible candidates from a very large library. The process is:

1. Extract features.
2. Find candidate objects in the library.
3. Find candidate views.
4. Calculate the suitability of feature pairings between the views.
5. Winnow bad feature matches by clustering, or add good feature matches by enumeration.
6. Eliminate candidate objects or recognise them.
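Steps 2 and 3 rank candidates by the aggregated inverse-probability weights described in Section III, as detailed in the subsections that follow. The sketch below is a minimal illustration, not the authors' SQL: a Python dictionary stands in for the indexed features table, weights are estimated as n/count over the library's own keys, and the names `feature_weights` and `rank_candidates` are hypothetical.

```python
from collections import Counter, defaultdict


def feature_weights(library_keys):
    """Inverse-probability weight per composite key, estimated from the library itself."""
    counts = Counter(library_keys)
    n = len(library_keys)
    return {key: n / c for key, c in counts.items()}  # 1 / (c / n)


def rank_candidates(brass_keys, gold_index, weights, fraction=0.5, by_view=False):
    """Rank gold objects (or object views) by the aggregated weight of matched features.

    brass_keys: composite keys of the brass (test image) features.
    gold_index: stands in for the indexed features table; maps key -> list of
                (object_id, view_id) for every gold feature with that key.
    fraction:   share of brass features used, highest weight (least common) first.
    """
    ordered = sorted(brass_keys, key=lambda k: weights.get(k, 0.0), reverse=True)
    chosen = ordered[: max(1, int(len(ordered) * fraction))]
    scores = defaultdict(float)
    for key in chosen:
        # A key absent from the library has no weight and no matches.
        for obj, view in gold_index.get(key, []):
            scores[(obj, view) if by_view else obj] += weights.get(key, 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For step 2 the default `fraction=0.5` mirrors the use of the more uncommon half of the brass features; for step 3, with the index restricted to the chosen object, `fraction=1.0, by_view=True` mirrors the use of all brass features to rank views.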

This process is now considered in more detail. Feature extraction is dealt with in other papers [1,3], while finding candidate objects and views is covered here in preliminary form and has been extended in other work [5].

A. Find candidate objects

The first data mining step finds a list of candidate gold objects by searching the library for the features of the brass image. This is done by searching for gold features with the same discriminators as the brass features. Since, in this research, only lobe features have discriminators (ask numbers), they are the only features that can be used at this point. The search is made more efficient by using only the half of the brass features that have the highest weightings (the least common). It therefore involves searching a library of several hundred thousand or more gold features for matches to 25-50 brass features. The resulting list of objects is ranked by the aggregated weighting of all the matched features, so the subsequent steps start with the most likely candidate and work downwards.

B. Find candidate views

Having chosen a candidate object, we next look for the object view that best matches the brass image. This is done by searching for all brass features that have the same discriminator values as those in the gold views. In this case all the brass features are used, not just the half with the highest weightings. The resulting list of views is again ranked by aggregated weight, starting with the most likely candidate view.

C. Calculate Feature Pair Suitability

At this point we have the features of a gold image that we are testing against the features of a brass image. The objects in the two images will very likely be rotated with respect to each other and differ in size. We call the former the rotation angle, θ, and the ratio of sizes the dilation factor, δ. We need to calculate the extent to which a given pair of gold and brass features match between the two images. The simplest comparison is of feature type and feature characteristics; indeed, it is trivial to create a list of gold/brass feature pairs that have the same type and discriminator values. We then pass on to tests of geometry, beginning with feature orientation. The difference in orientation between a brass feature and its gold counterpart should equal the rotation angle, within uncertainty limits. This can be seen in Figure 5, where θ is the difference between the orientation of feature f1 in the gold image and f1' in the brass image, and also between f2 and f2'. The orientation of the features is shown by the arrows. The angle between features can then be tested, i.e. the angle of the line joining a given pair of features in the image. For each connecting line, the difference in angle between the two images should again be θ, the rotation angle.

Figure 5. Relationship between features in gold and brass images (panels: Gold, with features f1 and f2; Brass, with features f1' and f2').

In a similar fashion, we calculate the ratio of the distance between any two features in the brass image to the distance between the corresponding features in the gold image. This ratio should equal the dilation factor, δ.
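The three geometric tests for two gold/brass feature pairs (the relationships shown in Figure 5) can be sketched as below. This is a minimal illustration under stated assumptions: each feature carries a position and an orientation in radians, θ is taken as brass orientation minus gold orientation, δ as brass distance over gold distance, and the tolerances `ang_tol` and `ratio_tol` are hypothetical placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class Feature:
    x: float
    y: float
    orientation: float  # radians


def wrap(a):
    """Wrap an angle into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def pair_pair_test(g1, b1, g2, b2, theta, delta, ang_tol=0.1, ratio_tol=0.1):
    """Check gold/brass pairs (g1, b1) and (g2, b2) against rotation theta and dilation delta."""
    # 1. Each brass feature should be rotated by theta relative to its gold partner.
    for g, b in ((g1, b1), (g2, b2)):
        if abs(wrap(b.orientation - g.orientation - theta)) > ang_tol:
            return False
    # 2. The line joining the two features should also be rotated by theta.
    gold_angle = math.atan2(g2.y - g1.y, g2.x - g1.x)
    brass_angle = math.atan2(b2.y - b1.y, b2.x - b1.x)
    if abs(wrap(brass_angle - gold_angle - theta)) > ang_tol:
        return False
    # 3. The ratio of inter-feature distances should equal the dilation factor.
    gold_dist = math.hypot(g2.x - g1.x, g2.y - g1.y)
    brass_dist = math.hypot(b2.x - b1.x, b2.y - b1.y)
    return gold_dist > 0 and abs(brass_dist / gold_dist / delta - 1) <= ratio_tol
```

Because all three tests compare differences or ratios, translation of the object between the images has no effect, which is why only θ and δ need to be estimated.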

The features used in these tests require an orientation and a full position. Other feature types, such as lines and arcs, can also be used. The perpendicular distance from a line to other features can be used as a test, with the gold/brass line pairs initially chosen by their orientation. Arcs do not have an orientation, but their radius can be used as an initial filter, and the angle and distance to other features can then be tested.

Having calculated our measure of geometrical similarity for a given gold/brass pair of features, the task becomes to eliminate the mismatched feature pairs. We have tried two methods of achieving this: clustering and enumeration.

1) Clustering

Consider the rotation angle, θ, as the difference in orientation between the brass and gold features of each pair. If the objects are the same, this angle should be over-represented among the pairings of gold and brass features. If so, we can both find this angle and eliminate bad pairings by clustering these differences. That is, the average of the orientation differences is found and taken as the estimate of θ. The deviation of each difference from this average is computed, and the feature pair with the largest deviation is removed. The process is repeated until the largest deviation falls within acceptable limits or we run out of feature pairs.

This step can be taken further by examining the line features in both views. Because lines have a constant direction, their orientation is far more precise than that of other features. All line-to-line feature matches are examined, and those whose orientation difference is sufficiently close to θ are used. These are then clustered in the same way to give a better estimate of the rotation angle, θ. The same type of clustering can then be applied to the dilation factor, using the ratio of inter-feature distances between the brass image and the gold image.
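A minimal sketch of the clustering loop for θ follows. One substitution is made and should be noted: it uses the circular mean (the direction of the summed unit vectors) rather than a plain arithmetic average, so that differences on either side of ±180° are averaged sensibly. The tolerance `tol` and the function name are hypothetical.

```python
import math


def wrap(a):
    """Wrap an angle into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def cluster_rotation(diffs, tol):
    """Estimate theta from per-pair orientation differences (radians).

    Repeatedly average the remaining differences, find the pair deviating most
    from the average and drop it, until every deviation is within tol.
    Returns (theta, kept_indices), or (None, []) if all pairs are rejected.
    """
    kept = list(range(len(diffs)))
    while kept:
        s = sum(math.sin(diffs[i]) for i in kept)
        c = sum(math.cos(diffs[i]) for i in kept)
        theta = math.atan2(s, c)  # circular mean of the remaining differences
        worst = max(kept, key=lambda i: abs(wrap(diffs[i] - theta)))
        if abs(wrap(diffs[worst] - theta)) <= tol:
            return theta, kept
        kept.remove(worst)
    return None, []
```

An analogous loop with an ordinary mean, applied to the logarithms of the distance ratios, would cluster the dilation factor, δ.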
2) Enumeration of combinations

A simplistic, and possibly ill-advised, method of testing the matched feature pairs is to enumerate the possible combinations of feature pairs, looking for combinations that pass the geometry tests. For 75 feature pairs between the images there are of the order of 75^75 possible combinations, a number not to be seriously considered. However, we do not need to test all possible combinations. We can begin by testing pair-pair combinations. If we find a pair-pair combination that passes the geometry tests, we can then look for a third feature pair that matches the other two, and so on. If there are no true matches in the list of feature pairs, then after applying the geometry tests to all possible pair-pair combinations there will have been 75^2 = 5,625 failed tests, a far more tractable number. Furthermore, since the geometry tests are all symmetric, we need only test half of these possibilities. Finally, the order in which the pairs are tested is not important: if a set of feature pairs abc is a successful combination, then so are acb and cba. This enormously reduces the number of combinations to be searched.
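The enumeration with these reductions can be sketched as follows: each unordered pair-pair combination is tested once (the tests are symmetric), and a passing combination is grown by adding further compatible pairs rather than by permuting them. This is an illustration, not the authors' code. The `compatible` callback stands for the geometry tests and is hypothetical; the default budget of three times the square of the number of feature pairs follows the limit reported in the next paragraph and is checked only before each new seed. The sketch keeps the largest group found rather than selecting the seed whose θ and δ lie closest to the average, which is a simplification.

```python
from itertools import combinations


def enumerate_matches(pairs, compatible, max_tests=None):
    """Return (largest mutually compatible group of pairs, number of tests used)."""
    n = len(pairs)
    budget = 3 * n * n if max_tests is None else max_tests
    used = 0

    def test(a, b):
        nonlocal used
        used += 1
        return compatible(pairs[a], pairs[b])

    best = []
    for i, j in combinations(range(n), 2):  # each unordered pair-pair combination once
        if used >= budget:
            break
        if not test(i, j):
            continue
        group = [i, j]
        for k in range(n):
            # Grow the set; order is irrelevant, since abc, acb and cba are the same set.
            if k not in group and all(test(k, m) for m in group):
                group.append(k)
        if len(group) > len(best):
            best = group
    return [pairs[i] for i in best], used
```

When no pairs are compatible, the loop performs exactly n(n-1)/2 tests, the "half of 75^2" figure discussed above.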

There is one exception to this order independence: the choice of the first pair-pair combination. This combination is used to determine the initial estimates of the rotation angle and the scale factor. There will be one combination whose rotation angle and scale factor lie closest to the average over all the true matches. Since the geometry tests are passed or failed against a tolerance, we need to find this optimal combination, a condition that requires us to test all possible pair-pair combinations. Of course, as the number of successfully tested feature pairs increases, so does the number of combinations to be tested but, beyond a certain level, the chance that the brass and gold images do not match becomes insignificant. To that extent, a failure to enumerate all the possibilities within a given limit is, in itself, an indicator of a successful match. In any event, early testing has suggested that limiting the number of tests to three times the square of the number of feature pairs is quite sufficient. All the feature pairs that remain have passed scrutiny, and their aggregated weight will be the most robust indicator of whether the objects in the two images are the same.

V. EXPERIMENTAL

A. Implementation

The data mining was implemented in MySQL (available from http://dev.mysql.com) because of its open-source nature and cross-platform compatibility. Subsequent processing was performed in C# using Microsoft Visual Studio 2005. The MySQL Connector/Net 1.0.7 connector was used as the driver between C# and MySQL (available from http://dev.mysql.com).

B. Large database search

The major bottleneck in the process is the speed of the initial database search. This speed was tested by filling the gold database with up to three million features, with randomly assigned characteristics, and performing searches. A brass image with 81 features was used.
These 81 features were then searched for in the gold database using the ask search key, running all eight required searches sequentially. The time taken was reported by the MySQL client program but, unfortunately, only to a precision of one hundredth of a second and only for each search individually. Speed tests were performed with the major database table (the features table) on the computer's hard drive and with this table held entirely in physical memory (using the MySQL MEMORY table type). Both sets of tests inserted the search results into another table, as would occur in practice. Tests were carried out on a 1.8 GHz Power Mac with 2 GB of memory running MySQL version 4.1. The individual search times were summed to give the values reported in the Results section.

C. Object recognition

Experiments were carried out to determine how accurately the feature extraction and matching system identifies objects. Instead of creating a large library of objects to search through, the task was reversed: one object was loaded into the library and 100 general images were tested for matches with the library object.

Two separate objects (Figure 6) were chosen for testing; each was loaded into the library separately and tested against the 100 images (which included the gold images themselves).

Figure 6. Two objects used for testing: mug and measuring tape.

The 100 brass images were taken mainly in an office environment and included the two library objects in a number of images at various scales, rotations and degrees of occlusion. Some examples are shown in Figure 7.

Figure 7. Example brass images.

Tests were carried out on a 20-inch Intel iMac with a 2 GHz Core Duo CPU and 1.5 GB of memory. The MySQL database was version 5.0.22 for Mac OS X (i686). The C# program, Clementine, was run under Windows XP SP2 on the same machine using Parallels Desktop for Mac. The two programs communicated via the TCP/IP stacks of the two operating systems.

VI. RESULTS

A. Speed tests

Results of the speed tests are shown in Table 3.

TABLE 3. TIMES FOR MATCHING 81 FEATURES IN VARIOUS-SIZED GOLD LIBRARIES.

Number of gold features | Disc-based table | Memory table
1 million | 380±40 ms | 40±40 ms
2 million | 560±40 ms | 80±40 ms
3 million | 720±40 ms | 100±40 ms

B. Object recognition

The results of the object recognition experiments are summarised in Tables 4 and 5.

TABLE 4. OBJECT RECOGNITION RESULTS.

Library object | Images | Recognised | False positives | False negatives
Mug | 11 | 11 | 0 | 0
Mug (occluded) | 23 | 13 | 0 | 10
Tape | 30 | 21 | 0 | 9
Tape (occluded) | 11 | 4 | 0 | 7

TABLE 5. TIMES TO PROCESS 100 IMAGES.

Library object | Total time (s) | Average time/image (s)
Mug | 64 | 0.64
Measuring tape | 61 | 0.61

VII. DISCUSSION

The times taken for the initial database search (Table 3) are pleasingly small, as is the linear increase with library size. The use of randomly assigned ask values will have caused more features to be returned than would normally be expected, since in practice only the highest-weight, and therefore least common, ask values would be chosen for the search. These results therefore overestimate the true search time.

The memory set aside for the database table (using the max_heap_table_size option) was found to be about 32 MB per million features. Extrapolating the memory-table search time (about 40 ms per million features) and this memory figure to a table of 15-30 million features (15,000 objects) gives a search time of 0.6-1.2 s and a memory requirement of 480-960 MB. The first of these numbers will decrease over time as memory and processor speeds improve, but the second will remain constant.

The object recognition results are reported elsewhere [1] but are included here for completeness. The time taken to load and process each of the 100 images is about 0.6 s, of which about 0.2 s is due to loading the image from disk and displaying it. Therefore the total time to find and display an object from a library of 15,000 objects would be about 1.2-1.8 s.

The results, while still preliminary, demonstrate that this method of data mining for generalised object recognition is very promising. Further improvements are possible; in particular, another method for implementing the first database search is considered in another paper [5].
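The extrapolation in the discussion is simple arithmetic and can be checked directly. The sketch assumes, as the text does, that search time and memory scale linearly with the number of features, using the 1-million-feature memory-table time from Table 3 (40 ms), 32 MB per million features, and about 0.6 s per image for loading and processing (Table 5). The function and constant names are hypothetical.

```python
MS_PER_MILLION = 40    # memory-table search time for 1 million features (Table 3)
MB_PER_MILLION = 32    # max_heap_table_size needed per million features
PROCESS_S = 0.6        # average load-and-process time per image (Table 5)


def extrapolate(objects, views=20, features_per_view=(50, 100)):
    """Return (millions of features, search s, memory MB, total s) as (low, high) pairs."""
    lo, hi = (objects * views * f / 1e6 for f in features_per_view)
    search = (lo * MS_PER_MILLION / 1000, hi * MS_PER_MILLION / 1000)
    memory = (lo * MB_PER_MILLION, hi * MB_PER_MILLION)
    total = (PROCESS_S + search[0], PROCESS_S + search[1])
    return (lo, hi), search, memory, total
```

Calling `extrapolate(15000)` reproduces the quoted figures: 15-30 million features, 0.6-1.2 s of search, 480-960 MB of memory and about 1.2-1.8 s in total.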

VIII. CONCLUSIONS

A process was described that uses the second derivative of image contours as 'fingerprints' and derives features from them for object recognition. A method for data mining a large library of these features was presented and tested for both speed and efficacy. A library of 15,000 objects would require between 1.2 and 1.8 s to search if the table indexes were kept in memory. Recognition accuracy was between 70 and 100% for unobstructed objects, falling to between 40 and 60% when the objects were partially occluded.

IX. REFERENCES

[1] R.C. Flemmer and H.H.C. Bakker, "Generalised Object Recognition," submitted to ICARA 2009.
[2] H. Saldner, "PalletPicker-3D, the solution for picking of randomly placed parts," Assembly Automation, vol. 23, no. 1, pp. 29-31, 2003.
[3] R.C. Flemmer and H.H.C. Bakker, "Sensing Objects for Artificial Intelligence," ICARA 2005, pp. 687-690, November 2005.
[4] B. Bryson, "The Mother Tongue: English and How It Got That Way," William Morrow and Co., New York, 1990.
[5] J.W. Howarth, R.C. Flemmer and H.H.C. Bakker, "Feature-based Object Recognition," submitted to ICARA 2009.