Data Mining for Automated Visual Inspection
M. Bariani, R. Cucchiara, P. Mello, M. Piccardi
Dipartimento di Ingegneria, University of Ferrara
E-mail: {RCucchiara, MBariani, PMello, MPiccardi}@ing.unife.it
Abstract. This paper addresses an automatic knowledge discovery process applied to a database of images in the context of Automated Visual Inspection (AVI). AVI is the field of computer vision addressing quality inspection of industrial products, even under informal quality models. When modelling informal knowledge, one of the most critical points turns out to be the correct and efficient translation of human experience into a set of rules. The paper focuses on the use of machine learning in the inspection of industrial workpieces. It shows how machine learning can be exploited for data mining purposes, and more specifically for selecting a minimal set of visual primitives, in order to perform reliable and robust classification of the inspected components. Finally, the industrial application and the inspection system are presented in detail.
1. Introduction
Data mining consists of extracting meaningful, previously unknown information from large databases, which can be highly profitable for business applications [1]. It is normally considered part of a larger process, namely knowledge discovery in databases (KDD). Besides data mining, KDD involves a preliminary process of data selection and compression from large warehouses, and several data transformation tasks intended to provide a more suitable description of the selected data. Since KDD has been defined as the process of extracting high-level knowledge from low-level data, after the data mining step KDD aims at consolidating the discovered knowledge by incorporating it in an on-line automatic system, or by providing a friendly visualization for humans [2]. The basic aim is to substitute or, more realistically, support human decisions which require significant experience with large databases, by means of automatic tools able to encode the human expertise. KDD and data mining are currently exploited in many different fields, such as finance, manufacturing, diagnosis or monitoring. An interesting and novel application domain is automated visual inspection and, more specifically, defect diagnosis and quality inspection of industrial products. Visual inspection is normally carried out by human inspectors, who are able to perform the visual process and reach intuitive, experience-based decisions. Conversely, human inspectors are far less capable of inferring and formalizing the rules which they use to support those decisions. A typical example is presented in this paper: it addresses quality inspection of industrial products with the aim of finding surface defects visible only under UV light. Observing the images of the pieces under UV light, only unstructured shapes are perceivable by human (or artificial) eyes.
In this environment human inspectors are very efficient at distinguishing unstructured fluorescent shapes corresponding to possible defects from other similar ones due exclusively to "noise", since they can exploit their large database of experience.
The scope of our work was to develop a computer-based system that could provide equally reliable classification decisions, on the basis of rules previously inferred from examples, under real-time constraints. The paper describes the application framework and, in particular, the real-time requirements of the automatic quality inspection process, together with the adopted computer vision approach. However, the main purpose of this paper is to show how the quality inspection process may be significantly improved by performing a typical data mining operation, i.e. supervised learning from the available image database of inspected workpieces. The paper is organized as follows: the next section introduces some general concepts on automated visual inspection and describes our proposal for the adoption of data mining techniques. Section 3 introduces the specific application with reference to the real-time industrial context. Section 4 shows the use of automatic learning techniques, providing a further comparative analysis of the available tools. Finally, the last section draws conclusions on results and performance.
2. Automated Visual Inspection
Automated Visual Inspection (AVI) is one of the most explored application fields of computer vision, owing to its importance in the automation of industrial production lines. Since the discipline is essentially of industrial interest, AVI systems are subject to typical industrial requirements such as cost and space restrictions. Approaches used in AVI tasks are often classified as quantitative or qualitative [3]: quantitative inspection has the goal of extracting, from images, measures of specific, well-known object features such as areas or perimeters; qualitative inspection, instead, relies on a decision process based on a set of visual information generally learned by human experts. In the latter case this information is not quantitatively well defined, depends on the human training, and is generally not stable under context changes. This second class implies much more complex tasks, but is very important since it embraces a very large set of applications, ranging from integrity inspection, defect detection and product classification up to the guidance of mobile robots. Automated inspection is a typical problem of model-based or knowledge-based vision [4, 5]. It aims at matching information extracted from sensed images with a-priori models of the target; however, in many industrial contexts the models of the objects or shapes to be recognized are only qualitatively defined by the human inspectors. We can describe the computer vision process in the AVI context by means of several steps:
1. Defining a reliable, stable and minimal set of features able to describe the model, and correspondingly eliciting the visual primitives useful for representing the defined features.
2. Devising and implementing computer vision techniques suitable for computing the visual primitives with sufficient accuracy while satisfying the application's (real-)time requirements.
3. Implementing the decision-making process by adopting intelligent techniques for managing symbolic and numeric data.
These three phases are traditionally developed on the basis of human experience, with the support of computer vision and artificial intelligence scientists. Nonetheless, while many stable techniques and tools are available for points 2 and 3, and more specifically for extracting features from images and classifying them, the most unpredictable and critical point is often the first one, since it must encode the human experience using heuristics or uncertain models.
For this reason the automated visual inspection process may be substantially improved by adopting techniques such as automatic learning, rule induction and predictive modelling, defined in a data mining context.

Fig. 1. Using data mining in automated visual inspection.

To this aim, we propose an approach that can be described as in Fig. 1: in the left part of the figure, the AVI process is schematized by indicating the three previously discussed phases. In the right part of Fig. 1, we indicate the adopted data mining process: starting from a vast database of images, a large and often redundant set of features is extracted by means of specific computer vision techniques and represented by tuples of values. Secondly, a possibly complete training set is extracted from the image database and is pre-classified. The tuples of the training set are used as input to a process of supervised induction which aims at providing automatic data classification. This goal is typically associated with data mining, and in particular with "discovery-driven data mining operations" [1]. This category includes several techniques, such as neural networks or symbolic learning algorithms [6]. We adopted supervised learning with the following purposes:
• It may induce the rules that will be used in the classification task of quality inspection. In particular, it selects suitable thresholds on feature values, able to discriminate the presence of defects. The rules also include a certain degree of "fuzziness" in order to incorporate the uncertainty of the model. The advantage of the symbolic approach is that its rules can be directly understood and validated by experts.
• It operates a sort of knowledge deeming and filtering, since this process may reduce the set of required visual features by pruning those which are not mandatory for the classification. This result can be directly exploited in the a-posteriori selection of the visual primitives which must be extracted from images by computer vision operators: it therefore makes the quality inspection process more reliable, stable and faster.
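The threshold-selection side of supervised induction can be illustrated with a minimal sketch: given pre-classified tuples, find the crisp threshold on a single feature that best separates the two classes. All feature values below are illustrative, not taken from the paper's data.

```python
# Minimal sketch of the supervised-induction step: learning a crisp
# threshold on one visual feature from pre-classified tuples.
# The sample values are illustrative placeholders.

def best_threshold(samples):
    """samples: list of (feature_value, label), label 'Defect'/'NoDefect'.
    Returns the threshold that minimises misclassifications when
    predicting 'Defect' for values strictly above it."""
    candidates = sorted(v for v, _ in samples)
    best_t, best_err = None, len(samples) + 1
    for t in candidates:
        err = sum((v > t) != (lab == "Defect") for v, lab in samples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

training = [(5.2, "NoDefect"), (7.9, "NoDefect"), (18.4, "Defect"),
            (21.0, "Defect"), (9.1, "NoDefect"), (25.6, "Defect")]
t, errors = best_threshold(training)
print(t, errors)  # 9.1 0: every 'Defect' sample lies above the threshold
```

Real induction tools such as C4.5 apply essentially this test recursively, over all features, to grow a decision tree.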
3. The application of quality inspection
The application goal is the visual quality inspection of metallic industrial workpieces and, in particular, the location of surface and subsurface discontinuities in ferromagnetic materials. This target cannot be reached by normal, visible-light inspection; it is usually accomplished by adopting the "Magnetic-Particle Inspection" (MPI) technique [7]. First, the piece is magnetized and dipped in a water suspension of fluorescent ferromagnetic particles; then, it is exposed under ultraviolet light and examined by a human inspector. When surface or subsurface defects are present, they produce a leakage field that attracts and concentrates the ferromagnetic particles. Defects can then be easily perceived by the human eye, since ultraviolet light greatly enhances fluorescence. The AVI system must classify workpieces by solving problems at two levels: first, the system must focus on the particulars of interest by filtering out non-relevant image aspects, such as the contours of the inspected component, luminance reflectance, and other objects possibly present in the scene. Secondly, defects must be discriminated from other similar unstructured shapes due to noise, surface roughness, friction or the presence of spurious magnetic fields. Hereafter we address the former item by describing the acquisition subsystem; the latter will be described in section 4. When approaching automated visual inspection, the acquisition subsystem must be endowed with similar or better sensitivity than that of the human eye. In fact, modern CCD sensors are sensitive to a broader luminance band, ranging from near infrared to ultraviolet.
In the observed scene, three main luminance bands are present: high-intensity reflections of ultraviolet light generated by the luminance source; a substantial amount of infrared luminance generated both by the high-temperature ultraviolet lamps and by heat sources present in the industrial environment (such as engines and pumps); and, finally, the desired fluorescence, belonging to the band perceivable also by the human eye. Therefore, accurate optical filtering has been used to select the band of interest. In order to assess the efficacy of the filtering, Fig. 2 shows two images acquired from a piece exposed under ultraviolet light: in Fig. 2.a no optical filter is used, and the intensity of the defect is overwhelmed by the undesired luminance due to spurious radiation; in Fig. 2.b, the defect evidence is strongly enhanced by eliminating the undesired luminance frequencies.
Fig. 2. Defect image without (a) and with (b) optical filtering.
Sets of lamps have been installed in order to uniformly distribute luminance on the workpiece, as shown in Fig. 3. Due to the low intensity of fluorescence, highly sensitive sensors are required: the experiments carried out have assessed that off-the-shelf CCD cameras and frame grabbers hosted by commercial PCs offer sufficient sensitivity and resolution, without requiring the high costs associated with dedicated acquisition subsystems.

Fig. 3. The image acquisition subsystem (ultraviolet sources with reflectors, optical filters, CCD camera, fluorescent object).

As the piece cannot be inspected under a single viewpoint, several poses must be planned as a function of the specific component types, which vary in shape and dimensions. Each pose must offer sufficient resolution to reveal even minimum-size defects, without introducing significant geometric distortion. As an opposing requirement, the number of views must be limited in order not to increase the computational load heavily. The inspection process must be carried out in real time in order to satisfy the production throughput requirements. The inspection time per piece includes two main contributions: the time required to compute the visual primitives defined at point 2 of the previous section, and the time to classify the piece as defective or non-defective on the basis of the computed values, according to point 3. The former task is computationally expensive at run time because of pixel-based operations, while the latter introduces negligible overhead.
Fig. 4. The Automated Visual Inspection system (magnetic field, fluorescent-particle suspension, U.V.-light visual inspection, camera and frame grabber, robot for OK/REJECT handling).
Therefore the goals of the visual system are:
• minimizing the time requested for executing the complete classification of an image: this time is affected by the choice of which visual operators to implement;
• performing a reliable classification with respect to the on-line inspection requirements (for instance the maximum percentage of rejections, or the acceptable percentage of errors for both positive and negative misclassifications¹);
• performing real-time inspection with low-cost, commercial equipment.
A description of the process is given in Fig. 4.
4. Classification by visual primitives
The defect shape was known a priori by means of a qualitative model provided by the human inspectors. It was defined as a thin, roughly rectilinear and very bright shape. On the basis of this rather generic model, we devised a set of visual primitives that can be used for describing the defect properties. In particular, from a computer vision point of view, a crack can be defined as: bright, i.e. with a high local gradient of luminosity in the proximity of its edges; rectilinear, i.e. with two main edges approximately straight; and thin, i.e. with a small distance between the two main edges. The adopted approach is based on the Gradient-weighted Hough Transform, with some suitable modifications [8]. The Hough Transform is a space transformation from the pixel image space to a parametric space: "collinear" points of the image space are collected into a single point of the parametric space, where each point is identified by two coordinates, indicating the slope of the possible straight line and its distance from the origin. Each point is assigned a value proportional to the number of collinear points and their luminosity gradient. Therefore, peaks in the Hough space reveal the existence of straight lines in the image space. A possible crack with two edges having similar luminosity variation (but with opposite orientation) is represented by two peaks in a pre-determined reciprocal position. For more references on the computer vision process see [9, 10]. A first prototype was based on the extraction of those features; the features were classified by means of empirically defined fuzzy rules. Fuzzy rules are able to represent a certain degree of uncertainty, or "fuzziness". This is normally expressed in fuzzy classifiers by adopting fuzzy thresholds instead of crisp, on-off thresholds on the variables representing the objects to classify [11].
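The voting scheme just described can be sketched in a few lines: each edge pixel votes, with a weight proportional to its gradient magnitude, for all lines rho = x·cos(theta) + y·sin(theta) passing through it. The grid sizes and edge points below are illustrative; the paper's modified transform [8] is more elaborate.

```python
import math

# Sketch of a gradient-weighted Hough accumulation (illustrative
# accumulator sizes; not the paper's exact implementation).
def hough_accumulate(points, n_theta=180, max_rho=100):
    """points: list of (x, y, gradient_magnitude) edge pixels.
    Each point votes for every line rho = x*cos(theta) + y*sin(theta)
    through it, with a vote proportional to its gradient."""
    acc = [[0.0] * (2 * max_rho) for _ in range(n_theta)]
    for x, y, g in points:
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[i][rho + max_rho] += g  # gradient-weighted vote
    return acc

# Collinear points on the horizontal line y = 5 pile all their votes
# into the cell theta = 90 degrees, rho = 5.
edge = [(x, 5, 1.0) for x in range(20)]
acc = hough_accumulate(edge)
print(acc[90][105])  # 20.0: all 20 votes collected in one peak cell
```

A peak of this accumulator corresponds to the First_Hough_Peak primitive described in section 5.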
The fuzzy rules implemented in this inspection process were based on physical and empirical considerations; they are reported in Table 1. The meaning of the words low, high and enough high in Table 1 suggests the use of adequate, empirically tuned thresholds. It should be noted that the rules do not only indicate the "obvious" classifications, corresponding to highly contrasted shapes with an associated high rectilinearity. The fourth rule, in fact, also includes the case of short cracks, which manifest a rather high (but not very high) Hough value and a correspondingly enough-high gradient value, while the last rule indicates that the previous (enough-high) Hough values, together with a "too high" image luminosity variation, must state the presence of a noisy image rather than a possible defect. Explanatory examples are shown in Fig. 5.
¹ A positive misclassification is a non-defective object classified as defective, while a negative error corresponds to validating a defective piece.
Table 1. Fuzzy rules for classification.
(…) → NoDefect
(…) → NoDefect
(… AND …) → Defect
(… AND <Hough_Value enough high>) → Defect
(… AND <Hough_Value high>) → NoDefect
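The fuzzy thresholds used in rules like those of Table 1 can be sketched as piecewise-linear membership functions combined with a fuzzy AND (min). The numeric bounds below are illustrative, not the empirically tuned ones from the prototype.

```python
# Sketch of a fuzzy threshold: instead of a crisp on/off test, the
# membership degree rises linearly between a lower and an upper bound.
# The bounds (20/60, 10/30) are illustrative placeholders.

def fuzzy_high(value, low, high):
    """Degree (0..1) to which `value` counts as 'high'."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)

# A rule like "(Hough_Value enough high) AND (Gradient high) -> Defect"
# combines memberships with min(), the usual fuzzy AND.
def defect_degree(hough_value, gradient):
    return min(fuzzy_high(hough_value, 20.0, 60.0),
               fuzzy_high(gradient, 10.0, 30.0))

print(defect_degree(60.0, 30.0))  # 1.0: clear defect evidence
print(defect_degree(40.0, 20.0))  # 0.5: borderline evidence
```

A crisp classifier would give the same answer for both samples; the fuzzy degree preserves how borderline the second one is.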
Fig. 5. A long crack, a short crack and a noisy image without cracks.
This solution achieved sufficiently good classification results but also manifested some limitations: firstly, a lack of generality, since the thresholds were tuned on the basis of very few manually inspected samples. Moreover, since a systematic manual pre-classification is very hard and costly, no validation of the classification process was carried out, for reasons of time, regarding the correctness of the fuzzy thresholds, the selected rules, and the features used within the rules. For all these reasons we decided to address the data mining process.
5. Data mining on workpiece images
The adopted approach can be viewed as a typical task of knowledge discovery on an image database, aiming at providing a robust and efficient classifier. According to the scheme of Fig. 1, it is composed of two parts: the former consists of selecting a possibly redundant set of features that could be taken into account for the considered crack model; the latter consists of performing a process of discovery-driven data mining, and in particular of supervised learning, able to achieve a valid classification of a selected set of images.

5.1 Visual primitives
The considered features are extracted according to many well-known computer vision approaches, together with specifically implemented algorithms more suitable to the target visual inspection. They can be summarized as follows:
1. First_Hough_Peak: computed as the highest peak in the Hough space after a transformation performed between 0 and π; it indicates the "rectilinearity" of the first edge, supposing a crack brighter than the background (and thus with a rising luminance variation). It measures the number of collinear points together with their average gradient.
2. Second_Hough_Peak: the peak in the Hough space between π and 2π at the ideal point where a second straight edge should be found, having supposed the existence of an ideally straight defect.
3. Second_Hough_Average: an average of the Hough values in a suitable neighbourhood of the point identified at point 2. It accounts for possible non-ideality both in straightness and in thickness.
4. Correlated_Hough_Peak: a specifically developed visual primitive computed by means of correlation in the Hough space [9]. The value measures the concurrent presence of two parallel edges of approximately the same length and luminosity gradient, with a mutual distance comprised between a minimum and a maximum value.
5. Thickness: the mutual distance between the First_Hough_Peak and the highest peak in the neighbourhood considered at point 3. It can represent the object thickness.
6. Number_of_Points: the number of voting points of the First_Hough_Peak, which should estimate the edge length.
7. Average_Vote: the average "vote" of the voting points, i.e. the average gradient of each point voting for a single Hough point; it is computed by dividing the value of point 1 by the Number_of_Points.
8. Average_Image_Gradient: the average gradient of the image is a different property with respect to the others, since it is global, i.e. an overall feature of the whole image. It should be used as a corrective weight, since images containing cracks have a "high" but not "too high" average gradient: in the MPI process the fluorescence is collected only in the zone of a possible crack and is not distributed over the surface.

5.2 Data mining with automatic learning tools: C4.5
The second step of our approach consists of the use of an automatic classification system. A classification system should emulate the human ability to classify instances of the world on the basis of the values assumed by the attributes that describe the instance itself. As the human expert does, the classification system needs to learn how to classify correctly. This is generally accomplished by giving it a set of solved problems, called the training set.
This can be thought of as a collection of pairs of the form (attribute values, class). Consulting the training set, a classification system builds a decision tree: every non-leaf node corresponds to a test made on the value assumed by a particular attribute, while leaf nodes correspond to classes. Given any instance of the training set, it is possible to visit the tree starting from the root, choosing at each node the branch corresponding to the value of the attribute, and finally reaching the leaf node that corresponds to the correct classification. Moreover, since real-life problems tend to generate very large trees, which are sometimes difficult to understand because each node must be considered in a specific context derived from the outcomes of tests made at antecedent nodes, the system is generally able to deduce a set of production rules from the decision tree. Many implementations of learning systems also offer the possibility of using some degree of heuristics in pruning the decision tree and in generating production rules. Although efficient, this heuristic approach tends to produce less accurate results; generally, the purpose is to find a compromise between accuracy and efficiency. If we now assume the hypothesis that the training set is sufficiently representative of the world, we can assert that, given any new instance of the world, the classification system will be able to classify it correctly. Intuitively, the larger the training set is, the more valid the latter hypothesis is. In our application, the classification rules assessing the presence of defects in components have been constructed by using this classification system approach. To this aim, two classes of interest have been defined, namely Defect and NoDefect, and the objects to be classified are extracted from the images in order to be submitted to classification.
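The tree-visiting procedure described above can be sketched directly. The toy tree below is illustrative: its thresholds are borrowed from the Correlated_Hough_Peak and Average_Image_Gradient rules reported in Fig. 7, but it is not the tree actually induced from the paper's data.

```python
# Sketch of visiting a learned decision tree: each internal node tests
# one attribute against a threshold; leaves name a class. A node is a
# tuple (attribute, threshold, subtree_if_below, subtree_if_above).

tree = ("Correlated_Hough_Peak", 17.73,
        "NoDefect",                              # value <= threshold
        ("Average_Image_Gradient", 47.05,
         "Defect", "NoDefect"))                  # value > threshold

def classify(node, instance):
    if isinstance(node, str):        # leaf: the class name
        return node
    attr, threshold, below, above = node
    branch = below if instance[attr] <= threshold else above
    return classify(branch, instance)

sample = {"Correlated_Hough_Peak": 25.0, "Average_Image_Gradient": 40.0}
print(classify(tree, sample))  # Defect
```

Flattening each root-to-leaf path of such a tree into a conjunction of tests yields exactly the production rules discussed below.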
In this approach the problem can be stated as follows: given a database of images containing several points of interest, or objects (in our case, unstructured shapes), each object is defined by a set of attributes, or visual features, and is thus described by a tuple of values. The set of tuples represents the training set used by the automatic learning tool for inferring rules, which can then be exploited by the automatic inspection classifier. We experimented with a machine learning approach using the C4.5 system, a tool developed by Quinlan and Cameron-Jones [12]. C4.5 generates a classification system represented in the form of a decision tree: each leaf indicates a possible class, while each decision node specifies a test (or a set of tests) to be carried out on a single attribute value, with one branch and subtree for each possible outcome of the test. The input used for C4.5 is shown in Fig. 6: the first row indicates the names of the possible classes, while the other entries specify the attribute names (i.e. the computed visual features) and their nature. This can be "continuous", i.e. with a numeric value, as in the considered examples, or can be defined with an enumerative symbolic set.

Defect, NoDefect.
First_Hough_Peak: continuous.
Second_Hough_Peak: continuous.
Second_Hough_Average: continuous.
Correlated_Hough_Peak: continuous.
Thickness: continuous.
Number_of_Points: continuous.
Average_Votes: continuous.
Average_Image_Gradient: continuous.

Fig. 6. Input for C4.5.
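Preparing this input amounts to writing a .names file describing the classes and attributes (as in Fig. 6) and a .data file with one comma-separated tuple per object, terminated by its class. A minimal sketch, with placeholder feature values and hypothetical file names:

```python
# Sketch of generating the C4.5 input files for the eight visual
# features of section 5.1. File names and tuple values are
# illustrative, not the paper's actual data.

attributes = ["First_Hough_Peak", "Second_Hough_Peak",
              "Second_Hough_Average", "Correlated_Hough_Peak",
              "Thickness", "Number_of_Points", "Average_Votes",
              "Average_Image_Gradient"]

# .names file: first line lists the classes, then one attribute per line.
with open("cracks.names", "w") as f:
    f.write("Defect, NoDefect.\n")
    for a in attributes:
        f.write(f"{a}: continuous.\n")

# .data file: one comma-separated tuple per object, class last.
tuples = [
    ([120.0, 95.0, 80.0, 30.5, 4, 40, 3.0, 25.0], "Defect"),
    ([15.0, 10.0, 8.0, 2.1, 9, 12, 1.25, 55.0], "NoDefect"),
]
with open("cracks.data", "w") as f:
    for values, label in tuples:
        f.write(",".join(str(v) for v in values) + "," + label + "\n")

print(open("cracks.names").readline().strip())  # Defect, NoDefect.
```

C4.5 would then be run on these two files to induce the tree.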
The tuples given as input to C4.5 have been evaluated on an adequate training set, i.e. a set of images acquired under UV light representing various poses of the workpieces. The images were acquired under the different external conditions of the on-line fabrication process. Only a subset of them contained defects: from the image database, 317 tuples of features were extracted by means of the vision system, of which 21% represented an actual defect. The result of running C4.5 is an initially un-pruned tree. Decision trees can often turn out to be over-specialized, in the sense that they overfit the data by inferring more structure than is justified by the training cases [12]. The limit case is to have a leaf for each classified example of the training set, that is, a simple storage of the training set without any inferred generalization. In order to avoid this possibility, several parameters can be tuned within C4.5, for instance by operating a further pruning phase or by indicating the minimum number of examples which must be used for generating a new leaf. Obviously these operations provide both generalization and simplification of the classification process, and thus may increase the estimated error rate. In addition, C4.5 allows production rules to be extracted from the decision tree. This operates a further generalization, since it extracts simpler and more compact rules while at the same time indicating an explicit way of classifying the target object. This is one of the most important advantages of symbolic methods which, differently from connectionist approaches, create models that can be understood and thus also validated by human experts. Fig. 7 indicates the rules produced by C4.5 and the confidence percentage for each rule with respect to the training set.
C4.5
(Correlated_Hough_Peak > 17.73) AND (Average_Image_Gradient < 47.05) → Defect (92.6%)
(Correlated_Hough_Peak 44.67) → Defect (99.0%)
(Correlated_Hough_Peak > 78.62) → Defect (95.3%)
(Correlated_Hough_Peak 45.69) → NoDefect
→ NoDefect
(Correlated_Hough_Peak 78.62) → Defect
Fig. 7. C4.5 classification rules.

(Correlated_Hough_Peak > 8.81) AND (Average_Image_Gradient <= 45.69) AND (First_Hough_Peak > 1.01) → Defect
(Correlated_Hough_Peak > 9.46) AND (Average_Image_Gradient <= 47.05) AND (Thickness >= 3) AND (Second_Hough_Average > 11.92) → Defect
(Correlated_Hough_Peak > 8.81) AND (Correlated_Hough_Peak < 11.04) AND (Average_Image_Gradient <= 44.9) AND (Average_Image_Gradient > 44.67) AND (Correlated_Hough_Peak < 25.22) AND (First_Hough_Peak > 1.53) → Defect
→ Defect
Fig. 8. FOIL classification rules for the Defect class.
FOIL for NoDefect (Correlated_Hough_Peak