Knowledge Discovery from Diagrammatically Represented Data

Michael Anderson
Department of Computer and Information Sciences
Fordham University
Bronx, NY 10458
anderson@trill.cis.fordham.edu

Abstract
Knowledge discovery from diagrammatic data can be facilitated by a language that permits queries on such data. Such a language, Diagrammatic SQL (DSQL), is being developed as part of an effort to build an autonomous artificially intelligent agent capable of dealing with diagrammatic information. This paper describes the language and details examples of how it can be used to facilitate diagrammatic data mining.

1. Introduction

In our ongoing investigation into the development of an agent with full diagrammatic reasoning capabilities comparable to those of human beings, we are currently focusing on systems that allow users to pose queries against diagrams and that seek responses requiring information to be inferred from those diagrams. We call such systems diagrammatic information systems (DIS) [3,4]. We are developing a core diagrammatic information system that remains diagram and domain independent, capable of accepting domain-dependent diagrammatic and non-diagrammatic knowledge. In this way, each body of knowledge produces a new instantiation of the diagrammatic information system, knowledgeable in the particular domain and diagram types represented by that knowledge. Our first instantiation, for example, is informed about cartograms (maps representing information as grayscale or color shaded areas) of the United States (Figure 1). Other domains might include suites of diagrams representing systems such as automobiles, the human body, integrated circuits, buildings, and ships.

An important facet of this research has been the development of a language in which to express queries against diagrammatic information. Diagrammatic SQL (DSQL) [3] is our extension to Structured Query Language (SQL) that supports querying of diagrammatic information. Just as SQL permits users to query information in the relations of a relational database, DSQL permits a user to query information in diagrams. We have chosen to extend SQL for a number of reasons. The grammar of SQL fits remarkably well with the uses we wish to make of it. It is a reasonably intuitive language that allows users to specify what data they want without having to specify exactly how to get it. It is a well-developed, prepackaged technology whose use lets us focus on more pressing research issues. SQL's large installed base of users provides a ready and able audience for a fully developed version of DSQL. Because DSQL extends SQL, the SQL substrate can be used to query non-diagrammatic information, permitting heterogeneous data retrieval. The availability of immediate and embedded modes provides a means to use system responses both for direct human consumption and for further machine processing. Lastly, the availability of natural language interfaces to SQL will allow a diagrammatic information system to offer an even more intuitive interface to its users.

An interesting outgrowth of the development of DSQL is the capability of using this language to facilitate knowledge discovery from diagrammatic information. Given a database of diagrams, DSQL can be used to perform a number of data mining tasks, including data cleaning, integration of heterogeneous data, retrieval of data relevant to the data mining task at hand, development of diagrammatic concept hierarchies, generalization of diagrams through these hierarchies, computation of interestingness measures of discovered knowledge, and visualization of the discovered patterns.

2. A diagrammatic information system

As an example of a diagrammatic information system and its use of DSQL, consider the diagram in Figure 1. This is a cartogram (from The Weather Channel web site, http://www.weather.com) that depicts, in five colors, regions in the United States where it was optimal to plant various crops in late May, 2000. As detailed in the accompanying key, dark red denotes tomatoes and annuals, yellow denotes corn and beans, dark green denotes broccoli, light green denotes potatoes and lettuce, and light red denotes strawberries. Given this diagram as input to the system, as well as the semantics of the colors in this particular diagram, posing the query "Which states had regions optimal for planting strawberries in late May?" elicits the diagram in Figure 2 as a response from the system. In this diagrammatic response, each state that had regions optimal for planting strawberries in late May is represented by its shape in black, positioned where the state lies within the United States.

Figure 1. Optimal planting in late May

Figure 1 is input to the system as a pixmap (pixel map) and stored as such. The system is supplied with the semantic mapping from the colors of the diagram to the crop types present, and diagrams of these colors are stored. Using this information, the input diagram can then be parsed into five diagrams, each consisting of a single color. Each of these diagrams, then, represents the locations of optimal planting of a particular crop within the United States in late May. Figure 3, for example, shows the diagram resulting from this parsing that represents the locations of optimal planting regions for strawberries in the United States in late May. The a priori diagrammatic knowledge required to respond to this example query consists of a set of diagrams that represent the shape and location of each state within the United States. Figure 4 is an example of such a diagram: it shows the location of the state of New York within the United States by marking its area on the map in black.
There are forty-eight such state diagrams (along with a diagram for the capital region, Washington, D.C.), as this domain pertains to the continental United States only. The one-time effort of developing this set of diagrams is small when weighed against the fact that it can be used to query all past, present, and future US cartograms of this kind on The Weather Channel web site.
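The parsing step described above (splitting one multicolored pixmap into single-color diagrams using a color-to-crop key) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system's implementation: it represents each single-color diagram as a set of pixel coordinates rather than an IDR tessellation, assumes exact RGB matches, and uses a hypothetical function name (`parse_by_color`) and placeholder colors.

```python
from typing import Dict, List, Set, Tuple

RGB = Tuple[int, int, int]
Coord = Tuple[int, int]


def parse_by_color(pixmap: List[List[RGB]], key: Dict[RGB, str]) -> Dict[str, Set[Coord]]:
    """Split a pixmap into one single-color diagram per key label.

    Each resulting diagram is the set of (row, col) positions whose pixel
    exactly equals a key color; pixels with no key entry are skipped.
    """
    diagrams: Dict[str, Set[Coord]] = {label: set() for label in key.values()}
    for r, row in enumerate(pixmap):
        for c, pixel in enumerate(row):
            label = key.get(pixel)
            if label is not None:
                diagrams[label].add((r, c))
    return diagrams


# Hypothetical 2x3 pixmap; the RGB values are placeholders, not The Weather Channel's palette.
LIGHT_RED = (255, 150, 150)
YELLOW = (255, 255, 0)
WHITE = (255, 255, 255)
pixmap = [
    [LIGHT_RED, YELLOW, WHITE],
    [LIGHT_RED, WHITE, YELLOW],
]
key = {LIGHT_RED: "strawberries", YELLOW: "corn_beans"}
parsed = parse_by_color(pixmap, key)
print(sorted(parsed["strawberries"]))  # [(0, 0), (1, 0)]
```

A real pixmap would also need tolerance for noisy pixels; exact matching keeps the sketch short.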

Figure 2. Response to query: "Which states had regions optimal for planting strawberries in late May?"

The query "Which states had regions optimal for planting strawberries in late May?" is presented to the system as the DSQL query:

    DSELECT state FROM us WHERE crop = strawberries AND season = late_may

Intuitively, this query requests that the system display all states in the US that satisfy the condition that they contain a region in which strawberries are optimally planted in late May. The response to this query is generated by comparing each of the state diagrams with the diagram representing strawberries in late May, using primitives derived from the theory of inter-diagrammatic reasoning (detailed in the next section). When a state diagram intersects the late May strawberry diagram, the semantics of the domain dictate that the state contains a region that was optimal for planting strawberries in late May. All such states are then accumulated onto a single diagram and presented to the user as the response to the query. In this manner, diagrammatic responses can be generated for a wide variety of queries concerning crops in the United States, including “Which states did not have regions optimal for planting broccoli in late May?”, “How many states had regions optimal for planting corn and beans in late May?”, “Did Rhode Island have a region optimal for planting strawberries in late May?”, “Which crop had regions optimally planted in late May in the most states?”, “Did any states have regions optimal for planting strawberries and potatoes in late May?”, “Which states had regions that were optimal for planting either corn or tomatoes in late May?”, “Did more states have regions optimal for planting strawberries than tomatoes in late May?”, and “Which states had regions optimal for planting corn but not potatoes in late May?”. After a concise description of the theory that provides the foundation for DSQL, we use this example system to show how DSQL can help facilitate knowledge discovery from diagrammatic information.

Figure 3. Location of optimal planting regions for strawberries in late May

3. Inter-diagrammatic reasoning

Our currently chosen approach gleans knowledge from diagrams by directly manipulating spatial representations of them. This approach is motivated by noting that, given diagrams directly input as pixmaps (an ability required of an autonomous diagrammatically capable agent), any translation into other representations will require some form of direct manipulation of these pixmaps. In many cases, this translation is superfluous. Given this approach, we store input pixmaps directly with no further abstraction. This strategy not only allows us to manipulate these spatial representations directly but, should the need arise, also allows us to translate them into any other representation as required.

We use, as a basis for this direct manipulation of diagrams, an inter-diagrammatic reasoning approach. Inter-diagrammatic reasoning (IDR) [3,5,6,7] defines diagrams as tessellations (tilings of finite subsets of two-dimensional space). Individual tesserae (tiles) take their values from an $I, J, K$-valued subtractive CMY color scale. Intuitively, these CMY (Cyan, Magenta, Yellow) color scale values (denoted $v_{i,j,k}$) correspond to a discrete set of transparent color filters, where $i$ is the cyan contribution to a filter's color, $j$ is the magenta contribution, and $k$ is the yellow contribution. When overlaid, these filters combine to create other color filters, from a minimum of WHITE ($v_{0,0,0}$) to a maximum of BLACK ($v_{I-1,J-1,K-1}$).

IDR leverages the spatial and temporal coherence often exhibited by groups of related diagrams for computational purposes. Like diagrams are combined in ways that produce new like diagrams that make explicit information implicit in the original diagrams. The following unary operators, binary operators, and functions provide a set of basic tools for the process of IDR. IDR binary operators take two diagrams, $d_1$ and $d_2$, of equal dimension and tessellation and return a new diagram where each tessera has a value $v$ that is some function of the values of the two corresponding tesserae, $v_{i_1,j_1,k_1}$ and $v_{i_2,j_2,k_2}$, in the operands.

Figure 4. Location of New York in the US

• OR, denoted $d_1 \vee d_2$, returns the maximum of each pair of tesserae, where the maximum of two corresponding tesserae is defined as $v_{\max(i_1,i_2),\,\max(j_1,j_2),\,\max(k_1,k_2)}$.
• AND, denoted $d_1 \wedge d_2$, returns the minimum of each pair of tesserae, where the minimum of two corresponding tesserae is defined as $v_{\min(i_1,i_2),\,\min(j_1,j_2),\,\min(k_1,k_2)}$.
• OVERLAY, denoted $d_1 + d_2$, returns the sum of each pair of tesserae, where the sum of the values of corresponding tesserae is defined as $v_{\min(i_1+i_2,\,I-1),\,\min(j_1+j_2,\,J-1),\,\min(k_1+k_2,\,K-1)}$.
• PEEL, denoted $d_1 - d_2$, returns the difference of each pair of tesserae, where the difference of the values of corresponding tesserae is defined as $v_{\max(i_1-i_2,\,0),\,\max(j_1-j_2,\,0),\,\max(k_1-k_2,\,0)}$.
• NOT, denoted $\neg d$, is a one-place operator that returns the value of $\infty - d$, where $\infty$ (the maximum diagram) denotes a diagram equal in tessellation to $d$ containing all BLACK-valued tesserae.
• NULL, denoted $\eta(d)$, is a one-place Boolean function taking a single diagram that returns TRUE if $d = \emptyset$, where $\emptyset$ (the null diagram) denotes a diagram equal in tessellation to $d$ containing all WHITE-valued tesserae; otherwise it returns FALSE.
• ACCUMULATE, denoted $\alpha(d, ds, o)$, is a three-place function taking an initial diagram $d$, a set of diagrams of equal dimension and tessellation $ds$, and the name of a binary diagrammatic operator $o$, that returns a new diagram which is the accumulation of the results of successively applying $o$ to $d$ and each diagram in $ds$.
• MAP, denoted $\lambda(f, ds_1, \ldots, ds_n)$, is an $(n+1)$-place function taking an $n$-place function $f$ and $n$ sets (of equal cardinality) of diagrams of equal dimension and tessellation, $ds_i$, that returns the set of values resulting from the application of $f$ to each corresponding $n$ diagrams in $ds_1, \ldots, ds_n$.
• FILTER, denoted $\sigma(f, ds)$, is a two-place function taking a Boolean function $f$ and a set of diagrams of equal dimension and tessellation $ds$, that returns a new set of diagrams comprised of all diagrams in $ds$ for which $f$ returns TRUE.
• ISOLATE, denoted $\iota(d, d_{i,j,k})$, is a binary operator taking two diagrams, $d$ and $d_{i,j,k}$ (a diagram covered in $v_{i,j,k}$-valued tesserae), that returns a diagram where tesserae corresponding to those in $d$ with value $v_{i,j,k}$ ($v_{i,j,k} \neq$ WHITE) have a non-WHITE value and all other tesserae are WHITE. This is a compound operator whose functionality is achieved by creating a diagram $d_{\mathrm{BLACK}-1}$ covered in BLACK $-$ 1-valued tesserae and returning the value of $\neg((d - d_{i,j,k}) \vee (d_{i,j,k} - d)) - d_{\mathrm{BLACK}-1}$.

DSQL queries are compiled into equivalent IDR operations, and these are executed to produce appropriate responses. For example, the query "Which states had regions optimal for planting strawberries in late May?" can be represented in IDR operators as:

$$\alpha(\emptyset,\ \sigma(\lambda(x)\ \neg\eta(\alpha(\emptyset,\ \lambda(\iota,\ \sigma(\lambda(y)\ (y = \mathit{late\_may}),\ \mathit{season}),\ \sigma(\lambda(z)\ (z = \mathit{strawberries}),\ \mathit{crop})),\ +) \wedge x),\ \mathit{states}),\ +)$$

Read from the inside out, this expression isolates the strawberry color in the late May diagram, keeps each state diagram $x$ whose AND with that result is not null, and overlays the surviving state diagrams onto the null diagram.
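The operator definitions above translate fairly directly into code. The Python sketch below is one possible reading of them, not the authors' implementation: it assumes a flattened tessellation stored as a tuple of $(i, j, k)$ triples, an illustrative scale of $I = J = K = 4$, and hypothetical function names (`d_or`, `overlay`, `peel`, `accumulate`, and so on). It omits ISOLATE and, for the query, starts from an already isolated strawberry diagram rather than compiling the full expression.

```python
from functools import reduce
from typing import Callable, List, Tuple

# Assumed scale: each CMY component ranges over 0..3, i.e. I = J = K = 4.
I, J, K = 4, 4, 4
WHITE = (0, 0, 0)
BLACK = (I - 1, J - 1, K - 1)

Value = Tuple[int, int, int]   # a tessera value v_{i,j,k}
Diagram = Tuple[Value, ...]    # tesserae of one fixed tessellation, flattened


def _pairwise(f: Callable[[int, int, int], int], d1: Diagram, d2: Diagram) -> Diagram:
    """Combine corresponding tesserae component-wise; f gets (c1, c2, component_max)."""
    if len(d1) != len(d2):
        raise ValueError("operands must share a tessellation")
    return tuple(
        tuple(f(a, b, m) for a, b, m in zip(t1, t2, BLACK))
        for t1, t2 in zip(d1, d2)
    )


def d_or(d1: Diagram, d2: Diagram) -> Diagram:     # OR: component-wise max
    return _pairwise(lambda a, b, m: max(a, b), d1, d2)


def d_and(d1: Diagram, d2: Diagram) -> Diagram:    # AND: component-wise min
    return _pairwise(lambda a, b, m: min(a, b), d1, d2)


def overlay(d1: Diagram, d2: Diagram) -> Diagram:  # OVERLAY: sum capped at BLACK
    return _pairwise(lambda a, b, m: min(a + b, m), d1, d2)


def peel(d1: Diagram, d2: Diagram) -> Diagram:     # PEEL: difference floored at 0
    return _pairwise(lambda a, b, m: max(a - b, 0), d1, d2)


def maximum_like(d: Diagram) -> Diagram:  # the maximum diagram (all BLACK)
    return tuple(BLACK for _ in d)


def null_like(d: Diagram) -> Diagram:     # the null diagram (all WHITE)
    return tuple(WHITE for _ in d)


def d_not(d: Diagram) -> Diagram:         # NOT d = maximum diagram PEEL d
    return peel(maximum_like(d), d)


def is_null(d: Diagram) -> bool:          # NULL: True iff every tessera is WHITE
    return all(t == WHITE for t in d)


def accumulate(d: Diagram, ds: List[Diagram], op) -> Diagram:  # ACCUMULATE
    return reduce(op, ds, d)


def dmap(f, *dss) -> list:                # MAP over corresponding diagrams
    return [f(*group) for group in zip(*dss)]


def dfilter(f, ds: List[Diagram]) -> List[Diagram]:  # FILTER
    return [d for d in ds if f(d)]


# Toy query over four tesserae; the strawberry diagram is assumed already isolated.
STRAWBERRY = (0, 2, 1)                  # hypothetical color value
strawberries = (STRAWBERRY, WHITE, WHITE, WHITE)
state_a = (BLACK, BLACK, WHITE, WHITE)  # hypothetical state covering tesserae 0-1
state_b = (WHITE, WHITE, BLACK, BLACK)  # hypothetical state covering tesserae 2-3
answer = accumulate(
    null_like(strawberries),
    dfilter(lambda x: not is_null(d_and(strawberries, x)), [state_a, state_b]),
    overlay,
)
print(answer == state_a)  # True
```

The final expression mirrors the outer part of the compiled query: FILTER the states whose AND with the strawberry diagram is not NULL, then ACCUMULATE them with OVERLAY starting from the null diagram.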

4. Knowledge discovery with DSQL

Discovery of knowledge in data, diagrammatic or otherwise, can be decomposed into a sequence of steps: data cleaning (removal of noise and inconsistent data), data integration (combining of multiple data sources), data selection (retrieval of relevant data), data transformation (conversion of data into appropriate forms for mining), data mining (extraction of patterns in data), pattern evaluation (computation of the interestingness of patterns), and knowledge presentation (visualization of patterns) [11]. DSQL facilitates each of these steps for knowledge discovery in diagrammatic data.

4.1. Data cleaning

Diagrammatic data is subject to both pixel-level noise (when dealing with pixmaps) and knowledge-level noise. Pixel-level noise relates to inconsistencies in the data that arise from file reading errors, scanning errors, and the like. As this noise is difficult both to detect and to remove, the system must accommodate it by specifying thresholds rather than absolutes. Knowledge-level noise relates to irrelevant aspects of diagrammatic data. Figure 1, for example, contains much information that is irrelevant to the system's current goals, including The Weather Channel logo, the title and key, non-US geography, and bodies of water. DSQL removes this noise from consideration by the nature of its a priori knowledge: only regions of the diagram that are of interest are represented in this knowledge. Figure 5 shows these queriable regions in the current domain with knowledge-based noise removed.
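Both kinds of noise handling described above can be sketched together. The Python below is a hypothetical illustration, not DSQL's mechanism: it assumes pixel-level noise is absorbed by matching each pixel to the nearest key color within a Euclidean RGB distance threshold (the distance measure and threshold value are assumptions), and that knowledge-level noise is removed by ignoring pixels outside a "queriable" set standing in for the union of the a priori state diagrams. The names `classify` and `clean` are illustrative.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

RGB = Tuple[int, int, int]
Coord = Tuple[int, int]


def classify(pixel: RGB, key: Dict[RGB, str], threshold: float) -> Optional[str]:
    """Label of the nearest key color, or None if even that is farther than threshold."""
    best_label: Optional[str] = None
    best_dist = math.inf
    for color, label in key.items():
        dist = math.dist(pixel, color)  # Euclidean distance in RGB space
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None


def clean(pixmap: List[List[RGB]], key: Dict[RGB, str], threshold: float,
          queriable: Set[Coord]) -> Dict[str, Set[Coord]]:
    """Tolerance-based color parsing restricted to the queriable region."""
    out: Dict[str, Set[Coord]] = {label: set() for label in key.values()}
    for r, row in enumerate(pixmap):
        for c, pixel in enumerate(row):
            if (r, c) not in queriable:
                continue  # knowledge-level noise: outside every a priori region
            label = classify(pixel, key, threshold)
            if label is not None:  # pixels beyond the threshold are dropped
                out[label].add((r, c))
    return out


key = {(255, 150, 150): "strawberries"}        # placeholder key color
pixmap = [
    [(250, 148, 152), (255, 150, 150)],        # slightly off-color pixel, then exact
    [(0, 0, 255), (255, 150, 150)],            # water-blue pixel, then exact
]
queriable = {(0, 0), (1, 0), (1, 1)}           # (0, 1) lies outside, e.g. in the legend
result = clean(pixmap, key, threshold=10.0, queriable=queriable)
print(sorted(result["strawberries"]))  # [(0, 0), (1, 1)]
```

The off-color pixel at (0, 0) is within the threshold and is kept, while the exact match at (0, 1) is ignored because it lies outside the queriable region.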

4.2. Data integration

Sources for data to be mined may differ in various ways, including their physical locations, logical configurations, storage and access paradigms, and representation schemes. DSQL's SQL substrate permits querying of all SQL-accessible data and enables DSQL to query both diagrammatic and non-diagrammatic data simultaneously. Thus DSQL can be viewed as an inter-lingua abstraction that can be used to integrate diagrammatic and non-diagrammatic data by furnishing a homogeneous interface to this data. Given appropriate data sources, DSQL permits queries on heterogeneous data such as "What was the average temperature of states that had regions optimal for strawberries in late May?" or "Of those states that had regions optimal for planting tomatoes in late May, which had the highest per capita income?" For example, the highest per capita income among those states can be retrieved by embedding a DSQL query in a SQL query that uses the list of states returned by DSQL to query the per capita income attribute (pci) of the relation:

    SELECT MAX(pci) FROM us WHERE state IN (DSELECT state FROM us WHERE crop = tomatoes AND season = late_may)
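The embedding pattern above (a diagrammatic subquery feeding an ordinary SQL query) can be imitated with Python's built-in `sqlite3` module. This is a sketch under loud assumptions: no DSQL engine is available, so the DSELECT subquery is replaced by a hypothetical function `dsql_tomato_states()` returning placeholder state names, and the `us` table holds arbitrary placeholder numbers rather than real income data. The sketch also shows how to return the state itself (via `ORDER BY ... LIMIT 1`), since `MAX(pci)` alone yields only the income value.

```python
import sqlite3
from typing import List, Optional, Tuple


def dsql_tomato_states() -> List[str]:
    """Hypothetical stand-in for the DSQL subquery
    DSELECT state FROM us WHERE crop = tomatoes AND season = late_may.
    No DSQL engine is assumed; the names returned are placeholders."""
    return ["state_a", "state_c"]


def richest(conn: sqlite3.Connection, states: List[str]) -> Optional[Tuple[str, float]]:
    """Return (state, pci) for the state with the highest pci among `states`."""
    if not states:
        return None
    placeholders = ", ".join("?" for _ in states)
    sql = (f"SELECT state, pci FROM us WHERE state IN ({placeholders}) "
           "ORDER BY pci DESC LIMIT 1")
    return conn.execute(sql, states).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE us (state TEXT PRIMARY KEY, pci REAL)")
# Arbitrary placeholder numbers, not real per capita income figures.
conn.executemany("INSERT INTO us VALUES (?, ?)",
                 [("state_a", 100.0), ("state_b", 300.0), ("state_c", 200.0)])

states = dsql_tomato_states()
print(richest(conn, states))  # ('state_c', 200.0)

# The MAX(pci) form from the text returns only the value, not the state.
placeholders = ", ".join("?" for _ in states)
max_pci = conn.execute(f"SELECT MAX(pci) FROM us WHERE state IN ({placeholders})",
                       states).fetchone()[0]
print(max_pci)  # 200.0
```

Note that `state_b`, although it has the highest value in the table, is excluded because it is not in the (placeholder) diagrammatic result.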

4.3. Data selection

Because mining an entire data set indiscriminately can be time consuming and can return many patterns irrelevant to the task at hand, it is useful to select relevant subsets first, and DSQL by its very nature provides a means to do so. The query detailed previously is an example of the retrieval of such a subset. Further, given its data integration capabilities, DSQL can draw this subset from both diagrammatic and non-diagrammatic data.

4.4. Data transformation

It is often useful to the discovery process to provide background knowledge concerning the domain of the data to be mined, and this knowledge is often provided as concept hierarchies, or mappings from sets of low-level data to higher-level, more general concepts [10]. The a priori knowledge providing the location, shape, and size of states within the United States is an example of one layer of a diagrammatic concept hierarchy: it maps sets of pixels (low-level data) into states (higher-level concepts). DSQL can facilitate the creation of further levels of concept hierarchies through its CREATE VIEW feature, as a DSQL diagrammatic view can serve as a generalization of the diagrams from which it is created. For example, the following DSQL statement creates the view shown in Figure 6:

    CREATE VIEW newengland AS
    DSELECT state FROM us WHERE state = connecticut OR state = maine OR state = massachusetts OR state = newhampshire OR state = vermont OR state = rhodeisland

Figure 5. Planting diagram with noise removed

Using this view, any state in the set {Connecticut, Maine, Massachusetts, New Hampshire, Vermont, Rhode Island} can be represented more generally by the diagrammatic concept newengland. Generalizations from the tesserae level (pixels, in the current example) to diagrams (sets of tesserae) to sets of diagrams can permit pattern discovery and presentation at more meaningful and understandable levels.
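The two hierarchy levels discussed above (pixels to states, states to regions) can be sketched with plain Python sets. This is an illustrative sketch, not how DSQL stores views: it assumes binary state diagrams represented as sets of pixel coordinates (the toy coordinates are hypothetical), treats the union of member sets as a stand-in for OR-accumulating the state diagrams, and takes the newengland membership from the CREATE VIEW statement above. The names `generalize` and `view_diagram` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Coord = Tuple[int, int]

# States -> region, using the newengland membership from the CREATE VIEW statement.
NEW_ENGLAND = {"connecticut", "maine", "massachusetts",
               "newhampshire", "vermont", "rhodeisland"}
REGION_OF: Dict[str, str] = {state: "newengland" for state in NEW_ENGLAND}


def generalize(states: Iterable[str]) -> Set[str]:
    """Climb one level: replace each state by its region, if one is defined."""
    return {REGION_OF.get(s, s) for s in states}


def view_diagram(members: Iterable[str], state_diagrams: Dict[str, Set[Coord]]) -> Set[Coord]:
    """A region view as the union (OR) of its member states' pixel sets."""
    out: Set[Coord] = set()
    for s in members:
        out |= state_diagrams.get(s, set())
    return out


# Hypothetical toy pixel sets standing in for the a priori state diagrams.
state_diagrams = {"maine": {(0, 0)}, "vermont": {(1, 0)}, "newyork": {(1, 1)}}
print(sorted(generalize(["maine", "vermont", "newyork"])))  # ['newengland', 'newyork']
print(sorted(view_diagram(NEW_ENGLAND, state_diagrams)))     # [(0, 0), (1, 0)]
```

States with no entry in the toy hierarchy, such as New York here, simply stay at the state level.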

4.5. Data mining

Methods for finding patterns in cleaned, integrated, selected, and transformed data can be classified into two categories: descriptive data mining, where the objective is to construct a concise description of the general properties of a set of data, and predictive data mining, where the objective is to construct a model that can help predict the behavior of new data sets [13]. Continuing our investigation into the application of machine learning techniques to diagrammatic data [8], we are currently exploring predictive data mining from temporal diagrammatic data [2]. Given a sequence of related diagrams representing change in some characteristics over time, how can earlier values of these characteristics be used to predict later values? Our current approach induces rules from the values of various characteristics of the relevant data set over the given sequence of diagrams. DSQL permits retrieval of diagrammatic data across the search space of these characteristics and comparison of this data with the data relevant to the data mining task. An example is presented in the next section.

4.6. Pattern evaluation

Not all patterns discovered through data mining are equally interesting. Less interesting patterns can be pruned by interestingness measures that estimate the simplicity, certainty, utility, and novelty of discovered patterns. Two such measures are confidence and support [1]. Confidence is a measure of certainty that assesses the validity of a pattern. Given a rule "if A then B", the confidence of that rule can be expressed as the number of data items containing both A and B divided by the number of data items containing A. Support is a measure of utility that assesses the potential usefulness of a pattern. Given a rule "if A then B", the support of that rule can be expressed as the number of data items containing both A and B divided by the total number of data items. DSQL facilitates the computation of both confidence and support by providing a means to count the number of items that satisfy a given condition. Both measures are computed using DSQL queries in the example data mining task that follows.

Figure 6. DSQL newengland view
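The two definitions above reduce to simple ratios of counts. The Python sketch below computes them from sets of items; it is a generic illustration of the standard measures, with hypothetical function names and toy item sets, and a final line applying the same ratios to the counts reported in Section 5.

```python
from typing import Set


def confidence(antecedent: Set[str], consequent: Set[str]) -> float:
    """Items satisfying both A and B divided by items satisfying A."""
    if not antecedent:
        raise ValueError("no item satisfies the antecedent")
    return len(antecedent & consequent) / len(antecedent)


def support(antecedent: Set[str], consequent: Set[str], universe: Set[str]) -> float:
    """Items satisfying both A and B divided by all items."""
    return len(antecedent & consequent) / len(universe)


# Hypothetical items: a holds for four of them, b for three, both for two.
items = {"s1", "s2", "s3", "s4", "s5"}
a = {"s1", "s2", "s3", "s4"}
b = {"s1", "s2", "s5"}
print(confidence(a, b), support(a, b, items))  # 0.5 0.4

# With the counts reported in Section 5 (19 antecedent, 10 both, 49 total):
print(round(10 / 19, 3), round(10 / 49, 3))  # 0.526 0.204
```

Both measures share the numerator (items satisfying A and B); they differ only in whether the denominator counts items satisfying A or all items.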

4.7. Knowledge presentation

Finally, interesting patterns found via data mining must be visualized in ways that make the discovered knowledge clear. The visual nature of DSQL responses can be exploited to help provide such visualizations. In the following example, DSQL provides a straightforward means to visualize the confidence and support of the discovered knowledge.

5. An example

Figures 7 and 8 are cartograms from The Weather Channel web site that depict, in various colors, regions in the United States where it was optimal to plant various crops in late April, 2000 and early May, 2000. As detailed in the accompanying keys, in Figure 7 orange denotes citrus, dark red denotes tomatoes and annuals, yellow denotes corn and beans, dark green denotes broccoli, light green denotes peas and onions, pink denotes strawberries, and blue denotes regions too cold to plant anything. In Figure 8, red denotes tomatoes and annuals, yellow denotes corn and beans, dark green denotes broccoli and roses, light green denotes peas and potatoes, and pink denotes fruit trees.

Figure 7. Optimal planting in late April

Together with Figure 1, these three cartograms can be viewed as a portion of a database of time-sequence data concerned with the optimal planting of crops in the US. The order of this data is based on the time of year to which each cartogram pertains: late April (Figure 7), early May (Figure 8), and late May (Figure 1). Given this database, we set ourselves the predictive knowledge discovery task of finding indicators earlier in the sequence that can help predict which states will have regions optimal for planting strawberries at the end of the sequence, in late May. The example illustrates more concretely DSQL's use in each knowledge discovery step previously discussed.

5.1. Data cleaning and integration

As stated previously, DSQL removes noise from diagrammatic data through a priori knowledge that focuses only on the relevant portions of that data. In this case, this a priori knowledge consists of diagrams representing the shape, size, and location of each state in the continental United States. As we are currently investigating DSQL's purely diagrammatic aspects, the current example does not make use of its data integration capabilities.

5.2. Data selection and transformation

Data relevant to the current knowledge discovery task is retrieved by posing the query "Which states had regions optimal for planting strawberries in late May?" In DSQL:

    DSELECT state FROM us WHERE crop = strawberries AND season = late_may

The result of this query, shown in Figure 2, is the following set of ten states: {Maine, Michigan, Minnesota, Montana, New Hampshire, New York, North Dakota, South Dakota, Vermont, Wisconsin}

Figure 8. Optimal planting in early May

In this example, data is transformed by the a priori knowledge providing the location, shape, and size of states within the United States, which maps sets of pixels (low-level data) into states (higher-level concepts).

5.3. Data mining

The strategy used to find the desired predictors is to query the earlier seasonal diagrams for the states that contain regions for each of the crops represented in each diagram. These responses are then compared with the task-relevant data retrieved during data selection, and responses that do not contain a significant subset of the task-relevant data are removed from further consideration. The remaining queries are then combined to induce a rule, and that rule is evaluated for interestingness. To begin, the late April diagram is queried, via DSQL, for states that contain regions optimal for planting each of the crops it represents (including "no crop"). These queries include "Which states have regions optimal for planting citrus?", "Which states have regions optimal for planting tomatoes and annuals?", and so on. Given a tolerance level of 100% (i.e., a response is removed unless the entire set of task-relevant data retrieved during data selection is a subset of it), only one such query for late April is retained: "Which states have regions that are optimal for planting fruit trees in late April?" In DSQL:

    DSELECT state FROM us WHERE crop = fruit_trees AND season = late_april

This query returns the diagram shown in Figure 9. Next, the early May diagram is queried, via DSQL, for states that contain regions optimal for planting each of the crops it represents. Again, given a tolerance level of 100%, only one such query for early May is retained: "Which states have regions that are optimal for planting peas and potatoes in early May?" In DSQL:

    DSELECT state FROM us WHERE crop = peaspotatoes AND season = early_may

This query returns the diagram shown in Figure 10.

Figure 9. Response to query: "Which states had regions optimal for planting fruit trees in late April?"

As each of these queries elicits a response that contains the entire set of task-relevant data, the conditions represented by the queries are likely predictors of a state containing a region optimal for planting strawberries in late May. That is, every state that contained regions optimal for planting strawberries in late May also contained regions optimal for planting fruit trees in late April and peas and potatoes in early May. As we are interested in predicting optimal strawberry planting regions from past observations, we seek to validate the induced rule: "If a state has regions optimal for planting fruit trees in late April and peas & potatoes in early May, then it has regions optimal for planting strawberries in late May."
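The predictor search just described (keep each earlier query whose response covers the task-relevant states, then conjoin the survivors into a rule antecedent) can be sketched with Python sets. This is a hypothetical illustration, not the authors' code: state and crop names are placeholders rather than the paper's data, and reading a tolerance below 100% as "fraction of target states covered" is an assumption; at 1.0 it reduces to the subset test stated in the text. The names `retained` and `antecedent_states` are illustrative.

```python
from typing import Dict, List, Set, Tuple

Query = Tuple[str, str]  # (season, crop) condition of one candidate DSQL query


def retained(target: Set[str], responses: Dict[Query, Set[str]],
             tolerance: float = 1.0) -> List[Query]:
    """Keep queries whose responses contain at least `tolerance` of the target states.

    tolerance = 1.0 keeps a query only if every target state is in its response.
    """
    if not target:
        raise ValueError("target set is empty")
    kept = [q for q, states in responses.items()
            if len(target & states) / len(target) >= tolerance]
    return sorted(kept)


def antecedent_states(queries: List[Query], responses: Dict[Query, Set[str]]) -> Set[str]:
    """States satisfying the conjunction of the retained conditions."""
    sets = [responses[q] for q in queries]
    return set.intersection(*sets) if sets else set()


# Hypothetical responses; state and crop names are placeholders, not the paper's data.
target = {"a", "b"}
responses = {
    ("late_april", "crop_x"): {"a", "b", "c"},
    ("late_april", "crop_y"): {"a", "d"},
    ("early_may", "crop_z"): {"a", "b", "c", "e"},
}
kept = retained(target, responses)
print(kept)  # [('early_may', 'crop_z'), ('late_april', 'crop_x')]
print(sorted(antecedent_states(kept, responses)))  # ['a', 'b', 'c']
```

As in the paper's example, the antecedent can be satisfied by more states than the target set (here "c"), which is why the induced rule still needs the confidence check that follows.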

5.4. Pattern evaluation

Evaluating the interestingness of this rule requires establishing its simplicity, certainty, utility, and novelty. Its simplicity can be established through a count of the conjuncts used in the rule's antecedent; we deem two to be within our simplicity threshold. Further, we deem the rule novel, at least to us. We establish certainty by computing the rule's confidence and utility by computing its support. To compute the confidence of this rule, DSQL is used to query the database for 1) the number of states that satisfy the antecedent of the induced rule (“state has regions optimal for planting fruit trees in late April and peas & potatoes in early May”) and 2) the number of states that satisfy both the antecedent and the consequent of the induced rule (“state has regions optimal for planting fruit trees in late April, peas & potatoes in early May, and strawberries in late May”). The confidence equals the second count divided by the first count. The following DSQL query returns the first count:

    DSELECT COUNT (state) FROM us
    WHERE (crop = fruit_trees AND season = late_april) AND (crop = peaspotatoes AND season = early_may)

Figure 10. Response to query: "Which states had regions optimal for planting peas and potatoes in early May?"

The result of this query is 19, the number of states that satisfy it. The following DSQL query returns the second count:

    DSELECT COUNT (state) FROM us
    WHERE (crop = fruit_trees AND season = late_april) AND (crop = peaspotatoes AND season = early_may) AND (crop = strawberries AND season = late_may)

The result of this query is 10, the number of states that satisfy it. The confidence is therefore 10/19, or 52.6%, meaning the rule is correct about half the time. The support is computed by dividing the second count above (the number of states satisfying both the antecedent and the consequent of the rule) by the total number of states in the database. The following DSQL query returns the count of all states (including Washington, D.C.):

    DSELECT COUNT (state) FROM us

The result of this query is 49. The support is therefore 10/49, or 20.4%, meaning the rule applies to about a fifth of the database.

5.5. Knowledge presentation

To help clarify the rule, DSQL queries can be used to visualize the confidence and support of the discovered knowledge. These queries are identical to those used to determine the interestingness measures, except that they return diagrams instead of counts. The first query returns the states that satisfy the antecedent of the induced rule, as displayed in Figure 11. The second query returns the states that satisfy both the antecedent and the consequent of the rule, as displayed in Figure 2.

6. Conclusion

We have shown how a language that can be used to query diagrammatic data can be useful in knowledge discovery tasks on this data. Other related data mining approaches include data mining on spatial databases [9], multimedia databases [12], and mining the World Wide Web [14]. Our approach differs from typical spatial databases in that high-level semantics are provided for raster-level data, instead of simply descriptions of points, lines, polygons, and so on. Further, multimedia databases, while permitting queries on raster data, do so at relatively primitive levels of abstraction such as color, basic shape, and texture. Again, the higher level of abstraction provided by our approach (arising from its domain specificity) differentiates it. To conclude, this is but a step in our attempt to provide agents with diagrammatic reasoning capabilities. Such a capability will be required of autonomous agents that interact with an environment rife with such representations and with unsophisticated users whose expectations for human-like interaction are unbounded. Agents able to interact with diagrammatic information in their environment and engage in two-way diagrammatic communication with users will clearly exhibit a higher degree of autonomy and more natural human-machine interfacing than those not so able.

Figure 11. States satisfying the antecedent of the rule

7. Acknowledgments

This material is based upon work supported by the National Science Foundation under grant number IIS-9820368.

8. References

[1] Agrawal, R., Imielinski, T. and Swami, A., 1993. Database Mining: A Performance Perspective. IEEE Transactions on Knowledge and Data Engineering, 5:914-925.
[2] Agrawal, R. and Srikant, R., 1995. Mining Sequential Patterns. In Proceedings of the 1995 International Conference on Data Engineering (ICDE '95), pp. 3-14, Taipei, Taiwan, March.
[3] Anderson, M., 1999. Toward Diagram Processing: A Diagrammatic Information System. In Proceedings of the 16th National Conference on Artificial Intelligence, Orlando, FL, July.
[4] Anderson, M., 2000. Diagrammatic Reasoning and Mathematical Morphology. In Proceedings of the AAAI Spring Symposium on Smart Graphics, Stanford, CA, March.
[5] Anderson, M. and Armen, C., 1998. Diagrammatic Reasoning and Color. In Proceedings of the AAAI Fall Symposium on Formalization of Reasoning with Visual and Diagrammatic Representations, Orlando, FL, October.
[6] Anderson, M. and McCartney, R., 1995. Inter-diagrammatic Reasoning. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, Canada, August.
[7] Anderson, M. and McCartney, R., 1996. Diagrammatic Reasoning and Cases. In Proceedings of the 13th National Conference on Artificial Intelligence, Portland, OR, August.
[8] Anderson, M. and McCartney, R., 1997. Learning from Diagrams. Special issue on diagrammatic reasoning, Journal of Machine Vision and Graphics, Vol. 6, No. 1.
[9] Ester, M., Kriegel, H.-P. and Sander, J., 1997. Spatial Data Mining: A Database Approach. In Proceedings of the International Symposium on Large Spatial Databases (SSD '97), pp. 47-66, Berlin, Germany, July.
[10] Han, J. and Fu, Y., 1994. Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases. In Proceedings of the 1994 AAAI Workshop on Knowledge Discovery in Databases (KDD '94), pp. 157-168, Seattle, WA, July.
[11] Han, J. and Kamber, M., 2001. Data Mining: Concepts and Techniques. San Francisco, CA: Morgan Kaufmann.
[12] Subrahmanian, V.S., 1998. Principles of Multimedia Database Systems. San Francisco, CA: Morgan Kaufmann.
[13] Weiss, S.M. and Kulikowski, C.A., 1991. Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. San Mateo, CA: Morgan Kaufmann.
[14] Zaïane, O.R. and Han, J., 1998. Querying the World-Wide Web for Resources and Knowledge. In Proceedings of the International Workshop on Web Information and Data Management (WIDM '98), pp. 9-12, Bethesda, MD, November.
