
Digital Soil Assessments and Beyond – Minasny, Malone & McBratney (eds) © 2012 Taylor & Francis Group, London, ISBN 978-0-415-62155-7

The role of soil inference systems in digital soil assessments

J.C. Morris, B. Minasny & A.B. McBratney
Faculty of Agriculture and Environment, The University of Sydney, NSW, Australia

ABSTRACT: In order to evolve beyond Digital Soil Mapping (DSM) sensu stricto, there is a need for tools that can augment observationally and inferentially obtained soil property data without the disadvantages of extrapolation. In particular, it is highly desirable to estimate the soil property values for which direct measurements are unavailable or infeasible to obtain. The Soil Inferencing System (SINFERS) is a rule-based expert system to predict unavailable or unfeasibly obtainable soil property values from known soil property data, and its existence will help drive the aforementioned evolution. The results from SINFERS will be used to describe soil quality, to monitor the effects of agricultural management, and principally to provide information needed by policy makers concerned with sustainable land use. This paper provides an introduction to the high-level technical aspects of SINFERS, its relation to DSM, and how the work will be extended in the near future.

1 INTRODUCTION

As Sam Cooke sang, “It’s been a long time coming.” An examination of literature trends since 1990 indicates that the use of expert systems in the soil science domain, particularly rule-based expert systems and neural networks, has been increasing for nearly two decades in step with the AI Spring (the renaissance of commercial AI). New systems are often variants of, or improvements on, existing systems, but many original systems have also been added. Here we present a brief summary of systems similar to SINFERS as well as the early SINFERS theory.

1.1 Classification systems

Categorization and taxonomic classification tend to be the most common roles for such soil-domain expert systems, while systems that actually compute quantitative predictions are narrowly focused on specific outputs such as thermal properties (Tarnawski et al. 2001) or structural and mechanical properties (Kotdawala & Hossain 1994a). A treatise on the importance of soil classification expert systems can be found in Dale et al. (1989), while a typical example combining both rules and neural nets is the system by Cook (1996) for inferring soil properties from geomorphological features, which was inspired by PROSPECTOR (Campbell 1982). Other systems focus on the soil assessment realm, with remediation and erosion prevention being common topics; examples are Harris (1990) and Ross (1993). This brief survey supports the authors’ claim that, while there are numerous purpose-built systems servicing specific niches in the soil attribute and soil assessment spaces, there is a gap for a flexible, scalable system designed to supply reliably accurate soil property predictions to facilitate general assessment of a soil’s potential (Carré et al. 2007). SINFERS is the first application in its class to attempt such large-dataset soil property prediction using a rule-based approach.

1.2 SINFERS theoretical background

McBratney et al. (2002) put forward the idea of a soil inference system (SINFERS) in which the values of soil properties are inferred by the application of pedotransfer function (PTF) rules. That work also emphasized the need to avoid prediction by extrapolating from observed data, because of the large errors (uncertainties) that result from straying too far from the training data domain. This concept was further expounded by McBratney & Minasny (2004). The characterization of the prototype SINFERS, named SINFERS2, was conducted by Tranter (unpubl.) and was also described in Tranter et al. (2009). The currently in-progress upgrade to the SINFERS architecture is an extension of these theoretical works. Knowledge of how that architecture is constructed in terms of its main components is useful in understanding how SINFERS operates.

2 SINFERS GENERAL ARCHITECTURE

SINFERS extends the classic rule-based expert system architecture given in Giarratano & Riley (1994), and it uses a mixture of off-the-shelf, open-source, and custom software components. It currently consists of a simple console-based driver program that exercises the SINFERS application programming interface (API). This API forms the primary abstraction layer for all SINFERS programming, and it is built to support model-based computing in the soil property domain. API functionality is organized into three primary concerns, in order of importance: inferencing, persistence, and input/output (Fig. 1). Each of these concerns has its own internal architecture, which is specific to the current SINFERS implementation. An explanation of the purpose and function of the API gives a window into how SINFERS works.

Figure 1. High-level SINFERS architecture.

2.1 SINFERS API

At the core of the SINFERS application is its application programming interface. The API provides the building blocks from which the formal SINFERS application will be constructed, and it is the foundation on which the current implementation rests. It is based on the ontological concept of reification, and it supports all inferencing, persistence, and I/O operations in SINFERS.

2.2 Importance of reification

The SINFERS API is composed of models (Java classes) based on the most common things found in a fundamental ontology of soils and soil science (e.g., a soil profile, a soil sample, a soil property, a PTF). A canonical example of this reification of common soil science concepts into manipulable objects is the sinfers.core.PTF class. At first glance, a PTF is nothing more than the algebraic expression given by its regression, bound to its output variable, and it would be easy to assume that this could be encoded as a simple Java method in a general-purpose PTF class containing all such PTFs. However, there are many practical software engineering benefits to be gained by promoting entities such as PTFs to first-class objects. Rather than work with a Java method that simply encodes the algebraic expression of a given PTF, SINFERS passes around PTF objects, which have variables for storing their state (e.g., argument values, uncertainty, centroids, and other metadata) and methods for computing their values and uncertainties, among other behaviours. The most important benefit of constructing the API this way is that the SINFERS rule engine is able to reason over facts derived directly from Java objects, so soil property rules and control rules are derived and encoded naturally from the semantics and paradigms found in the SINFERS ontology. This highly reified, domain-aware API model allows programmers to extend SINFERS in the language of the soil science domain, which greatly enhances its reusability and portability.

2.3 Persistence layer

During normal SINFERS operation, there is a need to persist not only static data but also the dynamically serialized state of various runtime SINFERS objects. To facilitate this, SINFERS currently uses three relational databases to store records about PTFs, soil properties, and system users, respectively. Transactional I/O access to these databases is provided by classes in the SINFERS API. The PTF database contains the metadata representations of the PTFs that SINFERS “knows about” in its knowledgebase. These records are essentially the serialized form of the Java classes that model PTFs, which SINFERS lazily instantiates as PTF objects as needed at runtime. The soil property database contains static tables of useful soil properties sourced from Rayment (1992), which are cached in memory at runtime to minimize database I/O time. Finally, the user database provides a repository for login authentication and authorization data, allowing different access privileges.
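The reified PTF object and its lazy instantiation from a stored record, both described above, can be sketched together as follows. This is a minimal illustration under stated assumptions, not the SINFERS code: the class names SketchPtf and SketchPtfStore, their fields, and the use of a Supplier to stand in for a serialized database record are all hypothetical, and the linear expression in the demo is made up rather than a published PTF.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

/** Hypothetical stand-in for a reified PTF; NOT the real sinfers.core.PTF class. */
final class SketchPtf {
    final String output;          // soil property this PTF predicts
    final List<String> arguments; // soil properties it needs as inputs
    final double uncertainty;     // illustrative, fixed prediction uncertainty
    private final ToDoubleFunction<Map<String, Double>> expression;

    SketchPtf(String output, List<String> arguments, double uncertainty,
              ToDoubleFunction<Map<String, Double>> expression) {
        this.output = output;
        this.arguments = arguments;
        this.uncertainty = uncertainty;
        this.expression = expression;
    }

    /** True when every argument is available among the known properties. */
    boolean isApplicable(Map<String, Double> known) {
        return known.keySet().containsAll(arguments);
    }

    /** Evaluates the regression expression on the known property values. */
    double compute(Map<String, Double> known) {
        if (!isApplicable(known)) {
            throw new IllegalStateException("Missing inputs for " + output);
        }
        return expression.applyAsDouble(known);
    }
}

/** Hypothetical PTF store: records become objects only on first request. */
final class SketchPtfStore {
    // Each Supplier stands in for one serialized PTF record in the database.
    private final Map<String, Supplier<SketchPtf>> records = new HashMap<>();
    private final Map<String, SketchPtf> instances = new HashMap<>();
    int instantiations = 0; // counts how many PTF objects were actually built

    void register(String id, Supplier<SketchPtf> record) {
        records.put(id, record);
    }

    SketchPtf get(String id) {
        SketchPtf ptf = instances.get(id);
        if (ptf == null) {
            Supplier<SketchPtf> record = records.get(id);
            if (record == null) {
                throw new IllegalArgumentException("Unknown PTF: " + id);
            }
            ptf = record.get(); // lazy instantiation happens here
            instantiations++;
            instances.put(id, ptf);
        }
        return ptf;
    }
}

final class SketchPtfDemo {
    public static void main(String[] args) {
        SketchPtfStore store = new SketchPtfStore();
        // Made-up linear expression for illustration; not a published PTF.
        store.register("bd-from-oc", () -> new SketchPtf(
                "bulkDensity", Arrays.asList("organicCarbon"), 0.15,
                in -> 1.6 - 0.1 * in.get("organicCarbon")));

        Map<String, Double> known = new HashMap<>();
        known.put("organicCarbon", 2.0);

        SketchPtf ptf = store.get("bd-from-oc"); // built on first use
        System.out.println(ptf.output + " = " + ptf.compute(known));
        store.get("bd-from-oc");                 // served from memory
        System.out.println("instantiations = " + store.instantiations);
    }
}
```

Because the PTF carries its own argument list and uncertainty, other code can ask an object whether it is applicable instead of hard-coding that knowledge next to the arithmetic, which is the practical payoff of reification argued for above.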

2.4 Inferencing layer

A key design requirement in SINFERS development was to decouple the implementation of rules (domain knowledge) from the mechanisms used to compute PTFs, so that the two would be free to vary independently as design requirements change. SINFERS currently uses two rulebases to control the inferencing of soil properties. One rulebase contains the knowledge of which PTFs can be applied when certain soil properties are present in working memory. The other contains control rules that govern when those rules should be applied.
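The separation of the two rulebases can be illustrated with a small sketch. It assumes, purely for illustration, that an applicability rule is a list of required property names and that a control rule is a comparator over competing proposals; the names ApplicabilityRule, Proposal, and RulebaseSketch are hypothetical, and the paper does not state how the SINFERS rule engine encodes its rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical applicability rule: IF these properties are present THEN this PTF may run. */
final class ApplicabilityRule {
    final String ptfId;
    final Set<String> required;

    ApplicabilityRule(String ptfId, String... required) {
        this.ptfId = ptfId;
        this.required = new HashSet<>(Arrays.asList(required));
    }
}

/** Hypothetical candidate value proposed by one PTF for a soil property. */
final class Proposal {
    final String ptfId;
    final double value;
    final double uncertainty;

    Proposal(String ptfId, double value, double uncertainty) {
        this.ptfId = ptfId;
        this.value = value;
        this.uncertainty = uncertainty;
    }
}

final class RulebaseSketch {
    /** Rulebase 1: which PTFs can be applied to the current working memory. */
    static List<String> applicable(List<ApplicabilityRule> rules, Set<String> workingMemory) {
        List<String> ids = new ArrayList<>();
        for (ApplicabilityRule rule : rules) {
            if (workingMemory.containsAll(rule.required)) {
                ids.add(rule.ptfId);
            }
        }
        return ids;
    }

    /** Rulebase 2: a control rule, supplied separately, ranks competing proposals. */
    static Proposal choose(List<Proposal> proposals, Comparator<Proposal> controlRule) {
        return Collections.min(proposals, controlRule);
    }

    public static void main(String[] args) {
        List<ApplicabilityRule> rules = Arrays.asList(
                new ApplicabilityRule("ptfA", "clay", "sand"),
                new ApplicabilityRule("ptfB", "clay", "organicCarbon"));
        Set<String> memory = new HashSet<>(Arrays.asList("clay", "sand"));
        System.out.println(applicable(rules, memory)); // prints [ptfA]

        List<Proposal> proposals = Arrays.asList(
                new Proposal("ptfA", 1.40, 0.20),
                new Proposal("ptfC", 1.35, 0.05));
        // Swapping this comparator changes the control policy without touching rulebase 1.
        Comparator<Proposal> smallestUncertainty =
                Comparator.comparingDouble(p -> p.uncertainty);
        System.out.println(choose(proposals, smallestUncertainty).ptfId); // prints ptfC
    }
}
```

Neither method performs any PTF arithmetic, so the numerical mechanism can change without either rulebase being rewritten.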

2.5 Input/Output layer

The I/O layer contains all the mechanisms for entering input into SINFERS as well as for obtaining output. Typical output comprises a logical explanation of the results (usually a trace of the rule firings and their subjective but expert interpretations). Formatted reports of the computed soil property data are the responsibility of the reporting facility. Also included are facilities for inputting and editing PTFs and SINFERS model files. SINFERS operates on models of soil profiles structured by layers or horizons. These models are XML files whose structure is defined in the SINFERS input schema, and they are uploaded locally at runtime for processing. Multiple sites can be processed in one model file, and typical output is simply the original model file with the newly computed soil properties included.

3 RULE-BASED INFERENCING OF SOIL PROPERTIES

The current SINFERS2 (Tranter, unpubl.), on which the present work is based, eschews error-prone extrapolation in favor of an inferential approach. It can be partially characterized as following the regression-tree technique outlined in that work, but implemented using forward-chaining rules logically based on modus ponens (from α and α → β, infer β). In fact, if one were to plot the generation of new soil properties on each SINFERS computation cycle, a directed acyclic graph very similar to the one given in McBratney et al. (2002) would be obtained. A hypothetical SINFERS forward creation graph for an initial input set of three observed soil properties (SP) is shown in Figure 2. In Figure 2, three initial soil properties have been stored in SINFERS, and these are used to compute subsequent generations of new soil properties. Each SINFERS property thus has an ancestry given by its dependency tree, as shown for soil property 9 in Figure 3. Figure 3 indicates that soil property 9 is a function of soil properties 4, 5, and 6, which are second-generation properties formed after the first inferencing cycle. Each of these properties is a function of the initial soil properties 1, 2, and 3.

Figure 2. A forward-chaining soil property inferencing graph.

Figure 3. An example soil property ancestry graph.

In order to actually compute PTF values and their associated errors, it is necessary to couple knowledge about which PTF to use with the actual numerical methods for the computations. As the definitions for each PTF contain all the necessary meta-information to compute the numerical value and error for a given soil property, the only remaining procedural knowledge needed to perform the computations is the expertise of when and how to apply a particular PTF given a particular working memory state. To model that procedural knowledge, SINFERS uses forward-chaining rules that pattern-match on the argument lists of the known PTFs in its PTF database. When a proper subset of the soil property facts in working memory matches the argument list of a PTF object fact (loaded into working memory at initialization), a rule is activated that proposes the computation of the soil property associated with the PTF. When no more subsets of working memory cause PTF rules to activate in a given cycle, the SINFERS rule engine executes (fires) all the activated PTF proposal rules, thereby augmenting working memory with new candidate properties. The addition of these new properties can result in new subsets being matched to other PTF definitions, and the whole process of matching, activation, and firing repeats cyclically until SINFERS has exhausted its PTF knowledgebase. In the case where a soil property has already been computed, SINFERS will replace that property with a more certain one, provided the computation will not trigger a circular reference to another property. The current general inferencing scheme for SINFERS is summarized in Table 1.
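The match, activate, and fire cycle described above can be sketched in plain Java. This is an illustration under stated assumptions rather than the SINFERS rule engine: the class names SoilFact, PtfDefinition, and ForwardChainingSketch are hypothetical, each PTF carries a fixed uncertainty instead of one propagated from its inputs, the control rule is simply "smallest uncertainty wins", and derived values are not refreshed if one of their ancestors is later revised.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

/** Hypothetical working-memory fact: a soil property value plus provenance. */
final class SoilFact {
    final double value;
    final double uncertainty;
    final boolean observed;      // original inputs are never overwritten
    final Set<String> ancestors; // every property this value derives from

    SoilFact(double value, double uncertainty, boolean observed, Set<String> ancestors) {
        this.value = value;
        this.uncertainty = uncertainty;
        this.observed = observed;
        this.ancestors = ancestors;
    }
}

/** Hypothetical PTF definition: output, argument list, expression, uncertainty. */
final class PtfDefinition {
    final String output;
    final List<String> arguments;
    final double uncertainty; // simplification: fixed, not propagated from inputs
    final ToDoubleFunction<Map<String, SoilFact>> expression;

    PtfDefinition(String output, double uncertainty,
                  ToDoubleFunction<Map<String, SoilFact>> expression, String... arguments) {
        this.output = output;
        this.uncertainty = uncertainty;
        this.expression = expression;
        this.arguments = Arrays.asList(arguments);
    }
}

final class ForwardChainingSketch {
    /** Repeats match-and-fire cycles until no PTF can add or improve a property. */
    static Map<String, SoilFact> infer(Map<String, Double> inputs, List<PtfDefinition> ptfs) {
        Map<String, SoilFact> memory = new LinkedHashMap<>();
        for (Map.Entry<String, Double> in : inputs.entrySet()) {
            memory.put(in.getKey(), new SoilFact(in.getValue(), 0.0, true, new HashSet<>()));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            // Match: every applicable PTF proposes a candidate; keep the most
            // certain candidate per output property (a simple control rule).
            Map<String, SoilFact> proposals = new LinkedHashMap<>();
            for (PtfDefinition ptf : ptfs) {
                if (!memory.keySet().containsAll(ptf.arguments)) {
                    continue; // argument list not yet satisfied
                }
                Set<String> ancestors = new HashSet<>(ptf.arguments);
                for (String arg : ptf.arguments) {
                    ancestors.addAll(memory.get(arg).ancestors);
                }
                if (ancestors.contains(ptf.output)) {
                    continue; // circular reference: forbid the operation
                }
                SoilFact candidate = new SoilFact(
                        ptf.expression.applyAsDouble(memory), ptf.uncertainty, false, ancestors);
                SoilFact best = proposals.get(ptf.output);
                if (best == null || candidate.uncertainty < best.uncertainty) {
                    proposals.put(ptf.output, candidate);
                }
            }
            // Fire: add new properties, or revise existing ones with more certain values.
            for (Map.Entry<String, SoilFact> p : proposals.entrySet()) {
                SoilFact current = memory.get(p.getKey());
                if (current != null && current.observed) {
                    continue; // never overwrite an original input
                }
                if (current == null || p.getValue().uncertainty < current.uncertainty) {
                    memory.put(p.getKey(), p.getValue());
                    changed = true;
                }
            }
        }
        return memory;
    }

    public static void main(String[] args) {
        Map<String, Double> inputs = new LinkedHashMap<>();
        inputs.put("sp1", 1.0);
        inputs.put("sp2", 2.0);
        // Made-up expressions for illustration only; not real PTFs.
        List<PtfDefinition> ptfs = Arrays.asList(
                new PtfDefinition("sp4", 0.5, m -> m.get("sp1").value + m.get("sp2").value, "sp1", "sp2"),
                new PtfDefinition("sp9", 0.3, m -> 2.0 * m.get("sp4").value, "sp4"),
                new PtfDefinition("sp4", 0.1, m -> m.get("sp9").value / 2.0, "sp9")); // circular
        Map<String, SoilFact> result = infer(inputs, ptfs);
        System.out.println("sp9 = " + result.get("sp9").value
                + ", derived from " + result.get("sp9").ancestors);
    }
}
```

In the demo, the third definition would give sp4 a smaller uncertainty, but it is rejected because sp4 already appears in the ancestry of sp9. The loop terminates because each firing either adds a new property or strictly lowers the uncertainty of an existing one.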

Table 1. SINFERS inferencing scheme.

IF (Criteria): A new soil property can be computed
THEN (Action): Do so unless doing so would overwrite an original input soil property

IF (Criteria): Only one PTF is available to compute a soil property
THEN (Action): Use it without further consideration

IF (Criteria): More than one PTF is available to compute a soil property
THEN (Action): 1. Let each PTF propose a candidate with its computed value and uncertainty for consideration. 2. Let a control rulebase decide which candidate is best by some criterion (typically the one with the smallest uncertainty)

IF (Criteria): A soil property can be revised with a more certain value
THEN (Action): Do so unless it would cause a circular reference with an existing property’s dependencies

IF (Criteria): A soil property being considered for replacement appears anywhere in the dependency history of the PTF supplying the replacing value
THEN (Action): Forbid the operation

4 SINFERS ROLE RELATIVE TO DSM

Now that the architecture and operation of SINFERS have been established, its relevance in relation to the DSM domain can be articulated. SINFERS supplements the determination of raw data by field measurements and/or geospatial inferencing, and in doing so it provides a source of reliable input to systems at the soil assessment level. Fundamentally, SINFERS bridges the domain of DSM in the strict sense to the Digital Soil Assessment level, where predictions are transformed into actionable information. Figure 4 shows the place of SINFERS in a typical digital soil assessment roadmap, adapted from Carré et al. (2007).

Figure 4. SINFERS relationship to DSM roadmap.

5 CONCLUSIONS

Thus, by inferring new properties from a small initial input set using carefully constructed rules and more robust error prediction, SINFERS extends the utility of pedotransfer functions as defined by Bouma (1989) while furthering the suggestions outlined in Tranter (2009) and meeting the high-level objectives of McBratney & Minasny (2004). However, one historical criticism levelled against AI is that it over-promises and under-delivers, and indeed the period from 1980 to 1995, known as the AI Winter, bears sad testament to this fact. Researchers such as Dale et al. (1989) have lamented that a lack of suitable data and of scientific commitment has also contributed to the rarity of systems such as SINFERS being constructed. Those authors go on to express cautious skepticism regarding the promises of new systems, citing “a long history of using emotive words for simplistic concepts”. While these criticisms may certainly have been true in 1989, much has changed in applied AI to warrant a critical reappraisal. In the last two decades, exponentially more powerful and cheaper computing and the advent of the Internet have both greatly increased the likelihood that new information will soon be synthesizable by artificially intelligent agents. Unlike the 1980s, when there was no “killer app” for AI technology, it is now readily utilized across multiple domains and markets, the ever-expanding mobile app market being one and Apple’s SIRI being a recent example. Practical problems in semantic reasoning, filtering Big Data, and cloud computing are all catalyzing new AI development. It is entirely feasible that some exotically innovative technology like intrinsic motivation, a.k.a. artificial curiosity (Schmidhuber 1999), or IBM’s quantum chip will provide the next breakthrough, but more likely it will be a synthesis and integration of existing methods.

In summary, system designers should be sensitive to the wariness of the end-users who commission such systems, but they should also be ready to accept the engineering challenges and to employ sound engineering practices to mitigate design flaws and to guard against failed expectations. The reality of applied AI is finally catching up with its hype, and there is every reason for the soil science domain to benefit from investing in exploring applied AI’s untapped potential.

6 FUTURE WORK

In evolving beyond the current prototype, web service methods will be exposed that allow SINFERS to accept batch input of large datasets. Additional knowledge bases (modules) will be added that will provide advanced runtime error checking, fault-handling, and process monitoring. The current version supports console I/O only, but the next evolution will be a web-application featuring a rich-client web GUI. The authors are collaborating with the Australian Terrestrial Ecosystem Research Network (TERN) organization to develop a standardized SINFERS model that meets their future downstream soil assessment needs.

ACKNOWLEDGEMENT

The authors wish to acknowledge the support of CSIRO for funding this research and our colleagues at TERN for their collaboration.

REFERENCES

Campbell, A.N. 1982. Recognition of a Hidden Mineral Deposit by an Artificial Intelligence Program. Science 217(4563): 927–929.
Carré, F., McBratney, A.B., Mayr, T. & Montanarella, L. 2007. Digital soil assessments: Beyond DSM. Geoderma 142(1–2): 69–79.
Cook, S.E. 1996. A rule-based system to map soil properties. Soil Science Society of America Journal 60(6): 1893–1900.
Dale, M.B., McBratney, A.B. & Russell, J.S. 1989. On the role of expert systems and numerical taxonomy in soil classification. Journal of Soil Science 40(2): 223–234.
Giarratano, J.C. & Riley, G. 1994. Expert Systems: Principles and Programming. PWS Publishing Co.
Harris, T. 1990. A Rule-Based Expert System Approach to Predicting Waterborne Soil-Erosion. In: J. Boardman, I.D.L. Foster, J.A. Dearing (Eds.), Soil Erosion on Agricultural Land. Proceedings of British Geomorphological Research Group, Chichester, UK.
Kotdawala, S.J. & Hossain, M. 1994a. Knowledge and Data-Driven Expert-System for Soil Compaction. In: H.J. Siriwardane, M.M. Zaman (Eds.), Computer Methods and Advances in Geomechanics, pp. 465–470.
McBratney, A. & Minasny, B. 2004. Soil inference systems. In: Y. Pachepsky, W.J. Rawls (Eds.), Developments in Soil Science: pp. 323–348. Elsevier.
Rayment, G.E. 1992. Australian laboratory handbook of soil and water chemical methods / G.E. Rayment and F.R. Higginson. Australian soil and land survey handbooks; v. 3. Inkata Press, Melbourne.
Ross, J. 1993. An Expert-System for Soil-Erosion Mitigation in Logging Operations on Steep Land. AI Applications 7(4): 69–70.
Schmidhuber, J. 1999. Artificial curiosity based on discovering novel algorithmic predictability through coevolution, Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on, p. 1618, Vol. 1613.
Tarnawski, V.R., Wagner, B., Leong, W.H. & Gori, F. 2001. An expert system for estimating soil thermal and transport properties. Strojniski Vestn.-J. Mech. Eng. 47(8): 390–395.
Tranter, G. 2009. Realising the concept of a soil inference system. unpublished PhD thesis, University of Sydney, Sydney, NSW, AU.
Tranter, G., McBratney, A.B., Minasny, B. & Morris, J.C. 2009. Realising the concept of a soil inference system: A domain expert’s perspective. Proceedings of Pedometrics: Biennial Meeting of Commission 1.5 Pedometrics, Division 1 of the International Union of Soil Science.
