A Strategy for Evaluating Web-Based Discretionary Decision Support Systems

Maria Jean J. Hall 1, Andrew Stranieri 1 and John Zeleznikow 2

1 Donald Berman Laboratory for Information Technology and Law, Applied Computing Research Institute, La Trobe University, Bundoora, Victoria, Australia
[email protected];
[email protected]

2 Joseph Bell Centre for Forensic Statistics and Legal Reasoning, Faculty of Law, University of Edinburgh, Old College, South Bridge, Edinburgh, UK
[email protected]
Abstract. The World Wide Web facilitates user access to knowledge-based decision support systems. Such web-enabled systems can provide users with advice about how decision-makers exercise discretion. GetAid, developed using the web-based shell environment WebShell, is an example of a web-based decision support system operating in a discretionary legal domain. This paper presents the Context, Criteria, Contingency evaluation framework for knowledge-based systems, general in design but geared towards the evaluation of legal knowledge-based systems. Central to this framework is a hierarchical model of evaluation criteria arranged in four quadrants: verification and validation, user credibility, technical infrastructure and the impact of the system upon its environment. The framework frames an evaluation both in terms of the context of use of the system and the context of its evaluation, and includes guidelines for the selection of appropriate evaluation criteria under differing contingencies. A case study is presented describing the use of this evaluation framework in planning the evaluation of the web-deployed GetAid system.
1 Introduction
Discretion involves reasoning where a decision maker is free to select one from a number of plausible outcomes [1]. This exercise usually involves the decision maker selecting and weighting relevant factors as desired. Amongst other issues, the presence of discretion mandates special consideration when undertaking an evaluation of knowledge-based decision support systems.
The Donald Berman Laboratory for Information Technology and Law at La Trobe University has focussed upon building legal decision support systems with a variety of partners, including Victoria Legal Aid (VLA). VLA, based in Victoria, Australia, is a government-funded provider of legal services for disadvantaged clients (www.legalaid.vic.gov.au). Its goals include providing legal aid in the most effective, economic and efficient manner and pursuing innovative means of providing legal services in the community. Regulations specify a simple set of rules that govern eligibility for legal aid; however, the rules include open-textured terms such as a
client's acquittal prospects and the likely sentence. Predicting acquittal prospects or a sentence requires the exercise of some discretion. There is rarely one correct answer, and two assessors, typically lawyers, often disagree.
Expert systems are typically deployed where large knowledge bases need to be consulted and experts are not readily available. Commonly there is one correct answer and no discretion is involved. Conventional expert systems, often developed using shell environments, are executed over local or wide area networks within one organisation. The World Wide Web is, at most, a convenient mechanism for distributing the user interface, particularly if users are geographically dispersed; there is little inherent in the World Wide Web that enhances the use of knowledge-based decision support systems. However, deploying a discretionary system on the World Wide Web can lead to increased decision-making transparency, as the user, whatever their location, can now more effectively understand how a discretionary decision has been reached. Outcome consistency can also be enhanced.
Unfortunately, placing legal knowledge-based systems (KBS) on the World Wide Web is non-trivial, and there are many reasons why the majority of commercial KBS have not been designed to execute there. Few expert system shells have been developed for web environments, and those that exist are typically very expensive and beyond the reach of most users. Furthermore, traditional rule-based system architectures are not particularly well suited to web-based shells. For example, the traditional separation of domain knowledge from control knowledge [2] requires that the inference engine scan large segments of the knowledge base to find candidate rules to fire. If both the inference engine and the knowledge base reside and execute on the server, the time required for this to occur in a web-based KBS is prohibitive. This is in addition to any client/server transmission delays and the time required for the resolution of rule conflicts. The possibility that any number of users may simultaneously access a web-based knowledge-based system also places real constraints on concurrency control mechanisms.
Huntington argues that difficulties with the introduction of web-based expert systems diminish if shells are designed to execute largely on the client's machine instead of on the server [3]. Java applets are promoted to this end, e.g. the Java Expert System Shell JESS (http://herzberg.ca.sandia.gov/jess/). However, the appeal of this approach is diminished because client-side shells are difficult to realise in practice. The knowledge base and inference engine components of a KBS are typically large programs that require substantial resources and time to download. Furthermore, client-side execution is likely to be limited to users with powerful computers, restricting the universality of this approach. Although server-side applications such as Jnana (www.jnana.com) have been developed for web environments, they are typically too expensive for small to medium sized enterprises.
The development and subsequent evaluation of discretionary knowledge-based decision support systems deployed on the World Wide Web highlights issues that are less critical for conventional systems. When these systems model discretionary reasoning, they must be readily maintainable by experts. Many applications of discretionary reasoning are small and dynamic.
In order to maintain the currency of the knowledge base, such systems need to be easily modifiable, preferably by domain experts with minimal computer expertise and little training. The cost-effectiveness of smaller systems can be compromised if a knowledge engineer is required to maintain
the systems. Web-based systems also provide the opportunity to contribute to automating the knowledge acquisition process by capturing decisions made by users who are also domain experts.
Web-based systems require different evaluation strategies from conventional systems, as the method of deployment affects the evaluation criteria chosen. Maintainability, portability, machine independence and currency are often benefits of using the World Wide Web and should be considered in any evaluation. Other criteria that could be taken into account, possibly less favourably, include accessibility and availability, performance, response time, and the impact on the efficiency of other software operating concurrently in the environment. A web-based decision support system makes feasible the distribution of decision-making and the responsibility that goes hand in hand with it. The impact of these changes on clients, staff and the organisation as a whole should be evaluated. A significant issue for discretionary systems is the assessment of the validity of the system's recommendations: an evaluation strategy is required for assessing validity in the presence of outcomes that may be disputed as a result of the exercise of discretion.
Stranieri et al. discussed how to use knowledge discovery to model discretionary reasoning in Australian Family Law [4]. Their decision support system, Split-Up, used a combination of rules and knowledge discovery techniques to advise a divorcing couple as to how marital property should be distributed. The Split-Up project developed criteria for the evaluation of discretionary KBS [5]. This initial work did not focus upon the World Wide Web; however, the knowledge acquisition technique of sequenced transition networks introduced in this project later formed the basis for the development of the web-based shell WebShell.
This paper describes a server-side web-based shell, WebShell, that is both small and simple. Here the separation of domain knowledge from control knowledge that was the hallmark of traditional expert systems has been relaxed. This allows applications developed using the shell to be compact, fast and inexpensive to build. The paper continues with a brief description of the use of WebShell in the development of the GetAid application for VLA. A general evaluation framework is presented, together with an overview of its central evaluation criteria model. The application of the framework to planning the evaluation of GetAid is reported as an example of how web-based applications that deliver discretionary reasoning can be evaluated. The paper concludes with a discussion of some issues regarding the evaluation of web-enabled knowledge-based systems that arose from the GetAid evaluation.
2 The C.C.C. Evaluation Framework
The Context, Criteria, Contingency (C.C.C.) evaluation framework has been developed at the Donald Berman Laboratory [6, 7]. It is designed to be of general use when planning an evaluation, but is specifically tailorable to the evaluation of legal knowledge-based systems operating in discretionary domains. The C.C.C. framework addresses four areas:
− Context: assessing the context of system operation, i.e. usage
− Context: assessing the context of the evaluation itself
− Criteria: a hierarchical four-quadrant model of evaluation criteria
− Contingency: guidelines for the selection of appropriate criteria to satisfy varying evaluation contingencies
Initially, an evaluator needs to gain a thorough understanding of the context in which the system operates. This usage context considers the parent organisation, the application domain, management support, funding, degree of risk exposure, system resourcing and constraints, and the work environment. User issues such as values, motivation, skills, experience and training are also important. An understanding of these issues allows the Evaluator to better appreciate the system context and assists in selecting appropriate criteria to be used in the evaluation.
Before proceeding further, the evaluator must also understand the context of the evaluation itself. The evaluation context considers the feasibility of the evaluation, the resources available, the constraints applied and the autonomy permitted to the Evaluator. For example, there may be a requirement for an evaluation to conform to existing corporate evaluation procedures. The political, economic and social environment of the evaluation and the identification of relevant stakeholders are also considered. The Evaluator, in cooperation with stakeholders, frames the evaluation in terms of its philosophy, e.g. summative/formative or qualitative/quantitative, and issues of power, organisational culture, available evaluator skills and experience, resource availability and economic feasibility. The evaluation is planned in terms of when, by whom, from whose viewpoint, what, where, why and, of course, how.
With a full understanding of the context of system operation and of the evaluation itself, the evaluator can now plan the evaluation, including which criteria are suitable to assess the system. Figure 1 below illustrates the Evaluation Criteria Model included in the C.C.C. evaluation framework. This model organises potential evaluation criteria into four quadrants bounded by two axes: People/Technology and Micro/Macro (small and focussed/large and generalised). Over one thousand evaluation criteria for inclusion in the model were drawn from many sources, including a number of recognised international standards, reports in the literature and suggestions made by industry- and academic-based IT practitioners. These criteria are mapped onto the appropriate quadrant, where they are arranged in a hierarchical fashion. For example:
− The Validation and Verification (V and V) quadrant is concerned with the Micro/Technical aspects of technology and the system development process. Internal criteria include those concerned with validity and software design, whilst external criteria canvass areas such as efficiency, reliability and security. Specific evaluation criteria for KBS included in this quadrant are concerned with knowledge base validity, inferencing, learning and the provision of explanations.
− The Technical Infrastructure quadrant is concerned with Macro/Technical infrastructure requirements such as the technical fit of the system with existing systems, resource requirements and availability, portability etc.
− The User Credibility quadrant is Micro/People oriented and is subdivided into three main areas: user satisfaction, utility (fitness for purpose, usefulness) and usability (ease of use).
− The Impact quadrant is Macro/People oriented and is concerned with the impact of the system upon its environment, including tasks, people, the parent organisation and beyond.
Fig. 1. Evaluation Criteria Model (two axes, Micro/Macro and People/Technology, defining four quadrants: V and V, User Credibility, Technical Infrastructure and Impact)
Most criteria are easily categorised as belonging to a particular quadrant, where they are organised in a hierarchy comprising groups, sub-groups, sub-sub-groups and so on. For instance, the User Credibility quadrant contains groups of criteria used to assess usability, usefulness and user satisfaction. The large group of criteria addressing usability is further organised into sub-groups concerned with operability, accessibility, flexibility in use, understandability, ease of learning, assistance available, human factors and interface issues. Descending one level lower in this hierarchy, the sub-group of criteria measuring the assistance available in turn consists of sub-sub-groups covering training, documentation, in-system point-of-need support and external human help.
A few groups of criteria are not readily attributable to just one quadrant and fit more naturally on the borders between quadrants. For instance, criteria concerned with the satisfaction of user requirements lie naturally between the V and V and User Credibility quadrants, as they are concerned with usefulness. Similarly, Personal Impact criteria can be considered in both the User Credibility (user satisfaction sub-group) and Impact quadrants. The Personal Impact criteria hierarchy, when expanded, includes the sub-groups motivation and rewards, job satisfaction, productivity, changes in status, autonomy and the job, health and welfare, social interaction, participation and involvement, privacy, and the availability of personal choices.
This model suggests potential evaluation criteria but assumes that, when framing the evaluation, the evaluation team will decide on the appropriate criteria, metrics, methods and acceptance levels from amongst the many choices available. The evaluation contingency guides this choice of evaluation criteria. Contingency guidelines are still under development and have been drawn from many sources, e.g. [8-11]. The purpose of the evaluation (why), the stage of the product life cycle (when) and the capabilities of the evaluation team (by whom) all influence this choice, as does the role of the evaluation (from whose viewpoint). Gregory and Jackson suggest a more formal contingency framework [12]. The Multiview methodology also proposes differing perspectives for choosing evaluation criteria [13]. The long-term goal of the evaluation framework project is to develop guidelines that can be used to frame evaluations of legal KBS under differing contingencies.
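The quadrant-and-hierarchy organisation described above lends itself to a simple tree representation. The Python sketch below illustrates one possible encoding; the quadrant names follow Figure 1, but the class name, the sample criteria and the traversal helper are illustrative assumptions rather than part of the framework itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CriterionNode:
    """A node in the criteria hierarchy: a quadrant, group, sub-group or leaf criterion."""
    name: str
    children: List["CriterionNode"] = field(default_factory=list)

    def leaves(self):
        """Yield the leaf criteria beneath this node."""
        if not self.children:
            yield self.name
        for child in self.children:
            yield from child.leaves()


# The four quadrants of Figure 1, populated with a few of the groups mentioned
# in the text (illustrative only, not the full set of over one thousand criteria).
model = CriterionNode("Evaluation Criteria Model", [
    CriterionNode("V and V", [
        CriterionNode("Knowledge base validity"),
        CriterionNode("Inferencing"),
        CriterionNode("Provision of explanations"),
    ]),
    CriterionNode("User Credibility", [
        CriterionNode("Usability", [
            CriterionNode("Operability"),
            CriterionNode("Assistance available", [
                CriterionNode("Training"),
                CriterionNode("Documentation"),
                CriterionNode("In-system point-of-need support"),
                CriterionNode("External human help"),
            ]),
        ]),
        CriterionNode("Utility"),
        CriterionNode("User satisfaction"),
    ]),
    CriterionNode("Technical Infrastructure", [
        CriterionNode("Fit with existing systems"),
        CriterionNode("Portability"),
    ]),
    CriterionNode("Impact", [
        CriterionNode("Personal impact"),
        CriterionNode("Organisational impact"),
    ]),
])

# An evaluation team would prune this tree to suit a given contingency;
# here we simply enumerate every leaf criterion.
for criterion in model.leaves():
    print(criterion)
```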
The next sections describe the GetAid system and the use of the C.C.C. Evaluation Framework in planning its evaluation.
3 The GetAid System
The GetAid system was developed by the Donald Berman Laboratory with domain expertise provided by Victoria Legal Aid (VLA). When an applicant for legal aid approaches VLA, the application is assessed to determine eligibility for legal aid. If the applicant satisfies a financial test, the application then undergoes a merits test, which involves a prediction about the likely outcome of the case. The merits test is applied by VLA grants officers who have extensive experience in the practices of Victorian Courts. Currently VLA employs many lawyers to assess eligibility for legal aid. Their assessment integrates procedural knowledge found in regulatory guidelines with expert lawyer knowledge that involves a considerable degree of discretion. This task consumes more than 50% of VLA's operating budget, yet provides no direct services to its clients. The provision of a web-based expert system that advises potential clients of their likelihood of being granted legal aid therefore has great financial benefits for VLA, allowing it to concentrate its resources on giving legal advice and support.
To meet this need, a web-based system, GetAid, has been developed which advises solicitors and their clients as to whether the client is eligible for legal aid [14]. GetAid captures the reasoning of VLA grants assessors in a web-based system that can be used directly by VLA clients and lawyers. In this paper we discuss the on-going development of the GetAid system and a strategy to evaluate its efficiency and effectiveness that is applicable not only to GetAid but also to other knowledge-based decision support systems on the World Wide Web.
GetAid was developed using the web-based shell environment WebShell [14]. Knowledge is stored in a relational database and modelled using two representations: a variant of a standard decision tree for procedural tasks, and an argument tree for tasks that are more discretionary. This facilitates on-going maintenance. VLA regulations relating to aid for summary (lower Court) offences are encoded as a single decision tree. Nodes in the decision tree represent issues, such as the prospects for acquittal, that a domain expert is bound by regulation to consider when determining eligibility for legal aid. The determination of each issue on the decision tree is largely rule-based and involves little discretion. Rather than translate decision tree knowledge into rules, the decision trees are mapped into sets, i.e. sequenced transition networks. These sets can readily be stored in a relational database format in a way that simplifies the inference engine design.
Discretionary knowledge is modelled using argument trees [14]. This work used an argument structure based on the work of the philosopher Stephen Toulmin [15]. A recent survey reveals that the majority of researchers who have adopted the Toulmin structure have not used the original structure but variations on the original concept [16]. An analysis of those variations motivated the structure used for the argument trees in WebShell.
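As an illustration of the procedural representation described above, the sketch below stores decision-tree transitions as rows in a relational table and walks them during a consultation. WebShell's actual schema is not published in this paper, so the table, column names and example rows are invented for illustration only.

```python
import sqlite3

# Hypothetical schema: one row per transition of the decision tree.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transitions (
        node      TEXT,   -- issue currently being asked about
        answer    TEXT,   -- the user's response
        next_node TEXT    -- next issue, or a terminal outcome
    )
""")
conn.executemany(
    "INSERT INTO transitions VALUES (?, ?, ?)",
    [
        # Invented example rows; the real GetAid tree encodes VLA regulations.
        ("offence_type",        "summary",       "acquittal_prospects"),
        ("offence_type",        "indictable",    "refer_to_officer"),
        ("acquittal_prospects", "good",          "GRANT_AID"),
        ("acquittal_prospects", "poor",          "likely_sentence"),
        ("likely_sentence",     "custodial",     "GRANT_AID"),
        ("likely_sentence",     "non-custodial", "REFUSE_AID"),
    ],
)

def consult(answers, start="offence_type"):
    """Walk the transition network using a dict mapping node -> answer."""
    node = start
    while node in answers:
        row = conn.execute(
            "SELECT next_node FROM transitions WHERE node = ? AND answer = ?",
            (node, answers[node]),
        ).fetchone()
        if row is None:
            return None   # no transition defined for this answer
        node = row[0]
    return node

print(consult({"offence_type": "summary", "acquittal_prospects": "poor",
               "likely_sentence": "custodial"}))   # -> GRANT_AID
```

Because each consultation step is a single keyed lookup, the inference engine never needs to scan a large rule base, which is the simplification the relational encoding is intended to provide.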
The inference mechanism in WebShell consists of two components: a lookup table for exceptions and a weighted sum formula. Once the user has supplied values for input data items, the WebShell inference engine attempts to look up an output claim value in the lookup table of exceptions. This table stores values that are exceptions to the weighted sum formula, detected during the evaluation phase of knowledge-based system development. If no entry is found in the exception lookup table, the inference engine applies a weighted sum formula according to the weights associated with each data item. Using a lookup table to store the mapping between data values and claim values also makes possible the use of a variety of other inference mechanisms, including neural networks. However, a real-time, web-based implementation cannot rebuild a neural network for each inference without causing consultation delays. Hence, storing all possible data item inputs and corresponding claim value outputs in the lookup table enables fast inferences even when the original source was a neural network.
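A minimal sketch of this two-stage inference step follows. The data items, weights, claim values and threshold are invented; the paper does not publish WebShell's actual formula or data, so the sketch only shows the exception-table-then-weighted-sum control flow.

```python
# Illustrative data items and weights for one GetAid-style argument.
ITEM_ORDER = ["prior_convictions", "strength_of_evidence", "alibi_available"]
weights = {"prior_convictions": -0.4, "strength_of_evidence": -0.8,
           "alibi_available": 0.9}

# Exception table: full input tuples (in ITEM_ORDER) whose claim value was
# found, during evaluation, to differ from what the formula would produce.
exceptions = {
    (1, 1, 1): "poor prospects",
}

def infer_claim(inputs, threshold=0.0):
    """Two-stage inference: exception lookup first, weighted sum otherwise."""
    key = tuple(inputs[name] for name in ITEM_ORDER)
    if key in exceptions:
        return exceptions[key]
    score = sum(weights[name] * inputs[name] for name in ITEM_ORDER)
    return "good prospects" if score >= threshold else "poor prospects"

print(infer_claim({"prior_convictions": 0, "strength_of_evidence": 1,
                   "alibi_available": 1}))   # no exception, falls through to the formula
```

The same lookup-first structure is what allows a neural network (or any other inference mechanism) to be substituted offline: its input-to-claim mapping can be enumerated into the table so that consultation-time inference remains a fast lookup.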
4 Evaluating the GetAid System
The GetAid evaluation was guided by the C.C.C. Evaluation Framework described in Section 2 and involved five distinct stages:
1. Initial framing of the evaluation.
2. A laboratory-based investigation of an earlier GetAid prototype, assessing face validity and sub-system validity and conducting an input/output (I/O) comparison. Issues identified in this stage were fed back to the developers and were the catalyst for some GetAid enhancements.
3. Field tests. This stage, conducted several months later on an enhanced GetAid prototype, had two objectives: to certify the validity of decisions reached by GetAid and to further enhance GetAid with respect to its usability and the inferences it makes.
4. A technical evaluation, conducted by interview with the VLA Information Technology manager, using a checklist of criteria suggested by the evaluation model.
5. An impact assessment.

4.1 Evaluation Stage 1: Framing the Evaluation

The C.C.C. Evaluation Framework approaches the framing of an evaluation in two ways: from the context of the system's use and from the context of its evaluation. Framing the GetAid evaluation was undertaken by a committee of three: GetAid's designer, knowledge engineer and developer (the Developer), the VLA domain expert charged with representing VLA in the evaluation (the Domain Expert), and the researcher and author of the evaluation framework (the Evaluator). This three-person committee framed and planned the subsequent evaluation process guided by the C.C.C. evaluation framework. VLA sought a summative evaluation that would warrant the validity of the decisions that GetAid made, whilst the Developer was also concerned with a formative evaluation that would improve the inferences GetAid made. User satisfaction, decision credibility, and issues associated with the Internet deployment such as
availability and response time were also canvassed. Evaluation criteria were selected from all four quadrants of the evaluation model in an attempt to plan a broad-based evaluation that satisfied the requirements of all stakeholders.

4.2 Evaluation Stage 2: Face Validity, Sub-system Validity and Input/Output Validity

This stage of GetAid's evaluation was based upon ideas for the evaluation of KBS proposed by Borenstein and by O'Leary and colleagues [17, 18]. Three separate evaluation activities were carried out on an earlier GetAid prototype.
The first activity sought to determine if the prototype exhibited face validity, i.e. was correct "on the face of it". The objective was to verify consistency between the developer and user views of the GetAid requirements, i.e. to confirm that the legal aid grant eligibility problem had been correctly identified, with all factors included and the expert knowledge correctly represented. A VLA employee with expertise in assessing the eligibility of grants for legal aid and the Evaluator operated the prototype together, concluding that face validity was established, as the prototype was well structured and adequately reflected the legal aid grant eligibility domain. Some minor issues were fed back to the Developer for consideration.
Sub-system validity was considered next. The Developer partitioned the GetAid prototype into modules, each a representation of an argument generating specific outputs from given inputs. The Domain Expert identified sets of inputs and valid outputs for selected arguments. Thus the validity of GetAid sub-modules was tested with real VLA data chosen as appropriate and representative by the Domain Expert. Feedback was provided to the Developer, who then further revised the prototype.
The final evaluation activity associated with the early prototype was an Input/Output (I/O) Comparison evaluation, which viewed the prototype as an I/O transformation from input case data to an assessment of eligibility for legal aid. This evaluation sought to establish whether GetAid outcomes were comparable to those of an expert. A sample of assessors of legal aid entitlement, comprising the GetAid system, ten expert users and eight novice users, individually assessed ten cases supplied by the VLA. The objectives were to determine whether expert users agreed with each other on their assessment of legal aid eligibility for an individual case and whether, when considering a group of cases, GetAid's assessments were closer to the expert panel's assessments than to the novices' assessments of legal aid eligibility. The results of this evaluation were mixed.
− In 7 cases, there was good agreement between the ten experts who assessed the cases. On a case-by-case basis, there was closer agreement between GetAid and the experts than between GetAid and the novices.
− In 1 case, all the expert users and most of the novice users disputed GetAid's assessment. This case was referred back to the Developer who, agreeing with these user opinions, subsequently updated the exception table in GetAid's inferencing mechanism.
− In 2 cases, there was disagreement amongst the expert users on the assessment. On further examination by the Developer and the Domain Expert it was determined that these cases had a significant discretionary component in their
assessment of legal aid. It was considered reasonable that experts should be in disagreement when discretion was involved.
The results of this test, whilst extremely valuable, were not statistically significant due to the small sample size. I/O comparison testing is expensive to resource, and so this activity was not extended to more cases. However, lessons learned from this I/O comparison evaluation, particularly the part played by discretion, were incorporated into the planning of the next stage of the evaluation.

4.3 Evaluation Stage 3: Past Cases Validity and New Cases Field Trials

An enhanced GetAid prototype, updated and deployed on the Internet (http://www.ballarat.edu.au/~astranieri/webshell.html), was the subject of a two-part evaluation trial: the past cases trial, where the GetAid-assessed outcome was validated against an already known outcome, and the new cases field trial. There were three objectives to these trials:
− To confirm the validity of GetAid's legal aid entitlement assessments
− To formatively improve the assessments of legal aid eligibility made by GetAid
− To determine the usability of, and user satisfaction with, GetAid.
In the past cases trial, a stratified random sampling technique was used to select 269 past case files, all with the outcome of a previous application for legal aid recorded. Administrative officers, trained by domain experts, entered details from these files into GetAid, and GetAid's assessment of eligibility for a grant of legal aid and the original recommendation were both recorded. In 75% of the cases GetAid agreed with the original assessment. The use of administrative officers for this task was dictated by availability, privacy and economic considerations. The 25% of cases where GetAid disagreed with the original assessment were initially referred to the Developer, who corrected obvious errors in 15%. The remaining 10%, with no such error apparent, were referred to a panel of three VLA experts. This panel determined whether the administrative officers entering the data had made a mistake (so GetAid was not really in dispute with the previous assessment), the prior assessment was in error, the case was discretionary, or GetAid's assessment was in error, in which case the Developer would be asked to update GetAid's inferencing. This evaluation of previously assessed cases served both to warrant the validity of GetAid's assessments and to improve the inferences GetAid made. The percentage of disputed cases dropped over the course of the trial.
The second field trial used 383 new cases. Grants officers, VLA-employed lawyers and some private practitioners working outside the VLA spent four weeks using GetAid in the course of their normal activity of assessing the eligibility of applicants for grants of legal aid. They recorded both their personal assessment of the outcome of the request for a grant of legal aid and the decision arrived at by GetAid, which was the same in 88% of the cases. Such a comparison served to establish whether VLA should use GetAid in practice. Four times during the course of the trial the users were surveyed, canvassing their opinion as to the usability and usefulness of GetAid and their overall level of satisfaction with the software. Their opinions
changed during the course of the trial, both as they gained familiarity with GetAid and as GetAid's inferences improved due to updates by the Developer. User satisfaction evaluation criteria canvassed from the third (Micro/People) quadrant included:
− Ease of use
− Availability
− Usefulness on the job
− Navigation
− Reliability
− Usefulness as a training tool
− Ease of learning
− Response time
− Overall user satisfaction
95% of respondents found the system both easy to learn and easy to use. Respondents were also highly satisfied with the other user satisfaction criteria canvassed.

4.4 Evaluation Stage 4: Technical Feasibility

GetAid's technical feasibility was ascertained through a structured interview conducted by the Evaluator with the VLA's evaluation manager. The checklist of technical evaluation criteria from the Macro/Technical quadrant included:
− Fit with the existing VLA systems: integration and interoperability
− Efficiency and performance impact on existing systems
− Resourcing feasibility: hardware, software, web hosting, support and maintenance
− Maintainability, portability and installability
− Quality of the technical solution and opportunity for technical transfer
Results of the first four evaluation stages were combined into a report to the management of VLA for consideration in determining the future direction of granting legal aid in Victoria.
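The agreement figures quoted in Sections 4.2 and 4.3 amount to simple case-by-case comparisons between GetAid's output and a reference assessment, with disagreements routed for follow-up. The short sketch below shows one way such a tally might be computed; the record fields, outcome labels and mini-sample are illustrative assumptions, not VLA's actual trial instruments or data.

```python
def tally_agreement(cases):
    """Compare GetAid's assessment with the reference outcome for each case.

    Each case is a dict with the hypothetical fields 'getaid' and 'reference'.
    Returns the agreement rate and the disagreeing cases, which in the trial
    were referred first to the Developer and then, if no data-entry or
    inferencing error was found, to a panel of VLA experts.
    """
    agreed = [c for c in cases if c["getaid"] == c["reference"]]
    disputed = [c for c in cases if c["getaid"] != c["reference"]]
    rate = len(agreed) / len(cases) if cases else 0.0
    return rate, disputed

# Invented mini-sample; the real past cases trial used 269 files.
sample = [
    {"id": 1, "getaid": "grant",  "reference": "grant"},
    {"id": 2, "getaid": "refuse", "reference": "grant"},
    {"id": 3, "getaid": "grant",  "reference": "grant"},
    {"id": 4, "getaid": "refuse", "reference": "refuse"},
]
rate, disputed = tally_agreement(sample)
print(f"agreement: {rate:.0%}, referred for review: {[c['id'] for c in disputed]}")
```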
5 Issues Arising from the Evaluation of GetAid
The experience of conducting the evaluation of GetAid both in the laboratory and in the field highlighted issues worthy of further consideration.

5.1 Evaluation Stage 5: Evaluating System Impact

The fifth stage of evaluation, namely the impact of GetAid on its environment, has not yet been conducted. This evaluation would consider the Impact (Macro/People) quadrant, including:
− Work tasks and practices
− Individual workers' job satisfaction, job variation, motivation, productivity, autonomy and status
− Victoria Legal Aid as an organisation, including changes in processes, structure and decision-making, and the impact of GetAid on organisational productivity and goals
− The broader perspective of VLA clients, private practitioners seeking legal aid for their clients, decision-making transparency and service to justice in Victoria
− Economic issues, industrial relations, social and political factors
This stage of the evaluation may also require considering the impact of not developing the system. Such an outcome was demonstrated in another legal field, where public backlash against unfettered judicial discretion in sentencing led to the replacement of any discretion with rigid regimes in some jurisdictions [19]. Such formulae have led to unjust outcomes. The social impact of a web-based system for sentencing needs to be evaluated not only by using some of the criteria suggested above but also by considering the possible outcomes of failing to develop such a system.
The evaluation of GetAid's impact on its environment is a sensitive issue, as it must consider confidential information and has a propensity to be both socially and politically contentious. This evaluation is best conducted in-house by the management of the VLA.

5.2 Formative Evaluation of a Flexible System

One of the main discriminators between evaluations is whether the evaluation is of a formative or a summative nature [20, 21]. The objective of a formative evaluation is to learn, provide feedback, and refine and improve the product. A summative evaluation warrants (certifies) that a product has passed some preset barrier, acknowledging that it is now fit for use.
Ligezinski and colleagues propose that a computer system is flexible if it can accommodate changes in user requirements without the need for reprogramming by an Information Technology professional [22, 23]. They consider the ability to accommodate rapidly changing user requirements in a dynamic work environment to be the main advantage of a flexible system. GetAid demonstrates such flexibility, and the World Wide Web provides the opportunity for its easy and rapid deployment. Not only does the World Wide Web enhance the feasibility of deploying a flexible system, it also increases the opportunity for its formative evaluation. The GetAid inferencing mechanism, the basis of its reasoning, is centrally held on the server and was continuously improved throughout the trial as user feedback prompted changes to the exception lookup table part of the underlying knowledge base. GetAid's web-based users had immediate access to all changes. Thus the validity of GetAid's decisions increased throughout the evaluation trial. The evaluation exercise was more than a warranty of validity; it also made a significant formative contribution.
The assumption underpinning the introduction, during the trial, of an expert committee to reconsider cases where GetAid disputed a past case's assessment was the acceptance that there is often no single correct answer in discretionary reasoning. Nevertheless, the objective of the GetAid system is to produce claims that are within the range of plausibility.
GetAid can also be used in an alternative mode, not to suggest an outcome but rather to capture expert knowledge. In this mode, the system does not invoke inference mechanisms to reach conclusions for each argument but prompts an expert user to select the conclusion that they consider the most appropriate in the circumstances. Allowing an expert to use GetAid in this way automates the capture of their expert knowledge. The subsequent processing of this knowledge, using data
mining techniques such as neural networks and association rules, further refines GetAid's inference mechanisms.
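One plausible reading of this capture mode is sketched below: instead of invoking the inference step, the shell records the expert's chosen conclusion alongside the inputs, and the accumulated records can later be mined. For instance, a consistently repeated pattern that contradicts the weighted sum formula could become a new exception-table entry. The storage format, the naive promotion rule used here in place of neural network or association-rule mining, and the field names are all assumptions for illustration.

```python
from collections import defaultdict

captured = []   # (input_tuple, expert_conclusion) records gathered in capture mode

def capture(inputs, expert_conclusion, item_order):
    """Record an expert's conclusion for one argument instead of inferring it."""
    captured.append((tuple(inputs[name] for name in item_order), expert_conclusion))

def promote_exceptions(existing_exceptions, min_support=3):
    """Naive mining step: any input pattern seen at least min_support times,
    always with the same expert conclusion, becomes an exception entry."""
    conclusions_by_pattern = defaultdict(set)
    counts = defaultdict(int)
    for pattern, conclusion in captured:
        conclusions_by_pattern[pattern].add(conclusion)
        counts[pattern] += 1
    for pattern, conclusions in conclusions_by_pattern.items():
        if len(conclusions) == 1 and counts[pattern] >= min_support:
            existing_exceptions[pattern] = conclusions.pop()
    return existing_exceptions

# Example: three consistent expert decisions promote one exception entry.
order = ["prior_convictions", "strength_of_evidence", "alibi_available"]
for _ in range(3):
    capture({"prior_convictions": 1, "strength_of_evidence": 1,
             "alibi_available": 0}, "poor prospects", order)
print(promote_exceptions({}))   # {(1, 1, 0): 'poor prospects'}
```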
6 Conclusion
This paper has discussed the on-going development of the GetAid system and general strategies to evaluate its efficiency and effectiveness. The presence of discretionary decision-making, and the resultant requirement for flexibility in GetAid's inferencing mechanism, made continuous updating throughout the evaluation trial both necessary and feasible, and contributed towards the strong formative emphasis of the GetAid evaluation. Currently WebShell is being used to deploy other legal knowledge-based systems on the World Wide Web, including systems for sentencing, family law, refugee law and copyright law. The C.C.C. evaluation framework and the evaluation strategies and techniques discussed in this paper will be used to evaluate the performance of these new systems. The evaluation framework could also contribute to the evaluation of other knowledge-based systems deployed on the World Wide Web, whether or not the exercise of discretion is a consideration in their decision-making domain.

Acknowledgements

This research is funded by an Australian Research Council Grant, in collaboration with Phillips and Wilkins Barristers Solicitors, Victoria Legal Aid and Software Engineering Australia (Qld).
References

1. Dworkin, R.: Law's Empire. Duckworth, London (1986)
2. Shortliffe, E.H.: Computer Based Medical Consultations: MYCIN. Elsevier, New York (1976)
3. Huntington, D.: Web-based expert systems are on the way: Java based Web delivery. PCAI Intelligent Solutions for Desktop Computers (2000) 14(6), 34-36
4. Stranieri, A., et al.: A hybrid-neural approach to the automation of legal reasoning in the discretionary domain of family law in Australia. Artificial Intelligence and Law (1999) 7(2-3), 153-183
5. Stranieri, A. and Zeleznikow, J.: The evaluation of legal knowledge based systems. In: Seventh International Conference on Artificial Intelligence and Law, Oslo, Norway. ACM Press, New York, NY (1999)
6. Hall, M.J.J. and Zeleznikow, J.: Acknowledging insufficiency in the evaluation of legal knowledge-based systems: Strategies towards a broad-based evaluation model. In: 8th International Conference on Artificial Intelligence and Law, St Louis, MO, USA. ACM (2001)
7. Hall, M.J.J. and Zeleznikow, J.: The Context, Criteria, Contingency evaluation framework for legal knowledge-based systems. In: BIS 2002, Poznan, Poland. ACM (2002)
8. Ginsberg, M.J. and Zmud, R.W.: Evolving criteria for Information Systems assessment. In: Information Systems Assessment: Issues and Challenges: Proceedings of the IFIP WG 8.2 Working Conference on Information Systems Assessment, Noordwijkerhout, The Netherlands. North Holland, Oxford (1988)
9. Kitchenham, B.A.: Evaluating software engineering methods and tools. 2. Selecting an appropriate evaluation method: technical criteria. Sigsoft Software Engineering Notes (1996) 21(2), 11-15
10. Oppermann, R. and Reiterer, H.: Software evaluation using the 9241 evaluator. Behaviour & Information Technology (1997) 16(4-5), 232-245
11. Serafeimidis, V. and Smithson, S.: Rethinking the approaches to information systems investment evaluation. Logistics Information Management (1999) 12(1/2), 94-107
12. Gregory, A.J.U. and Jackson, M.C.: Evaluation methodologies: a system for use. Journal of the Operational Research Society (1992) 43(1), 19-28
13. Avison, D.E., Horton, J., Powell, P., Nandhakumar, J.: Incorporating evaluation in the information systems development process. In: Second European Conference on Information Technology Investment Evaluation, Henley on Thames, UK. Operational Research Society, Birmingham, UK (1995)
14. Stranieri, A. and Zeleznikow, J.: WebShell: A knowledge based shell for the world wide web. In: Proceedings of ISDSS2001 - Sixth International Conference on Decision Support Systems, Brunel University, London (2001)
15. Toulmin, S.: The Uses of Argument. Cambridge University Press, Cambridge, UK (1958)
16. Stranieri, A., Zeleznikow, J. and Yearwood, J.: Argumentation structures that integrate dialectical and monoletical reasoning. To appear in Knowledge Engineering Review (2002)
17. Borenstein, D.: Towards a practical method to validate decision support systems. Decision Support Systems (1998) 23(3), 227-239
18. O'Leary, T.J., et al.: Validating expert systems. IEEE Intelligent Systems & Their Applications (1990) 5(3), 51-58
19. Frieberg, A. and Ross, S.: Sentencing Reform and Penal Change: The Victorian Experience. Federation Press, Sydney (1999)
20. Boloix, G. and Robillard, P.N.: A software system evaluation framework. Computer (1995) 28(12), 17-26
21. Farbey, B., Land, F., Targett, D.: The moving staircase: Problems of appraisal and evaluation in a turbulent environment. Information Technology & People (1999) 12(3), 238-252
22. Ligezinski, P. and Hall, M.J.J.: Designing flexible software to accommodate dynamic user requirements: An alternative solution to a continuing IS problem. In: World Conference on Systemics, Cybernetics and Informatics ISAS '97, Caracas, Venezuela. International Institute of Informatics and Systemics (1997)
23. Woolfolk, W.W., Ligezinski, P., Johnson, B.: The problem of the dynamic organization and the static system: principles and techniques for achieving flexibility. In: Proceedings of the Twenty-Ninth Hawaii International Conference on System Sciences, vol. 3, pp. 482-491. IEEE Computer Society Press, Los Alamitos, CA, USA (1996)