Toward Adding Knowledge to Learning Algorithms for Indexing Legal Cases

Stefanie Brüninghaus and Kevin D. Ashley

Learning Research and Development Center, Intelligent Systems Program and School of Law
University of Pittsburgh, Pittsburgh, PA 15260

[email protected], [email protected]

Abstract

Case-based reasoning systems have shown great promise for legal argumentation, but their development and wider availability are still slowed by the cost of manually representing cases. In this paper, we present our recent progress toward automatically indexing legal opinion texts for a CBR system. Our system SMILE uses a classification-based approach to find abstract fact situations in legal texts. To reduce the complexity inherent in legal texts, we take the individual sentences from a marked-up collection of case summaries as examples. We illustrate how integrating a legal thesaurus and linguistic information with a machine learning algorithm can help to overcome the difficulties created by legal language. The paper discusses results from a preliminary experiment with a decision tree learning algorithm. Experiments indicate that learning on the basis of sentences, rather than full documents, is effective. They also confirm that adding a legal thesaurus to the learning algorithm leads to improved performance for some, but not all, indexing concepts.

1 Motivation

Reasoning with text has long been a focus of AI and Law research. Practitioners in the law deal with multiple documents, cases and rules, and systems are needed to help find the best cases for an attorney's need. Where this involves tasks like legal argumentation, merely retrieving documents is not sufficient: practitioners compare problems with cases, reason about the significance of cases, evaluate them, and combine multiple lines of reasoning.

AI representations of cases have been shown to be more effective than retrieval alone for supporting these tasks. Legal case-based reasoning systems have been designed and implemented for a number of different problems, for instance in the context of worker's compensation (Branting 1991), trade secret law (Aleven 1997), or tax problems (Rissland, Skalak & Friedman 1993), and they show great potential for legal argumentation in legal practice and education. These systems still, however, rely on manually indexed cases and a symbolic AI representation. Since legal cases are long and complex, manual indexing is time-consuming and expensive, and the wider availability of these systems has been prevented by the significant cost and effort of representing cases for case-base development and maintenance. Their dependence on manually indexed cases is a serious obstacle.

At this year's and the previous ICAIL conferences, many sophisticated approaches for retrieving cases for partially matched problems, like SPIRE (Daniels & Rissland 1997), and for retrieving (Smith et al. 1995; Greenleaf et al. 1997), summarizing (Moens, Uyttendaele & Dumortier 1997) or filtering (Schweighofer, Winiwarter & Merkl 1995) legal documents have been presented, to mention only a few applications. Highly successful commercial information retrieval systems like WestLaw are also available and give legal professionals access to legal texts. All these systems attempt to facilitate dealing with legal texts. Yet, since almost all cases and other materials in the law are written as text, surprisingly little research has attempted to overcome the problem that existing systems have not bridged the gap between the opinion texts and the corresponding AI representation.

In this paper, we discuss our progress toward automatically indexing legal cases for their use in our CBR system CATO. We present our system SMILE (SMart Index LEarner), which employs a machine learning approach. It uses a smaller-sized, manually-indexed collection of case squibs to help bootstrap the development and maintenance of a larger case-base.

In the following section, we will give a short introduction to our learning problem and discuss available methods for indexing. We will then show why these methods are not appropriate, and suggest alternatives. We have experimented with some of the suggested improvements and will present encouraging results.

2 Problems with Learning to Index Legal Cases for Case-Based Argumentation

When trying to convince the court to rule in a party's favor, lawyers analogize the problem to, and distinguish it from, previously decided cases. In doing so, they often compare cases in terms of prototypical fact patterns, or factors, which tend to strengthen or weaken a claim. A model of this case-based argumentation has been implemented in CATO (Aleven 1997; Aleven & Ashley 1997), an instructional environment to teach basic argumentation skills to law students. CATO uses a set of 26 abstract fact patterns, or factors, to represent cases. In addition to the factor representation of the cases, it has a Case Database of about 150 cases, a collection of case briefs, or squibs, and knowledge about eight basic argument moves and a Factor Hierarchy, which records knowledge about the significance of the factors. With these, CATO can compare and contrast cases, generate arguments about the significance of distinctions, and combine individual moves into higher-level arguments. Students use the cases in the Case Database to test generalizations, make arguments for any combination of factors, and compare their own written arguments to those generated by CATO. The instruction with CATO has been shown to be effective when compared to a human instructor.

It would be desirable to develop case bases for other domains and for wider use in legal practice and education, but the number of manually represented cases would have to be magnitudes larger than CATO's Case Database. A promising approach is to make use of recent progress in Machine Learning (ML). In the last years, learning algorithms have been used successfully to classify documents under abstract concepts (Sahami et al. 1998), and progress has been made towards extracting information from text (Craven et al. 1998). Motivated by these advances, we have taken up the idea of using ML to assign factors to legal cases based on the examples in CATO's Case Database. Each factor can be seen as a concept, and CATO's factors are the goal concepts for our learning approach; the documents in the Case Database are the positive and negative examples, depending on whether a factor had been assigned to a case or not.

Learning to classify texts has become a focus of research in other domains, and a number of existing learning algorithms have been used with success for other text collections. All of these algorithms represent documents as a bag-of-words: text is split into single words, or short sequences of words (n-grams), which are treated as independent tokens. Weights are assigned based on statistical properties of the collection and the document, and the algorithms use advanced statistical methods, for instance bayesian models, to derive a classifier over these vectors. They work best on large collections of rather simple documents, like archives of usenet news.

We found that these algorithms did not perform as we had hoped for our text corpus (Brüninghaus & Ashley 1997). For most factors, they could not learn classifiers to discriminate reliably between positive and negative instances. The two main reasons for this failure, we think, are: (1) deficiencies in the bag-of-words representation, which is not powerful enough to capture the subtleties of the language used in legal opinions, and (2) the complexity of legal opinions in relation to the small number of training instances available.
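To make the bag-of-words representation concrete, the following minimal Python sketch (an illustration written for this discussion, not code from any of the systems cited above) turns a sentence into a vector of token counts; the example sentences are taken from the factor examples discussed later in the paper:

    import re
    from collections import Counter

    def bag_of_words(text, vocabulary=None):
        # Lowercase, split into word tokens, and count them; word order,
        # negation and syntax are discarded, which is the weakness noted above.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts = Counter(tokens)
        if vocabulary is not None:
            return {word: counts.get(word, 0) for word in vocabulary}
        return dict(counts)

    print(bag_of_words("The product was unique."))
    print(bag_of_words("No other manufacturer made filled chocolate."))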

3 Factors in Opinion Texts

Trade secret law protects the owners of confidential commercial information against use by competitors, and it helps to illustrate the domain in which we seek to classify. In a lawsuit over the misappropriation of a trade secret, the court must decide whether the information was protectable as a trade secret and whether it actually was misappropriated. In its opinion, the court records the facts of the case, the applicable law, and the reasoning that justifies its decision about the outcome. The opinions are published and may be cited in subsequent cases.

The opinion texts are long, usually between two and twenty pages in length, and the trade secret claim is often not the only issue discussed. The court may deal with jurisdictional and procedural issues, or even other claims. Even within the discussion of the trade secret claim, not all passages are as relevant for deciding whether a factor applies; for instance, the court may discuss the length of an injunction or the admission of evidence. The documents therefore contain much irrelevant information that tends to mislead classifiers.

Legal opinions are also unlike "regular" English texts. They are characterized by a very complex style: judges tend to express themselves in exceptionally long sentences with many branches. They often use domain-specific expressions, and many terms have a specific meaning in a legal context; the vocabulary and word distributions differ from everyday English. One way of capturing the specific usages for a computer-based indexing technique are legal thesauri (Burton 1992; Statski 1985). A legal thesaurus lists alternative terms or phrases for a legal concept which are not readily found in an all-purpose word dictionary. As discussed below, such additional knowledge can be integrated with the learning algorithm.

3.1 Example Factor: F15 Unique-Product

Let us consider one of CATO's factors in more detail, to give a more concrete impression of what we are trying to accomplish in SMILE. In trade secret law, the allegedly misappropriated information must meet certain criteria to be protectable as a trade secret. If, for instance, a similar product is available from other manufacturers and discloses the information, the information may be generally available and may not be protected. In CATO, this fact pattern is represented by a factor favoring the plaintiff, F15, Unique-Product. The following definition is provided when working with CATO:

Plaintiff was the only manufacturer making the product.

This factor shows that the information was not known or available outside plaintiff's business. Also, it shows that plaintiff's information was valuable for plaintiff's business and that, apparently, plaintiff's product was innovative in the industry.

Some typical examples of sentences indicating that F15 applies, found in cases in CATO's Case Database, are:

- It appears that one could not order a Lynchburg Lemonade in any establishment other than that of the plaintiff. (from Mason v. Jack Daniel Distillery)
- The information is not generally known to the public nor to any of Tri-Tron's competitors. (from Tri-Tron v. Velto)
- Several features of the process were entirely unique in transistor manufacturing. (from Sperry Rand v. Rothlein)
- It has a unique design in that it has a single pole that a hunter can climb instead of having to climb the tree. (from Phillips v. Frey)
- ... that Paul Brick was a unique ... (from Innovative ... v. Bowen)

3.2 Focus on Smaller Units of Information

As the examples illustrate, the evidence for a factor is typically found in only a few places in the opinion, in the form of sentences or short passages. The relevant sentences tend to follow a small number of patterns, focus on a limited set of issues, and use relatively similar wording. Experts can relatively easily identify the passages and sentences that pertain to a factor; in fact, they often underline the relevant sentences when indexing new cases.

It is also clear from the examples, however, that deciding whether a factor applies requires more detailed information than the presence of certain content-words. One needs to look at the sentence or passage as an entire context. In legal texts, moreover, the negation or restriction of statements is very important for the legal consequences. In the Mason case example, the negation of "order" is crucial: "It appears that one could not order a Lynchburg Lemonade in any establishment other than that of the plaintiff." After all, if one could order the product somewhere else, it would not be unique. This example suggests that a better representation than weighting vectors over the content-words is needed.
4 Techniques for Finding Factors

A successful approach for learning to assign indices to legal cases has to address the problem that legal documents are very long, complex, and contain a lot of irrelevant information. It has to be applicable to rather small numbers of training instances, rather than requiring thousands of examples. It also must be possible to add background knowledge, like a legal thesaurus. In legal opinion texts, moreover, certain words that are normally treated as stopwords are too important to remove. In other domains, for instance for the classification of newswire articles, removing stopwords is sensible, because these terms will not contribute much, if at all, to the performance of a classifier; in legal texts, however, function words such as adverbs and conjunctions carry information about the role of words and about the negation or restriction of statements.

Based on the requirements outlined above, our approach in SMILE is to:

1. use sentences instead of entire documents as training examples,
2. use algorithms that learn rules, rather than weighting vectors over the entire vocabulary, and
3. add knowledge about the domain, in the form of a domain-specific thesaurus and linguistic information from parsing.

4.1 Marking up the Sentences for Factors

Rather than running the learning algorithms on full-text documents, in SMILE the sentences or passages pertaining to a factor are marked up in the text and become the training examples. This requires one to manually break up the document into sentences and to mark up the sentences referring to a factor. Though a manual process, this step adds little work for an expert indexing the cases, since the relevant passages have to be identified in any event.

Using marked-up sentences as examples has three advantages. (1) It allows focusing the learning on smaller and more relevant units of information, instead of the complete documents, which are very long and contain a lot of irrelevant information. This also has computational advantages: many learning algorithms, in particular those that take a vector over the entire vocabulary as examples, get bogged down by large numbers of possibly useless features, and reducing the complexity of the examples avoids this. (2) The use of marked-up sentences facilitates adding knowledge to the induction process; background knowledge in the form of a domain-specific thesaurus can be better applied to sentences than to full-text documents. (3) Finally, smaller examples allow learning algorithms that use a more powerful, symbolic representation.

A similar idea, namely to focus on sub-passages (although not sentences) within a document, has been used before in the SPIRE system (Daniels & Rissland 1997). The goal of SPIRE is also to assist a user in indexing cases for a case-based reasoning system. Cases in SPIRE are not indexed by factors, but represented as frames. Many of the features involve quantitative information, such as the amount of money to be paid or the length of a bankruptcy plan. For each indexing concept, SPIRE has a library of manually marked passages drawn from its case opinions. It uses this information to identify the most promising passages in a new document to be indexed. To accomplish that goal, SPIRE forwards the marked passages, as well as the document to be indexed, to the INQUERY retrieval system. With its internal relevance feedback algorithms, INQUERY generates a new query from the marked passages. This query is used to retrieve the highest ranked passages within the new document, which SPIRE presents to the user to assist in indexing the new case. The main difference from SMILE is that SPIRE is based on existing, domain-independent information retrieval methods and does not rely on background knowledge; it does not attempt to include information like a thesaurus. Unfortunately, it also does not rely on full sentences, but rather on fixed-length sequences of words. SPIRE's simplicity also leads to limitations when applied to legal texts: by itself it cannot cope with negation, the adversarial nature of legal discourse, or the problem of interpreting the evidence. For instance, SPIRE would not be able to deal with an example like "No other manufacturer ...", or with a passage where a judge discusses the evidence favoring the application of a factor before asserting a conclusion that it does not apply. This limitation is exactly what we intend to overcome, and SPIRE has in part motivated us to pursue the idea of working with full sentences.

4.2 Using an Application-Specific Thesaurus

As discussed above, legal language has a wide expressive variety; attorneys and judges appear to make an effort not to repeat themselves, and sometimes to disguise their reasoning, in drafting their decisions. Legal thesauri (Burton 1992; Statski 1985) capture this by listing synonyms for terms used in legal documents. For instance, Statski (1985) lists "covenant" as another word for "contract." Legal publishers like WestGroup maintain legal thesauri to detect synonyms and use them for purposes of expanding queries. For the experiments reported here, we have the opportunity to employ a copy of the WestLaw Thesaurus. It is organized as sets of synonyms, where each word belongs to between one and six of about 20,000 synonym sets. Some examples of synonym sets relevant to trade secret law are:

secret clandestine concealed disguised hidden undisclosed unrevealed
commodity goods inventory material merchandise stock supplies

4.3 Integrating a Rule Learning Algorithm

For finding the factors, we need an algorithm that constructs symbolic rules, or another symbolic representation like a tree structure, to determine whether a factor applies to a sentence. The rules should use only a few, highly relevant features. For Unique-Product, one such rule is: "if the sentence contains 'unique' and 'product', then F15 applies."

Algorithms based on powerful statistical models, like those described in (Brüninghaus & Ashley 1997), are less appropriate for our application, since the underlying models require large numbers of examples (in the magnitude of thousands) to work reliably. For the 120 or so cases included in the experiment reported here, we have 2200 sentences in the squibs, but only about 50 to 100 positive instances for each factor. Therefore we think a symbolic rule-learning algorithm, like ID3, is most promising; symbolic rule induction and decision tree algorithms are generally better suited for smaller-sized collections like ours. Decision tree learning algorithms have been used before to classify texts, e.g., newswire articles (Lewis & Ringuette 1994; Moulinier, Raskinis, & Ganascia 1996), and for these applications the decision tree technique has been quite successful. There, however, it has been sufficient to use only a few, obvious words; for our classification approach, more powerful patterns are necessary.

5 Design and Implementation

5.1 Mark-Ups of the Cases

For the experiments reported here, we marked up the squibs of the cases in CATO's Case Database. The sentences bracketed for a factor, [fxx ... fxx], are the positive instances for factor xx; all other sentences are negative instances. A factor can be considered to apply to a case if at least one of the case's sentences is a positive example of the factor.

An example is the Mason case (Mason v. Jack Daniel Distillery, 518 So.2d 130 (Ala.Civ.App. 1987)), which has the factors F1, Disclosure-in-Negotiations, F6, Security-Measures, F15, Unique-Product, F16, Info-Reverse-Engineerable, and F21, Knew-Info-Confidential. This is a short passage from its squib:

[f15 f16 Despite its extreme popularity (the drink comprised about one third of the sales of alcoholic drinks), no other establishment had duplicated the drink, but experts claimed it could easily be duplicated. f15 f16] In 1982, Randle, a sales representative, visited Mason's restaurant and drank Lynchburg Lemonade. [f1 Mason disclosed part of the recipe to Randle in exchange, Mason claimed, for a promise that Mason and his band would be used in a sales promotion. f1] [f21 Randle recalled having been under the impression that Mason's recipe was a "secret formula." f21]
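As a rough illustration of how such a marked-up squib could be turned into labeled training sentences, the following Python sketch assumes the bracket notation shown above, where a marked sentence is wrapped as "[f15 f16 ... f15 f16]"; it is not SMILE's actual code, and the sentence splitter is deliberately crude:

    import re

    # Matches a bracketed span: leading factor labels, the sentence text,
    # and optional repeated labels before the closing bracket.
    MARKED = re.compile(r"\[((?:f\d+\s+)+)(.*?)(?:\s+f\d+)*\s*\]", re.DOTALL)

    def split_sentences(text):
        # Crude sentence splitter, good enough for an illustration.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def labeled_examples(squib, factor):
        """Return (sentence, is_positive) pairs for one factor.

        Sentences inside a bracket whose labels include the factor are
        positive; all remaining sentences are negative."""
        examples = []
        for match in MARKED.finditer(squib):
            labels = set(match.group(1).split())
            for sentence in split_sentences(match.group(2)):
                examples.append((sentence, factor in labels))
        for sentence in split_sentences(MARKED.sub(" ", squib)):
            examples.append((sentence, False))
        return examples

    squib = ("[f15 f16 No other establishment had duplicated the drink, "
             "but experts claimed it could easily be duplicated. f15 f16] "
             "In 1982, Randle visited Mason's restaurant.")
    print(labeled_examples(squib, "f15"))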

5.2 Decision Tree Induction

These sentences are then used as training examples for a machine learning algorithm. Ideally, judges would describe the facts of a case in simple and straightforward phrases. They would always use the same words or expressions, never use negation, etc. Then the positive (+) and negative (-) examples given to a classifier might look like this:

+ The product was unique.
- His product was identical to plaintiff's.
- The recipe was always locked away in a unique safe.
- Plaintiff employed unique security measures.

If the sentences were as simple as in this example, applying a decision tree algorithm would work perfectly. In inducing the tree, an algorithm like ID3 recursively selects the attribute that best discriminates between positive and negative examples and splits up the training set, according to whether this attribute applies. The process is repeated until it has a set of only positive or negative examples. Here, ID3 would first split up the examples into those that have the word "product", and those that don't. It would then try to find a way to distinguish between the first and the second example, and select the word "unique". The corresponding decision tree is shown in Figure 1.

Figure 1: Decision tree for simple sentences

Of course, judges do not restrict their factual descriptions in this way. In the next section, we discuss how adding knowledge from a legal thesaurus and adding linguistic knowledge may help.
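The following minimal Python sketch of the ID3 idea just described (an illustration, not SMILE's implementation) treats each sentence as a set of words and recursively splits on the word with the highest information gain; ties are broken alphabetically, so on these four sentences the first split is on "product":

    import math
    from collections import Counter

    def entropy(labels):
        if not labels:
            return 0.0
        counts = Counter(labels)
        return -sum((c / len(labels)) * math.log2(c / len(labels))
                    for c in counts.values())

    def best_word(examples):
        # Pick the word whose presence/absence best separates + from -.
        labels = [label for _, label in examples]
        words = sorted({w for sentence, _ in examples for w in sentence})
        def gain(word):
            present = [l for s, l in examples if word in s]
            absent = [l for s, l in examples if word not in s]
            remainder = (len(present) * entropy(present) +
                         len(absent) * entropy(absent)) / len(examples)
            return entropy(labels) - remainder
        return max(words, key=gain)

    def id3(examples):
        # A leaf is True/False; an inner node is (word, if_present, if_absent).
        labels = [label for _, label in examples]
        if len(set(labels)) == 1:
            return labels[0]
        word = best_word(examples)
        present = [(s, l) for s, l in examples if word in s]
        absent = [(s, l) for s, l in examples if word not in s]
        if not present or not absent:
            return Counter(labels).most_common(1)[0][0]
        return (word, id3(present), id3(absent))

    train = [  # sentences as word sets; True = positive example
        ({"the", "product", "was", "unique"}, True),
        ({"his", "product", "was", "identical", "to", "plaintiff's"}, False),
        ({"the", "recipe", "was", "locked", "away", "in", "a", "unique", "safe"}, False),
        ({"plaintiff", "employed", "unique", "security", "measures"}, False),
    ]
    print(id3(train))   # splits on "product" first, then on a word that
                        # separates the remaining two examples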

5.3 Adding a Thesaurus

To illustrate our intuitions about adding a thesaurus, let's assume we have positive (+) and negative (-) examples of some concept:

+ He signed the contract.
+ He signed the covenant.
- He signed the postcard.
- He signed the book.

Half of the examples are positive, half negative. No single term can discriminate between positive and negative examples. A decision tree algorithm would create a tree as in Figure 2, branching out too much. The knowledge to recognize that covenant and contract are synonyms is missing, and there is no reliable way to make that inference. With the help of a thesaurus, however, it is possible to induce a better tree.

Figure 2: Decision tree learned without knowledge from a thesaurus

There are two ways to include the information of a thesaurus. First, we can use a thesaurus to discover synonyms while inducing a decision tree, without the need to change the representation of the examples. Instead of learning a tree where the nodes are words, we can learn a tree where the nodes are categories from the thesaurus. The relevant category in the WestLaw Thesaurus for this example is:

agreement contract covenant promise testament

If we modify the learning algorithm accordingly, ID3 will choose this category to discriminate perfectly between positive and negative examples, as shown in Figure 3. This tree will also correctly classify examples which use the term agreement instead of contract or covenant. The main benefit of having a simpler decision tree cannot be shown in such a small example. A well-known problem with ID3 is its tendency to overfit the data. In particular with many attributes available, the algorithm learns overly specific trees, which often misclassify new examples. Using the thesaurus in the way suggested in Figure 3 counteracts the overfitting, and thereby can lead to better performance.

Figure 3: Decision tree learned when knowledge from a thesaurus is added to examples

A second way to include a thesaurus is to change the representation of the examples, and expand them in advance by adding all possible synonyms before the learning algorithm is applied. Our positive examples would then appear like:

+ He signed the contract + agreement covenant promise testament.
+ He signed the covenant + agreement contract promise testament.
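A minimal Python sketch of this second way of using the thesaurus, assuming a tiny synonym table that stands in for the WestLaw Thesaurus category quoted above:

    THESAURUS = [
        {"agreement", "contract", "covenant", "promise", "testament"},
    ]

    def expand(sentence_words):
        # Add, for each word, the other members of every synonym set it
        # belongs to, so that synonymous sentences share attributes.
        expanded = set(sentence_words)
        for word in sentence_words:
            for synset in THESAURUS:
                if word in synset:
                    expanded |= synset
        return expanded

    print(expand({"he", "signed", "the", "contract"}))
    # the result also contains 'agreement', 'covenant', 'promise', 'testament'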

Now, a tree can be easily learned for the example sentences, by choosing either the term covenant or contract to distinguish between positive and negative examples. This also results in a simpler tree.

Expanding the examples in advance goes further than using the thesaurus categories while learning, because it also helps to generalize over similar words that are not in the same synonym set. Consider two sentences that contain the words "undisclosed" and "confidential", which are both indicative of the same factor. Even though the two words are not in the same synonym set, both sentences would be expanded to include overlapping synonyms, and can thus be recognized as similar by the algorithm. On the other hand, this second approach has the distinct disadvantage that it does not deal with polysemy, cases where a word has two or more possible meanings and uses. One would have to add the synonyms for all possible meanings of a word. The term "secret", for instance, has multiple meanings, and so does the term "artery", which belongs to the categories

artery capillary vein venule vessel
artery freeway highway interstate parkway

When "artery" is used in a text about traffic, it would nevertheless be expanded with the word "vein", and the sentence may then be treated as a positive example for an unrelated concept. Overgeneralizations and wrong inferences would occur and affect performance.

5.4 Design for Integrating Linguistic Information

The most promising way to include linguistic information about negation, phrases, and the role of words in sentences is to integrate a parser into the learning algorithm. While the version of SMILE evaluated here does not include linguistic information in the representation, this seems the best place to discuss the anticipated use of a parser in our program; we are currently working on this as the next phase of our experimentation.

Assume we have the examples:

+ No other manufacturer made filled chocolate.
- He was a manufacturer who made hunter stands.

The only reliable way to discriminate between the two examples is to include the fact that, in the positive sentence, "manufacturer" is modified by "no other". Words like "filled", "chocolate" and "hunter stand" most likely occur in only one case each; they would lead to overly specific trees and would be pruned away, so they are not good attributes. The negation itself has to become visible to the algorithm.

This information can be derived by parsing the sentence, for example using CMU's Link Parser (see http://www.link.cs.cmu.edu). In the parser output for the positive instance above, the words' part-of-speech is tagged and, most interesting for the task at hand, the combination "no other" is labeled as a determiner (Ds) of the noun and as an idiomatic string stored in the dictionary (IDC). From this parser output, the subject, object and verb of the sentence can be identified and used as attributes, information about phrases can be derived, and negation can be represented explicitly rather than being lost among the content-words.
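Until a full parser is integrated, a much simpler stand-in can at least make negation visible to the learner as an explicit attribute. The following Python sketch is only an approximation of the parser-based design described above, using a hand-picked list of negation cues:

    NEGATION_CUES = ("no other", " not ", " never ", "n't ")

    def features(sentence):
        # Plain word attributes plus one extra attribute flagging negation,
        # so "could not order" and "no other manufacturer" are not treated
        # like their affirmative counterparts.
        text = " " + sentence.lower().rstrip(".") + " "
        feats = {"word:" + w for w in text.split()}
        if any(cue in text for cue in NEGATION_CUES):
            feats.add("NEGATED")
        return feats

    print(features("No other manufacturer made filled chocolate."))
    print(features("He was a manufacturer who made hunter stands."))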

6 Experiment

6.1 Experimental Setup and Assumptions

To find out how well some of our ideas work, we conducted a preliminary experiment. In a simplified environment, we tested a prototype implementation of the ID3 decision tree induction algorithm, using sentences instead of full-text opinions as examples. Our goal was to find out whether performance can be improved by adding knowledge from a thesaurus, and in what ways the representation should be modified. For the time being, we did not investigate including linguistic knowledge.

For this experiment, we use a subset of CATO's factors:

- F1, Disclosure-in-Negotiations
- F6, Security-Measures
- F15, Unique-Product
- F16, Info-Reverse-Engineerable
- F18, Identical-Products
- F21, Knew-Info-Confidential

We selected these factors because we anticipated that they provide a range of difficulty for the learning algorithm. As discussed above, courts employ a set of patterns and some standard phraseology in discussing F15, Unique-Product, which we expected to be found much more easily than F6, Security-Measures. By contrast, F6 applies to a very wide variety of different fact situations, from which it may be inferred that this factor applies. It is important to note that we have omitted some factors which we believe would be even harder to learn from the available examples; F5, Agreement-not-specific, for instance, is difficult even for a human to discover, and asserting that it applies requires inferences that only very advanced students would make.

To simplify the problem further, we used CATO's squibs, rather than the full-text opinions, as training examples. The squibs are short summaries, about one page of text, whose primary function is to restate the case facts succinctly. The drafters of the squibs had CATO's factors in mind in preparing the summaries, and the squibs identify the various factors in more abstract language. Thus, finding factors in the squibs is a significantly easier problem than finding factors in the full-text opinions. We adopted this simplification to avoid getting bogged down computationally, to get a set of consistently marked-up training examples for the learning algorithm,
to show the applicability of a new method, and to satisfy our curiosity as to whether the method would work with the squibs before we undertook scaling up to the more complex opinions. We believe that the effects observed on squibs will also apply to opinions, but that experiment remains to be carried out.

Table 1: Precision and recall for finding factors in cases

6.2 Implementation

Figure 4: Relative performance improvement by adding a thesaurus

For the experiment, we implemented the basic ID3 algorithm (Mitchell 1997; Quinlan 1993) without any modifications, and without methods to prune the learned trees. To generate the examples for the experiment, we split the squibs into sentences. Depending on their markup (see Section 5.1), they were labeled as positive or negative examples. The cases are treated as binary attribute vectors, to simplify the representation. We only use the words that occur in the positive examples as attributes, which significantly decreases complexity. For this experiment, we also removed stopwords, and performed only very minor morphological corrections, like removing the plural-s suffix. In short, we have used a "bag-of-words" representation in this preliminary experiment to test the value of adding knowledge from a legal thesaurus. As described above, we plan to improve upon this representation in subsequent work.

The experiments were run as a 5-fold cross validation, as suggested, for instance, in (Quinlan 1993; Mitchell 1997). In five rounds, we left out one fifth of the examples as test data, and used the rest as training examples. This way, each sentence from the squibs was used exactly once in testing. In order to maintain a uniform class distribution, we performed the random partitioning of positive and negative examples separately.
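A minimal Python sketch of the stratified 5-fold cross-validation just described; train_and_test is a placeholder for inducing a tree on the training folds and evaluating it on the held-out fold:

    import random

    def stratified_folds(examples, k=5, seed=0):
        # Shuffle positive and negative examples separately and deal them
        # into k folds, so each fold keeps roughly the class distribution
        # of the whole collection (the partitioning described above).
        rng = random.Random(seed)
        folds = [[] for _ in range(k)]
        for label in (True, False):
            group = [ex for ex in examples if ex[1] == label]
            rng.shuffle(group)
            for i, ex in enumerate(group):
                folds[i % k].append(ex)
        return folds

    def cross_validate(examples, train_and_test, k=5):
        # k rounds; each fold is used exactly once as test data.
        folds = stratified_folds(examples, k)
        results = []
        for i in range(k):
            test = folds[i]
            train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
            results.append(train_and_test(train, test))
        return results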


6.3 Results

We were interested in the effect of two techniques, namely using marked-up sentences instead of the full documents to discover the factors, and adding a domain-specific thesaurus. The results of the experiments suggest both are beneficial.

First, using a rule-learning algorithm with marked-up sentences as examples seems to be a good approach to tackle the problem. It reduces complexity, but still, the individual sentences contain enough information to be useful as examples for the factors. In our previous experiments, where we attempted to learn factors from full-text opinions, the statistical learning methods could only learn the goal concepts to a very limited degree. Most of the time the classifiers could not discover the positive instances, which led to low precision and recall. Here, however, the decision tree algorithm could achieve precision and recall of up to 80% for finding which factor applies to a case, as shown in Table 1. Part of this is certainly due to the fact that we used only the case summaries, and not the much longer and more complex full-text opinions, as our examples in these experiments. However, in most cases the sentences marked as positive instances for a factor in the squibs are very similar to the corresponding sentences in the full-text opinions. In fact, in some cases they were copied verbatim from the full-text opinions. In the much more compact squibs, we have fewer negative examples per document, but the complexity and length of the positive examples does not differ dramatically. This certainly increases the likelihood that using marked-up sentences will also be appropriate for the full opinions.

One may notice that, even though these results are positive, there is still room for improvement. This can be expected, since in the experiments, we did not include any linguistic information. In Sections 3.2 and 5.4, we discussed why in particular negation is needed for finding factors in legal texts, and how we are planning to integrate the necessary linguistic knowledge.

Even more interesting is the next result of our experiment. It shows that adding a thesaurus helps when classifying sentences under factors, but that the usefulness depends on the factor. In Figure 4, we show the relative improvement of using the thesaurus both while (solid color) and before (striped) learning when compared to the plain decision tree algorithm. We calculated the difference in precision between the learner that used the thesaurus and the one that did not, and divided by the precision for the plain learning algorithm. The result is the relative change in precision, which allows us to compare the effects across factors. We did the same for recall, and for both ways of integrating the thesaurus. The graph indicates that for factors F15, Unique-Product, and F21, Knew-Info-Confidential, adding the thesaurus clearly improves performance. This confirms our intuitions, which we illustrated in Section 4. For F1, Disclosure-in-Negotiations, adding the thesaurus while learning is useful, while adding it before learning is without effect. For F16, Info-Reverse-Engineerable, and F18, Identical-Products, adding the thesaurus increases precision and decreases recall. It seems that the thesaurus makes the algorithm more "conservative" in discovering positive instances. The thesaurus is also not useful for factor F6, Security-Measures, which could be expected. In a commercial context, there is a wide variety of measures to keep information secret. They are often very practical matters not related to legal concepts. Therefore, a thesaurus of legal terms has little effect.
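The relative change plotted in Figure 4 can be written as a one-line computation; the numbers below are purely illustrative and are not taken from Table 1:

    def relative_change(with_thesaurus, plain):
        # e.g. for precision: (precision_with_thesaurus - precision_plain) / precision_plain
        return (with_thesaurus - plain) / plain

    print(relative_change(0.60, 0.48))   # 0.25, i.e. a 25% relative improvement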

7 Conclusion

In this paper we presented our work toward automatically indexing legal documents. Our ultimate goal is to identify certain indexing concepts in case texts to help the construction and maintenance of case-based reasoning systems. Using a collection of indexed cases as examples, we apply machine learning techniques. However, methods that were successfully used in other domains are not directly applicable, because legal documents are exceptionally long and complex. Moreover, the representation normally used for simpler texts is not powerful enough to capture the language used in case opinions.

To overcome these problems, we suggested focusing on the individual sentences related to the factors. This reduces the complexity of the examples and enables us to use a better and more powerful representation. We illustrated our ideas on adding domain knowledge in the form of a legal thesaurus and linguistic knowledge from parsing the sentences.

In a preliminary experiment, we tested whether single sentences contain enough information to be used as examples for a learning algorithm. A simple decision tree induction algorithm could learn quite well to classify single sentences under the factors, and thereby find factors in case squibs. This result is not directly comparable to the experiments we have reported before: for practical reasons, we ran the experiments on short case summaries from CATO, and not on the full-text opinions. In future work, we will test whether sentences also make better examples for full-text opinions.

The most interesting part of our experiments was whether adding a legal thesaurus would lead to performance improvements over the plain algorithms. We found that adding a thesaurus leads to better precision and recall for some factors, but we could not observe better results for all factors. The results support the intuitions discussed in the paper: for factors where the underlying fact situations are fairly concrete and uniform, adding domain knowledge leads to better precision and recall; if a factor covers a wide variety of real-world situations, the additional knowledge does not help.

Our next steps will be to integrate a parser and add linguistic information to the examples. In this way, we intend to better capture the language used in legal texts. Overall, we think that using sentences as examples, adding domain knowledge in the form of a legal thesaurus, and adding linguistic information will help toward our goal of indexing legal texts automatically.

Acknowledgements

This research has been supported by the National Science Foundation, under Grant IRI96-19713. We would like to thank West Group, and in particular Peter Jackson, for making the WestLaw Thesaurus accessible to us.

References

Aleven, V. 1997. Teaching Case-Based Argumentation through a Model and Examples. Ph.D. Dissertation, University of Pittsburgh.

Aleven, V., and Ashley, K. 1997. Evaluating a Learning Environment for Case-Based Argumentation Skills. In Proceedings of the Sixth International Conference on Artificial Intelligence and Law (ICAIL-97).

Branting, L. 1991. Building explanations from rules and structured cases. International Journal on Man-Machine Studies 34(6).

Brüninghaus, S., and Ashley, K. 1997. Using Machine Learning to Assign Indices to Textual Cases. In Proceedings of the Second International Conference on Case-Based Reasoning (ICCBR-97).

Burton, W. 1992. Legal Thesaurus. Simon & Schuster Macmillan.

Craven, M.; Freitag, D.; McCallum, A.; Mitchell, T.; Nigam, K.; and Slattery, S. 1998. Learning to Extract Symbolic Knowledge from the World Wide Web. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98).

Daniels, J., and Rissland, E. 1997. What You Saw Is What You Want: Using Cases to Seed Information Retrieval. In Proceedings of the Second International Conference on Case-Based Reasoning (ICCBR-97).

Greenleaf, G.; Mowbray, A.; King, G.; Cant, S.; and Chung, P. 1997. More Than Wishful Thinking: AustLII's Legal Inferencing via the World Wide Web. In Proceedings of the Sixth International Conference on Artificial Intelligence and Law (ICAIL-97).

Lewis, D., and Ringuette, M. 1994. A Comparison of Two Learning Algorithms for Text Categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR-94).

Mitchell, T. 1997. Machine Learning. McGraw Hill.

Moens, M.-F.; Uyttendaele, C.; and Dumortier, J. 1997. Abstracting of Legal Cases: The SALOMON Experience. In Proceedings of the Sixth International Conference on Artificial Intelligence and Law (ICAIL-97).

Moulinier, I.; Raskinis, G.; and Ganascia, J. 1996. Text Categorization: A Symbolic Approach. In Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval (SDAIR-96).

Quinlan, R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufman.

Rissland, E.; Skalak, D.; and Friedman, M. 1993. Case Retrieval Through Multiple Indexing and Heuristic Search. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93).

Sahami, M.; Craven, M.; Joachims, T.; and McCallum, A., eds. 1998. Learning for Text Categorization. Papers from the AAAI-98 Workshop. AAAI Press.

Schweighofer, E.; Winiwarter, W.; and Merkl, D. 1995. Information Filtering: The Computation of Similarities in Large Corpora of Legal Texts. In Proceedings of the Fifth International Conference on Artificial Intelligence and Law (ICAIL-95).

Smith, J.; Gelbart, D.; MacCrimmon, K.; Atherton, B.; MacLean, J.; Shinehoft, M.; and Quintana, L. 1995. Artificial Intelligence and Legal Discourse: The Flexlaw Legal Text Management System. Artificial Intelligence and Law 2(1).

Statski, W. 1985. West's Legal Thesaurus and Dictionary. West Publishing.
