Toward Adding Knowledge to Learning Algorithms for Indexing Legal Cases

Stefanie Brüninghaus
and Kevin D. Ashley
Learning Research and Development Center, Intelligent Systems Program, and School of Law, University of Pittsburgh, Pittsburgh, PA 15260
[email protected],
[email protected]
Abstract

Case-based reasoning systems have shown great promise for legal argumentation, but their development and wider availability are still slowed by the cost of manually representing cases. In this paper, we present our recent progress toward automatically indexing legal opinion texts for a CBR system. Our system SMILE uses a classification-based approach to find abstract fact situations in legal texts. To reduce the complexity inherent in legal texts, we take the individual sentences from a marked-up collection of case summaries as examples. We illustrate how integrating a legal thesaurus and linguistic information with a machine learning algorithm can help to overcome the difficulties created by legal language. The paper discusses results from a preliminary experiment with a decision tree learning algorithm. Experiments indicate that learning on the basis of sentences, rather than full documents, is effective. They also confirm that adding a legal thesaurus to the learning algorithm leads to improved performance for some, but not all, indexing concepts.
1 Motivation

Dealing with text has long been a focus of AI and Law research. Since almost all cases and other materials in the law are written, many sophisticated approaches for retrieving (Smith et al. 1995; Greenleaf et al. 1995), summarizing (Moens, Uyttendale, & Dumortier 1997), or filtering (Schweighofer, Winiwarter, & Merkl 1995) legal documents have been presented at this year's and the previous ICAIL conferences, to mention only a few applications. WestLaw and other commercially highly successful information retrieval systems are also available for legal professionals. All these systems attempt to facilitate access to legal texts in order to help find the best cases for an attorney's need. Where this involves merely retrieving documents, information retrieval methods work well. Retrieval is not sufficient, however, for more advanced reasoning like legal argumentation. Practitioners often reason about the significance of cases, evaluate cases, and combine multiple cases to support arguments.

Systems that reason with cases matched against problems are more effective for these tasks. Legal case-based reasoning systems have been designed and implemented for a number of different problems, for instance worker's compensation (Branting 1991), trade secret law (Aleven 1997), or tax problems (Rissland, Skalak, & Friedman 1997). These systems show great potential for legal practice and education. They still, however, have not bridged the gap between the opinion texts and the corresponding AI representation of cases and rules. Their dependence on manually indexed cases is a serious obstacle. Since legal cases are long and complex, manual indexing is time-consuming and expensive. The wider availability of these systems, and their wider use in legal practice and education, has been prevented by the significant cost and effort of case-base development and maintenance. Surprisingly, with a few notable exceptions (for instance, Daniels & Rissland 1997), little research has attempted to overcome this problem.

In this paper, we discuss our progress toward automatically indexing legal cases for their use in our CBR system CATO. Our system SMILE (SMart Index LEarner) employs a machine learning approach, which uses a smaller-sized, manually-indexed collection of case squibs to help bootstrap the development and maintenance of a larger case-base. In the following section, we will give a short introduction to our learning problem and discuss available indexing methods. We will then show why these methods are not appropriate, and suggest alternatives. We have experimented with some of the suggested improvements, and will present encouraging results.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. ICAIL-99, Oslo, Norway. Copyright (c) 1999 ACM 1-58113-165-8/99/6 $5.00
2 Problems with Learning to Index Legal Cases for Case-Based Argumentation

When trying to convince a court to rule in a party's favor, lawyers analogize the problem to, and distinguish it from, previously decided cases. In doing so, they often compare their client's problem with the cases in terms of prototypical fact patterns, or factors, which tend to strengthen or weaken a party's claim. A model of this case-based argumentation is implemented in CATO (Aleven & Ashley 1997; Aleven 1997), an instructional environment to teach basic argumentation skills to law students. CATO represents cases with a set of 26 abstract fact patterns, the factors. In addition to the factor representation of the cases, it has a Case Database of about 150 cases with a collection of case briefs, or squibs. Students use the cases in the Case Database to test generalizations or theories about the significance of the cases. They compare and contrast cases by means of eight basic argument moves, and compare their own written arguments to those generated by CATO. More advanced arguments can be made with CATO's Factor Hierarchy, which contains knowledge about the relation of the factors to higher-level legal issues and about the significance of distinctions between cases. The instruction with CATO has been shown to be effective when compared to a human instructor (Aleven & Ashley 1997).

It would be desirable to develop similar case bases for other domains, but manually representing cases is time-consuming and expensive. A promising approach is to make use of recent progress in Machine Learning (ML): learning algorithms have been used successfully to classify documents under abstract concepts (Sahami et al. 1998), and progress has been made towards extracting information from text (Craven et al. 1998). Motivated by these advances, we have taken up the idea of using ML to assign factors to legal cases based on the examples in CATO's Case Database.

In a lawsuit involving a trade secret claim, the court must decide whether the information was protectable as a trade secret and whether it was misappropriated. Trade secret law protects the owners of confidential commercial information against use by competitors who acquired the information by improper means. In its opinion, the court records the facts of the case, the applicable law, and the reasoning that justifies its final decision about the outcome. The opinions are published and may be cited in subsequent cases.

The opinion texts are long, usually between two and twenty pages in length, and they contain a great deal of information that is irrelevant to the trade secret claim. The court may deal with jurisdictional and procedural issues, or even other claims. Even within the discussion of the trade secret claim, not all passages are equally relevant; the court may, for instance, discuss the length of an injunction. Such irrelevant information tends to mislead classifiers and makes it more difficult to decide whether a factor applies or not.

In the last years, learning from text has become a focus in ML research. In various approaches, a number of existing learning algorithms have been used with success for other text collections. All of these algorithms represent text as a bag-of-words: the text is split into single words, or short sequences of words (n-grams), which are treated as independent tokens, and weights are assigned to the tokens based on statistical properties of the collection. The algorithms use advanced statistical methods, like bayesian models, to derive a classifier over the resulting vectors. The representation does not consider the role of words in the text; prepositions, adverbs and conjuncts are typically removed as stopwords. These methods work best on large collections of rather simple documents, like archives of usenet news.

We found that these algorithms did not perform as we had hoped for our text corpus (Brüninghaus & Ashley 1997). For most factors, they could not learn classifiers that discriminate reliably between positive and negative instances. The two main reasons for this failure, we think, are: (1) deficiencies in the bag-of-words representation, which is not powerful enough to capture the subtleties of the language used in legal opinions, and (2) the complexity of legal opinions in relation to the number of training instances available. If the number of training instances is large enough, statistical methods can filter out irrelevant information, as has been shown in other domains; for legal opinions, however, the number of training texts would have to be magnitudes larger than the number of cases in CATO's Case Database.

Legal texts also differ from such collections in other respects. Judges tend to express themselves in very complex prose, in exceptionally long sentences with many branches. They often use domain-specific expressions and a unique style. The vocabulary and word-distributions differ from "regular" English, and many terms have a specific meaning in a legal context. One way of capturing the specific usages are legal thesauri (Burton 1992; Statsky 1985). A legal thesaurus lists alternative terms or phrases for a legal concept which are not readily found in an all-purpose dictionary.
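The bag-of-words representation described above can be sketched as follows. The tokenization and raw-count weighting here are a simplified illustration of the general scheme, not the exact weighting used by any of the cited systems.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Split text into lowercase word tokens and count them,
    discarding all word order and sentence structure."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

def to_vector(bag, vocabulary):
    """Represent a document as a weight vector over a fixed vocabulary.
    Raw counts are used here; IR systems typically use tf-idf weights."""
    return [bag[word] for word in vocabulary]

doc = "The product was unique. No other product was available."
bag = bag_of_words(doc)
vector = to_vector(bag, sorted(bag))
```

A classifier then operates only on such vectors, which is exactly why the representation cannot distinguish sentences that differ only in word order or negation.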
3 Factors in Opinion Texts

As mentioned above, CATO's factors are the goal concepts for our learning approach. Each factor can be seen as a concept; the cases in CATO's Case Database, which have been manually assigned factors, provide the positive and negative examples, depending on whether the factor applies in the case. Let us consider one of CATO's factors in more detail, to give a more concrete impression of what we are trying to accomplish in SMILE.

3.1 Example Factor: F15 Unique-Product

In trade secret law, the allegedly misappropriated information must meet certain criteria to be protectable as a trade secret. If, for instance, a similar product is available from other manufacturers and discloses the information, the information may be generally available and may not be protected. In CATO, this fact pattern is represented by a factor favoring the plaintiff: F15, Unique-Product. The following definition is provided when working with CATO:

    Plaintiff was the only manufacturer making the product.

This factor shows that the information apparently was not known or available outside plaintiff's business, and that the information was valuable for plaintiff's business. Some typical examples of sentences indicating that F15 applies, found in cases in CATO's Case Database, are:

- "The information is not generally known to the public, nor to any of Tri-Tron's competitors." (from Tri-Tron v. Velto)
- "It appears that one could not order a Lynchburg Lemonade in any establishment other than that of the plaintiff." (from Mason v. Jack Daniel Distillery)
- "Several features of the process were entirely unique in transistor manufacturing." (from Sperry Rand v. Rothlein)
- "... that Paul Brick was a unique ..." (from Innovative ... v. Bowen)
- "It has a unique design in that it has a single pole that a hunter can climb instead of having to climb the tree." (from Phillips v. Frey)

3.2 Evidence for Factors

These examples illustrate how evidence relevant to a factor is typically found in a few places in an opinion, in the form of sentences or short passages. Moreover, the relevant sentences generally follow a small number of patterns, focus on a limited set of issues, and use similar wording. Experts can relatively easily identify the passages and sentences that pertain to a factor; in fact, they often underline the relevant sentences when indexing new cases.

It is also clear from the examples, however, that deciding whether a factor applies in a case requires more detailed information than the presence of particular content-words. One needs to look at the sentence or passage as an entire context. In legal texts, moreover, the negation or restriction of statements is very important for the legal consequences. In the Mason case example, the negation of "order" is crucial: "It appears that one could not order a Lynchburg Lemonade in any establishment other than that of the plaintiff." After all, if one could order the product somewhere else, it would, in effect, not be unique. This suggests that a representation more powerful than vectors over the content-words is needed, and that methods which capture at least some of the relations among the words in a sentence have to be applied.
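The point about negation can be made concrete with a toy example. After typical stopword removal (many standard stopword lists include function words like "not"), the two sentences below, hypothetical variants of the Mason quote, yield identical content-word bags, although only the first supports F15. The word list and function names are our illustration.

```python
import re

# A small stopword list; note that it contains "not", as many
# standard stopword lists do.
STOPWORDS = {"one", "could", "not", "the", "in", "any", "than", "that", "of"}

def content_word_bag(text):
    """Sorted bag of content words, as a stopword-removing
    bag-of-words representation would see the sentence."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted(w for w in words if w not in STOPWORDS)

negated = ("One could not order the product in any establishment "
           "other than that of the plaintiff.")
plain = ("One could order the product in any establishment "
         "other than that of the plaintiff.")
# Both sentences map to the same representation, so no classifier
# operating on these bags can tell them apart.
```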
4 Techniques for Finding Factors

A successful approach for learning to assign indices has to address the problem that legal documents are very long, complex, and contain a lot of irrelevant information. Based on the requirements outlined above, our approach is to:

1. use sentences, instead of entire documents, as training examples,
2. use algorithms that learn symbolic rules, rather than weighting vectors over the entire vocabulary, and
3. add knowledge about the domain, like a legal thesaurus.

4.1 Marking Up the Sentences for Factors

Rather than running the learning algorithms on full-text documents, we break up each document into smaller units, namely sentences. The sentences or passages pertaining to a factor, marked up manually in the text, become the training examples for that factor. Though a manual process, this mark-up adds little work for an expert indexing the cases since, as noted above, experts can relatively easily identify the relevant passages.

Using marked-up sentences as examples has three advantages. (1) It allows focusing on smaller and more relevant units of information. Reducing the complexity of the examples offers computational advantages: many learning algorithms (in particular those that take a vector over the entire vocabulary as input) get bogged down by large numbers of possibly useless features, but work best with large numbers of examples. Splitting the documents into sentences yields many small examples rather than a few huge ones. (2) It removes much of the irrelevant information that tends to mislead classifiers when learning is based on full-text documents. (3) Finally, it facilitates adding knowledge: a thesaurus can be applied more effectively to sentences than to entire documents, and sentences are the appropriate unit for adding linguistic information, for instance from parsing.

A similar idea, namely to focus on sub-passages (although not sentences) within a document, has been used before, in the SPIRE system (Daniels & Rissland 1997). The goal of SPIRE is also to assist a user in indexing cases for a case-based reasoning system. Cases in SPIRE are not indexed by factors, but represented as frames; many of the features involve quantitative information, such as the amount of money to be paid or the length of a bankruptcy plan. For each indexing concept, SPIRE has a library of manually marked passages drawn from its case opinions. It uses this information to identify the most promising passages in a new document to be indexed. To accomplish that goal, SPIRE forwards the marked passages, as well as the document to be indexed, to the INQUERY system. With its internal relevance feedback algorithms, INQUERY generates a new query from the marked passages; this query is then used to retrieve the highest ranked passages within the new document. SPIRE presents these passages to the user to assist in indexing the new document.

SPIRE has in part motivated our approach. The main difference from SMILE is that SPIRE is based on existing information retrieval methods: it does not rely on full sentences, but rather on fixed-length sequences of words, it does not attempt to modify the representation, and it does not rely on background knowledge. This simplicity is a strength of the approach, but it also leads to limitations. SPIRE does not address the wide expressive variety of legal language, or the problem of interpreting negation. For instance, it would not be able to deal with an example like "No other manufacturer ...", where the evidence for a factor is expressed through a negation. This limitation is exactly what we intend to overcome.
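The mark-up step of our approach can be sketched as follows, using the bracketed factor notation of the squib mark-ups (Section 5.1). The parsing code and the naive period-based sentence splitter are our illustration, not SMILE's actual mark-up processing.

```python
import re

def extract_examples(marked_text, factor):
    """Split a marked-up squib into labeled example sentences.

    Sentences wrapped in [fNN ... fNN] brackets for the given factor
    become positive examples; all remaining sentences become negative
    examples for that factor."""
    pattern = re.compile(r"\[%s(.*?)%s\]" % (factor, factor), re.DOTALL)
    positives = [m.group(1).strip() for m in pattern.finditer(marked_text)]
    # Remove the bracketed spans, then split the rest on sentence ends.
    rest = pattern.sub(" ", marked_text)
    negatives = [s.strip() for s in re.split(r"(?<=\.)\s+", rest) if s.strip()]
    return positives, negatives

squib = "The recipe was valuable. [f15 The drink was unique. f15] Mason sued."
pos, neg = extract_examples(squib, "f15")
```

Note that the same squib yields a different labeling for each factor: the mark-up for f15 contributes positives for F15 only, while those sentences count as negatives for every other factor.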
4.2 Using an Application-Specific Thesaurus

As discussed above, the vocabulary of legal opinions differs from everyday English. Judges appear to make an effort not to repeat themselves in drafting their decisions, which is probably motivated in part by the adversarial nature of legal discourse. A learning algorithm by itself can not cope with this variety; some form of a thesaurus has to be employed to detect synonyms. Domain-independent definitions of synonymity are not sufficient, because many terms have a specific meaning only in a legal context. Attorneys frequently use legal thesauri (Burton 1992; Statsky 1985), which list synonyms for terms used in legal documents; such a thesaurus may record, for instance, that "covenant" is another word for "contract."

For the experiments reported here, legal publisher WestGroup offered us a copy of the WestLaw Thesaurus, which is used internally for purposes of expanding queries. It is organized as sets of synonyms, where each word belongs to between one and six of about 20,000 synonym sets. Examples of synonym sets relevant to trade secret law are:

- secret undisclosed hidden concealed clandestine disguised unrevealed
- inventory stock supplies merchandise goods commodity
- disclosure revelation admission discovery

Given the thesaurus, we have the opportunity to include this knowledge in the learning process, as discussed in Section 5.3.

4.3 Integrating a Rule Learning Algorithm

For learning over these examples, a symbolic rule induction algorithm, like ID3 or another rule-learning technique, is most promising. Symbolic algorithms construct a tree-structure or a set of rules to determine whether a factor applies to a sentence, and the rules use only a few, highly relevant features. For example, one such rule is: "if the sentence contains 'unique' and 'product', then f15 applies."

By contrast, algorithms based on powerful statistical models, like those described in (Brüninghaus & Ashley 1997), are less appropriate for our application, since the underlying models require large numbers of examples (in the magnitude of thousands) to work reliably. For the 120 or so cases included in the experiment reported here, we have 2200 sentences in the squibs, but only about 50 to 100 positive instances for each factor. Therefore, we think, symbolic learning algorithms are better suited for smaller sized collections like ours.
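A compact sketch of the kind of symbolic induction this section advocates, over binary word-presence attributes, is given below. This is a textbook-style reconstruction of ID3, not SMILE's code; the toy sentences anticipate the simplified example used in Section 5.2.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of boolean labels."""
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def id3(examples, attributes):
    """Build a decision tree over binary word-presence attributes.

    examples: list of (token_set, label) pairs.
    Returns a label (leaf) or a triple (attribute, present_subtree, absent_subtree)."""
    labels = [lab for _, lab in examples]
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        present = [lab for toks, lab in examples if attr in toks]
        absent = [lab for toks, lab in examples if attr not in toks]
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(examples)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    present_ex = [(t, l) for t, l in examples if best in t]
    absent_ex = [(t, l) for t, l in examples if best not in t]
    if not present_ex or not absent_ex:
        return Counter(labels).most_common(1)[0][0]
    rest = [a for a in attributes if a != best]
    return (best, id3(present_ex, rest), id3(absent_ex, rest))

def classify(tree, tokens):
    """Follow the tree to a leaf for a sentence's token set."""
    while isinstance(tree, tuple):
        attr, present, absent = tree
        tree = present if attr in tokens else absent
    return tree

examples = [
    ({"the", "product", "was", "unique"}, True),
    ({"his", "product", "was", "identical", "to", "plaintiffs"}, False),
    ({"the", "recipe", "was", "locked", "in", "a", "unique", "safe"}, False),
    ({"plaintiff", "employed", "unique", "security", "measures"}, False),
]
tree = id3(examples, ["product", "unique"])
```

On these four sentences the learner first splits on "product" and then on "unique", i.e. it recovers exactly the conjunctive rule quoted above.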
5 Design and Implementation

5.1 Case Mark-Ups

For the experiments reported here, we marked up the squibs of CATO's cases. An example is the Mason case (Mason v. Jack Daniel Distillery, 518 So.2d 130 (Ala.Civ.App. 1987)), which has the factors F1, Disclosure-in-Negotiations, F6, Security-Measures, F15, Unique-Product, F16, Info-Reverse-Engineerable, and F21, Knew-Info-Confidential. This is a short passage from its squib:

    [f15 f16 Despite its extreme popularity (the drink comprised about one third of the sales of alcoholic drinks), no other establishment offered the drink, f15 f16] ... In 1982, Randle, a sales representative of the distillery, visited Mason's restaurant and drank Lynchburg Lemonade. [f1 Mason disclosed part of the recipe to Randle in exchange, Mason claimed, for a promise that Mason and his band would be used in a sales promotion. f1] [f21 Randle recalled having been under the impression that Mason's recipe was a "secret formula." f21]

The sentences bracketed [fxx ... fxx] are the positive instances for the factor Fxx; all other sentences are negative instances. A factor can be considered to apply to a case if at least one of the sentences is a positive example of the factor.

Decision tree learning algorithms have been used before to classify texts, e.g., newswire articles (Lewis & Ringuette 1994; Moulinier, Raskinis, & Ganascia 1996). For these applications, the decision tree technique has been quite successful, and it has been sufficient to use only a few, obvious words. For finding factors, as discussed above, more powerful patterns are necessary.

[Figure 1: Decision tree for simple sentences]

[Figure 2: Decision tree learned without knowledge from a thesaurus]

5.2 Decision Tree Induction
These sentences are then used as training examples for a machine learning algorithm. Ideally, judges would describe the facts of a case in simple and straightforward phrases. They would always use the same words or expressions, never use negation, etc. Then the positive (+) and negative (-) examples given to a classifier might look like this:

+ The product was unique.
- His product was identical to plaintiff's.
- The recipe was always locked away in a unique safe.
- Plaintiff employed unique security measures.

If the sentences were as simple as in this example, applying a decision tree algorithm would work perfectly. In inducing the tree, an algorithm like ID3 recursively selects the attribute that best discriminates between positive and negative examples and splits up the training set, according to whether this attribute applies. The process is repeated until it has a set of only positive or negative examples. Here, ID3 would first split up the examples into those that have the word "product", and those that don't. It would then try to find a way to distinguish between the first and the second example, and select the word "unique". The corresponding decision tree is shown in Figure 1. Of course, judges do not restrict their factual descriptions in this way. In the next section, we discuss how adding knowledge from a legal thesaurus and adding linguistic knowledge may help.

5.3 Adding a Thesaurus

To illustrate our intuitions about adding a thesaurus, let's assume we have positive (+) and negative (-) examples of some concept:

+ He signed the contract.
+ He signed the covenant.
- He signed the postcard.
- He signed the book.

Half of the examples are positive, half negative. No single term can discriminate between positive and negative examples. A decision tree algorithm would create a tree as in Figure 2, branching out too much. The knowledge to recognize that covenant and contract are synonyms is missing, and there is no reliable way to make that inference. With the help of a thesaurus, however, it is possible to induce a better tree. There are two ways to include the information of a thesaurus.

First, we can use a thesaurus to discover synonyms while inducing a decision tree, without the need to change the representation of the examples. Instead of learning a tree where the nodes are words, we can learn a tree where the nodes are categories from the thesaurus. The relevant category in the WestLaw Thesaurus for this example is:

- agreement contract covenant promise testament

If we modify the learning algorithm accordingly, ID3 will choose this category to discriminate perfectly between positive and negative examples, as shown in Figure 3. This tree will also correctly classify examples which use the term agreement instead of contract or covenant.

[Figure 3: Decision tree learned when knowledge from a thesaurus is added to examples]

The main benefit of having a simpler decision tree can not be shown in so small an example. A well-known problem with ID3 is its tendency to overfit the data. In particular with many attributes available, the algorithm learns overly specific trees, which often misclassify new examples. Using the thesaurus in the way suggested in Figure 3 counteracts the overfitting, and thereby can lead to better performance.

A second way to include a thesaurus is to change the representation of the examples, and expand them in advance by adding all possible synonyms before the learning algorithm is applied. Our positive examples would then appear like:

+ He signed the contract + agreement covenant promise testament.
+ He signed the covenant + agreement contract promise testament.
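This expansion step can be sketched as follows. The synonym set is the WestLaw category quoted above; the expansion function and the simple attribute check are our illustration, not SMILE's implementation.

```python
# Synonym category quoted from the text (WestLaw Thesaurus example).
SYNONYMS = {"agreement", "contract", "covenant", "promise", "testament"}

def tokens(sentence):
    """Lowercase word tokens of a sentence."""
    return set(sentence.lower().strip(".").split())

def expand(sentence):
    """Expand an example with all synonyms of any thesaurus term it
    contains, so 'contract' and 'covenant' examples share attributes."""
    toks = tokens(sentence)
    if toks & SYNONYMS:
        toks |= SYNONYMS
    return toks

positives = ["He signed the contract.", "He signed the covenant."]
negatives = ["He signed the postcard.", "He signed the book."]

def separating_attributes(pos, neg):
    """Attributes present in every positive and in no negative example;
    a single ID3 split on any of these classifies the set perfectly."""
    pos_sets = [expand(s) for s in pos]
    neg_sets = [expand(s) for s in neg]
    common = set.intersection(*pos_sets)
    return {a for a in common if not any(a in n for n in neg_sets)}
```

Before expansion no single attribute separates the classes; after expansion every term of the synonym set does.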
Now, a tree can be easily learned for the example sentences, by choosing either the term covenant or the term contract to distinguish between positive and negative examples. This also results in a simpler tree, and thereby can help to avoid overfitting.

This second approach goes further than using the thesaurus categories during induction. It helps to generalize in cases where two sentences contain similar words, like "undisclosed" and "confidential", which are both indicative of factor F6, Security-Measures. Although the two words are not in the same synonym set, both sentences would be expanded to include possible synonyms, and thus could be recognized as similar by the algorithm. On the other hand, this second approach has the distinct disadvantage that it does not deal with polysemy, where one word has multiple meanings and uses. One will have to add the synonyms for all possible meanings of a term. The term artery, for instance, when used in a text about traffic, would be expanded with both of the categories

- artery capillary vein venule vessel
- artery freeway highway interstate parkway

A sentence with the word artery would then be a positive example for both "highway" and "vein." Overgeneralizations and wrong inferences may occur and affect performance.

5.4 Design for Integrating Linguistic Information

The most promising way to include linguistic information is to integrate a parser into the learning algorithm, in order to capture the role of words in sentences and to represent information about phrases and negation. While the version of SMILE evaluated here does not include linguistic information in the representation, this seems the best place to discuss the anticipated use of a parser in our program. Assume we have the examples:

+ No other manufacturer made filled chocolate.
- He was a manufacturer who made hunter stands.

The only reliable way to discriminate between the two examples is to include the fact that in the positive sentence "manufacturer" is modified by "no other." We are currently working on deriving such information by parsing the sentences using CMU's Link Parser. (See http://www.link.cs.cmu.edu.) From the parser output, various information about the words' function in the description of the case facts can be derived. The subject, object and verb of the sentence are identified. Each word's part-of-speech is tagged, and, most interesting for the task at hand, the combination "no-other" is labeled as a determiner (Ds) and as an idiomatic string stored in the dictionary, modifying a noun (IDC). For instance, the phrase "filled chocolate" is indicated by an adjective link (A) between "filled" and "chocolate". To avoid learning overly specific trees, attributes that occur in only one case will have to be pruned away. (The terms "filled chocolate" and "hunter stand", for example, would be deleted from the available attributes.)

6 Experiment

To find out how well some of our ideas work, we conducted a preliminary experiment. In a simplified environment, we tested a prototype implementation of the ID3 decision tree induction algorithm. Our goal was to find out whether using sentences instead of full-text opinions as training examples is effective, and whether there would be any benefit from adding a thesaurus. For the time being, we did not investigate in what ways the representation can be improved by including linguistic knowledge; this will be the next phase of our experimentation.

6.1 Experimental Setup and Assumptions

For this experiment, we use a subset of CATO's factors:

- F1, Disclosure-In-Negotiations
- F6, Security-Measures
- F15, Unique-Product
- F16, Info-Reverse-Engineerable
- F18, Identical-Products
- F21, Knew-Info-Confidential

We have selected these factors because we anticipated that they provide a range of difficulty for the learning algorithm. We expected, for example, that F15, Unique-Product, would be found much more easily than F6, Security-Measures. As discussed above, courts employ a small set of patterns and some standard phraseology in discussing a product's uniqueness. By contrast, F6 can be found in a very wide variety of different fact situations from which it may be inferred that F6 applies. It is important to note that we have omitted some factors which would be even harder to learn from examples. F5, Agreement-Not-Specific, for instance, is difficult even for a human to discover; asserting that it applies requires very advanced inferences, so it is not a good candidate for learning from examples.

To simplify the problem further, we used CATO's squibs, rather than the full-text opinions, as training examples. The squibs are short summaries, about one page of text. Their primary function is to restate the case facts succinctly. The drafters of the squibs had CATO's factors in mind in preparing the summaries, and the squibs identify the relevant facts more clearly, in more abstract language. Thus, finding factors in the squibs is a significantly easier problem than finding the factors in the full-text opinions. We adopted this simplification to avoid getting bogged down computationally, to show the applicability of a new method, to get a set of consistently marked-up training examples, and to satisfy our curiosity as to whether the method would work with the squibs before we undertook scaling up to the more complex opinions. We believe that the effects observed on squibs will also apply to opinions, but that experiment remains to be carried out.

[Table 1: Precision and recall for finding factors in cases]
R1
LB
$18
Implementation Figure 4: Relative performance improvement by adding a thesaurus
For the experiment, we implemented the basic ID3 algorithm (Mitchell 1997; Quinlan 1993) without any modifications, and without methods to prune the learned trees. To generate the examples for the experiment, we split the squibs into sentences. Depending on their markup (see Section 5.1), they were labeled as positive or negative examples. The cases are treated as binary attribute vectors, to simplify the representation. We only use the words that occur in the positive examples as attributes, which significantly decreases complexity. For this experiment, we also removed stopwords, and performed only very minor morphological corrections, like removing the plural-s suffix. In short, we have used a "bag-of-words" representation in this preliminary experiment to test the value of adding knowledge from a legal thesaurus. As described above, we plan to improve upon this representation in subsequent work.

The experiments were run as a 5-fold cross-validation, as suggested, for instance, in (Quinlan 1993; Mitchell 1997). In five rounds, we left out one fifth of the examples as test data, and used the rest as training examples. This way, each sentence from the squibs was used exactly once in testing. In order to maintain a uniform class distribution, we performed the random partitioning of positive and negative examples separately.

6.3 Results

We were interested in the effect of two techniques, namely using marked-up sentences instead of the full documents to discover the factors, and adding a domain-specific thesaurus. The results of the experiments suggest both are beneficial.

First, using a rule-learning algorithm with marked-up sentences as examples seems to be a good approach to tackle the problem. It reduces complexity, but still, the individual sentences contain enough information to be useful as examples for the factors. In our previous experiments, where we attempted to learn factors from full-text opinions, the statistical learning methods could only learn the goal concepts to a very limited degree. Most of the time, the classifiers could not discover the positive instances, which led to low precision and recall. Here, however, the decision tree algorithm could achieve precision and recall of up to 80% for finding which factor applies to a case, as shown in Table 1. Part of this is certainly due to the fact that we used only the case summaries, and not the much longer and more complex full-text opinions, as our examples in these experiments. However, in most cases the sentences marked as positive instances for a factor in the squibs are very similar to the corresponding sentences in the full-text opinions. In fact, in some cases they were copied verbatim from the full-text opinions. In the much more compact squibs, we have fewer negative examples per document, but the complexity and length of the positive examples does not differ dramatically. This certainly increases the likelihood that using marked-up sentences will also be appropriate for the full opinions.

One may notice that, even though these results are positive, there is still room for improvement. This can be expected, since in the experiments, we did not include any linguistic information. In Sections 3.2 and 5.4, we discussed why in particular negation is needed for finding factors in legal texts, and how we are planning to integrate the necessary linguistic knowledge.

Even more interesting is the next result of our experiment. It shows that adding a thesaurus helps when classifying sentences under factors, but that the usefulness depends on the factor. In Figure 4, we show the relative improvement from using the thesaurus both while (solid color) and before (striped) learning, compared to the plain decision tree algorithm. We calculated the difference in precision between the learner that used the thesaurus and the one that did not, and divided by the precision for the plain learning algorithm. The result is the relative change in precision, which allows us to compare the effects across factors. We did the same for recall, and for both ways of integrating the thesaurus.

The graph indicates that for factors F15, Unique-Product, and F21, Knew-Info-Confidential, adding the thesaurus clearly improves performance. This confirms our intuitions, which we illustrated in Section 4. For F21, Knew-Info-Confidential, adding the thesaurus while learning is useful, while adding it before learning is without effect. For F16, Info-Reverse-Engineerable, and F18, Identical-Products, adding the thesaurus increases precision, and decreases recall. It seems that the thesaurus makes the algorithm more "conservative" in discovering positive instances. The thesaurus is also not useful for factor F6, Security-Measures, which could be expected. In a commercial context, there is a wide variety of measures to keep information secret. They are often very practical matters not related to legal concepts. Therefore, a thesaurus of legal terms has little effect.
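The setup described in Section 6.2, together with the relative-change metric of Figure 4, can be approximated in a few lines. The sketch below is a simplified, deterministic rendering: the function names, the toy stopword list, and the round-robin (rather than random) class-wise partitioning are our assumptions, not the authors' code:

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in"}  # toy list, not the authors'

def tokenize(sentence):
    """Lowercase, strip a plural-s suffix, and drop stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    words = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]
    return [w for w in words if w not in STOPWORDS]

def build_examples(labeled_sentences):
    """labeled_sentences: (sentence, is_positive) pairs from marked-up squibs.
    Only words occurring in positive examples become attributes."""
    vocab = sorted({w for s, pos in labeled_sentences if pos for w in tokenize(s)})
    examples = []
    for s, pos in labeled_sentences:
        present = set(tokenize(s))
        examples.append((tuple(int(w in present) for w in vocab), pos))  # binary vector
    return vocab, examples

def five_folds(examples):
    """Partition positives and negatives separately so each fold keeps a
    uniform class distribution (deterministic stand-in for random splits)."""
    folds = [[] for _ in range(5)]
    for label in (True, False):
        group = [e for e in examples if e[1] == label]
        for i, ex in enumerate(group):
            folds[i % 5].append(ex)
    return folds

def relative_change(with_thesaurus, plain):
    """Relative improvement in precision (or recall), as plotted in Figure 4."""
    return (with_thesaurus - plain) / plain
```

With a tree learner plugged in, each fold would in turn serve as test data while the other four folds train the classifier.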
7 Conclusion

In this paper, we presented our work toward automatically indexing legal documents. Our ultimate goal is to identify certain concepts in case texts to help the construction and maintenance of a collection of indexed cases for case-based reasoning systems. Using marked-up cases as examples, we apply machine learning techniques. However, methods that were successfully used in other domains are not directly applicable, because legal documents are exceptionally long and complex. Moreover, the representation normally used for simpler texts is not powerful enough to capture the language used in case opinions.

To overcome these problems, we suggested focusing on individual sentences related to the factors. This reduces the complexity of the examples, and enables us to use a better and more powerful representation. We illustrated our ideas on adding domain knowledge in the form of a legal thesaurus and linguistic knowledge to the learning algorithm.

In a preliminary experiment, we tested whether single sentences contain enough information to be used as examples for a learning algorithm, and whether a simple decision tree induction algorithm could learn to classify sentences under the factors. For practical reasons, we ran the experiments on short case summaries from CATO, and not the full-text opinions; this result is therefore not directly comparable to the results we reported for full-text opinions before. We found that the algorithm could learn quite well to find factors in case squibs.

Although the results support the intuitions discussed in the paper, we did not observe better results for all factors. For factors where the underlying fact situations are fairly concrete and uniform, adding a legal thesaurus leads to better precision and recall. If a factor covers a wide variety of real-world situations, the additional knowledge does not help.

In future work, we will test whether using single sentences is also appropriate for full-text opinions. Our next steps will be to integrate a parser and add linguistic information to the examples. In this way, we intend to capture the language used in legal texts. Overall, we think that using sentences as examples, adding domain knowledge in the form of a legal thesaurus, and adding linguistic information will help toward our goal of indexing legal texts automatically.

Acknowledgements

This research has been supported by the National Science Foundation under Grant IRI96-19713. We would like to thank West Group, and in particular Peter Jackson, for making the WestLaw Thesaurus accessible to us.

References

Aleven, V. 1997. Teaching Case-Based Argumentation through a Model and Examples. Ph.D. Dissertation, University of Pittsburgh.

Aleven, V., and Ashley, K. 1997. Evaluating a Learning Environment for Case-Based Argumentation Skills. In Proceedings of the Sixth International Conference on Artificial Intelligence and Law (ICAIL-97).

Branting, L. 1991. Building explanations from rules and structured cases. International Journal on Man-Machine Studies 34(6).

Brüninghaus, S., and Ashley, K. 1997. Using Machine Learning for Assigning Indices to Textual Cases. In Proceedings of the Second International Conference on Case-Based Reasoning (ICCBR-97).

Burton, W. 1992. Legal Thesaurus. Simon & Schuster Macmillan.

Craven, M.; Freitag, D.; McCallum, A.; Mitchell, T.; Nigam, K.; and Slattery, S. 1998. Learning to Extract Symbolic Knowledge from the World Wide Web. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98).

Daniels, J., and Rissland, E. 1997. What you saw is what you want: Using cases to seed information retrieval. In Proceedings of the Second International Conference on Case-Based Reasoning (ICCBR-97).

Greenleaf, G.; Mowbray, A.; King, G.; Cant, S.; and Chung, P. 1997. More Than Wyshful Thinking: AustLII's Legal Inferencing via the World Wide Web. In Proceedings of the Sixth International Conference on Artificial Intelligence and Law (ICAIL-97).

Lewis, D., and Ringuette, M. 1994. A comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR-94).

Mitchell, T. 1997. Machine Learning. McGraw Hill.

Moens, M.-F.; Uyttendaele, C.; and Dumortier, J. 1997. Abstracting of Legal Cases: The SALOMON Experience. In Proceedings of the Sixth International Conference on Artificial Intelligence and Law (ICAIL-97).

Moulinier, I.; Raskinis, G.; and Ganascia, J. 1996. Text categorization: A symbolic approach. In Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval (SDAIR-96).

Quinlan, R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufman.

Rissland, E.; Skalak, D.; and Friedman, T. 1993. Case Retrieval Through Multiple Indexing and Heuristic Search. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93).

Sahami, M.; Craven, M.; Joachims, T.; and McCallum, A., eds. 1998. Learning for Text Categorization. Papers from the AAAI-98 Workshop. AAAI Press.

Schweighofer, E.; Winiwarter, W.; and Merkl, D. 1995. Information Filtering: The Computation of Similarities in Large Corpora of Legal Texts. In Proceedings of the Fifth International Conference on Artificial Intelligence and Law (ICAIL-95).

Smith, J.; Gelbart, D.; McCrimmon, K.; Atherton, B.; MacLean, J.; Shinehoft, M.; and Quintana, L. 1995. Artificial Intelligence and Legal Discourse: The FLEXLAW Legal Text Management System. Artificial Intelligence and Law 2(1).

Statsky, W. 1985. West's Legal Thesaurus and Dictionary. West Publishing.