Canada. N2L. 3G1. Systems. Current hypertext systems have no intelligent ...... Apple. Computer. Corporation. 1988. Hypercard. Script. Language. Guide:.
SIGDOC
Graph-Based
Retrieval Yuri
of Information
Quintana,
Mohamed
Department
of
Waterloo,
Current
hypertext
systems
to browsing),
them
farther
text
Retrieval
away
from
the information
System)
is comprised
means
users can get disoriented that
allows
of an authoring
they
users to retrieve
system,
a browser,
Waterloo
specific
in large hypertext
seek. This
paper
information
Lo
Engineering
Canada
for finding
Systems
Andrew
Design of
Ontario,
have no intelligent
Kamel,
Systems
University
(as opposed
in Hypertext
92
N2L
3G1
information.
documents
describes
an information
in hypertext
and a graph-based
When
searching
for specific
and may end up following retrieval
documents
information
based
retrieval
system
called
on its semantic
facility.
The
information
a path
that
takefs
HRS (Hypercontent.
graph-baaed
HR.S
retrieval
facility
allows users to retrieve specific information in hypertext documents by posing English language queries. The retrieval Graphs, a knowledge representation scheme. The English language queries posed is based on the use of Conceptual by users are automatically converted to Conceptual Graphs by a parser. The information in hypertext documents is also
facility
represented
using
by a semantic
Conceptual
baaed
Graphs.
search.
needs to find specific document. The paper
This
information concludes
both the browsing capabilities
1
Query
technology
processing is useful
and does not bow that natural language
of hypertext
is treated for retrieval
Hypertext
[1].
that
and the semantic search capabilities
formation
that
(GUI)
will
A user may view
HyperCardR
allow
that
users
Sys-
of nodes
and retrieval
knowledge
of natural
is performed
domains
language query processing.
to display
a hypertext or just
document
document
in hypertext
to address
such as Keyword
this
[15, 1], and Document ods requires
the in-
the hypertext
document.
in order
to
is usually
non-
documents
is
highly interconnected into a web-like pattern, and links allow users to selectively choose the order of nodes to be viewed.
by providing
tools
[12, 28], Recency
Markers
user to either
the
document
links
problem
Searching
the hypertext
to ezplore the contents
[22]. Browsing
since the information
have tried
the
connect
nodes and select hypertext nodes.
information
of a hypertext
links
its way
graphical
have
is comprised
and
systems
that
as Apple’s
document
in hypertext display other
find qwcijic linear
such
information
Hypertext
14, 25, 38] has made
applications
A hypertext
contain
nodes.
[10,
computer
user interfaces tem
in large
Current hypertext systems have no intelligent mearus of finding speci$c information. Many hypertext system:~
technology
many
matchingprocess,
where a user the organisation of the hypertext document or the words used in the retrieval of information in hypertext documents can provide users with
Introduction
into
as a graph of information
History
[4]. Each of these methknow
the words used in
or to know the organization
of
Keyword Search methods are only useful if the user has knowledge of the words used in the document. Evidence of this can be found in the evaluation of Superbook [12], a hypertext system with a rich indez keyword search facility.
Statistics
and asked to find pertext document
students
were given questions
answers to these questions in a hyon statistics. When questions con-
Hypertext has been shown to be a useful way to [15]. However, when searching for browse information
tained words that did not occur in the neighborhood of the target information, users took longer and tended to give up searching without finding the desired information more often than users with conventional docu-
specific
mentation.
information,
users can get disoriented
and may
is in relation to other nodes in the document. Some studies [12, 24] have shown that users may end up fol-
Recency History methods display a recent history of nodes visited. Sometimes recent nodes visited are displayed as a map [15]. Document Markers (sometimes
lowing
called
no longer have any indication
a path
that
of where the current
takes them
farther
node
away from
the
information they seek. Users can become frustrated and confused in the web of information. This situation has been called hyper-chdter [10] and it is especially prevalent when using large hypertext documents.
from
@
1992 ACM
089791 -533-W92/0010/0157
$1.50
anywhere
) can also be used to mark
users can quickly in the hypertext
History and Document Markers user is searching for information viously viewed information.
P~mcqytimtfea
mFdti&dkm+tibthe copies at. net reads or dkibutcd for dixect -acid advantsgs, rhe ACM wpyisht nodcx and the tide oftlu pMication md b date.w==. ~d n~= iS @UI tit COPY%~ by -on of the Aswcialim fa GxqnUing kfschmy. To COQyothsrwi% or to Rpubhi-1, Xequilu afss Snd/or SpeCmcprmisim.
Bookmarks[4]
cific node so that
an
a spe-
go to a marked node document.
Recency
are useful only if the that is similar to rwe-
of In this paper, we report on the development intelligent graph-based approach for representing
and retrieving
specijic
information
in hypertext
docu-
157
S/GDOC
92
ments. HRS (Hypertext Retrieval System) uses concep tual graphs [29] to represent the information in hyper-
The display of a hypertext document in the version of HRS is shown in FigMicrosoft’ Wlndows=w
text documents. A user can ask a natural language question regarding a topic and the system tinds the nodes in the hypertext document that answer the question.
a CO1OUI monitor. When the cursor is moved to an area with a link, the cursor changes from an arrow to a
HRS consists ●
A
of the following
Browsing
lows
Systern.
The browsing
users to display
follow
hypertext
components:
a hypertext
links
system
document
in a non-linear
aland
ure 1.
Links
to other
to pose queries
on documents. Keyword
The Keyword ●
A Keyword keywords
●
A
Graph-Based
cility. tic
This
based
retrieval Each
glish
which
language
node
as a graph
trieved,
in
the
Graphs. matching
a seman-
hypertext
document
in
by users
Query
Conceptual base.
Once
is treated
a node
a user can browse the node, follow
pose additional
is re-
links, or
queries.
HRS is an improvement tems because it providea ing capabilities
En-
are converted
processing
process.
Fa-
in a hypertext
in a knowledge posed
over existing hypertext sysusers with both the brows-
of hypertext
and the semantic
search
capabilities of natural language queries. The paper is organized as follows. Section 2 describes the user interface and gives examples of browsing and searching in HRS. Examples illustrate the problems encountered by users when searching for specific information by following hypertext links or using keyword search. Section 3 describes the graph-bssed retrieval facility in HRS that is based on conceptual graphs. Section 4 describes the implementation of HRS in Microsoft Windows and Prolog. Section 5 gives a comparison other knowledge retrieval systems. conclusions system.
and future
extensions
enter keywords ument.
performs
representation
is stored
searches for
Retrieval
information
queries
to Conceptual
This
document.
facility
of the
a corresponding
Graphs
Facility.
Information
search
document. has
Search
in the hypertext
between HRS and Section 6 includes to the current
HRS
are displayed
in red on
‘hand” icon. A single click on the mouse button will display a new node, namely the node that is linked to the highlighted text area. The Search command allows users are supported:
sequence.
nodes
Two
Search facility that
The Query
retrieval
methods
and Knowledge-Based
are to be searched Result
Search.
in HRS allows
dialogue
users to
for in the doc-
box lists the nodes
that contain one or more of the keywords that occur in the query. This dialogue box allows users to select a new node for display. Users often have problems finding information with keyword searches. For example, consider a user who is trying to find the answer to the question ‘How do I read a line from
a jile
?“ in the
One
is the
description
answer
fgets.
However,
fgets function stream”.
of the
the hypertext states
The
C programming
node that
“fgets gets a string
keywords
from
language.
C library
function
describes from
the question
the
an input would
not
match the words in the hypertext node that describes fgets. The word read corresponds to get, the word line corresponds to the word st~”ng, and the word jile corresponds to the words input stream. However, this correspondence
applies
in this situation queries.
and may not necillustrates
that
essarily
be true for other
keyword iar with
searching is only useful when the user is familthe worda in the hypertext document.
This
The graph-based information retrieval facility allows users to enter a natural language query. The information in hypertext nodes is represented in a knowledge base and the knowledge and compared
with
base is semantically
the query.
The words
searched used in the
query need not be the same as the words used in the nodes that are returned. The processing of queries is based on a semantic-based edge representation [29].
The
approach
formalism
semantic-based
that
uses a knowl-
called conceptual
approach
graphs
to information
re-
trieval in hypertext domains is a more accurate and precise retrieval method than existing keyword meth2
The
HRS
System
This section discusses how users interact with HRS to browse a hypertext document and retrieve information. HRS is currently running on three operating system platforms:
PC
DOS’~,
MicrosoftR
Windowsrx
3.0,
and Unix=~ [3]. The Microsoft Windows version allows users to display nodes and select links using the mouse. Dialogue boxes are used to allow users to pose natural language questions for information retrieval. The Unix and DOS versions of HRS input to select links.
158
currently
use line-oriented
ods. HRS is designed with a layered architecture. user interface is separated from the application by an Applications Programming Interface (API)
The code layer.
This allows the application code, which includes the search facilities and database management system, to be moved other computer platforms and new interfaces can then be integrated. Figure 3 shows the HRS architecture. Subsequent sections of this paper describe the design and implementation facility in HRS.
of the graph-based
retrieval
SIGDC)C
92
a
.EiIe
~ome
search
Qoto
Help
C Library
INDEX TOPIC
DESCRIPTION
ftLLOCftT 10N CHIIIWCTER CONUERS 10N DNE DEU I CE DI RECTORY Dos FILE
tteamry Relocation Character ttanagement conversion Time and Date
Device
Dos File Hanagenent Heap Hanagenent Interrupt Handling Ilath & Trig Functions tkmory Ilanagement Miscellaneous Ilultibyte Characters ~~:ng System 1/0
HE(iP INTERRUPT ?tftTH HEflORY ruse ttULTIBYTE OPEIMTING PROCESS STREM STRING Other
areas
1/0
Directory
stream
1/0
String
ttanagemant
of
Help:
C_Language_Syntax
C_HELP_WIN_PtWEL
WtTCOFt_C
b
Figure
1: Browsing
a
a Hypertext
Document
WA
\hrs\c.hlp
~le
~ome
search
~oto
Help
INDEX
C Library
TOPIC ftLLOC9TION CIMRRCTER CONVERSION DflTE DEU I CE DIRECTORY
Itenory Charac Convers Time an Deu ice Directo]
Dos FILE HEfIP
HULTIBYTE OPERMING
PROCESS STR131H STRING areas
What
a natural
of
Help:
language
question
such
as...
is . . . ?
Howmany...?
H?e ttal Howdolreadal~ne Heap Intarru tlath & ?lemor ttisce f: 1 Ilult ibyt Operat ir l-q Process Straam String Ra nagenbcvw
INTERRUPT HftTH ttEHORY ttISC
Dther
Enter
+
fromafile?
C_Language_Syntax
& ●
-I
WITCOtl_C
CJtELPJ’lRINflfNWL
7 I r.L~
I*
!
Flgure2:
Natural
Language
Query
Facility
159
SIGDOC’92
Application
Graph Based Knowledge
Programming Interface Natural Language Query
*
System
El
H Keyword
❑ l-n n A
Retrieval
Retrieval
System
Keyword
.
.
Search
-
Dkplay Node
End
\ Hypertext
Database Management
n c1
System
Hypertext Oocument
n n
user
Open Hypertext Document
Node Index
close
Hypertext Document
Figure
3
Graph
Based
Knowledge
3: The HRS System
Re-
trieval to improve
One approach is to provide language in
a natural
interface
a natural
language
the retrieval
language
allows
users
of information
interface. to
such as English
to specijy how to tlnd the information.
information
without
having
Semantic
search
users’ requests
with
To date, most of the existing natural language interf&ccs for retrieving information have focused on interfaces to database systems [7, 11]. These systems provide the novice user with a means of accessing a database in a natural cryptic
language without requiring the user to learn the commands generally associated with database
query languages.
Examples
have been considered
of application
include
airline
domains
fight
that
information
[36], statistics of lunar rocks [37], airplane maintenance data [31], and naval ship data [16]. However, these syatema provide data retrieval as opposed to knowledge retrieval. Few natural language interfaces oped to retrieve information from [13, 18,27, procedures,
160
35]. Knowledge objects,
have been knowledge
can be description
or general
facts
The representation of knowledgel is therefore more complex than of data in databases. In [19] we introduced a method for knowledge retrieval based on matching conceptual tneval).
A natural
request
methods can then be used to match the information they seek.
:=:
“
H=
about
develbases
of events, the world.
grapk,
GKR
Integrating GKR to retrieve
(Graph-based
as a component
information
from
Knowledge
R.e-
of HRS allows users
hypertext
documents
us-
ing natural language queries. A knowledge base ia used to represent the information in a hypertext document. GKR interprets natural language queries using a semantic interpreter and performs a semantic based search of the knowledge base using a graph matching process. The following sections describe the GKR facility artd gives examples of knowledge retrieval from a hypertext document
on the C programming
Creating
3.1
The semantic
that
represents
Conceptual
interpreter
icon, and a join guage sentence,
language.
is comprised
Graphs of a parser,
a lex-
algorithm. The input is a natural lanand the output is a conceptual graph the meaning
of the sentence.
The parser grammatically analyses tences, both declarative and interrogative, lFor a survey see [5, 6].
English senand produces
SIGDCIC
a syntactic parse structure for the sentence. The parser has the abfity to parse the sentence into one or
be parsed using the DCG rules shown in Figure 4. The parae tree is shown in Figure 5. The nodes of the tree
more syntactic
are syntactic
parse structures.
tures may reflect
different
Different
interpretationa
parse strucof a sentence.
structures
of the sentence.
the tree are the actual
consider the simple sentence: “John seea The two possible parses would Bti with a telescope”. show that, in one case, John has the telescope and seea
natural
language
92
The leaves of words.
For example,
Bill;
in the other,
Bill
has the telescope
sent ence / noun-phrase
and is seen by
John.
/
/
GKR allows a programmer to deline a Dejinite Clawe Grammar (DCG) similar to that of [26]. These clauses specify the syntactic structure of the English sentences these to be analyzed. A DCG converter transforms DCG rules into Prolog [2] rules that are then executed by the parser. A simple example of DCG rules and their corresponding
sent ence noun.phras
Prolog
-->
rules is given in Figure
noun-phrase
e -->
noun-phrase
--> * You ~ .
noun
-->
‘files
~non_hidden
verb_phrase
-->
verb,
verb-aur, __> )c~).
verb_inf
-->
~erase]
verb_obj
-->
noun_phrase.
noun-phrase(Sl
noun( noun(
verb(
:- adjective(Sl noun(S2, S3) .
,S3)
S1 ,S3
)
:-
verb-aux(
[canl
verb-inf(
[erase
:-
noun(Sl
,S2),
canonical
verb_obj
(S1 ,S2)
Figure4:
lSl], :-
DCGRules
5: A Parse Tree Structure
_
associated with a word (such as the noun), a word also has a conceptual
,S2).
1S11
:-
,S2)
,
graph.
of things
are represented
(S2 ,S3) . S1, S2) , S2, S3 ).
Sl).
by denoting
converted
(Si,
that
are perceived,
and 2) relatiorm
S2).
into Prolog
graphically
by rectangles,
The noconcepts
relations
by
circles, and arrows are used to ahow the direction of relations. Individuals are represented as particular instances of a concept and appear to the right of the COIlcept name. Conceptual graphs can also be represented
Sl). noun-phrase
the word jile can
relationships that can exist between concepts. tation is a connected graph notation, where
, S2 ).
verb(Sl
For example,
The conceptual graph notation consists of two basic typea of symbok, 1) concepts: peoples underlying interpretations
verb-object verb-aux( verb_fin(
Sl],
\ files
mean a collection of information stored on a computer, or a metal tool found in machine shops. Each meaning is referred to as a word sense. A conceptual graph that defines the meaning of a word sense is referred to as a
noun-phrase(Sl ,S2) , verb-phrase (S2, S3 ) .
( Ih’kon-hidden
verb_phrase(Sl
/ non-hidden
erase
can
graph for each meaning.
[Youl Sll ,S1). [files 1S11 ,S1) .
adjective
/
graph that defines the meaning of the word. Words with multiple meanings or usages will have a conceptual
.
S2)
/
/
matical information part of speech,eg.
j ect.
verb_inf.
,S3)
noun_phrase(Sl,
J.
verb_ob
verb_aur
:-
/ // You
I noun_phrase / \ adjective noun
verb.inf /
Figure
~. -->
,S3)
verb.aur /
GKR uses a lexicon to convert parse trees into con~ceptual graphs. The lexicon is a dictionary that contains the words used by the parser. In addition to the gram~-
adjective
sentence(Sl
/
verb.obj
/\
noun.
noun.
-->
-->
4.
/ /
\
verb
noun
, verb_phrase.
adjective,
noun
verb
\ verb-phrase
concepts
between
square brackets
and rela-
tions between round brackets. To clarify this, Figure 6 shows the canonical Rules
graphs
for the word jile. The first canonical graph is for the word sense of tile as in a metal tool found in machine shops represented
by the concept
[tile.tool].
The second
The parser generates only parse trees that follow the patterns defined in the DCG grammar. However, this grammar may not have all the possible patterns for the entire English language, and furthermore, the gram-
is for the word sense of a computer
tile used to organixe
and store data, represented The join algorithm
a
sentation
for
mar may not necessarily
semantic
lexicon.
follow
formal
English
grammar
rules. The sentence,
“You can erase non-hidden
files”,
can
a sentence,
The
by the concept creates
using
algorithm
the
[tile].
conceptual parse
replaces
tree
repreand
natural
the lan-
guage words in the parse tree with the canonical graph that defines the meaning of the word. It then combines
161
SIGDOC’92
sent ence Word ----
Concept -------
you
[user]
erase
[erase] [f ilel
file
[file-tool]
->
[file:
(purp)
->
{*3]
-> (char)->
defined.
[comp-storagel ->(obj )-> [binary-digits]
6: Semantic
Lexicon
and Canonical
As discussed in the previous than
one word
one canonical algorithm,
canonical
graphs
section,
(one for
will
to create
will
sense).
to use all
a conceptual
possible
graph
for the
sentence. For example, if there are K words in a sentence, where the a+h word has Ni canonical graphs, then the total number of possible combinations of canonical graphs that can be used to create the conceptual graph N1. NZ. . . . NK. isT= a different interpretation
a canonical
be joined
larger the
with
graph.
is produced
incorrect
word
using senses
Consider the sentence, jilesn. The join algorithm join
two
until
or
the
more
final
shows
how
to the
graph
will be eliminated canonical
Hence no resulting
sentence
manner,
graph other
the
graphs
The heuristic
that are
“You that
is created.
graph
[tile:{*}]
word
rules define
graph
sense.
a for
In this
eliminated.
can erase non-hidden semantic
lticon
share
common
concepts
For
example,
Figure
for the noun for
if it canto form
conceptual
uses the
graph
[non_hidden]
graphs
the
what
adjective
concepts
jiles
to 7
is joined
nonhidden. may occur
in the same tree for the sentence to be semantically valid. For example, not all syntactically valid sentences
162
it is possible
in GKR
basis by noting
sleep.
could
Heuristic
rules
be added
to reject
this
that ideas cannot to reject
these
Some sentences are both syntactically and semantically correct but may have two interpretations. In some cases, a previous sentence can be used to select one of the interpretations. If the sentence “John bought a tele‘John
in the same situation
sees Bill with a telescope”,
as the sentence
then one can infer that
John has the telescope and ia using it to see Bill (as ok posed to Bill having the telescope). GKR produces all semantic interpretations (as defied by join rules) and does not use reasoning among previous sentences (context reasoning) to select a single correct interpretation. Context reasoning is a challenging ever, GKR can later be extended capabfity.
Each combination represents of the sentence. Some combi-
nations will be eliminated by heuristic join rules ~oin rules combhe canonical graphs into larger graphs). For example,
However,
sentence on a semantic
scope” occurred
have more
each word
attempt
rules) graph
a word may have
sense and hence graph
The join
not
of a Join
types of sentences. It is important to note that the domain of knowledge must be limited in scope because the number of heuristic join rules increases with the size of the domain.
Graphs
graphs in the tree (through the use of heuristic until a single conceptual graph is created. This represents the meaning of the entire sentence. than
7: Example
are semantically valid. Consider Chomsky’s [9] &nous sentence “Colorless green ideas sleep furioudyn which may be syntactically correct depending on the grammar
[filel->(purp)->
more
fjron-hidd~
[grindl
schema:
Figure
/ /
Figure
[file]
/
[erase]
[user]
bon-hiddenl
\ verb-obj
/
/
non-hidden
schema:
/ verb
/ noun
[file-tool] Chidderd
[file.tool]
verb-phrase
noun-phrase
hidden
file
\
/
It is possible
that
more
than
task in itself. to incorporate one conceptual
Howthis graph
can be created from a sentence. This typically occurs with compound sentences where more than one idea may be expressed. A compound sentence occurs when two valid sentences are joined by the word and. An exand -Pie of this type of sentence is “John likes Mary Bill likes Cindyn. In this me, the two ideas are not directly connected. Hence two conceptual graphs would be created.
3.2
Processing
Conceptual
Graph
Queries The retrieval of knowledge in GKR is done by matching the query, that is expressed as a conceptual graph, to conceptual graphs in the knowledge base. The query
SIGD(2C’92
graph is compared to other grapk using a Projection Algorithm. For example, the question “Can I emse jiles Qti can be answered . non-hidden
jiles”.
by the
CYOU can ewe
assertion
The projection is shown in Figure matches each concept and relation. It
9. This example also assumes that
individuals
have unique
Often, the retrieval of information straight forward matching of concepts example.
In the previous
of the assertion
graph,
names.
is not a simple as in the previous
case, the query
w=
a sub-set
and this was sufficient
reason to
answer the query. However in other ceses, the sub-set condition is not valid. For example, consider the sentence ‘Can I erase the jile myjile ?“ whose conceptual graph is shown in Figure 10. GKR generates an additional query (sub-goal), namely to determine if “myfile is a non-hidden jile”. Once this is determined, the initial query can be answered. GKR automatically generates subgoals. The sub-goals are generated by noting that the mapping requires
of the concept
a unification
the set referent ents
and
{*}.
[fde:
between (For
individuals, of individuals
dividual
myjile
that
to [file
the individual
a detailed
see [29]
to a set
myfile]
). are
is a particular
myjile
discussion
The
symbol
not
specified.
instance
{*}], and
of refer{*}
refers The
in-
maintain a list of all previous sub-goals and detect a loop when a sub-goal is generated which has previously occurred
in the deduction.
significant
amounts
To retrieve
This solution
of additional
conceptual
would
require
memory.
graphs
from
the knowledge
base, GKR uses an exhaustive sequential seareh of conceptual graphs in the knowledge base. Conceptual graph called
matching is performed Concept Relation (CR)
tual graph
into
using a data structure list. It splits a concep
concept-relation-concept
triplets
which
form the members of a CR list. A query graph is a subset of another conceptual graph if the query graph’s CR list has all of its members in the CR list of the other conceptual graph. Hence the query shown in Figure 9 is represented
[erase]
by a CR list that
->
[user]
bon-hidde~