HERALD: Hybrid Environment for Robust Analysis of Language Data

Afzal Ballim, Giovanni Coray and Vincenzo Pallotta
Swiss Federal Institute of Technology, Lausanne
{Ballim,Coray,Pallotta}@di.epfl.ch

April 19, 1999

Abstract

This project addresses the problem of performing structural and semantic analysis of data where the syntactic and semantic models of the domain are inadequate, and robust methods must be employed to perform a "best approximation" to a complete analysis. This problem is particularly pertinent in the domain of text analysis. The ability to deal with large amounts of possibly ill-formed or unforeseen text is one of the principal objectives of current research in Natural Language Processing by computer (NLP), an ability which is particularly necessary for advanced information extraction and retrieval from large textual corpora. The results of this work can, however, be applied in other domains where a mix of partial grammatical and semantic models exists, such as in image analysis. The project builds on previous FNSRS projects by the proposers. In particular, it integrates discourse analysis methods and is a direct continuation of the FNSRS project ROTA, which addressed the problems of developing robust grammatical analysis on noisy or partially described data. While the proposers had much success in that project in developing efficient robust techniques for grammar-based structural analysis of data, these techniques must be supplemented by semantic analysis, because many analysis problems cannot be resolved in any other way. This project proposes the investigation of such methods and their integration with structural analysis into a hybrid architecture.

Keywords: Robust semantic analysis, Intelligent information extraction, Discourse analysis.

1 Introduction

The domain of text analysis has been chosen for its richness at both the structural and semantic level, as well as the wide range of domains upon which it touches. The rapid expansion of information systems at a global level, which has engendered the necessity for large-scale automatic analysis of textual data, makes this an area where fundamental research can be of great benefit. Information retrieval, data warehousing, and knowledge management are all areas which can immediately profit from progress in this domain.

1.1 State of the Art

Even a superficial observation of the human language understanding process makes it clear that no deep competence in the underlying structure of the spoken language is required in order to process acceptably distorted utterances. On the other hand, the more experienced the speaker, the more probable a successful understanding of that distorted input. How can this kind of fault-tolerant behavior be reproduced in an artificial system by means of computational techniques? Several answers have been proposed to this question and many systems implemented so far, but none of them is capable of dealing with robustness as a whole. Psycholinguistic theories are based on an idealized concept of language performance and/or competence, even when statistical methods are introduced to explain phenomena which are hardly understandable by means of a formal theory. As remarked by Ted Briscoe in section 3.7 of [ZU96]: "Despite over three decades of research effort, no practical domain-independent parser of unrestricted text has been developed". Even though this statement dates back to 1996, in the two years since no real improvement has been made in achieving full robustness for an NLP system.


However, several attempts have been carried out in order to approximate robust behavior. The most common approach is to extend a classical theory of language understanding, often only at a specific level (morphological, syntactic, semantic or pragmatic), trying to embody a certain degree of robustness. This kind of approach may seem reasonably adequate, since it is often based on a solid background, but it suffers from the problem of being biased and constrained by canonical approaches to NLP. Three decades of research in NLP and computational linguistics cannot be discarded, of course, but it would be useful to change perspective and see whether the problem of robustness can be tackled from a different point of view. A natural consequence of this last statement is that one should start from scratch and use past technology "by need" and not because of "trends". Going into more detail, the two main reasons for failure of the above approach are:

1. Since humans are capable of dealing with acceptably ill-formed text without any deep competence in the underlying structure of the language, it seems that the proposed theories and systems are not able to perform an approximate matching between the input and pre-defined structures (at whatever linguistic level).

2. Humans are able to combine different levels of understanding in order to achieve an acceptable or even partial understanding of the input text. Thus, a failure at a certain level can be recovered by another level or a suitable combination of levels.

Systems designed following classical approaches to robustness in NLP are often monolithic and not conceived to be integrated in a distributed computational environment with behaviour such as that shown by humans. In the last decade there has been a proliferation of stochastic and probabilistic methods applied to parsing technology. Unfortunately, as G. Gazdar pointed out in [G.G96], there are essentially four problems that cannot be solved by simply extending standard parsing techniques in such a direction:

1. Statistical methods (see footnote 1) are not able to extract useful probabilities from modestly sized corpora.

2. The "Sparse Data Problem": N-gram-type systems (see footnote 2) are unable to deal with the discontinuous dependencies which pervade natural language at every linguistic level. It is not possible to give an upper bound on the amount of linguistic material that can separate two dependent elements (e.g. a relative clause), which means that even enlarging the window size will not cope with the phenomenon completely. Furthermore, by Zipf's law (see [Zip35]), which states that the frequency distribution of words is inversely proportional to rank, it is easy to see that very large corpora are required to assign reliable probabilities to familiar but low-frequency constructs. Many word combinations cannot be observed in the training material and thus cannot be estimated properly; additional computations like smoothing [CG96] or backing-off [Kat87] have to be used to alleviate this.

3. Lexicalization is not compatible with the current assignment of probabilities, since they can be associated only with the syntactic component of lexical entries. The criteria should be extended in order to differentiate probabilities by their source (i.e. orthographical, morphological, syntactic and semantic information). Furthermore, there is no lexical knowledge representation language able to deal directly with probabilities.

4. There is no great gain in robustness in moving from CF-PSGs (Context-Free Phrase Structure Grammars) to probabilistic ones, since there is always the need to extend a limited-coverage grammar and then repair the resulting over-generation, using probabilities to select the "most probable" parse tree within a jungle of "possible" parse trees, even for previously simple, well-formed and unambiguous sentences. Furthermore, whenever the ambiguity is not solvable, the problem is shifted to the semantic analysis component.

However, it is reasonable neither to continue pursuing this approach in spite of discouraging results, nor to abandon techniques which can be profitably adapted to new perspectives. Maybe the main criticism is that stochastic and probabilistic approaches rely on the assumption that the best model of a linguistic phenomenon is the phenomenon itself. Thus, analysis becomes a measurement of how well a text fits that model. It is clear that if an uttered sentence fails to be analyzed by such a parser, it is the fault of a model which is too narrow to cope with unexpected input.

1 For an introduction see [Cha97, KS97].

2 N-grams can be viewed as the simplest kind of statistical language model. In their most basic incarnation, they give for each word a probability of its occurrence conditioned on the n-1 preceding words. Usually, n is either 2, in which case one speaks of a bigram language model, or 3 (trigram language model). For a general treatment see [Cha93, Ing96].
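
As a concrete illustration of the n-gram models of footnote 2 and the sparse-data problem of point 2, the following sketch (Python, not part of the original proposal; the toy corpus is invented) estimates bigram probabilities with add-one smoothing, the simplest of the smoothing schemes alluded to above.

```python
from collections import Counter

# Toy corpus; in practice millions of words are needed (cf. Zipf's law).
corpus = "the cat sat on the mat the dog sat on the rug".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size

def bigram_prob(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing, so that unseen
    bigrams such as ('dog', 'mat') still receive a small probability."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(bigram_prob("the", "cat"))   # seen bigram: relatively high
print(bigram_prob("dog", "mat"))   # unseen bigram: small but non-zero
```

Even in this toy setting most of the possible bigrams never occur in the corpus, so their probability mass comes entirely from smoothing, which is exactly the estimation problem described above.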

Enlarging the generative power of the model often results in improving completeness but losing soundness, since ungrammatical utterances will also be successfully analyzed. Robustness can be tackled in different ways. An alternative approach to a purely probabilistic one has been proposed in the framework of the Verbmobil project [RCJ+98], where four different methods of parsing are integrated, namely an HPSG parser, a probabilistic CFG LR-parser, a chunk parser and a fall-back HMM-based (see footnote 3) dialogue act recognizer. When no parser is able to produce an analysis for the whole input, a module called robust semantic processing tries to combine the partial analyses produced by the four parsers. Even if this cannot be considered a real distributed architecture (the robust semantic processing is more of an intelligent pipelining technique), moving towards a cooperative model of processing seems to be promising.

Key Concepts in Robustness

Although the number of NLP papers referring to robustness is quite large, an agreement upon its meaning is still missing. A quite reasonable definition is:

[Robustness is] ... a kind of monotonic behavior, which should be guaranteed whenever a system is exposed to some sort of non-standard input data: A comparatively small deviation from a predefined ideal should lead to no or only minor disturbances in the system's response, whereas a total failure might only be accepted for sufficiently distorted input [Men95].

The big asset of this definition, which is close to that of graceful degradation, is that it does not assume any details about the NLP system. Such assumptions render many competing definitions unusable: references to grammars or, more generally, to parsing are not applicable to modern approaches to NLP like data-oriented processing (DOP) [BS96], which, so to speak, circumvent parsing by means of parsed corpora. A weakness of the above definition is its vagueness with respect to judgments about the degree of deviation and the extent of disturbances. Objective comparisons related to robustness do not seem possible unless a common set of evaluation criteria can be found or a universal test-suite constructed; attempts at this seem not to have gotten very far (see [GS95a, Bla96]). However, only a "Turing Test-like" evaluation could give us a good feeling that a system is behaving in a human-like robust way. A useful a priori test to be made in order to better design a robust NLP system is the "Wizard of Oz" test (see [SJ98]). Based on the metaphor of the Turing Test, in the case of dialogue systems a series of experiments is performed where the user is faced with a computer interface; the query-answering NLP system is simulated by a human operator, and the on-going dialogues between the user and the simulated computer system are recorded. Recorded corpora from those experiments are further annotated and statistically processed (see [DJA93]). In many cases, however, robustness is considered as an add-on to existing systems and is not directly considered in the design specifications. It is also worth remarking that robustness is a necessary prerequisite in any system designed for real-life applications.

Robustness at different Linguistic Levels

It is quite obvious that if one is committed to a sequential organization of analysis modules, each dealing with one linguistic aspect, then errors produced by one module are propagated dramatically to the following ones, which must be written to explicitly cope with those errors, or else risk enlarging them. In this perspective a naive solution towards robustness may be to improve accuracy at the early stages of the pipeline in order to minimize the overall inaccuracy. This policy, however, does not reflect human behavior, which appears to maximize accuracy at the higher levels of language processing (e.g. there is no need to understand each word of a sentence to capture its complete meaning or the speaker's intentions).

3 Hidden Markov Models (HMMs) are more complex finite-state descriptions than n-grams; they comprise three types of probabilities: initial state probabilities (ISP), observation symbol probabilities (OSP), and state transition probabilities (STP). An HMM can be viewed as a device for generation: sequences of events are generated by an HMM (which, besides the probabilities named above, is also characterized by its number of states N and its set of output symbols M) by first choosing an initial state (according to the ISP), then selecting an output symbol (in accordance with the OSP), following a transition to a new state (in congruence with the STP), and finally returning to the next-to-last step, or stopping. The main use of HMMs is in analysis, such as finding the most probable part-of-speech assignment for a sequence of words (see [Rab89, KS97] for a classical introduction to HMMs from an NLP point of view).
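
To illustrate the analytic use of HMMs mentioned in footnote 3 (finding the most probable part-of-speech assignment), here is a hypothetical Viterbi decoder over a tiny hand-made model; the tag set and all probabilities are invented for illustration and do not come from the proposal.

```python
# Tiny hand-made HMM: states are POS tags, observations are words.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}                      # ISP
trans_p = {"DET": {"DET": 0.1, "NOUN": 0.8, "VERB": 0.1},             # STP
           "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
           "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1}}
emit_p = {"DET": {"the": 0.9, "dog": 0.0, "barks": 0.0},              # OSP
          "NOUN": {"the": 0.0, "dog": 0.8, "barks": 0.2},
          "VERB": {"the": 0.0, "dog": 0.1, "barks": 0.9}}

def viterbi(words):
    """Return the most probable tag sequence for the word sequence."""
    # best[t][s] = (probability, backpointer) of the best path ending in s at t
    best = [{s: (start_p[s] * emit_p[s][words[0]], None) for s in states}]
    for t in range(1, len(words)):
        col = {}
        for s in states:
            prob, prev = max(
                (best[t - 1][r][0] * trans_p[r][s] * emit_p[s][words[t]], r)
                for r in states)
            col[s] = (prob, prev)
        best.append(col)
    # Backtrack from the best final state.
    tag = max(states, key=lambda s: best[-1][s][0])
    path = [tag]
    for t in range(len(words) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"]))  # expected: ['DET', 'NOUN', 'VERB']
```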

Morphology

Robustness at this level is required in order to repair ill-formed utterances caused by the following:

- Speech recognition errors: errors generated by the speech recognition module, which in certain conditions is not able to provide the right textual representation of the uttered words.
- Written text errors: misspellings and other forms of ill-formedness (e.g. concatenation of words).

In the former case, if the speech module fails in recognizing the uttered word, a gap will be inserted and the treatment of the error is shifted to more abstract linguistic levels. More sophisticated speech recognizers are able to produce a set of possible representations for the uttered sentence, a word graph or lattice, containing sets of word hypotheses. In this case a kind of selection can be made which is based on the context (i.e. the surrounding words, see [WSP+93], [Ing96]). Typically the ambiguity is taken into account at a higher linguistic level (see [WS97]). In the case of writing errors, corrections to the ill-formed words can be made by using symbolic (i.e. error rules, see [SM98]) or sub-symbolic (i.e. neural networks, see [EG94]) techniques. Robustness has always been a key issue in speech recognition, for obvious reasons. The problem has been tackled roughly as a pattern recognition problem, the goal being to produce the best textual representation for the sampled word. Some of these techniques (e.g. n-grams, Hidden Markov Models) can be applied quite successfully when processing at the morphosyntactic level is required.
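
A minimal sketch of the symbolic error-rule idea for written-text errors (hypothetical Python; the lexicon and the single rule are invented, and this is not the [SM98] system): candidate corrections are generated by an error rule and kept only if the lexicon licenses them.

```python
LEXICON = {"the", "robust", "parser", "analysis", "of"}

def transpositions(word):
    """Error rule: two adjacent letters were swapped (e.g. 'hte' for 'the')."""
    return {word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)}

def correct(word):
    """Known words pass through; for unknown words, try the error rule and
    keep a candidate only if the lexicon licenses it; otherwise the word
    is passed on unchanged for higher linguistic levels to deal with."""
    if word in LEXICON:
        return word
    known = transpositions(word) & LEXICON
    return sorted(known)[0] if known else word

print([correct(w) for w in ["hte", "robust", "praser", "xyzzy"]])
# -> ['the', 'robust', 'parser', 'xyzzy']
```

Unknown words for which no rule applies are passed along unchanged, mirroring the strategy of shifting the treatment of the error to more abstract linguistic levels.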

Syntax Analysis and Parsing

In a classical sense, parsing analyzes utterances in two respects:

- Parsing draws a crisp distinction between legitimate and illegitimate utterances.
- Parsing groups elementary units (e.g. words) appearing in natural language utterances into larger complex structures (e.g. parse trees).

Accordingly, parsing can be viewed as a low-level type of text analysis whose benefits are the following:

- The legitimate/illegitimate distinction filters out utterances that should not be considered for further treatment (e.g., semantic analysis).
- Higher-level processing can often work better with the larger structures that are generated than with the elementary units (finding out whether or not an utterance is a question can be done more reliably by looking at the overall utterance than by looking at the individual words of which it is composed).
- Parsing may give hints about how to transform an illegitimate utterance into a legitimate one.

Various techniques have been proposed that can be roughly classified as symbolic and sub-symbolic, even if there is a tendency towards hybrid systems where the two techniques are suitably integrated. The aim of this section is to give an account of existing methods and techniques. This is done according to a classification that deviates from those chosen by other researchers. Instead of distinguishing, e.g., between syntactic and semantics-based approaches [Ste92], or between methods for a special domain (e.g., machine translation) [van95], the classification assumes four categories:

- Knowledge-based
- Modified processing
- Changed interface or architecture
- Sub-symbolic techniques

As in many other classifications, the border lines between these four categories are fuzzy. Several methods/systems can be classified one way or the other. To take this into account, an attempt will be made to point out where an alternative classification is possible.


Knowledge-based

Knowledge-based approaches to robust parsing are characterized by the existence of information about possible distortions of the input, and associated symbolic rules for dealing with these cases. Spelling correction can, for example, use the observation that writing ht instead of th is quite common, and upon encountering an unknown word see whether or not a transposition of h and t provides a remedy. Often, the symbolic rules do not work under all circumstances, and consequently are really heuristics rather than rules. An example of this approach is to produce a parse that covers most of the input and let pseudo-words play the role of the uncovered input ([Wen93]). Knowledge-based techniques for achieving robustness have been criticized for several reasons:

- The creation of knowledge (e.g., engineering a semantic ontology) is labour intensive. First, failures have to be detected and their reasons analyzed; next, knowledge/rules have to be found that allow the treatment of previously uncovered input; finally, the knowledge/rules have to be coded.
- The knowledge encoded in the rules is often domain specific. As a consequence, adaptation to new domains requires substantial effort.
- Many knowledge-based schemes involve a considerable processing overhead. Post-mortem approaches may parse the input several times, namely every time the input has been normalized in a new way. Approaches involving a modified grammar (e.g., one that is quite liberal w.r.t. agreement constraints) may be confronted with a vast number of parses due to over-generation.
- Some fragment integration approaches produce representations which are only of limited use. If, e.g., syntactic structures are produced, they may not be usable by subsequent components (like one for semantic construction).

Modified Processing

To the best of our knowledge, there exists no overview concerning the question of which parsing methodology is currently used most often in the realm of NLP (see [NL94] for a description of many of the algorithms used). Assuming (generalized) LR parsing of unification-based context-free grammars (see [NL95] for an introduction) to be widely used in many systems, however, seems to be a viable assumption. Accordingly, the common denominator of the approaches to robustness described in this section is that they deviate from this baseline. The most popular parsers for NLP are based on ideas from Earley, CYK, chart and LR-parsing techniques [ASU86]. All of them can be described in terms of moving from one configuration or state of the parser to another. To tackle the selection problem, the original algorithms describing the parser moves are often adapted to include probabilistic information found in the utilized grammars (see [MM91], [MW92], and [BC93]). The limited coverage problem has been addressed by, e.g., modifying the algorithms in such a way that the parser state can change without any input being consumed [LT93]. A modified processing strategy may also work on the input itself. One approach is to skip unanalyzable input until an element from a specific set of tokens (a synchronizing token, in compiler theory) like a punctuation mark (or a prosodic marker in systems including prosody) is encountered. Another technique is to swap input words around (see [Wen93] for a system that makes use of both). Special incarnations of the skipping approach are systems which trigger skipping after a time-out rather than after a parsing failure [SV92], as well as systems which require a certain ratio between covered and uncovered input [BR94].
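
The skipping strategy can be sketched as follows (hypothetical Python; the embedded "parser" is a trivial stand-in): unanalyzable material is discarded up to the next synchronizing token, and the partial analyses that do succeed are collected.

```python
SYNC_TOKENS = {".", ",", ";", "?", "!"}   # punctuation as synchronizing tokens

def try_parse(segment):
    """Stand-in for a real parser: here any non-empty segment that does not
    contain the unknown marker '???' is considered analyzable."""
    return list(segment) if segment and "???" not in segment else None

def robust_parse(tokens):
    """Parse as much as possible; on failure, skip to the next synchronizing
    token and resume, collecting the partial analyses that succeed."""
    analyses, segment = [], []
    for tok in tokens + ["."]:          # sentinel to flush the last segment
        if tok in SYNC_TOKENS:
            result = try_parse(segment)
            if result is not None:
                analyses.append(result)  # keep the analyzable fragment
            segment = []                 # skip whatever could not be analyzed
        else:
            segment.append(tok)
    return analyses

tokens = "the parser works well , ??? ??? noise , and recovers here".split()
print(robust_parse(tokens))
# -> [['the', 'parser', 'works', 'well'], ['and', 'recovers', 'here']]
```
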
Changed Interface or Architecture

Exchanging linguistic information among different linguistic levels has been shown to be worthwhile in extending NLP systems in a robust way. Often this has been done without an explicit notion of a distributed NLP architecture, even if the current trends in computer science are indeed distributed systems and agent architectures. Typically an NLP system has always been considered as a sequential composition of linguistic modules, with sometimes a very weak form of feedback among them. Coupling between processing at different linguistic levels has often been obtained by extending one level with features from other levels. An example of this kind of interaction is in HPSG theory [PS87, PS94], where the underlying feature-based infrastructure is capable of representing both syntactic and semantic information. However, HPSG is conceived as a syntactic theory, and analyses are essentially parses. In [HKMS98] parsing is conceived as a constraint satisfaction problem where constraints are meant to explicitly exclude some ungrammatical input, whereas unexpected phenomena will be accepted by default if they do not violate explicit constraints. Furthermore, constraints can be "graded" in order to smoothly model the notion of ungrammaticality (see also [Erb93]). As in HPSG and LFG, there is a coupling between the syntactic and semantic levels by means of "mapping" constraints, which can in turn be graded. The syntactic and semantic layers may be implemented as autonomous computational entities which are coordinated and synchronized only where strictly required (e.g. solving mapping constraints). As a natural consequence, this kind of parsing holds a high potential for parallel implementation, in this way overcoming the intractability of the constraint satisfaction problem (CSP). The system is also extended to deal with word graphs, and a parallel implementation of a parsing algorithm is proposed in [HJZH93].

Another example of changed architecture is given by a parsing technique called finite-state cascade, where shallow parsers (finite-state transducers) are composed in a pipeline or, more generally, a network fashion (see [KCGS96]). These types of linguistic components are very efficient since they are essentially finite-state automata, though their individual generative power is very limited (i.e. regular languages). A reductionist parser for French which addresses robustness issues is proposed in [CT96].
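
A finite-state cascade of this kind can be approximated with successive regular-expression levels over part-of-speech tags; in the following hypothetical Python sketch (tags and patterns invented), a first level groups noun phrases and a second level groups prepositional phrases over the output of the first.

```python
import re

# Level 1: noun phrases = optional determiner, any adjectives, a noun.
NP = re.compile(r"(DET )?(ADJ )*NOUN")
# Level 2: prepositional phrases = preposition followed by an NP chunk.
PP = re.compile(r"PREP NP")

def cascade(tags):
    """Run the two finite-state levels in sequence over a tag string."""
    stage1 = NP.sub("NP", tags)     # replace NP patterns by the symbol NP
    stage2 = PP.sub("PP", stage1)   # then group PREP + NP into PP
    return stage1, stage2

tags = "DET ADJ NOUN VERB PREP DET NOUN"
print(cascade(tags))
# -> ('NP VERB PREP NP', 'NP VERB PP')
```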

Changed Representations

In computer science some advances are related to the invention of new methods for data representation (e.g., Adelson-Velskij-Landis trees (AVL trees)) and architectures (e.g., the Common Object Request Broker Architecture (CORBA)). In a similar vein, some advances in robust processing can be traced to changes in the representations consumed or produced by the parser and the overall system architecture. Most often the progress is linked to the problems of selection or limited coverage. It is not unusual for a parser to produce hundreds or even thousands of analyses, which may result in unacceptable processing speed or memory requirements. Techniques for tackling this problem at the level of syntactic representations are already present in the baseline processing described in the previous paragraphs, since they are an integral part of Generalized LR Parsing [Tom91]: subtree sharing and local ambiguity packing. The former means that analyses may share identical subtrees; the latter means that non-terminals with non-isomorphic subtrees with identical yields may be represented by one single node. The advances related to the use of specific data structures for computational semantics are of a more recent date [Rey93]. Well established is, e.g., the use of upper semi-lattices to capture the different readings that may result from the presence of multiple scope-bearing basic predicates (as in Chaque lausannois mange une fondue). This is often termed under-specification, since not all scopal relationships are specified. One way out of the selection problem, which can be rephrased as "select the most probable analysis", is the k-best approach (see [WSP+93] for its utilization in part-of-speech tagging). Here, one does not pass a single atomic representation over the interface but a set of representations, namely the k best ones. Obviously, this has consequences for the consumers of the representations, since they have to be able to cope with set-valued input and have to provide means of picking from competing representations or of merging them. Repercussions for other modules are also entailed when, in a chart-based parsing framework, in the case of failure to find one single spanning edge, multiple completed edges are returned that cover the input only partially. Although this can also challenge the parser itself (since it has to compute which partial results to pass on), the burden is heavily shifted towards other components [HH94]. This last issue may be considered the primary concern of a robust NLP system.
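
Local ambiguity packing can be pictured with a small data structure: a packed node records alternative ways of building the same non-terminal over the same span, so the competing readings need never be enumerated explicitly. The sketch below (hypothetical Python, not the representation of any particular parser) counts the encoded readings without unpacking them.

```python
from dataclasses import dataclass, field

@dataclass
class PackedNode:
    """A non-terminal over a span [start, end) with alternative analyses.
    Each alternative is a tuple of daughters (PackedNodes or words)."""
    label: str
    start: int
    end: int
    alternatives: list = field(default_factory=list)

    def count_trees(self):
        """Number of distinct trees encoded, without unpacking them."""
        total = 0
        for alt in self.alternatives:
            n = 1
            for d in alt:
                n *= d.count_trees() if isinstance(d, PackedNode) else 1
            total += n
        return total

# "saw the man with the telescope": the PP attaches either to the NP or to
# the VP, but both VP analyses share the same packed daughters.
np = PackedNode("NP", 1, 3, [("the", "man")])
pp = PackedNode("PP", 3, 6, [("with", "the", "telescope")])
np_pp = PackedNode("NP", 1, 6, [(np, pp)])
vp = PackedNode("VP", 0, 6, [("saw", np_pp), ("saw", np, pp)])
print(vp.count_trees())   # -> 2 readings, stored in one packed forest
```
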
Semantics

Although there is no clear evidence that semantic processing follows the syntactic analysis step, sequential architectures for NLP have always been proposed. Only recently, as the sequential nature of computation has been revised, has there been a shift towards a more flexible configuration of linguistic modules. On the other hand, committing to a rigid separation between syntax and semantics is not completely unjustified. Tradition in linguistics has kept these two phenomena of human language separate, and since linguists were the first to use computer technology for automatic language processing, the motivation for this choice becomes clear. Developments in computer science have influenced linguistics and, more generally, cognitive theories. Agent architectures have been taken as a good approximating model of the human mind, where simple concurrent and communicating cognitive processes are able to "implement" the human thought process. More sophisticated linguistic theories try to tightly couple syntax and semantics in order to overcome expressiveness problems when dealing with complex linguistic phenomena (see [PS87] and [PS94]). At the word level, semantic information has been incorporated into the lexical analyzer (see [DKIZ95]); combining the above theories with such lexicons, it is possible to solve syntactic ambiguities during the parsing process. It is worth noting that semantics is often used as a tool to improve the flexibility of a syntax analyzer. In the SYSLID project (see [BHA96]) a robust parser constitutes the linguistic component (LC) of the query-answering dialogue system. An utterance is analyzed while at the same time its semantic representation is constructed. This semantic representation is further analyzed by the dialogue control module (DC), which then builds the database query. Starting from a word graph generated by the speech recognizer module, the robust parser will produce a search path through the word graph. If no complete path can be found, the robust component of the parser, which is an island-based chart parser (see [HG95]), will select the maximal consistent partial results. In this case the parsing process is also guided by a lexical semantic knowledge base component that helps the parser in solving structural ambiguities. What we intend by semantics is to compute the corresponding structure carrying the meaning of a sentence. Similarly to what happens in formal languages, one would like to be able to assign a unique interpretation to a sentence by computing a morphism from the natural language to the meaning representation language.

Of course there is no agreement at all on how far a semantic representation can be considered "meaningful". However, much effort has been spent on computing sentence meaning in a compositional way, that is, composing elementary word meanings into larger structures (e.g. lambda-terms) following suitable strategies (see [Mon73, Fre92, GM89, DWP81]). An alternative approach to compositional semantics that uses linear logic as the underlying logical framework is proposed in [DLPS97]. Usually there is an intermediate structure: the syntactic structure. Whether this intermediate structure is really sufficient for the meaning construction is debatable. In fact, when dealing with ellipsis and comparative clauses with anaphoric references (see [HS76]), it is often necessary to rely on contextual information in order to centre the right focus. It is argued in [dHH98] that in such cases (e.g. incomplete constructions involving nominal quantification and comparative constructions), it is not possible to determine their interpretation in a purely compositional way (see [RB90]). However, using shallow parsing techniques (see [Abn94, Abn]), it is possible, under certain restrictive assumptions, to extract the meaning of a sentence without any deep classical parsing process. In this framework, it is possible to build the semantic representation of utterances by assembling, and completing where missing, simpler partial substructures (chunks) extracted by a weak form of syntactic analysis (chunk parsing). Reconstruction of the intended complete structure of the utterance can be guided by semantic and pragmatic knowledge. In [Zec98] the process of assembling chunks and producing a semantic representation is carried out using sub-categorization frame information (see footnote 4) from the well-known lexicon WordNet (see [Fel98]). Beyond the discussion as to what extent a semantic structure can be considered meaningful, it is clear that if a structure contains variables then they should either be bound to individuals or left unbound, considering the structure as a higher-order one (i.e. context sensitive). A recent approach extending classical Montague Semantics can be found in [Mus95]. The main problem in semantics is to choose the "right" binding for variables among the competing assignments. These variables can be typed or untyped. Typed variables impose a more rigid policy on the assignment, whereas untyped ones allow more flexibility, causing, as a side effect, increased ambiguity. Typically, variables in semantic representations are the counterparts of pronouns. Another problem that arises at the semantic level is that of quantifier scoping. In a sentence like "Every student wrote a program" it is clear from common sense that its intended interpretation should be

∀x. student(x) ⇒ ∃y. program(y) ∧ wrote(x, y)

and not

∃y. program(y) ⇒ ∀x. student(x) ∧ wrote(x, y).
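
Readings of this kind can be generated mechanically; the following hypothetical Python sketch enumerates the orderings of a flat list of quantifiers over a common body, which is essentially what a naive scoping component does before any selection takes place.

```python
from itertools import permutations

def scopings(quantifiers, body):
    """Enumerate the formulas obtained by ordering the quantifiers
    differently over the same body (naive quantifier scoping)."""
    readings = []
    for order in permutations(quantifiers):
        formula = body
        for q, var, restriction in reversed(order):
            connective = "->" if q == "forall" else "&"
            formula = f"{q} {var}.({restriction}({var}) {connective} {formula})"
        readings.append(formula)
    return readings

quants = [("forall", "x", "student"), ("exists", "y", "program")]
for reading in scopings(quants, "wrote(x, y)"):
    print(reading)
# forall x.(student(x) -> exists y.(program(y) & wrote(x, y)))
# exists y.(program(y) & forall x.(student(x) -> wrote(x, y)))
```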

Here the problem is twofold: generate all possible combinations of quantifier scopes and select the intended one. From the point of view of robustness, efficiency plays a crucial role. Computing and selecting a huge number of interpretations, which will affect the whole meaning of a sentence and in some cases the surrounding text, should be kept tractable in some way. This problem can be tackled by keeping the semantic representation under-specified (see [BGL+96, Bos96]). A suitable interplay among different linguistic levels and a smart representation of semantic entities may help in solving the above problems, as in [Wor98]. Furthermore, a constraint language over lambda-structures is proposed in [ENRX98] in order to deal in a compact and computationally tractable way with under-specified scope representations and the related "capturing" problem.

Pragmatics

In the context of natural spoken dialogue systems one can be interested in extracting information about the ongoing dialogue at a higher level than syntax or semantic analysis. Typically information of this kind is referred to as speech, dialogue, or discourse structure, and it can be extracted either from previous analyses at lower linguistic levels or directly from the text (for an introduction see [SH94]). Robustness in dialogue is crucial when the artificial system takes part in the interaction, since inability or low performance in processing utterances will cause unacceptable degradation of the overall system. As pointed out in [AMRS96], it is better to have a dialogue system that tries to guess a specific interpretation in case of ambiguity rather than ask the user for a clarification. If this first commitment later turns out to be a mistake, a robust behavior will be able to interpret subsequent corrections as repair procedures to be issued in order to get the intended interpretation.

4 A set of intermediate structures called frames is built from the parsed chunks according to sub-categorization constraints extracted from the WordNet lexicon. Frames are generated on the basis of short clauses (e.g. a minimal clausal unit containing at least one subject and an inflected verbal form).


In the DIALOGOS human-machine telephone system (see [ABD+97]), the robust behavior of the dialogue management module is based both on a contextual knowledge base of pragmatics-based expectations and on the dialogue history. The system identifies discrepancies between expectations and the actual user behavior, and in that case it tries to rebuild the dialogue consistency. Since both the domain of discourse and the user's goals (e.g. railway timetable inquiry) are clear, it is assumed that the system and the users cooperate in achieving reciprocal understanding. Under this assumption the system pro-actively asks for the query parameters and is able to account for those spontaneously proposed by the user. The degree of robustness of the system is evaluated using two metrics proposed in [DG95]. From another perspective, increased knowledge about the context can be extracted by a dialogue management module in order to improve robustness at the syntactic or semantic level. Pragmatic information can contribute to building up both syntactic and semantic expectations (e.g. for ellipsis or anaphora resolution). This method has been fruitfully exploited in [Nas] as a way of completing partial parses of ill-formed sentences for which the parser cannot build a unique structure. Discourse information considered here includes patterns of various kinds repeated frequently in the same text. Relying on the observation that in a consistent text, when an identical phrase is repeated in different sentences, the constituent words of those sentences tend to be associated in identical modification patterns, with identical parts of speech and identical modifiee-modifier relationships, it is possible to guess the right interpretation in case of ambiguity or to fill in the missing constituents needed to build up a unifying structure for the available partial parses. The main drawback of this method is that it needs to reanalyze the part of the text for which the parser failed to produce a complete parse, using the discourse information built up on the basis of the successful complete parses of the first analysis. Thus the method is not incremental and is highly dependent on the source text. However, this approach shows its effectiveness in the task of translating technical documentation.

2 Work by the Project Proposers

Afzal Ballim has been actively involved in research in Artificial Intelligence (AI) and Information Systems since 1986, in particular in the domains of human-computer communication and interactive document systems. As a research fellow of the Computing Research Laboratory (NMSU, USA) he worked on projects on knowledge representation [BCdRF89], natural language processing [WFBH89], and agent modeling in dialogue [BW90, BWB91, WB87]. As a researcher at the Institut Dalle Molle pour les Études Sémantiques et Cognitives (ISSCO), he worked on numerous projects related to dialogue and text processing, natural language processing and machine-assisted translation. These include a project on using structural information in grammars [RBCWA92]; a machine translation project ([EBRWA90, RBEWA91]); and an FNSRS project on viewpoint modeling in dialogue [BW91a, BW91b, Bal93]. He devised a system called LHIP [BR94], which is a parser designed for broad-coverage handling of unrestricted text. The system interprets an extended DCG formalism to produce a robust analyzer that finds parses of the input made from "islands" of terminals (corresponding to terminals consumed by successful grammar rules). It is currently in use for processing dialogue transcripts from the HCRC Map Task Corpus [Ao92]. LHIP provides a processing method which allows selected portions of the input to be ignored or handled differently, which makes it useful in analyzing free text. Over one hundred copies of the LHIP system have been distributed to other researchers around the world. He was the ISSCO coordinator for the Linguistics Research and Engineering (LRE) project TRANSTERM (Creation, Reuse, Normalization and Integration of Terminologies in Natural Language Systems). As a researcher on another LRE project, MULTEXT, he was charged with two tasks: (1) the creation of a tool to find correspondences between texts and their translations (known as an alignment tool); (2) coordination of the task of defining standards for data interchange between tools processing a structured document. Since joining the EPFL, he has been working on the application of AI and NLP techniques to the processing and indexing of large collections of text [BVC96, LB98], on multi-lingual text analysis [LB96, BCLV98], on multimedia document analysis [BBC98] and on dialogue analysis for knowledge management [BK98]. He has taught courses on text understanding, information retrieval, and multimedia documents. He is co-founder of the MEDIA Research Group at the EPFL (Models & Environments for Document related Interaction and Authoring).

Giovanni Coray is the director of the Laboratoire d'informatique théorique (LITH) at the EPFL département d'informatique. He has directed several theses on document modeling and analysis, dynamic hyper-texts, prototyping translation tools, etc. His teaching activities include courses on pattern recognition, formal systems, semantics of programming languages, and multimedia document structures. He acts as scientific director of Suissetra, the Swiss association for the development of natural language translation tools and translator aids, and is a member of several conference programme committees and editorial boards in the document processing field.

Vincenzo Pallotta is currently an assistant and a Ph.D. student at the Laboratoire d'informatique théorique (LITH) at the EPFL; he previously carried out his research mainly in the field of computational logic applied to knowledge representation in Artificial Intelligence. During his M.Sc. thesis he developed and implemented an extension of the logical framework Features and Fluents [San94] for performing temporal reasoning about action and change, fully integrated into a constraint logic programming paradigm [PT98]. He was a technical collaborator at the CISIAU (Centro Interdipartimentale Sistemi Informatici per l'Area Umanistica) of the University of Pisa, where he was involved in a project supported by the Chancellor Committee for the History of the University on the statistical processing of historical data on university degrees. He was a research collaborator at the Computer Science Department of the University of Pisa, where he was involved in two national projects coordinated by CNUCE/CNR (Centro Nazionale Universitario per il Calcolo Elettronico - Consiglio Nazionale delle Ricerche) for the development of data mining and intelligent knowledge discovery tools within deductive databases. Among his professional activities, before and during his undergraduate studies he was involved in teaching within the university (as an assistant) and within the regional public administration (Centri di Formazione Professionale) as a lecturer. He was also employed as an analyst-programmer in several software development companies.

LITH is deeply involved in research on topics related to complex structured documents and their analysis. Since 1993 LITH has actively participated in the development of, and initiatives related to, the WWW. A new WWW browser was developed by LITH which extended the functions provided by early browsers. This software provided a new generation of WWW tools that unified the user interface of the OS with the web client, and that integrated a service-based approach.

3 Research Plan

The goal of this project is to investigate a hybrid semantico-syntactic approach to robust text analysis (ROTA-II) that can be used on document collections to facilitate the task of intelligent indexing for information extraction and retrieval, or in other tasks that require the extraction of semantic information from textual corpora. It is expected that the results of this research will be applicable in non-textual domains as well, where grammars and logical semantic descriptions can be employed. The importance and difficulty of such a task, while well known to people within the document processing community, has recently become apparent outside of this community through the difficulty of accessing material on the World-Wide Web (WWW). The current generation of systems that attempt to provide central access to the enormous amount of information on the WWW (approximately 50,000,000 documents, or 80,000,000,000 words) are based on standard information retrieval (IR) technology and have a tendency to either produce far too many results or not enough (i.e., they recall far too many documents and their precision is too low). The next generation of search engines will critically depend on robust analyzers such as that proposed in this project.

Results of the First Phase

Natural Language Processing (NLP) has changed significantly in the last decade [Wil96]. One way of describing what has happened is to say that there has been a move towards systems that do not only work under very specific conditions (toy worlds) but under a wide range of circumstances (real-world applications). An example of this shift can be found in the realm of lexicons for NLP systems. Whereas at the end of the 80s the average number of words in an NLP system was sometimes reported to be less than 50 [GPWS96], currently the vocabularies of complex NLP applications like spoken-language machine-translation systems are in the thousands of words [Wai96]. Intimately related to this transition is the increased ability of NLP systems to deal, at least to a certain degree, with input that their creators did not have in mind. Several spoken-language dialogue systems can generate sensible responses for users even if confronted with syntactic constructions that are not covered by the grammar rules that are part of their syntax analysis module [HH94]. Probably the most popular term to describe the kind of system performance mentioned above is robustness. The first phase of the ROTA project tackled this issue from the perspective of syntactic analysis, that is, using grammars to find the inherent structure underlying the object to be analyzed. It involved developing a robust syntactic analysis system which we refer to as Extended LHIP (or ELHIP). To consider the problem of robust analysis, one must understand that the following question is at the heart of the problem:


How can real-world text be translated into higher-order representations (e.g., in a meaning representation language)? This view clearly states that a simple grammatical/ungrammatical distinction or the construction of a syntax tree is of little help for most NLP applications (and is in harmony with recent work on robust text analysis [Zel95]). Rather, a more complex description is required which reveals relations between entities in the text or data being analyzed. Within NLP, the trend has been to use computational logic (in some form) to achieve this. Historically the role of computational logic in computer science has been that of a declarative language in which to write executable specifications. In computational linguistics, all the syntactic formalisms can be considered as deductive systems in this sense (see [BDL+97]), and logic formalisms are often used in conjunction with these deductive systems to express semantics and pragmatics. The main advantages of using logic-based programming languages are their symbol processing capability and the way they abstract from the actual implementation of the needed data structures. This perspective attracted linguists who were not skilled programmers and who needed to rapidly represent and check their linguistic theories. The main drawback of this approach is efficiency, but it is not the only one. In recent years several efforts have been made to improve the efficiency of logic and functional programming languages by means of powerful abstract machines and optimized compilers. Sometimes, recovering efficiency leads to the introduction of non-logical features in the language, and the programmer should be aware of them in order to exploit them in the development of his or her applications (e.g. the cut in logic programming). An important question to ask is: "how can computational logic contribute to robust text analysis?". A partial answer to this question is that current logic-based programming languages are able to integrate in a unifying framework all or most of the techniques necessary for robust syntactic analysis. Furthermore, this can be done in a rigorous "mathematical" fashion. In this sense robustness is related to correctness and provability with respect to the specifications. An NLP system developed within a logical framework has a predictable behavior, which is useful in order to check the validity of the underlying theories. Definite Clause Grammars (DCGs) form an interesting bridge between logic programming and natural language processing, and were the starting block in the first phase of the ROTA project. Considered from the perspective of robust NLP, however, this union reveals some deficiencies, which we will mention below. In 1995, Erbach and Manandhar [GS95b] examined the state of the art in logic programming techniques for NLP and proposed a wish-list of features for future developments. Nowadays some of those desired features are available in actual systems; they can be considered as supporting the two following activities:

1. Support for the development of linguistic models of natural language (Computational Linguistics).
2. Support for the design of real-life applications (Language Engineering).

Declarativeness benefits both activities, while from the perspective of robust analysis it allows one to specify the robustness problem in a more rational way.
Rather than being concerned with implementation details, robustness can be achieved through suitable composition (possibly concurrent) of logical modules (theories) which have a clear mathematical semantics. Thus robustness can be achieved by means of a cooperating linguistic architecture and suitably stated by a flexible linguistic formalism. The main achievements of the first phase of the ROTA project are in showing that:

- A logic programming framework for NLP based on definite clause grammars can be extended fruitfully to cope with the problem of robustness (e.g. ELHIP).
- It is fairly easy to integrate LP-based NLP applications and tools (i.e. splitters, taggers, parsers and other consumer applications).

The improvements in ELHIP are summarized in [LB98, Lie97]. The main issues are:

- enlargement of generative power by the introduction of epsilon rules
- elimination of the memoing implemented in the PROLOG clause database
- compilation "on the fly" of pre-terminal rules.

Furthermore, several experiments have been carried out to integrate NLP tools (e.g. the Brill tagger, see [Bri95]) into a uniform framework (e.g. the GATE platform [CGW95]) for robust NLP. On the other hand, further investigations have highlighted the following shortcomings and limitations of the ELHIP system (and, in general, of current logic-based NLP systems):

Efficiency

- exponential time complexity when using unconstrained island sizes
- over-generation of analyses due to semantic ambiguity
- no exploitation of possible parallelism
- inefficient use of the search-space reduction possibilities that semantic approaches could afford

Flexibility

- no utilization of global information about the grammar
- no identification of undefined terminals (unknown words)
- limitations on negation
- no control mechanism for dynamically set thresholds

Evaluation

- sub-optimal ordering of parse results (according to the lexical ordering of rules; an alternative would be ordering based on the highest coverage-to-span ratio)
- no indication of how complete an analysis is.

The above list can be reduced drastically by a suitable integration of existing logic programming techniques such as constraint logic programming over arbitrary structures (i.e. feature trees, see [MMP99]), built-in memoing techniques (i.e. tabled logic programming in the XSB system [xsb] and coroutining constraints [JD95]) and alternative linguistic approaches (i.e. approximate reasoning, robust unification formalisms, coupling between syntax, semantics and pragmatics, etc.). Current research in logic-based natural language processing is moving towards these kinds of improvements of existing implemented systems. In [GS95b] the state of the art in logic programming techniques for NLP is examined and a wish-list of features for future developments is proposed. The underlying idea of the ELHIP system is very attractive, since it allows one to perform parsing at arbitrary levels of "shallowness". On the other hand, its implementation is not completely satisfactory. The state of the art in current logic programming technology provides powerful languages capable of overcoming the above shortcomings. This can be done using concurrent and/or constraint-based formalisms like Oz [moz] and ECLiPSe [ecl], since they directly support (e.g. at the abstract machine level) the computational facilities needed to improve the efficiency and the expressive power of ELHIP. Granted that ELHIP had reached a limit in computational feasibility, it can be envisioned that its role in an NLP system would be:

- Chunk extraction
- Implementation of semantic grammars
- Concept spotting
- Extraction of dialogue acts

Robustness in ELHIP can now be considered from the following perspectives:

- Robustness as extending coverage
- Robustness as improving efficiency
- Robustness as a disambiguation process
- Robustness as approximate reasoning
- Robustness as enhancement of linguistic theories

In order to design an NLP system which takes into account the above issues, it is crucial to understand that they are often in some sense orthogonal and strongly dependent on the linguistic level.

Extending Coverage

There are two main approaches to enlarging the coverage of a given grammar:

1. Extending the grammar (e.g. introducing new rules modeling previously uncovered linguistic phenomena).
2. Weakening the existing grammar by constraint relaxation.

The first approach has a serious drawback, since it easily leads to over-generation. In order to overcome the proliferation of analyses, a probability-based selection is made. In this case we have additional computational costs derived from the increased complexity of the grammar and from the computation of the preference relation among the competing analyses. This approach is the only feasible one when there is no way of extending coverage other than enriching the grammar rule set (e.g. Context-Free Grammars), and even if efficient probabilistic parsing algorithms [CR98] are available, it is still not satisfactory. To follow the second direction, it is necessary to represent linguistic constraints explicitly (e.g. using feature structures) rather than compiling them into Context-Free Grammars. Current Constraint Logic Programming (CLP) systems allow us to deal with these kinds of data structures, giving the possibility of defining suitable constraint satisfaction/propagation techniques (e.g. Constraint Handling Rules (CHR), see [Fru98]). Successful experiments in this direction have been carried out within the context of HPSG theory in [SHKV97] using the CLP language ECLiPSe. Starting from this framework, it is reasonable to envision more sophisticated constraint relaxation techniques in order to evaluate the degree of ill-formedness by a suitable computation of penalty factors for each violated constraint, as in [Men95] and [Erb93].

Improving Efficiency

When moving from a formalism for representing linguistic phenomena to a more expressive one, the price to pay is often a loss of efficiency. Efficiency can be recovered by using both memoing (see [War98]) and coroutining techniques available in current logic programming systems. As a guideline in pursuing efficiency improvements, an interesting approach can be found in [vNBKN97], where a head-corner parser is proposed that mixes top-down and bottom-up analyses. Furthermore, a concurrent version of LHIP can be thought of, relying on an ask-tell model such as that available in concurrent constraint programming languages (see [Sar93, Mar98]).

The disambiguation problem

Constraint-based techniques seem to naturally address the problem of disambiguation among competing interpretations through their ability to implicitly represent sets of structures. Combining constraint satisfaction over a suitable structure (e.g. feature structures or finite domains) with a powerful inference system, it seems possible to tackle the disambiguation problem better than with classical approaches (e.g. preference-based selection from an explicit set of interpretations). Furthermore, in a constraint-based language like Oz where concurrency is allowed, it is possible to coordinate more than one module in order to implement a cooperating multiple-strategy analyzer. These kinds of solutions have started to appear in NLP only recently (see the Verbmobil NEGRA project in [NPR97, AKS99]) and they typically address under-specified semantic representations.

Approximate Reasoning

The main assumption on which probabilistic NLP is based is that language is considered as a random phenomenon with its own probability distribution function. Thus coverage of linguistic phenomena is often translated as expectation of those phenomena in a probabilistic sense.
Changing perspective and considering language just as an uncertain and imprecise phenomenon, and understanding as a perception process, it is natural to think of "fuzzy" models of language. Fuzzy set theory, and hence fuzzy logic, applied to language processing (see [LZ69]) seems to be a promising approach (and has already been investigated in [Asv96]). Recently, fuzzy reasoning has been partially integrated into a CLP paradigm (see [Rie96]) in order to deal with so-called soft constraints in weighted constraint logic grammars.
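
A penalty-based form of constraint relaxation, in the spirit of the graded and soft constraints cited above, can be sketched as follows (hypothetical Python; constraints and weights are invented): each candidate analysis is scored by the summed weights of the constraints it violates, and the least-penalized analysis is kept instead of rejecting every imperfect candidate.

```python
# Each soft constraint has a weight; violating it adds a penalty instead of
# causing outright rejection.  Constraints and weights are invented here.
CONSTRAINTS = [
    ("subject-verb agreement", 2.0, lambda a: a["subj_num"] == a["verb_num"]),
    ("determiner present",     1.0, lambda a: a["has_det"]),
    ("verb has object",        0.5, lambda a: a["has_obj"]),
]

def penalty(analysis):
    """Sum of the weights of the violated constraints (0 = fully well-formed)."""
    return sum(w for _, w, ok in CONSTRAINTS if not ok(analysis))

def best(analyses):
    """Relaxed selection: keep the least-penalized analysis instead of
    failing when no analysis satisfies every constraint."""
    return min(analyses, key=penalty)

candidates = [
    {"id": 1, "subj_num": "sg", "verb_num": "pl", "has_det": True,  "has_obj": True},
    {"id": 2, "subj_num": "sg", "verb_num": "sg", "has_det": False, "has_obj": True},
]
chosen = best(candidates)
print(chosen["id"], penalty(chosen))   # -> 2 1.0  (only the determiner is missing)
```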

Enhancement of linguistic theories

Robustness in linguistic formalisms can be considered as an attempt to model general extra-grammatical phenomena directly within a linguistic theory. A solution envisioned in [Fou98] proposes considering robustness as an integral property of language processing. This can be done partly by using fine-grained, qualifying information such as typed feature structures (see [Car92]) and accounting for unexpected input by simply mapping it onto a certain level in the subsumption-based hierarchy provided by the typed feature structure logic. This level represents the degree of grammaticality: an incorrect sentence will be mapped onto the top-most level and hence recognized simply as a sentence.

A Hybrid Semantico-Syntactic Approach

The above approaches, however, cannot provide a fundamental breakthrough in robust analysis. Rather, they will permit more efficient robust syntactic analysis. Many problems remain to be resolved. From our investigation, we believe that the source of these problems lies elsewhere. Current syntactic approaches suffer from the effect of trying to resolve problems with syntactic methods when they are not appropriate. For instance, much of the problem of over-generation of results stems from inherent semantic ambiguity in text. This ambiguity cannot be resolved by syntactic means alone (although sometimes syntactic clues do exist, such as noun-verb agreement); therefore, the ambiguity results in multiple analyses being generated. In most cases, simple semantic and pragmatic analysis would quickly eliminate such ambiguity. Consider, as an example, the sentence "He ate the sandwich with a pickle." There are two possible attachments of "with a pickle", one being to "the sandwich" and the other to the verb "ate." Semantically, however, only the former of these would normally be considered a valid reading of the sentence. The opposite attachment would be semantically most appropriate for the sentence "He ate the sandwich with a fork." The use of semantic and pragmatic interpretation is not limited to such disambiguation, however. Much text and spoken dialogue is composed of social conventions: greetings, introductions, platitudes, and other phrasal subparts whose role is at a dialogue or discourse level. Syntactic analysis of such data is not only futile, but leads to an explosion of the number of grammar rules necessary to analyze the data. We noted, in one project in which the proposers were involved, that the grammar grew from fifty to over four hundred rules in this way. Semantic/pragmatic analysis cannot be used in isolation, however. NLP research in artificial intelligence in the 1960s and 1970s showed the inadequacy of such an endeavor. What is needed, and critically so in robust analysis, is a hybrid approach which incorporates the different methods into an integrated system that makes best use of each of them. This is in stark contrast to the general approach, which is to use a pipeline of processing, beginning with low-level morphological analysis, and proceeding to syntactic, then semantic, and finally pragmatic or discourse processing. This latter approach exaggerates the disadvantages of each of the earlier processing methods, rather than helping to eliminate them, because each module typically attempts to produce all possible analyses. An integrated hybrid approach would allow for interaction between modules to reduce the search space in an optimized manner.
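
The pickle/fork example needs only a small amount of semantic knowledge to be decided; the following hypothetical Python sketch encodes a toy selectional-preference table (all scores invented) and uses it to choose between the two attachments a purely syntactic analyzer would both produce.

```python
# Toy selectional preferences: how plausible it is for the PP object to
# modify the verb (instrument reading) or the noun (part-of reading).
# All entries are invented for illustration.
PREFERENCES = {
    ("eat", "fork", "verb-attach"): 0.9,   # eating with a fork: instrument
    ("eat", "fork", "noun-attach"): 0.1,
    ("eat", "pickle", "verb-attach"): 0.2,
    ("eat", "pickle", "noun-attach"): 0.8, # a sandwich with a pickle
}

def choose_attachment(verb, pp_object):
    """Pick the attachment with the higher semantic plausibility; syntax
    alone would have produced both analyses."""
    options = ["verb-attach", "noun-attach"]
    return max(options, key=lambda a: PREFERENCES.get((verb, pp_object, a), 0.0))

print(choose_attachment("eat", "pickle"))  # -> noun-attach ("sandwich with a pickle")
print(choose_attachment("eat", "fork"))    # -> verb-attach ("ate ... with a fork")
```
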
Previous attempts at creating a hybrid system have, we believe, been insufficient due to their concentration on specific language problems, or due to their commitment to a specific processing technique. We propose investigating the different hybrid approaches already mentioned, in order to characterize their advantages and disadvantages from a global, theoretical perspective, and thus to propose a more tightly-coupled multi-level processing architecture. Such an architecture might use, for instance, a whiteboard or distributed agent approach. One possible result would be an architecture in which the different levels of analysis are applied in parallel, with communication between the analysis processes so that they can provide each other with information that reduces the ambiguity in the material being analyzed, or aids in the analysis of material for which no complete description (syntactic and/or semantic) exists. An alternative architecture would be a rule-driven one, where each analysis level is broken up into smaller modules which can be applied in a linear (but potentially overlapping) manner. For instance, a discourse analysis module might be used initially to segment text into sections that require deeper analysis and those which are less content-bearing (such as greetings, etc.). Simple syntactic chunking might then follow, to identify important constituent structure (noun phrases, verbal phrases, etc.). Only further research in this area will reveal the appropriate modules and their interaction with each other.
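As an illustration of the rule-driven variant, the following sketch (with invented module names and deliberately naive rules) lets a cheap discourse-level pass decide which segments are content-bearing, so that only those are handed to a syntactic chunker. The point is the interaction between levels, which prunes the search space, rather than the quality of the individual toy modules.

    import re

    SOCIAL_FORMULAE = {"hello", "hi", "good morning", "thanks", "thank you", "goodbye"}
    NP_PATTERN = re.compile(r"\b(?:the|a|an)\s+\w+", re.IGNORECASE)

    def discourse_segmenter(text):
        """Split text at sentence punctuation and flag segments worth parsing."""
        for segment in re.split(r"[.!?]+\s*", text):
            segment = segment.strip()
            if segment:
                yield segment, segment.lower() not in SOCIAL_FORMULAE

    def noun_phrase_chunker(segment):
        """Grossly simplified chunker: a determiner plus the word that follows it."""
        return NP_PATTERN.findall(segment)

    def analyse(text):
        results = []
        for segment, content_bearing in discourse_segmenter(text):
            if content_bearing:
                results.append((segment, noun_phrase_chunker(segment)))
            else:
                results.append((segment, "discourse-level only, no parse attempted"))
        return results

    print(analyse("Good morning. The committee approved the budget."))
    # [('Good morning', 'discourse-level only, no parse attempted'),
    #  ('The committee approved the budget', ['The committee', 'the budget'])]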

A very challenging task is to choose a suitable representation framework for the semantic/pragmatic information extracted from the analyzed text. Based on the assumption that linguistic interaction can be represented by speech or dialogue acts (see [CP79]), it could be worthwhile to see how a non-monotonic temporal reasoning system such as Features and Fluents (see [San94]) can be integrated within such a framework. This should be feasible, since the Fluent Logic Programming system (see [PT98]) is already designed on the same backbone as that used by many existing linguistic tools (e.g. LHIP). In conclusion, this project proposes fundamental research into the relationship between semantic and syntactic analysis of data in a robust framework. The relations between these methods will be explored with the goal of producing a theory of optimal search-space reduction in the analysis process through cooperative analysis.
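Purely as an illustration of the kind of representation we have in mind, and not the notation of [PT98] or [San94], the sketch below pairs a dialogue act in the sense of [CP79] with the temporal interval over which its effect is assumed to hold; a non-monotonic temporal reasoner could later close or revise that interval. All field names and the example dialogue are invented.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class DialogueAct:
        speaker: str
        act_type: str              # e.g. "request", "inform", "greet"
        content: str               # propositional content, here just a string
        start: int                 # start of the interval (turn index)
        end: Optional[int] = None  # None = effect still holds (persists by inertia)

        def holds_at(self, turn: int) -> bool:
            """True if the act's effect is assumed to hold at the given turn."""
            return self.start <= turn and (self.end is None or turn <= self.end)

    history = [
        DialogueAct("A", "greet",   "hello",                      start=0, end=0),
        DialogueAct("A", "request", "book a table for two",       start=1),
        DialogueAct("B", "inform",  "no tables are free tonight", start=2),
    ]

    # The request is still "open" at turn 3; a reasoner could close it
    # (set `end`) once it is satisfied or withdrawn.
    print([a.act_type for a in history if a.holds_at(3)])   # ['request', 'inform']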

3.1 Timetable of Work

Design Phases
On the basis of the results of the ROTA project, a hybrid architecture for robust semantic and pragmatic analysis of textual data will initially be proposed. After a first stage of investigation into how a hybrid architecture can be conceptually devised, computational issues will be addressed for its effective implementation.

- Jan. 2000 - Mar. 2000: Hybrid architecture.
- Sept. 2000 - Dec. 2000: Computational refinement of the linguistic architecture.

Implementation Phases
Starting from the computational tools developed within the ROTA project, a collection of autonomous, communicating software modules will be developed, following the guidelines explained in the previous sections. At later stages the modules will be composed and coordinated into a distributed architecture, and finally a demonstrator will be engineered.

- Mar. 2000 - Sept. 2000: Implementation of the hybrid analysis modules and their interfaces.
- Sept. 2000 - Dec. 2000: Development of a prototype of the hybrid architecture.
- Jan. 2001 - July 2001: Final demonstrator implementation.

Evaluation Phases
In order to integrate the various developed modules suitably into the hybrid architecture, evaluation criteria will have to be proposed. Each criterion will address the issues taken into account during the design and implementation stages.

- Mar. 2000 - July 2000: Proposal of criteria for evaluating the adequacy of the proposed model.
- Sept. 2000 - Dec. 2000: Evaluation of single modules.
- Jan. 2001 - Mar. 2001: Evaluation of the prototype.
- July 2001 - Sept. 2001: Evaluation of the final demonstrator.

3.1.1 Expected Results and Importance of the Project

Information systems are becoming pervasive in modern society. However, the tools that help people deal with the accumulated mass of information show great deficiencies when it comes to analyzing the data. For instance, with the recent upsurge of interest in the World-Wide Web, more people than ever before are being confronted with the inadequacy of current information retrieval techniques to provide pertinent results (and only pertinent ones). Better text analysis methods are needed to form the basis of the next generation of systems for indexing and characterizing texts. The work proposed here is expected to be an important step in that direction.

Work on robust parsing rests on the assumption that partial results can be useful (often much more useful than no result at all), and that an approximation to complete coverage of a document collection is more useful when it comes with indications of how complete it is. This latter point is especially important in cases where a grammar must be usable to some degree at a relatively early stage in its development, as is, for example, the case when developing a grammar for a large corpus. However, current methods for the analysis of large text collections are far from adequate in terms of the coverage they can give of the corpora and in their complexity properties. Although some systems claim up to 90% coverage of a corpus, this is often achieved either by restricting the corpus to conform to the structures found in a training corpus, or because the average phrase is analyzed with 90% correctness (which can mean that very few phrases are in fact correctly analyzed in their entirety).

The first phase of this project has shown the viability of the structural analysis techniques that we are developing, but it has also shown their shortcomings. These shortcomings can best be addressed by developing and integrating semantic analysis methods, which will be a crucial and fundamental advancement in this field.
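A toy calculation (with invented numbers, and assuming constituent errors are independent) illustrates the remark above about per-constituent accuracy:

    # 90% correctness per constituent does not mean 90% of sentences are
    # analyzed correctly once sentences contain many constituents.
    per_constituent_accuracy = 0.90
    constituents_per_sentence = 20

    fully_correct = per_constituent_accuracy ** constituents_per_sentence
    print(f"{fully_correct:.2%} of such sentences receive a fully correct analysis")
    # about 12% of such sentences receive a fully correct analysis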

References

[ABD+97] Dario Albesano, Paolo Baggia, Morena Danieli, Roberto Gemello, Elisabetta Gerbino, and Claudio Rullent. Dialogos: a robust system for human-machine spoken dialogue on the telephone. In Proc. of ICASSP, Munich, Germany, 1997.
[Abn] Steven Abney. Partial Parsing via Finite-State Cascades. casc.ps.gz.
[Abn94] Steven Abney. Partial parsing, 1994. A tutorial presented at ANLP-94, Stuttgart, DE. http://www.sfs.nphil.uni-tuebingen.de/~abney.
[AKS99] Alexander Koller, Joachim Niehren, and Kristina Striegnitz. Relaxing underspecified semantic representations for reinterpretation. Technical report, Department of Computational Linguistics and Programming Systems Lab, Universität des Saarlandes, February 1999. Submitted. Available at http://www.ps.uni-sb.de/Papers/abstracts/Relax99.html.
[AMRS96] J. F. Allen, B. Miller, E. Ringger, and T. Sikorski. A robust system for natural spoken dialogue. In Proc. 34th Meeting of the Association for Computational Linguistics, June 1996.
[Ao92] A. H. Anderson and others. The HCRC map task corpus. Language and Speech, 34(4):351–366, 1992.
[ASU86] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers – Principles, Techniques and Tools. Addison-Wesley Series in Computer Science. Addison-Wesley, Reading, MA, 1986.
[Asv96] Peter R. J. Asveld. Towards robustness in parsing – fuzzifying context-free language recognition. In J. Dassow, G. Rozenberg, and A. Salomaa, editors, Developments in Language Theory II – At the Crossroads of Mathematics, Computer Science and Biology, pages 443–453. World Scientific, Singapore, 1996.
[Bal93] A. Ballim. Propositional attitude framework requirements. Journal for Experimental and Theoretical Artificial Intelligence (JETAI), 5:89–100, 1993.
[BBC98] F. Buchs, A. Ballim, and G. Coray. Content-oriented querying of shoeprints databases using graph similarity. Technical Report 113, Swiss Federal Institute of Technology, LITH, 1998.
[BC93] E. J. Briscoe and John Carroll. Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-based Grammars. Computational Linguistics, 19(1):25–59, 1993.
[BCdRF89] A. Ballim, S. Candelaria de Ram, and D. Fass. Reasoning using inheritance from a mixture of knowledge and beliefs. In S. R. Ramani and K. Anjaneylu, editors, Proceedings of the KBCS'89 Conference on Knowledge Based Computer Systems, pages 387–396, Delhi, 1989. Narosa Publishing House.
[BCLV98] A. Ballim, G. Coray, A. Linden, and C. Vanoirbeek. The use of automatic alignment on structured multilingual documents. In Roger D. Hersch, Jacques André, and Heather Brown, editors, Proceedings of EP'98: Seventh International Conference on Electronic Publishing, Document Manipulation and Typography, volume 1375 of LNCS, pages 464–475, Saint-Malo, France, 1–3 April 1998. Springer.
[BDL+97] Patrick Blackburn, Marc Dymetman, Alain Lecomte, Aarne Ranta, Christian Retoré, and Eric Villemonte de la Clergerie. Logical aspects of computational linguistics: an introduction. In Christian Retoré, editor, Logical Aspects of Computational Linguistics, volume 1328 of LNCS/LNAI, pages 1–20. Springer, 1997.
[BGL+96] Johan Bos, Björn Gambäck, Christian Lieske, Yoshiki Mori, Manfred Pinkal, and Karsten Worm. Compositional Semantics in Verbmobil. In Proceedings of the 16th International Conference on Computational Linguistics, volume 1, pages 131–136, Copenhagen, 1996.
[BHA96] Manuela Boros, Gerhard Hanrieder, and Ulla Ackermann. Linguistic processing for spoken dialogue systems – experiences made in the SYSLID project. In Proceedings of the Third CRIM-FORWISS Workshop, Montreal, Canada, 1996.
[BK98] A. Ballim and N. Karacapilidis. Modelling discourse acts in computer-assisted collaborative decision making. In Proceedings of the Second International Conference on Practical Aspects of Knowledge Management, pages 29–30, Basel, Switzerland, October 1998.
[Bla96] Ezra Black. Evaluation of Broad-coverage Natural-language Parsers. In Ronald A. Cole, Joseph Mariani, Hans Uszkoreit, Annie Zaenen, and Victor Zue, editors, Survey of the State of the Art in Human Language Technology, chapter 13.4. Cambridge University Press, 1996.
[Bos96] Johan Bos. Predicate logic unplugged. Technical Report VM 103, Universität des Saarlandes, February 1996.
[BR94] A. Ballim and G. Russell. LHIP: Extended DCGs for Configurable Robust Parsing. In Proceedings of the 15th International Conference on Computational Linguistics, pages 501–507, Kyoto, Japan, 1994. ACL.
[Bri95] Eric Brill. Transformation-based Error-driven Learning and Natural Language Processing: a Case Study in Part-of-Speech Tagging. Computational Linguistics, 1995.
[BS96] Rens Bod and Remko Scha. Data-Oriented Language Processing: an Overview. Technical Report LP-9613, University of Amsterdam, 1996. dop.ps.gz.
[BVC96] A. Ballim, C. Vanoirbeek, and G. Coray. Language processing in document engineering. European Consortium for Informatics and Mathematics News, (25), April 1996.
[BW90] A. Ballim and Y. Wilks. Relevant beliefs. In Proceedings of ECAI-90, pages 65–70, Stockholm, 1990.
[BW91a] A. Ballim and Y. Wilks. Artificial Believers. Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1991.
[BW91b] A. Ballim and Y. Wilks. Beliefs, stereotypes and dynamic agent modelling. User Modelling and User-Adapted Interaction, 1(1):33–65, 1991.
[BWB91] A. Ballim, Y. Wilks, and J. Barnden. Belief, metaphor, and intensional identification. Cognitive Science, 15(1):133–171, 1991.
[Car92] Bob Carpenter. The Logic of Typed Feature Structures, with Applications to Unification Grammars, Logic Programs and Constraint Resolution. Number 32 in Cambridge Tracts in Computer Science. Cambridge University Press, 1992.
[CG96] S. Chen and J. Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. In Proceedings of the 34th Meeting of the Association for Computational Linguistics. ACL, 1996.
[CGW95] H. Cunningham, R. Gaizauskas, and Y. Wilks. A General Architecture for Text Engineering (GATE) – a new Approach to Language Engineering R&D. Technical Report CS-95-21, University of Sheffield, 1995.
[Cha93] Eugene Charniak. Statistical Language Learning. MIT Press, Cambridge, MA, 1993.
[Cha97] Eugene Charniak. Statistical techniques for natural language parsing. AI Magazine, 1997.
[CP79] P. R. Cohen and C. R. Perrault. Elements of a plan-based theory of speech acts. Cognitive Science, 3(3):177–212, 1979.
[CR98] J.-C. Chappelier and M. Rajman. A generalized CYK algorithm for parsing stochastic CFG. In 1st Workshop on Tabulation in Parsing and Deduction (TAPD98), pages 133–137, Paris, April 2–3 1998.
[CT96] Jean-Pierre Chanod and Pasi Tapanainen. A robust finite-state grammar for French. In ESSLLI'96 Workshop on Robust Parsing, Prague, Czech Republic, August 12–16 1996.
[DG95] M. Danieli and E. Gerbino. Metrics for evaluating dialogue strategies in a spoken language system. In Working Notes of the AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation, pages 34–39, Stanford, California, March 1995. AAAI.
[dHH98] Helen de Hoop and Petra Hendriks. Semantics in the absence of syntax. Lecture notes of the 10th European Summer School in Logic, Language and Information, Saarbrücken, August 1998.
[DJA93] Nils Dahlbäck, Arne Jönsson, and Lars Ahrenberg. Wizard of Oz studies – why and how. Knowledge-Based Systems, 6(4):258–266, December 1993. A previous version appeared in Proceedings of the 1993 International Workshop on Intelligent User Interfaces, Orlando, Florida, pages 193–200, 1993.
[DKIZ95] Mary Dalrymple, Ronald M. Kaplan, John T. Maxwell III, and Annie Zaenen, editors. Formal Issues in Lexical-Functional Grammar. Center for the Study of Language and Information, Stanford University, 1995.
[DLPS97] Mary Dalrymple, John Lamping, Fernando C. N. Pereira, and Vijay Saraswat. Quantifiers, anaphora, and intensionality. Journal of Logic, Language, and Information, 6(3):219–273, 1997.
[DWP81] David R. Dowty, Robert E. Wall, and Stanley Peters. Introduction to Montague Semantics. Reidel, Dordrecht, 1981.
[EBRWA90] D. Estival, A. Ballim, G. Russell, and S. Warwick-Armstrong. A syntax and semantics for feature structure transfer. In Proceedings of the 3rd International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language, pages 131–143, Austin, 1990.
[ecl] The ECLiPSe constraint logic programming system. http://www.icparc.ic.ac.uk/eclipse/.
[EG94] Martin Eineborg and Björn Gambäck. Neural networks for wordform recognition. Research Report R94005, SICS, Stockholm, Sweden, February 1994.
[ENRX98] Markus Egg, Joachim Niehren, Peter Ruhrberg, and Feiyu Xu. Constraints over lambda-structures in semantic underspecification. In Proceedings of the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics (COLING/ACL'98), pages 353–359, Montreal, Canada, August 1998.
[Erb93] Gregor Erbach. Towards a Theory of Degrees of Grammaticality. CLAUS Report 34, Universität des Saarlandes, 1993.
[Fel98] Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. Language, Speech, and Communication Series. MIT Press, 1998.
[Fou98] Frederik Fouvry. Robustness in linguistic formalism. Ph.D. proposal presented at the Research Student Presentation, Department of Language and Linguistics, University of Essex, May 1998.
[Fre92] G. Frege. On sense and reference. In Geach and Black, editors, Translations from the Philosophical Writings of Gottlob Frege. Blackwell, Oxford, 1892.
[Fru98] Th. Frühwirth. Theory and practice of constraint handling rules. Journal of Logic Programming, Special Issue on Constraint Logic Programming (P. Stuckey and K. Marriott, editors), 37(1–3):95–138, October 1998.
[G.G96] G. Gazdar. Paradigm merger in natural language processing. In Robin Milner and Ian Wand, editors, Computing Tomorrow: Future Research Directions in Computer Science, pages 88–109. Cambridge University Press, Cambridge, 1996.
[GM89] Gerald Gazdar and Chris Mellish. Natural Language Processing in PROLOG: An Introduction to Computational Linguistics. Addison-Wesley, 1989.
[GPWS96] Louise Guthrie, James Pustejovsky, Yorick Wilks, and Brian M. Slator. The Role of Lexicons in Natural Language Processing. CACM, 39(1):63–72, January 1996.
[GS95a] Julia Rose Galliers and Karen Sparck Jones. Evaluating Natural Language Processing Systems. Lecture Notes in Artificial Intelligence. Springer, Berlin, 1995.
[GS95b] G. Erbach and S. Manandhar. Visions for logic-based natural language processing. In Proceedings of the ILPS'95 Workshop "Visions for the Future of Logic Programming: Laying the Foundations for a Modern Successor to Prolog", Portland, Oregon, 1995.
[HG95] G. Hanrieder and G. Görz. Robust parsing of spoken dialogue using contextual knowledge and recognition probabilities. In Proceedings of the ESCA Tutorial and Research Workshop on Spoken Dialogue Systems – Theories and Applications, pages 57–60, Denmark, May 1995.
[HH94] Gerhard Hanrieder and Paul Heisterkamp. Robust Analysis and Interpretation in Speech Dialogue. In Heinrich Niemann, Renato De Mori, and Gerhard Hanrieder, editors, Progress and Prospects of Speech Research and Technology, pages 204–211, Sankt Augustin, 1994. Infix.
[HJZH93] M. P. Harper, L. H. Jamieson, C. B. Zoltowski, and R. A. Helzerman. Parallel spoken language constraint parsing. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages 63–66, Minneapolis, MN, April 1993.
[HKMS98] Johannes Heinicke, Jürgen Kunze, Wolfgang Menzel, and Ingo Schröder. Eliminative parsing with graded constraints. In Proc. 17th International Conference on Computational Linguistics, pages 526–530, Montreal, Canada, 1998.
[HS76] Jorge Hankamer and Ivan Sag. Deep and surface anaphora. Linguistic Inquiry, 7:391–428, 1976.
[Ing96] Peter Ingels. A Robust Text Processing Technique Applied to Lexical Error Recovery. Linköping University, Sweden, 1996.
[JD95] M. Johnson and J. Doerre. Memoization of coroutined constraints. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995.
[Kat87] S. Katz. Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer. IEEE Transactions on ASSP, 35(3):400–401, 1987.
[KCGS96] Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular expressions for language engineering. Natural Language Engineering, 2(4):305–328, 1996.
[KS97] B. Krenn and C. Samuelsson. The Linguist's Guide to Statistics. University of the Saarland, December 1997. Compendium for a course on Statistical Approaches in Computational Linguistics.
[LB96] A. Linden and A. Ballim. Contrôle des textes multilingues. Technical Report 110, École Polytechnique Fédérale de Lausanne, 1996.
[LB98] C. Lieske and A. Ballim. Rethinking natural language processing with Prolog. In Proceedings of Practical Applications of Prolog and Practical Applications of Constraint Technology (PAPPACTS98), London, UK, 1998. Practical Application Company.
[Lie97] Christian Lieske. Partial Flushing of Language Models in Robust Natural Language Parsing. École Polytechnique Fédérale de Lausanne (EPFL), 1997.
[LT93] Alon Lavie and Masaru Tomita. GLR*: an Efficient Noise-Skipping Parsing Algorithm for Context Free Grammars. In Third International Workshop on Parsing Technologies, pages 123–134, Tilburg/Durbuy, 1993.
[LZ69] E. T. Lee and L. A. Zadeh. Note on fuzzy languages. Information Sciences, 1:421–434, 1969.
[Mar98] Kim Marriott. Programming with Constraints: An Introduction. MIT Press, 1998.
[Men95] Wolfgang Menzel. Robust Processing of Natural Language. In Ipke Wachsmuth, Claus-Rainer Rollinger, and Wilfried Brauer, editors, KI-95: Advances in Artificial Intelligence, Berlin, 1995. Springer.
[MM91] David M. Magerman and Mitchell P. Marcus. Pearl: A Probabilistic Chart Parser. In Proceedings of the Fourth DARPA Speech and Natural Language Workshop, Pacific Grove, CA, February 1991. Defense Advanced Research Projects Agency, Morgan Kaufmann.
[MMP99] Martin Müller, Joachim Niehren, and Andreas Podelski. Ordering constraints over feature trees. Journal of Symbolic Computation, Special Issue on CP'97, Linz, Austria, 1999. To appear.
[Mon73] R. Montague. The proper treatment of quantification in ordinary English. In J. Hintikka, editor, Approaches to Natural Language, pages 221–242. Reidel, 1973.
[moz] The Mozart programming system. http://www.mozart-oz.org/.
[Mus95] Reinhard Muskens. Meaning and Partiality. Studies in Logic, Language and Computation. European Association for Logic, Language and Information (FoLLI), 1995.
[MW92] D. M. Magerman and C. Weir. Efficiency, Robustness and Accuracy in Picky Chart Parsing. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 185–192, University of Delaware, 1992. ACL.
[Nas] Tetsuya Nasukawa. Robust Parsing Based on Discourse Information: Completing Partial Parses of Ill-formed Sentences on the Basis of Discourse Information.
[NL94] Sven Naumann and Hagen Langer. Parsing – eine Einführung in die maschinelle Analyse der natürlichen Sprache. Leitfäden und Monographien zur Informatik. Teubner-Verlag, Stuttgart, 1994.
[NL95] Sven Naumann and Hagen Langer. Einführung in die künstliche Intelligenz, chapter Parsing natürlicher Sprache, pages 408–430. Addison-Wesley, Bonn, 1995.
[NPR97] Joachim Niehren, Manfred Pinkal, and Peter Ruhrberg. A uniform approach to underspecification and parallelism. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL), pages 410–417, Madrid, Spain, 7–11 July 1997.
[PS87] Carl Pollard and Ivan A. Sag. Information-Based Syntax and Semantics, Volume 1: Fundamentals. CSLI Lecture Notes. CSLI, Stanford University / University of Chicago Press, 1987.
[PS94] Carl Pollard and Ivan A. Sag. Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago, 1994.
[PT98] Vincenzo Pallotta and Franco Turini. Towards a fluent logic programming. Technical Report TR-98-03, Computer Science Department, University of Pisa, 9 March 1998.
[Rab89] Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[RB90] Manny Rayner and Amelie Banks. An implementable semantics for comparative constructions. Computational Linguistics, 16(2):86–112, 1990.
[RBCWA92] G. Russell, A. Ballim, J. Carroll, and S. Warwick-Armstrong. A practical approach to multiple default inheritance for unification-based lexicons. Computational Linguistics, 18(3):311–337, 1992.
[RBEWA91] G. Russell, A. Ballim, D. Estival, and S. Warwick-Armstrong. A language for the statement of binary relations over feature structures. In Proceedings of the 5th Conference of the European Chapter of the Association for Computational Linguistics, pages 287–292, 1991.
[RCJ+98] T. Ruland, C. J. Rupp, J. Spilker, H. Weber, and K. Worm. Making the most of multiplicity: A multi-parser multi-strategy architecture for the robust processing of spoken language. Report 230, Verbmobil, August 1998.
[Rey93] Uwe Reyle. Dealing with Ambiguities by Underspecification: Construction, Representation and Deduction. Journal of Semantics, 10:123–179, 1993.
[Rie96] Stefan Riezler. Quantitative constraint logic programming for weighted grammar applications. In Logical Aspects of Computational Linguistics (LACL'96), LNCS. Springer, 1996.
[San94] E. Sandewall. Features and Fluents. Oxford University Press, 1994.
[Sar93] Vijay Saraswat. Concurrent Constraint Programming. ACM Distinguished Dissertation Series. MIT Press, 1993.
[SH94] Ronnie W. Smith and D. Richard Hipp. Spoken Natural Language Dialog Systems – a Practical Approach. Oxford University Press, New York, 1994.
[SHKV97] Frieder Stolzenburg, Stephan Höhne, Ulrich Koch, and Martin Volk. Constraint logic programming for computational linguistics. In Christian Retoré, editor, Selected Papers of the 1st International Conference on Logical Aspects of Computational Linguistics 1996, volume 1328 of LNAI, pages 406–425, Nancy, 1997. Springer.
[SJ98] Lena Strömbäck and Arne Jönsson. Robust interpretation for spoken dialogue systems. In Proceedings of COLING-ACL'98, Montreal, Canada, 1998.
[SM98] David A. Schneider and Kathleen F. McCoy. Recognizing syntactic errors in the writing of second language learners. In Proceedings of the 17th International Conference on Computational Linguistics (COLING-ACL'98), 1998.
[Ste92] Manfred Stede. The Search for Robustness in Natural Language Understanding. Artificial Intelligence Review, 6(4):383–414, 1992.
[SV92] T. Strzalkowski and B. Vauthey. Information Retrieval Using Robust Natural Language Processing. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 104–111, Delaware, 1992.
[Tom91] Masaru Tomita, editor. Generalized LR Parsing. Kluwer Academic Publishers, Boston, 1991.
[van95] Diana van der Ende. Robust Parsing: an Overview. Memoranda Informatica 95-03, University of Twente, 1995.
[vNBKN97] Gertjan van Noord, Gosse Bouma, Rob Koeling, and Mark-Jan Nederhof. Robust grammatical analysis for spoken dialogue. Journal of Natural Language Engineering, 1997.
[Wai96] Alex Waibel. Interactive Translation of Conversational Speech. Computer, 29(7), July 1996.
[War98] David S. Warren. Programming the PTQ grammar in XSB. Technical report, Computer Science Department, State University of New York at Stony Brook, 1998.
[WB87] Y. Wilks and A. Ballim. Multiple agents and the heuristic ascription of belief. In Proceedings of the 10th International Joint Conference on Artificial Intelligence, pages 118–124. Morgan Kaufmann, 1987.
[Wen93] Fuliang Weng. Handling Syntactic Extra-Grammaticality. In Third International Workshop on Parsing Technologies, pages 319–332, Tilburg/Durbuy, 1993.
[WFBH89] Y. Wilks, D. Farwell, A. Ballim, and R. Hartley. New Mexico State University Computing Research Laboratory. In Proc. of the Speech and Natural Language Workshop, pages 193–194, Philadelphia, 1989.
[Wil96] Yorick Wilks. Natural Language Processing. CACM, 39(1):60–62, January 1996. http://www.acm.org/pubs/articles/journals/cacm/1996-39-1/p60-wilks/p60-wilks.pdf.
[Wor98] Karsten L. Worm. A model for robust processing of spontaneous speech by integrating viable fragments. In Proceedings of COLING/ACL 1998, 1998.
[WS97] H. Weber, J. Spilker, and G. Görz. Parsing N best trees from a word lattice. In Advances in Artificial Intelligence: Proceedings of KI-97, number 1303 in LNAI, pages 279–288, Freiburg, 9–12 October 1997. Springer.
[WSP+93] Ralph Weischedel, Richard Schwartz, Jeff Palmucci, Marie Meteer, and Lance Ramshaw. Coping with Ambiguity and Unknown Words through Probabilistic Models. Computational Linguistics, 19(2):359–382, 1993.
[xsb] The XSB programming system. http://www.cs.sunysb.edu/~sbprolog/xsb-page.html.
[Zec98] Klaus Zechner. Automatic construction of frame representations for spontaneous speech in unrestricted domains. In Proceedings of COLING/ACL 98, Montreal, Canada, 1998.
[Zel95] John M. Zelle. Using Inductive Logic Programming to Automate the Construction of Natural Language Parsers. Technical Report AI96-249, Department of Computer Sciences, University of Texas at Austin, 1995. ftp://ftp.cs.utexas.edu/pub/mooney/papers/chill-dissertation-95.ps.Z.
[Zip35] G. K. Zipf. The Psychology of Language. Houghton Mifflin, Boston, 1935.
[ZU96] Annie Zaenen and Hans Uszkoreit. Language Analysis and Understanding. In Ronald A. Cole, Joseph Mariani, Hans Uszkoreit, Annie Zaenen, and Victor Zue, editors, Survey of the State of the Art in Human Language Technology, chapter 3. Cambridge University Press, 1996.