Extracting Semantic Information through Automatic Learning Techniques

E. Segarra, E. Sanchis, M. Galiano, F. García, L. Hurtado

U. Politécnica de Valencia, Departamento de Sistemas Informáticos y Computación, Camino de Vera s/n, 46022 Valencia

Abstract

In this work we present an approach to the development of Language Understanding systems from a Transduction point of view. In particular, we describe the use of two types of automatically inferred transducers as models for the understanding phase in dialog systems. The appropriateness of these approaches is discussed on the basis of a preliminary evaluation of a dialog system that answers queries about a railway timetable by telephone in Spanish.

Keywords: Language Understanding, Language Models, Grammatical and Transducer Inference, Sequential Transduction.

1 Introduction

The development of understanding systems is an issue of growing interest, whether the input is spontaneous speech or natural language text. In several areas of language processing, such as spoken dialog, information retrieval, and translation systems, the incorporation of an understanding component is necessary. Systems of this type must be able to deal with ungrammatical text input or with spontaneous speech input. In order to model the language understanding (LU) process, many works in the literature use rule-based techniques; however, in recent years, statistical models have found their way into the LU modelling problem, mainly in the domain of database information retrieval. The BBN-HUM [6], the AT&T-CHRONUS [3], and the LIMSI-ARISE [4] systems are examples of the use of Hidden Markov Models and N-gram models to stochastically model the LU process. There are also other stochastic approaches based on grammatical inference techniques [5][7].

Work partially funded by CICYT under project TIC98-0423-C06.

A general definition of an LU system is that of a machine accepting strings of words as input and producing sentences from a certain semantic language that specifies the actions to be performed. From this point of view, Language Understanding is a process of Transduction. In order to implement this process, inference techniques can be used to automatically learn the required transducers from a training set of input-output examples. In this work, we describe the application of two types of automatically inferred transducers to an understanding task in the framework of semantically restricted spoken dialog systems. In particular, we describe our approach to implementing the understanding component of a spoken dialog system that answers queries about a railway timetable by telephone in Spanish [1]. Thus, following the most common architecture of a spoken dialog system, our understanding component accepts sequences of words as input and outputs the corresponding "frames", which are the usual way of representing the semantics of a dialog system [8].

2 A transduction approach to language understanding

As we previously mentioned, any LU system can be viewed as a transducer. From this point of view, the main problem is how to learn the "required" transducer from a training set of input-output examples. The transduction process that we present is divided into two phases: the first phase transduces the input sentence into a semantic sentence defined on a sequential intermediate semantic language, and the second phase transduces the semantic sentence into its corresponding frame. Automatic learning techniques have been applied in the first phase, and the second phase is performed by a simple rule-based system. One advantage of this approach is that the semantic sentences of the intermediate semantic language are sequential with the input sentences, allowing for a sequential transduction. When the semantic language is sequential with the input language, we can perform a segmentation of the input sentence into a number of intervals equal to the number of semantic units in the corresponding semantic sentence. That is, let $W$ be the vocabulary of the task (set of words), and let $V$ be the alphabet of semantic units; the training set is a set of pairs (u,v) where:

$v = v_1 v_2 \ldots v_n, \quad v_i \in V, \quad i = 1, \ldots, n$

$u = u_1 u_2 \ldots u_n, \quad u_i = w_{i1} w_{i2} \ldots w_{i|u_i|}, \quad w_{ij} \in W, \quad i = 1, \ldots, n, \quad j = 1, \ldots, |u_i|$

Each input sentence in $W^*$ has a pair (u,v) associated to it, where v is a sequence of semantic units and u is a sequence of segments of words.
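Such a training pair can be represented directly as two parallel lists, one of word segments and one of semantic units. The sketch below is our own illustration of this data structure (the container and function names are not from the paper; the segments and units follow the paper's example):

```python
# A (u, v) training pair: u is a list of word segments, v is the list of
# semantic units. Sequentiality means the two lists align position by position.

def make_pair(segments, units):
    """Build and validate a sequential training pair (u, v)."""
    assert len(segments) == len(units), "u and v must have the same length n"
    return {"u": segments, "v": units}

pair = make_pair(
    segments=[["me", "podría", "decir"],
              ["los", "horarios", "de", "trenes"],
              ["para"],
              ["Barcelona"]],
    units=["consulta", "<hora salida>", "marcador destino", "ciudad destino"],
)

# The flat input sentence is the concatenation of the segments.
sentence = [w for seg in pair["u"] for w in seg]
```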

Example of training pairs:

Spanish:
u1 = me podría decir, u2 = los horarios de trenes, u3 = para, u4 = Barcelona
v1 = consulta, v2 = <hora salida>, v3 = marcador destino, v4 = ciudad destino

English:
u1 = can you tell me, u2 = the railway timetable, u3 = to, u4 = Barcelona
v1 = query, v2 = <departure time>, v3 = destination marker, v4 = destination city

Input pair (u,v) = (u1 u2 u3 u4, v1 v2 v3 v4).

The semantic sentence v for the semantic language model training is:

consulta <hora salida> marcador destino ciudad destino
(query <departure time> destination marker destination city)
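The second phase, which maps a semantic sentence (plus its segmentation) to a frame, is rule-based. A minimal sketch of such rules is shown below; the slot names and frame schema are our own illustration, since the paper does not specify the exact frame format (see [8]):

```python
# Toy rule-based second phase: semantic sentence + segmentation -> frame.
# Slot names ("type", "requested", "destination") are hypothetical.

def to_frame(units, segments):
    frame = {}
    for unit, seg in zip(units, segments):
        if unit == "consulta":
            frame["type"] = "query"
        elif unit == "<hora salida>":
            frame["requested"] = "departure_time"
        elif unit == "ciudad destino":
            # the slot value comes from the words of the segment
            frame["destination"] = " ".join(seg)
        # "marcador destino" carries no value: it only marks the unit after it
    return frame

frame = to_frame(
    ["consulta", "<hora salida>", "marcador destino", "ciudad destino"],
    [["me", "podría", "decir"], ["los", "horarios", "de", "trenes"],
     ["para"], ["Barcelona"]],
)
# frame == {"type": "query", "requested": "departure_time",
#           "destination": "Barcelona"}
```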

When a training set of this type is available, the problem of learning the sequential transduction can be solved by applying different approaches.

2.1 The Two-Level approach

A first approach consists of learning two types of models from a training set of pairs (u,v): a model for the semantic language $L_s \subseteq V^*$, and a set of models, one for each semantic unit $v_i \in V$. The regular model $A_s$ (a stochastic finite state automaton) for the semantic language $L_s$ is estimated from the semantic sentences $v \in V^*$ of the training sample. The regular model $A_{v_i}$ (a stochastic finite state automaton) for each semantic unit $v_i \in V$ is estimated from the set of segments $u_i$ of the training sample associated with that semantic unit $v_i$. These estimations are made through automatic learning techniques. The final model $A_t$ is obtained through the application of a regular substitution $\sigma$ to the semantic language $L_s$. Let $\sigma : V^* \to P(W^*)$ be a regular substitution such that $\forall v_i \in V, \; \sigma(v_i) = L(A_{v_i})$. The regular model $A_t$ is such that $L(A_t) = \sigma(L(A_s)) = \sigma(L_s)$. This substitution $\sigma$ converts each terminal symbol $v_i \in V$ (semantic unit) of the regular model $A_s$ into the corresponding regular model $A_{v_i}$. One of the advantages of this approach is that we can choose the most appropriate learning technique to estimate each model (the semantic model and the semantic unit models). The only restriction is that these models be represented as stochastic finite state automata. In this work, we explored two possibilities for the estimation of such models. In both of them, a classical bigram model was estimated for the semantic model $A_s$. In the first one, the models for the semantic units, $A_{v_i}$, were estimated as classical bigrams and, in the second one, they were estimated as stochastic finite state automata, automatically learned by a Grammatical Inference algorithm based on Error Correcting (ECGI) [5]. This last algorithm has been successfully applied to the automatic learning of language models for other speech recognition/understanding tasks.

Finally, the obtained model $A_t$ is used to analyze a test sentence $w = w_1 w_2 \ldots w_{|w|}$. The analysis is based on a Viterbi scheme. Let $(q^{v_1}_1, \ldots, q^{v_1}_{l_1}, q^{v_2}_1, \ldots, q^{v_2}_{l_2}, \ldots, q^{v_n}_1, \ldots, q^{v_n}_{l_n})$ be the sequence of states of the different models $A_{v_1}, A_{v_2}, \ldots, A_{v_n}$ associated to the maximum probability path in the analysis (notice that $l_1 + l_2 + \ldots + l_n = |w|$). The output sequence (the translation) is the concatenation of the semantic units $v_i \in V$ associated to the states in this path: $v_1^{l_1} v_2^{l_2} \ldots v_n^{l_n}$. This output sequence gives us the translation $v = v_1 v_2 \ldots v_n$ and the corresponding segmentation of the input sentence $w$, that is, $w = u_1 u_2 \ldots u_n$ where $|u_i| = l_i, \; i = 1, \ldots, n$.
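A much-simplified sketch of this decoding scheme is given below: a bigram over semantic units plays the role of $A_s$, and per-unit word models stand in for the $A_{v_i}$ automata (here plain smoothed unigram emissions with a fixed self-loop probability, rather than bigrams or ECGI automata). All function names and the smoothing scheme are our own, not from the paper:

```python
import math
from collections import defaultdict

def train(pairs):
    """Estimate semantic-bigram counts and per-unit word counts from (u,v) pairs."""
    trans = defaultdict(lambda: defaultdict(int))  # semantic bigram counts
    emit = defaultdict(lambda: defaultdict(int))   # per-unit word counts
    vocab = set()
    for segments, units in pairs:
        prev = "<s>"
        for seg, unit in zip(segments, units):
            trans[prev][unit] += 1
            prev = unit
            for w in seg:
                emit[unit][w] += 1
                vocab.add(w)
    return trans, emit, vocab

def log_p(counts, key, size, alpha=0.1):
    """Add-alpha smoothed log probability."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + alpha) / (total + alpha * size))

def decode(words, trans, emit, vocab):
    """Viterbi over semantic units; a fixed 0.5 self-loop probability stands
    in for staying inside a unit's segment."""
    units = list(emit)
    stay = math.log(0.5)
    V = {u: log_p(trans["<s>"], u, len(units)) +
            log_p(emit[u], words[0], len(vocab)) for u in units}
    back = []
    for w in words[1:]:
        newV, ptr = {}, {}
        for u in units:
            best, arg = -math.inf, None
            for u2 in units:
                s = V[u2] + (stay if u2 == u
                             else stay + log_p(trans[u2], u, len(units)))
                if s > best:
                    best, arg = s, u2
            newV[u], ptr[u] = best + log_p(emit[u], w, len(vocab)), arg
        V, back = newV, back + [ptr]
    u = max(V, key=V.get)          # backtrace the maximum probability path
    path = [u]
    for ptr in reversed(back):
        u = ptr[u]
        path.append(u)
    path.reverse()
    # collapsing consecutive repeats yields the translation v (this would also
    # merge genuinely repeated adjacent units; acceptable for a sketch)
    v = [x for i, x in enumerate(path) if i == 0 or x != path[i - 1]]
    return path, v
```

Trained on the single example pair of Section 2, decoding the example sentence recovers one unit per word and the four-unit translation.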

Example of transduction:

Input sentence (9 words): me podría decir los horarios de trenes para Barcelona
(can you tell me the railway timetable to Barcelona)

Output sequence (9 semantic units):
consulta consulta consulta <hora salida> <hora salida> <hora salida> <hora salida> marcador destino ciudad destino
(query query query <departure time> <departure time> <departure time> <departure time> destination marker destination city)

Segmentation:
me podría decir: consulta (can you tell me: query)
los horarios de trenes: <hora salida> (the railway timetable: <departure time>)
para: marcador destino (to: destination marker)
Barcelona: ciudad destino (Barcelona: destination city)
2.2 The MGGI approach

The Morphic Generator Grammatical Inference (MGGI) methodology [2][7] is a grammatical inference technique that allows us to obtain a certain variety of regular languages. The application of this methodology implies the definition of a renaming function; that is, each symbol of each input sample is renamed following a given function g. Different definitions of the function g will produce different models (stochastic regular automata). An extension of the MGGI methodology to the regular sequential transducer inference problem has been proposed [9]. In this extension, the renaming function g is defined in such a way that the input symbols are renamed with the corresponding symbols $v_i \in V$, which are associated to them by the implicit segmentation of the sample (u,v). The analysis of an input sample w is performed through a Viterbi scheme.
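The renaming step can be sketched as follows. Each word is renamed with the semantic unit its segment is aligned to, and a regular model is then learned over the renamed symbols; the "word/unit" symbol format is our own convention, not the paper's:

```python
# MGGI-style renaming for transducer inference: rename each input word with
# the semantic unit associated to it by the implicit segmentation of (u, v).

def rename(segments, units):
    """Apply a renaming function g implied by the segmentation of (u, v)."""
    return [f"{w}/{unit}" for seg, unit in zip(segments, units) for w in seg]

renamed = rename(
    [["me", "podría", "decir"], ["los", "horarios", "de", "trenes"],
     ["para"], ["Barcelona"]],
    ["consulta", "<hora salida>", "marcador destino", "ciudad destino"],
)
# Reading off the unit part of each renamed symbol recovers the Moore-machine
# output for the corresponding state.
outputs = [s.rsplit("/", 1)[1] for s in renamed]
```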

Each state in the maximum probability path has a semantic unit, coming from the renaming process, associated to it. The output sequence is the concatenation of the semantic units associated to the states in this path: $v_1^{l_1} v_2^{l_2} \ldots v_n^{l_n}$. That is, the inferred transducer can be seen as a Moore machine, where the output function assigns a semantic symbol to each state. This output sequence gives us the translation $v = v_1 v_2 \ldots v_n$ and the corresponding segmentation of the input sentence $w$, that is, $w = u_1 u_2 \ldots u_n$ where $|u_i| = l_i, \; i = 1, \ldots, n$.

3 Experimental results

The language understanding models obtained were applied to an understanding task integrated into a spoken dialog system that answers queries about a railway timetable by telephone in Spanish [1]. From the orthographic transcription of a set of 215 dialogs, obtained through a Wizard of Oz technique, and using only the user utterances, we defined a training set of 175 dialogs with 1,141 user utterances and a test set of 40 dialogs with 268 user utterances. The number of words in these two sets was 11,987, and the average length of the utterances was 10.5 words. We defined three measures to evaluate the accuracy of the models: the percentage of correct sequences of semantic units (%cssu), the percentage of correct semantic units (%csu), and the percentage of correct frames (%cf). With these three measures we evaluated the segmentation accuracy and the correct interpretation of the user utterances. In Table 1, we show the experimental results for three approaches: the MGGI approach, the Two-Level approach with bigram models in the two levels (BIGR-BIGR), and the Two-Level approach with a bigram model for the semantic language and ECGI models for the semantic unit languages (BIGR-ECGI).

         MGGI   BIGR-BIGR   BIGR-ECGI
% cssu   64.6        66.9        65.4
% csu    80.7        84.7        84.6
% cf     75.0        76.9        75.0

Table 1: Experimental results

From Table 1 it can be observed that there is a large difference between the %cssu and %cf measures. This difference is due to the fact that, although the obtained semantic sentence may not be the same as the reference sentence, its corresponding frame often is. It can also be observed that the Two-Level approach gives slightly better performance than the one-level MGGI approach. This small difference can be explained by taking into account that the one-level MGGI transduction does not take advantage of the training samples as well as the Two-Level transduction does.
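The three measures can be sketched as below. %cssu and %cf reduce to exact matches (frames compared as whole objects); for %csu we count matched units via a longest-matching-blocks alignment, which is one plausible reading of the measure rather than the paper's exact definition:

```python
import difflib

def pct_correct_sequences(refs, hyps):
    """%cssu (or %cf, passing frames instead of unit sequences): exact matches."""
    return 100.0 * sum(r == h for r, h in zip(refs, hyps)) / len(refs)

def pct_correct_units(refs, hyps):
    """%csu: aligned matching semantic units over total reference units."""
    matched = total = 0
    for r, h in zip(refs, hyps):
        sm = difflib.SequenceMatcher(None, r, h)
        matched += sum(block.size for block in sm.get_matching_blocks())
        total += len(r)
    return 100.0 * matched / total
```

Computing %cssu over whole sequences and %csu over individual units is what produces the gap seen in Table 1: one wrong unit invalidates the whole sequence for %cssu but costs only one unit for %csu.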

4 Conclusions

In view of the obtained results, the transduction framework seems quite appropriate for the development of language understanding systems. The Two-Level approach gives better results than the one-level MGGI approach; however, we can improve the performance of the latter by using other renaming functions. In the framework of dialog systems, we hope to increase the performance of the understanding component by using dialog act information.

References

[1] A. Bonafonte, P. Aibar, N. Castell, E. Lleida, J.B. Mariño, E. Sanchis, and M.I. Torres. "Desarrollo de un sistema de diálogo oral en dominios restringidos". I Jornadas en Tecnología del Habla, Sevilla, 2000.
[2] P. García, E. Segarra, E. Vidal, and I. Galiano. "On the use of the morphic generator grammatical inference (MGGI) methodology in automatic speech recognition". International Journal of Pattern Recognition and Artificial Intelligence, 4(4):667-685, 1990.
[3] E. Levin and R. Pieraccini. "Concept-Based Spontaneous Speech Understanding System". Proc. of EUROSPEECH '95, 555-558, 1995.
[4] W. Minker. "Stochastically-Based Semantic Analysis for ARISE - Automatic Railway Information Systems for Europe". Grammars, 2(2):127-147, 1999.
[5] E. Sanchis, N. Prieto, and J. Bernat. "A decoupled bottom-up continuous speech understanding system directed by semantics". Proc. of the International Workshop Speech and Computer (St. Petersburg, Russia), 12-15, 1996.
[6] R. Schwartz, S. Miller, D. Stallard, and J. Makhoul. "Language understanding using hidden understanding models". ICSLP, 997-1000, 1996.
[7] E. Segarra and L. Hurtado. "Construction of Language Models using Morphic Generator Grammatical Inference (MGGI) Methodology". Eurospeech, 5:2695-2698, 1997.
[8] E. Segarra, V. Arranz, N. Castell, I. Galiano, F. García, A. Molina, and E. Sanchis. "Representación Semántica de la Tarea". Internal Report UPV DSIC-II/5/00, March 2000.
[9] E. Vidal, P. García, and E. Segarra. "Inductive Learning of Finite-State Transducers for the Interpretation of Unidimensional Objects". In Structural Pattern Analysis, R. Mohr, T. Pavlidis, A. Sanfeliu (eds.), pp. 17-35. World Scientific, 1990.
