A Combined Method Based on Stochastic and Linguistic Paradigm for the Understanding of Arabic Spontaneous Utterances

Chahira Lhioui, Anis Zouaghi, and Mounir Zrigui

ISIM of Medenine, Gabes University, Road Djerba, 4100 Medenine, Tunisia
ISSAT of Sousse, Sousse University, Taffala city (Ibn Khaldoun), 4003 Sousse
FSM of Monastir, Monastir University, Avenue of the Environment, 5019 Monastir
LATICE Laboratory, ESSTT Tunis, Tunisia
[email protected], [email protected], [email protected]

Abstract. ASTI is an Arabic-speaking spoken language understanding (SLU) system that carries out two kinds of analysis which are relatively opposed. It is designed for the touristic field, to inform travellers about matters that interest them. Based on a dual approach, the system adapts the idea of the stochastic approach to probabilistic context-free grammars (PCFG), a rule-based formalism. This paper provides a detailed description of the ASTI system as well as results compared with several international systems. The observed error rates suggest that our combined approach can stand comparison with concept spotters on larger application domains.

Keywords: Hidden Markov Model (HMM), Probabilistic Context-Free Grammar (PCFG), corpus, Wizard of Oz.

1 Introduction

This work falls within the scope of automatic Arabic spoken language understanding (SLU), in the context of highly spontaneous speech and relatively open human-machine (HM) communication (the travel information domain). Automatic SLU is an essential step in oral dialogue systems. It consists of extracting the meaning of utterances that are most of the time ambiguous and uncertain. Hence a certain level of robustness in the analysis of utterances is needed to overcome the difficulties of spontaneous oral communication. Most SLU systems follow one of two main approaches (not mutually exclusive): a rule-based approach [6] or a probabilistic approach [8]. Rule-based parsing requires a great deal of corpus analysis by experts in order to extract spotted concepts and their predicates. This method is limited to specific fields using a restricted language, which leads to many difficulties regarding portability and extension. However, this formal approach is fast thanks to ATNs and RTNs [2], favouring precision when the language is limited and the words used in the utterances are known, as in the case of systems implementing the guided

A. Gelbukh (Ed.): CICLing 2013, Part II, LNCS 7817, pp. 549–558, 2013. © Springer-Verlag Berlin Heidelberg 2013


dialogue strategy. Semantic voice interaction analyzers such as ATIS, MASK and RAILTEL/ARISE [6], developed at LIMSI-CNRS, which implement grammar use cases as rules, are good examples of rule-based parsers. Stochastic analysis has many encouraging points, among them the reduction and acceleration of the experts' work thanks to training techniques, which shortens development time. However, this approach suffers from noticeable limitations. Indeed, because of the number of parameters, it has difficulties in estimating small probabilities accurately from limited amounts of training data. In addition, it does not handle infrequent phenomena well, unlike the rule-based approach. Thus, the choice of technology depends much more on the considered application. It is common in automatic natural language processing (NLP) to oppose two approaches. From a historical perspective, the first is based on formal rules and grammars constructed by linguistic experts. The second (currently the most used in speech recognition) is based on n-gram models. The gap between these two schools is diminishing, and several trends have tried to combine the best of both approaches, as for example [4], [5] or [9]. It is in this context that this hybrid language model, based on the integration of linguistic rules (local grammars) into a statistical model, is proposed. As an application, the case of an interactive voice server for travel information and hotel reservation was chosen. The aim of this server is to enable tourists to communicate with the machine in spontaneous standard Arabic, to get information about a city of stay (restaurants, hotels, rental houses, etc.), a route, a touristic event, or a price or date constraint. Note that no existing voice server is able to communicate with tourists in Arabic in the field of tourism.

2 Problems of Parsing Spontaneous Speech

Currently, formal languages are a good means of communication between humans and machines. However, they differ significantly from natural human language, and many researchers are working to reduce these differences. Oral NLP is a multidisciplinary research field involving electronics, computer science, artificial intelligence, linguistics, cognitive science, etc. Techniques have been developed for understanding written language, but they do not adapt well to oral problems. This is due to:

- Intrinsic characteristics of spontaneous speech: ellipses, anaphora, hesitations, repetitions, and repairs. Here is an example of a tourist who hesitates, apologizes and repairs his utterance: ‫ﺁﻩ أﺣﺐ اﻟﺬهﺎب إذا آﺎن ﻣﻤﻜﻨﺎ ﺑﻌﺪ ﻋﻔﻮا ﻗﺒﻞ اﻟﺴﺎﻋﺔ‬ 17 ‫ﺑﻴﻦ اﻟﺴﺎﻋﺔ‬ 16 ‫و‬ 17 ‫إﻟﻰ ﺗﻮﻧﺲ‬

(Euh, I would like to go if it's possible after, sorry, before 17 o'clock, between 16 and 17 o'clock, to Tunis)

- Errors related to a lack of language fluency: ‫أرﻳﺪ ﺗﺮﺳﻴﻢ‬ (‫ﻓﻌﻞ ﻣﺘﻌﺪ‬)

(I want to register (transitive verb))

Oral speech is characterized by ungrammatical constructions. Faced with the ungrammaticality of oral language, it would be absurd to reject syntactically ill-formed utterances, because the goal is not to check the conformity of the user's utterances to syntactic rules but rather to extract their semantic content. To master a language (a natural language, a programming language, etc.), both syntax and semantics have to be controlled in order not to cause understanding problems. Hence, the semantic level is important to control the meaning of utterances. Indeed, a purely syntactic grammar leads to some rigidity in the analysis of a sentence: it will reject any word or phrase that does not belong to the language (i.e. that cannot be produced by the grammar defining this language).

3 Formal Grammars vs. Stochastic Language Models

The two main approaches to automatic NLP, the rule-based approach and the stochastic approach, have different qualities and limitations. In this section, similarities and differences between these two approaches are presented.

3.1 Coverage

Grammars, as complete as they may be, do not describe a natural language in its entirety. This aspect is even more pronounced for spoken language processing, as many grammatically incorrect phrasings can be used in an oral conversation. Stochastic models do not have this coverage problem: they accept all the sentences of a language, even incorrect ones. Stochastic models are thus more permissive than formal grammars, which is useful for processing spontaneous speech, despite the acceptance of erroneous recognition hypotheses.

3.2 Construction

In terms of construction, the two approaches are very different: the formal approach is based on language expertise, that is to say on linguists' skills, whereas the stochastic approach is, in principle, completely automated. However, it should be noted that the amount of corpus data necessary to train a robust stochastic language model is not always available. The qualities and weaknesses of these approaches appear to be complementary: this observation is the starting point of this work, which aims at combining the formal and stochastic approaches.

4 The Used Methodology

To address the problem of understanding Arabic oral utterances recognized by the Automatic Speech Recognition (ASR) system, a hybrid method combining the syntactic and the stochastic approaches is proposed. The decision to combine these two approaches is guided by the wish to join the fruits of both in order to further improve the performance of automatic NLP systems.

4.1 Architecture of the Hybrid Model

The principle of this work is to design a system based on a stochastic method for determining the meaning of users' queries in a syntactic context. This new approach was evaluated in the relatively open tourism domain. The problem of extracting meaning is solved in several steps. As in any stochastic method, semantic analysis is performed in two basic steps, training and decoding. Both techniques are applied in most semantic parsers and are quite similar [7]. Before the training and decoding steps, two other important ones are required: preprocessing and syntactic parsing. It is in these modules that the difference between analyzers lies.

4.2 Principle of the Used Method

Fig. 1 below illustrates the architecture of the proposed system for the automatic understanding of Arabic spontaneous utterances.

Learning or Training. The parameter estimation consists in establishing a Hidden Markov Model (HMM) (Fig. 2) given a pretreated and transcribed sequence of words (these words are obviously the output of the recognition module) and their corresponding annotated sequences. These sequences were generated during the data annotation process.

Decoding. The decoding step provides the most likely annotation sequence for a given test query.

Parsing. Note that a detailed parsing becomes essential for the proper treatment of utterances, including certain phenomena such as ellipses. It relies on the use of a rule base: a context-free grammar augmented with probabilities associated with the rules (see Section 4.3). These grammars are a refinement of formal grammars.
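The decoding step can be sketched as a Viterbi search over an HMM whose hidden states are semantic concepts. The states, probabilities and vocabulary below are invented placeholders for illustration, not the actual ASTI model:

```python
import math

# Toy concept HMM (all figures are made up for this sketch).
states = ["DC", "TD", "OTHER"]          # e.g. destination_city, departure_time
start = {"DC": 0.3, "TD": 0.3, "OTHER": 0.4}
trans = {s: {t: 1.0 / 3 for t in states} for s in states}   # uniform transitions
emit = {
    "DC":    {"Tunis": 0.8, "17": 0.05, "go": 0.15},
    "TD":    {"Tunis": 0.05, "17": 0.8, "go": 0.15},
    "OTHER": {"Tunis": 0.1, "17": 0.1, "go": 0.8},
}

def viterbi(words):
    """Return the most likely concept sequence for the observed words."""
    # delta[s]: best log-probability of any path ending in state s
    delta = {s: math.log(start[s]) + math.log(emit[s][words[0]]) for s in states}
    backptrs = []
    for w in words[1:]:
        prev = delta
        delta, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            delta[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][w])
            ptr[s] = best
        backptrs.append(ptr)
    # backtrack from the best final state
    last = max(delta, key=delta.get)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With this toy model, `viterbi(["go", "Tunis", "17"])` labels "Tunis" as a destination city and "17" as a departure time, which is the kind of decoded sequence the figure's decoding step produces.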

[Figure 1 shows the overall architecture of the hybrid model: sequences of words pass through a pretreatment module; during training, annotated sequences feed a parameter-estimation step that produces the stochastic model; at run time, pretreated sequences go through syntactic parsing and the decoding step, and the decoded sequence feeds information extraction, schema generation and the database response.]

Fig. 1. Overall architecture of the hybrid model

Each production rule of a probabilistic grammar is associated with a probability. This additional information aims at reducing the syntactic ambiguities that may arise while parsing a sentence. The benefit of this statistical information increases with the number of production rules constituting the grammar. The probability of a derivation (that is to say, the application of a sequence of production rules r1, r2, …, rn) can be written as follows:

P(S ⇒_{r1, r2, …, rn} x) = P(r1) · P(r2) · … · P(rn)

Probabilistic grammars are an extension of formal grammars. Their construction is done in two phases. First, a set of production rules has to be retained, as in a formal grammar. Then, from a corpus containing sentences already parsed, the simplest approach to compute the probabilities of occurrence of the rewrite rules is to count the number of times each rule is used. The probability of applying a grammar rule of type A → α may be denoted by P(A → α | G) or P(r | G). The following example gives a context-free grammar for the sentence below, obtained by successive derivations of production rules.

‫أرﻳﺪ ﺣﺠﺰ ﺗﺬآﺮة إﻟﻰ ﻣﺪﻳﻨﺔ ﻗﺮﺑﺺ‬
I want to reserve a ticket to Korbos city

The grammar generated by this sentence is as follows:


G:
S → GN GV COMP
S → GV COMP
GN → pronoun | ε
COMP → GNominal
GNominal → prep GNominal | noun GNominal | noun
GV → vloc verb
pronoun → ‫أ‬ (I) | ε
noun → ‫ﻣﺪﻳﻨﺔ‬ (city) | ‫ﻗﺮﺑﺺ‬ (Korbos) | ‫ﺗﺬآﺮة‬ (ticket)
prep → ‫إﻟﻰ‬ (to)
vloc → ‫أرﻳﺪ‬ (want)
verb → ‫ﺣﺠﺰ‬ (to reserve)
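The grammar above can be encoded as a simple rule table and a derivation replayed mechanically. This is a hedged sketch: English glosses stand in for the Arabic terminals, and the derivation-replay helper is an illustration, not part of the ASTI implementation:

```python
# The example grammar as a Python rule table (English glosses for Arabic words).
GRAMMAR = {
    "S": [["GN", "GV", "COMP"], ["GV", "COMP"]],
    "GN": [["pronoun"], []],                       # [] encodes the empty rule
    "COMP": [["GNominal"]],
    "GNominal": [["prep", "GNominal"], ["noun", "GNominal"], ["noun"]],
    "GV": [["vloc", "verb"]],
    "pronoun": [["I"], []],
    "noun": [["city"], ["Korbos"], ["ticket"]],
    "prep": [["to"]],
    "vloc": [["want"]],
    "verb": [["reserve"]],
}

def leftmost_derive(symbols, choices):
    """Expand non-terminals left to right, picking rule alternatives by index."""
    out = []
    picks = iter(choices)
    def expand(sym):
        if sym not in GRAMMAR:        # terminal symbol
            out.append(sym)
            return
        for s in GRAMMAR[sym][next(picks)]:
            expand(s)
    for s in symbols:
        expand(s)
    return out

# Replaying the derivation of the example sentence (pronoun-dropped form,
# glossed roughly as "want reserve ticket to city Korbos"):
yield_ = leftmost_derive(["S"], [1, 0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 2, 1])
```

Each index in the choice list selects one alternative for the next non-terminal expanded, so the list is exactly the rule sequence r1, …, rn of a derivation.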

Fig. 2. View of a 1-level HMM modeling

Syntactic and Semantic Annotation. The purpose of this step is to associate each word of a sentence with a grammatical label (or tag), depending on the context, such as ADJ (adjective), NOMP (proper name), NOMC (common name), DET (determiner), etc. For example:

‫أرﻳﺪ اﻟﺬهﺎب إﻟﻰ ﺳﻮﺳﺔ‬
I want to go to Sousse

This will be easier if:

- First of all, an automatic reduction of words to their canonical form can be used (‫اﻟﺬهﺎب‬ (edhaha:ba) to ‫ذهﺐ‬ (dhahaba));
- Second, the information is available in the dictionary.

Semantic information useful for the later decoding, such as DC (destination_city) or TD (departure_time), is then added. This step can be automated with Brill's rule-based tagger [13].

Pretreatment of Transcribed Utterances. An oral statement is inherently irregular and difficult to control. This is mainly due to the spontaneous nature of the statement, which contains various types of dysfluency


(i.e. repetitions, hesitations, self-corrections, etc.), which are frequent phenomena of spontaneous speech. Here is an example of a statement with hesitation and self-correction: ‫هﻞ ﻳﻮﺟﺪ ﻣﻄﻌﻢ ﺧﺎص ﺑﺎﻟﻜﺒﺎب هﻨﺎ ﺁﻩ ﺑﺎﻟﺒﻴﺘﺰا ﻋﻔﻮا‬ (Is there a restaurant specialized in kebab here, ah in pizza, sorry?) These phenomena lead to ambiguities that can produce analysis errors. The pretreatment step is required to facilitate the processing of the transcribed utterances by the subsequent steps. This step removes duplications and unnecessary information, converts numbers written out in letters, and determines the canonical forms [2] of words. To achieve this, the statement undergoes standardization [10], a morpho-lexical parsing and repetition processing [1].
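The repetition and hesitation removal of the pretreatment step can be sketched as a small token filter. The filler list and the token-level strategy are placeholders, not the actual resources of [1] and [10]:

```python
# Hedged sketch of pretreatment: drop hesitation fillers and collapse
# immediate word repetitions (transliterated placeholder fillers).
FILLERS = {"ah", "euh"}

def pretreat(tokens):
    """Return the token sequence without fillers and immediate repetitions."""
    out = []
    for tok in tokens:
        if tok.lower() in FILLERS:
            continue                  # drop hesitations
        if out and out[-1] == tok:
            continue                  # collapse an immediate repetition
        out.append(tok)
    return out
```

For instance, `pretreat(["euh", "I", "I", "want", "to", "go"])` keeps only "I want to go"; a real pretreatment module would also normalize numbers and reduce words to canonical forms.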

4.3 PCFG and Probabilistic Grammar

A grammar rich enough to accommodate natural language, including rare and sometimes even 'ungrammatical' constructions, fails to distinguish natural from unnatural interpretations. But a grammar sufficiently restricted so as to exclude what is unnatural fails to accommodate the scope of real language. These observations led to a growing interest in probabilistic approaches to natural language. Natural language is obviously rich and diverse, so broad coverage is desirable and not easily obtained with a small set of rules. But it is hard to achieve broad coverage without massive ambiguity (a sentence may have tens of thousands of parses), which of course complicates applications such as language interpretation, machine translation, speech recognition and speech understanding. This is the coverage dilemma that we referred to in Section 3.1, and it sets up a compelling role for probabilistic and statistical methods. A probabilistic context-free grammar (PCFG, also called stochastic context-free grammar, SCFG) is a context-free grammar in which each rule carries a probability. The key idea of the PCFG is to extend the context-free grammar (CFG) definition to give a probability distribution over possible derivations, that is, to define a distribution over parse derivations. For example:

‫أﻧﺎ ﻻ أرﻳﺪ أن أﺳﺎﻓﺮ‬ (I do not want to travel)

1.0  S → PV PN
0.5  PN → ‫أﻧﺎ‬
0.3  vloc → ‫ﻻ أرﻳﺪ أن‬
0.6  verb → ‫أﺳﺎﻓﺮ‬

‫ﻧﺤﻦ ﻧﺮﻳﺪ أن ﻧﺤﺠﺰ‬ (We want to reserve)

1.0  PV → vloc verb
0.5  PN → ‫ﻧﺤﻦ‬
0.7  vloc → ‫ﻧﺮﻳﺪ أن‬
0.4  verb → ‫ﻧﺤﺠﺰ‬
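Under a PCFG, the probability of a derivation is simply the product of the probabilities of the rules it applies, P(r1)·P(r2)·…·P(rn). A quick check with the rule probabilities listed for the first example:

```python
from math import prod

# Probabilities of the rules applied in the first example above.
rule_probs = [1.0, 0.5, 0.3, 0.6]

# Derivation probability = product of the applied rule probabilities.
p = prod(rule_probs)   # 1.0 * 0.5 * 0.3 * 0.6 = 0.09
```

So this derivation has probability 0.09; an ambiguous sentence would be assigned the parse whose derivation has the highest such product.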


The probabilistic context-free grammar is formally defined as follows:

1. A context-free grammar G = (N, Σ, S, R) having rules of the form A → α, with α ∈ (N ∪ Σ)+.
2. A parameter q(α → β) for each rule α → β ∈ R. The parameter q(α → β) can be interpreted as the conditional probability of choosing rule α → β in a rightmost derivation, given that the non-terminal being expanded is α. For any X ∈ N, we have the constraint:

Σ_{β : X → β ∈ R} q(X → β) = 1

Having defined PCFGs, we derive a PCFG from a corpus. We will assume a set of training data, which is simply a set of parse derivations. The maximum-likelihood [12] parameter estimates are:

q(α → β) = count(α → β) / count(α)

where count(α → β) (resp. count(α)) is the number of times that the rule α → β (resp. the non-terminal α) is seen in the derivations of the training corpus. The EM algorithm can also estimate PCFGs from a corpus of utterances.
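The maximum-likelihood estimate above is just relative-frequency counting. A minimal sketch, assuming for brevity that the treebank is given as a flat list of (lhs, rhs) rule applications rather than full parse trees:

```python
from collections import Counter

def mle_pcfg(rule_uses):
    """Estimate q(A -> beta) = count(A -> beta) / count(A) from rule applications."""
    rule_counts = Counter(rule_uses)
    lhs_counts = Counter(lhs for lhs, _ in rule_uses)
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}

# Toy "treebank": three derivations' worth of rule applications (invented data).
uses = [
    ("S", ("GV", "COMP")), ("S", ("GN", "GV", "COMP")), ("S", ("GV", "COMP")),
    ("GV", ("vloc", "verb")), ("GV", ("vloc", "verb")), ("GV", ("vloc", "verb")),
]
q = mle_pcfg(uses)
```

Here `q[("S", ("GV", "COMP"))]` comes out as 2/3 and the GV rule gets probability 1.0, and by construction the estimates for each left-hand side sum to 1, satisfying the constraint above.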

5 Corpus Establishment

The corpus used here is dedicated to the study of touristic applications accessing databases. It is composed of Arabic spontaneous dialogues stemming from the simulation of a tourist information and hotel reservation server. Dialogues aimed at booking one or more rooms in one or more hotels are performed in the context of organizing a weekend, a holiday or a business trip. Thus, the dialogue may cover different themes: choice of the city of stay, finding a route or a tourist event, satisfaction of a price or date constraint. The system had to provide information on transportation as well as hotels, restaurants, shops and cinemas around hotels, museums and monuments, the services available to tourists, cities of stay, tourist events and days of stay. Indeed, a tourist can inquire about the following details:

- Hotels (price, address, services, classes, path),
- Restaurants (price, address, benefits, types, path),
- Monuments (address, opening hours, description, path),
- Museums (address, hours, prices, description, path),
- Stores (address, hours, prices, description, path),
- Cinemas (address, hours, prices, description, path),
- Services (information, coffe_wifi, pharmacie, gym, ...),

- Stay_city (location, transportation, reservation, route),
- Touristic_event (tour, festival ...),
- Period (weekend, holiday, business trip).

5.1 Collection of Corpus

This corpus was collected by asking ten different people to produce written utterances relating to tourist information, using the Wizard of Oz method. The following table provides information about the complexity of this task.

Table 1. Statistics from the touristic corpus.

Complexity indices       Value
Number of utterances     140
Number of speakers       10
Query types              14

6 Tests and Results

Some languages such as English, French, and German have evaluation platforms for the understanding modules of dialogue systems. These platforms give the community large sets of real annotated dialogue corpora. However, this is not the case for the Arabic language, where these resources are absent, with the exception of a few corpora distributed by ELDA/ELRA [1]. Thus, a proper evaluation corpus had to be built, using the same Wizard of Oz technique used to build the training corpus. The evaluation corpus involves 100 queries of different types (negation, affirmation, interrogation and acceptance), uttered spontaneously and manually transcribed. These requests correspond to scenarios dealing with information on the tourism field. These scenarios are inspired from the MEDIA corpus [3] and try to cover the input space. The evaluation of the understanding module with this corpus showed that the system generates 20 errors (on average one error per 5 items). The measures of recall, precision and F-measure are respectively 70.00%, 71.00% and 73.79%, and the average time to analyze an utterance of 12 words is 0.279 seconds. Comparing these results with those obtained by other understanding modules [6], the ASTI system produced fewer errors than several official sites such as UNISYS and MITRE.

Table 2. Comparison of ASTI system results with official sites

          AT&T    CMU    BBN    UNISYS    MITRE    ASTI
%ERROR     3.8    3.8    9.4     23.6      30.6      20

In fact, as shown in Table 2, the error response rate obtained on the 445 transcribed requests reached 20% for the ASTI system, which is lower than that of systems such as UNISYS and MITRE.
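The reported scores follow the standard definitions of precision, recall, F-measure and error rate; a minimal sketch (the values below are illustrative, not the system's internal counts):

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta = 1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def error_rate(errors, total):
    """Fraction of queries answered incorrectly."""
    return errors / total

# Illustrative values only:
rate = error_rate(20, 100)     # 20 errors over 100 queries -> 0.2
f1 = f_measure(0.5, 0.5)       # equal precision and recall -> 0.5
```

For instance, 20 errors over 100 evaluation queries gives the 20% error rate quoted above.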

7 Conclusion and Perspectives

When the ASTI system was implemented, one of the pursued objectives was to achieve robust parsing of spontaneous spoken Arabic while making the application domain much wider than is currently done. Syntactic formalisms are not usually viewed as efficient tools for pragmatic applications; that is why the two approaches of interest (syntactic and stochastic) are combined. Another objective was to obtain a rather generic system, despite the use of domain-based syntactic knowledge. This constraint is fulfilled through the definition of generic rules as well as their probabilities, used to train the HMM model, which makes it possible to estimate its parameters efficiently. The performances of ASTI show that a combination of the two divergent approaches can bear comparison with international systems.

References

1. Bahou, Y., Belguith, H.L., Ben Hamadou, A.: Towards a Human-Machine Spoken Dialogue in Arabic. In: 6th Language Resources and Evaluation Conference (LREC 2008), Workshop HLT Within the Arabic World: Arabic Language and Local Languages Processing Status Updates and Prospects, Marrakech, Morocco (2008)
2. Hadrich Belguith, L., Bahou, Y., Ben Hamadou, A.: Une méthode guidée par la sémantique pour la compréhension automatique des énoncés oraux arabes. International Journal of Information Sciences for Decision Making (ISDM) 35 (September 2009)
3. Bonneau-Maynard, H., Rosset, S., Ayache, C., Kuhn, A., Mostefa, D.: Semantic Annotation of the French Media Dialog Corpus. In: 9th European Conference on Speech Communication and Technology (2005)
4. Chelba, C., Jelinek, F.: Structured language modeling. Computer Speech and Language 14(4), 283–332 (2000)
5. El-Bèze, M., Spriet, T.: Stratégie mixte d'étiquetage syntaxique : statistiques et connaissances. Revue TAL 36(1-2), 47–66 (1995)
6. Minker, W.: Compréhension Automatique de la Parole Spontanée. L'Harmattan (1999)
7. Minker, W., Bennacef, S.: Speech and Human-Machine Dialog. Kluwer Academic Publishers, The Netherlands (2004)
8. Riccardi, G., Gorin, A.L.: Stochastic language adaptation over time and state in natural spoken dialogue systems. IEEE Transactions on Speech and Audio Processing 8(1), 3–10 (2000)
9. Salomaa, A.: Probabilistic and weighted grammars. Information and Control 15, 529–544 (1969)
10. Zouaghi, A., Zrigui, M., Antoniadis, G.: Compréhension Automatique de la Parole Arabe Spontanée : Une Modélisation Numérique. Traitement Automatique des Langues (TAL) 49(1), 141–166 (2008)
11. Zouaghi, A., Zrigui, M., Ben Ahmed, M.: Un Étiqueteur Sémantique des Énoncés en Langue Arabe. In: Actes de la 12ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN-RECITAL 2005), Dourdan, France (2005)
12. Collins, M.: Probabilistic Context-Free Grammars (PCFGs). Lecture notes, http://www.cs.columbia.edu/
13. Brill, E.: Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics 21(4) (1995)
