Modeling Dictionaries for Composite Access Services: A Language Grid Perspective
[Logo: Language Grid in Tompa characters]

Yoshihiko Hayashi
Osaka University and NICT
Japanese-Germany Workshop on NLP 2006
May 31-June 1, 2006, Tokyo
Copyright 2006, Yoshihiko Hayashi

Outline of the Talk
• The Language Grid Project
  • Combined Dictionary Access
• An Abstract Dictionary Data Model (Hayashi and Ishida, LREC 2006)
  • Dictionary and Lexicon Instances
  • The Model
    • overview
    • ingredients
    • a modeling example
    • derived relations
  • Related work: LMF
• Conclusions and Future Work

Language Grid
• A project carried out at NICT
  • Project leader: Prof. Toru Ishida (Kyoto University)
  • One line summary: "Connecting world's language services to support intercultural collaboration"
  • URL: http://langrid.nict.go.jp/
• The goal is:
  • to provide a language infrastructure on the Internet that will be useful in overcoming the language barriers encountered in various inter-cultural communication situations

Language Grid: Objectives
• What we'd like to provide on the infrastructure is:
  • a way to combine existing language resources/services, tailored to the users' needs (horizontal grid)
  • the capability to help create and deploy new language resources/services, particularly those that emerge from intercultural activities (vertical grid)
• This talk will concentrate on:
  • the first one, especially combined dictionary access services

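Virtually Combined Dictionary Access
Words shown: 銀行 (Japanese), die Bank (German)
Japanese Monolingual
• 預金の受入 ・・・ を主たる業務とする金融機関. "お金を預けに銀行へ行った", ... (a financial institution whose main business is accepting deposits ・・・; example: "I went to the bank to deposit money")
• 提供されたものを蓄積・保管し,求めに応じて提供する組織. "人材銀行", ... (an organization that stores what it is given and supplies it on request; example: "talent bank")
Princeton WordNet
• {depository financial institution#1, bank#1, banking concern#1, ...}: "a financial institution that accepts deposits and channels the money into lending activities"
• {bank#3}: "a supply or stock held in reserve for future use"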

Language Grid: Key technologies
1. LR/LS ontology for meta-descriptions
2. Tech. Elements from semantic Web service
[Figure: language resources (LR: x, y) and a language service (LS: z), each with its own wrapper and metadata, attached to the Language Grid; wrapped service elements feed a composite service. Labeled functions: wrapper generation; service composition & deployment.]

The Model Overview
• The model is a three-layered model:
  • word form, word sense and lexical concept
• for uniformly modeling
  • MRDs: Machine Readable Dictionaries
    • monolingual, bilingual
  • CCLs: Computational Concept Lexicons
[Figure: three layers labeled form (lemma), sense, concept]

Japanese Dictionary (Sanseido Daijirin)
[Screenshot; annotation: word senses (financial bank, supply/stock)]

Japanese-to-English Dictionary (Sanseido Exceed)
[Screenshot; annotations: translation "a bank", phrases, compounds]

English-to-Japanese Dictionary (Sanseido Exceed)
[Screenshot; annotations: noun, verb]

English Dictionary (LDOCE ONLINE)
[Screenshot; annotation: POS: noun]

English Concept Lexicon (WordNet Online)
[Screenshot]

Japanese/English Concept Lexicon (EDR Dictionaries)
Japanese word dictionary
• 銀行[ギンコウ] / 3bc999
• バンク[バンク] / 3bc999
English word dictionary
• Bk. / 3bc999
• bnk. / 3bc999
• bank / 3bc999
Callouts: bilingual synset (the word entries sharing 3bc999); Concept Identifier (3bc999)
3bc999 in Head-concept dictionary
• E-headword: bank
• J-headword: 銀行[ギンコウ]
• E-explication: a financial institution, called bank
• J-explication: 銀行という金融機関
super: 0ed4e3
sub: 0e7b82 / 0e828f / 0eaa61 / ...
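
The bilingual synset on this slide can be read as a grouping of word-dictionary entries by their shared concept identifier. A minimal Python sketch of that reading follows; the tuple layout, the language tags "en"/"ja", and the function name are assumptions made for illustration and are not the EDR file format.

```python
from collections import defaultdict

# Word-dictionary entries copied from the slide: (language, headword, concept id).
WORD_ENTRIES = [
    ("ja", "銀行", "3bc999"),
    ("ja", "バンク", "3bc999"),
    ("en", "Bk.", "3bc999"),
    ("en", "bnk.", "3bc999"),
    ("en", "bank", "3bc999"),
]


def bilingual_synsets(entries):
    """Group headwords by concept identifier, keeping the two languages apart."""
    synsets = defaultdict(lambda: {"en": [], "ja": []})
    for lang, headword, concept_id in entries:
        synsets[concept_id][lang].append(headword)
    return dict(synsets)


synsets = bilingual_synsets(WORD_ENTRIES)
print(synsets["3bc999"])
```

Keying on the concept identifier is what lets English and Japanese headwords end up in one set without any direct word-to-word translation link.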

Requirements for the Dictionary Model
• should be able to incorporate as many dictionaries/lexicons as possible
• should be simple yet effective
  • sufficient for human use in the Language Grid environment
• should be based on word senses (lexical concepts)
• should be able to represent secondary (derived) relations
  • discovered on-demand/on-the-fly
  • can be inter-dictionary/cross-lingual

Schematic Overview of the Model
[Figure: lemma, sense, and concept nodes; dotted arrows: derived links]

Ingredients of the Model
• Nodes: three classes
  • lemma node: defined by a 3-tuple (the modeling example writes lemmas as, e.g., 銀行/n/ギンコウ and bank/n/-)
  • sense node:
    • MRD (bilingual)
    • MRD (monolingual)
    • CCL
  • concept node (for CCLs)
• Links: classified by the source/target node types
  • lemma-to-sense (1:n): MRDs and CCLs
  • sense-to-concept (1:1): CCLs
  • concept-to-concept (1:n): CCLs
  • derived relations: discussed later
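
A minimal Python sketch of the three node classes and the original (non-derived) link types. All field names are assumptions: the lemma fields (form, pos, reading) are inferred from the pattern used in the modeling example (e.g. 銀行/n/ギンコウ), not from the slide's own tuple definition, and `related` is a placeholder for untyped concept-to-concept links.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """Concept node (CCLs only) with 1:n concept-to-concept links."""
    concept_id: str
    related: List["Concept"] = field(default_factory=list)


@dataclass
class Sense:
    """Sense node: a synset and/or a gloss, plus at most one concept (1:1, CCLs)."""
    synset: List[str] = field(default_factory=list)
    gloss: str = ""
    concept: Optional[Concept] = None


@dataclass
class Lemma:
    """Lemma node with 1:n lemma-to-sense links."""
    form: str
    pos: str
    reading: str  # assumed third element; "-" when the entry gives none
    senses: List[Sense] = field(default_factory=list)


# An MRD lemma: its senses carry no concept link.
mrd_bank = Lemma("bank", "n", "-", senses=[Sense(synset=["銀行"]), Sense(synset=["貯金箱"])])

# A CCL lemma: its sense points to exactly one concept node.
edr_concept = Concept("3bc999")
edr_bank = Lemma(
    "bank", "n", "-",
    senses=[Sense(synset=["Bk.", "bnk.", "bank", "銀行", "バンク"],
                  gloss="a financial institution, called bank",
                  concept=edr_concept)],
)

print(len(mrd_bank.senses), edr_bank.senses[0].concept.concept_id)
```

Making `concept` optional is one way to let MRDs and CCLs share a single sense class, which is the uniformity the model aims for.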

A Modeling Example
Notation: [e;j]: synset, {e;j}: gloss

MRD: English-to-Japanese
• lemma: bank/n/-
• senses: [;銀行], [;貯金箱], [;貯蔵所], [;コンピュータ・バンク]

EDR Dictionary
• lemmas: Bk./n/-, bnk./n/-, bank/n/-, 銀行/n/ギンコウ, バンク/n/バンク
• [Bk., bnk., bank; 銀行, バンク] {a financial institution, called bank; 銀行という金融機関}

MRD: Japanese
• lemma: 銀行/n/ギンコウ
• senses:
  • {;預金の受入,資金の貸付などを主たる業務とする金融機関}
  • {;提供されたものを蓄積・保管し,求めに応じて供給する組織}

Princeton WordNet
• lemmas: bank/n/-, coin bank/n/-
• [depository financial institution, bank, banking concern, banking company;] {a financial institution that accepts deposits and channels the money into lending activities;}
• [savings bank, coin bank, money box, bank;] {a container for keeping money at home;}

Derived Relations
• Derived relations should be properly annotated:
  • relation label (as in original relations)
  • scores from the computational process
    • degree of the relation
    • reliability
• An inventory of relation labels should be defined:
  • semantic equivalence relations
    • in EuroWordNet: eq_synonym, eq_near_synonym, ...
    • problem: lexical gaps
  • MWE: Multi-Word Expressions
  • other (lemma-to-lemma) relations: form-variant, see-also
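
A minimal Python sketch of a derived link carrying the three annotations named above (label, degree, reliability). The slide does not say which computational process produces the scores; the token-overlap measure, the threshold, the fixed reliability value, and the hard-coded label are all assumptions chosen only to make the example concrete.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DerivedLink:
    source: str
    target: str
    label: str          # relation label, e.g. the EuroWordNet-style "eq_near_synonym"
    degree: float       # degree of the relation, here a value in [0, 1]
    reliability: float  # trust in the process that produced the link


def gloss_overlap(gloss_a: str, gloss_b: str) -> float:
    """Jaccard overlap of lower-cased word tokens (a stand-in scoring process)."""
    a = set(re.findall(r"\w+", gloss_a.lower()))
    b = set(re.findall(r"\w+", gloss_b.lower()))
    return len(a & b) / len(a | b) if a | b else 0.0


def derive_link(src: str, src_gloss: str, tgt: str, tgt_gloss: str,
                threshold: float = 0.15, reliability: float = 0.5) -> Optional[DerivedLink]:
    """Return an annotated link when the glosses overlap enough, else None."""
    degree = gloss_overlap(src_gloss, tgt_gloss)
    if degree < threshold:
        return None
    return DerivedLink(src, tgt, "eq_near_synonym", degree, reliability)


EDR_GLOSS = "a financial institution, called bank"
WN_FINANCE = ("a financial institution that accepts deposits and channels "
              "the money into lending activities")
WN_COIN = "a container for keeping money at home"

link = derive_link("EDR:3bc999", EDR_GLOSS, "WordNet:bank#1", WN_FINANCE)
no_link = derive_link("EDR:3bc999", EDR_GLOSS, "WordNet:coin bank", WN_COIN)
print(link, no_link)
```

Keeping the degree and the reliability as separate fields follows the slide: one describes how strong the relation is, the other how far the deriving process can be trusted.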

Related Work: LMF
• ISO 24613 (ISO TC37/SC 4)
• LMF Models
  • Core model
  • Extensions: MRD, NLP systems
• Comparison
  • Objective
    • LMF: integrating lexical resources for NLP applications
    • Our model: representing and combining "on-the-net" lexical resources for human uses
  • Basic model structure
    • LMF core: two-layered model (form, sense)
    • Our model: three-layered model (lemma, sense (MRDs), concept (CCLs))
  • ... the degree of detail

Conclusions
• We proposed an abstract dictionary data model for uniformly modeling MRDs and CCLs in the context of the Language Grid.
• The model is primarily based on the Princeton WordNet model with some extensions to incorporate MRDs and derived inter-dictionary relations.
  • nodes (lemma/sense/concept)
  • links: intra-dictionary, inter-dictionary (derived)
• A modeling example with MRDs, Princeton WordNet, and EDR dictionaries was shown.

Current Status and Future Work
• hand-created wrappers for a number of dictionaries/lexicons
  • providing a uniform API that is based on the model
• detail the model by considering more dictionaries/lexicons
  • while, at least, avoiding conflicts with the ongoing standardization efforts, such as LMF
• develop and deploy a wrapper generation tool that might be based upon:
  • PBD: Programming By Demonstration
  • Schema-guided wrapper generation
• develop/incorporate a variety of linguistic processes for sense/concept-based entry linking
  • mechanism for perpetuation in the Language Grid
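
A minimal Python sketch of what a uniform, model-based API over hand-created wrappers could look like. The class names, the `lookup` signature, and the record layout are hypothetical; the slides do not specify the actual Language Grid interface, and the in-memory dictionaries stand in for real resources.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DictionaryWrapper(ABC):
    """Hypothetical uniform interface that every wrapped dictionary exposes."""

    @abstractmethod
    def lookup(self, form: str) -> List[Dict[str, str]]:
        """Return one record per sense found for a written form."""


class InMemoryWrapper(DictionaryWrapper):
    """Hand-written wrapper over a plain dict, standing in for a real resource."""

    def __init__(self, name: str, entries: Dict[str, List[str]]):
        self.name = name
        self._entries = entries

    def lookup(self, form: str) -> List[Dict[str, str]]:
        return [{"source": self.name, "gloss": gloss}
                for gloss in self._entries.get(form, [])]


def combined_lookup(wrappers: List[DictionaryWrapper], form: str) -> List[Dict[str, str]]:
    """Query every wrapper through the same call and concatenate the results."""
    results: List[Dict[str, str]] = []
    for wrapper in wrappers:
        results.extend(wrapper.lookup(form))
    return results


wn = InMemoryWrapper("WordNet", {"bank": [
    "a financial institution that accepts deposits and channels the money into lending activities",
    "a supply or stock held in reserve for future use",
]})
edr = InMemoryWrapper("EDR", {"bank": ["a financial institution, called bank"]})

hits = combined_lookup([wn, edr], "bank")
print(len(hits))
```

Because callers see only `lookup`, a generated wrapper could later replace a hand-created one without changing the combining code.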

Thank you for your attention!
• We welcome your participation in the project!
• References
  • http://langrid.nict.go.jp/
  • Hayashi, Y. and Ishida, T. (2006). A Dictionary Model for Unifying Machine Readable Dictionaries and Computational Concept Lexicons. LREC 2006, pp.1-6.


Wrapper generation
[Figure: wrapper generation by Programming By Demonstration]
• Dictionary model as Schema
  • lemma
    • written form
    • *part of speech
    • +pointer to sense nodes
  • sense
    • definition
    • example
    • ...
• Wrapper generator, drawing on:
  • dictionary data
  • example entries from the target dictionary
  • already learned wrappers
• Annotation by using GUI: "This is the definition part, and ..."
• Result: wrapper program (for the target dictionary)