Modeling Dictionaries for Composite Access Services: A Language Grid Perspective

Yoshihiko Hayashi
Osaka University and NICT

Japanese-Germany Workshop on NLP 2006
May 31-June 1, 2006, Tokyo

[Logo: "Language Grid" in Tompa characters]

Copyright 2006, Yoshihiko Hayashi

Outline of the Talk
- The Language Grid Project
  - Combined Dictionary Access
- An Abstract Dictionary Data Model (Hayashi and Ishida, LREC 2006)
  - Dictionary and Lexicon Instances
  - The Model
    - overview
    - ingredients
    - a modeling example
    - derived relations
  - Related work: LMF
- Conclusions and Future Work

Language Grid
- A project carried out at NICT
  - Project leader: Prof. Toru Ishida (Kyoto University)
  - One-line summary: "Connecting world's language services to support intercultural collaboration"
  - URL: http://langrid.nict.go.jp/
- The goal is:
  - to provide a language infrastructure on the Internet that will be useful in overcoming the language barriers encountered in various intercultural communication situations

Language Grid: Objectives
- What we'd like to provide on the infrastructure is:
  - a way to combine existing language resources/services, tailored to the users' needs (horizontal grid)
  - the capability to help create and deploy new language resources/services, particularly those emerging from intercultural activities (vertical grid)
- This talk will concentrate on:
  - the first one, especially combined dictionary access services
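
Virtually Combined Dictionary Access

[Figure: entries for 銀行 / "die Bank" combined from a Japanese monolingual dictionary and Princeton WordNet]

- Financial sense
  - Japanese Monolingual: 預金の受入 ・・・ を主たる業務とする金融機関. "お金を預けに銀行へ行った", ...
    (a financial institution whose main business is accepting deposits ・・・; "I went to the bank to deposit money")
  - Princeton WordNet: {depository financial institution#1, bank#1, banking concern#1, ...}: "a financial institution that accepts deposits and channels the money into lending activities"
- Supply/stock sense
  - Japanese Monolingual: 提供されたものを蓄積・保管し,求めに応じて提供する組織. "人材銀行", ...
    (an organization that accumulates and stores what it is given and supplies it on request; "human resources bank")
  - Princeton WordNet: {bank#3}: "a supply or stock held in reserve for future use"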

Language Grid: Key technologies

1. LR/LS ontology for meta-descriptions
2. Technical elements from Semantic Web services

[Figure: language resources x and y (LR) and language service z (LS), each with its own wrapper and metadata, become service elements of the Language Grid and are combined into a composite service. Labels: wrapper generation; service composition & deployment.]

The Model Overview
- The model is a three-layered model:
  - word form, word sense and lexical concept
- for uniformly modeling
  - MRDs: Machine Readable Dictionaries
    - monolingual, bilingual
  - CCLs: Computational Concept Lexicons

[Figure: the three layers: form (lemma), sense, concept]

Japanese Dictionary (Sanseido Daijirin)
[Screenshot; callout: word senses (financial bank, supply/stock)]

Japanese-to-English Dictionary (Sanseido Exceed)
[Screenshot; callouts: translation "a bank", phrases, compounds]

English-to-Japanese Dictionary (Sanseido Exceed)
[Screenshot; callouts: noun, verb]

English Dictionary (LDOCE ONLINE)
[Screenshot; callout: POS: noun]

English Concept Lexicon (WordNet Online)
[Screenshot]

Japanese/English Concept Lexicon (EDR Dictionaries)
- Japanese word dictionary
  - 銀行[ギンコウ] / 3bc999
  - バンク[バンク] / 3bc999
- English word dictionary
  - Bk. / 3bc999
  - bnk. / 3bc999
  - bank / 3bc999
- Concept identifier 3bc999 in the Head-concept dictionary
  - E-headword: bank
  - J-headword: 銀行[ギンコウ]
  - E-explication: a financial institution, called bank
  - J-explication: 銀行という金融機関 (a financial institution called a bank)
  - super: 0ed4e3
  - sub: 0e7b82 / 0e828f / 0eaa61 / ...

[Figure callouts: the word entries that share a concept identifier form a bilingual synset]
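
The bilingual synset in the EDR slide can be recovered mechanically by grouping word entries on their concept identifier. The following is only a minimal sketch: the tuple layout and the function name `bilingual_synsets` are assumptions made for illustration, not the EDR file format or an EDR API. The entries themselves are the ones shown on the slide.

```python
from collections import defaultdict

# Word-dictionary entries copied from the slide: (language, headword, concept id).
# The tuple layout is an assumption for illustration, not the EDR file format.
WORD_ENTRIES = [
    ("ja", "銀行", "3bc999"),
    ("ja", "バンク", "3bc999"),
    ("en", "Bk.", "3bc999"),
    ("en", "bnk.", "3bc999"),
    ("en", "bank", "3bc999"),
]


def bilingual_synsets(entries):
    """Group headwords by concept identifier, keeping the two languages apart."""
    synsets = defaultdict(lambda: {"en": [], "ja": []})
    for lang, headword, concept_id in entries:
        synsets[concept_id][lang].append(headword)
    return dict(synsets)


synsets = bilingual_synsets(WORD_ENTRIES)
print(synsets["3bc999"])
```

Every headword that points to 3bc999 ends up in the same group, which is what the slide labels a bilingual synset.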

Requirements for the Dictionary Model
- should be able to incorporate as many dictionaries/lexicons as possible
- should be simple yet effective
  - to the extent sufficient for human use in the Language Grid environment
- should be based on word senses (lexical concepts)
- should be able to represent secondary (derived) relations
  - discovered on-demand/on-the-fly
  - can be inter-dictionary/cross-lingual

Schematic Overview of the Model

[Figure: lemma, sense, and concept nodes connected by links; dotted arrows: derived links]

Ingredients of the Model
- Nodes: three classes
  - lemma node: defined by a 3-tuple (written as form/POS/reading in the modeling example, e.g. 銀行/n/ギンコウ)
  - sense node: one variant each for MRD (bilingual), MRD (monolingual), and CCL
  - concept node (for CCLs)
- Links: classified by the source/target node types
  - lemma-to-sense (1:n): MRDs and CCLs
  - sense-to-concept (1:1): CCLs
  - concept-to-concept (1:n): CCLs
  - derived relations: discussed later
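
The node classes and link cardinalities listed above can be written down as plain data classes. This is a rough sketch rather than the model's actual definition: the class and field names are invented here, and the three lemma fields (form, part of speech, reading) are an assumed reading of the 3-tuple based on the notation of the modeling example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """Concept node (CCLs only); concept-to-concept links are 1:n."""
    concept_id: str
    related: List["Concept"] = field(default_factory=list)


@dataclass
class Sense:
    """Sense node; the sense-to-concept link (1:1) exists only in CCLs."""
    gloss: str = ""
    concept: Optional[Concept] = None


@dataclass
class Lemma:
    """Lemma node; the three leading fields are an assumed reading of the 3-tuple."""
    form: str
    pos: str
    reading: str = "-"
    senses: List[Sense] = field(default_factory=list)  # lemma-to-sense, 1:n


# A CCL-style entry: lemma -> sense -> concept.
financial = Concept("3bc999")
bank = Lemma("bank", "n", senses=[Sense("a financial institution, called bank", financial)])

# An MRD-style entry: the sense carries a gloss but no concept node.
ginkou = Lemma("銀行", "n", "ギンコウ",
               senses=[Sense("預金の受入,資金の貸付などを主たる業務とする金融機関")])
```

Keeping `concept` optional on the sense node is one way to let MRDs and CCLs share the same classes; separate sense classes per dictionary type would be an equally valid choice.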
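
A Modeling Example

Notation: lemma nodes are written form/POS/reading; [e;j]: synset, {e;j}: gloss

[Figure: nodes from four sources; the links between them are not reproduced here]

- MRD: English-to-Japanese
  - lemma: bank/n/-
  - senses: [;銀行], [;貯金箱], [;貯蔵所], [;コンピュータ・バンク]
    (translations: bank, money box, storehouse, computer bank)
- MRD: Japanese
  - lemma: 銀行/n/ギンコウ
  - senses:
    - {;預金の受入,資金の貸付などを主たる業務とする金融機関}
      (a financial institution whose main business is accepting deposits, lending funds, and the like)
    - {;提供されたものを蓄積・保管し,求めに応じて供給する組織}
      (an organization that accumulates and stores what it is given and supplies it on request)
- EDR Dictionary
  - lemmas: Bk./n/-, bnk./n/-, bank/n/-, 銀行/n/ギンコウ, バンク/n/バンク
  - concept: [Bk., bnk., bank; 銀行, バンク] {a financial institution, called bank; 銀行という金融機関}
- Princeton WordNet
  - lemmas: bank/n/-, coin bank/n/-
  - concepts:
    - [depository financial institution, bank, banking concern, banking company;] {a financial institution that accepts deposits and channels the money into lending activities;}
    - [savings bank, coin bank, money box, bank;] {a container for keeping money at home;}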
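
The compact notation of the modeling example (form/POS/reading for lemmas, [e;j] for synsets, {e;j} for glosses) is regular enough to parse. The sketch below is only an illustration of that notation: the function names `parse_lemma` and `parse_pair` and the returned structures are assumptions, not part of the model.

```python
from typing import List, NamedTuple, Optional


class Lemma(NamedTuple):
    form: str
    pos: str
    reading: Optional[str]  # None when the slide writes "-"


def parse_lemma(text: str) -> Lemma:
    """Parse 'form/POS/reading', e.g. '銀行/n/ギンコウ' or 'bank/n/-'."""
    form, pos, reading = (part.strip() for part in text.rsplit("/", 2))
    return Lemma(form, pos, None if reading == "-" else reading)


def _members(part: str) -> List[str]:
    """Split a comma-separated synset half into its non-empty members."""
    return [member.strip() for member in part.split(",") if member.strip()]


def parse_pair(text: str) -> dict:
    """Parse '[e;j]' (synset) or '{e;j}' (gloss) into English and Japanese parts."""
    text = text.strip()
    closers = {"[": "]", "{": "}"}
    if len(text) < 2 or closers.get(text[0]) != text[-1] or ";" not in text:
        raise ValueError(f"not a synset or gloss: {text!r}")
    english, japanese = text[1:-1].split(";", 1)
    if text[0] == "[":
        return {"kind": "synset", "en": _members(english), "ja": _members(japanese)}
    return {"kind": "gloss", "en": english.strip(), "ja": japanese.strip()}


print(parse_lemma("銀行/n/ギンコウ"))
print(parse_pair("[Bk., bnk., bank; 銀行, バンク]"))
```

An empty half, as in [;貯金箱], simply yields an empty list or string for that language.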

Derived Relations
- Derived relations should be properly annotated:
  - relation label (as in original relations)
  - scores from the computational process
    - degree of the relation
    - reliability
- An inventory of relation labels should be defined:
  - semantic equivalence relations
    - in EuroWordNet: eq_synonym, eq_near_synonym, ...
    - problem: lexical gaps
  - MWE: Multi-Word Expressions
  - other (lemma-to-lemma) relations: form-variant, see-also
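
An annotated derived relation, with a label and the two scores named above, might look like the sketch below. The talk does not say how the scores are computed. The token-overlap measure, the thresholds 0.1 and 0.8, the default reliability, and the node identifiers are all assumptions chosen only to make the example concrete. The labels are the EuroWordNet ones from the slide.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DerivedLink:
    source: str         # identifier of the source node (hypothetical format)
    target: str         # identifier of the target node (hypothetical format)
    label: str          # relation label, e.g. "eq_synonym"
    degree: float       # degree of the relation, 0.0 to 1.0
    reliability: float  # reliability of the deriving process, 0.0 to 1.0


def gloss_overlap(gloss_a: str, gloss_b: str) -> float:
    """Jaccard overlap of lower-cased gloss tokens (an illustrative stand-in)."""
    tokens_a = set(re.findall(r"\w+", gloss_a.lower()))
    tokens_b = set(re.findall(r"\w+", gloss_b.lower()))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def derive_link(source, target, source_gloss, target_gloss,
                reliability=0.5) -> Optional[DerivedLink]:
    """Return an annotated link, or None when the glosses barely overlap."""
    degree = gloss_overlap(source_gloss, target_gloss)
    if degree < 0.1:  # assumed cut-off
        return None
    label = "eq_synonym" if degree >= 0.8 else "eq_near_synonym"  # assumed cut-off
    return DerivedLink(source, target, label, degree, reliability)


EDR_GLOSS = "a financial institution, called bank"
WN_BANK = ("a financial institution that accepts deposits and channels "
           "the money into lending activities")
WN_COIN_BANK = "a container for keeping money at home"

link = derive_link("EDR:3bc999", "WordNet:bank#1", EDR_GLOSS, WN_BANK)
no_link = derive_link("EDR:3bc999", "WordNet:coin_bank", EDR_GLOSS, WN_COIN_BANK)
print(link, no_link)
```

Keeping the degree and the reliability as separate fields lets a client filter on either one, which matches the slide's distinction between the two scores.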

Related Work: LMF
- ISO 24613 (ISO TC37/SC 4)
- LMF Models
  - Core model
  - Extensions: MRD, NLP systems
- Comparison
  - Objective
    - LMF: integrating lexical resources for NLP applications
    - Our model: representing and combining "on-the-net" lexical resources for human uses
  - Basic model structure
    - LMF core: two-layered model (form, sense)
    - Our model: three-layered model (lemma, sense (MRDs), concept (CCLs))
  - ... the level of detail

Conclusions
- We proposed an abstract dictionary data model for uniformly modeling MRDs and CCLs in the context of the Language Grid.
- The model is primarily based on the Princeton WordNet model, with some extensions to incorporate MRDs and derived inter-dictionary relations.
  - nodes (lemma/sense/concept)
  - links: intra-dictionary, inter-dictionary (derived)
- A modeling example with MRDs, Princeton WordNet, and EDR dictionaries was shown.

Current Status and Future Work
- hand-created wrappers for a number of dictionaries/lexicons
  - providing a uniform API that is based on the model
- detail the model by considering more dictionaries/lexicons
  - while, at least, avoiding conflicts with ongoing standardization efforts such as LMF
- develop and deploy a wrapper generation tool, which might be based upon:
  - PBD: Programming By Demonstration
  - Schema-guided wrapper generation
- develop/incorporate a variety of linguistic processes for sense/concept-based entry linking
  - mechanism for perpetuation in the Language Grid
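
A uniform API over hand-created wrappers could take roughly the following shape. This is a sketch under stated assumptions, not the project's interface: `DictionaryWrapper`, `InMemoryWrapper`, `lookup`, and `combined_lookup` are hypothetical names, and the toy wrappers hold a few entries from the modeling example in memory instead of calling real dictionary services.

```python
from typing import Dict, List, Protocol


class DictionaryWrapper(Protocol):
    """Hypothetical uniform interface that every wrapper would expose."""
    name: str

    def lookup(self, form: str) -> List[str]:
        """Return the glosses or translations of the senses listed under form."""
        ...


class InMemoryWrapper:
    """Toy wrapper over a dict; a real wrapper would query a dictionary service."""

    def __init__(self, name: str, entries: Dict[str, List[str]]):
        self.name = name
        self._entries = entries

    def lookup(self, form: str) -> List[str]:
        return list(self._entries.get(form, []))


def combined_lookup(form: str, wrappers: List[DictionaryWrapper]) -> Dict[str, List[str]]:
    """Query every wrapper and keep the non-empty results, keyed by dictionary name."""
    results = {}
    for wrapper in wrappers:
        senses = wrapper.lookup(form)
        if senses:
            results[wrapper.name] = senses
    return results


wrappers = [
    InMemoryWrapper("MRD: English-to-Japanese", {"bank": ["銀行", "貯金箱", "貯蔵所"]}),
    InMemoryWrapper("MRD: Japanese", {"銀行": ["預金の受入,資金の貸付などを主たる業務とする金融機関"]}),
    InMemoryWrapper("Princeton WordNet", {"bank": ["a container for keeping money at home"]}),
]
result = combined_lookup("bank", wrappers)
print(result)
```

Because the composite function depends only on the shared `lookup` method, adding another dictionary means writing one more wrapper and nothing else.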

Thank you for your attention!
- We welcome your participation in the project!
- References
  - http://langrid.nict.go.jp/
  - Hayashi, Y. and Ishida, T. (2006). A Dictionary Model for Unifying Machine Readable Dictionaries and Computational Concept Lexicons. LREC 2006, pp.1-6.

Wrapper generation

[Figure: a wrapper generator takes the dictionary model as a schema, example entries from the target dictionary (dictionary data), annotations given through a GUI ("This is the definition part, and ..."), and already learned wrappers, and produces a wrapper program for the target dictionary by Programming By Demonstration.]

Dictionary model as schema
- lemma
  - written form
  - *part of speech
  - +pointer to sense nodes
- sense
  - definition
  - example
  - ...
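
The idea of turning annotated example entries into a wrapper program can be illustrated with a very small generator. This sketch does not implement Programming By Demonstration. It assumes the GUI annotation has already produced a mapping from raw field names to schema slots. The raw field names (`hw`, `pos`, `def`, `ex`) and the function `make_wrapper` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical outcome of the GUI annotation step:
# raw field name in the target dictionary -> (node type, schema slot).
Annotation = Dict[str, Tuple[str, str]]


def make_wrapper(annotation: Annotation) -> Callable[[dict], dict]:
    """Build a wrapper function that maps a raw entry onto the model's schema."""
    def wrap(raw_entry: dict) -> dict:
        nodes: Dict[str, dict] = {"lemma": {}, "sense": {}}
        for raw_field, (node, slot) in annotation.items():
            if raw_field in raw_entry:  # slots without data are simply left out
                nodes[node][slot] = raw_entry[raw_field]
        return nodes
    return wrap


annotation = {
    "hw": ("lemma", "written form"),
    "pos": ("lemma", "part of speech"),
    "def": ("sense", "definition"),
    "ex": ("sense", "example"),
}
wrap = make_wrapper(annotation)
entry = wrap({"hw": "bank", "pos": "n", "def": "a container for keeping money at home"})
print(entry)
```

The generated function plays the role of the "wrapper program (for the target dictionary)" in the figure; a learned wrapper would replace the hand-written mapping with one inferred from the annotated examples.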