
AN OVERVIEW OF THE INFRASTRUCTURE FOR STORING LARGE SCALE KNOWLEDGE RESOURCES

Takashi Kobayashi†, Haruo Yokota†‡

† Global Scientific Information and Computing Center, Tokyo Institute of Technology
‡ Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
Ookayama 2–12–1, Meguro-ku, Tokyo, 152–8552 Japan.
[email protected], [email protected]

ABSTRACT

The volume of multimedia information is growing explosively, and many knowledge bases have been built in various areas. It will become necessary to construct and use large-scale knowledge resources in every domain of research. Although various individual knowledge bases already exist, they are not easy to manage, extend, and utilize, since they were developed with inconsistent concepts and are dispersed over individuals, research organizations, and research domains. In this paper, we discuss how to store large-scale knowledge resources when building and exploiting them in interdisciplinary research such as the Tokyo Institute of Technology 21st Century COE Program titled “Framework for Systematization and Application of Large-scale Knowledge Resources” [1]. We are developing an advanced information storage system, “KnowledgeStore” [2, 3], as the infrastructure for storing large-scale knowledge resources. This paper presents the functions and configurations of KnowledgeStore together with some examples of external application systems.

1. INTRODUCTION

There are a great number of knowledge resources around us, taking various forms such as historical documents, classical literature, presentation materials, spontaneous speech, and recorded video streams. The Tokyo Institute of Technology 21st Century COE Program “Framework for Systematization and Application of Large-scale Knowledge Resources” (Titech 21COE LKR Program for short) aims to conduct a wide range of interdisciplinary research combining humanities and technology in order to develop a framework for the systematization and application of large-scale knowledge resources [4].
In this project, it is therefore important to use these knowledge resources efficiently in the current Internet environment. For this purpose, we have to store these various knowledge resources in information storage in electronic form, and provide useful search functions to retrieve them. When considering information storage for knowledge resources, one of the challenges lies in the difficulty of managing and sharing them. For example, although various dictionaries [5, 6] are used within research organizations and projects as basic tools for natural language processing and spoken language processing, it can be difficult to establish equivalence relationships between them [7].

Furthermore, flexibility and extensibility are key issues in providing such services. Since the stored knowledge resources are expected to be used in a wide variety of ways that will change over time, the configurations and functions of the system should be flexible. The system should also be extensible, because the number of knowledge resources and the number of access requests for them are expected to increase rapidly.

To resolve these problems, we have been developing a large-scale knowledge resources storing system [2, 3] that handles large-scale knowledge resources in various forms. An essential point of our approach is to prepare an advanced information storage system named KnowledgeStore, which can store any knowledge resources as contents and provides useful common functions for managing them. Moreover, for the flexibility and extensibility of the system, we take the following approaches:

• As software functions, we prepare web-service APIs for external systems that execute applications specific to particular knowledge resources, as well as ordinary web interfaces for interactive users. We currently assume several external application systems, such as e-learning systems treating multimedia contents [8, 9] and advanced research-paper retrieval systems [10, 11]. KnowledgeStore provides common functions for these external application systems via the web-service APIs. This makes it possible to develop specific applications that use an appropriate combination of knowledge resources through the common functions.
• To store the variety of knowledge resources, we introduce a data model that allows knowledge resources to be defined flexibly, and we prepare functions for storing various data formats.
• As a hardware configuration, we adopt a Fiber Channel (FC) switch as a storage area network (SAN) to connect a number of servers with FC-RAIDs that have a storage virtualization mechanism and additional RAIDs for data backup. With this configuration, storage capacity and processing performance can easily be scaled up by changing the number of disks and servers.

[Figure 1 labels: users of KR #1, KR #2, and KR #3; a system using KR #2 and KR #3; a system using KR #3; a system creating KR #3 with a complicated feature for KR #3; Generic Interface; KR #1, KR #2, KR #3, ..., KR #n; Advanced Information Storage System.]

Fig. 1. The Overview of Large-scale Knowledge Resources Storing System

• To allow servers to be extended freely, we assign service functions to a number of servers: video stream servers, web servers, relational database and XML management servers, single sign-on servers, and contents creation servers. This allows the processing performance of each service to be adjusted individually.
• It is also essential to control user access in this type of system. However, if each of a number of servers has its own user authorization mechanism and requires a password whenever a user enters a service, this becomes a burden for users. To avoid this, we adopt a single sign-on mechanism for the services.

This paper reports the functions and configurations of KnowledgeStore for managing large-scale knowledge resources. First, the background and construction strategies are described in Section 2. Next, in Section 3, we give an overview of KnowledgeStore, the core of our large-scale knowledge resources storing system, and explain its software and hardware configurations. We then introduce some examples of external application systems and discuss their relationship with the functions of KnowledgeStore in Section 4. Finally, we summarize our work in Section 5.

2. CONSTRUCTION STRATEGIES

Many information components around us can be regarded as knowledge resources once they are well structured. It is therefore important to organize them systematically, or to unify their various types, in order to obtain synergistic effects. Information technology can help to structure these information components as knowledge resources if they are preserved in electronic form. Moreover, if sufficient functions for accessing them, including search methods, are prepared, the expansion of the Internet enables us to propagate the knowledge resources widely.
Therefore, it is important to store the information components in a system that provides powerful search functions via the Internet. Since the information components take various forms, the system has to handle these forms. Written texts and drawn figures are typical information components, and are frequently combined in documents. Many data formats are commonly used to represent documents: flat text, PostScript, PDF, PPT, and so on. Sometimes we have to scan printed materials, such as historical documents. In these cases, bitmap data or compressed formats such as JPEG and GIF are used. Recorded video streams and spontaneous speech are also common information components. There are many formats for handling such stream information, such as MPEG, RealMedia, WindowsMedia, and QuickTime. The treatment of each data format and medium is completely different. Moreover, other data formats and even other media may soon appear. To treat this wide and changing variety of information components in one system, flexibility in the system configuration is quite important.

Figure 1 illustrates the overview of the large-scale knowledge resources storing system. As the figure shows, we devise an approach that combines two types of systems:

• An advanced information storage system providing common functions for handling the variety of data formats
• External systems dedicated to special applications based on the knowledge resources

We first focus on the functionality of the information storage system, named KnowledgeStore, in Section 3. Some external systems that we currently assume are introduced in Section 4.

Another important requirement for the information storage system is extensibility. The number of information components to be stored in the system is expected to increase rapidly as the range of target applications expands. Moreover, the number of video stream contents tends to increase, because approaches using video, such as video-based e-learning, have recently become very active. Since a video stream requires an especially large amount of disk space, the storage capacity has to be extensible without difficulty. In addition, the number of access requests to the system should also increase once the service becomes known.
Therefore, it is important to make the system scalable in both storage capacity and processing performance. We describe our approach to achieving extensibility through the hardware configuration in Section 3.5.

3. KNOWLEDGE STORE

In this section, we give an overview of KnowledgeStore. First, we explain the key features prepared for handling various knowledge resources. Section 3.2 describes our data model of contents and how it allows users to define knowledge resources flexibly. The software configuration of KnowledgeStore is then described in Section 3.3. Finally, we show the user interfaces of KnowledgeStore.

3.1. Features of KnowledgeStore

As described in Section 2, KnowledgeStore is designed as an advanced information storage system for users and external systems, and it provides useful common functions for managing various knowledge resources. Figure 2 illustrates an overview of the functions of KnowledgeStore. Using these functions, KnowledgeStore has the following key features.

1. Flexible Content Definition
To store various knowledge resources, we introduce an advanced data model that allows very flexible definition of contents. In our data model, a user can define a content as

[Figure 2 labels: Users; An External System 1, ..., An External System n; Web Interface; Web Service API; Single Sign-On; KnowledgeStore functions: Contents Management, Contents Retrieval, Content Distribution; stored data: Contents definitions, Contents & Metadata, Authorization data.]

[Figure 3 labels: user defined: Content Definition (ContentName: Unicode String; MetadataList: List), Metadata Definition (Name: Unicode String), Extra Metadata, DataType (Base: Primitive Type; Name: Unicode String); system prepared: Primitive Type, Default Metadata; association multiplicities 1, 0..*, and 1..*.]

Fig. 2. The Overview of Functions of KnowledgeStore

a set of metadata, each of which represents an attribute of the content. Moreover, the user can flexibly define the type of each metadata. Our data model is explained in more detail in Section 3.2.

2. Contents Management
The content management feature manages all metadata of contents and the definitions of contents. It allows users to create, edit, and delete contents, and to edit the attributes of contents defined as metadata, such as the title of a content or the files it contains. Moreover, the owner information and access privileges of each content are managed by this feature. This information is used for access validation when a user views, retrieves, or edits the contents.

3. Various Data Formats
Knowledge resources come in a variety of data formats. To handle them, KnowledgeStore supports the following types of data:
• Document files (text, MS Office documents, etc.)
• Binary files (images, etc.)
• Streaming media (movies, voice, etc.)
• Tabular data (RDB)
• XML data

4. Various Contents Retrieval Functions
One retrieval function is metadata search, which looks up the metadata of contents and applies to all the data formats mentioned above. The other retrieval functions are SQL queries for RDB data, XPath queries for XML data, and full-text search. We use a full-text search engine with an advanced index to find given keywords in document files such as text files, MS Office documents, and PDF files. Moreover, text data in metadata, specific columns in RDB data, and the attributes and values of specific nodes in XML data can be searched by the engine. In all these functions, retrieval results are filtered by verifying access privileges.
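The combination of metadata search and privilege filtering described in features 2 and 4 can be illustrated with a minimal sketch. The class `Content`, the fields `owner` and `readers`, and the functions `can_read` and `metadata_search` are hypothetical names introduced here for illustration; they are not the KnowledgeStore API, and the real system may represent privileges quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    """Hypothetical stored content: a metadata set plus ownership data."""
    content_id: int
    content_type: str
    metadata: dict
    owner: str
    # Assumed representation: names of users or groups allowed to read.
    readers: set = field(default_factory=set)


def can_read(content, user, groups):
    """Access validation: the owner, a listed user, or a listed group may read."""
    return (
        content.owner == user
        or user in content.readers
        or bool(groups & content.readers)
    )


def metadata_search(contents, user, groups, **conditions):
    """Match metadata values first, then drop results the user may not see."""
    hits = [
        c for c in contents
        if all(c.metadata.get(name) == value for name, value in conditions.items())
    ]
    return [c for c in hits if can_read(c, user, groups)]
```

In this sketch the filter runs after matching, so a user without privileges never sees a hit even when the metadata matches; this mirrors the statement that all retrieval results are filtered by access privilege.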

Fig. 3. A Class Diagram of Our Data model

5. Content Distribution
As described above, KnowledgeStore supports many data formats, including streaming video and voice, XML, and so on. The content distribution feature distributes these multimedia data by an appropriate method. For example, a user can receive static files such as images, text, and PDF files via HTTP. Data in an RDB is distributed as a CSV file or as table data written in HTML, and video and voice data are distributed as an RTSP stream from a streaming server interface. In each case, the access privileges for the contents are checked.

6. Single Sign On Service
Since the features of KnowledgeStore are divided among several subsystems, there are several authentication procedures for using KnowledgeStore. The Single Sign On (SSO) service provides an integrated interface and performs the authentication procedure for each subsystem on the user's behalf. It works for access from external systems using the web service API as well as for the web interfaces. By using SSO, a user can access all authorized systems with just one authentication procedure, regardless of system boundaries.

7. Authorization Management
All authorization information, such as the list of users and groups and the access privileges for each content, is managed by this function. This information is used by the authentication procedures in the Single Sign On service and by several features, such as content retrieval and content distribution, as explained above.

3.2. Data Model of Contents

As mentioned in Section 3.1, to store various knowledge resources we introduce an advanced data model that allows very flexible definition of contents. In our data model, a user can define a content as a set of metadata, each of which represents an attribute of the content. Moreover, the user can flexibly define the type of each metadata. Figure 3 is a UML class diagram of our data model. A content definition consists of the name of the content and several metadata definitions, which represent its attributes and data.

[Figure 4 labels: columns Primitive Type, DataType, Content Def., and Content. Primitive types (system prepared): String, Integer, Document, Movie, Date. Data types and content definitions (defined by content administrators): Title, Author, PDF File, PPT file, Lecture Video, Created Date, Lecture Material; Paper (Manuscript : PDF File), Lecture (Material : Lect. Mat.), Lecture Material (Lecture note : PDF File). Contents (created by publishers): Haruo Yokota: “Information Storage …”, Proc. of Intl. Symp ….; Adv. Data Engineering, 2004/5/19, Haruo Yokota; Adv. Data Engineering, 2004/5/26, Haruo Yokota.]

Fig. 4. An Example of Relationships between Each Objects

A content definition corresponds to a ‘class’ of content in the object-oriented paradigm. Note that we do not discriminate between attributes and data in content definitions. Since we treat the data of a content as one of its attributes, we can define a complicated content that has several data items, such as a stream media together with its MPEG-7 XML data.

A metadata definition consists of its name and a DataType. For flexibility of content definition, we have two subtypes of metadata definition. One is system-prepared metadata (DefaultMetadata); the other is user-defined metadata (Extra Metadata). We implement DefaultMetadata for the basic attributes of a content, such as ‘Title’, ‘Creator’, and ‘Abstract’, with reference to Dublin Core [12]. Any other attributes and data can be defined as Extra Metadata, each with a DataType object, in a content definition.

In our data model, a user can define types of data as well as types of content. A DataType represents a type of data in our system. A user can define a DataType as a composition of its name and a PrimitiveType, which is a system-prepared data type. We prepare various PrimitiveTypes for multimedia data, such as Document, Stream Media, RDB data, and XML data, in addition to basic primitive types such as Boolean, Integer, and String. Furthermore, we prepare a data type named ‘Reference’ that represents a reference to another content definition. By using Reference, a user can define a content that aggregates other contents. This allows complicated content definitions.

Figure 4 illustrates an example of the relationships between these objects. In this example, the content definition ‘Paper’ has three metadata, whose DataTypes are Title, Author, and PDF File. Note that a DataType can be shared by several content definitions. Since another content definition, ‘Lecture Material’, also has a PDF File metadata, named Lecture Note, we can retrieve both types of contents with the single query “the contents which have a PDF File including the string ‘LKR’”.
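The data model above can be sketched with a few small classes. This is only an illustration under stated assumptions: the class and field names follow the labels of Figures 3 and 4, the base type of Author is assumed to be String, and the helper `definitions_using` is a hypothetical function, not part of KnowledgeStore.

```python
from dataclasses import dataclass
from enum import Enum


class PrimitiveType(Enum):
    """System-prepared types (a subset of those named in the text)."""
    STRING = "String"
    INTEGER = "Integer"
    DOCUMENT = "Document"
    STREAM_MEDIA = "Stream Media"
    REFERENCE = "Reference"


@dataclass(frozen=True)
class DataType:
    """User-defined type: a name composed with a primitive base type."""
    name: str
    base: PrimitiveType


@dataclass(frozen=True)
class MetadataDefinition:
    name: str
    data_type: DataType


@dataclass(frozen=True)
class ContentDefinition:
    content_name: str
    metadata_list: tuple  # tuple of MetadataDefinition


TITLE = DataType("Title", PrimitiveType.STRING)
AUTHOR = DataType("Author", PrimitiveType.STRING)  # base type is an assumption
PDF_FILE = DataType("PDF File", PrimitiveType.DOCUMENT)

PAPER = ContentDefinition("Paper", (
    MetadataDefinition("Title", TITLE),
    MetadataDefinition("Author", AUTHOR),
    MetadataDefinition("Manuscript", PDF_FILE),
))
LECTURE_MATERIAL = ContentDefinition("Lecture Material", (
    MetadataDefinition("Lecture note", PDF_FILE),
))


def definitions_using(data_type, definitions):
    """Names of content definitions that have metadata of the given DataType."""
    return [
        d.content_name for d in definitions
        if any(m.data_type == data_type for m in d.metadata_list)
    ]
```

Because PDF File is shared, `definitions_using(PDF_FILE, [PAPER, LECTURE_MATERIAL])` covers both definitions, which is what makes a search by DataType span several content types.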
The base type of the PDF File data type is “Document”, which is one of the primitive types. KnowledgeStore treats it as a static file and as a target of the full-text indexing engine described below. Moreover, the definition of Lecture has a reference to Lecture Material, which is another content definition. By defining Lecture as illustrated in Figure 4, the system can treat Lecture Material as a content in its own right, as well as Lecture.

3.3. Software Configuration

In this section, we explain the software configuration of KnowledgeStore. As described above, KnowledgeStore is not only an advanced information storage system for external systems, but also an advanced information storage web system for users. KnowledgeStore is designed with a five-layer architecture, as shown in Figure 5. The details of each layer are described below.

3.3.1. Middleware Layer

This layer is an abstraction of back-end systems such as the RDBMS and the OS. All data, including the metadata of contents, are stored in subsystems in this layer. This layer contains file storage for static files, a streaming server for stream media, an RDBMS, an XMLDB, and a full-text indexing engine. In the current version of KnowledgeStore, we use the OS file system as file storage, RealNetworks Inc. Helix Universal Server as the streaming server, Oracle 9i as the RDBMS and XMLDB, and GETA [13, 14] as the full-text indexing engine.

3.3.2. Data Access Layer

To eliminate vendor-specific implementation for each middleware, we insert a data access layer between the middleware layer and the application logic layer. We currently use the RDB, the XMLDB, and the full-text indexing engine via components in this layer.

3.3.3. Application Logic Layer

In this layer, we implement the core functions of the information storage system. The content definition management component provides features to manage content definitions and data type definitions, as described in Section 3.2. All contents are managed by the content management component. User information and access privileges are managed in the

authorization management component. The content management component is responsible for encapsulating the metadata sets defined in the content definition management component as contents, and for managing contents by content identifier. To encapsulate metadata as contents, the content management component composes the appropriate metadata into contents and stores each metadata in the appropriate middleware via the data access layer.

The content retrieval component provides three types of retrieval methods: metadata search, datatype search, and full-text search. Metadata search finds contents of a designated content type by several metadata values. Datatype search, on the other hand, finds contents that have metadata of a designated datatype by metadata values. Since we can define user-defined datatypes and share them among any content definitions, as shown in Section 3.2, we can retrieve contents across all types of contents. Moreover, we can use SQL queries for RDB data and XPath for XML data. Full-text search retrieves contents that have Document type metadata by using the full-text indexing engine. The range of retrieval can be restricted to a specific type of contents or datatype.

If common functions are needed across the processing of different contents or across external systems, they can easily be added by implementing them in this layer. To support the development of external systems that use several knowledge resources, we are implementing common advanced functions for processing several knowledge resources in this layer.

3.3.4. Presentation Layer

The presentation layer provides interfaces to the functions of the application logic layer for users and external systems. For users, we prepare web interfaces that give easy access to all features via a web browser. KnowledgeStore can also be administered via the administrators' web interfaces. Moreover, for external systems and expert users, we prepare a web service API that covers all features of KnowledgeStore.

3.3.5. Integration Layer

To integrate subsystems such as web application servers and streaming servers, we adopt a single sign-on (SSO) mechanism in this layer. Moreover, user authentication and access logs are managed in this layer.

[Figure 5 labels, by layer: Integration (Web SSO, Session Management, Streaming SSO); Presentation (Web UI for the Content Publishers Interface and the Administrator Interface via a web browser, Web Service API for the External System Interface, Streaming IF for the Streaming Server Interface, File Transmit IF for the File Transmit Interface); Application Logic (Content Retrieval, Content Definition Management, Content Management, Authorization Management); Data Access (Full-text Index Engine Interface, XML DB Interface, RDBMS Interface); Middleware (Full-text Indexing Engine, XML DB Server, RDBMS, Streaming Server, Static File Storage).]

Fig. 5. A Software Architecture of KnowledgeStore

3.4. User Interfaces

We now explain the web interfaces of KnowledgeStore with three screenshots. Figure 6 is a screenshot taken during content definition. The list of data types appears in the center of the browser. In the content definition procedure, the user selects data types from this list and defines each as a metadata with a name. If no appropriate data type is found, the user can create a new data type.

Fig. 6. Content Definition

The next screenshot shows the view of a content of type “Manual”. Figure 7 shows a description of the content definition of this content and several pairs of metadata names and values. Since a Manual content has a PDF File type metadata named “ManualDocument”, a hyperlink to the PDF file is displayed in the row following the identifier of this content. Figure 8 shows the interface for searching contents by DataType. As described in Section 3.2, a user can retrieve contents of multiple types that have metadata of a designated data type and value. In this interface, the list of datatypes is displayed. The user can enter conditions as an arbitrary number of pairs of data type and value.

3.5. Hardware Configuration

Extensibility is another essential requirement of the information storage system for large-scale knowledge resources, because the number of large information components and the number of access requests can readily increase. We must keep the system free of bottlenecks when we scale up its performance and storage space. To meet these requirements, we adopt a storage area network (SAN) configuration with a number of servers, in which the storage capacity and processing performance can be adjusted easily by

[Figure 9 labels: clients on the Internet; SSO servers; LAN (Gigabit Ether Switch); Application / Index Server; Streaming Server; RDB/XMLDB Server; Contents Creation Server (scanner, camera, etc.); SAN (2Gbps FC Switch) with 4*2Gbps and 3*2Gbps links; FC-RAID (8.76TB) holding video/sound contents, documents contents, RDB/XML data, metadata, and index; Backup ATA-RAID (3*3TB) with real-time replication.]

Fig. 7. Content View

Fig. 8. Search by DataType

Fig. 9. Hardware Configuration Overview

adding new storage or servers to the network, or by detaching them from it. Since access frequencies differ from service to service, we assign service functions to different servers so that the performance of each service can be adjusted by changing the number of servers. We prepare video and sound stream servers, web servers, relational database and XML management servers, single sign-on servers, and contents creation servers. Figure 9 illustrates the hardware configuration of the information storage system.

Recently, many vendors have offered storage virtualization products for SAN configurations, which cut the cost of managing storage space by sharing pooled logical storage volumes. Since storage virtualization is effective for adding or detaching storage devices dynamically, we connect FC-AL RAIDs that adopt storage virtualization, containing 160 FC-AL disks, to a 16-port Fiber Channel (FC) switch. The total physical capacity of the RAIDs is 8.76TB, and the logical capacity is roughly 7TB under the RAID5 configuration.

Reliability is also important for the storage system. Therefore, we connect three additional 3TB ATA-RAIDs to the FC switch for data backup, and prepare an uninterruptible power supply (UPS) to tolerate power failures. Nowadays, inexpensive RAIDs tend to be used as backup devices instead of magnetic-tape drives from a cost-performance point of view. We currently use two Sun Fire V440 machines for a streaming server and for an application and full-text indexing server, four IBM x345 machines for the SSO servers and the RDB and XMLDB servers, and a JCS VC83060 for the contents creation server.

4. EXTERNAL SYSTEMS

The combination of KnowledgeStore and external application systems is suitable for providing flexible services based on knowledge resources. Here we consider several examples of external application systems that can be connected to KnowledgeStore: UPRISE [8], Asunaro [9], PRESRI [11], publication listing support for external web sites, Research Mining [10], a historical information system, and Tele-Synopsis [15]. UPRISE and Asunaro are related to learning materials. The next three systems are related to research paper retrieval. Details of these external systems are described below. The historical information system is an archaeological library system such as [16], and is planned to be developed by the research group of Prof. Hiroyuki KAMEI in this COE program. Tele-Synopsis is a tool that breaks parallel texts into hybrid units more suitable for genealogical analysis. We also plan to store several corpora, such as CSJ [5] and EDR [6], and materials related to foreign language education, such as German and French studies.

• Publication listing support for external web sites
This system makes efficient use of the publication information stored in KnowledgeStore. We are developing a program that retrieves publications in KnowledgeStore by author name or other metadata through the web service API, and generates lists in several data formats, such as HTML and LaTeX bibliography formats. For example, users can easily create a list of the publications stored in KnowledgeStore on their own web site by using this program. Figure 10 illustrates an overview of publication listing support for external web sites.

• PRESRI [11, 17]

[Figure 10 labels: external web servers (http://…/pub.{php, jsp}) showing lists of publications grouped by person (Prof YYY, Dr. ZZZ), by year (2005, 2004), or by kind (Articles, Int. Conf.); Publication Listing Support: 1) Get Contents from KS, 2) HTML Generation; Web Service Stub; query parameters (Author=“X”, …) sent to the Web Service API and results returned; information of publications; download via link in lists; authors create and edit contents of publications through the Web UI; Publications stored in KnowledgeStore.]

Fig. 10. An overview of Publication Listing Supports
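The HTML generation step of the publication listing support can be sketched as follows. This is a minimal illustration, not the program under development: the record fields `author`, `title`, and `year` and the function `render_publication_list` are assumptions, and the records are taken as already retrieved, since the actual web service API calls are not specified in this paper.

```python
from collections import defaultdict
from html import escape


def render_publication_list(publications):
    """Group publication records by year (newest first) and emit simple HTML.

    Each record is assumed to be a dict with 'author', 'title', and 'year'.
    """
    by_year = defaultdict(list)
    for pub in publications:
        by_year[pub["year"]].append(pub)

    parts = ["<h1>List of Publications</h1>"]
    for year in sorted(by_year, reverse=True):
        parts.append(f"<h2>{year}</h2>")
        parts.append("<ul>")
        for pub in by_year[year]:
            # Escape text so that metadata values cannot inject markup.
            parts.append(
                f"<li>{escape(pub['author'])}: {escape(pub['title'])}</li>"
            )
        parts.append("</ul>")
    return "\n".join(parts)
```

Grouping by author or by publication kind, as in the other example lists of Figure 10, would only change the grouping key; a LaTeX bibliography generator could reuse the same records with a different renderer.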

[Figure 11 labels: PRESRI Backend (Web Crawler, Bibliography Extraction Engine, References Extraction Engine); PRESRI Frontend (User Search Interfaces, Paper Distribution); KnowledgeStore holding paper cache contents, paper contents, and the PRESRI DB; a replication daemon maintains a replica of the PRESRI DB. Notes: only privileged users can get all data of the paper cache contents (others get bibliography data only); an asynchronous replica is created for sharing the DB with other projects.]

Fig. 11. Data Flow between the PRESRI and KnowledgeStore

PRESRI is a multilingual citation index built on databases of research papers. It collects research papers written in English and Japanese from the World Wide Web, and extracts citation information by analyzing the papers to create the citation index. Its graphical user interface helps users to traverse related papers. KnowledgeStore can support PRESRI by storing the research papers efficiently and providing search functions for them. PRESRI can also use the relational database in KnowledgeStore to store the citation information. Figure 11 illustrates the data flow between PRESRI and KnowledgeStore. We are now developing a version of PRESRI as an external system. This system provides advanced search functions for the research papers in KnowledgeStore.

• Research Mining [10, 18]
Research Mining is another tool that helps researchers by discovering the macro-flow of research. There are already tools for analyzing bibliographic relationships and co-citation, but they cannot illustrate the macro-flow of research. Research Mining clusters research papers and analyzes trends of research or researchers by applying the Apriori algorithm for mining association rules to cited papers. Research Mining can share the research papers and citation information stored in KnowledgeStore for PRESRI.

• UPRISE [8, 19]
UPRISE (Unified Presentation Slide Retrieval by Impression Search Engine) is a storage and retrieval method for metadata-based, loosely coupled contents consisting of slides and video streams with time and sequence information. An experimental system of UPRISE has also been proposed to demonstrate and evaluate the functions of the method. It has advanced retrieval functions whose retrieval unit is a scene of the movie, so that it can display the specific portion of the movie that the user wants. We are now integrating UPRISE as an external system of KnowledgeStore. In this integration, the images of each slide of the presentation material, the video stream, and the XML file of synchronization information between slides and video are stored as contents in KnowledgeStore. To create metadata for the UPRISE retrieval methods, the external system gets the presentation materials and synchronization information via the web service API of KnowledgeStore. The external system provides services for retrieving loosely coupled contents consisting of slides and video streams by using metadata in a local database. When a user uses the contents, the external system distributes the video stream and slide images stored in KnowledgeStore to the user through the web service API and distribution interfaces of KnowledgeStore. Moreover, we are developing a UPRISE contents generation workflow that uses KnowledgeStore as an intermediate data repository, and we plan to extend it to use more information, such as the voice stream of the lecturer and a movie stream of writing on the blackboard during the lecture.

• Asunaro [9]
Asunaro is a Japanese reading system for a multilingual environment, developed by members of the Foreign Students Center at Tokyo Institute of Technology.
It can be used as an e-learning system for foreign students who want to study Japanese. KnowledgeStore can manage the multilingual manuscripts and the language-translation databases used in Asunaro. Related materials, such as voice and video streams for foreign language study, can also be stored in KnowledgeStore.

5. CONCLUSIONS
We proposed a configuration method that combines an advanced information storage system with external systems through web-service APIs to provide services for managing large-scale knowledge resources. The information storage system, named KnowledgeStore, provides common functions for handling a variety of data formats, while the external systems execute special applications based on the knowledge resources. This paper reported the software and hardware configurations and the functions of the information storage system and the external systems. The configuration provides the flexibility and extensibility required for managing large-scale knowledge resources.

We are currently implementing several external systems, and some of them are expected to start service soon. We are also considering enhancements to the functions of KnowledgeStore, such as version management.
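As an illustration of the Research Mining idea described in Section 4, the following is a minimal sketch of the Apriori algorithm applied to cited papers. It is not the authors' implementation. The encoding is an assumption made here for illustration: each citing paper is treated as one transaction whose items are the papers it cites. The function names, thresholds, and sample data are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count how often each individual paper is cited.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # Subsets of a frequent itemset are always frequent themselves.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical data: citing paper -> set of papers it cites.
citations = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"A", "C"},
    "P4": {"B", "C"},
    "P5": {"A", "B", "C"},
}

frequent = apriori(citations.values(), min_support=3)
for antecedent, consequent, confidence in association_rules(frequent, 0.75):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

In this sketch, a rule such as "papers citing A also cite B" suggests that A and B belong to the same line of research. How such rules are turned into clusters and research trends is described in [10, 18] and is not reproduced here.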

Acknowledgment
The authors would like to thank Dr. Taizan Suzuki of Duo Systems Co. Ltd. for contributing to this work. This work is partially supported by the Tokyo Institute of Technology 21COE Program “Framework for Systematization and Application of Large-Scale Knowledge Resources”, a Grant-in-Aid for Scientific Research of MEXT Japan (#16700023, #16016232), and CREST of JST (Japan Science and Technology Agency).

6. REFERENCES
[1] Tokyo Institute of Technology 21st Century COE Program, “Framework for systematization and application of large-scale knowledge resources,” http://www.coe21lkr.titech.ac.jp/.
[2] Haruo Yokota, “An information storage system for large-scale knowledge resources,” in Proc. of International Symposium on Large-scale Knowledge Resources LKR2004, Tokyo, Japan, Mar. 2004, pp. 87–90.
[3] Takashi Kobayashi, Taizan Suzuki, and Haruo Yokota, “A configuration of an information storage system for large-scale knowledge resources (in Japanese),” Technical Report of IEICE DE2004-09, IEICE, Jun. 2004, (Vol. 104, No. 102, pp. 49–54).
[4] Sadaoki Furui, “Overview of the 21st century COE program “Framework for systematization and application of large-scale knowledge resources”,” in Proc. of International Symposium on Large-scale Knowledge Resources LKR2004, Tokyo, Japan, Mar. 2004, pp. 1–8.
[5] National Institute for Japanese Language, “The corpus of spontaneous Japanese,” http://www2.kokken.go.jp/~csj/public/.
[6] National Institute of Information and Communications Technology, “EDR home page,” http://www2.nict.go.jp/kk/e416/EDR/.
[7] Sadaoki Furui, “Introduction of the 21th century COE program, “Framework for systematization and application of large-scale knowledge resources”,” in SCIS & ISIS 2004, Yokohama, Japan, Sep. 2004, number THP-5-2.
[8] Haruo Yokota, Takashi Kobayashi, Taichi Muraki, and Satoshi Naoi, “UPRISE: Unified presentation slide retrieval by impression search engine,” IEICE Transactions on Information and Systems, vol. E87-D, no. 2, pp. 397–406, Feb. 2004.
[9] Kikuko Nishina, Yumiko Yoshimura, Izumi Saita, Kikuo Maekawa, Yoko Takai, Nobuaki Minematsu, Seiichi Nakagawa, Shozo Makino, and Masatake Dantsuji, “Speech database construction for Japanese as second language learning,” in Proceedings of SNLP-Oriental COCOSDA 2002, 2002, pp. 187–192.
[10] Makoto Yoshida, Takashi Kobayashi, and Haruo Yokota, “Comparison of the research mining and the other methods for retrieving macro-information from an open research-paper DB (in Japanese),” IPSJ Transactions on Databases, vol. 45, no. SIG7(TOD22), pp. 24–32, 2004.
[11] Hidetsugu Nanba, Noriko Kando, and Manabu Okumura, “Classification of Research Papers using Citation Links and Citation Types: Towards Automatic Review Article Generation,” in Proc. of the 11th SIG Classification Research Workshop, Classification for User Support and Learning, 2000, pp. 117–134.
[12] Dublin Core Initiative, “Dublin Core,” http://dublincore.org/.
[13] Innovative Information Technology Incubation Project of Information-technology Promotion Agency of Japan, “Generic engine for transposable association (GETA),” http://geta.ex.nii.ac.jp/e/.
[14] Akihiko Takano, Yoshiki Niwa, Shingo Nishioka, Makoto Iwayama, Toru Hisamitsu, Osamu Imaichi, and Hirofumi Sakurai, “Information access based on associative calculation,” in SOFSEM 2000: Theory and Practice of Informatics: 27th Conference on Current Trends in Theory and Practice of Informatics (LNCS Vol. 1963), Milovy, Czech Republic, Nov./Dec. 2000, p. 187, Springer.
[15] Maki Miyake, Hiroyuki Akama, Migaku Sato, Masanori Nakagawa, and Nobuyasu Makoshi, “Tele-synopsis for biblical research: Development of NLP based synoptic software for text analysis as a mediator of educational technology and knowledge discovery,” in Intl. Conf. on Educational Technology in Cultural Context in conjunction with (ICALT2004), Sep. 2004, pp. 931–935.
[16] Takeshi Sannomiya, Mitsuhiko Okayasu, Masatoshi Yoshikawa, and Shunsuke Uemura, “Data model and implementation of an archaeological database system (in Japanese),” Journal of Computer Archaeology, vol. 6, no. 2, pp. 11–18, 2000.
[17] H. Nanba, T. Abekawa, M. Okumura, and S. Saito, “Bilingual PRESRI: Integration of multiple research paper databases,” in Proceedings of RIAO 2004, Avignon, France, Apr. 2004, pp. 195–211.
[18] Makoto Yoshida, Takashi Kobayashi, and Haruo Yokota, “Guide for deciding a clustering threshold to retrieve macro-information from a research-paper database (in Japanese),” DBSJ Letters, vol. 3, no. 2, pp. 73–76, Oct. 2004.
[19] Takashi Kobayashi, Taichi Muraki, Satoshi Naoi, and Haruo Yokota, “An implementation of experimental system for storing and retrieving unified presentation contents (in Japanese),” IEICE Transactions on Information and Systems, vol. J88-D-I, no. 3, Mar. 2005, (to appear).
