A Chinese Geological Time Scale Ontology for ...

A Chinese Geological Time Scale Ontology for Geodata Discovery
Zhiwei Hou1,2, Yunqiang Zhu1*, Xing Gao1, Kan Luo1,2, Dongxu Wang1,2, Kai Sun1,2
1 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing, China
2 University of Chinese Academy of Sciences, Beijing, China
*Corresponding author, e-mail: [email protected]

Abstract-Discovering and obtaining specific, relevant data efficiently and accurately is crucial in scientific research, yet ambiguous keywords and imperfect or inexact dataset descriptions make it hard to improve recall and precision in geodata retrieval. Aiming to provide a possible solution to these problems, this paper studies the design and construction of the Chinese Geological Time Scale Ontology (CGTO) with regard to the temporal features within datasets. CGTO, which is built on the basis of a previously constructed Time Ontology, is divided into 4 subsystems and 2 supplementary systems, including the Chinese and international versions of the geochronologic and chronostratigraphic systems together with GSSP (Global Stratotype Section and Point). Classes and instances in CGTO are related to each other by predicates defined in this study, such as "geologicAge" and "hasGSSP", or by temporal topological relations derived from the basic Time Ontology, such as "intAfter" and "intMeets", which represent the time order of interval time entities. At the end of the paper, some future work is discussed in order to make CGTO more applicable.

Keywords: Geologic Time Scale; Ontology; Geodata Discovery; International Chronostratigraphic Chart; Global Stratotype Section and Point (GSSP)

I. INTRODUCTION

High-quality, abundant geodata form an essential part of geoscience research. However, obtaining such data has long troubled scientists because of the heterogeneity of terms, keyword mismatch, and implicit or broken links between related geodata resources. An ontology is a formal, explicit specification of a shared conceptualization [1]. Ontologies have been increasingly discussed and applied to harmonize distributed geodata, overcome differences in geoscience terms and improve the functionalities of data services [2]. Specifically, ontologies have been used to formalize geological time, describing time successions and their correlation [3], and to improve the discovery, search and use of heterogeneous geoscience data [4, 5]. To name a few examples: based on GML (Geography Markup Language), GeoSciML provides a framework for application-neutral encoding of geoscience thematic data and their related data [6]; the GTS ontology [7], which includes concepts of temporal topology and hierarchical ordinal reference systems, links to the manifestations in the geologic record required to calibrate the timescale. Additionally, OneG-E has applied ontologies to build multilingual annotation and translation services for geodata content [8], and X. Ma et al. [9] developed a GTS ontology using a Resource Description Framework (RDF) model to represent and encode the ordinal hierarchical structure and collected annotations of the geological time scale.

Problems remain, however: the above ontologies are either not applicable to China or lack the corresponding objects needed to link up more resources, such as stratigraphic points or sections. Additionally, it is difficult in those studies to reason over specified terms or to measure time distance with temporal expressions. In view of these problems, this paper focuses on the design and construction of the Chinese Geological Time Scale Ontology (CGTO) for geodata discovery. It aims to link up observations, records, and other kinds of data from different disciplines of the geosciences, such as geology, paleontology, stratigraphy or global change, through the eras of the geologic time scale and their corresponding properties, such as the sequence of rock records and geologic events. Both the relationships (category relations, temporal and correlational relations, etc.) and the differences between the Chinese and international geologic timescales have been taken into account. In addition, terms in the ontology are labeled in both Chinese and English, so it can handle bilingual searches.

Supported by the National Natural Science Foundation of China (No. 41371381), the Science & Technology Basic Research Program of China (2013FY110900), and the National Data Sharing Platform of Earth System Science in China (2005DKA32300).

II. THE FRAMEWORK OF CGTO

A. Basic Time Ontology

The conventional geologic time scale is a reference system defined by a contiguous sequence of time intervals, each identified with a name [10]. Temporal relations and attributes, such as before, start time and geologic age, are therefore crucial to CGTO, because through them classes and instances can be linked directly or indirectly to each other. Thus, this study uses the Time Ontology [11] as the basic time ontology and extracts temporal topological relations, date entities, time description properties, etc., from it to complete the construction of CGTO. The Time Ontology is a 5-tuple model represented by (1), consisting of temporal concepts, relations, temporal metric, time description, and formalization.

$$ T_O = \langle C, R, M, D, F \rangle \qquad (1) $$

C represents shared temporal concepts (classes and instances) such as instant and interval, calendar, chronology, year and month, etc., and their conceptual hierarchy structure.

D stands for descriptions of temporal information, including date and duration descriptions. A time description is an object composed of several properties. In a date description, the properties may be million years (Ma), month and day; in a duration description, they could be million years, months or days. Each field of a time description is separate, so that it is easier to extract the value of individual fields and to reason about them [12].

R means the relations among classes and instances, including general relations (is-a and part-of relations, for example) and temporal topological relations, which were derived from Allen's Interval Algebra [13] and redefined by Hou et al. in [11]. Topological relations between time intervals are shown in Tab.I.

TABLE I. TOPOLOGICAL RELATIONS BETWEEN TIME INTERVALS [10]

| Relation | Inverse Relation | Super Relation | Expression |
| intBefore | intAfter | disjoint | intBefore(T1, T2) |
| intAfter | intBefore | disjoint | intAfter(T2, T1) |
| intContains | intDuring | contains | intContains(T1, T2) |
| intDuring | intContains | contains | intDuring(T2, T1) |
| intFinishs | intFinishedBy | contains | intFinishs(T1, T2) |
| intFinishedBy | intFinishs | contains | intFinishedBy(T2, T1) |
| intMeets | intMetBy | meet | intMeets(T1, T2) |
| intMetBy | intMeets | meet | intMetBy(T2, T1) |
| intOverlaps | intOverlapedBy | overlap | intOverlaps(T1, T2) |
| intOverlapedBy | intOverlaps | overlap | intOverlapedBy(T2, T1) |
| intStarts | intStartedBy | contains | intStarts(T1, T2) |
| intStartedBy | intStarts | contains | intStartedBy(T2, T1) |
| intEquals | | disjoint | intEquals(T1, T2) |

M represents the measurement of temporal information, including the time coordinate system, functions of time position, time distance, and conversion of time granularity, etc.

F means the formalization and representation of temporal concepts and relations using First-Order Predicate Logic or a Description Logics based ontology language, RDF, or OWL (Web Ontology Language).

In CGTO, temporal relations are used to represent relative ages of geologic time scale, while time descriptions are used to specify absolute ages.
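The way absolute ages determine a topological relation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in CGTO: the `Interval` class and the `relation` function are hypothetical names, the argument order follows Allen's convention (the first interval is the one that starts, finishes, or overlaps the second), and the boundary ages are ICS chart values quoted for illustration only. The relation names keep the spelling used in Tab.I.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named geologic interval with boundary ages in Ma (hypothetical helper)."""
    name: str
    start_ma: float  # older boundary, million years before present
    end_ma: float    # younger boundary, million years before present


def relation(a: Interval, b: Interval) -> str:
    """Return the topological relation of interval a to interval b."""
    # Ages in Ma grow toward the past, so negate them to get a forward timeline.
    a0, a1 = -a.start_ma, -a.end_ma
    b0, b1 = -b.start_ma, -b.end_ma
    if a0 == b0 and a1 == b1:
        return "intEquals"
    if a1 < b0:
        return "intBefore"
    if b1 < a0:
        return "intAfter"
    if a1 == b0:
        return "intMeets"
    if b1 == a0:
        return "intMetBy"
    if a0 == b0:
        return "intStarts" if a1 < b1 else "intStartedBy"
    if a1 == b1:
        return "intFinishs" if a0 > b0 else "intFinishedBy"
    if a0 < b0 and b1 < a1:
        return "intContains"
    if b0 < a0 and a1 < b1:
        return "intDuring"
    return "intOverlaps" if a0 < b0 else "intOverlapedBy"


# Illustrative values only; check them against the chart version in use.
archean = Interval("Archean Eon", 4000.0, 2500.0)
proterozoic = Interval("Proterozoic Eon", 2500.0, 541.0)
paleoproterozoic = Interval("Paleoproterozoic Era", 2500.0, 1600.0)

print(relation(archean, proterozoic))           # intMeets
print(relation(proterozoic, paleoproterozoic))  # intStartedBy
```

Under this convention the Proterozoic Eon is "intStartedBy" the Paleoproterozoic Era because both share the same older boundary; Tab.I lists "contains" as the super relation of that case.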

B. The Framework of CGTO

As illustrated in Fig.1, the current version of the Chinese Geologic Time Scale Ontology (CGTO) is divided into 6 parts: 4 subsystems, namely the CGS (Chinese Geochronologic System), the CCS (Chinese Chronostratigraphic System), the IGS (International Geochronologic System) and the CLS (Chinese Lithostratigraphic System), and 2 supplementary subsystems, namely the ICS (International Chronostratigraphic System) and GSSP (Global Stratotype Section and Point). Instances of the classes in the CGS, CCS, and IGS are treated as time entities whose basic attributes are startTime and endTime, or geologic age. Thus, instances can be related through the temporal topological relations defined in the basic Time Ontology, as discussed above.

Figure 1 The Framework of CGTO

TABLE II. MAIN PROPERTIES IN CGTO

| Property | Domain | Range |
| startTime/endTime | CGS/IGS | GeochronologicTimeDescription |
| hasMillionYear | GeochronologicTimeDescription | MaYearEntities |
| intBefore/intAfter | CGS/IGS | CGS/IGS |
| hasGSSP | IGS | GSSP |
| cor.IGU | CGS | IGS |
| cor.CLU | CGS | CLS |
| cor.CGU | IGS/CCS/CLS | CGS |
| errorValue | GeochronologicTimeDescription | float |
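The domain and range columns of Table II can be read as simple membership constraints. The sketch below transcribes them into a Python dictionary and checks a candidate statement against them; the dictionary layout and the `check_triple` function are hypothetical, introduced only for illustration. Note also that OWL domain and range axioms drive inference rather than validation, so treating them as a check is a deliberate simplification.

```python
# Domain and range of each property, transcribed from Table II.
# The (domain, range) tuple layout is an assumption made for this sketch.
TIME_DESCRIPTION = "GeochronologicTimeDescription"

PROPERTIES = {
    "startTime": ({"CGS", "IGS"}, {TIME_DESCRIPTION}),
    "endTime": ({"CGS", "IGS"}, {TIME_DESCRIPTION}),
    "hasMillionYear": ({TIME_DESCRIPTION}, {"MaYearEntities"}),
    "intBefore": ({"CGS", "IGS"}, {"CGS", "IGS"}),
    "intAfter": ({"CGS", "IGS"}, {"CGS", "IGS"}),
    "hasGSSP": ({"IGS"}, {"GSSP"}),
    "cor.IGU": ({"CGS"}, {"IGS"}),
    "cor.CLU": ({"CGS"}, {"CLS"}),
    "cor.CGU": ({"IGS", "CCS", "CLS"}, {"CGS"}),
    "errorValue": ({TIME_DESCRIPTION}, {"float"}),
}


def check_triple(subject_class: str, prop: str, object_class: str) -> bool:
    """Return True when the two classes fit the property's domain and range."""
    if prop not in PROPERTIES:
        raise KeyError(f"unknown property: {prop}")
    domain, value_range = PROPERTIES[prop]
    return subject_class in domain and object_class in value_range


print(check_triple("IGS", "hasGSSP", "GSSP"))  # True
print(check_triple("CGS", "hasGSSP", "GSSP"))  # False: hasGSSP starts from IGS
```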

As shown in Tab.2, the main properties (predicates) in CGTO, such as "cor.IGU" (corresponding International Geochronologic Unit), "cor.CLU" (corresponding Chinese Lithostratigraphic Unit), "cor.CCU" (corresponding Chinese Chronostratigraphic Unit) and their inverse predicate "cor.CGU" (corresponding Chinese Geochronologic Unit), build bridges between the 4 subsystems so that geochronologic eras and their corresponding data resources can be linked. Furthermore, predicates such as "hasGSSP", "hasStratigraphicPoint" and "biomarker" relate terms of the ICS to terms of the IGS and GSSP. As a result, retrieval of and reasoning over terms from different systems in CGTO become possible, so that geodata sharing platforms can provide better data services and geoscientists can discover more geodata, more easily and precisely, than before.
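How such bridging predicates widen a search can be illustrated with a small in-memory triple list. Everything in this sketch is assumed for illustration: the prefixed identifiers, the pairing of the Sinian and Ediacaran periods, the dataset names, and the use of "geologicAge" to annotate datasets are placeholders rather than content taken from CGTO, and the functions `with_inverses`, `corresponding_units` and `discover` are hypothetical.

```python
# Hypothetical triples (subject, predicate, object); identifiers are placeholders.
TRIPLES = [
    ("cgs:SinianPeriod", "cor.IGU", "igs:EdiacaranPeriod"),
    ("igs:EdiacaranPeriod", "hasGSSP", "gssp:Ediacaran"),
    ("dataset:A", "geologicAge", "cgs:SinianPeriod"),
    ("dataset:B", "geologicAge", "igs:EdiacaranPeriod"),
]

# cor.CGU is described in the text as the inverse of the other cor.* predicates.
INVERSE = {"cor.IGU": "cor.CGU", "cor.CLU": "cor.CGU", "cor.CCU": "cor.CGU"}


def with_inverses(triples):
    """Return the triples plus the inverse statement of every cor.* link."""
    graph = set(triples)
    for s, p, o in triples:
        if p in INVERSE:
            graph.add((o, INVERSE[p], s))
    return graph


def corresponding_units(unit, graph):
    """Return the unit together with every unit linked to it by a cor.* predicate."""
    linked = {o for s, p, o in graph if s == unit and p.startswith("cor.")}
    return {unit} | linked


def discover(unit, triples):
    """Find datasets annotated with the unit or with a corresponding unit."""
    graph = with_inverses(triples)
    units = corresponding_units(unit, graph)
    return sorted(s for s, p, o in graph if p == "geologicAge" and o in units)


# A search on either unit reaches datasets annotated with the other one.
print(discover("igs:EdiacaranPeriod", TRIPLES))  # ['dataset:A', 'dataset:B']
```

The point of the design is that a dataset annotated only with a Chinese unit is still found by a query phrased in international terms, and vice versa.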

Figure 2 Construction Procedure of CGTO

III. CONSTRUCTION OF CGTO

Every term in CGTO is given in both Chinese and English, and Protege has been used to create the OWL files. SPARQL (SPARQL Protocol and RDF Query Language) and SWRL (Semantic Web Rule Language) are adopted to query and reason over terms and data annotations. The construction procedure is shown in Fig.2. First, this study uses the Time Ontology as the fundamental time reference system, as mentioned above. Then, terms, attributes, values, and the relationships among geochronologic eras and their corresponding resources were extracted mainly from the Stratigraphic Chart of China (2014) [14], the English and Chinese versions of the ICS International Chronostratigraphic Chart (2015) [15], the Geologic Time Scale 2012 [16] and GSSP [17]. Next, the subsystems and supplementary systems of CGTO were built separately in Protege, because this makes the ontology more convenient to construct and maintain and allows other researchers who do not need the whole CGTO to reuse individual systems when developing geoscientific ontologies. Finally, the systems were combined, and links were built based on the general relations (namely part-of, kind-of, instance-of and attribute-of), the temporal relations and the properties mentioned above.

As depicted in Fig.3, the CGS has 2 subclasses: CGE (Chinese Geochronologic Entities, 地质年代实体 in Chinese) and GTD (Geochronologic Time Descriptions, 地质年代时间描述 in Chinese). Further, CGE has 5 subclasses: Eon (宙 in Chinese), Era (代 in Chinese), Period (纪 in Chinese), Epoch (世 in Chinese), and Age (期 in Chinese). These classes and their individuals (instances) are related directly through temporal relations such as intAfter and intContains. For example, "元古宙" (Proterozoic Eon) "intAfter" "太古宙" (Archean Eon) and "intContains" "古元古代" (Paleoproterozoic Era). Each CGE subclass has a corresponding subclass in GTD that describes properties such as dateType (the type of time coordinate system) and the value and error of the start time and end time of individuals in CGE, using a MaYearEntity individual (e.g. "0.0118Ma", which stands for 0.0118 million years), BP (Before Present, years before 1950 A.D., a kind of dateType in this study), etc., as defined in the basic Time Ontology. Individuals are also related with predicates such as "intFinishs" and "intStarts".

Note that not every individual needs an explicit "intAfter" relation: when the endTime value of individual T1 equals the startTime value of individual T2, "intMeets" (see (2)) is recommended to relate T1 to T2, and "intAfter" can then be inferred.
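The inference described here can be sketched as two small rule passes in Python. This is an assumption-laden illustration rather than the SWRL rules used in the study: `infer_meets` and `infer_after` are hypothetical names, "intAfter" is read loosely as "comes later than" (as in the Proterozoic/Archean example above), and the boundary ages are ICS chart values quoted for illustration only.

```python
from itertools import product

# Illustrative (startTime, endTime) values in Ma; example data only.
ENTITIES = {
    "Archean Eon": (4000.0, 2500.0),
    "Proterozoic Eon": (2500.0, 541.0),
    "Phanerozoic Eon": (541.0, 0.0),
}


def infer_meets(entities):
    """T1 intMeets T2 when the endTime of T1 equals the startTime of T2."""
    return {
        (t1, t2)
        for t1, t2 in product(entities, repeat=2)
        if t1 != t2 and entities[t1][1] == entities[t2][0]
    }


def infer_after(meets):
    """Derive intAfter(T2, T1) from intMeets(T1, T2), then close transitively."""
    after = {(t2, t1) for t1, t2 in meets}
    changed = True
    while changed:
        changed = False
        # product() snapshots the list, so adding to the set inside is safe.
        for (a, b), (c, d) in product(list(after), repeat=2):
            if b == c and (a, d) not in after:
                after.add((a, d))
                changed = True
    return after


meets = infer_meets(ENTITIES)
after = infer_after(meets)
print(sorted(meets))
print(("Phanerozoic Eon", "Archean Eon") in after)  # True
```

Only the two "intMeets" statements between neighbouring eons need to be asserted; the ordering of the Phanerozoic relative to the Archean follows from them.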

Figure 3 Classes of CGTO built with Protege

$$ (\forall T_1, T_2)\,\{\mathrm{intMeets}(T_1, T_2) \supset (\exists t_1, t_2)[\mathrm{timeFinish}(t_1, T_1) \wedge \mathrm{timeStart}(t_2, T_2) \wedge \mathrm{insEquals}(t_1, t_2)]\} \qquad (2) $$

The rest of CGTO follows the same construction method, and "hasGSSP", "cor.IGU" and other properties are used to combine geochronologic eras with their corresponding terms.

IV. CONCLUSIONS AND DISCUSSION

Temporal data cover past and present events and can indicate what may happen in the future. The more data are discovered about a research topic along its timeline, the deeper geoscientists can take their research. CGTO not only provides bilingual terms and more detailed information, making it easier and quicker to search the eras of the Chinese and international geologic time scales, but also links up, through temporal topological relations and properties, more concepts that have explicit or implicit relationships with the geologic time scale, so that geodata sharing platforms can perform better in discovering, reasoning over, and sorting geoscience datasets. Constructing CGTO is not easy work; it needs more time and investment, along with experiments, to become reasonable, extensible, and capable in a production environment. Further steps of this study include research on reasoning rules, an index system for measuring semantic relatedness, and applications that put CGTO into practice and collect feedback to improve it.

ACKNOWLEDGMENT

The authors would like to thank all research team members for their valuable comments and other help, and are grateful to the anonymous referees and associate editors for their insightful comments and thoughtful suggestions on the manuscript, which have led to improvements in this paper. The authors would also like to thank everyone who proofread the English writing of this article.

REFERENCES

[1] R. Studer, V. R. Benjamins, and D. Fensel, "Knowledge engineering: principles and methods," Data & Knowledge Engineering, vol. 25, no. 1, pp. 161-197, 1998.
[2] X. Ma, and P. Fox, "Recent progress on geologic time ontologies and considerations for future works," Earth Science Informatics, vol. 6, no. 1, pp. 31-46, 2013.
[3] M. Perrin, L. S. Mastella, O. Morel, and A. Lorenzatti, "Geological time formalization: an improved formal model for describing time successions and their correlation," Earth Science Informatics, vol. 4, no. 2, pp. 81-96, 2011.
[4] S. Dong, H. Yin, and G. Xu, "Heterogeneous Data Searching Based on Geologic Time Ontology," Journal of Geo-Information Science, vol. 12, no. 2, pp. 194-199, 2010. (In Chinese)
[5] R. G. Raskin, and M. J. Pan, "Knowledge representation in the semantic web for Earth and environmental terminology (SWEET)," Computers & Geosciences, vol. 31, no. 9, pp. 1119-1125, 2005.
[6] S. M. Richard, E. Boisvert, B. Brodaric, S. Cox, T. Duffy, J. Holmberg, B. Johnson, J. Laxton, T. Lindberg, and S. Richard, "GeoSciML - A GML Application for Geoscience Information Interchange," Philadelphia Annual Meeting, pp. 47-59, 2007.
[7] S. J. D. Cox, and S. M. Richard, "A geologic timescale ontology and service," Earth Science Informatics, vol. 8, no. 1, pp. 5-19, 2015.
[8] J. Hendler, and D. Allemang, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL: Morgan Kaufmann, 2011.
[9] X. Ma, E. J. M. Carranza, C. Wu, and F. D. van der Meer, "Ontology-aided annotation, visualization, and generalization of geological time-scale information from online geological map services," Computers & Geosciences, vol. 40, pp. 107-119, 2012.
[10] S. J. Cox, and S. M. Richard, "A formal model for the geologic time scale and global stratotype section and point, compatible with geospatial information transfer standards," Geosphere, vol. 1, no. 3, pp. 119-137, 2005.
[11] Z. Hou, Y. Zhu, X. Gao, P. Pan, K. Luo, and D. Wang, "Time-Ontology and its Application in Geodata Retrieval," Journal of Geo-Information Science, vol. 17, no. 4, pp. 379-390, 2015. (In Chinese)
[12] J. R. Hobbs, and F. Pan, "Time Ontology in OWL," http://www.w3.org/TR/2006/WD-owl-time-20060927/.
[13] J. F. Allen, "Maintaining knowledge about temporal intervals," Communications of the ACM, vol. 26, no. 11, pp. 832-843, 1983.
[14] National Commission on Stratigraphy of China, "The Stratigraphic Chart of China (Illustration 1)," Acta Geoscientica Sinica, vol. 35, no. 3, 2014.
[15] K. M. Cohen, S. C. Finney, and P. L. Gibbard, "The ICS International Chronostratigraphic Chart," Episodes 36; http://www.stratigraphy.org/ICSchart/ChronostratChart2015-01.pdf.
[16] F. M. Gradstein, J. G. Ogg, M. D. Schmitz, and G. M. Ogg, "The Geologic Time Scale 2012," The Geologic Time Scale 2012, 2012.
[17] International Commission on Stratigraphy, "Global Boundary Stratotype Section and Point (GSSP)," https://engineering.purdue.edu/Stratigraphy/gssp/.