Accepted Manuscript
Ontology-driven data integration and visualization for exploring regional geologic time and paleontological information
Chengbin Wang, Xiaogang Ma, Jianguo Chen
PII: S0098-3004(17)30518-6
DOI: 10.1016/j.cageo.2018.03.004
Reference: CAGEO 4103
To appear in: Computers and Geosciences
Received Date: 6 May 2017
Revised Date: 27 February 2018
Accepted Date: 5 March 2018
Please cite this article as: Wang, C., Ma, X., Chen, J., Ontology-driven data integration and visualization for exploring regional geologic time and paleontological information, Computers and Geosciences (2018), doi: 10.1016/j.cageo.2018.03.004. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Ontology-Driven Data Integration and Visualization for Exploring Regional Geologic Time and Paleontological Information
Chengbin Wang¹,², Xiaogang Ma²*, Jianguo Chen¹
¹ State Key Laboratory of Geological Processes and Mineral Resources & Faculty of Earth Resources, China University of Geosciences, Wuhan 430074, China
² Department of Computer Science, University of Idaho, Moscow ID 83844, USA
* Corresponding Author E-mail: [email protected]
Abstract: Open data initiatives promote the online publication and sharing of large amounts of geologic data. How to retrieve information and discover knowledge from such big data is an ongoing challenge. In this paper, we developed an ontology-driven data integration and visualization pilot system for exploring information on regional geologic time, paleontology, and fundamental geology. The pilot system (http://www2.cs.uidaho.edu/~max/gts/) implemented the following functions: modeling and visualization of a geologic time scale ontology of North America, interactive retrieval and display of fossil information, and geologic map information query and comparison with fossil information. A few case studies were carried out in the pilot system for querying fossil occurrence records from the Paleobiology Database and comparing them with information from the USGS geologic map services. The results show that, to improve the compatibility between local and global geologic standards, bridge gaps between different data sources, and create smart geoscience data services, it is necessary to further extend and improve the existing geoscience ontologies and use them to support functions for exploring open data.
Key Words: Ontology; Local Geologic Time Standard; Paleontology; Geologic Map; Open Data
1 Introduction
In the past decade, open data and big data approaches have attracted increasing attention, and they are now widely used in different knowledge domains. Many governmental agencies and scientific organizations have published data on the Internet for others to reuse. In the domain of geosciences, data on various subjects (e.g. geologic maps, mineral deposits, fossils, geochemistry, mineralogy and petrology) are already open and accessible online. Due to their disciplinary background, those open geoscience data are usually stored in repositories, each with its own subject focus, and lack inter-connections. There exist both challenges and opportunities to make connections among those ‘silo’ data sources, develop interactive data services, detect patterns in assembled datasets, and propose new topics for knowledge discovery.
To address challenges caused by the high volume, variety and velocity of big data, Sheth (2014a, 2014b) utilized semantic perception, agreement and continuous semantics methods to transform big data into structured smart data, that is, data of smaller volume and actionable information. He presented successful case studies in personalized and actionable health information. Similar research topics of building smart data and making knowledge discovery also exist in geosciences. The standards developed by the World Wide Web Consortium (W3C) and the Open Geospatial Consortium (OGC), such as XML (eXtensible Markup Language)¹, RDF (Resource Description Framework)², RDFS (RDF Schema)³, SKOS (Simple Knowledge Organization System)⁴, OWL (Web Ontology Language)⁵, WFS (Web Feature Service)⁶, WMS (Web Map Service)⁷ and WCS (Web Coverage Service)⁸, provide the fundamental building blocks for adding structures and meanings into online datasets. In recent years, the geoscience community has made remarkable progress on semantic models, ontologies and open data frameworks (Buccella et al., 2009; Ma and Fox, 2013; Sen and Duffy, 2005; Zheng et al., 2015). Several geoscience data frameworks, such as OneGeology⁹, USGIN (U.S. Geoscience Information Network)¹⁰, AuScope¹¹ and USGS (U.S. Geological Survey) Mineral Resources On-Line Spatial Data¹², have adopted semantic technologies or similar approaches to enrich the discoverability, accessibility and interoperability of data services. Those efforts create the space for exploring methods towards smart geoscience data and conducting studies on information retrieval and knowledge discovery.
¹ https://www.w3.org/XML/
² https://www.w3.org/RDF/
³ https://www.w3.org/TR/rdf-schema/
⁴ https://www.w3.org/2004/02/skos/
⁵ https://www.w3.org/OWL/
⁶ http://www.opengeospatial.org/standards/wfs
⁷ http://www.opengeospatial.org/standards/wms
⁸ http://www.opengeospatial.org/standards/wcs
⁹ http://www.onegeology.org/
¹⁰ http://usgin.org/
¹¹ http://www.auscope.org.au/
¹² https://mrdata.usgs.gov/
An ontology is a formal specification of a shared conceptualization of a domain of study (Gruber, 1995). In the approach from big data to smart data, ontologies can play an effective role in dealing with data heterogeneity through semantic enrichment and concept mapping (Buccella et al., 2009; Duong et al., 2017; Sheth, 2014b; Sotnykova et al., 2005; Su and Gulla, 2004). Mark et al. (2001) discussed that, from the point of view of geographers, ontologies can be categorized into geographic and conventional topics. In the domain of geosciences, we would say the focus of geographic ontologies in Mark et al. (2001) is spatial information (cf. Buccella et al., 2009; Buccella et al., 2011; Ma et al., 2011; Visser, 2005), and the conventional ontologies are about domain-specific topics in geosciences, such as geologic time scale (Cox, 2011; Cox and Richard, 2005; Ma et al., 2012; Ma and Fox, 2013), geological modelling (Mastella et al., 2007; Perrin et al., 2005), geological structure (Zhong et al., 2009), and rock deformation (Babaie and Davarpanah, 2018). Detailed information about concepts and relationships within a focused domain can lead to innovative functions in smart geoscience data, and will provide solid support to geoscience researchers in data discovery and analysis.
The objective of this research is to build an ontology for a local geologic time standard and use it as middleware to integrate and present information from multiple sources that is hard to retrieve with existing methods. In this work, we first built an ontology for the local geologic time scale in North America. Second, we developed an interactive visualization for the ontology. Third, we deployed the ontology to integrate multi-subject information on fossils, geologic time and geologic backgrounds, and we conducted case studies of focused topics. We developed functions to implement the visualized ontology for interactive information query and browsing on the user interface. To our knowledge, this work is the first example of using a local geologic time scale ontology and data visualization to conduct efficient and smart information retrieval. The presented research not only benefits the integration and comparison of geologic time and fossil information, but also provides practical experience on how to analyze and address the gaps between cross-domain open data.
The remainder of this paper is organized as follows: Section 2 describes the key methods and technologies deployed in this research. Section 3 describes the implementation of a pilot system and results. Section 4 analyzes the advantages and limitations of this study, and proposes a few topics for future work. Finally, Section 5 gives a brief conclusion.
2 Methods and Technologies
Geoscience is a discipline with heterogeneous terminologies used in its various sub-domains (Reitsma and Albrecht, 2005; Ma, 2015). This situation is also reflected in geoscience data. If a user is not familiar with the terminology used in a dataset, it is hard for that user to understand and use the data. There are already international efforts on coordinating schemas, ontologies and vocabularies in geoscience, such as the Commission for the Management and Application of Geoscience Information within the International Union of Geological Sciences (CGI-IUGS)¹³. Nevertheless, there is limited work on ontologies and vocabularies of local and regional standards. On the other hand, legacy geoscience datasets are increasingly made accessible online and many of them contain conceptual models or terminologies that are not part of global standards. There are implicit connections between the local and global standards, but in most cases we lack a machine-readable model to represent and describe such connections.
Seeing both challenges and opportunities in the situation described above, we designed and implemented a pilot study of ontology-driven data integration and exploration focused on the local geologic time scale in North America. Technological components included an ontology for the local geologic time scale, a visualization of the ontology, interfaces for accessing fossil occurrence records and geologic map services, and interactive functions that use the ontology to help users understand and use datasets retrieved from multiple sources.
2.1 An ontology for the local geologic time scale of North America
The geologic time scale, a contiguous framework of time intervals, is a system that uses knowledge from stratigraphy, chronostratigraphy and paleontology to study the Earth’s history (Gradstein et al., 2012). A global geologic time scale standard is established and published by the International Commission on Stratigraphy (ICS)¹⁴. In that standard, the lower boundaries of time intervals are in the process of being defined by Global Boundary Stratotype Sections and Points (GSSPs). In the global geologic time scale, stage/age is the unit at the lowest level. The global stages are defined by a continuous rock unit that supports biologic, geochemical, magnetic or other methods of global correlation. Once a GSSP is defined, it will be marked as a “Golden Spike”.
¹³ http://www.cgi-iugs.org
Although the global standard has been accepted by geoscience researchers across the world, national and regional geological surveys also refer to local geologic time standards of tectono-stratigraphic divisions. For example, the local geologic time scale in North America is established according to the specific strata sequences and geologic evolution history of the region. There are three major differences between the global and the North America standards: (1) Different nomenclature and definition of intervals on the levels of Epoch and Age. The geologic time scale in North America inherits the global standard on the levels of Eon, Era, and Period, and establishes its unique standard on the levels of Epoch and Age. Therefore, the divisions of Eon, Era and Period are identical in the global and the North America standards. (2) Different start and end boundaries between high- and low-level intervals in the local standard. In the global standard, boundaries are coordinated, so high-level geologic time intervals usually share their start and end boundaries with their low-level intervals (Fig. 1a). Some geologic time intervals of Epoch and Age in the North America standard do not match exactly with the Period intervals, which results in “cross-boundary” patterns in the local geologic time scale (Fig. 1a, Fig. 4). (3) There are some unnamed intervals in the geologic time scale of North America (Fig. 1b, Fig. 3). There could be many reasons for this situation. For example, one reason could be the absence of strata caused by a sedimentary hiatus in a period of geologic time, such as the absence of Triassic and Jurassic strata (Brenner and Peterson, 1994). In another situation, although strata developed in the geologic time, limited work on nomenclature could also result in unnamed intervals. For example, the Devonian Period has named intervals on the Age level but lacks intervals on the Epoch level (Fig. 1b, Fig. 4).
¹⁴ http://www.stratigraphy.org
Fig. 1 Comparison between the global and North America geologic time standards. (a) shows the “cross-boundary” pattern. In the global standard, intervals at the Period, Epoch and Age levels share start and end boundaries in a coordinated framework. The divided intervals of Epoch and Age in the local standard of North America are outside of such a framework and do not share boundaries with intervals in the global standard. (b) shows the interval without named intervals (missing data) on the Epoch level in the local standard of North America. The data used in this figure are from Gradstein et al. (2012).
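The “cross-boundary” pattern described above can be checked mechanically once intervals carry numeric boundaries. The following Python sketch is illustrative only: the `Interval` class, the function name, and all boundary values are assumptions made for this example and are not taken from the North America standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A geologic time interval; ages are in Ma, so base > top."""
    name: str
    base: float  # start (older) boundary, in Ma
    top: float   # end (younger) boundary, in Ma


def crosses_boundary(child: Interval, parent: Interval) -> bool:
    """Return True if the child overlaps the parent but is not fully inside it."""
    overlaps = child.base > parent.top and child.top < parent.base
    contained = child.base <= parent.base and child.top >= parent.top
    return overlaps and not contained


# Hypothetical values, chosen only to illustrate the two cases.
period = Interval("Period P", base=300.0, top=250.0)
nested_epoch = Interval("Epoch A", base=300.0, top=280.0)
crossing_epoch = Interval("Epoch B", base=260.0, top=240.0)

print(crosses_boundary(nested_epoch, period))    # nested: shares the Period's base
print(crosses_boundary(crossing_epoch, period))  # extends past the Period's top
```

An interval for which the check returns true is a candidate for the two-sub-object treatment described in the encoding discussion of this section.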
The geologic time scale has both a hierarchical conceptual structure and an ordinal temporal sequence (Cox and Richard, 2005; Michalak, 2005). Previous studies of geologic time ontologies were mostly relevant to the semantic representation of the global standard recommended by the ICS (Cox, 2011; Cox and Richard, 2015; Ma and Fox, 2013). In those works, Semantic Web languages and schemas such as OWL, RDF and SKOS were used to encode the hierarchical structure and ordinal sequence (Cox and Richard, 2015; Ma et al., 2011; Ma and Fox, 2013). In this paper, we used the JavaScript Object Notation for Linked Data (JSON-LD)¹⁵, a lightweight data-interchange format based on the JavaScript Object Notation (JSON), to serialize Linked Data and encode the hierarchical structure and temporal sequence of the local geologic time scale of North America (Fig. 2). We referred to three major sources (Haq, 2007; Rohde, 2005; TSCreator, 2017) for the list of local geologic time intervals, their time boundaries, and the global-local geologic time interval mappings. In particular, for the Triassic and Jurassic there are no recorded intervals at the Epoch and Age levels in the North America standard. To avoid a big gap in the time scale, we used the Triassic and Jurassic intervals from the global standard. For the encoding part, we wrote the JSON-LD file of the ontology manually. The JSON-LD file is accessible through the GitHub repository of this research¹⁶.
¹⁵ https://www.w3.org/TR/json-ld/
Fig. 2 Part of the JSON-LD code for representing the hierarchical structure and temporal sequence in the local geologic time scale of North America. (a) shows the code of a “cross-boundary” pattern. The geologic time interval that crosses the boundary of two parent nodes was divided into two sub-objects by the boundary. The two sub-objects both inherit the same properties of the geologic time interval. (b) shows the code of an unnamed geologic time interval. It was represented in a record without a node name. Meaning of keywords: oid: object ID; name: node name of a geologic time interval; rank: era rank; base: the start time boundary; top: the end time boundary; mid: the middle of a geologic time interval; interval: time duration of a geologic time interval. (c) shows the definition of “context” in the JSON-LD encoding, which maps the keywords to defined concepts in other existing ontologies and schemas.
¹⁶ https://github.com/xgmachina/geotimeNam/blob/master/Northamerica.json
As shown in Fig. 2, braces in the JSON-LD file delimit the node records. The properties are recorded inside the braces and separated by commas. We employed brackets, colons, and the key “children” to encode the hierarchical structure of the geologic time scale. In this nested encoding, every node has only one parent node. In this research, a geologic time interval crossing the boundary of two parent nodes was therefore divided into two sub-objects by the boundary, and the two sub-objects both inherit the same properties of that interval (Fig. 2a). To create a complete framework for the ordinal-hierarchical structure, the unnamed intervals were encoded by records with an empty node name (Fig. 2b), and they have properties such as the start and end boundaries. To improve the interoperability of the developed ontology, the “context” (Fig. 2c) maps the keywords to IRIs (Internationalized Resource Identifiers) of defined concepts in existing ontologies and schemas.
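The encoding choices above can be made concrete with a small sketch. The Python snippet below is a minimal illustration under stated assumptions, not an excerpt of the published file: the interval names, boundary values, and `example.org` IRIs are placeholders, the `mid` and `interval` properties are omitted for brevity, and sharing one `oid` between the two copies of a cross-boundary interval is assumed here only to make the copies easy to find.

```python
import json

# Placeholder interval. Names, ages (Ma), and example.org IRIs are hypothetical.
EPOCH_X = {"oid": 5, "name": "Epoch X", "rank": "Epoch", "base": 260.0, "top": 240.0}

doc = {
    "@context": {
        "name": "http://www.w3.org/2004/02/skos/core#prefLabel",
        "base": "http://example.org/gts#base",
        "top": "http://example.org/gts#top",
        "children": "http://example.org/gts#hasChild",
    },
    "oid": 1, "name": "Era E", "rank": "Era", "base": 300.0, "top": 200.0,
    "children": [
        {"oid": 2, "name": "Period P1", "rank": "Period", "base": 300.0, "top": 250.0,
         "children": [
             # Unnamed interval: a record with an empty node name.
             {"oid": 4, "name": "", "rank": "Epoch", "base": 300.0, "top": 260.0},
             dict(EPOCH_X),  # first copy of the cross-boundary interval
         ]},
        {"oid": 3, "name": "Period P2", "rank": "Period", "base": 250.0, "top": 200.0,
         "children": [
             dict(EPOCH_X),  # second copy, carrying the same properties
             {"oid": 6, "name": "Epoch Y", "rank": "Epoch", "base": 240.0, "top": 200.0},
         ]},
    ],
}


def walk(node, parent=None):
    """Yield (node, parent) pairs in depth-first order."""
    yield node, parent
    for child in node.get("children", []):
        yield from walk(child, node)


def unnamed_intervals(root):
    """Object IDs of records whose node name is empty."""
    return [node["oid"] for node, _ in walk(root) if node["name"] == ""]


def cross_boundary_oids(root):
    """Object IDs that appear under more than one parent node."""
    parents = {}
    for node, parent in walk(root):
        if parent is not None:
            parents.setdefault(node["oid"], set()).add(parent["oid"])
    return sorted(oid for oid, seen in parents.items() if len(seen) > 1)


print(unnamed_intervals(doc), cross_boundary_oids(doc))
serialized = json.dumps(doc, indent=2)  # plain JSON text, ready to serve to a web client
```

Because each node sits inside exactly one `children` array, the duplication of `Epoch X` is what lets a single interval appear under both parent Periods.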
2.2 Retrieving multi-source and multi-disciplinary geoscience data
The Paleobiology Database (PBDB)¹⁷ is an open database of paleontological data (Peters and McClennen, 2015; Peters et al., 2014; Varela et al., 2015), which included 184,259 collections, 350,487 taxa and 1,325,725 occurrences as of early March 2017. It provides two types of data services: (1) exploring the fossil information through a web browser or a mobile application and (2) retrieving fossil information through the PBDB API (Application Programming Interface). The former is for human users and the latter is for machines. In this work, we developed functions that use information of the named intervals in the local geologic time ontology of North America to retrieve fossil information (e.g. fossil occurrence location, accepted name, taxonomy, formation, reference, etc.) through the PBDB API. To enable the integration of more information, the retrieved fossil information was displayed together with geologic and geographic base maps. We developed functions to set up connections between the ontology and those data sources so they can be queried and compared interactively.
¹⁷ https://paleobiodb.org/
The Web Map Service¹⁸ (WMS) is a standard protocol released by the Open Geospatial Consortium (OGC) for building geospatial data services on the Web. It is widely used in web GIS application development for setting up geospatial data services, including geologic data. The Mineral Resources On-Line Spatial Data is an open database developed by the USGS mineral program. It provides data services of mineral resources, geology, geochemistry, geophysics, and more. WMS is one of the many standards applied by that database. In this work, we utilized the WMS “GetMap” and “GetFeatureInfo” operations to obtain the geologic map and feature properties from the USGS database, respectively. The retrieved information was used as a background map for the fossil information retrieved from PBDB. By integrating all those datasets, the developed system enables interested researchers to explore further geologic information of a focused area.
¹⁸ http://www.opengeospatial.org/standards/wms
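The two WMS operations named above are ordinary HTTP GET requests whose parameters are defined by the OGC standard. The Python sketch below builds both request URLs without contacting any server. It assumes WMS 1.1.1 parameter names (version 1.3.0 uses `CRS`, `I` and `J` in place of `SRS`, `X` and `Y`); the endpoint, layer name, and bounding box are placeholders rather than values taken from the USGS service.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholders: take the real endpoint and layer name from the service's
# GetCapabilities response.
WMS_ENDPOINT = "https://example.org/wms"
LAYER = "geology"


def _map_params(bbox, width, height):
    """Parameters shared by GetMap and GetFeatureInfo (WMS 1.1.1 names)."""
    return {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "LAYERS": LAYER,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # min_lon,min_lat,max_lon,max_lat
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }


def get_map_url(bbox, width, height):
    """URL that returns the rendered map image for the bounding box."""
    params = {"REQUEST": "GetMap", **_map_params(bbox, width, height)}
    return f"{WMS_ENDPOINT}?{urlencode(params)}"


def get_feature_info_url(bbox, width, height, x, y):
    """URL that returns feature properties at pixel (x, y) of that map image."""
    params = {
        "REQUEST": "GetFeatureInfo",
        **_map_params(bbox, width, height),
        "QUERY_LAYERS": LAYER,
        "INFO_FORMAT": "text/html",
        "X": x,
        "Y": y,
    }
    return f"{WMS_ENDPOINT}?{urlencode(params)}"


bbox = (-117.5, 46.0, -116.5, 47.0)  # hypothetical area of interest
info_url = get_feature_info_url(bbox, 256, 256, x=128, y=64)
print(parse_qs(urlparse(info_url).query, keep_blank_values=True)["REQUEST"])
```

A GetFeatureInfo request repeats the map parameters so that the server can translate the clicked pixel back into map coordinates.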
2.3 Ontology as an interface between geologic time scale, paleontology and fundamental geology
A specific function enabled by ontologies is semantic inference, which can reveal new information and knowledge in a cross-domain context (Katifori et al., 2007). For each specific science domain, there are characteristic entities, properties and organizational structures that can be used in semantic reasoning and inference. Paleontology covers topics of taxon, position, location, age, fossil strata, source reference, and more. Fundamental geology, as revealed by a geologic map, usually contains geologic units, boundary, age, lithology, strata, color, location and reference information. Geologic time scale includes topics of age, early age, late age, duration time, as well as the hierarchical conceptual structure and the temporal sequence reflected in those concepts.
Fig. 3 The relationship network of geologic time scale, paleontology and geologic map service. Paleo: paleontology; GTS: geologic time scale; GM: geologic map service. The bold lines show the path used in this work to connect the Paleo, GM and GTS together.
With the help of the developed ontology, the common concepts among geologic time scale, paleontology and geology were used to set up connections among the several resources in this study (Fig. 3). Although there are several paths to link them together, we selected the GTS-Age-Paleo-Location-GM route (bold solid line in Fig. 3) to build the pilot system for information retrieval and knowledge discovery in this work. The chosen route contains topics of time, location, fossil and fundamental geology, which makes it a good case study for making use of all the data resources and conducting further exploration.
3 Implementation, Prototype System and Results
3.1 Interactive visualization for the local geologic time ontology of North America
To visualize a geologic time ontology, a good way is needed for representing both the hierarchical conceptual structure and the ordinal time sequence (Cox, 2011; Cox and Richard, 2005; Ma et al., 2012; Ma and Fox, 2013). In previous works, both ActionScript and JavaScript have been used to visualize the global geologic time scale ontology (Ma et al., 2012; Ma et al., 2016). To obtain a human-friendly interaction, we adapted the open code of John Czaplewski¹⁹ and visualized the geologic time scale ontology of North America as an interactive partition displaying the hierarchical structure, temporal sequence, and annotation of geologic time intervals. D3²⁰ (Data-Driven Documents), an open-source JavaScript library for producing dynamic, interactive data visualizations, was used as the basic library in the visualization. The resulting visualization was in JavaScript-driven SVG (Scalable Vector Graphics) format, which provided a functional tool to deploy the JSON-LD file of the ontology in an interactive web application (cf. Stefani et al., 2014).
The resulting visualization shows the hierarchical and ordinal relationships of geologic time intervals (Fig. 4). The nodes are laid out from left to right, following the Earth’s history from earlier to later. The width of every node represents the time duration of the corresponding geologic time interval, which is read from the JSON-LD records. From top to bottom, the node levels descend from Eon in the first layer to Age in the fifth layer. The name of a node is annotated on the node partition; it is displayed when the node partition zooms in and has enough space for the text of the name, and is hidden when the node partition zooms out (Fig. 4).
¹⁹ http://bl.ocks.org/jczaplew/7546689
²⁰ https://d3js.org/
Fig. 4 Interactive visualization of the local geologic time scale ontology of North America.
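The layout rule described for Fig. 4 (horizontal position follows time, width follows duration, labels appear only when they fit) can be written down in a few lines. The Python sketch below is a language-neutral illustration and is not the D3 code of the pilot system; the function names, the pixel width per character, and the axis values are assumptions.

```python
def horizontal_extent(base, top, scale_base, scale_top, total_width):
    """Return (x, width) in pixels for an interval on a left-to-right time axis.

    Ages are in Ma, so larger values are older and are drawn further left.
    """
    span = scale_base - scale_top
    x = (scale_base - base) / span * total_width
    width = (base - top) / span * total_width
    return x, width


def label_visible(name, width, zoom=1.0, char_px=7.0):
    """Show a label only if the zoomed node is wide enough for its text."""
    return width * zoom >= len(name) * char_px


# Hypothetical axis from 300 Ma to 200 Ma drawn 1000 px wide.
x, width = horizontal_extent(base=250.0, top=200.0,
                             scale_base=300.0, scale_top=200.0, total_width=1000)
print(x, width)
print(label_visible("Permian", width=30.0), label_visible("Permian", width=30.0, zoom=2.0))
```

Zooming in multiplies the drawn width, which is why a name hidden at the overview level becomes visible after the user clicks into an interval.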
3.2 System interface, interaction and exploration
We developed a user-friendly pilot system²¹ that connects elements from geologic time scale, paleontology, and WMS geologic map service together. The interface is shown in Fig. 5. It includes the visualized geologic time scale ontology at the bottom, a main window in the center displaying geologic and geographic base maps and fossil locations, radio buttons at the top left corner for choosing layers for further information query, a dropdown list to change the map window to different states in the U.S., and dynamic pop-up windows for displaying query results.
²¹ http://www2.cs.uidaho.edu/~max/gts/
Fig. 5 User interface of the developed pilot system. (a) shows the blank area in a geologic map layer; (b) “undefined” formation indicates there is no information about the strata that contain the fossil; (c) and (d) show the information of the same area from the map layer and the fossil occurrence records, respectively. A topic of research interest here is the differences between the information in (c) and (d), which may lead to new studies about the area.
The general workflow in the pilot system includes the following steps. First, a user can navigate in the ontology visualization to find an interval of interest. Second, when the user double-clicks the node of the selected time interval, the system retrieves the base and top time boundaries of that interval, sends them to PBDB to retrieve relevant fossil occurrence records within that time coverage, and displays the records in the map window. There is also an information window on the top left corner of the map window that shows the selected interval and its base and top time boundaries. Third, the user can use the radio buttons on the top left corner to choose the object layer (USGS geologic map or Fossil) for querying attribute information. For example, when the ‘USGS’ layer is selected, the user can retrieve the geologic information of a place on the map by a mouse click. When the ‘Fossil’ layer is selected, the user can click spots in the fossil occurrence layer to see the attributes of fossils. The retrieved information is displayed in a mini pop-up window at the mouse click point. During the process, the user can also change the center of the map window to different states in the U.S. by making selections in a dropdown list at the top left.
There are a few specific settings in the current pilot system. Through the PBDB API, abundant attributes of fossil occurrence records could be retrieved, but in the current system we only displayed a short list of information to compare with the geologic information from the USGS open dataset. In the map window, the geologic maps of different states are automatically loaded when the zoom level of the window is 8 or higher. The user can also zoom out to see the nationwide distribution of fossil occurrences in the U.S., but the geologic map will disappear. To see the geologic map in the background, the user needs to zoom in to a focused area until the zoom level reaches that threshold. To make the operation convenient, we created a dropdown list of states in the U.S. to help users select areas of interest. The geologic map of the selected state will be loaded automatically in the window.
Besides retrieving and integrating information from multiple sources, a more specific goal in this study is using the ontology for semantic connection, reasoning and knowledge discovery. In other words, we extracted the relevant datasets and then structured and connected them as “smart data” rather than listing them as separate items. Based on the retrieved information, we can use semantic reasoning to obtain further information about an entity and its related entities. For example, we retrieved the fossil information within a period of time by using the start and end boundaries of named intervals from the geologic time ontology, rather than just using label matching. Then, the fossil location information was used to retrieve the geologic map information from the USGS Mineral Resources On-Line Spatial Data, such as geologic unit, geologic age, petrology, and more. We were also able to compare the information from the different sources to find the similarities and differences (Fig. 5c and d).
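One simple form of the comparison described above is to test whether the age range of a fossil occurrence overlaps the age range of the map unit at the same location. The Python sketch below illustrates that idea only; the record layout and the field names (`max_ma`, `min_ma`, `formation`, `unit_name`) are hypothetical and do not reproduce the response formats of PBDB or the USGS service, and the map unit values are invented.

```python
def ages_overlap(a, b):
    """True if two records' age ranges (in Ma) share any span of time."""
    return a["max_ma"] > b["min_ma"] and b["max_ma"] > a["min_ma"]


def compare_records(fossil, map_unit):
    """Summarize agreement between a fossil occurrence and the map unit at its location."""
    return {
        "ages_overlap": ages_overlap(fossil, map_unit),
        "fossil_formation": fossil.get("formation") or "undefined",
        "map_unit": map_unit.get("unit_name") or "undefined",
    }


# The fossil uses the Permian boundaries quoted in the text; the map unit is made up.
fossil = {"max_ma": 298.9, "min_ma": 252.2, "formation": None}
map_unit = {"max_ma": 290.0, "min_ma": 270.0, "unit_name": "Unit U"}

print(compare_records(fossil, map_unit))
```

A pair of records whose ages do not overlap, or whose unit labels disagree, is the kind of difference between sources that may merit a closer look.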
In the geologic time scale ontology, each geologic time interval is an entity linked with other entities. We can use the logical relationships in the ontology to find relevant entities within the geologic time scale. For example, Permian is not only a time interval from 298.9 Ma to 252.2 Ma (Gradstein et al., 2012), but is also a child node of Paleozoic, and the parent node of the Wolfcampian, Leonardian, Guadalupian and Ochoan Epochs. Those relationships are useful in the development of functions for information retrieval on the Web. For example, suppose a user wants to find some typical records whose occurrence times are within the Permian interval. If the time information is only recorded with the literal labels of geologic time intervals, then the search must use not only the label ‘Permian’ but also the labels of the child intervals of Permian. The geologic time scale ontology can quickly provide all the child intervals of Permian. Such functions were developed in one of our previous studies (Ma et al., 2012).
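The label expansion described above amounts to a tree traversal. The Python sketch below is a minimal illustration, assuming a nested structure with `name` and `children` keys; the tree is deliberately partial and holds only the Permian branch named in the text, and the function names are introduced here for illustration.

```python
# Partial tree: only the Permian branch mentioned in the text is included.
TREE = {
    "name": "Paleozoic",
    "children": [
        {"name": "Permian", "base": 298.9, "top": 252.2,
         "children": [
             {"name": "Wolfcampian"}, {"name": "Leonardian"},
             {"name": "Guadalupian"}, {"name": "Ochoan"},
         ]},
    ],
}


def find_interval(node, name):
    """Depth-first search for the node with the given name."""
    if node.get("name") == name:
        return node
    for child in node.get("children", []):
        found = find_interval(child, name)
        if found is not None:
            return found
    return None


def search_labels(root, name):
    """Labels to search for: the interval itself plus all named descendants."""
    start = find_interval(root, name)
    if start is None:
        return []
    labels, stack = [], [start]
    while stack:
        node = stack.pop()
        if node.get("name"):  # skip unnamed intervals
            labels.append(node["name"])
        stack.extend(reversed(node.get("children", [])))
    return labels


print(search_labels(TREE, "Permian"))
```

The resulting list can then be matched against records that store time only as literal interval labels.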
In this pilot system, the data source used for queries is the PBDB API, which provides several channels for querying fossil occurrence records within a period of time (see footnote 22). The first is through the input of the names of one or more geologic time intervals. The PBDB system has a collection of geologic time terms and their corresponding start and end time boundaries. If more than one label is input, then the time range used in the query is the contiguous period from the start of the earliest interval to the end of the latest interval. The second query method in the PBDB API is through the input of a maximum age and/or a minimum age. A key part of our work is the local geologic time scale of North America. If we query with the names of time intervals in the built ontology, some intervals (e.g. Wolfcampian or Leonardian) return no results. A possible reason is that those intervals are not included in the time term collection of PBDB, or that alternative names (e.g. the label ‘Wolfcamp’ in Fig. 5d) are used. To address this issue, we used the second method enabled by PBDB to develop the query function in the pilot system. When an interval is selected by the user (i.e. by a double click in the ontology visualization), a function obtains the base and top boundaries of that interval from the ontology and then uses them as the maximum and minimum ages to query fossil occurrence records. For example, we could retrieve records from PBDB for Wolfcampian or Leonardian through this function.

22 https://paleobiodb.org/data1.2/occs_doc.html

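The age-based query described above can be sketched as follows. This is an illustration under stated assumptions, not the code of the pilot system: the endpoint path `occs/list.json` and the parameter names `max_ma` and `min_ma` reflect our reading of the PBDB data service documentation and should be checked against footnote 22, the ontology excerpt is hypothetical, and the boundary ages are placeholders rather than values from the paper. The sketch only builds the request URL and does not contact the service.

```python
from urllib.parse import urlencode

# Assumed endpoint of the PBDB data service (version 1.2) for occurrence lists.
PBDB_OCCS_URL = "https://paleobiodb.org/data1.2/occs/list.json"

# Hypothetical ontology excerpt: ages in Ma are placeholders, not values from the paper.
ONTOLOGY = {
    "Wolfcampian": {"base_ma": 299.0, "top_ma": 280.0},
}


def occurrence_query_url(interval_name: str, ontology: dict = ONTOLOGY) -> str:
    """Build a query URL from an interval's base (maximum) and top (minimum) ages."""
    bounds = ontology[interval_name]
    if bounds["base_ma"] <= bounds["top_ma"]:
        raise ValueError("base age must be older (larger) than top age")
    # The base of an interval is its oldest boundary, hence the maximum age.
    params = {"max_ma": bounds["base_ma"], "min_ma": bounds["top_ma"]}
    return f"{PBDB_OCCS_URL}?{urlencode(params)}"
```

Because the request carries numeric ages only, it does not depend on whether the interval label is present in the time term collection of PBDB.
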
4 Discussion
In this study, we used JSON-LD to encode the geologic time scale ontology of North America, which demonstrates the usefulness of JSON-LD as a lightweight data-interchange format that is readable by both humans and machines. We used the JSON-LD syntax, such as braces, brackets and default keywords, to encode the hierarchical structure and temporal sequence of a geologic time ontology. More specifically, we used the “context” to map the keywords in our ontology to IRIs of defined concepts in other existing ontologies. In previous works, XML, OWL and SKOS have been used to encode geologic time scale ontologies (Raskin and Pan, 2005; Cox and Richard, 2005; Ma et al., 2011; Ma et al., 2012; Ma and Fox, 2013). Compared with these works, the JSON-LD code for the geologic time scale ontology is a concise format for data visualization and can be used directly in developing the user interface of applications.

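The role of the context can be illustrated with a small sketch. The fragment below is hypothetical: the keywords `name` and `children` and the `http://example.org/` IRIs are placeholders, not the terms or IRIs shown in Fig. 2c. The helper `expand_keys` is a deliberately simplified stand-in for the expansion step that a full JSON-LD processor performs under the W3C algorithm; it only substitutes keys that the context defines.

```python
import json

# Hypothetical JSON-LD fragment; keywords and IRIs are placeholders for illustration.
DOC = {
    "@context": {
        "name": "http://example.org/vocab#label",
        "children": "http://example.org/vocab#hasMember",
    },
    "name": "Permian",
    "children": [{"name": "Wolfcampian"}, {"name": "Leonardian"}],
}


def expand_keys(node, context):
    """Replace each key that the context defines with its mapped IRI, recursively."""
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    if isinstance(node, dict):
        return {
            context.get(key, key): expand_keys(value, context)
            for key, value in node.items()
            if key != "@context"  # the context itself is not part of the data
        }
    return node


expanded = expand_keys(DOC, DOC["@context"])
print(json.dumps(expanded, indent=2))
```

The short keywords keep the file compact for visualization code, while the context preserves the link from each keyword to a concept defined elsewhere.
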
To address the needs of ontology visualization on the user interface, several specific objects were added to the developed JSON-LD file. For example, as described in section 2.1, two sub-objects of a geologic time interval were used to describe the “cross-boundary” pattern, and both inherited the same properties from that interval. While this helps lay out the ordinal-hierarchical structure in the resulting visualization, the two sub-object intervals are identical, which may lead to issues in logical reasoning. In this research, the operations on the visualized ontology were limited to a single click for zooming into an interval and a double click for querying fossil occurrences within the time coverage of an interval. The data structure and the content of the JSON-LD file met the needs of those operations. For some other operations, however, such as querying the number of intervals at the Epoch level, the above-mentioned specific objects in the JSON-LD file will lead to incorrect results. One approach to addressing this issue is to have separate ontologies that store precise information, and then build the JSON-LD file for visualization as a compatible adaptation of those ontologies (cf. Ma et al., 2016; Ma, 2017).

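The counting problem can be made concrete with a small sketch. The node list below is hypothetical, and the keys `id` and `rank` are assumed for illustration. Counting distinct identifiers is shown only to expose the discrepancy; it is a stopgap of our own choosing and not the remedy proposed above, which is to keep the precise information in separate ontologies.

```python
# Hypothetical flattened node list; "id" and "rank" are assumed keys.
# Interval "B" crosses a parent boundary, so the visualization file stores it twice.
NODES = [
    {"id": "A", "rank": "Epoch"},
    {"id": "B", "rank": "Epoch"},  # first sub-object of B
    {"id": "B", "rank": "Epoch"},  # second sub-object of B, identical properties
    {"id": "C", "rank": "Age"},
]


def naive_count(nodes, rank):
    """Count every stored object of the given rank, duplicates included."""
    return sum(1 for node in nodes if node["rank"] == rank)


def distinct_count(nodes, rank):
    """Count intervals of the given rank once each, keyed by identifier."""
    return len({node["id"] for node in nodes if node["rank"] == rank})
```

Here the naive count reports three Epoch-level objects although only two distinct intervals exist, which is the kind of incorrect result noted above.
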
As shown in the context of the JSON-LD file (Fig. 2c), this ontology was built on top of a few existing ontologies developed by Cox and Richard (2015). The focus of this research was the intervals in the local geologic time scale of North America and their mappings to the global geologic time scale. The resulting JSON-LD file contained just the key information needed for ontology visualization and fossil information retrieval in the pilot system. In our previous works, we conducted logical reasoning through the ordinal-hierarchical structure of the geologic time scale (Ma et al., 2012), and we had planned similar work in this research. For example, in the developed pilot system, if a user selects (i.e. double clicks) ‘Permian’ to find fossil occurrence records in PBDB, there could be a function that finds all the child intervals of ‘Permian’ and uses them together with ‘Permian’ as input for the query. However, in this research we found that some intervals in the local geologic time scale of North America could not be recognized by PBDB. Therefore, instead of using the logical relationships between the time intervals, we simply obtained the base and top time boundaries of a selected interval from the ontology and then used them in the query sent to PBDB.

A driving force of this study was the feedback (personal communication) from geoscientists on our previous works (Ma et al., 2012; Ma et al., 2016). From conversations with colleagues in the fields of paleontology, geobiology and stratigraphy, we found that Web applications with visualized geoscience knowledge and map services are useful in their research and teaching. A suggestion from Prof. Miriam Katz on the visualization was that a horizontal or vertical layout is more familiar to geoscientists than the sunburst layout in our previous work, and this pilot system realized that layout. The system and a tutorial document are accessible online (see footnote 21), and the source code of the system has also been published (see footnote 23). We will continue to collect feedback and suggestions from colleagues, especially geoscientists, and improve the functions of the system.

23 https://github.com/xgmachina/geotimeNam

A well-organized ontology model is an efficient interface for linking different research domains, and this offers considerable potential for geoscience research. In this study, we used the ontology model to link structured data across the geologic time scale, paleontology and fundamental geology, and set up functions for information retrieval and inference. It was not our intention to reinvent the wheel of the existing functions in PBDB, but the intervals collected for the local geologic time scale of North America could complement the time term collection in PBDB. We also plan to encode more local geologic time standards, such as those listed in Haq (2007), into formats for the Semantic Web and use them to build applications for querying open data on the Web.

Among the various open geoscience data on the Web, a topic of interest is to explore the background of gaps among different data sources and to study ways to bridge those gaps in data integration. In this research, several gaps in the cross-domain open data were found: (1) Missing data - there are empty properties for the fossil occurrences and blank areas in the WMS geologic map (Fig. 5a and b). (2) Synonymous expressions - there are significant differences in the Epoch and Age divisions and nomenclature of the geologic time scale between the global and North American standards. (3) Different reference standards - paleontology research and geologic mapping services refer to different geologic time scale standards. In this study, we found that there are different records describing the same object (e.g. rock age, lithology) between the PBDB and USGS databases (Fig. 5c and d). There could be many reasons for such differences, and a further study of the background information may lead to new research topics. For example, in Fig. 5c and d, the point on the geologic map is at the edge of eroding Permian sediments, so the fossils in that alluvium can include those from the Permian.

Some data gaps could be caused by the differences between local and global standards. Geological investigation and interpretation is a scientific domain that involves subjectivity. Geologic standards are designed to reduce that subjectivity and enhance objectivity at a national or local scale. Nevertheless, the large number of separately developed standards poses challenges for data exchange at a global scale. To address this challenge, the standards could be made open and published in machine-readable formats. We encourage more research efforts to encode local geologic data standards in Web-compatible formats and publish them online for reuse. Once a large number of those local standards are openly accessible, calibrations and connections among those local standards, as well as between local and global standards, can be made to promote the interoperability of datasets.

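One simple way to connect a local standard to a global one, once both are machine-readable, is to compare the numeric age ranges of their intervals. The sketch below illustrates that idea only; it is not a calibration method taken from this study. The interval names are invented and every age is a placeholder, so the output says nothing about real stratigraphic correlations.

```python
from typing import Dict, List, Tuple

# (base_ma, top_ma) pairs; all names and ages are placeholders chosen for illustration.
LOCAL: Dict[str, Tuple[float, float]] = {"LocalEpochX": (290.0, 270.0)}
GLOBAL: Dict[str, Tuple[float, float]] = {
    "GlobalEpochP": (300.0, 275.0),
    "GlobalEpochQ": (275.0, 260.0),
    "GlobalEpochR": (260.0, 250.0),
}


def overlaps(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True when two (base, top) age ranges share more than a boundary point."""
    return a[1] < b[0] and b[1] < a[0]


def map_to_global(local_name: str) -> List[str]:
    """List the global intervals whose age range overlaps the named local interval."""
    local_range = LOCAL[local_name]
    return [name for name, rng in GLOBAL.items() if overlaps(local_range, rng)]
```

A local interval that maps to more than one global interval, as `LocalEpochX` does here, corresponds to the “cross-boundary” pattern discussed earlier.
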
Several concepts can be used to explore the relationships between the geologic time scale, paleontology, and fundamental geology, as shown in Figure 3. They lay out the space for extending and improving the developed system in the future. For example, in the developed pilot system, we used the GTS-Age-Paleo-Location-GM route to implement the functions of information retrieval and reasoning. We can also use other routes to implement similar functions and conduct bidirectional queries.

The current system (http://www2.cs.uidaho.edu/~max/gts/) is in its prototype stage, and several directions for future work can be proposed. The first is to collect more local geologic time standards, match them with the global geologic time standard, and use them to enrich the built ontology. The pilot study in this paper is an example based on the local geologic time standard of North America. If the ontology is enriched with more local geologic time standards, it will be more useful for data search and integration on the Web. The second is to enrich the interactive functions on the user interface. This includes the ontology content, the visualization, the map window operations, and the rendering and display of multisource information, such as geologic maps and fossil occurrence records. We will invite colleagues in the geoscience community to use the pilot system and provide feedback, and then we will revise the system for its next version. Besides user feedback, we also plan a few other updates. For example, in addition to the visualized ontology, we can also build a free-text and time-boundary search, so that a user can input a time term or two numeric time boundaries to search fossil occurrence records.

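The proposed search box has to decide whether the input is a time term or a pair of numeric boundaries. A minimal sketch of that decision is given below; the function name `parse_search_input` and the accepted input forms (two numbers separated by a space or a comma) are assumptions, since the feature is only planned.

```python
from typing import Tuple, Union


def parse_search_input(text: str) -> Union[Tuple[float, float], str]:
    """Return (max_ma, min_ma) for two numbers, else the trimmed text as a time term."""
    parts = text.replace(",", " ").split()
    if len(parts) == 2:
        try:
            first, second = float(parts[0]), float(parts[1])
        except ValueError:
            # Two words that are not both numbers, e.g. a two-word time term.
            return text.strip()
        # Order the ages so that the older boundary comes first.
        return (max(first, second), min(first, second))
    return text.strip()
```

A numeric result can be passed to an age-based query, while a text result can be matched against the interval labels in the ontology.
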
5 Conclusion
Geologic data have been increasingly made open online, while methods and tools to obtain knowledge from them are underdeveloped. Based on open geologic data, we employed an ontology model to design and implement a pilot system crossing the domains of the local geologic time scale of North America, paleontology, and fundamental geology. The pilot system (http://www2.cs.uidaho.edu/~max/gts/) realized the following functions: visualization of the geologic time ontology of North America, interactive retrieval and display of fossil information, and query and comparison of fossil information and geologic map information. It is an interactive and integrated system for fossils, geologic map services, and the geologic time scale of North America, and has proved useful in helping researchers explore information of interest and propose new research topics.

The compatibility between local and global geoscience standards and the gaps between different data sources will be a long-term challenge for the application of ontology models in geoscience knowledge discovery. To address these challenges, it is necessary to extend and improve the existing geologic ontology models for broader compatibility, and to call for more ontologies of local data standards (e.g. those listed in Haq, 2007 and Rohde, 2005) to be developed and connected, so as to improve the interoperability of datasets from different sources.

Acknowledgement
This work was partly supported by the National Science Foundation (NSF) through the NSF Idaho EPSCoR Program (award number IIA-1301792) and the W. M. Keck Foundation through the grant “The Co-Evolution of the Geo- and Biospheres: An Integrated Program for Data-Driven Abductive Discovery in Earth Sciences”. We thank the USGS and the Geophysical Laboratory at the Carnegie Institution for Science for financial support of our attendance at the 2017 USGS-DTDI workshop in Reston, VA. We also thank two anonymous reviewers and the editor, Prof. Gregoire Mariethoz, for their constructive comments on an earlier version of the manuscript.

References
Babaie, H.A., Davarpanah, A., 2018. Semantic modeling of plastic deformation of polycrystalline rock. Computers & Geosciences, 111, 213-222.
Brenner, R.L., Peterson, J.A., 1994. Jurassic sedimentary history of the northern portion of the Western Interior Seaway, USA. In: Caputo, M.V., Peterson, J.A., Franczyk, K.J. (eds.) Mesozoic Systems of the Rocky Mountain Region: The Rocky Mountain Section SEPM, Denver, pp. 217-232.
Buccella, A., Cechich, A., Fillottrani, P., 2009. Ontology-driven geographic information integration: A survey of current approaches. Computers & Geosciences, 35, 710-723.
Buccella, A., Cechich, A., Gendarmi, D., Lanubile, F., Semeraro, G., Colagrossi, A., 2011. Building a global normalized ontology for integrating geographic data sources. Computers & Geosciences, 37, 893-916.
Cox, S., 2011. OWL representation of the geologic timescale implementing stratigraphic best practice. Proceedings of AGU 2011 Fall Meeting, San Francisco, abstract IN31B-1440.
Cox, S.J.D., Richard, S.M., 2005. A formal model for the geologic time scale and global stratotype section and point, compatible with geospatial information transfer standards. Geosphere, 1(3), 119-137.
Cox, S.J.D., Richard, S.M., 2015. A geologic timescale ontology and service. Earth Science Informatics, 8(1), 5-19.
De Donatis, M., Bruciatelli, L., 2006. MAP IT: The GIS software for field mapping with tablet pc. Computers & Geosciences, 32, 673-680.
Duong, T.H., Nguyen, H.Q., Jo, G.S., 2017. Smart Data: Where the Big Data Meets the Semantics. Computational Intelligence and Neuroscience, 2017: 6925138. doi: 10.1155/2017/6925138
Gradstein, F.M., Ogg, J.G., Schmitz, M., Ogg, G., 2012. The Geologic Time Scale. Elsevier, Kidlington, UK, 1176 pp.
Gruber, T.R., 1995. Toward principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies, 43(5-6), 907-928.
Haq, B.U., 2007. The Geological Time Table, 6th ed. Amsterdam: Elsevier. 1 p.
Ma, X., 2015. Geoinformatics in the Semantic Web. In: Schaeben, J., Delgado, R.T., van den Boogaart, K.G., van den Boogaart, R. (eds.) Proceedings of the IAMG 2015 Annual Conference, Freiberg, Germany, 9 pp.
Ma, X., 2017. Linked Geoscience Data in practice: where W3C standards meet domain knowledge, data visualization and OGC standards. Earth Science Informatics, 10(4), 429-441.
Ma, X., Carranza, E.J.M., Wu, C., van der Meer, F.D., 2012. Ontology-aided annotation, visualization, and generalization of geological time-scale information from online geological map services. Computers & Geosciences, 40, 107-119.
Ma, X., Carranza, E.J.M., Wu, C., van der Meer, F.D., Liu, G., 2011. A SKOS-based multilingual thesaurus of geological time scale for interoperability of online geological maps. Computers & Geosciences, 37, 1602-1615.
Ma, X., Fox, P., 2013. Recent progress on geologic time ontologies and considerations for future works. Earth Science Informatics, 6, 31-46.
Ma, X., Fu, L., Fox, P., Liu, G., 2016. An integrated golden spike information portal enabled by data visualization and semantic web technologies. In: Raju, N.J. (ed.) Geostatistical and Geospatial Approaches for the Characterization of Natural Resources in the Environment. Springer, Cham, Switzerland, pp. 829-833.
Mastella, L.S., Abel, M., De Ros, L.F., Perrin, M., Rainaud, J.F., 2007. Event ordering reasoning ontology applied to petrology and geological modelling. In: Castillo, O., Melin, P., Ross, O.M., Cruz, R.S., Pedrycz, W., Kacprzyk, J. (eds.) Theoretical Advances and Applications of Fuzzy Logic and Soft Computing. Springer, Berlin/Heidelberg, pp. 465-475.
Mark, D.M., Skupin, A., Smith, B., 2001. Features, objects, and other things: Ontological distinctions in the geographic domain. In: Montello, D.R. (ed.) Proceedings of the International Conference on Spatial Information Theory, Morro Bay, CA, pp. 489-502.
Michalak, J., 2005. Topological conceptual model of geological relative time scale for geoinformation systems. Computers & Geosciences, 31, 865-876.
Perrin, M., Zhu, B., Rainaud, J.F., Schneider, S., 2005. Knowledge-driven applications for geological modeling. Journal of Petroleum Science and Engineering, 47(1), 89-104.
Peters, S.E., Zhang, C., Livny, M., Re, C., 2014. A machine reading system for assembling synthetic paleontological databases. PLoS One, 9, e113523. doi: 10.1371/journal.pone.0113523
Raskin, R.G., Pan, M.J., 2005. Knowledge representation in the semantic web for Earth and environmental terminology (SWEET). Computers & Geosciences, 31, 1119-1125.
Reitsma, F., Albrecht, J., 2005. Modeling with the Semantic Web in the Geosciences. IEEE Intelligent Systems, 20(2), 86-88.
Rohde, R.A., 2005. Introduction to the GeoWhen Database. http://www.stratigraphy.org/bak/geowhen/index.html (Accessed on December 15, 2017)
Sen, M., Duffy, T., 2005. GeoSciML: development of a generic geoscience markup language. Computers & Geosciences, 31(9), 1095-1103.
Sheth, A., 2014a. Smart Data - How you and I will exploit Big Data for personalized digital health and many other activities. In: Proceedings of the 2014 IEEE International Conference on Big Data, Washington, DC, pp. 2-3.
Sheth, A., 2014b. Transforming big data into smart data: Deriving value via harnessing volume, variety, and velocity using semantic techniques and technologies. In: Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE), Chicago, IL, pp. 2-2.
Sotnykova, A., Vangenot, C., Cullot, N., Bennacer, N., Aufaure, M.-A., 2005. Semantic mappings in description logics for spatio-temporal database schema integration. In: Spaccapietra, S., Zimanyi (eds.) Lecture Notes in Computer Science (vol. 3534). Springer, Berlin/Heidelberg, pp. 143-167.
Stefani, C., Brunetaud, X., Janvier-Badosa, S., Beck, K., De Luca, L., Al-Mukhtar, M., 2014. Developing a toolkit for mapping and displaying stone alteration on a web-based documentation platform. Journal of Cultural Heritage, 15, 1-9.
Su, X., Gulla, J.A., 2004. Semantic enrichment for ontology mapping. In: Meziane, F., Metais, E. (eds.) Proceedings of the 9th International Conference on Application of Natural Language to Information Systems (NLDB 2004), Salford, UK, pp. 217-228.
TSCreator, 2017. Time Scale Creator. https://engineering.purdue.edu/Stratigraphy/tscreator/index/index.php (Accessed on February 22, 2018)
Varela, S., González-Hernández, J., Sgarbi, L.F., Marshall, C., Uhen, M.D., Peters, S., McClennen, M., 2015. paleobioDB: an R package for downloading, visualizing and processing data from the Paleobiology Database. Ecography, 38, 419-425.
Visser, U., 2005. Intelligent information integration for the Semantic Web. Springer.
Zheng, J.G., Fu, L., Ma, X., Fox, P., 2015. SEM+: tool for discovering concept mapping in Earth science related domain. Earth Science Informatics, 8, 95-102.
Zhong, J., Aydin, A., McGuinness, D.L., 2009. Ontology of fractures. Journal of Structural Geology, 31(3), 251-259.

Build and visualize an ontology for the local geologic time scale of North America
Ontology-driven retrieval and display of fossil occurrences and geologic maps
Multi-source information query and comparison to enable exploratory analysis
A successful case study towards smart geoscience data service