Zhishi.schema Explorer: A Platform for Exploring Chinese Linked Open Schema

Tianxing Wu¹, Guilin Qi¹, and Haofen Wang²

¹ Southeast University, China
{wutianxing,gqi}@seu.edu.cn
² East China University of Science and Technology, China

Abstract. Knowledge at the schema level is vital for the development of the Semantic Web, but the amount of schema information in Linking Open Data (LOD) is limited. We approach this problem by contributing to the complementary part of LOD, namely Linking Open Schema (LOS), which helps close the gap between lightweight LOD and expressive ontologies by adding more expressive ontological axioms between concepts. In this paper, we present Zhishi.schema Explorer, a platform for exploring the Chinese linked open schema Zhishi.schema. Zhishi.schema Explorer provides a Lookup Service and a SPARQL Endpoint, which allow querying by concept label and with the SPARQL language, respectively.

Keywords: Linking Open Schema, Zhishi.schema, Zhishi.schema Explorer

1 Introduction

After the Linked Data [1] project initiated efforts to connect semantic data across the Web, the Linking Open Data (LOD)³ cloud, the largest community effort for semantic data publishing, has grown to more than 200 datasets. While LOD contains billions of triples describing millions of entities together with their attributes and relationships, it contains few schemas, and even fewer schemas with Chinese labels. Yago [5,6] defines an explicit schema describing concept subsumptions as well as the domains and ranges of properties, but the quality of this schema is not always satisfactory. Freebase [3] has a very shallow taxonomy consisting of domains and types. The DBpedia Ontology [2] (DBPO) lets users define mapping rules that generate a high-quality schema from ill-defined raw RDF data, but DBPO has no Chinese version. Zhishi.me [4] is the first effort to publish Chinese Linking Open Data, but it does not define an ontology describing the schema of the published semantic data. We approach this problem of schema sparseness by contributing to the complementary part of LOD, namely Linking Open Schema (LOS). LOS aims at closing the gap between lightweight LOD and expressive ontologies by adding more expressive ontological axioms between concepts. Links in LOS are created between concepts from different sources and are not limited to equivalence relations. In this paper, we present Zhishi.schema Explorer, a platform for exploring Zhishi.schema [7], the first effort to publish Chinese linked open schema. We first introduce the Zhishi.schema dataset in Section 2. Section 3 then describes Zhishi.schema Explorer, whose Lookup Service and SPARQL Endpoint allow querying by concept label and with the SPARQL language, respectively. Finally, Section 4 concludes the paper and outlines future work.

3 http://linkeddata.org/

Fig. 1. Overview of Current Chinese Social Web Sites

2 Zhishi.schema

Zhishi.schema is both an integrated concept taxonomy and a large semantic network. It is a directed graph whose nodes represent concepts and whose edges represent semantic relations between them. These concepts and relations are harvested from the navigational categories and dynamic tags of more than 50 of the most popular Web sites in China, covering all the kinds of Chinese social Web sites summarized in Figure 1. Zhishi.schema comprises 408,069 concept labels, of which 328,288 are categories and 79,781 are tags; its semantic relations comprise 1,560,725 subclassOf relations, 22,672 equal relations and 229,167 relate relations.

The navigational categories are organized hierarchically: in a category hierarchy, a category may have zero or more parent categories and zero or more child categories. A navigational category is called a static category because it is relatively stable and predefined by the Web site. Tags, in contrast, are organized in a flat manner and are called dynamic tags because Web users create them on the fly. A tag can therefore be treated as a single-node category with no parents or children. Typical examples of navigational (static) categories and dynamic tags are given in Figure 2.
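To make the data model above concrete, the following Python sketch represents Zhishi.schema as a directed graph of concepts with three kinds of edges. It is a hypothetical in-memory illustration, not the dataset's actual storage; the class names, and the site names siteA and siteB used in the usage lines, are assumptions introduced only for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConceptKind(Enum):
    STATIC = "static"    # navigational category predefined by a Web site
    DYNAMIC = "dynamic"  # tag created on the fly by Web users


class Relation(Enum):
    SUBCLASS_OF = "subclassOf"  # child concept -> parent concept
    EQUAL = "equal"             # same meaning
    RELATE = "relate"           # close but not identical meaning


@dataclass(frozen=True)
class Concept:
    site: str          # provenance (source Web site)
    kind: ConceptKind
    label: str


@dataclass
class SchemaGraph:
    concepts: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, Relation, target) triples

    def add_concept(self, concept: Concept) -> None:
        self.concepts.add(concept)

    def add_relation(self, source: Concept, relation: Relation, target: Concept) -> None:
        self.concepts.update((source, target))
        self.edges.add((source, relation, target))

    def parents(self, concept: Concept) -> set:
        return {t for (s, r, t) in self.edges if s == concept and r is Relation.SUBCLASS_OF}

    def children(self, concept: Concept) -> set:
        return {s for (s, r, t) in self.edges if t == concept and r is Relation.SUBCLASS_OF}


# Usage with hypothetical sites and labels: a small category hierarchy plus a tag.
graph = SchemaGraph()
appliance = Concept("siteA", ConceptKind.STATIC, "Household Appliance")
small = Concept("siteA", ConceptKind.STATIC, "Small Household Appliance")
purifier = Concept("siteA", ConceptKind.STATIC, "Water Purifier")
tag = Concept("siteB", ConceptKind.DYNAMIC, "Water Purifier")
graph.add_relation(small, Relation.SUBCLASS_OF, appliance)
graph.add_relation(purifier, Relation.SUBCLASS_OF, small)
graph.add_concept(tag)  # a tag is a single-node category: no parents, no children
```

A category may accumulate several parents through repeated add_relation calls, while the tag stays an isolated node, mirroring the distinction between static categories and dynamic tags.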

Fig. 2. Typical Examples of Navigational (Static) Categories and Dynamic Tags: (a) Navigational (Static) Categories; (b) Dynamic Tags

Following the Linked Data principles⁴, Zhishi.schema creates a URI for every concept. The URI pattern http://zhishi.schema/[site]/[concept type]/[label] comprises four parts. The first part, http://zhishi.schema/, is the default namespace. The second part identifies the provenance of the concept, i.e., the Web site it comes from. The third part is static if the concept is a category and dynamic if it is a tag. The last part is the label of the concept. To unify the encoding, the second and last parts are encoded in UTF-8⁵. In total, Zhishi.schema contains six types of data: labels, resource site labels, links, subclassOf relations, equal relations and relate relations. They are explained in detail as follows:

– Labels: Every concept in Zhishi.schema has a name, which is used as the rdfs:label of the corresponding Zhishi.schema resource.
– Resource Site Labels: Since concepts are extracted from different Web sites, each concept has its own provenance. Provenances are represented with the predicate zhishi.schema:resource site label.
– Links: For each concept, Zhishi.schema provides a link that lets users access the original Web page. These links are extracted and represented using zhishi.schema:site url.
– SubclassOf Relations: One concept is a subclass of another if and only if the former is a child node of the latter. In Zhishi.schema, subclassOf relations are denoted as rdfs:subClassOf.
– Equal Relations: Two concepts are equal if and only if they have the same meaning; this relation is represented using owl:equivalentClass.
– Relate Relations: Compared with the subclassOf and equal relations, the relate relation is the weakest semantic relation. Two concepts are related if their meanings are close but not the same. skos:related is used to represent relate relations.

In the Zhishi.schema dataset, the equal and relate relations together form a large semantic network, while the subclassOf relations form an integrated concept taxonomy, which can be regarded as a hierarchical acyclic graph (HAG). Roots are at depth 1 and the maximal depth is 16. Since a concept may have one or more parents, it can be reached from the roots via different paths. These paths may have different lengths, so a concept can appear at multiple depths of the HAG. On average, the depth of a concept is 3.479. Table 1 gives, for each depth of the HAG, the number of provenances (source Web sites) and the number of concepts (including both static categories and dynamic tags) at that depth.

4 http://www.w3.org/DesignIssues/LinkedData.html/
5 http://www.utf-8.com/

Table 1. Detailed Information of HAG

Depth of HAG   Number of Provenances   Number of Concepts
1              51                      18,925
2              51                      14,280
3              49                      97,997
4              43                      95,342
5              43                      37,423
6              38                      18,986
7              36                       9,725
8              35                       6,684
9              28                       5,148
10             17                       3,287
11             12                       2,026
12              8                         589
13              6                         152
14              4                          44
15              2                          17
16              1                           6
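The URI pattern and the six types of data described above can be sketched in Python as follows. The sketch assumes that "encoded in UTF-8" means percent-encoding the UTF-8 bytes of the site and label parts (as urllib.parse.quote does). The namespace used for the zhishi.schema: prefix and the underscore spellings resource_site_label and site_url are hypothetical, since the exact forms are not given here.

```python
from urllib.parse import quote, unquote

BASE = "http://zhishi.schema/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical: the expansion of the zhishi.schema: prefix is not given,
# and the local names used below are assumed spellings.
ZS = "http://zhishi.schema/property/"


def concept_uri(site: str, is_tag: bool, label: str) -> str:
    """Build http://zhishi.schema/[site]/[concept type]/[label].

    Assumes the site and label parts are percent-encoded UTF-8.
    """
    concept_type = "dynamic" if is_tag else "static"
    return f"{BASE}{quote(site, safe='')}/{concept_type}/{quote(label, safe='')}"


def label_from_uri(uri: str) -> str:
    """Decode the last URI part back into the original (e.g. Chinese) label."""
    return unquote(uri.rsplit("/", 1)[1])


def concept_triples(uri, label, site_label, page_url,
                    parents=(), equals=(), related=()):
    """Return (subject, predicate, object) triples for one concept."""
    triples = [
        (uri, RDFS + "label", label),                   # labels
        (uri, ZS + "resource_site_label", site_label),  # provenance
        (uri, ZS + "site_url", page_url),               # link to the original page
    ]
    triples += [(uri, RDFS + "subClassOf", p) for p in parents]
    triples += [(uri, OWL + "equivalentClass", e) for e in equals]
    triples += [(uri, SKOS + "related", r) for r in related]
    return triples
```

For example, concept_uri("当当", False, "净水器") yields a static-category URI whose site and label parts are percent-encoded, and label_from_uri recovers "净水器" from it; quoting with safe='' also keeps a label containing "/" from adding a spurious path segment.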
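Because a concept may be reachable from the roots via paths of different lengths, its depths in the HAG form a set rather than a single number. One way to compute these sets, sketched here with Kahn's topological ordering (our choice of algorithm, not necessarily the procedure behind Table 1), is to visit each concept only after all of its parents and to propagate depth + 1 from every parent. The function name and the (child, parent) input format are assumptions of this sketch.

```python
from collections import defaultdict, deque


def concept_depths(subclass_of):
    """Return {concept: set of depths} for a hierarchical acyclic graph.

    subclass_of is an iterable of (child, parent) pairs. Concepts without a
    parent are roots at depth 1; a concept reachable via paths of different
    lengths receives every corresponding depth. Only concepts that occur in
    subclass_of are included.
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    nodes = set()
    for child, parent in subclass_of:
        parents[child].add(parent)
        children[parent].add(child)
        nodes.update((child, parent))

    # Kahn's algorithm: a concept is dequeued only after all its parents.
    pending = {n: len(parents[n]) for n in nodes}
    queue = deque(n for n in nodes if pending[n] == 0)
    depths = {n: {1} for n in queue}
    processed = 0
    while queue:
        node = queue.popleft()
        processed += 1
        for child in children[node]:
            depths.setdefault(child, set()).update(d + 1 for d in depths[node])
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    if processed != len(nodes):
        raise ValueError("subclassOf relations contain a cycle")
    return depths
```

With the pairs ("b", "a"), ("c", "b") and ("c", "a"), concept c is reachable both directly from root a and through b, so it appears at depths 2 and 3. The cycle check doubles as a test of the acyclicity assumption behind the HAG.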

Fig. 3. The Interface of Lookup Service

3 Zhishi.schema Explorer

Zhishi.schema Explorer is a platform that allows users to explore the Zhishi.schema dataset through a Lookup Service and a SPARQL Endpoint.

Lookup Service: The Lookup Service lets users query by concept label. It is available at http://los.linkingopenschema.info/LookUp.jsp and its interface is shown in Figure 3. When a user submits a query, all concepts whose labels exactly match the query are returned. Since some of these concepts may be equal, Zhishi.schema Explorer merges equal concepts and presents an integrated view for browsing. Unlike keyword search over the whole Web, which returns a large amount of text and many Web sites, a query over the Zhishi.schema dataset directly returns useful knowledge at the schema level. Figure 4 gives an example of the Lookup Service. If a user searches for “Water Purifier”, the returned page integrates two equivalent concepts from two e-commerce Web sites, 360buy⁶ and DangDang⁷. The page shows the provenance information, other equivalent concepts, parent concepts, child concepts, related concepts, and links to the pages on the original Web sites, organized in the Resource Site Label, EqualClass, SuperClass, SubClass, RelatedClass and Link sections, respectively. Clicking any parent or child concept switches to that concept's page; this interaction corresponds to navigation in the integrated concept taxonomy of Zhishi.schema. Users can also click any related or equivalent concept, which corresponds to traversal of the semantic network of Zhishi.schema.

6 http://www.360buy.com/
7 http://www.dangdang.com/

Fig. 4. An Example of Lookup Service

SPARQL Endpoint: The Zhishi.schema dataset can also be explored through a SPARQL endpoint, available at http://los.linkingopenschema.info/SPARQL.jsp, whose interface is shown in Figure 5. This application suits users who know in advance exactly what information they need; such users can submit customized queries to the endpoint over the SPARQL protocol⁸. A query that searches for the parent concepts of “Water Purifier” from DangDang is shown as follows:

  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?product WHERE {
    <http://zhishi.schema/[site]/[concept type]/[label]> rdfs:subClassOf ?product
  }

where [site] and [label] in the subject URI stand for the UTF-8-encoded Chinese names of “DangDang” and “Water Purifier”. Submitting this query returns 48 concepts from Zhishi.schema. Table 2 gives part of the query results, including “Water Purifying Plant” from Amazon⁹, “Small Household Appliance” from Taobao¹⁰, and “Household Appliance” from DangDang and Taobao. All the RDF triples are stored in AllegroGraph RDFStore¹¹, which also provides the querying capabilities.

8 http://www.w3.org/TR/sparql11-protocol/
9 http://www.amazon.cn/

Fig. 5. The Interface of SPARQL Endpoint
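The merging step of the Lookup Service, which groups exactly matching concepts with their equivalents, can be illustrated with a union-find (disjoint-set) structure over the equal relations. This is a hypothetical sketch: how the Explorer actually implements merging is not described here, and the class name and the short URIs in the usage note are illustrative.

```python
from collections import defaultdict


class LookupIndex:
    """Exact-label lookup that merges concepts linked by equal relations.

    Equivalence classes are kept in a union-find structure (path compression
    plus union by size). Concepts must be added before they are linked.
    """

    def __init__(self):
        self._parent = {}                  # concept URI -> union-find parent
        self._members = {}                 # class root -> all URIs in the class
        self._by_label = defaultdict(set)  # label -> URIs carrying that label

    def add_concept(self, uri, label):
        if uri not in self._parent:
            self._parent[uri] = uri
            self._members[uri] = {uri}
        self._by_label[label].add(uri)

    def _find(self, uri):
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while uri != root:
            next_node = self._parent[uri]
            self._parent[uri] = root
            uri = next_node
        return root

    def add_equal(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            return
        if len(self._members[ra]) < len(self._members[rb]):
            ra, rb = rb, ra  # attach the smaller class under the larger one
        self._parent[rb] = ra
        self._members[ra] |= self._members.pop(rb)

    def lookup(self, label):
        """Return one sorted list of URIs per merged class that matches label."""
        roots = {self._find(uri) for uri in self._by_label.get(label, ())}
        return [sorted(self._members[r]) for r in sorted(roots)]
```

If siteA/static/WaterPurifier and siteB/static/WaterPurifier are linked by an equal relation, a lookup for "Water Purifier" returns them as one merged view, which also includes any further equivalents whose labels differ, in the spirit of the EqualClass section of the result page.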
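Programmatic access over the SPARQL protocol could look like the following standard-library Python sketch. It assumes that the endpoint accepts SPARQL 1.1 Protocol GET requests with a query parameter and can return the W3C SPARQL JSON results format. The protocol URL is left as a parameter because only the address of the Web interface is given above; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def parent_query(concept_uri):
    """SPARQL query for the parent concepts (rdfs:subClassOf) of one concept."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"SELECT ?product WHERE {{ <{concept_uri}> rdfs:subClassOf ?product }}"
    )


def build_request(endpoint, query):
    # SPARQL 1.1 Protocol: query via GET, passing the text in a 'query' parameter.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload, var):
    """Extract the values bound to one variable from a SPARQL JSON result."""
    results = json.loads(payload)
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


def parent_concepts(endpoint, concept_uri, timeout=30.0):
    """Send the query and return the URIs of the parent concepts."""
    req = build_request(endpoint, parent_query(concept_uri))
    with urlopen(req, timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"), "product")
```

Keeping query construction and result parsing separate from the network call makes each part testable offline; the returned URIs are still percent-encoded and would need decoding to show Chinese labels.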

4 Conclusion

In this paper, we first introduced the Zhishi.schema dataset, which contains an integrated concept taxonomy built from subclassOf relations and a large semantic network composed of equal and relate relations. More information about Zhishi.schema can be found in [7]. We then presented Zhishi.schema Explorer, a platform for exploring Zhishi.schema, the first effort to publish Chinese linked open schema. Through its Lookup Service and SPARQL Endpoint, it allows users to query with concept labels and with the SPARQL language. As future work, we consider two directions. First, Zhishi.schema will be linked to other datasets in LOD in order to build a global LOS. Second, we plan to provide more APIs as a programming interface to Zhishi.schema, thereby fostering research on analysing and mining schema-level knowledge with Zhishi.schema.

10 http://www.taobao.com/
11 http://franz.com/agraph/allegrograph/

Table 2. Sample query results from the SPARQL endpoint (column: product). Text in brackets, an English translation of the decoded UTF-8, does not appear in the original query results.

References

1. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. International Journal on Semantic Web and Information Systems 5(3), 1–22 (2009)
2. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the web of data. Web Semantics: Science, Services and Agents on the World Wide Web 7(3), 154–165 (2009)
3. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD 2008), pp. 1247–1250 (2008)
4. Niu, X., Sun, X., Wang, H., Rong, S., Qi, G., Yu, Y.: Zhishi.me - weaving Chinese linking open data. In: Proceedings of the 10th International Semantic Web Conference (ISWC 2011), pp. 205–220 (2011)
5. Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web (WWW 2007), pp. 697–706 (2007)
6. Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a large ontology from Wikipedia and WordNet. Web Semantics: Science, Services and Agents on the World Wide Web 6(3), 203–217 (2008)
7. Wang, H., Wu, T., Qi, G., Ruan, T.: On publishing Chinese linked open schema. In: Proceedings of the 13th International Semantic Web Conference (ISWC 2014) (2014), to appear