SHELDON: Semantic Holistic framEwork for LinkeD ONtology Data
Diego Reforgiato Recupero1,2 (✉), Andrea Giovanni Nuzzolese1,2, Sergio Consoli1,2, Aldo Gangemi1,2, and Valentina Presutti1,2
1 STLab-ISTC Consiglio Nazionale delle Ricerche, Catania, Italy
2 LIPN Sorbonne-Cité, Université de Paris-13, Villetaneuse, France
[email protected]
Abstract. SHELDON is a framework that builds upon a machine reader for extracting RDF graphs from text, so that the output complies with Semantic Web and Linked Data patterns. It extends the current human-readable web by applying semantic best practices and technologies to produce machine-processable content. Given a sentence in any language, it offers several semantic tasks as well as visualization tools that make use of the JavaScript InfoVis Toolkit and a knowledge enrichment component built on top of RelFinder. The system can be freely used at http://wit.istc.cnr.it/stlab-tools/sheldon.
1 Introduction
Machine Reading relies on bootstrapped, self-supervised Natural Language Processing (NLP) performed on basic tasks in order to extract knowledge from text. SHELDON generates RDF graph representations of the knowledge extracted from text by tools dedicated to basic NLP tasks. Such graphs extend and improve the NLP output, and are typically customized for application tasks. SHELDON includes and extends several components that have been successfully evaluated in the past [1,4–11]. FRED [4,10], a tool for automatically producing RDF/OWL ontologies and linked data from text, provides the machine reader capability of SHELDON. The backbone deep semantic parsing is provided by Boxer [2], which uses a statistical parser (C&C) that produces Combinatory Categorial Grammar trees. The basic NLP tasks performed by FRED include: (i) event detection (FRED uses DOLCE+DnS¹ [3]), (ii) semantic role labeling with VerbNet² and FrameNet roles, (iii) first-order logic representation of predicate-argument structures, (iv) logical operators scoping, (v) modality detection, (vi) tense representation, (vii) entity recognition using TAGME³, (viii) word sense disambiguation using IMS and BabelNet⁴.

¹ D.U.L. Ontology. http://www.ontologydesignpatterns.org/ont/dul/dul.owl
² T.V. project. http://verbs.colorado.edu/mpalmer/projects/verbnet.html
³ http://tagme.di.unipi.it/
⁴ http://babelnet.org/

© Springer International Publishing Switzerland 2015. P. Lambrix et al. (Eds.): EKAW 2014 Satellite Events, LNAI 8982, pp. 136–139, 2015. DOI: 10.1007/978-3-319-17966-7_17

SHELDON also leverages LEGALO [9], a novel method for uncovering the intended semantics of links by tagging them with semantic relations. LEGALO implements a set of graph pattern-based rules for extracting, from FRED graphs, Semantic Web binary relations that capture the semantics of specific links. The rationale is that revealing the semantics of hyperlinks has a high potential impact on the amount of Web knowledge that can be published in machine-readable form while keeping the binding with its natural language source.

Another component of SHELDON performs semantic sentiment analysis (SA). It is built on top of SENTILO [6,11], a domain-independent system that performs SA by hybridizing natural language processing techniques and Semantic Web technologies. Given a sentence expressing an opinion, SENTILO recognizes its holder, detects the related topics and subtopics, links them to the relevant situations and events the sentence refers to, and evaluates the sentiment expressed.

A further component of SHELDON performs the automatic typing of DBpedia entities. It is based on TIPALO [5], which identifies the most appropriate types for an entity by interpreting its natural language definition.

Bibliographic citations are the tools most used by academic communities for linking research, for instance by connecting scientific papers to related work or to sources of experimental data. Another component of SHELDON addresses this kind of linking within scientific research articles. It is built on top of CITALO [8], a tool that automatically infers the function of citations by means of Semantic Web technologies and NLP techniques. It takes as input a sentence containing a reference to a bibliographic entity, together with the CiTO ontology, which describes the nature of citations in scientific research articles, and infers the function of that citation.

Data visualization is also taken into account in SHELDON: the output of each component can be shown both as a graph and as a list of triples.
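The relation between the graph view and the triple view can be illustrated with a minimal Python sketch. It assumes that a component's output is available as a list of (subject, predicate, object) IRIs; the triples, the `http://example.org/` namespace and the function names are illustrative assumptions, not actual SHELDON output or API.

```python
from collections import defaultdict

# Illustrative data only: these triples are NOT actual SHELDON output.
DBR = "http://dbpedia.org/resource/"
EX = "http://example.org/"  # hypothetical namespace for the example terms
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

TRIPLES = [
    (DBR + "Riva_del_Garda", EX + "partOf", DBR + "Italy"),
    (DBR + "Riva_del_Garda", RDF_TYPE, EX + "City"),
]


def triple_view(triples):
    """Render each triple as one N-Triples-style line (IRIs only)."""
    return ["<{}> <{}> <{}> .".format(s, p, o) for s, p, o in triples]


def graph_view(triples):
    """Build an adjacency mapping: subject -> list of (predicate, object) edges."""
    adjacency = defaultdict(list)
    for s, p, o in triples:
        adjacency[s].append((p, o))
    return dict(adjacency)


if __name__ == "__main__":
    for line in triple_view(TRIPLES):
        print(line)
    for node, edges in graph_view(TRIPLES).items():
        print(node, "has", len(edges), "outgoing edges")
```

Both views carry the same information; the adjacency mapping is simply the form a graph-drawing front end would typically consume.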
An appealing data visualization component, built on top of the Semantic Scout [1], relies on the JavaScript InfoVis Toolkit. In addition, the relations identified between the detected DBpedia entities can be augmented using a SHELDON component built on top of RelFinder [7]. REST APIs are provided for each SHELDON component, so anyone can build online end-user applications that integrate, visualize, analyze, combine and deduce the available knowledge at the desired level of granularity. Note that the intended end users of SHELDON are developers and researchers who want to leverage the semantic knowledge it provides in their own applications.
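As a rough illustration of how a client might address one REST API per component, the following Python sketch builds a request URL. The text does not specify the endpoint paths or parameter names, so the base URL, the task names and the `text` parameter below are all hypothetical placeholders chosen for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical values: the real endpoint paths and parameter names
# are not given in the paper.
BASE_URL = "http://example.org/sheldon-api"
HYPOTHETICAL_TASKS = {"machine-reader", "sentiment", "relations"}


def build_request_url(task, sentence, base_url=BASE_URL):
    """Return a GET URL with one path segment per component and the sentence as a query parameter."""
    if task not in HYPOTHETICAL_TASKS:
        raise ValueError("unknown task: {}".format(task))
    query = urlencode({"text": sentence})  # percent-encodes non-ASCII as UTF-8
    return "{}/{}?{}".format(base_url.rstrip("/"), task, query)


def sentence_from_url(url):
    """Decode the sentence back from a URL built by build_request_url."""
    return parse_qs(urlsplit(url).query)["text"][0]


if __name__ == "__main__":
    print(build_request_url("sentiment", "Riva del Garda è una bella città"))
```

Keeping the URL construction separate from the HTTP call makes it easy to swap in the real endpoint and parameters once they are known.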
2 SHELDON at Work
SHELDON's main interface shows a text box where one can type a sentence in any language and choose which semantic task to perform. Note that SHELDON always provides its results in English: the Bing Translation APIs are embedded within SHELDON, and if the sentence is written in a language other than English, an automatic language detection⁵ module identifies the language of the input sentence. For example, the sentence Riva del Garda è una bella città che fa parte dell'Italia is a valid Italian sentence that would be correctly translated from Italian to English before being processed.

⁵ The module makes use of the Language Identification Engine of Apache Stanbol.

If the machine reader capability is chosen, SHELDON outputs an RDF graph with several kinds of associated information (DBpedia entities, events and situations mapped to DOLCE, WordNet and VerbNet mappings, pronoun resolution). If the SA option is used, for the same sentence SHELDON returns a semantic SA ontology with scores for the positive adjective, beautiful, and for the entity Riva del Garda, whose score is computed according to the propagation algorithm [11]. If the relation discovery option is chosen, SHELDON returns the relations, either new or aligned against existing repositories, between DBpedia entities, in our example Riva del Garda and Italy. Each DBpedia entity node displayed in the graphs is clickable, so that types can be automatically assigned to it in a new LOD-intensive graph. Fig. 1(a), (b) and (c) show, respectively, the output of the machine reader feature, the semantic SA and the relations produced between the DBpedia entities recognized within the sentence. For each of the three main capabilities, the complete list of RDF triples that SHELDON outputs can be shown by choosing any view other than the Graphical View item.
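The workflow described above (detect the language, translate to English when needed, then run the selected task) can be sketched in Python as follows. This is only an outline under stated assumptions: the detector, translator and task functions are offline stand-ins written for the example, the English translation is hand-written, and none of the names correspond to the actual Apache Stanbol, Bing or SHELDON interfaces.

```python
ITALIAN = "Riva del Garda è una bella città che fa parte dell'Italia"


def process_sentence(sentence, task, detect_language, translate_to_english, tasks):
    """Detect the language, translate to English if needed, then run the chosen task."""
    if task not in tasks:
        raise KeyError("unknown task: {}".format(task))
    language = detect_language(sentence)
    english = sentence if language == "en" else translate_to_english(sentence, language)
    return {"language": language, "english": english, "result": tasks[task](english)}


# Stand-in stubs so the sketch runs offline; none of them calls a real service.
def fake_detect(sentence):
    return "it" if " è " in sentence else "en"


def fake_translate(sentence, source_language):
    # Hand-written translation used only by this stub.
    lookup = {ITALIAN: "Riva del Garda is a beautiful city that is part of Italy"}
    return lookup.get(sentence, sentence)


FAKE_TASKS = {
    "machine-reader": lambda text: "graph for: " + text,
    "sentiment": lambda text: "sentiment scores for: " + text,
    "relations": lambda text: "relations for: " + text,
}

if __name__ == "__main__":
    print(process_sentence(ITALIAN, "sentiment", fake_detect, fake_translate, FAKE_TASKS))
```

Passing the detector, translator and tasks as arguments keeps the control flow testable without network access and leaves the choice of concrete services open.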
Fig. 1. Machine reader output (a), semantic link identification (b), and semantic SA output (c) for the Italian sentence Riva del Garda è una bella città che fa parte dell'Italia
Through the options at the bottom of the produced graphs it is possible to export the graph as an image, to see the augmented knowledge for the identified DBpedia entities through a GUI built on top of RelFinder, and to navigate the graph with a visualization tool that builds upon the Semantic Scout [1] and uses the JavaScript InfoVis Toolkit⁶.

⁶ http://philogb.github.io/jit/
3 Conclusions
We have presented SHELDON⁷, a framework that provides several Semantic Web features for a sentence in any language. SHELDON augments the knowledge about the entities identified within the sentence and includes appealing visualization tools.
References
1. Baldassarre, C., Daga, E., Gangemi, A., Gliozzo, A.M., Salvati, A., Troiani, G.: Semantic scout: making sense of organizational knowledge. In: Cimiano, P., Pinto, H.S. (eds.) EKAW 2010. LNCS, vol. 6317, pp. 272–286. Springer, Heidelberg (2010)
2. Bos, J.: Wide-coverage semantic analysis with boxer. In: Proceedings of the 2008 Conference on Semantics in Text Processing, STEP 2008, pp. 277–286. Association for Computational Linguistics, Stroudsburg (2008)
3. Gangemi, A.: What's in a schema?, pp. 144–182. Cambridge University Press, Cambridge (2010)
4. Gangemi, A.: A comparison of knowledge extraction tools for the semantic web. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds.) ESWC 2013. LNCS, vol. 7882, pp. 351–366. Springer, Heidelberg (2013)
5. Gangemi, A., Nuzzolese, A.G., Presutti, V., Draicchio, F., Musetti, A., Ciancarini, P.: Automatic typing of dbpedia entities. In: Cudré-Mauroux, P., Heflin, J., Sirin, E., Tudorache, T., Euzenat, J., Hauswirth, M., Parreira, J.X., Hendler, J., Schreiber, G., Bernstein, A., Blomqvist, E. (eds.) ISWC 2012, Part I. LNCS, vol. 7649, pp. 65–81. Springer, Heidelberg (2012)
6. Gangemi, A., Presutti, V., Recupero, D.R.: Frame-based detection of opinion holders and topics: A model and a tool. IEEE Comp. Int. Mag. 9(1), 20–30 (2014)
7. Heim, P., Hellmann, S., Lehmann, J., Lohmann, S., Stegemann, T.: RelFinder: revealing relationships in RDF knowledge bases. In: Chua, T.-S., Kompatsiaris, Y., Mérialdo, B., Haas, W., Thallinger, G., Bailer, W. (eds.) SAMT 2009. LNCS, vol. 5887, pp. 182–187. Springer, Heidelberg (2009)
8. Iorio, A.D., Nuzzolese, A.G., Peroni, S.: Towards the automatic identification of the nature of citations. In: Castro, A.G., Lange, C., Lord, P.W., Stevens, R. (eds.) SePublica. CEUR Workshop Proceedings, vol. 994, pp. 63–74. CEUR-WS.org (2013)
9. Presutti, V., Consoli, S., Nuzzolese, A.G., Recupero, D.R., Gangemi, A., Bannour, I., Zargayouna, H.: Uncovering the semantics of wikipedia wikilinks. In: 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2014) (2014)
10. Presutti, V., Draicchio, F., Gangemi, A.: Knowledge extraction based on discourse representation theory and linguistic frames. In: ten Teije, A., Völker, J., Handschuh, S., Stuckenschmidt, H., d'Acquin, M., Nikolov, A., Aussenac-Gilles, N., Hernandez, N. (eds.) EKAW 2012. LNCS, vol. 7603, pp. 114–129. Springer, Heidelberg (2012)
11. Recupero, D.R., Presutti, V., Consoli, S., Gangemi, A., Nuzzolese, A.G.: Sentilo: Frame-based sentiment analysis. Cognitive Computation (2014)

⁷ The work described in this paper was performed with the support of the PRISMA (PiattafoRme cloud Interoperabili per SMArt-government) project, funded by the MIUR (Ministero dell'Istruzione, dell'Università e della Ricerca).