Linguistic Conversion of Syntactic to Semantic Web Page

G. Nagarajan (1) and K.K. Thyagarajan (2)

(1) Research Scholar, Sathyabama University, Chennai, Tamil Nadu, India
(2) Professor, Dept. of Information & Technology, RMK College of Engineering & Technology, Chennai, Tamil Nadu, India

Abstract. Information is knowledge. In earlier days one had to find a resource person or a resource library to acquire knowledge. Today, just by typing a keyword into a search engine, all kinds of resources are available to us. Due to this advancement, trillions of pieces of information are available on the net. So in this era we need a search engine that searches along with us by understanding the semantics of the user's query. Such a design is possible only if we provide semantics to our ordinary HTML web pages. In this paper we explain the concept of converting an HTML page to an RDFS/OWL page. This technique is combined with natural language technology, as we have to provide the Hyponyms and Meronyms of the given HTML pages.

Keywords: Ontology, OWL, RDFS, Named Entity Recognition, Probability Reasoner.

1 Introduction

Information is the main source of intelligence. In today's revolutionary era the amount of information available on the web is enormous. Typing any keyword into any search engine returns millions of results, but the amount of relevant information is very small, and the user again has to search manually through those million results. So this kind of search engine is not user friendly. The primary aim of this paper is to design an intelligent search engine that provides only the needed relevant information for the given query.

A semantic search engine is the key answer for this kind of search. As said in [1], here both the machine and the user try to search for information on the web. There are many research papers regarding the design of a new semantic search engine. In [6] the authors even list the top five to ten semantic search engines. The main drawback of a semantic search engine is that very few Semantic Web pages are available on the web. As the concept of the Semantic Web started around 2000, we have very few Semantic Web pages. As the creation and design of a syntactic web page is easy, and a lot of built-in software exists for it, people are still interested in creating syntactic web pages with general XHTML, XML, PHP, etc. coding instead of constructing an ontology for them.

The main focus of this paper is to design a Web Intelligent Framework for converting a syntactic web page to a Semantic Web page. If such a framework is available, we can convert all those web pages and make them available to semantic search engines, thereby providing a way toward an efficient search engine where both the user and the machine search for information on the web.

Meghanathan et al. (Eds.): Advances in Computing & Inform. Technology, AISC 177, pp. 207-213.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012


This paper is organized as follows: first we list some related work; then we present the design of the Web Intelligent Framework; in the next section we explain the concept of converting an HTML document to an XML document; next, the major conversion of XML to RDFS/OWL is explained; finally, we conclude our work with its future scope.

2 Related Work

In [10] the authors discuss converting HTML to OWL using tables. They consider the TABLE tag of an HTML page and try to convert it to OWL. This does not add any semantics to the ontology. In [11] the authors try to convert HTML to OWL using the frameset tags; they also incorporate UML to identify the classes and subclasses. In [12] the conversion is done by first annotating the web page. The annotation considered is semantic annotation, through which they try to capture the semantics of the page. They use the GATE tool to analyze the semantics through natural language processing. In [13] the conversion takes place using the tag, and the GRDDL tool is used for the conversion.

3 Web Intelligent Framework

As we search for intelligent information, we need to design an intelligent system for this syntactic-to-semantic conversion. Fig. 1 shows the proposed framework. Syntactic web pages are collected via a web crawler. As the output of a web crawler is a list of URLs, we require an intelligent system to filter out the unwanted URLs. Then, with the available list of URLs as input, the XML is created; from that entity-oriented XML, the conversion of XML to OWL is carried out, and the created ontology is collected in a repository that can be used by any of the Semantic Web search services. The conversions of HTML to XML and of XML to OWL are explained in detail in the forthcoming sections. The concept of the web crawler is already explained in [1], so we do not describe it in this paper.

Fig. 1. Web Intelligent Framework


4 HTML to XML Conversion

The first phase of this Web Intelligent Framework is the conversion of all the web pages collected by the web crawler to standard XML files with the named entity as the main entity. Named Entity Recognition (NER) is a concept from Natural Language Processing. The main technologies used here are patterns and lexicons. For the given corpus text, NER classifies each entity as Person Name, Organization Name, Location or Miscellaneous (Date, Time, Number, Percentage, Monetary expression, Number expression and Measurement expression).

Fig. 2. HTML to XML Conversion

Figure 2 shows the general framework for converting an HTML document to XML using the named entity concept. This technique is derived from [5].

Fig. 3. Output of the conversion


Figure 3 shows the output of the XML creation for the given website, which is relevant to the Asian Games. The corresponding entity relation XML for Organization is given below:



content="Media Services" pos="412" />
content="Olympic Council" pos="1198" />
content="Press Conferences" pos="3112" />
content="The Radio Management" pos="5807" />
content="News Coverage Tour" pos="5977" />
content="Media Friends" pos="5983" />
</mentions-organization>
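As a rough illustration of how a lexicon-driven recognizer could emit entity XML of this shape, the following Python sketch looks up organization names in a text and writes one element per mention. It is a minimal sketch under stated assumptions, not the implementation used in this work: the lexicon contents, the child element name "mention", the sample sentence and the reading of pos as a character offset are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lexicon; a real system would load a much larger gazetteer.
ORGANIZATION_LEXICON = ["Olympic Council", "Media Services"]


def find_mentions(text, lexicon):
    """Return (phrase, character offset) pairs for every lexicon hit, in text order."""
    hits = []
    for phrase in lexicon:
        for match in re.finditer(re.escape(phrase), text):
            hits.append((phrase, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


def mentions_to_xml(hits, entity_type="organization"):
    """Serialize hits under a <mentions-TYPE> root.

    The child tag name "mention" is an assumption made for this sketch.
    """
    root = ET.Element(f"mentions-{entity_type}")
    for phrase, offset in hits:
        ET.SubElement(root, "mention", content=phrase, pos=str(offset))
    return ET.tostring(root, encoding="unicode")


sample_text = "The Olympic Council met with Media Services before the games."
sample_hits = find_mentions(sample_text, ORGANIZATION_LEXICON)
xml_output = mentions_to_xml(sample_hits)
print(xml_output)
```

Because every page is reduced to the same small set of entity tags, the later XML to OWL step can rely on one fixed input format regardless of how the original HTML was written.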

5 XML to RDFS/OWL Conversion

The main work in this phase of the project is the conversion of XML to OWL/RDFS. Through this conversion we provide semantics to the web page. As the conversion takes place automatically, we need well-formed XML files in the same format; that is the reason we use the generalized NER technique for the XML conversion, which provides the same entity name tags. How to convert from this XML is the main focus of this work. The syntactic rules of XML are converted to their corresponding semantics via two main techniques. One is Syntactic Analysis, where, without looking at the content, we use the XML's XSD files to add the RDFS of that page [4]. The other approach is Semantic Analysis, where we process the XML using natural language processing and create the ontology and schema of the web page [7].

5.1 Syntactic Analysis

Here we map the XSD elements [4] and convert them to OWL elements. For the mapping, the strategy shown in Table 1 is used.

Table 1. XSD to OWL

XSD                                                                     OWL
xsd:element containing other elements or having at least one attribute  owl:Class, coupled with owl:ObjectProperties
xsd:element with neither sub-elements nor attributes                    owl:DatatypeProperties
Named xsd:complexType                                                   owl:Class
Named xsd:simpleType                                                    owl:DatatypeProperties
xsd:minOccurs, xsd:maxOccurs                                            owl:minCardinality, owl:maxCardinality
xsd:sequence, xsd:all                                                   owl:intersectionOf
xsd:choice                                                              Combination of owl:intersectionOf, owl:unionOf and owl:complementOf


The main elements of an ontology are the OWL classes, object properties, data properties and all the constraint and cardinality elements. The XML elements are converted as shown in Table 1. The main drawback of this approach is that sometimes an irrelevant data element is tagged in OWL, which may produce irrelevant output.
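The first two rows of Table 1 can be sketched in a few lines of Python. The sketch below is only an illustration under assumptions: the sample schema, the function names and the output strings are hypothetical, and only the element rules are covered (cardinality, sequence and choice are left out).

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema for one entity type; not taken from the paper.
SAMPLE_XSD = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="organization">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="content" type="xsd:string"/>
        <xsd:element name="pos" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def classify_element(element):
    """Apply the two xsd:element rows of Table 1 to a single element."""
    has_children = element.find(f".//{XSD_NS}element") is not None
    has_attributes = element.find(f".//{XSD_NS}attribute") is not None
    if has_children or has_attributes:
        return "owl:Class"
    return "owl:DatatypeProperty"


def map_schema(xsd_text):
    """Return {element name: OWL construct} for every xsd:element in the schema."""
    root = ET.fromstring(xsd_text)
    return {
        element.get("name"): classify_element(element)
        for element in root.iter(f"{XSD_NS}element")
    }


mapping = map_schema(SAMPLE_XSD)
print(mapping)
```

The mapping looks only at the structure of the schema and never at the element content, which is why an irrelevant element can still end up as a class or property in the ontology.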

5.2 Semantic Analysis

In this analysis the RDFS/OWL is generated using Natural Language Processing techniques [9]. For a Semantic Web page we have to create both RDFS and OWL. RDFS (Resource Description Framework Schema) is a kind of rules and logic regarding the content of the page. A human identifies and analyzes any intelligent information only via logical reasoning. Likewise, RDF provides logic to the web page. OWL (Web Ontology Language) is used to produce the ontology of the web page; only with this do we have the whole conceptual idea of any general concept, so that we can give accurate results.

Fig. 4. OWL Generator

Figure 4 shows a general framework for generating OWL from the generated XML. To analyze the content of the XML we use a Lexical Analyzer, which analyzes each entity XML tag and indicates whether it is a noun, a verb or any other part of speech. Once analyzed, we use the concept of a Probability Reasoner to determine the concepts and the relationships between them, as an ontology consists of concepts and the relationships between them. We use probability because it handles uncertainty with the help of deductive logic, i.e. using a set of hypotheses for reasoning. The primary relationships between concepts to be identified are the "is-a" and "part-of" relations. To identify this kind of relationship [8] and create a fully structured ontology, we have to determine the Hyponym and Meronym of the identified tag. There is an automatic way to determine and extract the Hyponym with the following linguistic patterns:

1) such NP as NP,* (or|and) NP
2) NP, NP* or other NP
3) NP, including NP, or|and NP
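Patterns of this shape are commonly used to extract "is-a" (hyponym) pairs from running text. The Python sketch below shows one way they could be approximated with regular expressions. It is a simplification under assumptions: a noun phrase (NP) is reduced to a single word, whereas a real system would take NPs from a chunker or from the Lexical Analyzer, and the function name and sample sentences are hypothetical.

```python
import re

# A noun phrase (NP) is approximated as one word that is not a connective.
NP = r"\b(?!(?:and|or|other)\b)[A-Za-z]+"
# "NP, NP, ... and/or NP": a comma-separated list with an optional final conjunct.
NP_LIST = rf"{NP}(?:, {NP})*(?:,? (?:and|or) {NP})?"

PATTERNS = [
    # 1) such HYPERNYM as NP, NP and NP
    re.compile(rf"such (?P<hyper>{NP}) as (?P<hypos>{NP_LIST})", re.IGNORECASE),
    # 2) NP, NP or other HYPERNYM
    re.compile(rf"(?P<hypos>{NP}(?:, {NP})*),? or other (?P<hyper>{NP})", re.IGNORECASE),
    # 3) HYPERNYM, including NP, NP and NP
    re.compile(rf"(?P<hyper>{NP}),? including (?P<hypos>{NP_LIST})", re.IGNORECASE),
]


def extract_is_a(sentence):
    """Return (hyponym, hypernym) pairs found by any of the three patterns."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            hyponyms = re.split(r",? (?:and|or) |, ", match.group("hypos"))
            pairs.extend((hyponym, match.group("hyper")) for hyponym in hyponyms)
    return pairs


print(extract_is_a("such sports as cricket, hockey and tennis"))
print(extract_is_a("cricket, hockey or other sports"))
print(extract_is_a("games, including cricket and tennis"))
```

Each extracted pair is a candidate subclass relation; in the framework described here such candidates would still be passed to the Probability Reasoner before entering the ontology.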

With this concept we can create an ontology. The RDFS can be created using the Lexical Analyzer; the framework is shown in Figure 5. For the mapping, the concept shown in Table 2 can be referred to.


Fig. 5. RDFS Generator

Table 2. XML to RDF

XML's POS                               RDF components
Verb (phrase)                           Predicate
Noun (phrase) denoted at first          Subject
Noun (phrase) denoted after predicate   Object
Noun with adjective (phrase)            Subclass of the original noun
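The mapping of Table 2 can be sketched as a small function over part-of-speech tagged tokens. This is a minimal sketch under assumptions: the tag names NOUN, VERB and ADJ, the single-word phrases, the function name and the sample sentence are hypothetical, and the tags are supplied by hand rather than produced by a tagger.

```python
def to_rdf_statements(tagged_tokens):
    """Map (word, tag) pairs to RDF-style statements following Table 2."""
    subject = predicate = obj = None
    pending_adjective = None
    statements = []
    for word, tag in tagged_tokens:
        if tag == "ADJ":
            # Remember the adjective until the noun it modifies arrives.
            pending_adjective = word
        elif tag == "VERB" and predicate is None:
            predicate = word
        elif tag == "NOUN":
            if pending_adjective is not None:
                # Noun with adjective: a subclass of the original noun.
                statements.append((f"{pending_adjective}_{word}", "rdfs:subClassOf", word))
                pending_adjective = None
            if predicate is None and subject is None:
                subject = word   # first noun, before the predicate
            elif predicate is not None and obj is None:
                obj = word       # first noun after the predicate
    if subject and predicate and obj:
        statements.append((subject, predicate, obj))
    return statements


tagged = [("council", "NOUN"), ("organizes", "VERB"),
          ("international", "ADJ"), ("games", "NOUN")]
statements = to_rdf_statements(tagged)
print(statements)
```

The sketch keeps only the first subject, predicate and object it sees, which is enough to show the mapping; longer sentences would need clause splitting before this step.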

6 Conclusion

If we have knowledge about what we are searching for, we can easily retrieve the desired information. The main drawback of the information retrieval procedure in web technology is that the technology does not know the semantics and syntax of what the user is searching for. This gave birth to Semantic Web technology. In this paper we deal with the reuse technique of converting the available HTML pages to ontology-enriched Semantic Web pages. With the success of this work, we would like to design a semantic search engine [14][15][16] for efficient information retrieval.

References

1. Nagarajan, G., Thyagharajan, K.K.: A Novel Image Retrieval Approach for Semantic Web. International Journal of Computer Applications (January 2012)
2. Nagarajan, G., Thyagharajan, K.K.: A Survey on the Ethical Implications of Semantic Web Technology. Journal of Advanced Research in Computer Engineering 4(1) (June 2010)
3. Minu, R.I., Thyagharajan, K.K.: Evolution of Semantic Web and Its Ontology. In: Second Conference on Digital Convergence (2009)
4. Bohring, H., Auer, S.: Mapping XML to OWL ontologies (2004)
5. Zhu, J., Uren, V., Motta, E.: ESpotter: Adaptive named entity recognition for web browsing (2004)
6. Esmaili, K.S., Abolhassani, H.: A Categorization scheme for semantic web search engines (2005)
7. Hassanzadeh, H., Keyvanpour, M.R.: A Machine learning based analytical framework for semantic annotation requirements. International Journal of Web and Semantic Technology (2011)
8. Abeyruwan, S.W.: PrOntoLearn: unsupervised lexico-semantic ontology generation using probabilistic methods. Thesis, University of Miami (2010)
9. Lenci, A., et al.: NLP based ontology learning from legal texts. A case study (2006)
10. Tijerino, Y.A., et al.: Towards ontology generation from tables. Kluwer Academic Publishers (2004)
11. Benslimane, S., et al.: Towards ontology extraction from data intensive web sites: An HTML forms based reverse engineering approach. International Arab Journal of Information Technology (2006)
12. Mukhopadhyay, D., et al.: A New semantic web services to translate HTML pages to RDF. In: Int. Conference of IT (2007)
13. Hwangbo, H., et al.: Reusing of information constructed in HTML document: a conversion of HTML to OWL. In: Int. Conference on Control, Automation and Systems (2008)
14. Minu, R.I., Thyagharajan, K.K.: Automatic image classification using SVM Classifier. CiiT International Journal of Data Mining Knowledge Engineering (July 2011)
15. Minu, R.I., Thyagharajan, K.K.: Scrutinizing Video and Video Retrieval Concept. International Journal of Soft Computing & Engineering 1(5), 270-275 (2011)
16. Nagarajan, G., Thyagharajan, K.K.: A Machine learning technique for Semantic Search Engine. In: ICMOC, NI University, to be published in Elsevier Procedia (April 2012)