explained the concept of converting an HTML page to RDFS/OWL page. This technique is ....
Linguistic Conversion of Syntactic to Semantic Web page G. Nagarajanl and K.K. Thyagarajan2 1
2
Research Scholar, Sathyabama University, Chennai, Tamil Nadu, India Professor, Dept. ol Information & Technology, RMK college ol Engineering & Technology,
Chennai, Tamil Nadu, India
Abstract. Information is knowledge. In earlier days one has to find a resource person or resource library to acquire knowledge. But today just by typing a keyword on a search engine all kind ofresources are available to us. Due to this mere advancement there are trillions of information available on net. So, in this era we are in need of search engine which also search with us by understanding the semantics of given query by the user. One such design is only possible only if we provide semantic to our ordinary HTML web page. In this paper we have explained the concept of converting an HTML page to RDFS/OWL page. This technique is incorporated along with natural language technology as we have to provide the Hyponym and Meronym of the given HTML pages.
Keywords: Ontology, OWL, RDFS, Name entity recognirion, probability Rdasoner.
1 Introduction liformation is the main source of intelligent. In today's revolutionary era the amount - information available on web is enormous. Just by typing any keyword on any ,euch engine will provide you with millions of information, but the amount of ::.evant information would be very few and the user again has to search manually on ' . those million result. So, this kind of search engine is not user friendly. The primary ::rroach of this paper is to design a Intelligent search engine which has to provide -,r the needed relevant information regarding the given query. Semantic search engine is the only key answer for this kind of search. As said in - '-1] here both the machine and the user tries to search some information on web. are many research papers regarding the design of a new semantic search -ere ; - r11€. In [6] even listed top five to ten Semantic Search Engine. The main drawback ' Semantic Search Engine is that the available Semantic Web page on web is very \s the concept of Semantic Web had started on around 2000, we have very few :;:rantic Web page. As the creation and design of Syntactic Web page is ease of :. also we have lot of in-built software for it still people are interested in creating -'::e Syntactic web page with general XHTML,XML,PHP,ecI. coding instead of . -:,ruct ontology forit.
rLe main focus of this paper is to design an web Intelligent Framework for
- .:rting a Syntactic web page to Semantic Web page. Thus if a framework is " - -'ble we can convert all those web pages and make it available for Semantic :-:.r Engine thus we can provide a way for efficient search engine where both the .: :nd Machine search an information from Web. .arathan et
a1.
.::Lk.com
(Eds.): Advances in Computing
& Inform. Technology, AISC 177, pp.201-213. @ Springer-Verlag
Berlin Heidelberg 2012
208
G. Nagarajan and K.K. Thyagarajan
This paper is design in such a way that first we listed some of the related work, then we start with providing the design of web Intelligent Framework, then in next session we explained the concept of converting HTML document to xML document, next the major conversion of XML to RDFS/OWL is explained then we concluded our work with future scope.
2
Related Work
In [10] discuss the way of converting the HTML to owl- using table. They consider the TABLE tag of HTML page and tried to convert to owl-. This won't produce any semantic to the ontology. In 111l they tried to convert the HTML to owl- using the
FRAME set tags they also tried to incorporate UML to identify the ciass and subclasses. In [12] the conversion is done by first annotating the web page. The annotation they consider is the semantic annotation thus they tried to provide semantic of the page. They use the tool called GATE to atalyze the semantic through natural language processing. In [13] the conversion is take place using the tag and they used GRDDL tool for conversion.
3
Web Intelligent Frame Work
As we search for intelligent information we need to design an intelligent system for this Syntactic to Semantic conversion. Fig.l shows the proposed framework, where the collection of Syntactic web pages are collected via a Web crawier as the output of an web crawler is list of URL we required a genius system to filter out the unwanted URL. Then with the available list of URL of inpur the XML is created with rhat enriry concern XML a conversion of XML to owl- is implemented and the crated ontology is collected in repository which can be used by an of the Semantic web search services. The conversion of HTML to xML and the XML to owl- is explained in detail in the forth coming session. The concept of web Crawler is already explained in [1] we are not specifying in this paper.
I.]s{'{lP 1A[.!$
r&TEtrYC TE 4IELtrSHIU n
asES
ffi.LffiISffi flTI& TBIW rtG I
Fig. 1. Web Intelligent Framework
{OH1T.B.IfIO}-
Ilrl
OI
!dle.*I.ri:fln1.
FffiC
Linguistic Convelsion of Syntactic to Semantic Web Page
209
4 HTML to XML Conversion The first phase of this Web Intelligent Framework is the conversation of all the web page collected from a web crawler to a standard xML fiies with name entity as the main entity. Name Entity Recognition is a concept of Natural Language Processing. In short it is called as NER. The main technology used here are patterns and Lexicons. For the given Corpus text NER classifies the entity as Person Name, Organization
Name, Location and Miscellaneous (Date, Time. Number, Percentage, Monetary expression, Number expression and Measurement expression.
IJT(ynL
L}(}CU Fd{EN
T
FRFF*t}t:ESEIzu*i
Letrtcan Rs
,&.
Fatiem
pasiisry
Fig. 2. HTML to XML Conversion
Figure 2 shows the general framework for converting HTML document to XML using Name Entity concept this technique is derived from [5].
Fig.3. Output ofthe conversion
210
G. Nagarajan and K.K. Thyagarajan
Figure 3 shown the output of XML creation of the given website which is relevani to Asian games. The concern entity relation XML for Organiztion is given below:
content="Media Services" pos="412 /> content.="ol1anpic CounciI" pos="\1-98" /> content="Press Conferences" pos= 3LL2" /> content="The Radio Management" pos="5807" /> content="News Coveraqe Tour" pos="5977" /> content= "Media Friends ' ' pos= " 5983 " /> < /mentions -organi zation>
5 XML to RDFS/OWL Conversion The main work on this phase of the project is the conversion of XML to OWL/RDFS.
Thus through this conversion we provide semantic
to the web page. As
the
converstion is take over automatically we need same format of well formatted XML file, that's the reason we use the generalized NER technique for XML conversion which provide same entity name tag. With this how we can convert is the main focus of this work. The syntactic rule of XML is converted to its concern Semantic via two main technique. One is Syntatic Analysis, where without seeing what inside with the use of XML's XSD files we can add the RDFS of that page [4]. The another approach is Semantic Analysis where we process the XML using Natural language processing and create the Ontology and Schema of the web page l7l.
5.1
Syntactic Analysis
Here we generally map the XSD element [4] and convert to OWL element for the mapping the strategy shown in Table 1 is used. Table 1. XSD to OWL XSD
owL
Xsd: elements,containing other elements
Owl:class,coupled with owl:ObjectProperties
or having at least one atffibute Xsd:elements.with neither sub-elements nor attdbutes
Owl : DatatypeProperties
Named xsd:complexType
Ow1:c1ass
Named xsd:SimpleTlpe
Ow1
:
Xsd:minOccurs,xsd:maxOccurs
Ow1
:minCardinality,owl :maxCardinality
Xsd: sequence,xsd:a11
Owl:intersectionOf
Xsd:choice
Combination of owl:intersectionOf, owl:unionOf and owl:complementOF
DatatlpeProperties
Linguistic Conversion of Syntactic to Semantic Web Page
211
An Ontologies main element are the owl classes, Object property, Data Property and all those constrain and cardinality element. Here as shown in Table 1 the XML element is converted. The main drawback of this approach is that some time an irrelevant data element would be tagged in OWL may produce irrelevant output.
5.2
Semantic Analysis
In this analysis the RDFS/OWL is generated using Natural Language
Processing
techniquesf9]. For a Semantic Web page we have to create both RDFS and OWL. RDFS which Resource Descriptor Framework is a kind a rules and logic regarding the content on the page. A human identifies and analysis any intelligent information only via logical reasoning. Likewise RDF produce logic to the web page. OWL, Web ontology language used to produce the ontology of the web page with this only we will have the whole conceptual idea of any general concept we can give accurate results.
Fig. 4. OWL Generator
Figure 4 show a general framework for generated OWL from the generated XML. To analyze the content of XML we uses Lexical Analyzer which analyze each entity XML tag and represent whether they are noun or verb or any other verbal notation. Once analyzed we use the concept of Probability Reasoner to determine the concept and relationship between them as ontology is of considering the concept and their relationship between them. We use Probability as they used to handle uncertainty with the help of deductive logic i.e using a set of hypothesis for reasoning. The primary relationship between the concept is to be identified are "is-a" and "part-of'relation. To identify this kind of relationship [8] to create a full structured ontology we have to determine the Hyponym and Meronym of the identified tag. There is an automatic way to determine and extract the Meronym with the following linguistic pattern
1) 2) 3)
Such NP as NP,*(orland) NP NP, NP* or other NP NP, including NP, orland NP
With this concept we can create an Ontology. The RDFS can be created using the Lexical Analyzer the frame work is shown in Figure 5, For the mapping the concept shown in Table 2 can be referred
212
G. Nagarajan and K.K. Thyagarajan
ffiBFS
Ge*eraArd Xh{L Fag*s
Ih:$applri-e
Fig. 5. RDFS Generator
Table 2. XML to RDF XML'S POS
RDF COMPONENTS
Verb(phrase)
Predicate
Noun(phrase) denoted at first
Subject
Noun(Phrase) denoted after predicate
Object
Noun with adjective (phrase)
Subclass of the original noun
6 Conclusion If
we have knowledge about what we are searching for, we can easy retrieve the desire information. The main drawback in information retrieval procedure in web technology is that the technology doesn't know the semantic and syntax of what the user searching for. This gives birth to the Semantic web Technology. In this paper we deal with the reusability technique of using converting the available HTML pages to an Ontology enriched Semantic web page. With the successful of this work we would like to design a semantic search Engine [14][15][16] for efficient information retrieval.
References
1.
2. 3. 4. 5. 6. 7.
Nagarajan, G., Thyagharajan, K.K.: A Novel Image Retrieval Approach for Semantic Web. Intemational Journal of Computer Applications (Jantary 2012) Nagarajan, G., Thyagharajan, K.K.: A Survey on the Ethical Implicarions of Semantic web rechnology. Joumal of Advanced Reasearch in Computer Engineering 4(1) (June 2010) Minu, R.I., Thyagharajan, K.K.: Evolution of Semantic Web and Its Ontology. In: Second Conference on Digital Convergence (2009) Bohring, H., Aure, S.: Mapping XML to OWL onrologies (2004) Zhu, J., Uren, V., Motta Espotter, E,; Adaptive named Recognition for web browsing (2004) Esmaili, K.S., Abolhassani, H.: A Categorization scheme for semantic web search engines (200s) Hassanzadeh, H., Keyvanpour, M.R.: A Machine learning based analytical framework for semantic annotaion requirements. Intemational Joumal of Web and Semantic Technology (201 1)
8.
Abeyruwan, S.W.: Prontoleam: unsupervised lexico semantic ontology generation using probabilistic methods. These of universtiy of miami (20i 0)
Linguistic Conversion of Syntactic to Semantic Web Page
9.
213
Lenci, A., et a1.: NLP based ontology learning from legal texts. A case srudy (2006) a1.: Towards ontology generation from tables. Kluwer academic publishers (2004) Benslimane, S., et al.: Towards ontology extration from data intensive web sites: An html forms based reverse engineering approach. Intemational Arab Joumal of Information Tecnology t2006) Mulfiopadhyay, D., et a1.: A New semantic web services to translate HTML pages to RDF. In: Int. Conference of IT (2007)
10. Tijerino, Y.A., et 11. 12.
13. Hwangbo, H., et a1.: Reusing of information constructed
in HTML
document:
a
conversion of HTML to OWL. In: Int. Conference on Control, Automation and Systems (2008) 14. Minu, R.I., Thyagharajan, K.K.: Automatic image classification using SVM Classifier. CiiT International Joumal of Data Mining Knowledge Engineering (July 2011) 15
Minu, R.I., Thyagharajan, K.K.: Scrutinizing Video and Video Retrieval
r6
Intemational Joumal of Soft Computing & Engineering 1(5), 27U275 (2011) Nagarajan, G., Thyagharajan, K.K.: A Machine leaming technique for Semantic Search Engine. In: ICMOC NI University will publish in Elsevier Procedio @pil2012)
Concept.