International Journal of Applied Engineering Research ISSN 0973-4562 Volume 9, Number 20 (2014) pp. 6699-6706 © Research India Publications http://www.ripublication.com

A Novel Approach for Knowledge Mining from Graphs using Semantic Search

Hemamalini. S (1), MichaelRaj T.F (2), Prabu.M (3) and Saravanan.N (4)

(1,2,3,4) Asst. Prof., Department of CSE, SASTRA University, Tamil Nadu, India
E-mail: [email protected], [email protected], [email protected], [email protected]

Abstract
Information retrieval is the process of finding the data a user requires in a vast collection of information. Most attempts to find new information begin with a keyword-based query followed by further refinements. Such keyword-based search is primitive and often retrieves content of low relevance. Finding the optimal solution for information retrieval from a graph, however, is a complicated process that requires more effort and more techniques. We propose an approach for knowledge mining that processes the graph using semantic search. Knowledge graphs are designed to treat web pages as entities and to perform entity-based search, which is basically a graph search technique. Graph search is a classical problem of Artificial Intelligence. Though the Semantic Web still stands in the corridors of the knowledge representation war, graph search can take it to fruitful venues. GP-CLOSE, a mining algorithm for knowledge discovery in text databases, can be applied with modifications to the knowledge graph of entities. The graph search algorithm AO* can then be used on the knowledge graph to search for the best matching result. This paper is an attempt to reach the Semantic Web at the earliest.

Keywords: Semantic web, knowledge mining, Knowledge graph, Semantic search.

1. Introduction
The Web consists of a pool of structured and unstructured data, and various information retrieval techniques have been implemented to find the data a user requires in this vast collection of information. Most retrieval methods used by search engines are keyword-based: the search is done by mechanically (syntactically) comparing the keyword with the terms used to index the document. A single word may have many meanings, and one meaning may be expressed by many words. Because of this, keyword-based search has very low precision. This is a consequence of the lack of knowledge representation and semantic processing capabilities. The key to solving the problem may therefore be to add semantic information and to refine the information processing capabilities. These problems are no different from the classical knowledge representation problem of AI. Search has evolved through many stages, such as phrase searching, logical-operator-based search, page-rank-based algorithms, hit-count-based algorithms, wild-card searches and near searches. Most of these algorithms, however, are still supplements to exact keyword-match algorithms. The next kind of technique, now being adopted by Google and Facebook, is the Knowledge Graph (KG). A KG is shown in Figure 1. It presents users with the salient features of the things they search for, such as people or places that Google knows about: landmarks, celebrities, cities, sports teams, buildings, geographical features, movies, celestial objects, works of art and more [1].

Fig. 1: Knowledge Graph (KG) of places.

Semantic search is a process of information access that uses technologies such as knowledge extraction, knowledge representation and reasoning. Semantically described content can be searched easily, and information can be retrieved from the results by applying data mining concepts. Figure 2 shows the semantic relations among the contents.


Fig. 2: Semantic relations.

Knowledge mining from the KG is a critical process that can be carried out by processing the user's query. Section 2 presents the literature survey and an overview of the emerging Semantic Web, Section 3 discusses the proposed method, and Sections 4 and 5 present the results and the conclusion.

2. Literature Survey
Semantic search tries to understand the searcher's intent and the meaning of the terms in the search space in order to return accurate and relevant results [2]. It considers aspects such as the context of the search, location, word usage, synonyms, query generalization and specialization, concept matching and natural language queries. This kind of search is used by search engines like Google and Bing [3]. Two types of search are distinguished in [4]: navigational search, in which the user uses the search engine as a navigational tool, and exploratory search, in which searchers are unfamiliar with their goal, unsure how to achieve it, or unsure even what the goal is. Such searchers combine querying and browsing to learn and investigate side by side. Navigational search is not relevant in the context of semantic search, whereas exploratory search is its area of operation. The attributes of semantic search (the qualities that distinguish it from non-semantic search) are not all advantages by definition. For example, some attributes may improve search accuracy through an exhaustive iterative process but, as a side effect, over-consume time or resources such as processing power. Accordingly, these ten attributes are merely salient features, although the underlying assumption is that under perfect conditions they are generally preferable [5]: handling morphological variations, synonyms and generalizations; matching of concepts and knowledge; handling natural language queries; pointing to the uninterrupted paragraph and the sentence with the highest relevance; progressive customization; operating without relying on statistics, user behaviour and other artificial means; and validating its own performance.


Another aspect of semantic search is the use of ontologies. Ontologies enable the formal articulation of domain knowledge at a high level of expressiveness and could enable users to specify their intent in more detail at query time [6].

2.1 Ontology
Ontology, which has its roots in philosophy, is a new method of knowledge representation. It is the systematic explanation or description of existing objects. An ontology is the explicit formal specification of a shared conceptual model. This definition has four layers of meaning. The conceptual model refers to the model obtained by abstracting the concepts related to some phenomena in the objective world. An ontology is a conceptual modelling tool, so it can describe information at the level of semantics and knowledge, provide the semantic foundation of a subject, support the exchange and sharing of a common understanding of the meaning of terms, and retrieve document pages that differ in form but are semantically similar. An association pattern mining technique can be used to find how relations are associated in RDF metadata. Conceptual graphs have also been used to extract knowledge from text databases.

2.2 Existing methods and drawbacks
An ontology-based semantic retrieval model is proposed in [7]. It consists of five modules: a user interface module, a semantic processing module, a semantic retrieval module, a document processing module and an output module for search results. KeymanticES is a keyword-based entity search mechanism that combines the flexibility of keyword-based search with the ability to query a metadata or knowledge base [8]. Its authors propose a three-step query disambiguation approach consisting of keyword-to-semantic-item mapping, goal entity class recognition and candidate query set generation, together with a candidate query ranking function. That work focuses on keyword-based semantic entity search in dataspaces. Gracia and Mena suggest a disambiguation process that uses dictionary-based methods to find the right sense of a given keyword according to the context in which it appears [9].
There are also methods that enhance annotation tasks for entity disambiguation [10]. A method called SSI (structural semantic interconnections) is proposed to disambiguate entities by using the background knowledge provided by populated ontologies [11]. Another disambiguation method, for ontology population, is suggested in [12], and semantic tagging of large corpora is performed by SemTag [13].

2.3 Knowledge Graph
The Knowledge Graph is a repository of around 18 billion facts about, and relationships between, different objects, derived from sources such as the CIA World Factbook, Freebase and Wikipedia. Google uses it to enhance its search engine's results. In addition to a list of links to other sites, it provides structured and detailed information about the topic, so that users can resolve their query with much less effort [14]. Thus, a search engine's three primary functions will need to evolve: for search to improve, it will need to Answer, Converse and Anticipate. Some of the features used by Google are structured schema, RDF and micro-format data, search logs and mark-up data from its database Freebase [15].


With these, the knowledge graph is able to answer a query meaningfully. In addition, entity-based search combined with natural language search helps to understand search terms better. An entity may be any concept, process, object, business, product, movie, author, person, place, event, etc., and may have a collection of documents associated with it. An entity-based search engine changes the very concept of SEO: it is about optimizing not for a page but for an object. Knowledge graphs can be thought of as an extension or working model of conceptual graphs. The mining algorithms applied to conceptual graphs in [16] can be implemented on knowledge graphs as well.
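To make the idea of entity-based search concrete, here is a minimal Python sketch of a toy entity index. The `Entity` class, its fields, the `entity_search` function and the sample data are all hypothetical illustrations. They are not taken from the paper or from Google's Knowledge Graph.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node of a toy knowledge graph (every name here is hypothetical)."""
    name: str
    kind: str
    facts: dict = field(default_factory=dict)      # attribute -> value
    relations: dict = field(default_factory=dict)  # relation -> entity name
    documents: list = field(default_factory=list)  # pages tied to the entity


def build_graph(entities):
    """Index entities by lower-cased name so lookup ignores case."""
    return {e.name.lower(): e for e in entities}


def entity_search(graph, query):
    """Return the matched entity and its directly related entities."""
    entity = graph.get(query.strip().lower())
    if entity is None:
        return None
    related = [graph[n.lower()] for n in entity.relations.values()
               if n.lower() in graph]
    return entity, related


graph = build_graph([
    Entity("Taj Mahal", "landmark", {"built": "17th century"},
           {"located_in": "Agra"}, ["page-1", "page-7"]),
    Entity("Agra", "city", {"country": "India"}, {}, ["page-3"]),
])
entity, related = entity_search(graph, "taj mahal")
print(entity.kind, entity.documents, [r.name for r in related])
```

The query resolves to an object with facts, relations and associated documents rather than to a ranked list of pages, which is the shift described above.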

3. Proposed Method
Semantics extracted from collaborative tagging on social websites can provide a solution to the disambiguation problem [17]. A community-discovery algorithm for networks, FastGreedyCommunityDiscovery (Newman's fast algorithm for detecting community structure, listed under [17]), is applied to folksonomy networks to discover groups of closely connected nodes. Using this algorithm, the sets of tags related to an ambiguous word in its different contexts are extracted.

Algorithm 1: Tag meaning disambiguation
Input: Adjacency matrix C of the network of documents
Output: A set T of sets of tags
begin
    // Document clustering
    X = FastGreedyCommunityDiscovery(C);
    T = { };
    // Extract top 10 tags
    for Xi ∈ X do
        Ti = Top10Tags(Xi);
        T = T ∪ {Ti};
    end
    // Merge similar sets of tags
    merged = 1;
    while merged = 1 do
        merged = 0;
        for Ti, Tj ∈ T and i ≠ j do
            if overlap(Ti, Tj) ≥ α then
                Xnew = Xi ∪ Xj;
                Tnew = Top10Tags(Xnew);
                T = T - {Ti, Tj};
                T = T ∪ {Tnew};
                merged = 1;
            end
        end
    end
    return T;
end

The output of this algorithm is a set of disambiguated tags. It can be fed as input to the Closed Generalization Pattern Mining algorithm (GP-CLOSE), which is used to obtain groups of closures on the patterns of keywords.

Algorithm 2: GP-CLOSE
Input: RDF database
Output: Closed generalization closures
1. Visit node n of the search tree.
2. Generate the closed generalization closure c and insert it into the set of frequent generalization closures C.
3. If the closure of n has already been considered, n and all its descendants can be pruned.
4. Recursively visit the child closures of the current tree node n.

The output of this algorithm can then be fed as input to the AO* search. The knowledge graph used to implement the Semantic Web, now clustered into closures, can best be searched with AO* search, the AND-OR graph search [18]. It finds the optimal solution by following the minimum-cost path, where the cost of an OR arc is its own weight and the cost of an AND arc is the sum of the weights of all its branches. Since search queries are rarely free of logical operators such as 'and' and 'or', AO* search can naturally solve the problem of searching a state space that is a graph of nodes, where the nodes are annotated web pages. The weights of the nodes or edges may be set according to the semantic relevance of the particular page. One step further would be to treat the nodes as entities that describe knowledge, as in classical semantic nets or in an object-oriented or component-based fashion. Here a node is the conceptualisation of the entity of a web page. The web page has to be annotated beforehand. Algorithm 1 can be used to disambiguate the tags of the web page based on data from social networking websites. The graph G consisting of this initial node is fed as input to the modified AO* algorithm. FUTILITY is the feasible limit on the cost of a node: if the estimated cost of a node is greater than FUTILITY, the node is abandoned.
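The AND-OR cost rule and the FUTILITY cut-off described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not AO* itself: it evaluates a small acyclic AND-OR graph exhaustively instead of expanding nodes incrementally under a heuristic. The graph encoding, the function name `best_cost`, the FUTILITY value and the convention that a lower cost means higher semantic relevance are all assumptions made for illustration.

```python
FUTILITY = 100.0  # assumed cost limit; the paper does not state a value


def best_cost(node, arcs, terminals, memo=None):
    """Return (cost, chosen_arc) for `node` in an acyclic AND-OR graph.

    arcs[node] is a list of arcs and each arc is a list of
    (successor, weight) pairs. A single pair is an OR arc. Several pairs
    form an AND arc whose cost is the sum over all of its successors.
    """
    if memo is None:
        memo = {}
    if node in memo:
        return memo[node]
    if node in terminals:
        result = (0.0, None)        # terminal nodes are solved at zero cost
    elif not arcs.get(node):
        result = (FUTILITY, None)   # dead end: the node is not solvable
    else:
        result = (FUTILITY, None)   # stays abandoned unless an arc is cheaper
        for arc in arcs[node]:
            cost = sum(w + best_cost(s, arcs, terminals, memo)[0]
                       for s, w in arc)
            if cost < result[0]:
                result = (cost, arc)
    memo[node] = result
    return result


# Hypothetical graph: "query" is reached either through A alone (OR arc)
# or through B and C together (AND arc).
arcs = {
    "query": [[("A", 1.0)], [("B", 1.0), ("C", 1.0)]],
    "A": [[("D", 5.0)]],
}
terminals = {"B", "C", "D"}
cost, arc = best_cost("query", arcs, terminals)
print(cost, arc)
```

In this example the AND arc costs 1 + 1 = 2 while the OR arc through A costs 1 + 5 = 6, so the AND arc is marked as the best way out of "query".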
Algorithm 3: AO* algorithm
Input: The graph of nodes (disambiguated tags)
Output: Solution with optimum cost, i.e., the path to the required node
1. Let G consist only of the node representing the initial state; call this node INIT. Compute h'(INIT). The cost estimate h' is here the semantic relevance of the tags of the web pages (the percentage or degree of relevance to the particular keyword).
2. Until INIT is labeled SOLVED or h'(INIT) > FUTILITY, repeat steps 3 to 5.
3. Trace the marked arcs from INIT and select an unexpanded node; call it NODE.
4. Generate the successors of NODE. (This step expands the graph by including nodes whose tags are relevant to the tags in the current search.) If there are no successors, assign FUTILITY as h'(NODE); this means that NODE is not solvable. If there are successors, then for each one, called SUCCESSOR, that is not also an ancestor of NODE, do the following:
   a. Add SUCCESSOR to graph G.
   b. If SUCCESSOR is a terminal node, mark it SOLVED and assign zero to its h' value.
   c. If SUCCESSOR is not a terminal node, compute its h' value.
5. Propagate the newly discovered information up the graph as follows. Let S be a set of nodes that have been marked SOLVED or whose h' value has changed. Initialize S to NODE. Until S is empty, repeat the following procedure:
   - Select a node from S, call it CURRENT, and remove it from S.
   - Compute the cost of each of the arcs emerging from CURRENT. Assign the minimum of these costs as the new h' of CURRENT.
   - Mark the minimum-cost path as the best path out of CURRENT. Mark CURRENT SOLVED if all of the nodes connected to it through the newly marked arc have been labeled SOLVED.
   - If CURRENT has been marked SOLVED or its h' has just changed, its new status must be propagated back up the graph; hence all the ancestors of CURRENT are added to S.

4. Results and Analysis
The Semantic Web, implemented with the help of a knowledge graph, has to be searched or mined. The keywords presented for search are first disambiguated, and the output of this step is fed as input to the GP-CLOSE algorithm. The output of GP-CLOSE is in turn fed as input to the AO* search, which finally returns the optimal-cost path to the required node, that is, the relevant solution to the search.
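The first stage of this pipeline, the tag-set merging of Algorithm 1, can be sketched in Python as follows. The sketch assumes that the document clusters have already been produced by a community-discovery step, so that step is not implemented here. The `overlap` measure (shared tags relative to the smaller set), the threshold `alpha = 0.5` and the sample documents are assumptions, since the paper does not define them.

```python
from collections import Counter
from itertools import combinations


def top_tags(docs, doc_tags, k=10):
    """Return the k most frequent tags over a cluster of documents."""
    counts = Counter(tag for d in docs for tag in doc_tags[d])
    return frozenset(tag for tag, _ in counts.most_common(k))


def overlap(a, b):
    """Assumed overlap measure: shared tags relative to the smaller set."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0


def disambiguate(clusters, doc_tags, alpha=0.5, k=10):
    """Merge clusters whose top-tag sets overlap by at least alpha."""
    clusters = [frozenset(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for xi, xj in combinations(clusters, 2):
            ti, tj = top_tags(xi, doc_tags, k), top_tags(xj, doc_tags, k)
            if overlap(ti, tj) >= alpha:
                # Replace the two clusters by their union, then rescan.
                clusters = [c for c in clusters if c not in (xi, xj)]
                clusters.append(xi | xj)
                merged = True
                break
    return [top_tags(c, doc_tags, k) for c in clusters]


# Hypothetical tagged documents for the ambiguous word "apple".
doc_tags = {
    "d1": ["apple", "fruit", "recipe"],
    "d2": ["apple", "fruit", "orchard"],
    "d3": ["apple", "mac", "iphone"],
    "d4": ["apple", "iphone", "laptop"],
}
result = disambiguate([["d1"], ["d2"], ["d3", "d4"]], doc_tags)
print(result)
```

With this data the two fruit-related clusters merge, because they share two of three tags, while the computing cluster stays separate. The result is one tag set per sense of the word, which is the form of input the later stages are described as consuming.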

5. Conclusion
The talk of the day is “Semantic Web has killed SEO. Long live SEO”. But SEO (Search Engine Optimization) is definitely the Semantic Web in its rudimentary form, and the Semantic Web will not be free of SEO, at least in the near future. This paper is thus a small attempt towards optimizing current web search techniques. Future work may address the nature of the result: it should not be a single node but a collection of nodes with varying degrees of relevance, from which users may choose.


References
[1] http://googleblog.blogspot.in/2012/05/introducing-knowledge-graph-thingsnot.html
[2] http://searchenginewatch.com/article/2285277/How-the-Semantic-WebChanges-Everything-for-Search
[3] http://www.stateofsearch.com/search-in-the-knowledge-graph-era/
[4] John, Tony (March 15, 2012). "What is Semantic Search?". Techulator. Retrieved July 13, 2012.
[5] http://insidesearch.blogspot.ch/2012/12/get-smarter-answers-fromknowledge_4.html
[6] S. Dill, N. Eiron, D. Gibson, D. Gruhl, and R. Guha. Semtag and seeker: Bootstrapping the semantic web via automated semantic annotation. In WWW'03. ACM, 2003.
[7] Chaoyang Ji and Lingli Zhang. Design of Semantic Web Retrieval Model Based on Ontology. Advances in CSIE, Vol. 2, AISC 169, pp. 703–708.
[8] Dan Yang, De-Rong Shen. Query Intent Disambiguation of Keyword-Based Semantic Entity Search in Dataspaces.
[9] Jorge Gracia, Eduardo Mena. Multiontology Semantic Disambiguation in Unstructured Web Contexts.
[10] A. P. Sheth, C. Ramakrishnan, and C. Thomas. Semantics for the semantic web: The implicit, the formal and the powerful. Int. J. Semantic Web Inf. Syst., 2005.
[11] http://searchenginewatch.com/article/2285277/How-the-Semantic-WebChanges-Everything-for-Search
[12] J. Hassell, B. Aleman-Meza, and I. B. Arpinar. Ontology-driven automatic entity disambiguation in unstructured text. In International Semantic Web Conference, 2006.
[13] S. Dill, N. Eiron, D. Gibson, D. Gruhl, and R. Guha. Semtag and seeker: Bootstrapping the semantic web via automated semantic annotation. In WWW'03. ACM, 2003.
[14] Guha, R.; McCool, Rob; Miller, Eric (May 24, 2003). "Semantic Search". WWW2003. Retrieved July 13, 2012.
[15] http://justinbriggs.org/entity-search-results-the-on-going-evolution-of-search
[16] Tao Jiang, Ah-Hwee Tan. Mining Generalized Associations of Semantic Relations from Textual Web Content. IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 2, February 2007.
[17] Ching-man Au Yeung, Nicholas Gibbins, and Nigel Shadbolt. Web Search Disambiguation by Collaborative Tagging. M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69:066133, 2004.
[18] Elaine Rich, Kevin Knight. “Artificial Intelligence”, TMH.
